I looked for an answer in these threads, but none of them works in my case:
R change all columns of type factor to numeric,
http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f,
How to convert a data frame column to numeric type?
I am working with a data frame (8600 x 168) which I imported:
originaldf2<-read.csv("Occupanyrate_Train")
Apart from the first three columns, all columns hold numeric values, but many of them are of class factor after importing. I need columns 4 to 168 to be numeric for analysis. There were a number of empty values and "-" in these columns, which I converted to NAs by doing this:
originaldf2[originaldf2=="-"]=NA
originaldf2[originaldf2==""]=NA
The columns contain nothing but decimal numbers, integers and NAs. I tried using the following command to convert all variables to numeric class:
originaldf2<-as.numeric(as.character(originaldf2[ , 4:168]))
and I get the warning: NAs introduced by coercion
and my dataframe itself becomes strange:
str(originaldf2)
num [1:165] NA NA NA NA NA NA NA NA NA NA ...
I also tried: as.numeric(levels(originaldf2))[as.integer(originaldf2)]
to try and coerce the whole dataframe but I got the error Error: (list) object cannot be coerced to type 'integer'
Then I noticed that there are unused levels which might be the reason, so I dropped the unused levels: originaldf2<-str(drop.levels(originaldf2))
and tried the coercion again, but it still does not work. Here's a subset of the df (10 x 12):
Property_ID Month Zipcode Occupancy_Rate.Response.Variable. VAR_1 VAR_2 VAR_3
1 A3FF8CD6 13-Jan 30064 0.93 468 10 0.7142857
2 A3FF8CD6 13-Feb 30064 0.93 468 10 0.7142857
3 A3FF8CD6 13-Mar 30064 0.94 468 10 0.7142857
4 A3FF8CD6 13-Apr 30064 0.96 468 10 0.7142857
5 A3FF8CD6 13-May 30064 0.953 468 10 0.7142857
6 A3FF8CD6 13-Jun 30064 0.93 468 10 0.7142857
7 A3FF8CD6 13-Jul 30064 0.925 468 10 0.7142857
8 A3FF8CD6 13-Aug 30064 0.925 468 10 0.7142857
9 A3FF8CD6 13-Sep 30064 0.95 468 10 0.7142857
10 A3FF8CD6 13-Oct 30064 0.945 468 10 0.7142857
11 A3FF8CD6 13-Nov 30064 0.9 NA
12 A3FF8CD6 13-Dec 30064 0.945 NA
VAR_4 VAR_5 VAR_6
1 0.5714286 0.8 0.75
2 0.5714286 0.8 0.75
3 0.5714286 0.8 0.75
4 0.5714286 0.8 0.75
5 0.5714286 0.8 0.75
6 0.5714286 0.8 0.75
7 0.5714286 0.8 0.75
8 0.5714286 0.8 0.75
9 0.5714286 0.8 0.75
10 0.5714286 0.8 0.75
11 NA NA NA
12 NA NA NA
Answer
Use the na.strings argument to convert "-" to NA while reading:
x <- read.csv(na.strings=c('-'),
text="a,b,c
0,,
-,1,2")
x
a b c
1 0 NA NA
2 NA 1 2
Blank values are converted to NA automatically in numeric columns. It is the "-" values that are forcing the column to be interpreted as factor.
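If the data frame is already loaded and re-reading the file is not convenient, the columns can also be converted one at a time. As far as I can tell, as.numeric(as.character(...)) fails on a whole data frame because as.character() then produces one string per column rather than one per value, which would explain the 165 NAs in the question. The sketch below uses a small made-up data frame (df, id, v1, v2 and cols are hypothetical names, not from the original data) and assumes the affected columns are factors containing only numbers, "-" and "":

```r
# Hypothetical stand-in for a data frame that was imported with
# "-" and "" left in otherwise numeric columns.
df <- data.frame(
  id = c("A", "B", "C"),
  v1 = factor(c("0.93", "-", "0.95")),
  v2 = factor(c("468", "", "10")),
  stringsAsFactors = FALSE
)

# Columns to convert (2:3 here; in the question this would be 4:168).
cols <- 2:3

# Convert each column separately: factor -> character -> numeric.
# Values that cannot be parsed as numbers ("-" and "") become NA.
# The "NAs introduced by coercion" warning is expected in this case,
# so it is silenced.
df[cols] <- lapply(df[cols], function(col) {
  suppressWarnings(as.numeric(as.character(col)))
})

str(df)
```

Assigning into df[cols] rather than into df keeps the untouched columns and the data frame structure intact. Going through as.character() first matters because calling as.numeric() directly on a factor returns the internal level codes, not the printed values.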