Saturday 30 April 2016

r - Possible reasons for column not converting from factor to numeric



I looked up the answer in these threads, but none of the solutions work in my case:

R change all columns of type factor to numeric

http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f

How to convert a data frame column to numeric type?



I am working with a data frame (8600 x 168) which I imported:



originaldf2<-read.csv("Occupanyrate_Train")

Apart from the first three columns, all values are numeric, but many of the columns are of class factor after importing. I need columns 4 to 168 to be of class numeric for analysis. There were a number of empty values and "-" entries in these columns, which I converted to NA like this:




originaldf2[originaldf2=="-"]=NA
originaldf2[originaldf2==""]=NA

The columns contain nothing but decimal numbers, integers and NAs. I tried the following command to convert all of these columns to class numeric:



originaldf2<-as.numeric(as.character(originaldf2[ , 4:168]))

This gives the warning "NAs introduced by coercion", and the data frame itself becomes strange:



str(originaldf2)
num [1:165] NA NA NA NA NA NA NA NA NA NA ...



I also tried to coerce the whole data frame with

as.numeric(levels(originaldf2))[as.integer(originaldf2)]

but I got the error: Error: (list) object cannot be coerced to type 'integer'



Then I noticed that there are unused levels, which might be the reason, so I dropped them:

originaldf2<-str(drop.levels(originaldf2))

I then tried the coercion again, but it still does not work. Here's a subset of the data frame (12 rows x 10 columns):



Property_ID Month Zipcode Occupancy_Rate.Response.Variable. VAR_1 VAR_2 VAR_3
1 A3FF8CD6 13-Jan 30064 0.93 468 10 0.7142857
2 A3FF8CD6 13-Feb 30064 0.93 468 10 0.7142857
3 A3FF8CD6 13-Mar 30064 0.94 468 10 0.7142857
4 A3FF8CD6 13-Apr 30064 0.96 468 10 0.7142857
5 A3FF8CD6 13-May 30064 0.953 468 10 0.7142857
6 A3FF8CD6 13-Jun 30064 0.93 468 10 0.7142857
7 A3FF8CD6 13-Jul 30064 0.925 468 10 0.7142857
8 A3FF8CD6 13-Aug 30064 0.925 468 10 0.7142857
9 A3FF8CD6 13-Sep 30064 0.95 468 10 0.7142857
10 A3FF8CD6 13-Oct 30064 0.945 468 10 0.7142857
11 A3FF8CD6 13-Nov 30064 0.9 NA NA
12 A3FF8CD6 13-Dec 30064 0.945 NA NA
VAR_4 VAR_5 VAR_6
1 0.5714286 0.8 0.75
2 0.5714286 0.8 0.75
3 0.5714286 0.8 0.75
4 0.5714286 0.8 0.75
5 0.5714286 0.8 0.75
6 0.5714286 0.8 0.75
7 0.5714286 0.8 0.75
8 0.5714286 0.8 0.75
9 0.5714286 0.8 0.75
10 0.5714286 0.8 0.75
11 NA NA NA
12 NA NA NA



Answer



Use the na.strings argument of read.csv to convert "-" to NA while reading:



x <- read.csv(na.strings=c('-'),
              text="a,b,c
0,,
-,1,2")

x
   a  b  c
1  0 NA NA
2 NA  1  2
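To see the effect on the column classes, here is a minimal sketch that reads the same kind of input with and without na.strings and compares the result. The sample text and the names csv_text, raw and clean are made up for illustration; they are not from the question's file.

```r
# Hypothetical sample: "-" and empty fields stand in for missing values.
csv_text <- "id,occ,var1
A,0.93,468
B,-,10
C,0.95,"

# Without na.strings, the "-" keeps 'occ' from being parsed as a number
# (factor in R < 4.0, character in R >= 4.0, where stringsAsFactors
# defaults to FALSE). The blank in 'var1' does not have that effect.
raw <- read.csv(text = csv_text)
sapply(raw, class)

# With na.strings, "-" and "" are read as NA, so 'occ' and 'var1'
# both come back as numeric/integer columns.
clean <- read.csv(text = csv_text, na.strings = c("-", ""))
sapply(clean, class)
```

Listing "" in na.strings as well is optional for numeric columns, but it also turns blanks into NA in text columns.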


Blank values are converted to NA automatically in numeric columns. It is the "-" values that force a column to be read as a factor.
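If the data frame has already been imported, the columns can also be converted one at a time. The attempt in the question most likely fails because as.character() is applied to the whole data frame: each column is then collapsed into a single string, so as.numeric() returns one NA per column, which matches the num [1:165] output shown. The following is a sketch under assumptions, with a small made-up data frame df and column range cols standing in for originaldf2 and 4:168.

```r
# Hypothetical stand-in for a data frame imported without na.strings.
df <- data.frame(
  id   = c("A", "B", "C"),
  occ  = factor(c("0.93", "-", "0.95")),
  var1 = factor(c("468", "", "10")),
  stringsAsFactors = FALSE
)

# Columns to convert (4:168 in the question; 2:3 in this toy example).
cols <- 2:3

# Convert each column separately: factor -> character -> numeric.
# Going through as.character() avoids returning the internal level codes.
df[cols] <- lapply(df[cols], function(x) {
  x <- as.character(x)
  x[x %in% c("-", "")] <- NA  # mark missing values first so "-" does not trigger a coercion warning
  as.numeric(x)
})

str(df)
```

Using lapply() keeps the result a data frame with the same column names, whereas calling as.numeric() on the whole object replaces it with a single vector.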

