Monday, 5 June 2017

r - Replacing NAs with latest non-NA value



In a data.frame (or data.table), I would like to "fill forward" NAs with the closest previous non-NA value. A simple example, using vectors (instead of a data.frame) is the following:



> y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)


I would like a function fill.NAs() that allows me to construct yy such that:



> yy

[1] NA 2 2 2 2 3 3 4 4 4


I need to repeat this operation for many small data.frames (~30-50 MB each, ~1 TB in total), where a row is NA if all its entries are. What is a good way to approach the problem?



The ugly solution I cooked up uses this function:



last <- function (x){
  x[length(x)]
}


fill.NAs <- function(isNA){
  if (isNA[1] == 1) {
    isNA[1:max({which(isNA==0)[1]-1},1)] <- 0 # first is NAs
                                              # can't be forward filled
  }
  isNA.neg <- isNA.pos <- isNA.diff <- diff(isNA)
  isNA.pos[isNA.diff < 0] <- 0
  isNA.neg[isNA.diff > 0] <- 0
  which.isNA.neg <- which(as.logical(isNA.neg))

  if (length(which.isNA.neg)==0) return(NULL) # generates warnings later, but works
  which.isNA.pos <- which(as.logical(isNA.pos))
  which.isNA <- which(as.logical(isNA))
  if (length(which.isNA.neg)==length(which.isNA.pos)){
    replacement <- rep(which.isNA.pos[2:length(which.isNA.neg)],
                       which.isNA.neg[2:max(length(which.isNA.neg)-1,2)] -
                       which.isNA.pos[1:max(length(which.isNA.neg)-1,1)])
    replacement <- c(replacement, rep(last(which.isNA.pos), last(which.isNA) - last(which.isNA.pos)))
  } else {
    replacement <- rep(which.isNA.pos[1:length(which.isNA.neg)], which.isNA.neg - which.isNA.pos[1:length(which.isNA.neg)])

    replacement <- c(replacement, rep(last(which.isNA.pos), last(which.isNA) - last(which.isNA.pos)))
  }
  replacement
}


The function fill.NAs is used as follows:



y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
isNA <- as.numeric(is.na(y))

replacement <- fill.NAs(isNA)
if (length(replacement)){
  which.isNA <- which(as.logical(isNA))
  to.replace <- which.isNA[which(isNA==0)[1]:length(which.isNA)]
  y[to.replace] <- y[replacement]
}


Output




> y
[1] NA 2 2 2 2 3 3 4 4 4


... which seems to work. But, man, is it ugly! Any suggestions?


Answer



You probably want to use the na.locf() function from the zoo package to carry the last observation forward to replace your NA values.
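
Applied to the vector from the question, a minimal sketch could look like the following. It assumes the zoo package is installed; the variable name `filled` is only illustrative. By default na.locf() drops leading NAs, which have no earlier value to carry forward, so na.rm = FALSE is passed here to keep the result the same length as the input.

```r
library(zoo)

y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)

# na.rm = FALSE keeps the leading NA instead of dropping it,
# so the result lines up element by element with y.
filled <- na.locf(y, na.rm = FALSE)
filled
# NA 2 2 2 2 3 3 4 4 4
```

Keeping the length unchanged matters when the filled vector is written back into a column of a data.frame.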



Here is the beginning of its usage example from the help page:




> library(zoo)
> az <- zoo(1:6)
> bz <- zoo(c(2,NA,1,4,5,2))
> na.locf(bz)
1 2 3 4 5 6
2 2 1 4 5 2

> na.locf(bz, fromLast = TRUE)
1 2 3 4 5 6
2 1 1 4 5 2

> cz <- zoo(c(NA,9,3,2,3,2))
> na.locf(cz)
2 3 4 5 6
9 3 2 3 2
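
If adding a package dependency is not an option, the same "last observation carried forward" idea can be sketched in base R. The helper name `fill_forward` and the small data.frame `df` below are hypothetical and not part of the original post. This is one possible approach, not what na.locf() does internally.

```r
# Hypothetical helper (not from the original post): forward-fill NAs in a vector.
fill_forward <- function(x) {
  # Position of each non-NA entry, 0 where the entry is NA.
  idx <- seq_along(x) * !is.na(x)
  # The running maximum gives the position of the latest non-NA entry so far.
  idx <- cummax(idx)
  # Leading NAs have no earlier value; indexing with NA keeps them NA.
  idx[idx == 0] <- NA
  x[idx]
}

y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
fill_forward(y)
# NA 2 2 2 2 3 3 4 4 4

# Column-wise use on a made-up data.frame; df[] keeps the data.frame structure.
df <- data.frame(a = c(NA, 1, NA, 2), b = c("x", NA, NA, "y"))
df[] <- lapply(df, fill_forward)
```

The helper is fully vectorised, so it avoids an explicit loop over elements, which may help when the operation is repeated over many data.frames.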

