Wednesday, 22 March 2017

r - Conditionally assign a value to a random subset of a vector




I want to assign a defined value (let's say 1) to a random sample of a subset of a vector that meets certain conditions. I can't seem to make it work.



I have tried this code:



a <- c(1:50)
df <- as.data.frame(a)
df$c <- 0
df$c[sample(x=(df$c[df$a>25]), size = round(NROW(df$c[df$a>25])/5), replace = F)] <- 1



I would like to randomly set some of the df$c values to 1: specifically, a random sample of one fifth of the values in df$c for which df$a is greater than 25 (that would be 5 observations switched to 1).



But so far all of them remain 0 :/



Thanks!


Answer



Here's a way with base R -



df$c[sample(which(df$a > 25), sum(df$a > 25)/5)] <- 1
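To see why the attempt in the question changes nothing, it may help to compare what sample() receives in each case. This is a small sketch on the question's data; the names vals and idx and the seed are illustrative assumptions. df$c[df$a > 25] is a vector of values (all 0), so sampling from it yields zeros, and indexing with 0 selects nothing. which() returns positions instead.

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable
df <- data.frame(a = 1:50, c = 0)

# The question's version samples from the *values* of c, which are all 0.
vals <- sample(x = df$c[df$a > 25], size = 5, replace = FALSE)
print(vals)                 # five zeros
df$c[vals] <- 1             # an index of 0 selects nothing, so nothing is assigned
print(sum(df$c))            # still 0

# which() returns the *positions* that satisfy the condition.
idx <- sample(which(df$a > 25), sum(df$a > 25) / 5)
df$c[idx] <- 1
print(sum(df$c))            # 5
print(all(df$a[idx] > 25))  # TRUE
```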



Be aware that this can misbehave if only one value of df$a is greater than 25: when given a single number n, sample() draws from 1:n rather than from that one index.
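The single-value case can be reproduced with a small sketch. The vector a below is hypothetical data with exactly one value above 25, and the sample size is set to 1 only to make the effect visible.

```r
set.seed(42)  # arbitrary seed for repeatability
a <- c(1:25, 40)   # hypothetical data: only one value exceeds 25
w <- which(a > 25)
print(w)           # 26, a single position

# Given a single number n, sample() draws from 1:n, not from the value itself.
s <- sample(w, 1)
print(s)           # some position between 1 and 26, not necessarily 26

# Sampling a position within w avoids the special case.
s_safe <- w[sample(length(w), 1)]
print(s_safe)      # 26
```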



The approach below avoids that problem but is a bit more verbose. Use whichever suits you best, depending on the values you expect in df$a -



df$c[which(df$a > 25)[sample(length(which(df$a > 25)), sum(df$a > 25)/5)]] <- 1


Also, note that since replace = F, the sample size sum(df$a > 25)/5 must be <= length(which(df$a > 25)). You can add this check to your code if you want to make it even safer.




Also, there will be no change if sum(df$a > 25)/5 < 1 so you may want to use size = max(sum(df$a > 25)/5, 1) if you want at least 1 change.
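The checks mentioned above could be collected into one function. This is only a sketch under stated assumptions: assign_random_subset and its arguments (fraction, value, min_n) are hypothetical names, not part of base R, and the size is rounded down with floor().

```r
# assign_random_subset is a hypothetical helper, not part of base R.
assign_random_subset <- function(target, condition, fraction = 0.2,
                                 value = 1, min_n = 1) {
  w <- which(condition)
  if (length(w) == 0) return(target)  # nothing satisfies the condition
  # Whole number of positions to change: at least min_n, at most length(w).
  n <- min(max(floor(length(w) * fraction), min_n), length(w))
  # Sample positions *within* w so that a single match is handled correctly.
  target[w[sample(length(w), n)]] <- value
  target
}

df <- data.frame(a = 1:50, c = 0)
df$c <- assign_random_subset(df$c, df$a > 25)
print(sum(df$c))                   # 5
print(all(df$c[df$a <= 25] == 0))  # TRUE
```

Capping the size at length(w) keeps sampling without replacement valid, and min_n = 1 gives at least one change whenever anything matches.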



Here's a nicer version of my first version, thanks to @Frank -



df$c <- replace(df$c, sample(w <- which(df$a > 25), length(w)*.2), 1)
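The one-liner assigns w inside the call to sample(). A two-statement equivalent may be easier to read and does not depend on the order in which the arguments are evaluated. This is a sketch on the question's data, with an arbitrary seed.

```r
set.seed(7)  # arbitrary seed for repeatability
df <- data.frame(a = 1:50, c = 0)

w <- which(df$a > 25)                             # positions where the condition holds
df$c <- replace(df$c, sample(w, length(w) / 5), 1)

print(sum(df$c))            # 5
print(all(df$a[df$c == 1] > 25))  # TRUE
```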
