r - Conditionally assign a value to a random subset of a vector

Wednesday, 22 March 2017

I want to assign a defined value (let's say 1) to a random sample of a subset of a vector that meets certain conditions. I can't seem to make it work.

I have tried this code:

a <- c(1:50)
df <- as.data.frame(a)
df$c <- 0 
df$c[sample(x=(df$c[df$a>25]), size = round(NROW(df$c[df$a>25])/5), replace = F)] <- 1

I would like to randomly set some of the values in df$c to 1: specifically, a random sample of one fifth of the values in df$c for which df$a is greater than 25 (that would be 5 observations switched to 1).

But so far all of them remain 0 :/

Thanks!

Answer

Here's a way with base R -

df$c[sample(which(df$a > 25), sum(df$a > 25)/5)] <- 1

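Be aware that this can go wrong if only one value of df$a is greater than 25: which() then returns a single number n, and sample() treats a single number as the range 1:n rather than as a one-element vector.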
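A minimal sketch of that pitfall, using a made-up vector (the values 1:20 and 30 are assumptions chosen only so that exactly one element exceeds 25):

```r
# Illustrative data, not from the question: exactly one value is greater than 25.
a <- c(1:20, 30)

idx <- which(a > 25)      # a single position: 21
picked <- sample(idx, 1)  # sample() sees one number and samples from 1:21

# picked can be any position from 1 to 21, so it may point at a row
# where a <= 25 even though only position 21 meets the condition.
picked
```

With two or more matching positions, sample() draws from the positions themselves, which is why the one-liner works on the data in the question.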

The approach below does not have that problem but is a bit more verbose. Use whichever suits you best, depending on the values you expect in df$a -

df$c[which(df$a > 25)[sample(length(which(df$a > 25)), sum(df$a > 25)/5)]] <- 1

Also, note that since replace = F, the sample size sum(df$a > 25)/5 must be <= length(which(df$a > 25)). You can add this check to your code if you want to make it even safer.

Also, there will be no change if sum(df$a > 25)/5 < 1, so you may want to use size = max(sum(df$a > 25)/5, 1) if you want at least one change.
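The caveats above could be combined into one small function. This is only a sketch: the name assign_random_subset and its arguments are hypothetical, and returning the input unchanged when nothing matches is an assumption rather than something the answer specifies.

```r
# Hypothetical helper, not part of the original answer.
# values:    vector to modify (e.g. df$c)
# condition: logical vector of the same length (e.g. df$a > 25)
# fraction:  share of the matching elements to overwrite
assign_random_subset <- function(values, condition, fraction = 0.2, new_value = 1) {
  idx <- which(condition)
  if (length(idx) == 0) return(values)           # nothing matches: leave unchanged (assumption)
  size <- max(floor(length(idx) * fraction), 1)  # at least one change
  size <- min(size, length(idx))                 # never more than is available (replace = FALSE)
  chosen <- idx[sample.int(length(idx), size)]   # sample positions within idx; safe for length 1
  values[chosen] <- new_value
  values
}

df <- data.frame(a = 1:50)
df$c <- 0
df$c <- assign_random_subset(df$c, df$a > 25)

sum(df$c)                  # 5
all(df$a[df$c == 1] > 25)  # TRUE
```

Sampling positions within idx, rather than sampling idx itself, is the same idea as the verbose one-liner above, so a single matching row is handled correctly.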

Here's a nicer version of my first version, thanks to @Frank -

df$c <- replace(df$c, sample(w <- which(df$a > 25), length(w)*.2), 1)
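As for why the attempt in the question left every value at 0, the sketch below reproduces it step by step. The likely cause is that sample() was given the values df$c[df$a > 25], which are all 0, instead of row positions; the size of 5 is hard-coded here for brevity.

```r
df <- data.frame(a = 1:50)
df$c <- 0

# The question sampled from the VALUES of df$c (all 0), not from row positions.
drawn <- sample(x = df$c[df$a > 25], size = 5, replace = FALSE)
drawn             # 0 0 0 0 0

df$c[drawn] <- 1  # an index of 0 selects nothing, so no element is changed
sum(df$c)         # 0
```

Wrapping the condition in which(), as in the versions above, gives sample() the row positions to draw from.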
