I want to assign a defined value (let's say 1) to a random sample of a subset of a vector that meets certain conditions. I can't seem to make it work.
I have tried this code:
a <- c(1:50)
df <- as.data.frame(a)
df$c <- 0
df$c[sample(x=(df$c[df$a>25]), size = round(NROW(df$c[df$a>25])/5), replace = F)] <- 1
I would just like to randomly set some of the df$c values to 1: specifically, a random sample of one fifth of the df$c values for which df$a is greater than 25 (that would be 5 observations switched to 1).
But so far all of them remain 0 :/
Thanks!
Answer
Here's a way with base R -
df$c[sample(which(df$a > 25), sum(df$a > 25)/5)] <- 1
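To see why the code in the question changes nothing while which() works, here is a small sketch that rebuilds the same data. In the question, sample() draws from the *values* df$c[df$a > 25], which are all 0, so every index it produces is 0, and assigning to index 0 in R is a silent no-op. which() instead returns the *row positions* where the condition holds (26 to 50), which is what sample() should draw from.

```r
# Rebuild the data from the question
df <- data.frame(a = 1:50, c = 0)

# The question's sample() draws from the values of df$c (all zeros),
# so every "index" it produces is 0
bad_idx <- sample(df$c[df$a > 25], size = 5)
print(bad_idx)        # five zeros

# Assigning to index 0 silently does nothing
df$c[bad_idx] <- 1
print(sum(df$c))      # still 0

# which() returns row positions, so sample() now picks actual rows
good_idx <- sample(which(df$a > 25), sum(df$a > 25) / 5)
df$c[good_idx] <- 1
print(sum(df$c))      # 5
```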
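Be aware that this can misbehave if exactly one value in df$a is greater than 25: when sample() receives a single number n, it samples from 1:n instead of returning n itself.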
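A small sketch of that sample() quirk, using hypothetical data in which only row 30 satisfies the condition (the values are made up for illustration). Because which() returns a single number, sample(idx, 1) behaves like sample(1:30, 1) and can return rows that do not meet the condition.

```r
# Hypothetical data: only row 30 has a > 25
df <- data.frame(a = c(rep(1, 29), 99, rep(1, 20)), c = 0)
idx <- which(df$a > 25)
print(idx)  # 30

# With a length-1 numeric x, sample(x, 1) is sample(1:30, 1),
# so the draws can land on rows that do not meet the condition
set.seed(1)
picks <- replicate(20, sample(idx, 1))
print(picks)
```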
The approach below handles that case as well but is a bit verbose. Use whichever suits your needs best, depending on the values you expect in df$a -
df$c[which(df$a > 25)[sample(length(which(df$a > 25)), sum(df$a > 25)/5)]] <- 1
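The verbose version avoids the quirk by sampling *positions* within the index vector (1 to its length) and then looking those positions up, so a length-1 index vector is handled correctly. The same pattern can be wrapped in a small helper; resample() is a hypothetical name, modeled on the helper suggested in the examples of R's ?sample page. The data below is again made up so that only row 30 matches.

```r
# Safe sampling from a vector of any length (including length 1):
# draw positions with sample.int(), then index into x
resample <- function(x, ...) x[sample.int(length(x), ...)]

# Hypothetical data: only row 30 has a > 25
df <- data.frame(a = c(rep(1, 29), 99, rep(1, 20)), c = 0)
idx <- which(df$a > 25)

# Every draw returns 30, never another row
picks <- replicate(20, resample(idx, 1))
df$c[resample(idx, 1)] <- 1
print(which(df$c == 1))  # 30
```

On the question's data, the equivalent of the answer's line would be df$c[resample(which(df$a > 25), sum(df$a > 25)/5)] <- 1.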
Also note that since sampling is without replacement (replace = FALSE, the default), the sample size sum(df$a > 25)/5 must be <= length(which(df$a > 25)). You can add this check to your code if you want to make it even safer.
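Without replacement, asking for more items than are available makes sample() stop with an error. One way to add the check mentioned above is to cap the size with min(); frac and the other variable names here are illustrative assumptions, not part of the original answer.

```r
df <- data.frame(a = 1:50, c = 0)
idx <- which(df$a > 25)

# Requesting more than length(idx) items without replacement is an error
res <- tryCatch(sample(idx, length(idx) + 1), error = function(e) "error")
print(res)  # "error"

# Guard: never ask for more rows than match the condition
frac <- 0.2                                      # fraction of matching rows to set
size <- min(floor(length(idx) * frac), length(idx))
df$c[idx[sample(length(idx), size)]] <- 1
print(sum(df$c))  # 5
```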
Also, there will be no change if sum(df$a > 25)/5 < 1, so you may want to use size = max(sum(df$a > 25)/5, 1) if you want at least one change.
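A sketch of the at-least-one-change variant. Note that max(..., 1) on its own would request one item even when no row matches, which makes sample() error, so this version (an addition layered on the answer's suggestion) also skips the assignment when nothing matches. set_random_subset() is a hypothetical helper name.

```r
# Hypothetical helper: set a random fraction (at least 1) of the rows
# matching `cond` to `value`; returns x unchanged if no row matches
set_random_subset <- function(x, cond, frac = 0.2, value = 1) {
  idx <- which(cond)
  if (length(idx) == 0) return(x)              # nothing to sample from
  size <- max(floor(length(idx) * frac), 1)    # at least one change
  size <- min(size, length(idx))               # never more than available
  x[idx[sample(length(idx), size)]] <- value
  x
}

df <- data.frame(a = 1:50, c = 0)
# Only 3 rows have a > 47, so 3/5 rounds down to 0, but max() forces 1 change
df$c <- set_random_subset(df$c, df$a > 47)
print(sum(df$c))  # 1

# No rows match: df$c comes back unchanged instead of raising an error
print(sum(set_random_subset(df$c, df$a > 100)))  # 1
```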
Here's a neater version of my first approach, thanks to @Frank -
df$c <- replace(df$c, sample(w <- which(df$a > 25), length(w)*.2), 1)
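A quick check of the replace() one-liner on the question's data, confirming that exactly one fifth of the matching rows are set to 1 and that no other rows are touched. The checks themselves are illustrative.

```r
df <- data.frame(a = 1:50, c = 0)

# Frank's one-liner: w holds the matching row positions (26..50)
df$c <- replace(df$c, sample(w <- which(df$a > 25), length(w) * .2), 1)

# Exactly one fifth of the 25 matching rows are now 1 ...
print(sum(df$c[df$a > 25]))   # 5
# ... and no row with a <= 25 was changed
print(sum(df$c[df$a <= 25]))  # 0
```

Because it still calls sample(w, ...) directly, this version shares the single-match caveat of the first one.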