Tuesday, 2 August 2016

Unique random string with alphanumberic required in Ruby



I'm using the following code to generate a unique 10-character random string of [A-Z a-z 0-9] in Ruby:




random_code = [*('a'..'z'),*('0'..'9'),*('A'..'Z')].shuffle[0, 10].join


However, sometimes this random string does not contain a number or an uppercase character. Could you help me have a method that generates a unique random string that requires at least one number, one uppercase and one downcase character?


Answer



down   = ('a'..'z').to_a
up = ('A'..'Z').to_a
digits = ('0'..'9').to_a
all = down + up + digits

[down.sample, up.sample, digits.sample].
concat(7.times.map { all.sample }).
shuffle.
join
#=> "TioS8TYw0F"


[Edit: The above reflects a misunderstanding of the question. I'll leave it, however. To have no characters appear more than once:



def rnd_str

down = ('a'..'z').to_a
up = ('A'..'Z').to_a
digits = ('0'..'9').to_a
[extract1(down), extract1(up), extract1(digits)].
concat(((down+up+digits).sample(7))).shuffle.join
end

def extract1(arr)
i = arr.size.times.to_a.sample
c = arr[i]

arr.delete_at(i)
c
end

rnd_str #=> "YTLe0WGoa1"
rnd_str #=> "NrBmAnE9bT"


down.sample.shift (etc.) would have been more compact than extract1, but the inefficiency was just too much to bear.




If you do not want to repeat random strings, simply keep a list of the ones you generate. If you generate another that is in the list, discard it and generate another. It's pretty unlikely you'll have to generate any extra ones, however. If, for example, you generate 100 random strings (satisfying the requirement of at least one lowercase letter, uppercase letter and digit), the chances that there will be one or more duplicate strings is about one in 700,000:



t = 107_518_933_731
n = t+1
t = t.to_f
(1.0 - 100.times.reduce(1.0) { |prod,_| prod * (n -= 1)/t }).round(10)
#=> 1.39e-07


where t = C(62,10) and C(62,10) is defined below.




An alternative



There is a really simple way to do this that turns out to be pretty efficient: just sample without replacement until a sample is found that meets the requirement of at least lowercase letter, one uppercase letter and one digit. We can do that as follows:



DOWN   = ('a'..'z').to_a
UP = ('A'..'Z').to_a
DIGITS = ('0'..'9').to_a
ALL = DOWN + UP + DIGITS


def rnd_str
loop do
arr = ALL.sample(10)
break arr.shuffle.join unless (DOWN&&arr).empty? || (UP&&arr).empty? ||
(DIGITS&&arr).empty?
end
end

rnd_str #=> "3jRkHcP7Ge"
rnd_str #=> "B0s81x4Jto



How many samples must we reject, on average, before finding a "good" one? It turns out (see below if you are really, really interested) that the probability of getting a "bad" string (i.e, selecting 10 characters at random from the 62 elements of all, without replacement, that has no lowercase letters, no uppercase letters or no digits, is only about 0.15. (15%). That means that 85% of the time no bad samples will be rejected before a good one is found.



It turns out that the expected number of bad strings that will be sampled, before a good string is sampled, is:



0.15/0.85 =~ 0.17


The following shows how the above probability was derived, should anyone be interested.




Let n_down be the number of ways a sample of 10 can be drawn that has no lowercase letters:



n_down = C(36,10) = 36!/(10!*(36-10)!)


where (the binomial coefficient) C(36,10) equals the number of combinations of 36 "things" that can be "taken" 10 at a time, and equals:



C(36,10) = 36!/(10!*(36-10)!) #=> 254_186_856



Similarly,



n_up = n_down #=> 254_186_856


and



n_digits = C(52,10) #=> 15_820_024_220



We can add these three numbers together to obtain:



n_down + n_up + n_digits #=> 16_328_397_932


This is almost, but not quite, the number of ways to draw 10 characters, without replacement, that contains no lowercase letters characters, uppercase letters or digits. "Not quite" because there is a bit of double-counting going on. The necessary adjustment is as follows:



n_down + n_up + n_digits - 2*C(26,10) - 3
#=> 16_317_774_459



To obtain the probability of drawing a sample of 10 from a population of 62, without replacement, that has no lowercase letter, no uppercase letter or no digit, we divide this number by the total number of ways 10 characters can be drawn from 62 without replacement:



(16_317_774_459.0/c(62,10)).round(2)
#=> 0.15

No comments:

Post a Comment

c++ - Does curly brackets matter for empty constructor?

Those brackets declare an empty, inline constructor. In that case, with them, the constructor does exist, it merely does nothing more than t...