[ACCEPTED]-random.choice not random-random

Accepted answer
Score: 13

It shouldn't be generating duplicates.

import random
chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
def gen():
    return ''.join([random.choice(chars) for x in range(32)])

test = [gen() for i in range(100000)]
print len(test), len(set(test)) # 100000 100000

The 13 chances of duplicates is significant with 12 chars = "ab"; 126 duplicates in 1000000 11 iterations. It's nonexistant with 62.

That 10 said, this isn't a good way to generate 9 cookies, because session cookies need to 8 be unpredictable, to avoid attacks involving 7 stealing other people's session cookies. The 6 Mersenne Twister is not designed for generating 5 secure random numbers. This is what I do:

import os, hashlib
def gen():
    return hashlib.sha1(os.urandom(512)).hexdigest()

test = [gen() for i in range(100000)]
print len(test), len(set(test))

... which 4 should be very secure (which is to say, difficult 3 to take a string of session cookies and 2 guess other existing session cookies from 1 them).

Score: 4

This is definitely not a normal collision 1 scenario:

  • 32 characters with 62 options per character is equivalent to 190 bits (log2(62) * 32)
  • According to the birthday paradox, you should be receiving a collision naturally once every 2**95 cookies, which means never

Could this be a concurrency issue?

  • If so, use different random.Random instances for each thread
  • Can save these instances in thread-local storage (threading.local())
  • On linux, Python should seed them using os.urandom() - not system time - so you should get different streams for each thread.
Score: 1
  1. I don't know how your FCGI processes are 21 being spawned, but is it possible that it's 20 using fork() after the Python interpreter 19 has started (and the random module has been 18 imported by something), hence effectively 17 seeding two processes' random._insts from the same source?

  2. Maybe 16 put some debugging in to check that it is 15 correctly seeding from urandom, and not 14 falling back to the less rigorous time-based 13 seed?

eta re comment: man! That's me stumped 12 then; if the RNG always has different state 11 at startup I can't see how you could possibly 10 get collisions. Weird. Would have to put 9 in a lot of state logging to investigate 8 the particular cases which result in collisions, I 7 guess, which sounds like a lot of work trawling 6 through logs. Could it be (1a) the FCGI 5 server usually doesn't fork, but occasionally 4 does (maybe under load, or something)?

Or 3 (3) some higher-level problem such as a 2 broken HTTP proxy passing the same Set-Cookie 1 to multiple clients?

Score: 0

I had to erase my original answer, which 5 suggested that generator is not seeded from 4 /dev/urandom, since its source (for Python 3.x) clearly says 3 that it is:

def seed(self, a=None):
    """Initialize internal state from hashable object.

    None or no argument seeds from current time or from an operating
    system specific randomness source if available.

    If a is not None or an int or long, hash(a) is used instead.
    """

    if a is None:
        try:
            a = int(_hexlify(_urandom(16)), 16)
        except NotImplementedError:
            import time
            a = int(time.time() * 256) # use fractional seconds

    super().seed(a)
    self.gauss_next = None

I therefore humbly accept that 2 there are mysteries in the world that I 1 may not be able to decipher.

More Related questions