Difference between random draws from scipy.stats....rvs and numpy.random

Accepted answer
Score: 16

scipy.stats.uniform actually uses numpy; here is the corresponding function in stats (mtrand is an alias for numpy.random):

class uniform_gen(rv_continuous):
    def _rvs(self):
        return mtrand.uniform(0.0,1.0,self._size)
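
Because _rvs just delegates to numpy's global Mersenne Twister, seeding numpy.random makes the scipy draws reproducible, and for the default loc=0, scale=1 it should give the same numbers as calling numpy directly. A small sketch to check this, assuming the default (shared global) random state is used:

import numpy as np
from scipy import stats

# Seed the shared global state and draw from scipy ...
np.random.seed(42)
a = stats.uniform.rvs(size=5)

# ... then re-seed and draw the same uniforms directly from numpy.
np.random.seed(42)
b = np.random.uniform(0.0, 1.0, size=5)

print(np.allclose(a, b))   # expected True if scipy delegates to the global state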

scipy.stats has a bit of overhead for error checking and for making the interface more flexible. The speed difference should be minimal as long as you don't call uniform.rvs in a loop for each draw. You can instead get all random draws at once, for example (10 million):

>>> rvs = stats.uniform.rvs(size=(10000, 1000))
>>> rvs.shape
(10000, 1000)

Here is the long answer that I wrote a while ago:

The basic random numbers in scipy/numpy are created by the Mersenne Twister PRNG in numpy.random. The random numbers for distributions in numpy.random are in cython/pyrex and are pretty fast.
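
As a quick illustration (not part of the original answer), numpy reports the Mersenne Twister as 'MT19937' in its legacy global state, and its distribution samplers operate on whole arrays at once, which is where most of the speed comes from:

import numpy as np

# The legacy global RandomState identifies its bit generator;
# for the Mersenne Twister this is the string 'MT19937'.
print(np.random.get_state()[0])

# Distribution samplers in numpy.random fill entire arrays in one call.
samples = np.random.normal(loc=0.0, scale=1.0, size=1000000)
print(samples.mean(), samples.std())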

scipy.stats doesn't have a random number generator of its own; random numbers are obtained in one of three ways:

  • directly from numpy.random, e.g. normal, t, ... pretty fast

  • random numbers by transformation of other random numbers that are available in numpy.random, also pretty fast because this operates on entire arrays of numbers

  • generic: the only generic random number generation is by using the ppf (inverse cdf) to transform uniform random numbers. This is relatively fast if there is an explicit expression for the ppf, but can be very slow if the ppf has to be calculated indirectly. For example, if only the pdf is defined, then the cdf is obtained through numerical integration and the ppf is obtained through an equation solver. So a few distributions are very slow.
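
To illustrate the third case, here is a minimal sketch of the ppf-transform idea itself (not scipy's internal code path): uniform draws pushed through the inverse cdf follow the target distribution.

import numpy as np
from scipy import stats

# Inverse-CDF (ppf) sampling: transform uniform draws through the ppf.
u = np.random.uniform(size=100000)   # uniforms on [0, 1)
x = stats.norm.ppf(u)                # the normal ppf is fast to evaluate

print(x.mean(), x.std())             # roughly 0 and 1 for a standard normal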

Score: 6

I ran into this today and just wanted to add some timing details to this question. I saw what joon mentioned where, in particular, random numbers from the normal distribution were generated much more quickly with numpy than with rvs from scipy.stats. As user333700 mentioned, there is some overhead with rvs, but if you are generating an array of random values then that gap closes compared to numpy. Here is a jupyter timing example:

from scipy.stats import norm
import numpy as np

n = norm(0, 1)

# single draw: frozen scipy distribution vs. calling numpy directly
%timeit -n 1000 n.rvs(1)[0]
%timeit -n 1000 np.random.normal(0,1)

# 1000 draws: vectorized rvs vs. a Python loop over numpy vs. vectorized numpy
%timeit -n 1000 a = n.rvs(1000)
%timeit -n 1000 a = [np.random.normal(0,1) for i in range(0, 1000)]
%timeit -n 1000 a = np.random.randn(1000)

This, on my run with numpy version 1.11.1 and scipy 0.17.0, outputs:

1000 loops, best of 3: 46.8 µs per loop
1000 loops, best of 3: 492 ns per loop
1000 loops, best of 3: 115 µs per loop
1000 loops, best of 3: 343 µs per loop
1000 loops, best of 3: 61.9 µs per loop

So just generating one random sample from rvs was almost 100x slower than using numpy directly. However, if you are generating an array of values then the gap closes (115 µs vs. 61.9 µs).

If you can avoid it, probably don't call rvs to get one random value a ton of times in a loop.
