1. 程式人生 > >如何生成均勻分佈隨機整數

如何生成均勻分佈隨機整數

前幾天在水木上看到一個帖子,問如何用硬體實現一個0-56的隨機數。這個問題初看起來不是很難,但是仔細想想還是蠻難實現的,尤其是希望能夠儘量少的佔用芯片面積時。

由這個問題,我想到另外一個稍微簡單一些的問題,就是如何在程式中生成一個[0, N-1] 的隨機整數。我們知道,C語言的標準庫中有個 rand() 函式,這個函式可以生成[0, RAND_MAX] 之間的隨機整數,並且理論上來說生成的隨機整數是均勻分佈的。我們就以此為基礎來構造一個[0, N-1] 的均勻分佈的隨機整數。

要生成[0, N-1] 的隨機整數,大多數的書上給出的方法是這樣的:

rand() % N

這樣生成的隨機數確實是在[0, N-1] 

之間,但是卻不一定是均勻分佈的。或者說只有當 可以寫為 2^m 的形式(m 為整數)時這樣生成的隨機數才是均勻的。因為rand() 實際上生成的是一個隨機位元序列,這個位元序列的每一位為0或為的概率是相等的。所以只有當[0, N-1]的隨機數可以用這個位元序列的子序列來表示的時候才是均勻分佈的。

也就是說用上面的方法可以直接生成 [0, 63] 的均勻分佈隨機數,但是卻無法生成[0, 56]的均勻分佈隨機數。

但是既然能生成 [0, 63] 的均勻分佈隨機數了,在這樣的隨機數中將抽樣結果落在 [57, 63] 的那一部分刨掉剩下的就是[0, 56]的均勻分佈隨機數了。按照這樣的思想,可以寫出下面的程式碼:

int x;
do
{
    x = rand() % 64;
}while ( x > 56);
cout << x endl;

按照類似的思路,可以寫出一個生成[0, N-1] 的均勻分佈隨機整數的函式。

int randn(int n)
{
    int max = RAND_MAX - RAND_MAX % n;
    int x;
    do
    {
        x = rand();
    }while ( x >= max );
    return x % n;
}

在我用的開發環境 (MinGW)上 RAND_MAX 為 0x7FFF,也就是

32767。當 n = 16385 時,上面函式的執行效率最低,大約要捨棄一半的rand() 函式的結果。下面是個小的測試程式。

int count = 0;
int randn(int n)
{
    int max = RAND_MAX - RAND_MAX % n;
    int x;
    do
    {
        x = rand();
        count ++;
    }while ( x >= max );
    return x % n;
}
int main()
{
    int i, x;
    srand(0);
    for(i = 0; i < 100000; ++i)
    {
        x = randn(16385);
    }
    cout << count << endl;
    return 0;
}

我這裡執行的結果是 199995。也就是為了產生10萬個隨機數,呼叫了差不多20萬次 rand() 函式。

另外,我在網上還找到了一個 Java 語言的實現。程式碼如下:
    /**
     * Returns a pseudorandom, uniformly distributed {@code int} value
     * between 0 (inclusive) and the specified value (exclusive), drawn from
     * this random number generator's sequence.  The general contract of
     * {@code nextInt} is that one {@code int} value in the specified range
     * is pseudorandomly generated and returned.  All {@code n} possible
     * {@code int} values are produced with (approximately) equal
     * probability.  The method {@code nextInt(int n)} is implemented by
     * class {@code Random} as if by:
     *  <pre> {@code
     * public int nextInt(int n) {
     *   if (n <= 0)
     *     throw new IllegalArgumentException("n must be positive");
     *
     *   if ((n & -n) == n)  // i.e., n is a power of 2
     *     return (int)((n * (long)next(31)) >> 31);
     *
     *   int bits, val;
     *   do {
     *       bits = next(31);
     *       val = bits % n;
     *   } while (bits - val + (n-1) < 0);
     *   return val;
     * }}</pre>
     *
     * <p>The hedge "approximately" is used in the foregoing description only
     * because the next method is only approximately an unbiased source of
     * independently chosen bits.  If it were a perfect source of randomly
     * chosen bits, then the algorithm shown would choose {@code int}
     * values from the stated range with perfect uniformity.
     * <p>
     * The algorithm is slightly tricky.  It rejects values that would result
     * in an uneven distribution (due to the fact that 2^31 is not divisible
     * by n). The probability of a value being rejected depends on n.  The
     * worst case is n=2^30+1, for which the probability of a reject is 1/2,
     * and the expected number of iterations before the loop terminates is 2.
     * <p>
     * The algorithm treats the case where n is a power of two specially: it
     * returns the correct number of high-order bits from the underlying
     * pseudo-random number generator.  In the absence of special treatment,
     * the correct number of <i>low-order</i> bits would be returned.  Linear
     * congruential pseudo-random number generators such as the one
     * implemented by this class are known to have short periods in the
     * sequence of values of their low-order bits.  Thus, this special case
     * greatly increases the length of the sequence of values returned by
     * successive calls to this method if n is a small power of two.
     *
     * @param n the bound on the random number to be returned.  Must be
     *        positive.
     * @return the next pseudorandom, uniformly distributed {@code int}
     *         value between {@code 0} (inclusive) and {@code n} (exclusive)
     *         from this random number generator's sequence
     * @exception IllegalArgumentException if n is not positive
     * @since 1.2
     */

    public int nextInt(int n) {
        if (n <= 0)
            throw new IllegalArgumentException("n must be positive");

        if ((n & -n) == n)  // i.e., n is a power of 2
            return (int)((n * (long)next(31)) >> 31);

        int bits, val;
        do {
            bits = next(31);
            val = bits % n;
        } while (bits - val + (n-1) < 0);
        return val;
    }
其中 next(31) 是生成一個31位元的隨機整數。判斷條件
while (bits - val + (n-1) < 0)
是怎麼來的我沒想明白,但是測試了一下結果沒有問題。

另外,這個程式碼中用

(n & -n) == n
來判斷 n 是否是 2 的整數次冪也很巧妙。要是讓我來寫,肯定寫不出這麼精彩的實現。

不過,這個程式碼的執行效率與我寫的那個簡單的程式碼基本相當,相比來說我那個程式碼還要更易讀一些。