1 Scrambler

Scrambler is also called randomizer. It is usually the first block in many communication systems, which 'randomizes' the source data (e.g., the binary sequence). Why do we want to randomize the data? One example is that the randomized data can help the receiver to achieve the symbol synchronization. For example, when you send a signal as shown in Fig. 1

Fig.1. The illustration of a transmitted binary sequence.

in an ideal case, the receiver sees the same signals above. Then, the receiver can sample the signal at arbitrary phase (indicated by the dash lines in Fig. 2) to recover the transmitted data.

Fig.2. Sample the ideal signal at the receiver side, where the dash lines indicate the sampling time.

In this ideal case, the receiver can arbitrarily down-sample the received sequence to recover the transmitted bits. In practice, when sending signal as shown in Fig. 1, the received signal may look like

Fig.3. Sample the non-ideal signal at the receiver side.

It is easy to see that the sampling time (or phase) indicated by red dash line is better than the blue dash line, since the signal at red sampling time is generally larger than the blue one. Thus, for the same noise, the SNR (signal noise ratio) will be higher. How could the receiver know where is the current sampling position? For example, is it close to the blue dash lines or the red dash lines? As we will shown later, one solution is to monitor the zero-crossing positions. For example, the zero-crossing position should be in the middle of two adjacent symbols, which leads to the worst sampling position. In this example, we assume the receiver knows the interval between two neighbour symbols (i.e., \(T\)), but not the sampling phase. The receiver can detect all the zero-crossing positions in the received signals, and adjust its sampling phase such that all the sampling positions are as far away from the zero-crossing position as possible. For example, if the receiver detect a zero-crossing position at \(t_0\), since the symbol interval is \(T\), then the next possible zero-crossing position will be \(t_0+T\), and the sampling position will be chosen as \(t_0+T/2\) (far away from its neighbour zero-crossing positions).

With that said, if the original data has a long run of 1 or -1, during that period, there will be no zero-crossing position. In other words, the receiver can not adjust its sampling phase accordingly. By applying the scrambler, we can decrease the probability of such annoying long run of 1 or -1.

Another benefit of the scrambler block is to reduce the correlation in the data, that is, to make the data more random. For example, if the data is a sine wave with frequency \(f\), then the majority of its power will locate in a narrow band (around \(f\)). It is not an efficient use of the available signal band (usually much larger than the band occupied by such sine wave). If there is a noise near frequency \(f\)1, the signal will be totally lost. By applying the scrambler, we can spread the signal power to all available signal band, which makes the system less sensitive to such narrow band noise. One way to think about it is that imaging you have 10 eggs, and 10 bags. If you put all 10 eggs in one bag, you lose all your eggs when that bag is broken. The scrambler will randomly distribute your eggs to all bags. In this case, even if some bag is broken, you will still have 'enough' eggs left.

1.1 Additive scrambler

One way to randomize the data is by adding a random binary sequence to the data. For example, for each input bit, retrieve 1 bit from the random sequence and add it to the input bit (mod 2, such that the result is still 1 bit), that is

$$ \begin{align} y[n] = \mathrm{mod}(x[n] + s[n], 2), \end{align} $$

where \(x[n]\) is the input binary sequence, \(s[n]\) is the random binary sequence, and \(y[n]\) is the output sequence. It is easy to see it is equivalent to the xor (\(\oplus\)) operator,

$$ \begin{align} y[n] = x[n] \oplus s[n]. \label{eqn:scrambler_encode} \end{align} $$

Does the above scrambler guarantee to eliminate the long run of 1 or 0? Actually not. It is easy to find a data sequence such that the output from the scrambler is constant (e.g, 1) (do you see how?). Wait, if the scrambler does not guarantee to eliminate the long run of 1/0, why do we still use it? The answer is the probability. For example, if we toss one fair coin (i.e., the probability of head/tail from each toss is 0.5), you will still have a chance to get a long run of heads2. In a fortunate (or unfortunate) case, when a fair coin is tossed 100 times, the longest run of heads/tails may exceed 10, but the probability is small (e.g., \(\sim 2.2\%\)). However, if the coin is biased (e.g., head is more likely than tail for each toss), then the probability that the longest heads/tails exceeds 10 will be much higher. What the scrambler does is to make the biased coin (input to the scrambler block) unbiased (output of the scrambler block)3

Let's see how it works. Assume at time \(n\), for a biased coin \(x\), the probability to get head is \(p_x[n]\). Then the probability to get tail is \(1-p_x[n]\). And now we choose another fair coin \(s\), that is, \(p_s[n] = 0.5\). At time \(n\), we toss both coins, and

  1. if both coins are heads or tails, output tail;
  2. if one is head and the other is tail, output head.
Do you see the above procedure is same as the 'mod 2' adder mentioned in Eq. \ref{eqn:scrambler_encode}?

What's the probability to get head as output? It is easy to see that

$$ \begin{align} p_y[n] &= p_s[n]\cdot (1-p_x[n]) + (1-p_s[n])\cdot p_x[n]\nonumber\\ &= 0.5(1-p_x[n]) + 0.5p_x[n]\nonumber\\ &= 0.5. \end{align} $$

Wow, it is a fair coin! The nice thing is that it does not make any assumption about the probability of the biased coin (\(p_x[n]\)). It does not matter whether each toss is independent (but biased), or correlated (e.g., the probability of the current toss may be affected by the previous toss.), the output will be same as the one from a fair coin.

The next question is how to de-scramble the randomized data? It does not make any sense if you can not recover the original data from the randomized out. Fortunately, it turns out the de-scrambling is very easy, which can be achieved by applying the same scrambler again. That is, at the receiver side, the data can be retrieved by4

$$ \begin{align} x[n] = \mathrm{mod}(y[n] + s[n], 2). \label{eqn:additive_descramble} \end{align} $$

The only thing left is how to create a random sequence (\(s[n]\)) at both the transmitter and receiver side. In practice, pseudo random binary sequence (PRBS) is widely used in many systems to generate a 'random' sequence. PRBS can be generated in advance and stored in a lookup table. Or, it can be generated by a linear feedback shift register (LFSR). Fig. 4 shows a LFSR corresponding to generator polynomial \(1+x^6+x^7\), and its current status is \(b[1]\sim b[7]\). It is easy to see that given the generator polynomial and the status, its output is fully determined.

To generate useful PRBS, the initial status of LFSR should not be all zeros. Can you see why?
Fig.4. Structure of a LFSR, where the generator polynomial is \(1+x^6+x^7\).

How long should we choose the length of the PRBS? Apparently, if it is too short, then the PRBS itself is not random enough to randomize the input. In an extreme case, if the length of the PRBS is 1, then \(s[n]==1\). In this case, \(y[n]=x[n]\oplus 1 = 1-x[n]\). The scrambler has no effect on the data at all. The one used in 802.11a standard is 127 bits with generator polynomial \(1+x^4+x^7\). The naive LFSR can be implemented in python as

def gen_prbs(p, s0, n):
    naive code to generate the pseudo-random binary sequence

        p: the generator polynomials, for example
            x^7 + x^6 + 1: n.array([0, 0, 0, 0, 0, 1, 1])
        s0: the initial state, same size as p
        prbs: the prbs array with length n
        s: the current state of the LFSR
    prbs = np.zeros((n,))
    s = s0
    for i in range(n):
        a = np.mod(np.sum(p*s), 2)
        s = np.roll(s, 1)
        s[0] = a
        prbs[i] = a
    return (prbs, s)

The output of the LFSR is fully determined by the generator polynomial and the initial status. To generate the proper PRBS, the receiver needs to know both. The generator polynomial is usually not a problem, since it is generally determined by the standard and both the transmitter and receiver know in advance. Potentially, the same way can be used for the receiver to determine the initial status, for example, by predefining a initial status. However, such method may not fully utilized the PRBS. Should the transmitter 'randomly' choose the initial phase, the PRBS would less likely be correlated to the input sequence. In this case, some way should be developed to synchronize the initial status between the transmitter and the receiver. For many communication systems, besides the data, the transmitter will also send additional information to the receiver, e.g., the data length, status, etc. Thus, initial status may be embedded in such field, so that the receiver can retrieve it before receiving the data block. For example, as shown in Eq. (\ref{eqn:scrambler_encode}), if \(x[n]=0\), the output \(y[n]\) will be equal to \(s[n]\),

$$ \begin{align} y[n] &= x[n] \oplus s[n]\nonumber\\ &= 0 \oplus s[n]\nonumber\\ &= s[n]. \end{align} $$

Thus, if we prepend enough zeros to the data bits, we can easily send the initial status to the receiver. For example, for the PRBS shown in Fig. 4, we may prepend 7 zeros, so the data (\(x^\prime[n]\)) may look like

$$ \begin{align} 0, 0, 0, 0, 0, 0, 0, x[0], x[1], \cdots \end{align} $$

The LFSR is initialized with arbitrary initial status as shown in Fig. 5, where \(X\) in the block indicated either 1 or 0.

Fig.5. Structure of a LFSR with arbitrarily initial status, where the generator polynomial is \(1+x^6+x^7\).

At time \(n=0\), \(s[0]\) is retrieved from the LFSR, and \(y[0] = 0 \oplus s[0] = s[0]\). \(s[0]\) is also sent to \(b[1]\) (i.e., \(b[1] = s[0]\)). The status of LFSR is shown in Fig. 6.

Fig.6. Status LFSR at time \(n=0\), where \(x^\prime[0]=0\). The current output \(s[0]\) is shown in red.

Similarly, at time \(n=1\) (Fig. 7), LFSR outputs \(s[1]\), and \(y[1] = 0 \oplus s[1] = s[1]\). Simultaneously, \(s[1]\) is sent to \(b[1]\) (\(b[1] = s[0]\)), and the original value in \(b[1]\) is shifted to \(b[2]\) (i.e., \(b[2]=s[0]\)).

Fig.7. Status LFSR at time \(n=1\), where \(x^\prime[1]=0\) and current output \(y[1]=s[1]\).

Following this procedure, at time \(n=6\), the status of LFSR will look like

Fig.8. Status LFSR at time \(n=6\), where \(x^\prime[6]=0\) and current output \(y[6]=s[6]\).

So far, we have output 7 bits (i.e., \(y[0]\sim y[6]\)), which are same as the ones in LFSR buffer.

Then, at time \(n=7\), when the actual data bit comes (\(x^\prime[7]=x[0]\)), the status of the LFSR is as shown in Fig. 8. In other words, the initial status of the LFSR for the data bits is \(s[0]\sim s[6]\) (or \(y[0]\sim y[6]\)). Thus, at the receiver side, it can simply use the first 7 received bits as the initial status of its LFSR to descramble the following data bits.

One nice thing to note about Eq. (\ref{eqn:additive_descramble}) is that the descrambler output \(x[n]\) only relies on the current input \(y[n]\), not any other inputs. Thus, if there is some error in \(y[n]\), the error is localized, which will only affect the current output \(x[n]\).

The implementation is straightforward. The naive code may look like

def scramble_add(x, p, s0):
    randomize the data with additive scrambler

        x: input data, 1-dim numpy array
        p: the generator polynomials
        s0: the initial state of the LFSR
        y: the randomized data (same length as x)
        s: the current state of the LFSR
    # generate the PRBS with same length as x
    prbs, s = gen_prbs(p, s0, x.shape[0])
    # add (mod 2) the PRBS to the data
    y = np.mod(x+prbs, 2)
    return (y, s)

Let's play with it to see how it works

>>> # generate binary data sequence
>>> x = (np.random.rand(100)+0.5).astype(int)
>>> # define the generator polynomial
>>> p = np.array([0, 0, 0, 0, 0, 1, 1])
>>> # and the initial state of the LSFR
>>> s0 = np.array([0, 0, 0, 0, 0, 0, 1])
>>> # randomize the data
>>> y, s = scramble_add(x, p, s0)

We can do a sanity check by scrambling output \(y\) with the same setup, which should give us the original data \(x\)

>>> np.sum(scramble_add(y, p, s0)[0]-x)

1.2 Multiplicative scrambler

As shown above, the additive scrambler generally needs to send the initial status to the receiver before it can de-scramble the data. If the initial status is not received correctly, the de-scrambling will be totally wrong. The multiplicative scrambler eliminates such requirement. It is achieved by 'randomizing' the data by itself only. Thus, it is also called self-synchronizing scrambler. For example, one such scrambler may look like

$$ \begin{align} y[n] = y[n-14] + y[n-17] + x[n]. \label{eqn:scramble_mul} \end{align} $$

Its structure may look like

Fig.9. Structure of a multiplicative scrambler, where the generator polynomial is \(1+x^{14}+x^{17}\).

Eq. (\ref{eqn:scramble_mul}) shows that the multiplicative scrambler may be implemented by an IIR filter. The only difference between the classical IIR filter and Eq. (\ref{eqn:scramble_mul}) is that here all the additions are mod 2. Could we implement the above scrambler by connecting a conventional IIR filter with a mod 2 block in serial?

And the de-scrambler is straightforward. For the one shown in Eq. (\ref{eqn:scramble_mul}), the corresponding de-scrambler is

$$ \begin{align} x[n] = y[n] - y[n-14] - y[n-17]. \label{eqn:scramble_mul_de} \end{align} $$

Since with mod 2 addition, subtraction is same as addition, Eq. (\ref{eqn:scramble_mul_de}) can be written as

$$ \begin{align} x[n] = y[n] + y[n-14] + y[n-17]. \label{eqn:scramble_mul_de2} \end{align} $$

And its structure may look like

Fig.10. Structure of a multiplicative descrambler, where the generator polynomial is \(1+x^{14}+x^{17}\).

As we mentioned above, the error in the additive scrambler is localized. However, it is not true for the multiplicative scrambler. Eq. (\ref{eqn:scramble_mul_de2}) shows that the current output \(x[n]\) relies both on the current input \(y[n]\) and the previous inputs \(y[n-14]\), \(y[n-17]\). Thus, if there is any errors in all this received data, the current output \(x[n]\) may be wrong5. This is called error propagation.

For the above example, how many outputs \(x\) will be effected if the current input \(y[n]\) is wrong? You may find it helpful to write all the equations of \(x\) that involve \(y[n]\) (Eq.(\ref{eqn:scramble_mul_de2}) is one example).

Different from the additive scrambler, the multiplicative scrambler only uses the data to randomize. If the bits from the input data are highly correlate to each other, the output may not be well randomized. In practice, you may need to detect the long sequence of 1 or 0 at the output, and use some heuristics to break it (e.g., invert the next input bit).

The multiplicative scrambler may be implemented as

def scramble_mul(x, p, z0=None, mode='normal'):
    randomize the data with multiplicative scrambler

        x: input data, 1-dim numpy array
        p: the generator polynomial
        z0: the initial buffer status
        mode: 'normal' --> scramble
              'inverse' --> descramble
        y: the randomized data (same length as x)
        z: the buffer status
    if z0:
        z = z0
        z = np.zeros(p.shape)
    y = np.zeros(x.shape)
    if mode == 'normal':
        for i in range(x.shape[0]):
            y[i] = np.mod(np.sum(p*z)+x[i], 2)
            z = np.roll(z, 1)
            z[0] = y[i]
    elif mode == 'inverse':
        for i in range(x.shape[0]):
            y[i] = np.mod(np.sum(p*z)+x[i], 2)
            z = np.roll(z, 1)
            z[0] = x[i]
    return (y, z)

Let's play with it to see how it works. After scrambling and descrambling, we should get the original data.

>>> # generate binary data sequence
>>> x = (np.random.rand(100)+0.5).astype(int)
>>> # define the generator polynomial
>>> p = np.array([0]*13 + [1] + [0]*2 + [1])
>>> y, s = scramble_mul(x, p)
>>> np.sum(scramble_mul(y, p, mode='inverse')[0]-x)

2 Interleaver

In practice, the communication system usually contains some kind of error correction mechanism (e.g., forward error correction (FEC)). So that if some errors occur in the received data, they can be detected or corrected. Generally such error correction method can only deal with certain amount of error bits within a window (e.g., every \(M\) bits). For example, every time, the error correction block will take \(M\) bits, then detect or correct the error bits. However, if the total number of errors exceed the limit, it will fail to correct/detect the error.

Considering two cases, where the total number of bit errors are same (e.g., \(N_e\)),

  • in one case, such \(N_e\) bits errors are uniformly distributed along the whole window (e.g., \(N\)),
  • in the other case, such \(N_e\) bits errors are concentrated in one error correction block window,

Which case is easy to deal with? For the first case, the error correction block needs to correct roughly \(\frac{M}{N}N_e\) errors. However, for the second case, the error correction block needs to correct \(N_e\) errors, which is apparently a much harder problem.

Interleaver is used to deal with the burst errors: we can not stop the burst errors, but we can spread the burst errors to a much wilder window, such that in each error correction block window, the number of error bits may be still within the limit.

2.1 Block interleaver

Block interleaver can be easily understood with an example. As shown in Fig. 11, a \(3\times 4\) block interleaver can be viewed as a \(3\times 4\) matrix .

Fig.11. Illustration of a block interleaver with size \(3\times 4\).

For example, when the input (\(x[0]\sim x[11]\)) comes, it will fill the 1st row first, thus, the first row will contain \(x[0]\), \(x[1]\), \(x[2]\), and \(x[3]\). Then the 2nd row, so on and so forth. At time \(n=9\), the buffer will look like (Fig. 12)

Fig.12. Status of interleaver buffer at time \(n=9\).

Once the matrix is filled, the first column will be output first, that is, \(x[0]\), \(x[3]\), \(x[6]\). Then the 2nd and 3rd columns. In summary, the output will be

$$ \begin{align} y[0]&=x[0], y[1]=x[4], y[2]=x[8],\nonumber\\ y[3]&=x[1], y[4]=x[5], y[5]=x[9],\nonumber\\ y[6]&=x[2], y[7]=x[6], y[8]=x[10],\nonumber\\ y[9]&=x[3], y[10]=x[7], y[11]=x[11]. \end{align} $$
Block interleaver demo. Click on the grid to start the demo.

The deinterleaving is straightforward: just swap the input and output order. In particular, for the example in Fig. 11, when the input (\(y[0]\sim y[11]\)) comes, it will fill the 1st column first, then the 2nd column...

Block deinterleaver demo. Click on the grid to start the demo.

It is easy to see in general, the interleaver/deinterleaver will not be able to output any data until its internal 'matrix' is filled, which introduces a delay. The delay is proportional to the interleaver size (\(N\)). The interleaver/deinterleaver also needs to have enough memory to hold all \(N\) samples.

Now, if there is a 3 bits burst error, e.g., \(y[1]\), \(y[2]\), \(y[3]\)

$$ \begin{align} y[0], {\color{red}y[1]}, {\color{red}y[2]}, {\color{red}y[3]}, y[4], y[5], y[6], y[7], y[8],y[9], y[10], y[11], \end{align} $$

that is

$$ \begin{align} x[0], {\color{red}x[4]}, {\color{red}x[8]}, {\color{red}x[1]}, x[5], x[9], x[2], x[6], x[10], x[3], x[7], x[11], \end{align} $$

where the error bits are indicated by red color. After deinterleaving, the data will look like

$$ \begin{align} x[0], {\color{red}x[1]}, x[2], x[3], {\color{red}x[4]}, x[5], x[6], x[7], {\color{red}x[8]},x[9],x[10],x[11]. \end{align} $$

Thus, the burst errors are spread out across the whole interleaver block.

The interleaving can be implemented in python as

def interleaver_block(x, n):
    block interleaver

        x: 1-dim numpy array
        n: the dimension of the interleaver, the length of x should be integral multiples of n
        y: the interleaved data
    y = np.reshape(x, (n, -1)).T.flatten()
    return y

Same code can also be used for de-interleaving. In particular, if the interleaver size is \(N\) (\(= R\times C\)), the interleaving can be done by

>>> # assume len(x) = C*R
>>> y = interleaver_block(x, C)

then the de-interleaving can be done by

>>> y = interleaver_block(x, R)

In ideal case, after interleaving and deinterleaving, you should get the original data bits

>>> x = np.arange(1000)
>>> # interleaver
>>> y = interleaver_block(x, 20)
>>> # deinterleaver
>>> xd = interleaver_block(y, 50)
>>> np.allclose(x, xd)

2.2 Convolutional interleaver

Convolutional interleaver is also called multiplexed interleaver. Its main structure is shown in Fig. 13.

Fig.13. Illustration of a convolutional interleaver.

For example, at time \(n=0\), both input (\(x[0]\)) and output (\(y[0]\)) are connected to the first row. And since there is no shift register in first row, \(y[0] = x[0]\). At time \(n=1\), both input (\(x[1]\)) and output (\(y[1]\)) are connected to the second row. There is a shift register \(s_{10}\) in second row. In this case, \(s_{10}\) is output as \(y[1]\), while \(x[1]\) is shifted into \(s_{10}\), that is

$$ \begin{align} y[1] &= s_{10},\nonumber\\ s_{10} &= x[1]. \end{align} $$

Similarly, at \(n=2\), both the input and output are connected to the third row.

$$ \begin{align} y[1] &= s_{21},\nonumber\\ s_{21} &= s_{20},\nonumber\\ s_{20} &= x[2]. \end{align} $$

So for each input sample, \(x[n]\) is shifted into the shift register in current row, and the sample shifted out of the registers will be sent to the output. After that, both the input and output will move to the next row until the last row, when they will move to the first row again.

Convolutional interleaver demo. Click on the Reset button to reset the demo, and elsewhere to get the next input.

The de-interleaving (e.g., Fig. 14) is almost identical to the interleaving operation. The only difference is that the shift registers are reversed in row order. For example, in the de-interleaver, the last row will be the direct row (without any shift register).

Fig.14. Illustration of a convolutional de-interleaver.
Convolutional deinterleaver demo. Click on the Reset button to reset the demo, and elsewhere to get the next input. The input is the output from the above convolutional interleaver output.

To see why the de-interleaver works, we can count the total delay of a particular input, e.g., \(x[1]\). At the interleaver \(x[1]\) is connected to the second row, where there is 1 register. Then, at the de-interleaver, \(x[1]\) is also connected to the second row (why?), where there are 3 registers. The total delay is \((1+3)*5=20\) samples. It is easy to see that it is true to all inputs. In other words, after the interleaver and de-interleaver processes, each samples will be delayed by 20 samples. So their relative positions do not change. In general, the total delay can be calculated as

$$ \begin{align} d = R \times S_R, \end{align} $$

where \(R\) is the number of rows in interleaver or de-interleaver (e.g., 5 in Fig. 14), and \(S_R\) is the number of shift register in row \(R\) (e.g., 4 in Fig. 14).

def interleaver_conv(x, nrows, slope, state=None, mode='normal'):
    convolutional interleaver

        x: 1-dim numpy arry
        nrows: the number of rows
        slope: the ith row will have i*slope registers
        state: a dict
            'value': the initial value of the internal registers
            'index': the initial index
            'normal': interleaver
            'deintlv': de-interleaver
    if state == None:
        # the only difference between the interleaver and de-interleaver is the
        # order of the shift registers.
        if mode == 'normal':
            value = [np.zeros(n*slope) for n in range(nrows)]
        elif mode == 'deintlv':
            value = [np.zeros(n*slope) for n in reversed(list(range(nrows)))]
        index = 0
        value = state['value']
        index = state['index']
    y = np.zeros(x.shape)
    for r in range(nrows):
        # process the data in each row
        n = np.mod(r + index, nrows)
        xn = np.hstack([value[n], x[n::nrows]])
        y[n::nrows] = xn[:y[n::nrows].shape[0]]
        value[n] = xn[y[n::nrows].shape[0]:]
    index = np.mod(x.shape[0] + index, nrows)
    state = {'value': value, 'index':index}
    return (y, state)

And we can pass the data through a interleaver and a de-interleaver to get the original data (except the delay)

>>> x = np.arange(1000)
>>> # interleaver
>>> y, istate = interleaver_conv(x, 3, 4)
>>> # de-interleaver
>>> xd, dstate = interleaver_conv(y, 3, 4, mode='deintlv')
>>> # calculate the delay from the interleaver and de-interleaver, which is
>>> # equal to the total number of the shift registers
>>> delay = int(2*4*(0+3-1)*3/2)
>>> np.allclose(x[:-delay], xd[delay:])

3 Mapping

The purpose of the transmitter is to modulate the data bits to the carrier signal by varying some of its properties (e.g., amplitude, phase or frequency). Mapping is to convert the data bits to symbols (or points) on the constellation, such that it is easy to modulate the carrier.

Suppose the following mapping is used to send the data bits, that is, every 2 data bits is mapped to a point on the constellation (Fig. 15).

Fig.15. Constellation for 4-ASK (amplitude shift keying).

It is equivalent to

Table.1. Illustration of a mapping scheme.
Data bitsPoint on constellation

And further suppose we have an 'ideal' channel, that is, the signal is only impacted by an additive noise

$$ r = s + n, $$

where \(r\) and \(s\) the received and transmitted signal, respectively, \(n\) is a additive white Gaussian noise (AWGN \(\sim\mathcal{N}(0, \sigma)\). In this example, the signal \(s\) will be either 0, 1, 2, or 3, which is determined by the data bits.

What's the problem of the above mapping scheme?

If we want to send signal 2 (e.g., \(s = 2\) or data bits b'10 ), what's the probability of receiving a signal \(r=1\) at the receiver, compared to the probability of receiving a signal with amplitude 0? It is easy to see that the former should be much higher. In particular, receiving the signal with smaller amplitude (e.g., \(r= 0\)), which implies that signal is impacted by a larger noise (e.g., \(n=-2\)), is less likely to happen.

For example, if the transmitted signal is 2 (i.e., bits b'10), the number of bit errors vs the received signal will be

Table.2. Bit error of the corresponding received signal when 2 (bits b'10) is sent, where the error bits are shown in red.
received signal (r)bits after de-mapping# of bit errors

Thus, if the received signal is 1 (i.e., the noise is -1), there will be 2 bit errors (e.g., b'10 \(\Rightarrow\) b'01). However, if the received signal is 0, which is resulted from a much larger noise (i.e., -2), there will be only 1 bit error (i.e., b'10 \(\Rightarrow\) b'00 ). That's not intuitive, and does not look like a good idea6. It's the main reason to use Gray code. As to be shown below, Gray code guarantees to have only 1 bit difference for adjacent codes.

There is an easy way to generate a Gray code from the binary coded decimal (BCD). Let \(g\) be the corresponding Gray code of N bits BCD \(b\), then one corresponding Gray code can be generated by (Fig. 16)

$$ \begin{align} g_i = \begin{cases} b_i & i=N-1\\ b_{i+1}\oplus b_i & 0\leq i<N-1 \end{cases}, \label{eqn:bcd2gray} \end{align} $$

where \(g_i\) and \(b_i\) are \(i^{th}\) bit of Gray and BCD codes, respectively.

Fig.16. Conversion from BCD to Gray code.

For the above example, its corresponding Gray code is

Table.3. 2 bits BCD to corresponding Gray.

Fig. 17 shows the constellation with Gray code, where \(x\)-axis is the BCD code and text shows its corresponding Gray code.

Fig.17. Constellation for 4-ASK (amplitude shift keying) with Gray code.

You can see that any adjacent Gray codes only differ in one bit. How does BCD to Gray mapping work? First, it is easy to see that the mapping is one to one

  1. apparently, one BCD code \(b\) will only generate one Gray code \(g\);
  2. if \(b1\neq b2\), then \(g1\neq g2\). For example, let \(i\) be the first position (when scanning from MSB) such that \(b1_i\neq b2_i\). Then, from Eq. (\ref{eqn:bcd2gray}), \(g1_i \neq g2_i\).

Next, we want to show that the there is only one bit difference between the corresponding Gray codes of two adjacent BCD codes. Let \(j\) be the first position (when scanning from LSB) of BCD \(b\), such that \(b[i]==0\). For example, \(b\) can be written in binary form as

$$ \begin{align} b_{N-1}, b_{N-2}, \cdots, b_{j+1}, 0, 1, \cdots, 1. \end{align} $$

Then, the next BCD code (\(b+1\)) can be written as

$$ \begin{align} b_{N-1}, b_{N-2}, \cdots, b_{j+1}, 1, 0, \cdots, 0. \end{align} $$

It is easy to see that their corresponding Gray code only differ in one position, that is, at \(j^{th}\) position.

Gray code is not unique. Can you find other mapping schemes?

Eq. (\ref{eqn:bcd2gray}) also tells us how to convert the Gray code back to BCD code

$$ \begin{align} b_i = \begin{cases} g_i & i=N\\ g_i\oplus b_{i+1} & 0\leq i<N \end{cases}. \label{eqn:gray2bcd} \end{align} $$

The procedure is shown in Fig. 18.

Fig.18. Conversion from Gray to BCD code.

So far, we talk about how to generate the Gray code from BCD, which is generally one-dimensional. However, most QAM (quadrature amplitude modulation) constellation will be 2-dimensional. How could we generate the 2-dimensional Gray code? It turns out that if the bits number of each symbol is even, there is an easy way:

  1. split the source bits (e.g., 2N bits) into two groups evenly (i.e., each with N bits). For example, if the input is 4 bits, then each group will have 2 bits. For example,

    • group 1: \(b_3\), \(b_2\);
    • group 2: \(b_1\), \(b_0\).
    Thus, in this case, 4-bits input b'1011 will be split into:
    • group 1: b'10
    • group 2: b'11

    Basic, what we did here is to split the source bits in to two independent groups. It makes sense since the \(x\)- and \(y\)- axis on the 2-dimensional constellation are orthogonal to each other.

  2. Apply the "Gray to BCD" mapping (Eq. (\ref{eqn:gray2bcd})) for each group. For the above example, the corresponding Gray code for each group is

    group 1b'10b'11
    group 2b'11b'10

    Why use the Gray to BCD mapping, instead of the BCD to Gray mapping? For each point on the constellation, its coordinates can be viewed as BCD code, while the data bits it represents are Gray code. In this case, we have the data bits (Gray code), and need to find its location on the constellation (BCD code) with the Gray to BCD mapping. Thus, b'1011 will be sent as position (3(b'11), 2(b'10)) on the constellation (Fig. 19), and adjacent points only have 1 bit difference.

    Fig.19. 16 QAM Gray code constellation.
def bcd2gray(b):
    get the corresponding Gray from the BCD code

        b: BCD code

    return: its Gray BCD
    c = np.roll(b, 1)
    c[0] = 0
    return np.logical_xor(b, c).astype(int)
def gray2bcd(G):
    get the corresponding BCD from the Gray code

        G: Gray code

    return: its corresponding BCD
    b = np.copy(G)
    for i in range(1, b.shape[0]):
        b[i] = np.logical_xor(b[i-1], G[i])
    return b.astype(int)
def gen_gray_table(n):
    generate 2-d Gray constellation table

        n: the number of bits
        g: the mapping from Gray code (source bits) to BCD (symbol (x, y) on the
           constellation), where ith row is for BCD(i)
    # generate the binary representation of BCD
    power_of_two = 2**np.arange(n-1,-1,-1)
    t = (np.arange(2**n)[:, np.newaxis] & power_of_two) / power_of_two

    # split into two groups
    g1 = t[:, 0:n/2]
    g2 = t[:, n/2:]
    g1 = np.apply_along_axis(gray2bcd, axis=1, arr=g1)
    g2 = np.apply_along_axis(gray2bcd, axis=1, arr=g2)
    g1 =, 2**np.arange(n/2-1, -1, -1))
    g2 =, 2**np.arange(n-n/2-1, -1, -1))
    g = np.stack([g1, g2]).T
    return g

4 Random Number

4.1 Uniform Random Number

Most software package/library include function to generate uniform random number; but occasionally we may need to generate manually (e.g., to get a consist result on different platforms). One simple way is to use linear congruential generator. Its general format is

$$ \begin{align} x(n+1) = (a*x(n) + c) \mod m, \end{align} $$

where \(a\), \(c\) and \(m\) are constant, and \(x(n)\) is the last random number. There are many popular choices of these parameters; one implementation shown in man srand looks like

static unsigned long next = 1;

/* RAND_MAX assumed to be 32767 */
int myrand(void) {
    next = next * 1103515245 + 12345;
    return((unsigned)(next/65536) % 32768);

void mysrand(unsigned int seed) {
    next = seed;

It will give you a pseudo random number between 0 and RAND_MAX. If you want to generate a random number between \(L\) and \(U\), you may follow the suggestion here if the following simple way doesn't give you good result.

int r = L + myrand() * (U-L)/RAND_MAX;

4.2 Gaussian Random Number

Similarly, occasionally we may need to generate it by ourselves (e.g., on a embedded microcontroller). Box-Muller transform gives us an efficient way to generate random number with standard normal distribution from uniform distribution (Alternatively, I saw one uses central limit theorem to generate normal distribution samples by adding multiple samples from uniform distribution.). Suppose you have two independent samples from uniform distribution, then the following samples are from standard normal distribution,

$$ \begin{align} x &= \sqrt{-2\ln{u_1}}\cos(2\pi u_2)\nonumber \\ y &= \sqrt{-2\ln{u_1}}\sin(2\pi u_2), \end{align} $$

where \(u_1\), \(u_2\) are from uniform distribution between 0 and 1, that is \(u_1\sim U(0, 1)\).

To see why this is the case, let us first define

$$ \begin{align} r = \sqrt{-2\ln{u_1}}, \end{align} $$

The cumulative distribution function (CDF) of \(r\) is defined as

$$ \begin{align} P(r<R) &= P(\sqrt{-2\ln(u_1)}<R) \nonumber \\ &= P(u_1>e^{-\frac{R^2}{2}}) \nonumber\\ &= 1 - e^{-\frac{R^2}{2}}. \end{align} $$

Thus, its PDF is

$$ \begin{align} f(R) &= \frac{\partial P(r<R)}{\partial R} \nonumber \\ &= R e^{-\frac{R^2}{2}}. \end{align} $$

This is the PDF for Rayleigh distribution.

Similarly, the PDF of \(\theta = 2\pi u_2\) (\(u_2\sim U(0, 1)\)) is \(\sim U(0, 2\pi)\).

Now, \(x\), \(y\) can be defined as

$$ \begin{align} x &= r\cos(\theta),\nonumber \\ y &= r\sin(\theta). \end{align} $$

To get the joined PDF of \(x\), \(y\) (i.e., \(f(x, y)\)), first we need to calculate the Jacobian

$$ \begin{align} \frac{\partial x, y}{\partial r, \theta} = \begin{bmatrix} \cos(\theta) & -r\sin(\theta) \\ \sin(\theta) & r\cos(\theta) \\ \end{bmatrix} = r. \end{align} $$

Then, \(f(x, y)\) can be written as

$$ \begin{align} f(x, y) &= \frac{f(r, \theta)}{\frac{\partial x, y}{\partial r, \theta}} \nonumber \\ &= \frac{r}{2\pi}e^{-\frac{r^2}{2}}\times \frac{1}{r} \nonumber \\ &= \frac{1}{2\pi}e^{-\frac{x^2+y^2}{2}}. \end{align} $$

It is easy to see that \(x\), \(y\) are independent standard normal distribution.