# 1 Scrambler

Scrambler is also called **randomizer**. It is usually the first block in many communication systems, which 'randomizes' the source data (e.g., the binary sequence). Why do we want to randomize the data? One example is that the randomized data can help the receiver to achieve the symbol synchronization.
For example, when you send a signal as shown in Fig. 1

in an ideal case, the receiver sees the same signals above. Then, the receiver can sample the signal at arbitrary phase (indicated by the dash lines in Fig. 2) to recover the transmitted data.

In this ideal case, the receiver can arbitrarily down-sample the received sequence to recover the transmitted bits. In practice, when sending signal as shown in Fig. 1, the received signal may look like

It is easy to see that the sampling time (or phase) indicated by red dash line is better than the blue dash line, since the signal at red sampling time is generally larger than the blue one. Thus, for the same noise, the SNR (signal noise ratio) will be higher. How could the receiver know where is the current sampling position? For example, is it close to the blue dash lines or the red dash lines? As we will shown later, one solution is to monitor the zero-crossing positions. For example, the zero-crossing position should be in the middle of two adjacent symbols, which leads to the worst sampling position. In this example, we assume the receiver knows the interval between two neighbour symbols (i.e., $T$), but not the sampling phase. The receiver can detect all the zero-crossing positions in the received signals, and adjust its sampling phase such that all the sampling positions are as far away from the zero-crossing position as possible. For example, if the receiver detect a zero-crossing position at $t_0$, since the symbol interval is $T$, then the next possible zero-crossing position will be $t_0+T$, and the sampling position will be chosen as $t_0+T/2$ (far away from its neighbour zero-crossing positions).

With that said, if the original data has a long run of `1`

or `-1`

, during that period, there will be no zero-crossing position. In other words, the receiver can not adjust its sampling phase accordingly. By applying the scrambler, we can decrease the probability of such annoying long run of `1`

or `-1`

.

Another benefit of the scrambler block is to reduce the correlation in the data, that is, to make the data more random. For example, if the data is a sine wave with frequency $f$, then the majority of its power will locate in a narrow band (around $f$). It is not an efficient use of the available signal band (usually much larger than the band occupied by such sine wave). If there is a noise near frequency $f$^{1}, the signal will be totally lost. By applying the scrambler, we can spread the signal power to all available signal band, which makes the system less sensitive to such narrow band noise. One way to think about it is that imaging you have 10 eggs, and 10 bags. If you put all 10 eggs in one bag, you lose all your eggs when that bag is broken. The scrambler will randomly distribute your eggs to all bags. In this case, even if some bag is broken, you will still have 'enough' eggs left.

## 1.1 Additive scrambler

One way to randomize the data is by adding a random binary sequence to the data. For example, for each input bit, retrieve 1 bit from the random sequence and add it to the input bit (mod 2, such that the result is still 1 bit), that is

where $x[n]$ is the input binary sequence, $s[n]$ is the random binary sequence, and $y[n]$ is the output sequence.
It is easy to see it is equivalent to the `xor`

($\oplus$) operator,

Does the above scrambler guarantee to eliminate the long run of **1** or **0**? Actually not. It is easy to find a data sequence such that the output from the scrambler is constant (e.g, 1) (do you see how?). Wait, if the scrambler does not guarantee to eliminate the long run of 1/0, why do we still use it? The answer is the probability.
For example, if we toss one fair coin (i.e., the probability of head/tail from each toss is 0.5), you will still have a chance to get a long run of heads^{2}. In a fortunate (or unfortunate) case, when a fair coin is tossed 100 times, the longest run of heads/tails may exceed 10, but the probability is small (e.g., $\sim 2.2\%$). However, if the coin is biased (e.g., head is more likely than tail for each toss), then the probability that the longest heads/tails exceeds 10 will be much higher. What the scrambler does is to make the biased coin (input to the scrambler block) unbiased (output of the scrambler block)^{3}

Let's see how it works. Assume at time $n$, for a biased coin $x$, the probability to get head is $p_x[n]$. Then the probability to get tail is $1-p_x[n]$. And now we choose another fair coin $s$, that is, $p_s[n] = 0.5$. At time $n$, we toss both coins, and

- if both coins are heads or tails, output tail;
- if one is head and the other is tail, output head.

What's the probability to get head as output? It is easy to see that

Wow, it is a fair coin! The nice thing is that it does not make any assumption about the probability of the biased coin ($p_x[n]$). It does not matter whether each toss is independent (but biased), or correlated (e.g., the probability of the current toss may be affected by the previous toss.), the output will be same as the one from a fair coin.

The next question is how to de-scramble the randomized data? It does not make any sense if you can not recover the original data from the randomized out. Fortunately, it turns out the de-scrambling is very easy, which can be achieved by applying the same scrambler again. That is, at the receiver side, the data can be retrieved by^{4}

The only thing left is how to create a random sequence ($s[n]$) at both the transmitter and receiver side. In practice, pseudo random binary sequence (PRBS) is widely used in many systems to generate a 'random' sequence. PRBS can be generated in advance and stored in a lookup table. Or, it can be generated by a linear feedback shift register (LFSR). Fig. 4 shows a LFSR corresponding to generator polynomial $1+x^6+x^7$, and its current status is $b[1]\sim b[7]$. It is easy to see that given the generator polynomial and the status, its output is fully determined.

How long should we choose the length of the PRBS? Apparently, if it is too short, then the PRBS itself is not random enough to randomize the input. In an extreme case, if the length of the PRBS is 1, then $s[n]==1$. In this case, $y[n]=x[n]\oplus 1 = 1-x[n]$. The scrambler has no effect on the data at all. The one used in 802.11a standard is 127 bits with generator polynomial $1+x^4+x^7$. The naive LFSR can be implemented in python as

def gen_prbs(p, s0, n): """ naive code to generate the pseudo-random binary sequence input: p: the generator polynomials, for example x^7 + x^6 + 1: n.array([0, 0, 0, 0, 0, 1, 1]) s0: the initial state, same size as p return: prbs: the prbs array with length n s: the current state of the LFSR """ prbs = np.zeros((n,)) s = s0 for i in range(n): a = np.mod(np.sum(p*s), 2) s = np.roll(s, 1) s[0] = a prbs[i] = a return (prbs, s)

The output of the LFSR is fully determined by the generator polynomial and the initial status. To generate the proper PRBS, the receiver needs to know both. The generator polynomial is usually not a problem, since it is generally determined by the standard and both the transmitter and receiver know in advance. Potentially, the same way can be used for the receiver to determine the initial status, for example, by predefining a initial status. However, such method may not fully utilized the PRBS. Should the transmitter 'randomly' choose the initial phase, the PRBS would less likely be correlated to the input sequence. In this case, some way should be developed to synchronize the initial status between the transmitter and the receiver. For many communication systems, besides the data, the transmitter will also send additional information to the receiver, e.g., the data length, status, etc. Thus, initial status may be embedded in such field, so that the receiver can retrieve it before receiving the data block. For example, as shown in Eq. (\ref{eqn:scrambler_encode}), if $x[n]=0$, the output $y[n]$ will be equal to $s[n]$,

Thus, if we prepend enough zeros to the data bits, we can easily send the initial status to the receiver. For example, for the PRBS shown in Fig. 4, we may prepend 7 zeros, so the data ($x^\prime[n]$) may look like

The LFSR is initialized with arbitrary initial status as shown in Fig. 5, where $X$ in the block indicated either 1 or 0.

At time $n=0$, $s[0]$ is retrieved from the LFSR, and $y[0] = 0 \oplus s[0] = s[0]$. $s[0]$ is also sent to $b[1]$ (i.e., $b[1] = s[0]$). The status of LFSR is shown in Fig. 6.

Similarly, at time $n=1$ (Fig. 7), LFSR outputs $s[1]$, and $y[1] = 0 \oplus s[1] = s[1]$. Simultaneously, $s[1]$ is sent to $b[1]$ ($b[1] = s[0]$), and the original value in $b[1]$ is shifted to $b[2]$ (i.e., $b[2]=s[0]$).

Following this procedure, at time $n=6$, the status of LFSR will look like

So far, we have output 7 bits (i.e., $y[0]\sim y[6]$), which are same as the ones in LFSR buffer.

Then, at time $n=7$, when the actual data bit comes ($x^\prime[7]=x[0]$), the status of the LFSR is as shown in Fig. 8. In other words, the initial status of the LFSR for the data bits is $s[0]\sim s[6]$ (or $y[0]\sim y[6]$). Thus, at the receiver side, it can simply use the first 7 received bits as the initial status of its LFSR to descramble the following data bits.

One nice thing to note about Eq. (\ref{eqn:additive_descramble}) is that the descrambler output $x[n]$ only relies on the current input $y[n]$, not any other inputs. Thus, if there is some error in $y[n]$, the error is localized, which will only affect the current output $x[n]$.

The implementation is straightforward. The naive code may look like

def scramble_add(x, p, s0): """ randomize the data with additive scrambler input: x: input data, 1-dim numpy array p: the generator polynomials s0: the initial state of the LFSR return: y: the randomized data (same length as x) s: the current state of the LFSR """ # generate the PRBS with same length as x prbs, s = gen_prbs(p, s0, x.shape[0]) # add (mod 2) the PRBS to the data y = np.mod(x+prbs, 2) return (y, s)

Let's play with it to see how it works

>>> # generate binary data sequence >>> x = (np.random.rand(100)+0.5).astype(int) >>> # define the generator polynomial >>> p = np.array([0, 0, 0, 0, 0, 1, 1]) >>> # and the initial state of the LSFR >>> s0 = np.array([0, 0, 0, 0, 0, 0, 1]) >>> # randomize the data >>> y, s = scramble_add(x, p, s0)

We can do a sanity check by scrambling output $y$ with the same setup, which should give us the original data $x$

>>> np.sum(scramble_add(y, p, s0)[0]-x) 0.0

## 1.2 Multiplicative scrambler

As shown above, the additive scrambler generally needs to send the initial status to the receiver before it can de-scramble the data. If the initial status is not received correctly, the de-scrambling will be totally wrong. The multiplicative scrambler eliminates such requirement. It is achieved by 'randomizing' the data by itself only. Thus, it is also called self-synchronizing scrambler. For example, one such scrambler may look like

Its structure may look like

Eq. (\ref{eqn:scramble_mul}) shows that the multiplicative scrambler may be implemented by an IIR filter. The only difference between the classical IIR filter and Eq. (\ref{eqn:scramble_mul}) is that here all the additions are mod 2. Could we implement the above scrambler by connecting a conventional IIR filter with a mod 2 block in serial?

And the de-scrambler is straightforward. For the one shown in Eq. (\ref{eqn:scramble_mul}), the corresponding de-scrambler is

Since with mod 2 addition, subtraction is same as addition, Eq. (\ref{eqn:scramble_mul_de}) can be written as

And its structure may look like

As we mentioned above, the error in the additive scrambler is localized. However, it is not true for the multiplicative scrambler. Eq. (\ref{eqn:scramble_mul_de2}) shows that the current output $x[n]$ relies both on the current input $y[n]$ and the previous inputs $y[n-14]$, $y[n-17]$. Thus, if there is any errors in all this received data, the current output $x[n]$ may be wrong^{5}. This is called error propagation.

Different from the additive scrambler, the multiplicative scrambler only uses the data to randomize. If the bits from the input data are highly correlate to each other, the output may not be well randomized. In practice, you may need to detect the long sequence of 1 or 0 at the output, and use some heuristics to break it (e.g., invert the next input bit).

The multiplicative scrambler may be implemented as

def scramble_mul(x, p, z0=None, mode='normal'): """ randomize the data with multiplicative scrambler input: x: input data, 1-dim numpy array p: the generator polynomial z0: the initial buffer status mode: 'normal' --> scramble 'inverse' --> descramble return: y: the randomized data (same length as x) z: the buffer status """ if z0: z = z0 else: z = np.zeros(p.shape) y = np.zeros(x.shape) if mode == 'normal': for i in range(x.shape[0]): y[i] = np.mod(np.sum(p*z)+x[i], 2) z = np.roll(z, 1) z[0] = y[i] elif mode == 'inverse': for i in range(x.shape[0]): y[i] = np.mod(np.sum(p*z)+x[i], 2) z = np.roll(z, 1) z[0] = x[i] return (y, z)

Let's play with it to see how it works. After scrambling and descrambling, we should get the original data.

>>> # generate binary data sequence >>> x = (np.random.rand(100)+0.5).astype(int) >>> # define the generator polynomial >>> p = np.array([0]*13 + [1] + [0]*2 + [1]) >>> y, s = scramble_mul(x, p) >>> np.sum(scramble_mul(y, p, mode='inverse')[0]-x) 0.0

# 2 Interleaver

In practice, the communication system usually contains some kind of error correction mechanism (e.g., forward error correction (FEC)). So that if some errors occur in the received data, they can be detected or corrected. Generally such error correction method can only deal with certain amount of error bits within a window (e.g., every $M$ bits). For example, every time, the error correction block will take $M$ bits, then detect or correct the error bits. However, if the total number of errors exceed the limit, it will fail to correct/detect the error.

Considering two cases, where the total number of bit errors are same (e.g., $N_e$),

- in one case, such $N_e$ bits errors are uniformly distributed along the whole window (e.g., $N$),
- in the other case, such $N_e$ bits errors are concentrated in one error correction block window,

Which case is easy to deal with? For the first case, the error correction block needs to correct roughly $\frac{M}{N}N_e$ errors. However, for the second case, the error correction block needs to correct $N_e$ errors, which is apparently a much harder problem.

Interleaver is used to deal with the burst errors: we can not stop the burst errors, but we can spread the burst errors to a much wilder window, such that in each error correction block window, the number of error bits may be still within the limit.

## 2.1 Block interleaver

Block interleaver can be easily understood with an example. As shown in Fig. 11, a $3\times 4$ block interleaver can be viewed as a $3\times 4$ matrix .

For example, when the input ($x[0]\sim x[11]$) comes, it will fill the 1st row first, thus, the first row will contain $x[0]$, $x[1]$, $x[2]$, and $x[3]$. Then the 2nd row, so on and so forth. At time $n=9$, the buffer will look like (Fig. 12)

Once the matrix is filled, the first column will be output first, that is, $x[0]$, $x[3]$, $x[6]$. Then the 2nd and 3rd columns. In summary, the output will be

The deinterleaving is straightforward: just swap the input and output order. In particular, for the example in Fig. 11, when the input ($y[0]\sim y[11]$) comes, it will fill the 1st column first, then the 2nd column...

It is easy to see in general, the interleaver/deinterleaver will not be able to output any data until its internal 'matrix' is filled, which introduces a delay. The delay is proportional to the interleaver size ($N$). The interleaver/deinterleaver also needs to have enough memory to hold all $N$ samples.

Now, if there is a 3 bits burst error, e.g., $y[1]$, $y[2]$, $y[3]$

that is

where the error bits are indicated by red color. After deinterleaving, the data will look like

Thus, the burst errors are spread out across the whole interleaver block.

TODO: real interleaver example

The interleaving can be implemented in python as

def interleaver_block(x, n): """ block interleaver input: x: 1-dim numpy array n: the dimension of the interleaver, the length of x should be integral multiples of n return: y: the interleaved data """ y = np.reshape(x, (n, -1)).T.flatten() return y

Same code can also be used for de-interleaving. In particular, if the interleaver size is $N$ ($= R\times C$), the interleaving can be done by

>>> # assume len(x) = C*R >>> y = interleaver_block(x, C)

then the de-interleaving can be done by

>>> y = interleaver_block(x, R)

In ideal case, after interleaving and deinterleaving, you should get the original data bits

>>> x = np.arange(1000) >>> # interleaver >>> y = interleaver_block(x, 20) >>> # deinterleaver >>> xd = interleaver_block(y, 50) >>> np.allclose(x, xd) True

## 2.2 Convolutional interleaver

Convolutional interleaver is also called multiplexed interleaver. Its main structure is shown in Fig. 13.

For example, at time $n=0$, both **input** ($x[0]$) and **output** ($y[0]$) are connected to the first row. And since there is no shift register in first row, $y[0] = x[0]$. At time $n=1$, both **input** ($x[1]$) and **output** ($y[1]$) are connected to the second row. There is a shift register $s_{10}$ in second row. In this case, $s_{10}$ is output as $y[1]$, while $x[1]$ is shifted into $s_{10}$, that is

Similarly, at $n=2$, both the **input** and **output** are connected to the third row.

So for each input sample, $x[n]$ is shifted into the shift register in current row, and the sample shifted out of the registers will be sent to the output. After that, both the input and output will move to the next row until the last row, when they will move to the first row again.

`Reset`

button to reset the demo, and elsewhere to get the next input.
The de-interleaving (e.g., Fig. 14) is almost identical to the interleaving operation. The only difference is that the shift registers are reversed in row order. For example, in the de-interleaver, the last row will be the direct row (without any shift register).

`Reset`

button to reset the demo, and elsewhere to get the next input.
The input is the output from the above convolutional interleaver output.
To see why the de-interleaver works, we can count the total delay of a particular input, e.g., $x[1]$. At the interleaver $x[1]$ is connected to the second row, where there is 1 register. Then, at the de-interleaver, $x[1]$ is also connected to the second row (why?), where there are 3 registers. The total delay is $(1+3)*5=20$ samples. It is easy to see that it is true to all inputs. In other words, after the interleaver and de-interleaver processes, each samples will be delayed by 20 samples. So their relative positions do not change. In general, the total delay can be calculated as

where $R$ is the number of rows in interleaver or de-interleaver (e.g., 5 in Fig. 14), and $S_R$ is the number of shift register in row $R$ (e.g., 4 in Fig. 14).

def interleaver_conv(x, nrows, slope, state=None, mode='normal'): """ convolutional interleaver input: x: 1-dim numpy arry nrows: the number of rows slope: the ith row will have i*slope registers state: a dict 'value': the initial value of the internal registers 'index': the initial index mode: 'normal': interleaver 'deintlv': de-interleaver """ if state == None: # the only difference between the interleaver and de-interleaver is the # order of the shift registers. if mode == 'normal': value = [np.zeros(n*slope) for n in range(nrows)] elif mode == 'deintlv': value = [np.zeros(n*slope) for n in reversed(list(range(nrows)))] index = 0 else: value = state['value'] index = state['index'] y = np.zeros(x.shape) for r in range(nrows): # process the data in each row n = np.mod(r + index, nrows) xn = np.hstack([value[n], x[n::nrows]]) y[n::nrows] = xn[:y[n::nrows].shape[0]] value[n] = xn[y[n::nrows].shape[0]:] index = np.mod(x.shape[0] + index, nrows) state = {'value': value, 'index':index} return (y, state)

And we can pass the data through a interleaver and a de-interleaver to get the original data (except the delay)

>>> x = np.arange(1000) >>> # interleaver >>> y, istate = interleaver_conv(x, 3, 4) >>> # de-interleaver >>> xd, dstate = interleaver_conv(y, 3, 4, mode='deintlv') >>> # calculate the delay from the interleaver and de-interleaver, which is >>> # equal to the total number of the shift registers >>> delay = int(2*4*(0+3-1)*3/2) >>> np.allclose(x[:-delay], xd[delay:]) True

# 3 Mapping

The purpose of the transmitter is to modulate the data bits to the carrier signal by varying some of its properties (e.g., amplitude, phase or frequency). Mapping is to convert the data bits to symbols (or points) on the constellation, such that it is easy to modulate the carrier.

Suppose the following mapping is used to send the data bits, that is, every 2 data bits is mapped to a point on the constellation (Fig. 15).

It is equivalent to

Data bits | Point on constellation |
---|---|

b'00 | 0 |

b'01 | 1 |

b'10 | 2 |

b'11 | 3 |

And further suppose we have an 'ideal' channel, that is, the signal is only impacted by an additive noise

where $r$ and $s$ the received and transmitted signal, respectively, $n$ is a additive white Gaussian noise (AWGN $\sim\mathcal{N}(0, \sigma)$. In this example, the signal $s$ will be either 0, 1, 2, or 3, which is determined by the data bits.

What's the problem of the above mapping scheme?

If we want to send signal 2 (e.g., $s = 2$ or data bits b'10 ), what's the probability of receiving a signal $r=1$ at the receiver, compared to the probability of receiving a signal with amplitude 0? It is easy to see that the former should be much higher. In particular, receiving the signal with smaller amplitude (e.g., $r= 0$), which implies that signal is impacted by a larger noise (e.g., $n=-2$), is less likely to happen.

For example, if the transmitted signal is **2** (i.e., bits b'10), the number of bit errors vs the received signal will be

received signal (r) | bits after de-mapping | # of bit errors |
---|---|---|

0 | b'00 | 1 |

1 | b'01 | 2 |

2 | b'10 | 0 |

3 | b'11 | 1 |

Thus, if the received signal is **1** (i.e., the noise is **-1**), there will be 2 bit errors (e.g., b'10 $\Rightarrow$ b'01). However, if the received signal is **0**, which is resulted from a much larger noise (i.e., -2), there will be only 1 bit error (i.e., b'10 $\Rightarrow$ b'00 ). That's not intuitive, and does not look like a good idea^{6}. It's the main reason to use Gray code. As to be shown below, Gray code guarantees to have only 1 bit difference for adjacent codes.

There is an easy way to generate a Gray code from the binary coded decimal (BCD). Let $g$ be the corresponding Gray code of N bits BCD $b$, then one corresponding Gray code can be generated by (Fig. 16)

where $g_i$ and $b_i$ are $i^{th}$ bit of Gray and BCD codes, respectively.

For the above example, its corresponding Gray code is

BCD | Gray |
---|---|

b'00 | b'00 |

b'01 | b'01 |

b'10 | b'11 |

b'11 | b'10 |

Fig. 17 shows the constellation with Gray code, where $x$-axis is the BCD code and text shows its corresponding Gray code.

You can see that any adjacent Gray codes only differ in one bit. How does BCD to Gray mapping work? First, it is easy to see that the mapping is one to one

- apparently, one BCD code $b$ will only generate one Gray code $g$;
- if $b1\neq b2$, then $g1\neq g2$. For example, let $i$ be the first position (when scanning from MSB) such that $b1_i\neq b2_i$. Then, from Eq. (\ref{eqn:bcd2gray}), $g1_i \neq g2_i$.

Next, we want to show that the there is only one bit difference between the corresponding Gray codes of two adjacent BCD codes. Let $j$ be the first position (when scanning from LSB) of BCD $b$, such that $b[i]==0$. For example, $b$ can be written in binary form as

Then, the next BCD code ($b+1$) can be written as

It is easy to see that their corresponding Gray code only differ in one position, that is, at $j^{th}$ position.

Eq. (\ref{eqn:bcd2gray}) also tells us how to convert the Gray code back to BCD code

The procedure is shown in Fig. 18.

So far, we talk about how to generate the Gray code from BCD, which is generally one-dimensional. However, most QAM (quadrature amplitude modulation) constellation will be 2-dimensional. How could we generate the 2-dimensional Gray code? It turns out that if the bits number of each symbol is even, there is an easy way:

split the source bits (e.g., 2N bits) into two groups evenly (i.e., each with N bits). For example, if the input is 4 bits, then each group will have 2 bits. For example,

- group 1: $b_3$, $b_2$;
- group 2: $b_1$, $b_0$.

Thus, in this case, 4-bits input b'**1011**will be split into:- group 1: b'10
- group 2: b'11

Basic, what we did here is to split the source bits in to two independent groups. It makes sense since the $x$- and $y$- axis on the 2-dimensional constellation are orthogonal to each other.

Apply the "Gray to BCD" mapping (Eq. (\ref{eqn:gray2bcd})) for each group. For the above example, the corresponding Gray code for each group is

Gray BCD group 1 b'10 b'11 group 2 b'11 b'10 Why use the Gray to BCD mapping, instead of the BCD to Gray mapping? For each point on the constellation, its coordinates can be viewed as BCD code, while the data bits it represents are Gray code. In this case, we have the data bits (Gray code), and need to find its location on the constellation (BCD code) with the Gray to BCD mapping. Thus, b'1011 will be sent as position (3(b'11), 2(b'10)) on the constellation (Fig. 19), and adjacent points only have 1 bit difference.

def bcd2gray(b): """ get the corresponding Gray from the BCD code input b: BCD code return: its Gray BCD """ c = np.roll(b, 1) c[0] = 0 return np.logical_xor(b, c).astype(int) def gray2bcd(G): """ get the corresponding BCD from the Gray code input G: Gray code return: its corresponding BCD """ b = np.copy(G) for i in range(1, b.shape[0]): b[i] = np.logical_xor(b[i-1], G[i]) return b.astype(int) def gen_gray_table(n): """ generate 2-d Gray constellation table input n: the number of bits return g: the mapping from Gray code (source bits) to BCD (symbol (x, y) on the constellation), where ith row is for BCD(i) """ # generate the binary representation of BCD power_of_two = 2**np.arange(n-1,-1,-1) t = (np.arange(2**n)[:, np.newaxis] & power_of_two) / power_of_two # split into two groups g1 = t[:, 0:n/2] g2 = t[:, n/2:] g1 = np.apply_along_axis(gray2bcd, axis=1, arr=g1) g2 = np.apply_along_axis(gray2bcd, axis=1, arr=g2) g1 = np.dot(g1, 2**np.arange(n/2-1, -1, -1)) g2 = np.dot(g2, 2**np.arange(n-n/2-1, -1, -1)) g = np.stack([g1, g2]).T return g