24
CHAPTER 3. REQUIREMENTS
types of architectures for the hardware implementation of the algorithm. The
first type uses the off-chip SDRAM to store the intermediate results of the
processing. The architectures presented in [14] and [15] are based on this
type. In this type it is important to minimize the time required to open and
close a page of the SDRAM when accessing the data for the second and the
third FFT processings. The second type of architecture uses on-chip FPGA
memory blocks to store the intermediate results. This type of architecture
is more efficient and can achieve faster processing due to the fact that there
is much less overhead in accessing the intermediate data of on-chip FPGA
memory blocks rather than the SDRAM. However, it should be noted that
this architecture is limited by the amount of available on-chip memory and
will bound the number of points used for the FFT processing.
3.4
Signal-flow Analysis
In the Matlab model described in Section 3.1, we have considered only a single
radar scanning. From the provided model it is not clear if the radar will start
the consecutive scanning immediately after finishing the previous scanning or
there will be a time interval between them. Here we consider a model in which
the consecutive scannings happen without any time interval (see Figure 3.3).
Therefore, we will consider the performance of the implementation to be real-
time if it provides enough computational power to process the consecutive
radar scannings without any delays.
Figure 3.3: Radar scannings
Based on the signal processing architectures described in the previous
section and our requirements, we can construct the signal-flow graph of the
algorithm. The Figure 3.4 shows the signal-flow graph of the signal processing
algorithm for one receiving antenna. After sampling by the 40 MHz ADC
and decimation by a factor of 2, the first FFT can be performed. The first
3.4. SIGNAL-FLOW ANALYSIS
25
FFT is performed on 1024 time samples from one chirp period. To achieve
a real-time performance, the worst case computation time of the first FFT
block should be equal to the chirp time which is 35.6 µs based on our model.
It means that we can process a frame as soon as it is available, thus avoiding
any time delays. In addition, the outputs of the FFT block should be stored
in a memory for further Doppler processing. Given the worst-case execution
time (WCET) of the FFT block and the number of samples required to be
stored in the memory, we can calculate the required minimum bandwidth
from the first FFT block to the memory
B
1
=
1024
T
c
=
1024
35.6 · 10
−6
= 28.76 M S/s
(3.5)
Figure 3.4: Signal Flow Graph of 3D FFT Procesing
The ADC used for the sampling of the received signal has 12 bits of
resolution. Knowing that the output of the FFT is a complex-valued number,
we can easily calculate the minimal memory required to store the FFT data.
Our model uses 96 chirps per frame, thus our memory requirement equals to
96 · 1024 · 2 · 12 = 2359296 bits = 294912 bytes.
The second FFT block computes the column-wise FFT for each transmit-
ting antenna from the stored data. To illustrate, the first row contains the
samples from the first transmit antenna, the second row contains the sam-
ples from the second one and the third row contains the samples from the
third transmit antenna. Similarly, the fourth row will contain the samples
from the first transmit antenna too. So, FFT will be performed on samples
from the rows 1, 4, 7. . . 91, 94 for the first transmit antenna, 2, 5, 8. . . 92,
95 for the second transmit antenna and 3, 6, 9. . . 93, 96 for the third one.
Out of the 1024 columns of the matrix, it is sufficient to process the first 512
columns since the output of the real valued FFT is always symmetric and
the second half of the columns will not provide any additional information.
Given that we have 512 columns, in total 1536 (512 · 3) 32 point FFTs should
be performed for the single radar antenna image processing. We know that
26
CHAPTER 3. REQUIREMENTS
the total time for that processing is n · T c, where n is the number of chirps
and T
c
is the chirp time. Therefore, given the parameters we can find the
worst-case computation time for the second FFT:
T
2
=
n · T
c
1536
=
96 · 35.6 · 10
−6
1536
= 2.22 µs
(3.6)
All the outputs from the FFT block should be stored for the third FFT
processing. As it can be seen in Figure 3.4, there are three 2D arrays for the
single receiving antenna each of them containing 16384 (512 · 32) complex
values. We can easily calculate the memory required for each of the arrays
which equals to 32 · 512 · 2 · 12 = 393216 bits = 49152 bytes.
Given WCET of the second FFT block we can find the required minimum
bandwidth from block to the memory:
B
2
=
32
T
2
=
32
2.22 · 10
−6
= 14.4 M S/s
(3.7)
The third FFT is performed on samples from all Range-Doppler spectrum’s.
Considering the real-time constraints, the required time to complete all the
FFTs equals to n · T c. The number of points that the third FFT performs
is based on the equation provided on Matlab model:
N = 2
log
2
A∗K
(3.8)
where A is the interpolation factor for the Angle of Arrival spectrum and
K is the number of virtual antennas. Considering only a single FFT block,
worst-case execution time of the block will be:
T
3
=
96 · 35.6 · 10
−6
16384
= 0.21 µs
(3.9)
Consequently, the bandwidth can be found as:
B
3
=
N
0.21 · 10
−6
(3.10)
Additionally, we can find the memory requirements for the processing. One
thing to note is that we need double buffering to prevent the overwriting of
the data that is already in the memory. The reason is that while the second
FFT stage will be busy performing the column-wise memory reads, writing
the received new data to the same memory will cause the previous data to
be lost. Therefore, if the calculated latency and bandwidth constraints are
met, the double buffering should be sufficient for the real-time performance.