3.4. SIGNAL-FLOW ANALYSIS
Based on Figure 3.4, the memory requirement for a single antenna can be calculated as:
$$\mathrm{MEM} = 2 \cdot (294912 + 49152 \cdot 3) = 884736\ \text{bytes} \tag{3.11}$$
The application requires four receive antennas, so the memory requirement for the receiver is 4 · 884736 = 3538944 bytes, roughly 3.4 MByte. According to the Xilinx Virtex-6 FPGA family documentation, the Virtex-6 FPGA on the ML605 board (XC6VLX240T) has at most 14976 Kbit (1872 KByte) of block RAM, considerably less than our application requires. This forces the design to use the off-chip SDRAM to store the intermediate results of the FFT processing.
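The arithmetic behind Eq. (3.11) and the comparison with the on-chip memory can be checked with a few lines of Python. This is a minimal sketch: the terms 294912 and 49152 are copied from Eq. (3.11) without modelling what they stand for in Figure 3.4, and the block RAM size assumes 416 blocks of 36 Kbit (14976 Kbit) for the XC6VLX240T.

```python
# Minimal check of Eq. (3.11) and of the four-antenna total against the
# XC6VLX240T block RAM. The block RAM figure (416 x 36 Kbit = 14976 Kbit)
# is an assumption taken from the Virtex-6 family overview.

NUM_RX_ANTENNAS = 4
BRAM_BITS = 416 * 36 * 1024          # 14976 Kbit of block RAM
BRAM_BYTES = BRAM_BITS // 8          # 1,916,928 bytes = 1872 KByte


def mem_per_antenna() -> int:
    """Eq. (3.11): memory for one receive antenna, in bytes."""
    return 2 * (294912 + 49152 * 3)


def mem_all_antennas(num_antennas: int = NUM_RX_ANTENNAS) -> int:
    """Memory for all receive antennas, in bytes."""
    return num_antennas * mem_per_antenna()


if __name__ == "__main__":
    total = mem_all_antennas()
    print("per antenna:", mem_per_antenna(), "bytes")
    print("four antennas:", total, "bytes =", total / 2**20, "MByte")
    print("fits in block RAM:", total <= BRAM_BYTES)
```

Running it reports 884736 bytes per antenna and 3538944 bytes (3.375 MByte) for four antennas, which does not fit in the 1872 KByte of block RAM.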
Note, however, that this requirement depends on design decisions. For example, if there is enough time between consecutive radar scans and the FFTs are computed in place, the minimum memory requirement drops to 4 · 96 · 1024 · 2 · 12 = 9437184 bits = 1.125 MByte, which is considerably less than the memory available on the FPGA.
However, storing 12-bit samples in a 12-bit fixed-point format is not reliable: it leaves no room for bit growth during the FFT and may cause serious errors in the calculations. A common 16-bit fixed-point format can be used instead, in which case the memory requirement becomes 4 · 96 · 1024 · 2 · 16 = 12582912 bits = 1.5 MByte (1536 KByte). This is still less than the 1872 KByte of block RAM on the FPGA, so the data could fit on chip provided the other hardware components need less than 336 KByte of block RAM.
The current implementation, however, uses the single-precision floating-point format, which represents each value with 32 bits. The total memory requirement is then 4 · 96 · 1024 · 2 · 32 = 25165824 bits = 3 MByte, which clearly cannot fit in the on-chip block RAM. Therefore, the implementation must store the data in the off-chip SDRAM.
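The three cases above differ only in the word width b of each real or imaginary value, so the requirement 4 · 96 · 1024 · 2 · b can be tabulated directly. The sketch below does this and compares each result with the block RAM; the constant names are illustrative labels for the factors in the text, and the block RAM size assumes 14976 Kbit (1872 KByte) for the XC6VLX240T.

```python
# Memory needed for in-place computation, 4 * 96 * 1024 * 2 * b bits,
# for the word widths b discussed in the text. Constant names are
# illustrative labels for the factors used in the text.

FACTORS = 4 * 96 * 1024 * 2          # product of the non-width factors
BRAM_BYTES = 14976 * 1024 // 8       # assumed XC6VLX240T block RAM (1872 KByte)


def inplace_bits(word_bits: int) -> int:
    """Total bits when every real/imaginary value uses word_bits bits."""
    return FACTORS * word_bits


def summarize(word_bits: int) -> tuple[int, float, int]:
    """Return (bits, MByte, spare block RAM in bytes; negative if it does not fit)."""
    bits = inplace_bits(word_bits)
    nbytes = bits // 8
    return bits, nbytes / 2**20, BRAM_BYTES - nbytes


if __name__ == "__main__":
    for label, width in (("12-bit fixed point", 12),
                         ("16-bit fixed point", 16),
                         ("single-precision float", 32)):
        bits, mbyte, spare = summarize(width)
        print(f"{label}: {bits} bits = {mbyte} MByte, spare block RAM = {spare} bytes")
```

For 16-bit words the spare block RAM is 344064 bytes (336 KByte); for single precision it is negative, which is why the data has to go to the SDRAM.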
Chapter 4
System Implementation
The previous chapter presented the analysis of the algorithm and the architectures found in the literature to implement it. This chapter describes the architecture used to implement the algorithm on the Virtex-6 FPGA based on the given requirements. The first section describes the implemented algorithm, based on the signal processing scheme described in Chapter 2 and the requirements found in Chapter 3. The second section presents the components, or hardware blocks, required to implement the processes of the algorithm. Finally, the last section describes the hardware architecture used to implement the algorithm.
4.1 The algorithm
This section describes the three-dimensional FFT processing algorithm on which the signal processing is based.
The first step of the algorithm performs a 1024-point FFT on the time samples. Chapter 3 showed that the intermediate results of the FFT processing must be stored in the off-chip SDRAM, so the output of the first FFT is written to the SDRAM. The second FFT reads the data back from the SDRAM and performs its transform. As mentioned in Chapter 2, this step can be viewed as a column-wise FFT of a matrix. Consequently, the 512 data samples of the 32 chirps (rows) are read in different time slices: first, the first sample of each of the 32 chirp outputs is read, then the second sample of each chirp, and so on, until all 512 samples have been read. Given how modern DRAM devices operate, this is an inefficient way of addressing the SDRAM.
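The read order of the second FFT can be made concrete by listing the byte addresses it touches. The sketch assumes the first-FFT output is stored row by row (one chirp after another), each chirp occupying 1024 complex single-precision values of 8 bytes; these layout details are assumptions for illustration.

```python
# Byte addresses visited by a column-wise (per-sample, across chirps) read
# of a row-major chirp matrix. Layout parameters are illustrative assumptions.

NUM_CHIRPS = 32
SAMPLES_READ = 512
VALUES_PER_CHIRP = 1024   # complex values stored per chirp
BYTES_PER_VALUE = 8       # complex single precision: 2 x 4 bytes


def address(chirp: int, sample: int) -> int:
    """Byte address of (chirp, sample) in row-major storage."""
    return (chirp * VALUES_PER_CHIRP + sample) * BYTES_PER_VALUE


def column_wise_addresses() -> list[int]:
    """Addresses in the order the second FFT consumes them:
    sample 0 of every chirp, then sample 1 of every chirp, and so on."""
    return [address(c, s) for s in range(SAMPLES_READ) for c in range(NUM_CHIRPS)]


if __name__ == "__main__":
    addrs = column_wise_addresses()
    # Within one column, consecutive reads are a full chirp (8192 bytes) apart.
    print(len(addrs), addrs[:4])
```

The first four addresses are 0, 8192, 16384 and 24576: every read within a column lands a whole chirp further on, so no two consecutive reads share an 8 KByte block.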
Modern SDRAM devices are usually organized in multiple banks. Each bank is a matrix of rows and columns. Reading or writing a memory location first requires activating its row, which copies the data stored in that row into the row buffer. Once the row is active, data can be read or written at individual column addresses. When the accesses to that row are finished, the row is closed (precharged) and its data is written back to the bank. Accessing a memory location therefore involves three operations: activating the row, performing the read or write, and closing the row. Addressing the memory in an arbitrary order, so that consecutive accesses fall in different rows, thus introduces a large overhead.
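A simplified model makes this overhead visible by counting how often a new row must be activated for a stream of addresses. The sketch assumes a single bank with 8 KByte rows and a policy where a row stays open until an access falls in a different row; real controllers interleave several banks, so it only illustrates the trend.

```python
# Count row activations for a stream of byte addresses in a simplified
# single-bank DRAM model: the open row is kept until an access needs
# another row, at which point the old row is closed and the new one activated.

ROW_BYTES = 8192  # assumed row size


def count_activations(addresses, row_bytes: int = ROW_BYTES) -> int:
    activations = 0
    open_row = None
    for addr in addresses:
        row = addr // row_bytes
        if row != open_row:          # row miss: close old row, activate new one
            activations += 1
            open_row = row
    return activations


if __name__ == "__main__":
    elem, rows = 8, 32                               # 8-byte values, 32 rows of data
    per_row = ROW_BYTES // elem                      # 1024 values per row
    sequential = [(r * per_row + i) * elem for r in range(rows) for i in range(per_row)]
    strided = [(r * per_row + i) * elem for i in range(per_row) for r in range(rows)]
    print("sequential:", count_activations(sequential))   # one activation per row
    print("strided:", count_activations(strided))         # one activation per access
```

Reading the same 32 rows sequentially costs 32 activations, while the strided order costs 32768, one per access.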
The ML605 board contains a 512 MB DDR3 SDRAM module from Micron Technology (MT4JSF6464HY-1G1B) [8]. The module has four chips, each with a 16-bit data output, giving a 64-bit (8-byte) data bus. The module is organized in 8 internal banks, each with 8K rows and 1K columns. Since every column access transfers 8 bytes, each row of a bank stores 1K · 8 bytes = 8 KByte of data. With single-precision floating-point representation, each complex-valued number occupies 8 bytes, so the 1024 FFT outputs of a single chirp fill exactly one 8 KByte row. Consequently, the second FFT must open and close a row for every sample it reads, which amounts to 16384 (32 · 512) row requests per virtual antenna. This can add significant delay to the FFT processing time.
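The figures for the ML605 memory follow from a few products, reproduced in the sketch below. The 64-bit width assumes the four 16-bit chips share one data bus, and the request count uses the 32 chirps × 512 samples from the text.

```python
# Geometry of the ML605 DDR3 module as described in the text, and the
# number of row requests of a column-wise second FFT per virtual antenna.

CHIPS, CHIP_WIDTH_BITS = 4, 16
BANKS, ROWS_PER_BANK, COLUMNS_PER_ROW = 8, 8 * 1024, 1024

BUS_BYTES = CHIPS * CHIP_WIDTH_BITS // 8           # 8 bytes per column access
ROW_BYTES = COLUMNS_PER_ROW * BUS_BYTES            # 8192 bytes = 8 KByte
CAPACITY = BANKS * ROWS_PER_BANK * ROW_BYTES       # 536870912 bytes = 512 MByte

COMPLEX_FLOAT_BYTES = 2 * 4                        # real + imaginary, 4 bytes each
CHIRP_BYTES = 1024 * COMPLEX_FLOAT_BYTES           # one chirp's FFT output = 8192 bytes

NUM_CHIRPS, SAMPLES = 32, 512
ROW_REQUESTS = NUM_CHIRPS * SAMPLES                # one row open/close per sample read

if __name__ == "__main__":
    print(ROW_BYTES, CAPACITY // 2**20, CHIRP_BYTES == ROW_BYTES, ROW_REQUESTS)
```

It prints `8192 512 True 16384`: the 8 KByte row matches one chirp exactly, and the 8 banks × 8K rows account for the full 512 MByte.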
One way to avoid this overhead is to transpose the data matrix, turning the 32 × 1024 matrix into a 1024 × 32 matrix. The memory is then addressed in sequential order, which reduces the overhead of reading the data from the SDRAM. Thus, a memory transpose step is needed after the first FFT of all chirps in a frame has finished. Once the transpose is complete, the second FFT can be performed on the data.
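The transpose can be sketched in software on a flat, row-major buffer standing in for the SDRAM. This is an illustration only (an out-of-place copy in Python, not the hardware transpose); after it, the 32 values that one second-stage FFT needs sit next to each other.

```python
# Out-of-place transpose of a row-major rows x cols matrix held in a flat list,
# standing in for the SDRAM buffer. Illustrative only.

def transpose_2d(flat: list, rows: int, cols: int) -> list:
    """Return the cols x rows transpose of a row-major rows x cols matrix."""
    if len(flat) != rows * cols:
        raise ValueError("buffer size does not match the given shape")
    out = [None] * (rows * cols)
    for r in range(rows):
        for c in range(cols):
            out[c * rows + r] = flat[r * cols + c]
    return out


if __name__ == "__main__":
    chirps, samples = 32, 1024
    # Tag each entry with its (chirp, sample) position to show where it lands.
    frame = [(ch, s) for ch in range(chirps) for s in range(samples)]
    t = transpose_2d(frame, chirps, samples)
    # Row k of the transposed matrix holds sample k of every chirp contiguously,
    # so the second FFT for sample k reads one contiguous block.
    print(t[5 * chirps:6 * chirps] == [(ch, 5) for ch in range(chirps)])
```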
According to the requirements, there are 3 transmit and 4 receive antennas, making 12 virtual antennas in total. After the transpose and the second FFT, the data is stored in memory as a 12 × 512 × 32 three-dimensional matrix. The third FFT requires the data samples from all virtual antennas. These samples are not located in consecutive memory locations, so a row must be opened and closed for each read operation, which, as discussed above, adds a large overhead. Therefore, the memory must be transposed again to make it suitable for the third FFT. This time the transpose operation takes the 12 × 512 × 32 matrix and outputs a 512 × 32 × 12 matrix. The third FFT can then be performed on the data. After the third FFT, the data can be stored in the SDRAM for further processing; at this point, the range, velocity and angle information can be extracted from the data.
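The second transpose is a permutation of axes: element (a, r, c) of the 12 × 512 × 32 array moves to position (r, c, a). The sketch below performs this reordering on a flat list; reading the three axes as virtual antenna, range sample and chirp is our interpretation of the text, and the pure-Python loop only illustrates the reordering, not the hardware implementation.

```python
# Reorder a row-major d0 x d1 x d2 array into d1 x d2 x d0 order, so that the
# d0 values (here: the 12 virtual antennas) for each (d1, d2) pair are contiguous.

def transpose_3d(flat: list, d0: int, d1: int, d2: int) -> list:
    if len(flat) != d0 * d1 * d2:
        raise ValueError("buffer size does not match the given shape")
    out = [None] * len(flat)
    for i in range(d0):
        for j in range(d1):
            for k in range(d2):
                out[(j * d2 + k) * d0 + i] = flat[(i * d1 + j) * d2 + k]
    return out


if __name__ == "__main__":
    antennas, samples, chirps = 12, 512, 32
    cube = [(a, s, c) for a in range(antennas) for s in range(samples) for c in range(chirps)]
    reordered = transpose_3d(cube, antennas, samples, chirps)
    # The input of the third FFT for (sample 3, chirp 7) is now one contiguous run.
    start = (3 * chirps + 7) * antennas
    print(reordered[start:start + antennas] == [(a, 3, 7) for a in range(antennas)])
```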