To summarize, we have seen that the algorithm consists of multiple FFT
and transpose operations.
The whole process can be described with the
following algorithmic flowchart.
Figure 4.1: Signal processing algorithm flowchart
4.2 The hardware components
This section describes the hardware components used in the architecture implementation. A brief description of each component and its function is provided.
4.2.1 FFT Core
It can be seen that the FFT is the major operation in the realization of the algorithm.
In order to reduce the implementation
time, the FFT algorithm is implemented using Xilinx LogiCORE IP Fast
Fourier Transform v8.0 [18]. The IP core implements the Cooley-Tukey FFT
algorithm for transform sizes of N = 2^m, where m ranges between 3 and 16. The core supports processing with fixed-point data ranging from 8 to
34 bits as well as single-precision floating-point data. In the latter case, the input data is a vector of N complex values represented as dual 32-bit floating-point numbers, with phase factors represented as 24- or 25-bit fixed-point numbers.
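Before configuring the core, the chosen transform size must satisfy this power-of-two constraint. A minimal sketch of such a check in C (the function name is ours, not part of the IP core's API):

```c
#include <stdbool.h>
#include <stdint.h>

/* True if n is a transform size supported by the FFT core:
 * n = 2^m with m between 3 and 16, i.e. 8 <= n <= 65536. */
static bool fft_size_supported(uint32_t n)
{
    if (n < 8u || n > 65536u)
        return false;
    return (n & (n - 1u)) == 0u;   /* exactly one bit set: power of two */
}
```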
The FFT core provides four architecture options:
• Pipelined Streaming I/O
• Radix-4 Burst I/O
• Radix-2 Burst I/O
• Radix-2 Lite Burst I/O
The pipelined streaming architecture pipelines several Radix-2 butterfly processing engines to allow continuous data processing. Each processing engine has its own dedicated memory banks, which are used to store the input and intermediate data. This allows the core to simultaneously perform a transform on the current frame of data, load input data for the next frame, and unload the results of the previous frame.
For the current implementation, the pipelined streaming architecture was chosen for two main reasons. First, the pipelining allows the FFT block to receive data while it is processing the data from the previous frame. This is convenient for the first FFT stage in our application, since it eliminates the need to buffer the incoming data and allows the data to be received by the FFT block immediately. Second, the processing latency of the pipelined streaming architecture is much lower than that of the burst architectures and meets the latency constraints derived in Section 3.4.
The FFT IP core is compliant with the AXI4-Stream interface: all inputs and outputs of the FFT core use the AXI4-Stream protocol. Since the FFT core needs to read its data from the main memory, we need an additional hardware block that can access the memory and translate AXI4 Memory-Mapped (AXI4-MM) transactions to AXI4-Stream (AXI4-S) transfers and vice versa. This is achieved by using the Xilinx LogiCORE IP AXI DMA core [19].
4.2.2 AXI DMA Core
The AXI DMA engine supports high-bandwidth direct memory access be-
tween memory and AXI-Stream peripherals. The data movement is achieved
through two data channels: the Memory-Map to Stream (MM2S) channel and the Stream to Memory-Map (S2MM) channel. Reading data from the memory is accomplished by the AXI4 Memory Map Read Master interface and the AXI MM2S Stream Master interface, whereas writing data to the memory is achieved through the AXI S2MM Stream Slave interface and the AXI4 Memory Map Write Master interface. The core also has an AXI4-Lite slave
interface which is used to access the registers and control the DMA engine.
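Starting a transfer in the core's simple (direct register) mode then amounts to a handful of register writes over this AXI4-Lite interface. The sketch below illustrates the idea in C; the register offsets follow our reading of the core's product guide [19] and should be treated as illustrative rather than authoritative:

```c
#include <stdint.h>

/* AXI DMA register offsets in simple (direct register) mode,
 * as we read them from the product guide [19]; shown for illustration. */
#define MM2S_DMACR  0x00u  /* MM2S control register        */
#define MM2S_SA     0x18u  /* MM2S source address          */
#define MM2S_LENGTH 0x28u  /* MM2S transfer length (bytes) */
#define S2MM_DMACR  0x30u  /* S2MM control register        */
#define S2MM_DA     0x48u  /* S2MM destination address     */
#define S2MM_LENGTH 0x58u  /* S2MM transfer length (bytes) */
#define DMACR_RS    0x1u   /* run/stop bit                 */

/* Start a memory-to-stream transfer: set the run bit, then program the
 * source address and the length; writing the length starts the DMA. */
static void dma_start_mm2s(volatile uint32_t *regs,
                           uint32_t src_addr, uint32_t len_bytes)
{
    regs[MM2S_DMACR / 4] |= DMACR_RS;
    regs[MM2S_SA / 4]     = src_addr;
    regs[MM2S_LENGTH / 4] = len_bytes;
}

/* Start a stream-to-memory transfer analogously. */
static void dma_start_s2mm(volatile uint32_t *regs,
                           uint32_t dst_addr, uint32_t len_bytes)
{
    regs[S2MM_DMACR / 4] |= DMACR_RS;
    regs[S2MM_DA / 4]     = dst_addr;
    regs[S2MM_LENGTH / 4] = len_bytes;
}
```

In the actual system, `regs` would point at the AXI4-Lite base address assigned to the DMA core in the address map.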
The DMA core allows a maximum of 8 MByte of data to be transferred between the memory and a stream peripheral per transaction. According to the documentation [19], the core can achieve high transfer throughput, namely 399.04 MByte/s on the MM2S channel and 298.59 MByte/s on the S2MM channel.
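A consequence of the 8 MByte limit is that larger buffers must be moved in several transactions; the required number can be computed as a simple rounded-up division (a sketch, with an illustrative function name):

```c
#include <stdint.h>

#define DMA_MAX_XFER (8u * 1024u * 1024u)  /* 8 MByte per transaction */

/* Number of DMA transactions needed to move total_bytes, given the
 * 8 MByte per-transaction limit (rounded up). */
static uint32_t dma_num_transactions(uint64_t total_bytes)
{
    return (uint32_t)((total_bytes + DMA_MAX_XFER - 1u) / DMA_MAX_XFER);
}
```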
4.2.3 Memory Interface Core
To access off-chip memory from an FPGA, a memory controller is required. Xilinx provides a memory interface core [20] to interface FPGA designs to DDR3 SDRAM devices. The core handles memory requests from hardware blocks such as the AXI DMA and translates them into SDRAM commands, enabling data movement between FPGA user designs and the external memory. In addition, the core manages the refresh operation of the memory.
4.2.4 MicroBlaze Core
The MicroBlaze core was introduced in Chapter 1. The design uses a single MicroBlaze core to generate the input data for the algorithm, to configure the AXI DMA blocks for data transfers, to transpose the data in memory, to measure the time required for each process, and to extract the range, velocity and angle information from the frequency spectrum data.
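The transpose step between the FFT passes (the "corner turn") can be sketched as an out-of-place matrix transpose; the sample type and function name below are ours, chosen only to illustrate the operation the MicroBlaze performs:

```c
#include <stdint.h>
#include <stddef.h>

/* Complex sample as stored in memory: interleaved real/imaginary parts. */
typedef struct { int16_t re, im; } cplx16_t;

/* Out-of-place corner turn: src is rows x cols in row-major order,
 * dst becomes cols x rows, so that the next FFT pass reads its input
 * contiguously. */
static void corner_turn(const cplx16_t *src, cplx16_t *dst,
                        size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            dst[c * rows + r] = src[r * cols + c];
}
```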
4.3 The architecture and operation
The hardware components of the architecture were described in the previous section. This section describes how the components are interconnected and how the architecture functions.
It was mentioned in the previous chapters that the RF front end of the design has four receiving antennas. By using an FPGA for the signal processing, we can process the received signals from all receiving antennas in parallel. However, since the RF front end of the design is not yet ready, the architecture also includes input signal generation as part of it.