Bus Master DMA implementation on Xilinx FPGAs


The Xilinx Integrated Block for PCI Express IP core for UltraScale and 7-series FPGAs provides a PIO PCIe example design, a complete application for PCI Express in which the FPGA responds to read and write requests to its memory space. PIO stands for Programmed Input/Output, a data-transfer method in which the CPU itself executes every transfer; relying on PIO mode therefore ties up the CPU and slows the system down.

The term Bus Master, in the context of PCI Express, refers to the ability of a PCIe port to initiate PCIe transactions, typically memory read and write transactions. The most common application for Bus Mastering Endpoints is DMA (Direct Memory Access), a technique for the efficient transfer of data to and from host CPU system memory. DMA does not involve the CPU: the PCIe endpoint moves data directly to and from host memory, bypassing the CPU.

DMA implementations have many advantages over standard programmed input/output (PIO) data transfers. PIO transfers are executed directly by the CPU and are typically limited to one or two 32-bit DWORDs at a time. For large transfers, DMA delivers higher throughput because the DMA hardware engine transmits data in burst mode: fewer TLP* headers are needed, which reduces bus overhead. In addition, the DMA engine offloads the CPU from moving the data itself, improving overall system performance through lower CPU utilization.

A BMDMA implementation is by far the most common type of DMA found in PCIe-based systems; it is used by GPUs and other devices when working with video transfers. BMD implementations reside within the endpoint device (the FPGA in our case) and are called Bus Masters because they initiate the movement of data to (Memory Writes) and from (Memory Reads) system memory.

To implement a BMDMA design we need a driver on the root complex side (the host) that grants access to a memory window in which the transfer buffers are allocated. The FPGA can then access host RAM directly, first through the PCIe controller and then through the host memory controller. The driver must also set a configuration register on the host marking the FPGA as a bus master device so that it is allowed to access host memory; otherwise, the FPGA's requests will be treated as Unsupported Requests.

The FPGA, for its part, should expose configuration registers (within a BAR) indicating the beginning of the memory window and its size, plus a register through which the host tells the FPGA that it can start transferring data once the buffers have been properly allocated. The FPGA can signal the host that a transfer has completed through an interrupt or a polling-accessible register; the choice depends on the application and the communication protocol established between the FPGA and its driver.
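The host-side half of this handshake might look as follows. This is only a sketch: the BAR0 register map (offsets, bit positions) is hypothetical and does not correspond to any actual Xilinx example design, and a real Linux driver would additionally call `pci_set_master()` to set the Bus Master Enable bit in the device's PCI Command register and allocate the buffer with a DMA-capable API.

```c
/* Sketch of the driver-side BMDMA handshake described above,
 * assuming a hypothetical BAR0 register map. */
#include <stdint.h>

/* Hypothetical BAR0 register offsets (in 32-bit words). */
enum {
    DMA_ADDR_LO = 0x00 / 4,  /* physical address of host buffer, low 32 bits  */
    DMA_ADDR_HI = 0x04 / 4,  /* physical address of host buffer, high 32 bits */
    DMA_LENGTH  = 0x08 / 4,  /* size of the memory window, in bytes           */
    DMA_CONTROL = 0x0C / 4,  /* bit 0: start transfer                         */
    DMA_STATUS  = 0x10 / 4,  /* bit 0: done (polling alternative to an IRQ)   */
};

/* Program the window and tell the FPGA it may start mastering the bus. */
static void start_dma(volatile uint32_t *bar0,
                      uint64_t buf_phys, uint32_t len)
{
    bar0[DMA_ADDR_LO] = (uint32_t)buf_phys;         /* window base, low half  */
    bar0[DMA_ADDR_HI] = (uint32_t)(buf_phys >> 32); /* window base, high half */
    bar0[DMA_LENGTH]  = len;                        /* window size            */
    bar0[DMA_CONTROL] = 1;                          /* "buffers ready, go"    */
}

/* Polling-based completion check; an MSI interrupt is the usual
 * alternative, as noted in the text. */
static int dma_done(volatile uint32_t *bar0)
{
    return bar0[DMA_STATUS] & 1;
}
```

The `start_dma()` writes correspond to the BAR registers described above (window base, window size, start flag), and `dma_done()` models the polling-accessible completion register.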

AI VIDEX has designed and implemented several BMDMA PCIe designs for hybrid platforms (FPGA-GPU and FPGA-uP/ARM) working with SDI I/O interfaces. AI VIDEX will provide you with everything you need to get started with confidence: we can help you through our «PCIe for Xilinx FPGAs» course or implement the complete solution according to your specifications.

* Transaction Layer Packet (TLP): a PCI Express system transfers data in the payload of TLPs; hence a communication packet in the PCIe specification is called a TLP.

Figure 1. BMDMA design.
