Introduction:
PCI Express is a serial expansion bus standard operating at multi-gigabit data rates. It is a third-generation, high-performance I/O bus used for interconnecting peripheral devices. PCI Express provides a higher rate of data transfer and lower latency than the older PCI and PCI-X technologies, which implemented parallel I/O buses. It is based on a point-to-point topology: every device has a dedicated serial link to the root complex, so it does not share a bus and doesn’t have to compete for bandwidth. This article implements a simple design to demonstrate how to write data to and read data from the Aller AU-Plus FPGA Board with M.2 Interface, which acts as a PCI Express endpoint device. Let us get started!
Hardware required:
- Host PC with Linux or Windows (Linux preferred)
- Aller AU-Plus FPGA Board with M.2 Interface
- Aller AU-Plus Carrier
- Xilinx Platform Cable USB II (JTAG cable)
- USB Type C cable
Software required:
- Xilinx Vivado Design Suite 2024.1
- RW-Everything (for Windows host)
Step 1:
Download and install Vivado Board Support Package files for Aller AU-Plus from here. Follow the README.md file on how to install Vivado Board Support Package files for Numato Lab’s boards.
Step 2:
Start Vivado Design Suite and select “Create New Project” from the Quick Start section. The project wizard will pop up. Click “Next” to proceed with creating the project.
Step 3:
In the “Project Name” window, enter a project name and location and click “Next”.
Step 4:
In the “Project Type” window, select “RTL Project” and select the option “Do not specify sources at this time”. Click “Next”.
Step 5:
In the “Default Part” wizard, select Boards and choose “numato.com” as Vendor. Select “Aller_AU_Plus” and click “Next”. If Aller_AU_Plus is not listed, make sure board support files are installed correctly. Click “Finish” in the next wizard. A new Vivado project will open with the selected settings.
Step 6:
Under Flow Navigator, select “Create Block Design” in IP Integrator. Give a name to the block design. This article uses “PCIe” as the block design name, but you can use any name you prefer.
Step 7:
Go to the Diagram window and click “Add IP” from the toolbar. Type “PCIe” in the search box and double-click the “UltraScale+ Integrated Block (PCIE4C) for PCI Express” IP to add it to the design.
Step 8:
Double-click on the pcie4c_uscale_plus_0 IP to open the “Re-customize IP” window. In the Basic tab, select “Lane Width” as “X4” and “Maximum Link Speed” as “5.0 GT/s”. Leave the other tabs in their default state and click “OK”.
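As a rough guide to what these settings mean, the sketch below estimates the raw link bandwidth for a given link speed and lane width. It only accounts for line encoding (8b/10b at 2.5 and 5.0 GT/s, 128b/130b at 8.0 and 16.0 GT/s); packet and protocol overhead are ignored, so real throughput will be lower. The function and table names are illustrative and not part of any Xilinx tool.

```python
# Line-encoding efficiency per PCIe link speed (GT/s).
# 2.5 and 5.0 GT/s use 8b/10b encoding; 8.0 and 16.0 GT/s use 128b/130b.
ENCODING_EFFICIENCY = {
    2.5: 8 / 10,
    5.0: 8 / 10,
    8.0: 128 / 130,
    16.0: 128 / 130,
}


def raw_bandwidth_mbytes_per_s(gt_per_s: float, lanes: int) -> float:
    """Raw one-direction bandwidth in MB/s, before packet overhead."""
    efficiency = ENCODING_EFFICIENCY[gt_per_s]
    bits_per_second = gt_per_s * 1e9 * efficiency * lanes
    return bits_per_second / 8 / 1e6


# The configuration chosen in this step: x4 lanes at 5.0 GT/s.
print(f"{raw_bandwidth_mbytes_per_s(5.0, 4):.0f} MB/s")  # prints "2000 MB/s"
```

In other words, each 5.0 GT/s lane carries about 500 MB/s of payload-capable bits in each direction, and the x4 link about four times that.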
Step 9:
Save the design, then right-click on the pcie4c_uscale_plus_0 IP block in the block diagram and select “Open IP Example Design”. Enter the location where the example project should be created and click “OK”.
Vivado will open an example project for the PCI Express endpoint device based on the customized IP settings. This tutorial uses this Xilinx-generated example project, which already contains the RTL logic that lets users write data to the FPGA and read it back via PCI Express.
Step 10:
In the pcie4c_uscale_plus_0 example design, the user_lnk_up signal indicates that the PCIe link between the host PC and the FPGA is ready to exchange data once the FPGA board is connected to the PCIe slot of the motherboard. The Aller board features an RGB LED, so connect user_lnk_up to the blue LED and the complement of user_lnk_up to the green LED. The blue LED then glows when the PCIe link is ready to exchange data, and the green LED glows when the link is not ready: while the host PC and FPGA are still attempting to establish communication, or when communication with the FPGA is lost due to errors on the transmission channel. Also add a counter to check whether the PCIe clock is running, and assign the counter output to the red LED so that it blinks while the clock is active. Together, the LEDs show at a glance whether the board has been detected by the host PC and whether the link is ready.
To add this logic to the example design, open the top-level Verilog file from the Design Sources category in the “Sources” window.
Declare the RGB LED ports in the port declaration of the top module (xilinx_pcie4_uscale_ep.v).
Under AXI Interface, declare the ‘counter’ variable as a register and assign ‘user_lnk_up’, the complement of ‘user_lnk_up’ and the ‘counter’ output to the RGB LED ports.
Then add the counter implementation after the I/O BUFFERS instantiation.
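When choosing which counter bit drives the red LED, it helps to estimate how fast that bit toggles. The sketch below computes the blink period of one bit of a free-running binary counter. The 125 MHz clock and the bit positions are assumptions for illustration only; check the actual frequency of the clock that drives your counter and the counter width in your design.

```python
def led_blink_period_s(clk_hz: float, bit_index: int) -> float:
    """Full on/off period of counter bit `bit_index`, in seconds.

    Bit k of a counter that increments every clock cycle changes state
    every 2**k cycles, so one complete on/off period takes 2**(k + 1) cycles.
    """
    return (2 ** (bit_index + 1)) / clk_hz


# Assumed clock frequency (hypothetical, not taken from the example design).
ASSUMED_CLK_HZ = 125_000_000

for bit in (23, 25, 27):
    period = led_blink_period_s(ASSUMED_CLK_HZ, bit)
    print(f"counter bit {bit}: blink period {period:.3f} s")
```

Under these assumptions, a bit in the mid-20s gives a blink rate that is easy to see, while low-order bits toggle far too fast for the eye and make the LED look dimly lit.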
Step 11:
Open the .xdc file from the “Constraints” category under the Sources “Hierarchy” tab.
Step 12:
Select the entire contents of the file and delete it, then paste in the following constraints.
create_clock -period 10.000 -name sys_clk [get_ports sys_clk_p]
set_property PACKAGE_PIN AA18 [get_ports sys_clk_n]
set_property PACKAGE_PIN AA17 [get_ports sys_clk_p]
set_false_path -from [get_ports sys_rst_n]
set_property PULLTYPE PULLUP [get_ports sys_rst_n]
set_property IOSTANDARD LVCMOS18 [get_ports sys_rst_n]
set_property PACKAGE_PIN AA6 [get_ports sys_rst_n]
set_clock_groups -name async18 -asynchronous -group [get_clocks sys_clk] -group [get_clocks -of_objects [get_pins -hierarchical -filter {NAME =~ *gen_channel_container[0].*gen_gthe4_channel_inst[3].GTHE4_CHANNEL_PRIM_INST/TXOUTCLK}]]
set_clock_groups -name async24 -asynchronous -group [get_clocks -of_objects [get_pins pcie4c_uscale_plus_0_i/inst/pcie4c_uscale_plus_0_gt_top_i/diablo_gt.diablo_gt_phy_wrapper/phy_clk_i/bufg_gt_intclk/O]] -group [get_clocks sys_clk]
create_waiver -type DRC -id {REQP-1839} -user "pcie4c_uscale_plus" -desc "DRC expects synchronous pins to be provided to BRAM inputs. Since synchronization is present one stage before, it is safe to ignore" -tags "1166844" -scope -internal -objects [get_cells -hierarchical -filter {NAME =~ {pcie_app_uscale_i/PIO_i/pio_ep/ep_mem/ep_xpm_sdpram/*mem_reg_bram_0}}] -timestamp "Tue Feb 11 04:45:17 GMT 2025"
create_waiver -type DRC -id {REQP-1840} -user "pcie4c_uscale_plus" -desc "DRC expects synchronous pins to be provided to BRAM inputs. Since synchronization is present one stage before, it is safe to ignore" -tags "1166844" -scope -internal -objects [get_cells -hierarchical -filter {NAME =~ {pcie_app_uscale_i/PIO_i/pio_ep/ep_mem/ep_xpm_sdpram/*mem_reg_bram_0}}] -timestamp "Tue Feb 11 04:45:17 GMT 2025"
#set_property LOC GTHE4_CHANNEL_X0Y3 [get_cells {pcie4c_uscale_plus_0_i/inst/pcie4c_uscale_plus_0_gt_top_i/diablo_gt.diablo_gt_phy_wrapper/gt_wizard.gtwizard_top_i/pcie4c_uscale_plus_0_gt_i/inst/gen_gtwizard_gthe4_top.pcie4c_uscale_plus_0_gt_gtwizard_gthe4_inst/gen_gtwizard_gthe4.gen_channel_container[0].gen_enabled_channel.gthe4_channel_wrapper_inst/channel_inst/gthe4_channel_gen.gen_gthe4_channel_inst[0].GTHE4_CHANNEL_PRIM_INST}]
set_property PACKAGE_PIN AA11 [get_ports led_red]
set_property PACKAGE_PIN AB10 [get_ports led_green]
set_property PACKAGE_PIN AA10 [get_ports led_blue]
set_property IOSTANDARD LVCMOS18 [get_ports led_blue]
set_property IOSTANDARD LVCMOS18 [get_ports led_green]
set_property IOSTANDARD LVCMOS18 [get_ports led_red]
set_property BITSTREAM.CONFIG.CONFIGRATE 10.6 [current_design]
set_property BITSTREAM.GENERAL.COMPRESS TRUE [current_design]
set_property BITSTREAM.CONFIG.SPI_BUSWIDTH 4 [current_design]
Step 13:
Click on “Run Synthesis” and, once it is complete, open the synthesized design. From the “Window” menu, select “I/O Ports” and check that the PCIe pins are constrained correctly for the board. Save the constraints and click “Run Implementation”.
Step 14:
Right-click on “Generate Bitstream” under the PROGRAM AND DEBUG section of Flow Navigator and select “Bitstream Settings”.
In “Bitstream Settings”, select “-bin_file” and click “OK”.
Step 15:
Now click on “Generate Bitstream” under the “PROGRAM AND DEBUG” section to synthesize, implement and generate the bitstream.
Step 16:
After the bitstream has been generated successfully, open Hardware Manager.
Step 17:
Make sure the Aller board is powered and connected to the host through either USB-JTAG or a JTAG cable. Click on ‘Open target’ and ‘Auto Connect’. Vivado Hardware Manager will connect to Aller.
Step 18:
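Right-click on the device and select ‘Add Configuration Memory Device…’. It will open a new window. Select Manufacturer as ‘Micron’, Density (Mb) as ‘512’ and Type as ‘spi’, adjusting these filters to match the flash fitted on your board. Then select the matching device, for example ‘mt25ql01g-spi-x1_x2_x4’, and click OK. Note that the density filter must match the part: a ‘01g’ device is a 1 Gb part and will not be listed under a 512 Mb filter.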
Step 19:
A dialog window will appear asking if you want to program the configuration memory device now. Click Yes and program the .bin bitstream file, which is located in the “<project location>/ProjectName.runs/impl_1” directory.
Step 20:
Once the device is programmed, test it on a Windows or Linux machine.
Communicating with Aller via PCI Express on Linux Machines:
Step 1:
Download the complete pcimem application code zip file and unzip it to a location of your choice. Open a terminal and check the PCIe base address with the command
lspci -vv
The output of the command lists the PCI devices detected by the host. Make sure the Aller board is inserted correctly into the PCIe slot of the host system’s motherboard. Aller should show up as a “Memory controller: Xilinx Corporation Device” entry; the device ID is expected to be 7024, but confirm it against your own output. If the host is unable to detect Aller, make sure the board is seated correctly in the PCIe slot and do a soft reset after the host is powered up. A soft reset after power-up helps the host detect FPGA-based PCIe devices.
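As an alternative to reading the base address out of the lspci listing, the same information is exposed by the Linux kernel under /sys/bus/pci/devices. The sketch below scans that directory for devices with the Xilinx vendor ID (0x10ee) and prints the start address and size of each populated region from the device's `resource` file. The function names are illustrative; the script only reads sysfs and does not touch the device itself.

```python
import glob
import os

XILINX_VENDOR_ID = 0x10EE


def parse_resource_table(text):
    """Parse a sysfs PCI `resource` file into (index, start, size) tuples.

    Each line holds three hex fields: start address, end address, flags.
    Lines 0 to 5 correspond to BAR0 to BAR5; unused regions are all zeros.
    """
    regions = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end, _flags = (int(field, 16) for field in fields)
        if start == 0 and end == 0:
            continue  # region not populated
        regions.append((index, start, end - start + 1))
    return regions


def find_devices(vendor_id=XILINX_VENDOR_ID, sysfs_root="/sys/bus/pci/devices"):
    """Return {bus address: regions} for every device with the given vendor ID."""
    found = {}
    for dev_dir in sorted(glob.glob(os.path.join(sysfs_root, "*"))):
        try:
            with open(os.path.join(dev_dir, "vendor")) as f:
                if int(f.read().strip(), 16) != vendor_id:
                    continue
            with open(os.path.join(dev_dir, "resource")) as f:
                found[os.path.basename(dev_dir)] = parse_resource_table(f.read())
        except (OSError, ValueError):
            continue  # skip entries that cannot be read or parsed
    return found


for bdf, regions in find_devices().items():
    for index, start, size in regions:
        print(f"{bdf} region {index}: start=0x{start:x} size={size} bytes")
```

If nothing is printed, no device with the Xilinx vendor ID was enumerated, which points back to the seating and soft-reset checks described above.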
Step 2:
In the terminal, change to the directory where you saved the ‘pcimem’ code. First, compile the C program with the command “make”. Once it has compiled successfully, use the following command:
sudo ./pcimem /dev/mem f7000000 w 0xffffff14
Here,
f7000000: the base address plus offset, that is, the address to which the write is performed.
w: the access type, which selects a word, byte or half-word access (w selects a word).
0xffffff14: the 32-bit data value to write.
The output indicates that the 32-bit data has been written to the specified address and then read back from it. If the value read back matches the value written, the data was successfully written to Aller.
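To illustrate the general technique behind this kind of tool, the sketch below maps the page containing a physical address and performs a 32-bit write followed by a read-back. It is a simplified illustration, not the pcimem source: the function name is hypothetical, and Python does not guarantee that the access reaches the device as a single 32-bit bus transaction the way a volatile pointer access in C does, so treat it as a way to understand the mechanism rather than as a replacement for pcimem.

```python
import mmap
import os
import struct


def write_read_u32(path, address, value):
    """Write a 32-bit little-endian value at `address`, then read it back.

    `path` is normally /dev/mem (requires root); `address` is the BAR base
    address plus offset and must be 4-byte aligned.
    """
    if address % 4 != 0:
        raise ValueError("address must be 4-byte aligned")
    page = mmap.PAGESIZE
    base = address & ~(page - 1)  # mmap offsets must be page-aligned
    offset = address - base       # position of the word inside the page
    fd = os.open(path, os.O_RDWR | os.O_SYNC)
    try:
        with mmap.mmap(fd, page, mmap.MAP_SHARED,
                       mmap.PROT_READ | mmap.PROT_WRITE, offset=base) as mem:
            struct.pack_into("<I", mem, offset, value)
            return struct.unpack_from("<I", mem, offset)[0]
    finally:
        os.close(fd)


# Example matching the command above (run as root on the host with Aller fitted):
#   readback = write_read_u32("/dev/mem", 0xF7000000, 0xFFFFFF14)
#   print(hex(readback))
```

The address 0xF7000000 is the one used in the command above; substitute the base address reported for your own board.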
Communicating with Aller via PCI Express on Windows Machines:
Step 1:
For Windows machines, use the RW-Everything software to write data.
Step 2:
Insert Aller in the PCI Express slot of the host system’s motherboard. Power up the host and then do a soft restart again. Boot into Windows. Open RW-Everything and click on “PCI Devices”. It will open the PCI Devices window.
Step 3:
Select Xilinx PCIe Device from the dropdown list.
Step 4:
Locate the BAR Address from the addresses on the left side.
Step 5:
Double-click to open the BAR address, select any memory location within it and write some data to it. The data persists on Aller for as long as it is powered up. You can verify this by closing and re-opening the application, which should show the previously written data, indicating that the write operation was successful.
So, this was a basic introduction to getting started with PCI Express using the Aller Artix UltraScale+ FPGA Board with M.2 Interface. PCI Express offers a lot more capability, such as DMA transfers and bus mastering. High-performance PCI Express projects will almost certainly need custom drivers for either Windows or Linux, depending on the operating system in use. This article is just the start of a big journey into PCI Express. We encourage you to keep moving forward!