For those of you who want to experiment with processorless Ethernet on FPGAs, I’ve just released a 4-port example design that supports these Xilinx FPGA development boards:
- Artix-7 AC701 Evaluation board
- Kintex-7 KC705 Evaluation board
- Kintex UltraScale KCU105 Evaluation board
- Virtex-7 VC707 Evaluation board
- Virtex-7 VC709 Evaluation board
- Virtex UltraScale VCU108 Evaluation board
- Virtex UltraScale+ VCU118 Evaluation board
Here’s the Git repo for the project: Processorless Ethernet on FPGA
Why processorless?
Pure hardware designs can trump software where the need for low latency and/or high throughput is greater than the need for flexibility and complexity (e.g. support for complex protocols). There are lots of applications that rely on hardware-based packet processing to achieve their superior performance. High-frequency trading platforms are often fed market pricing over multicast UDP, so their profitability is directly linked to their ability to process UDP with the lowest possible latency. Network security devices that monitor traffic usually need to be as transparent as possible while also being able to detect threats and take action with the lowest possible delay. Whatever your reason for processing Ethernet frames in the FPGA fabric, make sure that you consider both sides of the coin:
- Pros: Speed and ultra-low latency. This is the natural advantage of running your algorithms closer to the machine: FPGAs allow you to perform packet processing on dedicated hardware, without the overheads that slow down software-based designs.
- Cons: Difficulty of design and lack of flexibility. The possibilities for processorless Ethernet are limited by the difficulty of designing state machines and logic to handle complex IP protocols. Furthermore, handling updates and maintenance is much more difficult and costly with pure hardware designs.
For most applications, a processor brings far more value to the design than it costs in resources and complexity. These example designs can get you started with packet processing on FPGA, but there’s obviously nothing stopping you from running a processor alongside them.
4-port design
The block diagram below illustrates the new design, which has four ports versus the single port of the original design. To create this new design, we have created an IP block called “Ethernet driver”, shown as the yellow blocks in the diagram. This IP block contains the key elements of the original example design, without the TEMAC, clocking and reset logic. We split things up this way so that we can hook everything together in a block design in Vivado IP integrator, and easily extend the design to 4 ports.
The resulting block diagram (in Vivado) only uses three block types:
- Clocking wizard
- Tri-mode Ethernet MAC
- Ethernet driver (module)
The block design has 4x TEMACs and 4x Ethernet drivers, so that each port has its own TEMAC and Ethernet driver. Each port operates independently of the others.
Basic operation
As before, the design has two main modes of operation: loopback mode and packet generating/checking mode. In loopback mode, the packets received on a port are sent back out the same port, after having their destination and source addresses swapped. In packet generating/checking mode, the port sends a stream of packets and checks the received packets to make sure that they match the format of the outgoing frames.
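To make the two modes concrete, here is a minimal behavioral sketch in Python. It is a software model only, not the design's Verilog. It assumes the standard Ethernet header layout of a 6-byte destination address, a 6-byte source address and a 2-byte length field. The function names, the example MAC addresses and the counting payload are made up for illustration; the real frame format is whatever the pattern generator in the repo produces.

```python
import struct

MAC_LEN = 6                   # bytes in an Ethernet MAC address
HEADER_LEN = 2 * MAC_LEN + 2  # destination + source + 2-byte length field


def swap_addresses(frame: bytes) -> bytes:
    """Loopback mode: return the frame with destination and source swapped."""
    if len(frame) < 2 * MAC_LEN:
        raise ValueError("frame too short to hold two MAC addresses")
    dst = frame[:MAC_LEN]
    src = frame[MAC_LEN:2 * MAC_LEN]
    # Everything after the two addresses is passed through untouched.
    return src + dst + frame[2 * MAC_LEN:]


def generate_frame(dst: bytes, src: bytes, payload_len: int) -> bytes:
    """Generator mode: build a frame with a counting payload.

    The counting payload is a hypothetical stand-in for the real pattern.
    """
    payload = bytes(i & 0xFF for i in range(payload_len))
    return dst + src + struct.pack("!H", payload_len) + payload


def check_frame(frame: bytes) -> bool:
    """Checker mode: verify that a received frame follows the generator's format."""
    if len(frame) < HEADER_LEN:
        return False
    (length,) = struct.unpack("!H", frame[2 * MAC_LEN:HEADER_LEN])
    payload = frame[HEADER_LEN:]
    expected = bytes(i & 0xFF for i in range(length))
    return len(payload) == length and payload == expected


if __name__ == "__main__":
    # Example addresses are arbitrary values chosen for this sketch.
    dst = bytes.fromhex("da0102030405")
    src = bytes.fromhex("5a0102030405")
    sent = generate_frame(dst, src, 64)
    looped = swap_addresses(sent)  # what a port in loopback mode would return
    print("addresses swapped:", looped[:MAC_LEN] == src)
    print("payload still valid:", check_frame(looped))
```

Because the checker in this model only looks at the length field and payload, a frame that has been through a loopback port still passes the check, which is how one port in generating/checking mode can be tested against another port in loopback mode.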
A more detailed description of operation can be found in the TEMAC Product Guide. For more instructions on using the example designs, refer to the GitHub page and my original post.
Play with packets
The best place to start playing around with the packet processing in this design is to check out the packet generator and checker, found in files tri_mode_ethernet_mac_0_axi_pat_gen.v and tri_mode_ethernet_mac_0_axi_pat_check.v respectively. They’re both written in Verilog, which, like VHDL, is a great language for designing Ethernet packet processing in FPGAs.
Another option is to use a high level synthesis tool like Vivado HLS, and replace the pattern generator/checker with your HLS core. I have mixed opinions of high level synthesis and I would only recommend using it if your application requires a complex algorithm that would be very difficult to code in Verilog or VHDL. Even then, sometimes you’re still better off with Verilog or VHDL because timing or placement can become more of an issue in big complex designs.
For examples of packet processing in Verilog, check out these excellent resources:
Add a TCP/IP core
One interesting thing to do with this design would be to add an IP core to implement the full TCP/IP stack. There are lots of TCP/IP cores on the market, but don’t expect them to be cheap. Here is a list of some of them:
- Missing Link Electronics TCP/UDP/IP Network Protocol Accelerator
- Enyx TCP/IP and MAC Ethernet IP cores
- Design Gateway TCP Offloading Engine
- Easics TCP Offload Engine
- Skytechnology S.r.l TCP Offload Engine TOEFX101
- Cast UDP/IP Hardware Protocol Stack
Another alternative (if you have lots of time) is to experiment with an open source TCP/IP core:
The end
This was a 3-part tutorial (you’ve just read the last part):
- Driving Ethernet Ports without a processor
- Processorless Ethernet: Part 2
- Processorless Ethernet: Part 3 (this post)
If you find this design useful, or you do anything interesting with it, I’d be keen to know about it. Here’s the link to the Git repo again:
Processorless Ethernet on FPGA