Technology

FireFly Technology

Our unique DSP technology began life 9 years ago in the back corner of the Advanced Products Division of an embedded compiler and tool chain company. As such, it is based on over 25 years of experience developing compilers for virtually every embedded and signal processing platform. The inventors of FireFly felt the team had seen the bad and good points of virtually every architecture ever developed. Leveraging that experience, it was possible to say, "we can do better."

With the help of advanced compiler retargeting technology and more than 30,000 individual C and C++ compiler test cases, we set out to do just that. By describing the first architecture in the machine-independent input of the compiler generator, we immediately created a compiler, assembler, linker and simulator. We worked on the definition until it passed all 30K test cases. At that point, it was possible to measure code size and approximate cycle counts. With access to a tool chain that supported all major processor architectures, it was easy to compare these initial results. By carefully measuring every iteration of the architecture, we made refinement after refinement, noting which directions improved code compression and cycle counts and which changes actually hurt one or the other or both. After 9 years of scientific tuning, an architecture was born (or "evolved") that could beat the then best-of-breed CPUs, achieving more than a 30% reduction in code size for the 30K test cases and between 5% and 8% fewer dynamically executed instructions. This was an astounding achievement for Craig Franklin (now FireFly's CTO) and his team.

After four years of software tuning, we hired a micro-architecture and implementation expert with more than 30 years of experience defining and developing processor micro-architectures. The first custom hardware designs developed were a 5-stage and a very short 3-stage RISC processor. Each of these micro-architectures used a 32-bit MCU target for the ISA, and the FireFly was born. These hardware pipelines have now been implemented on 3 different FPGA platforms across both Xilinx and Altera FPGAs and tool chains. These development platforms have provided a very stable base for developing compilers and debuggers that interface to real-world cores inside real-world hardware.

FireFly technology is designed to naturally support both 32-bit and 64-bit targets. Calling upon his extensive DSP background, which includes superscalar and VLIW implementations at IBM and VLIW DSP implementations at Equator Technologies, BOPS (Billions of Operations Per Second) and On Demand Micro-electronics, our CEO (Dr. David Baker) defined additional DSP elements for FireFly. The technology was already a strong digital signal processor target since it had been designed from day one to be inherently SIMD, i.e. every data ALU instruction is defined to be a SIMD instruction. The architecture naturally supports paired register loads and stores, and on our DSP this translates to 128-bit loads and stores. These double-wide data path instructions help balance the computation and data movement portions of DSP algorithms. In addition, our DSP now supports a strong set of REAL and COMPLEX MAC acceleration instructions, and it enjoys strong permutation support.
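As a rough illustration of the two ideas above, the following portable C sketch models a complex multiply-accumulate and a SIMD-style add on two packed 16-bit lanes. It is not FireFly code: the type and function names (cplx16, cplx32, cmac, cdot, add2x16) and the 16-bit lane width are assumptions chosen for the example, and the actual instruction set and intrinsics are not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only; not FireFly's ISA or intrinsics. */
typedef struct { int16_t re, im; } cplx16;   /* 16-bit complex sample      */
typedef struct { int32_t re, im; } cplx32;   /* 32-bit complex accumulator */

/* Complex multiply-accumulate: returns acc + a * b.
   A complex MAC instruction would perform these four multiplies
   and the accumulation as a single operation. */
cplx32 cmac(cplx32 acc, cplx16 a, cplx16 b) {
    acc.re += (int32_t)a.re * b.re - (int32_t)a.im * b.im;
    acc.im += (int32_t)a.re * b.im + (int32_t)a.im * b.re;
    return acc;
}

/* Complex dot product built from repeated complex MACs. */
cplx32 cdot(const cplx16 *a, const cplx16 *b, size_t n) {
    assert(a != NULL && b != NULL);
    cplx32 acc = {0, 0};
    for (size_t i = 0; i < n; i++) {
        acc = cmac(acc, a[i], b[i]);
    }
    return acc;
}

/* SIMD-style add of two 16-bit lanes packed into one 32-bit word.
   Each lane wraps on its own; no carry crosses from the low lane
   into the high lane. */
uint32_t add2x16(uint32_t x, uint32_t y) {
    uint32_t lo = (x + y) & 0x0000FFFFu;           /* low lane, carry dropped */
    uint32_t hi = ((x >> 16) + (y >> 16)) << 16;   /* high lane only          */
    return hi | lo;
}
```

For example, cdot over a = {1+2i, 3+4i} and b = {5+6i, 7+8i} yields -18+68i, and add2x16(0x0001FFFF, 0x00010001) yields 0x00020000 because the overflow of the low lane does not carry into the high lane. On hardware with SIMD data ALU instructions, each lane-wise operation of this kind would be a single instruction rather than several scalar ones.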





One might ask why we target the architecture to such a short pipeline when the competition is so tightly focused on long pipelines with very high FMax. The reason is that our customers often target embedded systems in modern, deep sub-micron processes. Very long pipelines were absolutely essential to achieving the needed signal processing performance when silicon processes had severe limits on gate delays and the silicon budget severely restricted the number of transistors that could be applied to any given problem. One had to do whatever was necessary to make a small number of transistors run fast enough to perform the needed computation. Energy consumption and power dissipation were never a significant part of the decision in "the good old days." But today, energy consumption and power dissipation are at least as important as the cost of the packaged die itself. In servers, this means high HVAC and power costs. In consumer devices, this means failing to penetrate a market if your talk-time or compute-time is not competitive.





Long pipelines mean lots of routing overhead, lots of pipeline registers, large clock trees and many micro-architectural "band-aids" such as highly sophisticated branch prediction. Very short pipelines are much smaller and consume far less of each of these resources. It is quite practical to put many short-pipe DSP cores in the place of one large ultra-deep multi-GHz DSP – saving design time, reducing risk and lowering power consumption.

Our compiler tool chain and micro-architectural implementations are designed to make it easy to introduce user-defined instructions. One can add user-defined instructions to the compiler, assembler, linker and simulator without recompiling or reassembling the tool chain. These extensions are described to the tool chain in a very simple ASCII file. In a similar way, one can add the user-defined instructions to the hardware in a simple-to-use, modular fashion. The C simulator extensions and the hardware extensions are easily validated against each other in Verilog simulations or on real hardware (FPGA or SoC) using the powerful tool chain and debugger technology available for FireFly processors.

FireFly has developed and filed patents on a variety of new technologies that we use to achieve superior performance, low power and superb code density for our customers. These include not only the DSP/MCU IP products above but also a number of fixed-function RTL blocks aimed at embedded vision. Such information is only available under NDA at this time.

Please contact info@fireflydsp.com for information.