CruxML – making real-time machine learning easy

By Philip Leong – CTO and Co-founder, CruxML

Almost a decade ago, in the first lecture of my third-year undergraduate computer architecture class, I posed a challenge. Even though we could put billions of transistors on a single chip, it seemed the best architecture was still an evolution of the von Neumann architecture, proposed in the 1940s. There was an opportunity to discover novel computer architectures for important computing problems, and I suggested to the students that theirs was the generation to do so.

Although it’s always hard to tell whether or not anyone was listening, those words turned out to be prophetic. Major advances in algorithms, particularly machine learning, have led to application-specific integrated circuit (ASIC) accelerators like the Google Tensor Processing Unit (TPU) for accelerating the workloads of deep neural networks (DNNs). A myriad of new architectures from the likes of Google and Intel have been proposed, and the 2017 Turing Award winners, David Patterson and John Hennessy, have stated that they believe we are in a new Golden Age for computer architecture [1].

Microsoft has taken a different approach, using field-programmable gate arrays (FPGAs) in its Azure cloud computing service [2]. While FPGAs do not have the same raw performance as ASICs, their greater flexibility allows them to be used in a wider range of applications. An analogy is that domain-specific ASICs for DNNs are like Formula 1 cars, graphics processing units (GPUs) are like Indy cars, and FPGAs are like four-wheel drives. Conventional processors are like walking.

FPGAs are suitable for accelerating a broad range of applications. In the world of DNNs, advances in network architecture and arithmetic have had a dramatic effect on optimisations for performance. Computer systems are not just about computing. Data acquisition, signal processing, compression, interfaces, memory buffering and transfers, price, design time and other factors need to be considered, and FPGAs often provide the best solution. This is particularly the case in specialised applications for which highly optimised ASICs are not available. Staying with the racing car analogy, while FPGAs may not be better on a racetrack, they are a much more suitable technology for the street.

The main problem with FPGAs is that they are really, really hard to program. ASICs and GPUs provide application programming interfaces (APIs) which appear as library calls to the programmer, achieving seamless integration with software. Unfortunately, the flexibility of FPGAs makes it difficult to standardise APIs, and most of the complexity of integrated circuit design is exposed to the designer. While high-level synthesis from C/C++ and optimised domain-specific libraries have gone a long way towards addressing this problem, achieving a working design (let alone maximum performance) on an FPGA is still considered an art form.

As for the challenge, I’m delighted that a couple of my ex-students have gone overseas to Intel and Xilinx to work on this problem. I do hope, though, that this new Golden Age will help keep more of our talented engineers in Australia.

This is why CruxML was formed – there are a large number of problems for which commercial, off-the-shelf FPGAs are the best solution. The founders have been studying how difficult computing problems can be solved with FPGAs since their introduction in the late 1980s, but this appears to be the time when the convergence of machine learning and computer architecture can significantly improve technology. CruxML will be using its knowledge and experience to help clients in the finance, cybersecurity, robotics, defence and space industries develop specialised machine learning systems with order-of-magnitude performance improvements. By dramatically improving speed, size, weight and power (SWaP), we aspire to be a key Australian company addressing the next generation of computing problems.

1. John L. Hennessy, David A. Patterson, “A New Golden Age for Computer Architecture”, Communications of the ACM, Vol. 62 No. 2, February 2019, pp. 48-60, doi:10.1145/3282307

2. https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-fpga-web-service
