Revolutionizing AI: Eliminating Matrix Multiplication in Language Models


Researchers from the University of California, Santa Cruz, UC Davis, LuxiTech, and Soochow University have proposed a method for running A.I. language models more efficiently by eliminating matrix multiplication from the process. The approach redesigns core neural network operations and could significantly reduce both power consumption and the need for GPUs.

The Role of Matrix Multiplication in AI

Matrix multiplication, often referred to as MatMul, is central to neural network computations. GPUs excel at performing these operations quickly due to their ability to handle large numbers of multiplication operations in parallel. This capability has given Nvidia a dominant position in the A.I. hardware market, with an estimated 98 percent share for data center GPUs. These GPUs power major A.I. systems such as ChatGPT and Google Gemini, highlighting their critical role in current A.I. implementations.
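To make the cost concrete, here is a minimal sketch in plain Python of the matrix-vector product behind a single dense layer. The function name dense_forward and the tiny example values are illustrative assumptions, not taken from the paper. The point is only that a layer with m outputs and n inputs needs m × n multiplications, which is the work GPUs parallelize.

```python
def dense_forward(weights, x):
    """Naive matrix-vector product: one multiply-accumulate per weight."""
    outputs = []
    multiplications = 0
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            acc += w * xi          # the multiplication a MatMul-free model avoids
            multiplications += 1
        outputs.append(acc)
    return outputs, multiplications


# Hypothetical 2x3 weight matrix and 3-element input, chosen for illustration.
weights = [[0.5, -1.0, 2.0],
           [1.5, 0.0, -0.5]]
x = [2.0, 4.0, 6.0]

y, n_mults = dense_forward(weights, x)
print(y, n_mults)
```

Real models apply the same pattern to matrices with thousands of rows and columns, so the multiplication count grows very quickly.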

Scalable MatMul-Free Language Modeling

In their paper titled 'Scalable MatMul-free Language Modeling,' the researchers describe creating a custom 2.7 billion parameter model that operates without using MatMul, yet delivers performance similar to conventional large language models (LLMs). They also demonstrated running a 1.3 billion parameter model at 23.8 tokens per second on a GPU accelerated by a custom-programmed FPGA chip, with the FPGA drawing approximately 13 watts of power. These results suggest that a more efficient and hardware-friendly architecture could be on the horizon.

Power Efficiency and Implications

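To put this into perspective, conventional LLMs typically run on data center GPUs that draw around 700 watts each, while a 2.7 billion parameter version of an LLM like Llama 2 can run on a home PC with an RTX 3060, which uses about 200 watts at peak. If an LLM could theoretically run entirely on an FPGA using only 13 watts, the result has been described as a 38-fold decrease in power consumption. That multiple implies a baseline of roughly 500 watts (38 × 13 ≈ 494), which matches neither figure above: against the 200-watt RTX 3060 the reduction is about 15-fold, and against a 700-watt data center GPU about 54-fold. Either way, efficiency gains of this size could significantly reduce operational costs and environmental impact.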
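The ratios in this comparison can be checked directly. The sketch below is plain Python arithmetic on the wattages quoted above. The constant and function names are illustrative choices, and the figures are the article's own, not new measurements.

```python
FPGA_WATTS = 13              # FPGA power draw quoted above
RTX_3060_WATTS = 200         # peak draw of the home-PC GPU quoted above
DATACENTER_GPU_WATTS = 700   # conventional LLM figure quoted above


def fold_reduction(baseline_watts, new_watts):
    """How many times smaller new_watts is than baseline_watts."""
    return baseline_watts / new_watts


def implied_baseline(fold, new_watts):
    """Baseline wattage that a given fold reduction would correspond to."""
    return fold * new_watts


vs_rtx = round(fold_reduction(RTX_3060_WATTS, FPGA_WATTS), 1)
vs_datacenter = round(fold_reduction(DATACENTER_GPU_WATTS, FPGA_WATTS), 1)
baseline_for_38x = implied_baseline(38, FPGA_WATTS)

print(vs_rtx)            # reduction relative to the 200 W GPU
print(vs_datacenter)     # reduction relative to the 700 W figure
print(baseline_for_38x)  # wattage a 38-fold reduction would imply
```

The comparison is between an FPGA and whichever baseline is chosen, so the headline multiple depends heavily on that choice.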

Challenging the Status Quo: Researchers' Insights

The paper's authors—Rui-Jie Zhu, Yu Zhang, Ethan Sifferman, Tyler Sheaves, Yiqiao Wang, Dustin Richmond, Peng Zhou, and Jason Eshraghian—argue that their work challenges the prevailing belief that matrix multiplication operations are essential for building high-performing language models. They claim that their approach could make LLMs more accessible, efficient, and sustainable, particularly for deployment on resource-constrained hardware such as smartphones.

Inspiration from BitNet

The researchers acknowledge the influence of BitNet, a 1-bit transformer technique that demonstrated the feasibility of using binary and ternary weights in language models, scaling up to 3 billion parameters while maintaining competitive performance. However, BitNet still relied on MatMul in its self-attention mechanism. This limitation motivated the current study, leading to the development of a completely MatMul-free architecture.
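Why ternary weights matter can be shown with a small sketch. When every weight is restricted to -1, 0, or +1, each product in a matrix-vector multiply reduces to adding the input, subtracting it, or skipping it. The Python below illustrates that general idea only: the function names and example values are hypothetical, and it is not the authors' implementation, which also replaces self-attention with other operations not shown here.

```python
def ternary_forward(ternary_weights, x):
    """Matrix-vector product for weights in {-1, 0, +1} using only add/subtract."""
    outputs = []
    for row in ternary_weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # +1 weight: add the input
            elif w == -1:
                acc -= xi      # -1 weight: subtract the input
            # 0 weight: the input contributes nothing, so skip it
        outputs.append(acc)
    return outputs


def reference_forward(weights, x):
    """Ordinary multiply-based product, used only to check the result."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# Hypothetical 3x3 ternary weight matrix and input vector.
ternary_weights = [[1, -1, 0],
                   [0, 1, 1],
                   [-1, 0, -1]]
x = [2.0, 4.0, 6.0]

y = ternary_forward(ternary_weights, x)
print(y)
print(y == reference_forward(ternary_weights, x))
```

Additions and subtractions are much cheaper in hardware than multiplications, which is why a design like this maps well onto low-power chips such as FPGAs.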

The Future of A.I. Without MatMul

Eliminating matrix multiplication from A.I. models represents a substantial shift in A.I. research and development. By reducing power consumption and reliance on GPUs, this approach opens the door to more sustainable and cost-effective A.I. implementations. It has the potential to democratize access to advanced A.I. technologies, enabling deployment on a broader range of devices, including those with limited resources.


The development of scalable MatMul-free language models marks a significant milestone in A.I. research. By challenging the traditional reliance on matrix multiplication, researchers are paving the way for more efficient, accessible, and sustainable A.I. systems. As this technique undergoes further validation and peer review, it could herald a new era in A.I. deployments, transforming how we design and utilize these powerful technologies.
