Hyena arrives, the AI that makes a fool of ChatGPT

Behind this program are some of the most illustrious names in artificial intelligence, such as Yoshua Bengio, winner of the 2019 Turing Award, and Christopher Ré, who in recent years has helped promote the notion of AI as “software 2.0”. Hyena achieves the same as ChatGPT with less training and far less processing power.

Although OpenAI's ChatGPT deserves all the admiration it has received of late, the reality is that it is not especially complex software. It is, however, software that requires an enormous amount of training to function, and remarkable computing power to respond to increasingly complex challenges. And that is when it can start to fail.


It all started in 2017, when Ashish Vaswani, then one of Google's research leads, presented the Transformer AI program, the foundation of today's AI programs. The problem is that Transformer had a big flaw. To carry out its tasks it uses what is known as “attention”: the program receives information as a group of symbols, such as words, and maps that information to a new group of symbols, such as the answer we see in ChatGPT.

That attention operation, the essential tool of all current programs, including ChatGPT and GPT-4, has “quadratic” computational complexity. In practice, that means the time it takes ChatGPT to generate a response grows as the square of the amount of information it receives: double the input, and the work roughly quadruples.

In other words, if there is too much data (too many words, too many lines of chat, or too many pixels in an image), the program needs ever more computing power to respond, and that need grows quadratically until it reaches a point where the program can no longer respond adequately.
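The quadratic cost described above is easy to see in code. The toy NumPy sketch below is an illustrative textbook-style attention computation, not ChatGPT's actual implementation: for a sequence of n tokens, it builds an n-by-n score matrix, so both time and memory grow with n squared.

```python
import numpy as np

def naive_attention(q, k, v):
    """Toy scaled dot-product attention.

    q, k, v: (n, d) arrays for a sequence of n tokens.
    The scores matrix is (n, n): every token is compared against
    every other token, which is where the quadratic cost comes from.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                # shape (n, n)
    # Softmax over each row (numerically stabilized).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                           # shape (n, d)

n, d = 2048, 64
rng = np.random.default_rng(0)
q = rng.standard_normal((n, d))
out = naive_attention(q, q, q)
print(out.shape)  # (2048, 64) -- but the hidden scores matrix held 2048*2048 entries
```

Doubling n to 4,096 would quadruple the size of that scores matrix, which is exactly the scaling problem Hyena sets out to avoid.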

In a recent paper, a team of scientists from Stanford University and Canada's MILA institute proposed a technology that could be much more efficient than GPT-4, and named it Hyena. The authors include Stanford's Michael Poli and Yoshua Bengio, MILA's scientific director and winner of the 2019 Turing Award (the computer science equivalent of the Nobel Prize). Bengio is credited with developing the attention mechanism long before Google's Transformer program existed. They are joined by Christopher Ré, who has helped in recent years to promote the notion of AI as “software 2.0”. In short, an interesting selection of specialized minds.

Performance multiplied by 100

To demonstrate Hyena's ability, the authors subjected it to different tests. One of these is known as The Pile, an 825-gigabyte collection of texts (equivalent to more than 250,000 books) assembled in 2020 by Eleuther.ai, a nonprofit AI research team. The texts are obtained from “high quality” sources such as PubMed, arXiv, GitHub, the US Patent Office and others, so the information is more rigorous than the discussions that can be seen on Twitter.

The Hyena program achieved a score equivalent to ChatGPT's, but with 20% fewer compute operations. On other tasks Hyena matched or nearly matched a version of GPT even though it was trained on less than half as much data.

But, and here comes the interesting part: when Poli's team increased the demands on Hyena (more data, longer exchanges over time), the program performed comparatively better. At 2,048 tokens, which can be thought of roughly as words, Hyena takes less time to complete a language task than ChatGPT; by the time the inputs reach 64,000 tokens, the authors note that “Hyena speedups reach 100x”, a hundredfold performance improvement.
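A back-of-envelope calculation shows why the gap widens at longer inputs. Assuming, purely for illustration, that attention cost grows quadratically with context length while a Hyena-style operator grows roughly linearly, going from 2,048 to 64,000 tokens looks like this:

```python
# Illustrative scaling ratios, not measurements from the Hyena paper.
short, long = 2048, 64_000

# Quadratic (attention-style) cost grows with the square of the length;
# near-linear (Hyena-style) cost grows with the length itself.
quadratic_ratio = (long / short) ** 2
linear_ratio = long / short

print(round(quadratic_ratio))  # 977: quadratic cost grows ~1000x
print(round(linear_ratio))     # 31: near-linear cost grows only ~31x
```

The roughly 30-fold gap between those two growth rates is the kind of headroom behind the reported 100x speedup at long contexts (Hyena's actual operator is subquadratic rather than strictly linear, so this is only a rough sketch).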

As if all this were not enough, the program is much smaller than GPT-4 or even GPT-3. While GPT-3, for example, has 175 billion parameters, the largest version of Hyena has only 1.3 billion. That is, it delivers a hundredfold performance improvement when it is most demanded… with more than a hundred times fewer parameters. A more than interesting advance, and one that could leave ChatGPT as a very nice memory… while it lasted.