This New AI Chipmaker, Taalas, Hard-Wires AI Models Into Silicon to Make Them Faster and Cheaper; Early Results Crush Modern Solutions


Feb 20, 2026 at 01:21pm EST

The image shows a Taalas HC1 Technology Demonstrator featuring the Llama 3.1 8B model, TSMC 6nm technology, 815mm² area, 53

Chip startup Taalas claims to have tackled LLM response latency and throughput by building dedicated hardware that 'hardwires' AI models into silicon.

Taalas Claims 10x Higher TPS With Meta's Llama 3.1 8B LLM, at 20x Lower Production Cost

In today's AI compute market, latency is emerging as a major constraint for compute providers, mainly because in agentic workloads the competitive edge lies in tokens-per-second (TPS) figures and how quickly a task gets done. One approach the industry is pursuing is integrating large amounts of on-chip SRAM, which companies like Cerebras and Groq already do. Taalas, however, has taken a different route: pivoting away from general-purpose computing toward ASICs built for specific LLMs.

Founded 2.5 years ago, Taalas developed a platform for transforming any AI model into custom silicon. From the moment a previously unseen model is received, it can be realized in hardware in only two months. The resulting Hardcore Models are an order of magnitude faster, cheaper, and lower power than software-based implementations.

- Taalas

According to the company, its approach rests on two principles. The first is specializing AI workloads at the hardware level, which literally means mapping a specific LLM's neural network onto the silicon itself so that the hardware is optimized for each model. The second is what the company calls "merging storage and computation", which aims to overcome the memory wall and the data-movement overhead of a general-purpose system.
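To see why the memory wall matters for latency, here is a back-of-the-envelope sketch in Python. It is not Taalas's methodology; it uses the common rule of thumb that a single decode stream on a general-purpose accelerator cannot generate tokens faster than the memory system can stream the full set of weights. The 16-bit weight size and the 3 TB/s bandwidth figure are illustrative assumptions, not numbers from the company.

```python
def max_decode_tps(weight_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on single-stream decode tokens/s when every generated
    token requires reading all model weights from memory once."""
    if weight_bytes <= 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("weight_bytes and bandwidth must be positive")
    return bandwidth_bytes_per_s / weight_bytes


PARAMS = 8e9          # Llama 3.1 8B parameter count (from the article)
BYTES_PER_PARAM = 2   # assumption: 16-bit weights
BANDWIDTH = 3.0e12    # assumption: illustrative 3 TB/s of memory bandwidth

weight_bytes = PARAMS * BYTES_PER_PARAM
bound = max_decode_tps(weight_bytes, BANDWIDTH)
print(f"{bound:.1f} tokens/s upper bound per stream")
```

Under these assumptions the bound works out to 187.5 tokens per second per stream, however much compute is available. Batching raises aggregate throughput but not per-user speed, which is why keeping weights next to the compute is attractive for latency.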

With its solution, all computation happens at "DRAM-level" density to speed up on-chip data movement, which is one of the reasons Taalas claims to have addressed the latency problem with LLMs. The design doesn't rely on advanced cooling, HBM, advanced packaging, or complex integration; instead, all the innovation happens within the silicon itself. Taalas has also showcased its first product, called HC1, which integrates Meta's Llama 3.1 8B LLM. The performance results are 'shocking', to say the least.

Taalas claims the HC1 delivers 10x the TPS of today's "high-end" infrastructure at 20x lower production cost. That might suggest latency and performance constraints are solved, but consider the HC1 chip from a technical angle. It is built on TSMC's 6nm node with a die area of 815 mm², roughly the size of NVIDIA's H100 die. The HC1 hosts an eight-billion-parameter model, while today's frontier LLMs scale up to one trillion parameters. As you may have guessed by now, Taalas would need to adapt its silicon strategy to serve models of that size.
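The size of that gap can be illustrated with a naive sketch. It assumes, purely for illustration, that every chip holds the same eight billion parameters as the HC1 and that capacity scales linearly; it ignores quantization, model architecture, and interconnect, so it says nothing about how Taalas actually partitions larger models. The function name `chips_needed` is hypothetical.

```python
import math


def chips_needed(model_params: float, params_per_chip: float) -> int:
    """Naive chip count: model parameters divided by per-chip capacity,
    rounded up. Assumes linear scaling, which is a simplification."""
    if model_params <= 0 or params_per_chip <= 0:
        raise ValueError("parameter counts must be positive")
    return math.ceil(model_params / params_per_chip)


HC1_PARAMS = 8e9        # parameters hosted on one HC1 (from the article)
FRONTIER_PARAMS = 1e12  # one-trillion-parameter frontier model (from the article)

print(chips_needed(FRONTIER_PARAMS, HC1_PARAMS))
```

Under this assumption a one-trillion-parameter model would span 125 HC1-class dies, which shows why a multi-chip strategy becomes unavoidable at frontier scale.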

The way to scale up is a cluster-based approach, and according to Taalas, it has already done this with DeepSeek's R1, achieving 12,000 TPS per user in a 30-chip configuration. The primary constraints therefore now lie in market adoption and the business model. With this hardwired approach, each chip is tied to a specific LLM, with no option to change the model weights, but given the startup's speed figures, it isn't a bad bet.

About the author: Muhammad Zuhair is a hardware and technology reporter for Wccftech, specializing in the semiconductor industry and the complex interplay between technology, manufacturing, and geopolitics. His coverage focuses on the corporate strategies and technological roadmaps of industry giants like TSMC, NVIDIA, Samsung, and Intel. Zuhair's expertise lies in deconstructing complex topics such as fabrication nodes (e.g., 2nm process), the economic impact of policies like the CHIPS Act, and the strategic development of AI infrastructure from NVIDIA, AMD and Intel.
