Tenstorrent made a bold claim during its TT-Deploy livestream, saying its Galaxy servers are going to crush everyone at everything, including AI.
Tenstorrent Galaxy Supercluster Offers 10x Faster GenAI Video, And Destroys Current-Gen GPUs With "Blitz Mode", Offering 350+ Tokens/s In DeepSeek R1
Jim Keller and his Tenstorrent are on a mission to challenge the existing AI hierarchy with their RISC-V-powered platforms.
To that end, the company unveiled its latest Galaxy Blackhole servers for AI at scale. With Galaxy Blackhole, Tenstorrent offers a fully networked, AI-native solution that unifies compute, memory, and networking in a single system optimized for the latest AI workloads.
The chip inside Galaxy servers is called Blackhole and is based on the RISC-V architecture, which competes against ARM and x86. During the event, Jim Keller said that the A0 silicon is already shipping, but there are software bugs that they are addressing.
To showcase the performance of its Galaxy Blackhole supercluster, Tenstorrent ran various demos during the TT-Deploy livestream.
Let's start with the specifications set by Tenstorrent. The tensor core powering the Blackhole chips is called Tensix and features five RISC-V processors with matrix-multiply units, vector units, and local SRAM. Each RISC-V processor is fully programmable, and each core is attached to a high-bandwidth network-on-chip (NoC). Several of these Tensix cores are deployed together to make a chip.
Tenstorrent contrasts Galaxy with competing GPUs such as the GB300 from NVIDIA. The company claims that on competing platforms, achieving higher token throughput means drastically reducing the number of concurrent users. That's not the case with Tenstorrent's Galaxy servers, which the company says retain a lower token cost ($6 vs ~$30) and deliver a much lower TCO for firms using these servers.
We talked about this last week, too, and Tenstorrent has now officially showcased up to 10x faster video GenAI performance running on its Galaxy Supercluster. The system is able to generate an 81-frame 720p video in just 2.4 seconds. That's a roughly 5-second video being generated in 2.4 seconds, faster than real time.
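The "faster than real time" claim can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above (81 frames, a roughly 5-second clip, 2.4 seconds of generation time); the function names are illustrative, and the implied playback frame rate is derived from those figures, not stated by Tenstorrent.

```python
# Figures quoted in the article.
FRAMES = 81
CLIP_SECONDS = 5.0        # the article rounds the clip length to 5 seconds
GENERATION_SECONDS = 2.4


def implied_playback_fps(frames: int, clip_seconds: float) -> float:
    """Playback frame rate implied by the frame count and clip length."""
    return frames / clip_seconds


def realtime_factor(clip_seconds: float, generation_seconds: float) -> float:
    """Seconds of video produced per second of generation (>1 beats real time)."""
    return clip_seconds / generation_seconds


def generation_fps(frames: int, generation_seconds: float) -> float:
    """Frames produced per second of wall-clock generation time."""
    return frames / generation_seconds


if __name__ == "__main__":
    print(f"Implied playback rate: {implied_playback_fps(FRAMES, CLIP_SECONDS):.1f} fps")
    print(f"Real-time factor:      {realtime_factor(CLIP_SECONDS, GENERATION_SECONDS):.2f}x")
    print(f"Generation rate:       {generation_fps(FRAMES, GENERATION_SECONDS):.2f} fps")
```

On these numbers the demo produces video at a little over twice playback speed.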
In addition to the GenAI demo, Tenstorrent also showcased Blitz Mode for its Galaxy Blackhole server. Blitz Mode on Galaxy is optimized for premium, latency-sensitive AI workloads. With this mode, Galaxy servers can reach 350+ tokens/s per user on DeepSeek R1-0528 671B, which Tenstorrent says outpaces the GPU competition. The two benchmarks demoed are listed below:
- Decode: DeepSeek-R1-0528 671B at up to 350+ tokens/second/user, running on 16 Galaxy servers. Tenstorrent describes this as faster than the fastest inference systems from Groq and Cerebras in performance and capacity, supporting batch sizes from 8 to 64 and up to 128K context.
- Prefill: DeepSeek-R1-0528 671B with sub-4-second time-to-first-token on 100K context, running on the same general-purpose AI Tenstorrent Galaxy superclusters.
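To put the two Blitz Mode figures in more familiar terms, the rough sketch below converts them into per-token latency and a lower bound on prefill throughput. It uses only the quoted numbers; the 1,000-token response length is a hypothetical value chosen for illustration, and no aggregate throughput is computed because the article does not say which batch size the 350 tokens/s/user figure was measured at.

```python
# Figures quoted in the article.
DECODE_TOKENS_PER_SEC_PER_USER = 350.0
PREFILL_CONTEXT_TOKENS = 100_000
PREFILL_TTFT_SECONDS = 4.0   # quoted as "sub-4-second", so this is an upper bound


def ms_per_token(tokens_per_sec: float) -> float:
    """Average time between decoded tokens for one user, in milliseconds."""
    return 1000.0 / tokens_per_sec


def min_prefill_rate(context_tokens: int, ttft_seconds: float) -> float:
    """Lower bound on prompt tokens processed per second during prefill."""
    return context_tokens / ttft_seconds


def decode_seconds(response_tokens: int, tokens_per_sec: float) -> float:
    """Time to stream a response of the given length to one user."""
    return response_tokens / tokens_per_sec


if __name__ == "__main__":
    print(f"Decode latency: {ms_per_token(DECODE_TOKENS_PER_SEC_PER_USER):.2f} ms/token")
    rate = min_prefill_rate(PREFILL_CONTEXT_TOKENS, PREFILL_TTFT_SECONDS)
    print(f"Prefill rate:   at least {rate:,.0f} tokens/s")
    # Hypothetical 1,000-token answer, purely for illustration.
    print(f"1,000-token response: {decode_seconds(1000, DECODE_TOKENS_PER_SEC_PER_USER):.2f} s")
```

In other words, the decode figure works out to under 3 ms per token for each user, and the prefill figure implies at least 25,000 prompt tokens processed per second.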
In terms of pricing and availability, the Tenstorrent Galaxy Blackhole server will be available in an air-cooled rack configuration with next-generation Blackhole chips and a fully open-source software stack, starting at $110,000. The system offers 23 PFLOPS of FP8 (AI) compute through 32 Blackhole chips, 6.2 GB of on-chip SRAM at 2.9 PB/s, 1 TB of DRAM at 16 TB/s, and 56 x 800G Ethernet ports for up to 11.2 TB/s of scale-out bandwidth.
Customers can also purchase Galaxy Blackhole in supercluster configurations with 4-36 Galaxy servers. The base configuration with 4 Galaxy servers starts at $440,000.
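The headline specifications are easier to compare when broken down per chip and per dollar. The sketch below does that arithmetic from the quoted figures only; the constant and function names are illustrative, decimal units are assumed (1 TB = 1,000 GB), and the linear cluster price is an assumption that matches the quoted 4-server base price but is not confirmed for larger configurations. Note that 56 ports at 800 Gbit/s come to 5.6 TB/s in one direction, or 11.2 TB/s counting both directions, which suggests the scale-out figure is a terabyte-scale number.

```python
# Figures quoted in the article for one Galaxy Blackhole server.
SERVER_PRICE_USD = 110_000
CHIPS_PER_SERVER = 32
FP8_PFLOPS = 23.0
SRAM_GB = 6.2
DRAM_TB = 1.0
ETH_PORTS = 56
ETH_GBIT_PER_PORT = 800


def per_chip(total: float, chips: int = CHIPS_PER_SERVER) -> float:
    """Split a server-level figure evenly across its chips."""
    return total / chips


def ethernet_tbytes_per_sec(ports: int, gbit_per_port: int) -> float:
    """One-direction aggregate Ethernet bandwidth in TB/s (decimal units)."""
    return ports * gbit_per_port / 8 / 1000


def cluster_price_usd(servers: int) -> int:
    """Assumes price scales linearly with server count (matches the 4-server quote)."""
    return servers * SERVER_PRICE_USD


if __name__ == "__main__":
    print(f"FP8 per chip:  {per_chip(FP8_PFLOPS * 1000):.2f} TFLOPS")
    print(f"SRAM per chip: {per_chip(SRAM_GB * 1000):.2f} MB")
    print(f"DRAM per chip: {per_chip(DRAM_TB * 1000):.2f} GB")
    one_way = ethernet_tbytes_per_sec(ETH_PORTS, ETH_GBIT_PER_PORT)
    print(f"Ethernet:      {one_way:.1f} TB/s one way, {2 * one_way:.1f} TB/s both ways")
    print(f"Price per chip:       ${per_chip(SERVER_PRICE_USD):,.2f}")
    print(f"Price per FP8 PFLOPS: ${SERVER_PRICE_USD / FP8_PFLOPS:,.0f}")
    print(f"4-server cluster:     ${cluster_price_usd(4):,}")
```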
