At GDC 2026, Intel graphics engineer Marissa du Bois took the stage to present Intel's take on neural texture compression, which resembles NVIDIA's NTC in that both technologies are deterministic. The presentation followed up on the original R&D prototype shown at GDC 2025; the key news is that Intel has now productized that research into a standalone SDK.
Texture Set Neural Compression (TSNC) is essentially a smarter way to store game textures. Traditional GPU block compression formats (BC1 through BC7) use fixed mathematical rules to reduce texture size, and while they're fast and universally supported, they leave significant compression potential on the table. TSNC takes a fundamentally different approach: it trains a small neural network using stochastic gradient descent to learn to encode and decode the specific textures in a given set. The result is a compact latent-space representation that a tiny multi-layer perceptron can reconstruct at runtime into the original diffuse, normal, roughness, metallic, ambient occlusion, and emissive data.
The key insight is that a texture set (all the PBR maps for a single material) has a lot of redundant structure across its channels. TSNC exploits that shared structure in ways that generic block compression simply cannot.
Feature Pyramids: The Two Tiers
At the heart of TSNC's compression scheme is the feature pyramid, a set of four BC1-encoded latent-space textures arranged across different resolution configurations. Intel currently offers two variants with different quality/compression trade-offs:
- Variant A uses two full-resolution latent images and two half-resolution ones. For 4K input textures, this means two 4K and two 2K latent images, totaling around 26.8 MB versus the original 256 MB of uncompressed bitmaps. That works out to over 9x compression, nearly double the 4.8x you'd get from standard BC block compression alone. Perceptual quality loss, measured via NVIDIA's FLIP analysis tool, sits at roughly 5%, which in practice shows up as minor precision loss in normal maps but little else.
- Variant B is the aggressive option. It cascades the latent images down to 1/2, 1/4, and 1/8 of the original resolution, achieving over 17x compression, nearly twice as much as Variant A. The quality hit, however, is more noticeable: BC1 block artifacts start appearing in normal maps and AO/roughness channels, and FLIP clocks perceptual error at around 6–7%. That might sound small in absolute terms, but Intel acknowledges it's "enough to be noticeable to a viewer." Variant B is, therefore, probably best suited for distant or secondary materials where quality loss is less likely to be scrutinized.
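The size figures for the two variants can be roughly reproduced with some back-of-the-envelope arithmetic. The sketch below is not Intel's accounting; it assumes that each latent image is a square BC1 texture (8 bytes per 4x4 block) stored with a full mip chain, that Variant B's pyramid starts at full resolution, and that the 256 MB baseline is 256 MiB. The helper names are hypothetical.

```python
MIB = 1024 * 1024

def bc1_bytes(size: int, with_mips: bool = True) -> int:
    """Bytes for a square BC1 texture: 8 bytes per 4x4 block, per mip level."""
    total = 0
    while True:
        blocks_per_side = max(1, (size + 3) // 4)  # round up to whole blocks
        total += blocks_per_side * blocks_per_side * 8
        if not with_mips or size == 1:
            break
        size //= 2
    return total

def pyramid_bytes(resolutions) -> int:
    """Total size of a feature pyramid given each latent image's resolution."""
    return sum(bc1_bytes(r) for r in resolutions)

# Assumption: the article's 256 MB uncompressed baseline, read as MiB.
UNCOMPRESSED = 256 * MIB

# Variant A: two full-resolution and two half-resolution latent images.
variant_a = pyramid_bytes([4096, 4096, 2048, 2048])
# Variant B (assumed layout): full, 1/2, 1/4, and 1/8 resolution.
variant_b = pyramid_bytes([4096, 2048, 1024, 512])

for name, size in (("A", variant_a), ("B", variant_b)):
    print(f"Variant {name}: {size / MIB:.1f} MiB, {UNCOMPRESSED / size:.1f}x")
```

Under these assumptions, Variant A comes out at about 26.7 MiB (9.6x) and Variant B at about 14.2 MiB (18.1x), which is close to the figures quoted above. Any small difference may come from data this sketch ignores, such as the network weights.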
Last year's R&D prototype was built on PyTorch; since then, the entire Texture Set Neural Compression compressor has been rewritten from scratch using Slang compute shaders. Because Slang cross-compiles to multiple targets, the same decompressor code can target the right backend whether a developer is working in Unreal, using a custom engine, or running decompression on the CPU.
On the GPU side, Intel now supports Microsoft's DirectX 12 Cooperative Vectors API, leveraging Intel Arc's XMX matrix cores (present on both A-series and B-series GPUs) for hardware-accelerated matrix inference. For hardware without XMX support, a standard FMA (fused multiply-add) fallback works on both CPUs and non-Intel GPUs.
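To make the FMA fallback more concrete, here is a minimal sketch of what decoding one texel with a small MLP looks like when it is reduced to plain multiply-adds. It is an illustration, not Intel's network: the layer sizes (12 latent inputs, 16 hidden units, 12 output channels), the ReLU activation, and the random weights are all assumptions, and the function names are hypothetical.

```python
import random

def dense(x, weights, biases, relu=True):
    """One fully connected layer evaluated with plain multiply-adds."""
    out = []
    for row, b in zip(weights, biases):
        acc = b
        for w, xi in zip(row, x):
            acc += w * xi  # the multiply-add an FMA path would issue
        out.append(max(acc, 0.0) if relu else acc)
    return out

def decode_texel(latents, layers):
    """Run the latent features for one texel through every layer in turn."""
    x = latents
    for i, (w, b) in enumerate(layers):
        last = i == len(layers) - 1
        x = dense(x, w, b, relu=not last)  # no activation on the output layer
    return x

def random_layer(rng, n_in, n_out):
    """Placeholder weights; a real decoder would load trained ones."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

rng = random.Random(0)
N_LATENT, N_HIDDEN, N_OUT = 12, 16, 12  # hypothetical sizes
layers = [
    random_layer(rng, N_LATENT, N_HIDDEN),
    random_layer(rng, N_HIDDEN, N_HIDDEN),
    random_layer(rng, N_HIDDEN, N_OUT),
]
latents = [rng.random() for _ in range(N_LATENT)]  # stand-in for sampled latents
channels = decode_texel(latents, layers)
print(len(channels), "output channels decoded")
```

The inner loop is the part that matrix hardware such as XMX can execute as a single matrix-vector operation instead of one multiply-add at a time.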
Intel's Marissa du Bois broke down four different deployment strategies, each with a different trade-off between disk space savings and memory usage:
- At install time — ship compressed, decompress locally during installation. Textures live uncompressed on the user's drive. Savings are mainly on distribution bandwidth.
- At load time — textures stay compressed on disk; decompression happens into VRAM as the game loads. Reduces both install size and VRAM pressure during loading.
- At stream time — combined with texture streaming, textures decompress on demand. Best of both worlds for disk and memory, but adds runtime inference load.
- At sample time — textures remain compressed in VRAM permanently and decode per-pixel in the shader. The most aggressive option for VRAM reduction, with constant inference cost.
Developers will have to pick one depending on their particular use case and underlying engine.
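The four strategies can be summarized as a small lookup table. This is only a restatement of the list above in code form; the field names are hypothetical, and the assumption that sample-time textures are also stored compressed on disk is an inference rather than something the talk states.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Deployment:
    name: str
    compressed_on_disk: bool   # does the installed game keep TSNC data?
    compressed_in_vram: bool   # do resident textures stay in TSNC form?
    inference: str             # when the decoder network runs

STRATEGIES = [
    Deployment("install time", False, False, "once, during installation"),
    Deployment("load time", True, False, "when the game loads textures"),
    Deployment("stream time", True, False, "on demand, as textures stream in"),
    # Assumption: textures that stay compressed in VRAM are also compressed on disk.
    Deployment("sample time", True, True, "per pixel, in the shader"),
]

for s in STRATEGIES:
    print(f"{s.name:13} disk={s.compressed_on_disk!s:5} "
          f"vram={s.compressed_in_vram!s:5} inference: {s.inference}")
```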
Intel benchmarked inference on a Panther Lake laptop with B390 integrated graphics, running a compute shader workload at full 1080p resolution. The results were:
- FMA path: 0.661 nanoseconds per pixel
- XMX linear algebra path: 0.194 nanoseconds per pixel
That's a 3.4x speedup from hardware-accelerated matrix math, and the fact that these numbers hold up on integrated graphics makes the per-pixel sample-time deployment scenario look more viable than it might have seemed. On discrete GPUs, the overhead would likely be even lower.
Intel plans to release an Alpha version of the Texture Set Neural Compression SDK later this year, followed by a beta and a public release, though those dates are not yet set in stone.
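To put the per-pixel benchmark figures in frame-time terms, the short calculation below converts them to milliseconds for one 1080p pass. It assumes the workload decodes each of the 1920 x 1080 pixels exactly once, which the talk summary does not spell out; the function name is hypothetical.

```python
PIXELS_1080P = 1920 * 1080

# Per-pixel inference cost reported for the Panther Lake benchmark.
NS_PER_PIXEL = {"FMA": 0.661, "XMX": 0.194}

def frame_cost_ms(ns_per_pixel: float, pixels: int = PIXELS_1080P) -> float:
    """Total decode time in milliseconds if every pixel is decoded once."""
    return ns_per_pixel * pixels / 1e6

for path, ns in NS_PER_PIXEL.items():
    print(f"{path}: {frame_cost_ms(ns):.2f} ms per 1080p pass")

speedup = NS_PER_PIXEL["FMA"] / NS_PER_PIXEL["XMX"]
print(f"XMX speedup over FMA: {speedup:.1f}x")
```

Under that assumption, the FMA path costs about 1.37 ms and the XMX path about 0.40 ms per pass, against a 16.7 ms budget for a 60 fps frame.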
