
Nvidia has reportedly scrapped plans for a four-die Rubin Ultra GPU, scaling back to a dual-die design after encountering manufacturing hurdles with TSMC’s advanced packaging, according to Taiwanese media reports.
What changed
The original Rubin Ultra was slated as Nvidia’s most ambitious GPU package to date: four reticle-sized compute dies integrated via TSMC’s CoWoS-L packaging, 16 HBM4e memory stacks delivering approximately 1 terabyte of capacity, and an estimated 100 petaflops of FP4 compute per package.
Taiwan’s Commercial Times and WCCFTech reported that the quad-die design caused warping and thermal stress during packaging, bending the substrate and disrupting die-to-substrate contact.
Rather than pushing four dies through a single packaging run, Nvidia has reportedly moved to a dual-die architecture in a 2+2 board-level arrangement: two identical dies per package, with two packages on a single rack-level board. The change is said to preserve total compute and memory capacity per rack while substantially easing manufacturing complexity and improving scalability for hyperscaler customers.
No performance loss, on paper
According to the reports, the compute performance per rack is not expected to decrease. A single Kyber blade will still host four Rubin Ultra GPU dies, just not integrated into a single complex package. HBM4e capacity and memory bandwidth targets remain unchanged.
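The arithmetic behind that claim can be sketched in a few lines. This is only an illustration built on the figures reported for the original quad-die package (four dies, 16 memory stacks, roughly 1 terabyte, an estimated 100 petaflops of FP4). The even split of compute and memory across dies is an assumption, since the reports give no per-die or per-package numbers, and the function name board_totals is hypothetical.

```python
# Figures reported for the original quad-die Rubin Ultra package.
QUAD_DIES = 4
QUAD_HBM_STACKS = 16
QUAD_MEMORY_GB = 1024      # "approximately 1 terabyte"
QUAD_FP4_PFLOPS = 100      # estimated, per the reports


def board_totals(packages, dies_per_package):
    """Return totals for one board.

    Assumption (not from the reports): compute and memory are
    divided evenly across the dies.
    """
    dies = packages * dies_per_package
    stacks = dies * (QUAD_HBM_STACKS // QUAD_DIES)
    return {
        "dies": dies,
        "hbm_stacks": stacks,
        "memory_gb": stacks * (QUAD_MEMORY_GB / QUAD_HBM_STACKS),
        "fp4_pflops": dies * (QUAD_FP4_PFLOPS / QUAD_DIES),
    }


original = board_totals(packages=1, dies_per_package=4)  # one quad-die package
revised = board_totals(packages=2, dies_per_package=2)   # the reported 2+2 layout
one_package = board_totals(packages=1, dies_per_package=2)

print(original == revised)  # same board-level totals under this assumption
print(one_package)          # each dual-die package carries half of them
```

Under that assumption, each dual-die package would carry half the board's memory stacks and compute, and the board-level totals come out identical. What the sketch cannot show is any cost of routing traffic between two packages instead of within one, which the reports do not quantify.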
The design shift avoids a more radical alternative: moving to TSMC’s Chip-on-Panel-on-Substrate (CoPoS) packaging, which is designed for larger AI accelerators but is not expected to reach mass production until late 2028, far too late for Rubin Ultra’s planned 2027 deployment.
Standard Rubin on track
The standard dual-die Rubin GPU, designed for large-scale AI training, remains on schedule for mass shipments beginning in mid-2026. The Rubin Ultra variant, riding on an upgraded Kyber rack platform, was expected in the second half of 2027.
Industry context
The packaging retrenchment underscores a growing challenge in the AI chip industry: while transistor scaling continues, advanced packaging has become the critical bottleneck. As GPU dies grow larger and memory demands skyrocket, the physics of keeping multiple silicon chips flat, cool, and connected across a single substrate is proving harder than the chip design itself.
Nvidia has not officially confirmed the design revision. Final specifications for the Rubin Ultra platform are expected at a later date.

