Memory industry begins development of next-generation HBM-PNM chips, with Samsung and NVIDIA participating

[Gearbest Technology News] On May 11, according to Korean media reports, the memory semiconductor industry has formally begun research on HBM-PNM, a next-generation technology, on the eve of HBM4 mass production. By performing computation directly in memory, the technology aims to overcome the limits of GPU-centric architectures and drive a shift toward memory-centric computing.

Research teams from Samsung Electronics, NVIDIA, the University of California, San Diego, Columbia University, and Yonsei University recently published a paper that proposes a multi-chip memory-centric architecture and lays out an implementation path for HBM-PNM. PNM (processing-near-memory) places dedicated compute units in the logic layer of the HBM stack so that data is processed right next to the memory. Compared with PIM (processing-in-memory), which embeds circuits in the memory cells themselves, PNM can support more complex and more powerful computing tasks while preserving memory capacity.

Today, when large language models decode over long contexts, GPU compute utilization is usually below 5%. Most of the chip's resources end up serving memory bandwidth, which wastes hardware and drives up energy consumption. The study shows that even NVIDIA's Rubin GPU uses very little of its compute die when handling long-context tasks.
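The low utilization can be made intuitive with a roofline-style estimate. The sketch below is not from the paper: the function name, the accelerator figures, and the assumption of roughly one floating-point operation per byte of cached data read during decoding are all illustrative.

```python
# Roofline-style estimate of compute utilization during attention decoding.
# All hardware figures here are illustrative assumptions, not values from the article.

def compute_utilization(peak_flops, mem_bandwidth, flops_per_byte):
    """Fraction of peak compute reachable when memory traffic may be the limit."""
    attainable = min(peak_flops, mem_bandwidth * flops_per_byte)
    return attainable / peak_flops

# Assumption: each cached key/value element is read once per generated token
# (2 bytes in FP16) and used in one multiply-add (2 FLOPs), i.e. about 1 FLOP per byte.
FLOPS_PER_BYTE = 2 / 2

# Hypothetical accelerator: 1000 TFLOP/s peak compute, 4 TB/s memory bandwidth.
util = compute_utilization(
    peak_flops=1000e12,
    mem_bandwidth=4e12,
    flops_per_byte=FLOPS_PER_BYTE,
)
print(f"Estimated compute utilization: {util:.2%}")
```

Under these assumed numbers the compute units sit almost entirely idle, which matches the direction of the article's claim: the workload is bounded by how fast data can be read from memory, not by arithmetic throughput.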

Because HBM4 introduces a logic layer built on advanced process nodes of 5 nm and below, the barrier to implementing PNM drops significantly. The architecture proposed by the research team removes the conventional GPU compute die and connects 16 HBM-PNM units, raising in-package memory bandwidth to 44 TB/s, double the current level.
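A quick back-of-the-envelope calculation shows what these figures imply per unit. It assumes the 44 TB/s is shared evenly across the 16 units and that "double" means exactly 2x; neither detail is stated in the report.

```python
# Figures reported in the article.
TOTAL_BANDWIDTH_TBPS = 44   # in-package memory bandwidth, TB/s
NUM_UNITS = 16              # HBM-PNM units in the package

# Assumption: bandwidth is divided evenly across the units.
per_unit_tbps = TOTAL_BANDWIDTH_TBPS / NUM_UNITS

# Assumption: "double the current level" means exactly 2x.
implied_current_tbps = TOTAL_BANDWIDTH_TBPS / 2

print(f"Per-unit bandwidth: {per_unit_tbps} TB/s")
print(f"Implied current level: {implied_current_tbps} TB/s")
```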

Test results show that on long-context inference tasks at the 1-million-token scale, the architecture's attention latency is 15.5 times lower than that of the NVIDIA H100, and its energy consumption is 6.9 times lower. The architecture also shows significant speed and energy-efficiency advantages over the Rubin GPU. The research team believes the study confirms the potential of memory-centric architecture as a new kind of computing system, one expected to play a central role in future heterogeneous platforms.
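Phrases like "15.5 times lower" are easier to read as fractions of the baseline. The conversion below assumes each figure is a plain ratio of the H100 value to the new architecture's value, which is the usual reading but is not spelled out in the report.

```python
# Reported improvement factors relative to the NVIDIA H100.
LATENCY_FACTOR = 15.5
ENERGY_FACTOR = 6.9

# Assumption: "N times lower" means new_value = baseline / N.
latency_fraction = 1 / LATENCY_FACTOR
energy_fraction = 1 / ENERGY_FACTOR

print(f"Attention latency: {latency_fraction:.1%} of the H100 baseline")
print(f"Energy consumption: {energy_fraction:.1%} of the H100 baseline")
```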
