A mixed-precision quantization method for large language models via memory alignment
Zhangming LI1, Weifan GUAN1, Zhengwei CHANG2, Linghao ZHANG2, Qinghao HU1
Journal of Graphics | 2026, 47(1) : 39 - 47
Image Processing and Computer Vision
Affiliations
  • 1 The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
  • 2 State Grid Sichuan Electric Power Company, Chengdu Sichuan 610041, China
Published: 2026-02-28 doi: 10.11996/JG.j.2095-302X.2026010039

As large language models continue to grow in scale, the memory footprint and computational overhead of inference have become critical challenges. Mixed-precision quantization is an effective way to reduce resource consumption, but existing methods handle outliers inadequately, lose significant accuracy after quantization, and access memory inefficiently. To address these issues, a memory-aligned mixed-precision quantization method for large language models was proposed. First, weights were divided into SIMD-aligned groups, and outlier groups were identified through group-wise significance analysis; high-significance groups were quantized to 8 bits and all other groups to 2 bits. A block-wise compensation strategy was then introduced to mitigate the accuracy degradation caused by 2-bit quantization. Furthermore, an efficient packing and storage scheme was designed for the mixed-precision weights, in which a bitmap records the bit width of each data block and thereby enables random access. Experimental results demonstrated that the proposed method significantly reduced memory usage and improved computational efficiency while maintaining model accuracy. Specifically, on Llama2-7B/13B/70B, the approach achieved perplexity reductions of 8.13/2.84/1.37 on WikiText-2 and 5.80 on C4 relative to state-of-the-art baselines. The quantized 70B model reduced weight storage by approximately 87% compared with BF16. Across seven QA benchmarks, an average accuracy gain of 6.24% was achieved. These results indicate that memory-aligned mixed-precision quantization can simultaneously improve compression ratio, memory-access efficiency, and overall model performance.
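
The grouping and bitmap ideas described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the group width of 32 weights, the use of the largest absolute weight as the significance score, the one-bit-per-group bitmap layout, and all function names are assumptions made for illustration only. The sketch marks the most significant groups for 8-bit storage, then uses a population count over the bitmap to locate any group in the packed buffer without scanning it, which is one plausible way a bitmap can enable random access.

```python
# Illustrative sketch only. GROUP_SIZE, the max-|w| significance score, and
# every function name here are assumptions, not taken from the paper.
GROUP_SIZE = 32                      # assumed SIMD-aligned group width
HIGH_BYTES = GROUP_SIZE * 8 // 8     # bytes per group stored at 8 bits
LOW_BYTES = GROUP_SIZE * 2 // 8      # bytes per group stored at 2 bits


def select_outlier_groups(weights, keep_ratio):
    """Return an integer bitmap; bit g is 1 if group g is kept at 8 bits."""
    n_groups = len(weights) // GROUP_SIZE
    # Assumed significance score: largest absolute weight inside the group.
    scores = [
        max(abs(w) for w in weights[g * GROUP_SIZE:(g + 1) * GROUP_SIZE])
        for g in range(n_groups)
    ]
    n_high = round(n_groups * keep_ratio)
    ranked = sorted(range(n_groups), key=lambda g: scores[g], reverse=True)
    bitmap = 0
    for g in ranked[:n_high]:
        bitmap |= 1 << g
    return bitmap


def group_offset(bitmap, g):
    """Byte offset of group g in the packed buffer, found via a popcount."""
    mask = (1 << g) - 1                      # selects groups 0 .. g-1
    n_high = bin(bitmap & mask).count("1")   # 8-bit groups before group g
    return n_high * HIGH_BYTES + (g - n_high) * LOW_BYTES


def average_bits(bitmap, n_groups):
    """Average stored bits per weight, including one bitmap bit per group."""
    payload_bits = group_offset(bitmap, n_groups) * 8
    return (payload_bits + n_groups) / (n_groups * GROUP_SIZE)


# Four groups of small weights with a single outlier placed in group 2.
weights = [0.01] * (4 * GROUP_SIZE)
weights[70] = 5.0
bitmap = select_outlier_groups(weights, keep_ratio=0.25)
print(bin(bitmap))               # 0b100
print(group_offset(bitmap, 3))   # 48
print(average_bits(bitmap, 4))   # 3.53125
```

Because every group occupies a whole number of bytes under these assumed sizes, the offset of a group depends only on how many 8-bit groups precede it, so a lookup costs one mask and one population count rather than a pass over the buffer.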

Keywords: large language model compression / post-training quantization / low-bit quantization / mixed-precision quantization / outlier extraction
Zhangming LI, Weifan GUAN, Zhengwei CHANG, Linghao ZHANG, Qinghao HU. A mixed-precision quantization method for large language models via memory alignment[J]. Journal of Graphics, 2026, 47(1): 39-47. DOI: 10.11996/JG.j.2095-302X.2026010039
  • Funding: Science and Technology Project of State Grid Corporation of China (5700-202426249A-1-1-ZN)
Article Info
  • Received: 2025-06-10
  • Online: 2026-05-19
  • Accepted: 2025-10-11

Corresponding: HU Qinghao, E-mail:
https://castjournals.cast.org.cn/joweb/txxb/EN/10.11996/JG.j.2095-302X.2026010039