Audio-based High-resolution Face Portrait Method

Hao-ming QIN, Fan-liang BU^*, Fang-hao ZHONG

Science Technology and Engineering | 2025, 25(13) : 5515 - 5526

• Papers·Automation and Computational Technology •

Affiliations

School of Information Network Security, People's Public Security University of China, Beijing 100038, China

Published: 2025-05-08 doi: 10.12404/j.issn.1671-1815.2403342

Abstract

Existing voice-driven facial generation methods still face challenges in feature extraction and generation quality, and have yet to fully explore the deep correlation between audio and facial features. To address these issues, an approach was proposed that combines Mel frequency cepstral coefficients (MFCC) for audio feature extraction with the image generation capabilities of the second-generation style-based generative adversarial network (StyleGAN2). To extract and transmit audio features more effectively, a ResNet18-based residual module was designed and integrated with the squeeze-and-excitation (SE) attention mechanism. Additionally, the activation function in the original residual blocks was replaced with the Mish activation function, aiming to mitigate the vanishing gradient problem in deep networks, preserve feature information, and improve the accuracy and generalization ability of the model. StyleGAN2 was then used as the facial image generation model. Experimental results show that combining the designed audio processing network with the StyleGAN2 generator performs well on the voice-driven facial generation task. Evaluated with metrics such as Fréchet inception distance (FID) and path length, the proposed method shows a significant improvement in generation quality over existing methods, demonstrating its effectiveness.
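The residual module described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the convolutional layers are omitted, the weight matrices `w_reduce` and `w_expand` are hypothetical stand-ins for learned parameters, and the SE gate is assumed to rescale the residual branch before the skip connection is added, as in the standard SE-ResNet layout. The sketch shows the Mish activation, x * tanh(softplus(x)), and the squeeze (global average pooling) and excitation (FC, ReLU, FC, sigmoid) steps on a toy feature map.

```python
import math


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    # Stable softplus: log(1 + e^x) = max(x, 0) + log1p(e^(-|x|))
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return x * math.tanh(softplus)


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def se_gates(channels, w_reduce, w_expand):
    """Squeeze-and-excitation gates, one per channel.

    channels: list of C channels, each a flat list of activations.
    w_reduce: R x C weights (bottleneck); w_expand: C x R weights.
    Both weight matrices are hypothetical stand-ins for learned parameters.
    """
    # Squeeze: global average pooling over each channel.
    squeezed = [sum(ch) / len(ch) for ch in channels]
    # Excitation: FC -> ReLU -> FC -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_expand]


def se_residual_output(identity, branch, w_reduce, w_expand):
    """Assumed block output: Mish(SE(branch) + identity), element by element."""
    gates = se_gates(branch, w_reduce, w_expand)
    return [
        [mish(g * b + x) for b, x in zip(b_ch, x_ch)]
        for g, b_ch, x_ch in zip(gates, branch, identity)
    ]


# Toy example: 2 channels, 3 activations each, bottleneck width 1.
identity = [[0.5, -1.0, 2.0], [1.0, 0.0, -0.5]]
branch = [[0.2, 0.4, -0.6], [1.5, -0.5, 0.5]]
w_reduce = [[0.8, 0.3]]
w_expand = [[1.2], [-0.7]]
out = se_residual_output(identity, branch, w_reduce, w_expand)
```

Unlike ReLU, Mish is smooth and passes small negative values, which is the property the abstract relies on to reduce vanishing gradients. In a real network the toy lists would be MFCC-derived feature maps and the gates would be learned end to end.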

Key words

voice-to-face generation / Mel frequency cepstral coefficients / style generative adversarial networks / attention mechanism

Cite this Article

Hao-ming QIN, Fan-liang BU, Fang-hao ZHONG. Audio-based High-resolution Face Portrait Method[J]. Science Technology and Engineering, 2025, 25(13): 5515-5526. DOI: 10.12404/j.issn.1671-1815.2403342

Article Info

Received: 2024-05-07
Revised: 2025-01-10
Online Date: 2025-07-09
Published: 2025-05-08

https://castjournals.cast.org.cn/joweb/kxjsygc/EN/10.12404/j.issn.1671-1815.2403342
