Audio-based High-resolution Face Portrait Method
Hao-ming QIN, Fan-liang BU*, Fang-hao ZHONG
Science Technology and Engineering | 2025, 25(13) : 5515 - 5526
Papers·Automation and Computational Technology
Affiliations
  • School of Information Network Security, People's Public Security University of China, Beijing 100038, China
Published: 2025-05-08 doi: 10.12404/j.issn.1671-1815.2403342

Existing voice-driven face generation methods still face challenges in feature extraction and generation quality, and have yet to fully explore the deep correlation between audio and facial features. To address these issues, an approach was proposed that combines Mel frequency cepstral coefficients (MFCC) for audio feature extraction with the image generation capabilities of the second-generation style-based generative adversarial network (StyleGAN2). For audio processing, MFCC was employed as the feature extraction method. To extract and propagate audio features more effectively, a ResNet18-based residual module was designed and integrated with the squeeze-and-excitation (SE) attention mechanism. In addition, the activation function in the original residual blocks was replaced with the Mish activation function, aiming to mitigate the vanishing gradient problem in deep networks, preserve feature information, and improve the accuracy and generalization ability of the model. StyleGAN2 was then used as the face image generation model. Experimental results show that combining the designed audio processing network with the StyleGAN2 face generation model performs well on the voice-driven face generation task. Evaluated with metrics such as Fréchet inception distance (FID) and path length, the proposed method shows a significant improvement in generation quality over existing methods, which supports its effectiveness.
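The residual module described above can be illustrated with a minimal NumPy sketch. The abstract does not give the layer layout, so everything here is an assumption made for illustration: the placement of the SE gate on the residual branch, the Mish activation after the skip addition, the reduction ratio, the random weights, and the `transform` stand-in for the convolutional layers. The names `se_gate` and `se_residual_block` are hypothetical and do not come from the paper.

```python
import numpy as np


def mish(x):
    # Mish(x) = x * tanh(softplus(x)); logaddexp(0, x) is a stable softplus.
    return x * np.tanh(np.logaddexp(0.0, x))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def se_gate(features, w_reduce, w_expand):
    """Per-channel SE weights in (0, 1) for a feature map of shape (C, H, W)."""
    squeezed = features.mean(axis=(1, 2))          # squeeze: global average pool
    hidden = np.maximum(w_reduce @ squeezed, 0.0)  # excitation: bottleneck + ReLU
    return sigmoid(w_expand @ hidden)              # expand back to C channels


def se_residual_block(features, transform, w_reduce, w_expand):
    """Assumed layout: y = Mish(x + SE(F(x)) * F(x))."""
    branch = transform(features)                   # stand-in for the conv layers
    gate = se_gate(branch, w_reduce, w_expand)
    return mish(features + branch * gate[:, None, None])


rng = np.random.default_rng(0)
channels, reduction = 8, 4                         # illustrative sizes only
x = rng.standard_normal((channels, 16, 16))        # stand-in for an MFCC feature map
w_reduce = 0.1 * rng.standard_normal((channels // reduction, channels))
w_expand = 0.1 * rng.standard_normal((channels, channels // reduction))

y = se_residual_block(x, lambda t: 0.5 * t, w_reduce, w_expand)
print(y.shape)
```

The SE gate rescales each channel of the residual branch by a learned weight between 0 and 1, while Mish keeps a small nonzero gradient for negative inputs, which is the property the abstract relies on to ease vanishing gradients. In a real model the weights would be trained and `transform` would be the block's convolution and normalization layers.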

voice-to-face generation  /  Mel frequency cepstral coefficients  /  style generative adversarial networks  /  attention mechanism
Hao-ming QIN, Fan-liang BU, Fang-hao ZHONG. Audio-based High-resolution Face Portrait Method[J]. Science Technology and Engineering, 2025, 25(13): 5515-5526. DOI: 10.12404/j.issn.1671-1815.2403342
Article Info
  • Received: 2024-05-07
  • Revised: 2025-01-10
  • Published: 2025-05-08
  • Online: 2025-07-09
https://castjournals.cast.org.cn/joweb/kxjsygc/EN/10.12404/j.issn.1671-1815.2403342