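Second-Round Review Results of the 19th National Conference on Man-Machine Speech Communication Announced - 20th National Conference on Man-Machine Speech Communication (2025)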


Published: 2024/7/14 12:29:22

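Following rigorous review by the program committee and the reviewers, the second-round review results for this conference's papers have been announced. All acceptance notices have been sent by email; authors, please check your inboxes. Please revise your paper as requested and submit the final version (Camera Ready Paper) by the specified deadline.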

Thank you all for your support!

List of accepted papers (including the first round)

Paper ID	Accepted Paper Title
2	The analysis for vowels in neutral tone syllable
3	Self-Supervised Federated Learning for Heart Sound Recognition
4	The attention-based fusion of master-auxiliary network for speech enhancement
5	An Analysis of the Degree of Reduction and the Semantic Concreteness of "去2" in the "去1+VP+去2" Construction
7	D-AGNet: A Dual-branch Network with Attention Guidance for Speech Emotion Recognition
8	An End-to-End Spoofed Speech Detection Method Based on Cross-Lingual Data Transfer
9	Zero-Shot Personalized Voice Synthesis with Cross-Attention Speaker Embeddings
10	ASD-Diff: Unsupervised Anomalous Sound Detection With Masked Diffusion Model
11	A longitudinal study on the acquisition of standard Chinese monophthongs by intermediate- and advanced-level Korean Learners
12	PadAug: Robust Speaker Verification with Simple Waveform-Level Silence Padding
14	A Comprehensive Data Processing Pipeline for LLM based Text-to-Speech Models
15	A Study of Prosodic Boundary Duration Patterns in Guangzhou Cantonese
16	MIXDIFF: Mixture Diffusion Model for Efficient Text-to-Speech
17	Analysis and Construction of Corpus Based on Kazakh Text Character Encoding
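19	A Preliminary Exploration of Methods for Handling Tone Onsets in Tonal Pitch Contours
21	A Study of Duration Thresholds for Long-Term Fundamental Frequency Features in Mandarin and the Chongqing Dialect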
22	StyleSVC: Singing Voice Conversion with Multi-Scale Style Transfer
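25	Emotion-Consistent Voice Conversion Based on Mutual-Information Feature Disentanglement
29	Perception of Contrastive Focus in Mandarin Based on Pitch Cues: The Case of the Rising Tone (Yangping) and the Falling Tone (Qusheng)
30	An Adversarial Attack Method Based on Denoising Diffusion Probabilistic Models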
31	Contrastive focus perception in Mandarin from pitch -- Take Tone 1 and Tone 3 as an example
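33	An Acoustic Analysis of Chinese Disyllabic Tone Combinations with the Rising Tone (Yangping) Produced by Intermediate-Level Native Russian Speakers
36	A Study of Acoustic Features in the Acquisition of the Uyghur Voiceless Stop k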
37	Speech emotion recognition based on multi acoustic feature fusion
38	A Study of Cross-Linguistic Phonological Contrast and Error Analysis
40	Data Augmentation and Progressive Learning in Acoustic Echo Cancellation for Duplex conversations on Mobile Devices
41	Emergence of Hemispheric Asymmetries and Predictive Coding in the Neural Mechanism of Speech Perception
42	A pilot study on the perception of "dearing" emotional speech
43	Phoneme Semantic Backdoor Attacks with Multiple Task Learning for Speech Classification Task
45	Burmese Speech Synthesis Based on Diffusion Model
46	IUMSS-CETL: Low-Resource Iu Mien Speech Synthesis based on Transfer Learning
50	AESR: Speech Recognition With Speech Emotion Recognition Learning
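51	A Speech Separation Network Based on Brain-Controlled Embedding Vectors
55	A Study of Emotional Expression in Mongolian Khoomei (Throat Singing) Based on Acoustic Parameters
56	Acoustic Manifestation of Tongue Root Position in the Yin and Yang (Feminine and Masculine) Vowels of Ewenki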
57	A Comparative Analysis of Diphthong Acquisition in Standard Chinese by Learners from ‘the Belt and Road’
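58	An Empirical Study of the Rate of Variation of the Entering Tone in the Zouping Dialect from a Complex Dynamic Systems Perspective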
60	Domain Adaptation for Front-End Module in TTS with LoRAs
63	An Acoustic Statistical Analysis of the Spatial Distribution of Long and Short Vowels in Eastern Yugur
64	ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram
65	Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
66	M-CMGAN: Attempting to Use Mamba on Speech Enhancement
68	DA-KWFormer: A Domain Adaptation Network with K-Weight Transformer for Speech Emotion Recognition
69	Anomalous Sound Detection Using Time-Frequency Feature and Mixbatch
70	Improving Speaker Verification Back-end with Graph Neural Networks
71	Integrating Time-Frequency Domain Shallow and Deep Features for Speech-EEG Match-Mismatch of Auditory Attention Decoding
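72	An Anomalous Sound Detection System for Tap-Testing Liquor Bottles for Cracks, Based on Attention Mechanisms and Data Oversampling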
74	Dual-Path Spectrogram Refinement Network for Robust Speaker Verification
75	Transformer-based Model for Auditory EEG Decoding
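76	Speaker Verification under Deliberate Voice Disguise
77	A Multi-Channel Speech Enhancement Network Based on the Harmonic Structure of Speech
78	Cross-Lingual Speech Emotion Recognition Based on Cultural Analysis
79	A Study of Tone Acquisition in the National Common Language by School-Age Blang-Speaking Children
80	A Study of Automatic Generation of Cultural Relic Descriptions Integrating Multi-Source Knowledge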
81	Enhancing Transducers for Robust Keyword Spotting by Duration Modeling
82	A Backend-friendly On-device Multi-channel Speech Enhancement System with IPD and PHM
83	SESNet: A Speech Enhancement and Separation Network in Noisy Reverberant Environments
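85	A Study of the Information Mechanism of Third-Tone Sandhi in Mandarin Disyllabic Words
86	Robust Speaker Verification Based on Disentangled Learning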
87	A Neural Denoising Vocoder for Clean Waveform Generation from Noisy Mel-Spectrogram based on Amplitude and Phase Predictions
89	BER: Balanced Error Rate For Speaker Diarization
90	Adaptive Context Biasing in Transformer-based ASR Systems
91	Baseline Systems for Chinese Continuous Visual Speech Recognition Challenge 2024
92	Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations
93	Prosodic Boundary Effects on Entering Tones in the Chaozhou Dialect
96	Enhancing Mispronunciation Detection with Multi-Speaker Text-to-Speech and Mixture-of-Experts Network
97	A study on the allocations of prosody boundaries in L2 Mandarin speech by Vietnamese learners based on dependency syntax
99	MHAN: Bottleneck Fusion Model Based on Hybrid Attention Network for Multimodal Emotion Recognition
100	Paraformer-v2: An improved single step non-autoregressive transformer for noise-robust speech recognition
101	Sound Zone Control Based On A Kronecker Second-order Tensor Decomposition
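102	A Survey of Research Progress in Speech-to-Speech Translation for Multiple Ethnic Languages
103	An Empathetic Dialogue System for Psychotherapy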
104	A Brief Survey on the Explainability for Deep Speech Processing Models
105	Are Transformers in pretrained LM A Good ASR Encoder? An Empirical Study
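107	A Study of Five-Degree Tone Value Classification for Level 1-A Putonghua Based on Statistical Classification
108	Perceptual Assimilation and Discrimination of the Mandarin Consonants /t/ and /tɕ/ by Russian Students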
111	MCDubber: Multimodal Context-Aware Expressive Video Dubbing
114	A Nearest-Neighbor Penalized Circle Loss Function for Speaker Recognition
115	TeleSpeechPT: Large-Scale Chinese Multi-Dialect And Multi-Accent Speech Pre-Training
116	Investigation into the Impact of Speaker Adversarial Perturbation on Speech Recognition
117	Speaker extraction with verification of present and absent target speakers
118	Exploring Discrete Tokens Suitable for Speech Synthesis
119	Stage-Wise and Prior-Aware Neural Speech Phase Prediction
120	Pruning and Quantization Enhanced Densely Connected Neural Network for Efficient Acoustic Echo Cancellation
122	MDCTCodec: A Lightweight MDCT-based Neural Audio Codec towards High Sampling Rate and Low Bitrate Scenarios
123	LightAC: Lightweight Accent Conversion via Module-Wise Distillation
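125	Multimodal Emotion Recognition Based on Dual Alignment and Contrastive Learning
126	Perceptual Assimilation Patterns of Chinese and Vietnamese Tones by Vietnamese Students
127	Prosodic Realization of Focus in De'ang
128	Neural-Network-Based Stress Extraction and Generation of Stress Description Prompts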
129	Improved DOA Estimation of Sound Source of Small Amplitudes Using a Single Acoustic Vector Sensor
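130	An Anomalous Sound Detection Method Based on Channel-Attention Feature Fusion
133	An End-to-End Multilingual Speech Recognition Method Incorporating Language Identity Information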
135	Investigation on Training Strategy for Cross-Modal Large Language Models with Speech and Text
136	XCB: an effective contextual biasing approach to bias cross-lingual phrases in speech recognition
137	A local-aggregation based method for robust speaker verification
138	An acoustic study of Mandarin Chinese vowels produced by Uyghur-speaking learners
139	ExARN: Target Speaker Extraction with Attentive Recurrent Networks
141	Emotional Speech Synthesis via Visual Context Perception
143	Tone Perception by Putonghua-Learning Preschool Children in South Xinjiang Uyghur Autonomous Region
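145	Music Generation Based on Multi-Stream Sequences and Contrastive Learning
147	Effects of the Personality Traits Agreeableness and Openness on Emotional Speech Features
148	Effects of the Personality Traits Extraversion and Neuroticism on Emotional Speech Features
150	An Examination of the Relationship between Turning Point Position in the Monosyllabic Rising Tone (Yangping) and Syllable Structure
151	An Exploration of Mispronunciation Detection Based on Attention-Mechanism Soft Segmentation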
152	Study on Prosodic Disambiguation of VP/NP syntactic structure by Chinese EFL Learners
153	An electroencephalogram-based study of neural responses to imagined speech in Mandarin
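154	An Annotation System for Disfluent Speech Segments Based on Speech Pathology Features
155	Classification of Mandarin Chinese Speech Based on EEG Signals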
156	A Speech Corpus of Putonghua-Learning Preschoolers From the Uygur Ethnic Group in South Xinjiang Uygur Autonomous Region
157	Evaluation of Data Inconsistency for Multi-modal Sentiment Analysis
159	A Multi-Task Learning Method for Accent Recognition and Speech Recognition Based on Spike Features
160	A Study of Phonetic Differences in Hankou Dialect in the late Qing Dynasty -- Based on the records of A Chinese-English Dictionary and The Hankow Syllabary
163	Efficient Singular Spectrum Mode Ensembler Capable of Extracting Wide-band Components in Spectrum Overlapping Scenarios
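167	A Study of Production Training on the Mandarin Apical Vowels [ɿ] and [ʅ] for Native Russian Speakers
168	Aerodynamic Characteristics of Mandarin Fricatives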
171	LDMME: Latent Diffusion Model for Music Editing
172	An Examination of Stress and Break Features Based on Pitch Normalization Anchored on the Preceding Syllable
174	Analyzing the Improvement of Supplementary Features on Voice Conversion Using Disentangled Representation Learning
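178	Speaker-Independent Prosody Representation Based on Scheduled Gradient Reversal
180	A Single-Channel Speech Enhancement Method Using a GCRN Network Improved with Transformer Encoding
181	An Automated Method for Predicting the Expressive Language Ability of Children with Autism Spectrum Disorder for CPEP3 Assessment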
182	Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech
183	Multilingual Speech Recognition Using Discrete Tokens with a Two-step Training Strategy
184	Mongolian Speech Recognition Based on Semi-Supervised Learning and Syllable Subword Modeling Units
185	Video-Assisted Singing Melody Extraction
186	Dynamic properties of diphthongs in Hefei Mandarin
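187	A Study of the Acoustic Features of Clipped Speech
188	A Study of the Acquisition of Chinese Pauses and Lengthening by Intermediate-Level Native Russian Speakers
189	An Analysis of the Acquisition of Consonant Initials by Chinese Learners from Kazakhstan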
190	On the effectiveness of enrollment speech augmentation for Target Speaker Extraction
191	A Study of Tone Production in Chinese Same-Tone Sentences by Native Russian Speakers
193	Demystifying the Robustness of Deep Speaker Recognition Against Non-Speech Segments
196	An Unsupervised Domain Adaptation Method based on Distribution Alignment for Speaker Verification
199	Improving Emotion Recognition with Pre-trained Models, Multimodality, and Contextual Information
200	A Comparative Analysis of Voice Quality between Mandarin Learners and Native Speakers
202	Cross-Model Knowledge Distillation and Metadata Fusion for Respiratory Sound Classification
203	Tibetan-Chinese Speech-to-Speech Translation Based on Large Speech Models
204	A Study on the Effect of Focus on Vowel Duration and Formant in Cantonese
205	Tibetan Speech Synthesis Based on Pre-trained Mixture Alignment FastSpeech2
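206	A Study of Factors Influencing the Perception of Erhua (Rhotacization)
207	An Exploration of a Scoring Model for the Chinese Read-Aloud Fluency of Native Russian Speakers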
210	A New Parameter to Indicate the Syllable Stress Levels in Mandarin
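212	An Acoustic and Electroglottographic Study of Stop Phonation in the Amdo Dialect of Tibetan
213	An Ultrasound-Imaging-Based Study of Stops in the Amdo Dialect of Tibetan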
214	Towards Reliable and Empathetic Depression-Diagnosis-Oriented Chats
215	Task-Specific Dementia Risk Detection Based on Psycholinguistic Knowledge and Linguistic Features
216	Dementia Risk Detection via Text Augmentation-based Multi-Task Learning
218	Language-independent Prosody-enhanced Speech Representations for Multilingual Speech Synthesis
219	Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
220	Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
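226	The Effect of Sample Size on Assessing the Speaker-Discriminating Power of the Diphthong /ei/
227	The Effect of a 16-Year Span on the Acoustic Features of Young Female Mandarin Voices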
230	NDVQ: Robust Neural Audio Codec with Distribution-Based Vector Quantization
231	CTC-Assisted LLM-Based Contextual ASR
233	Construction and Analysis of a Multimodal Physiological Database of Emotional Expression in Depression
