News

The 19th National Conference on Man-Machine Speech Communication: Second-Round Review Results Announced
Published: 2024/7/14 12:29:22

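        Following rigorous review by the Program Committee and the expert reviewers, the second-round review results for this year's conference papers have been announced. All acceptance notices have been sent by email; authors, please check your inbox. Please revise your paper according to the review comments and submit the final version (Camera Ready Paper) by the stated deadline.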

         Thank you all for your support!

List of Accepted Papers (including the first round)

Paper No. Title of Accepted Paper
2 the analysis for vowels in neutral tone syllable
3 Self-Supervised Federated Learning for Heart Sound Recognition
4 The attention-based fusion of master-auxiliary network for speech enhancement
5 An Analysis of the Degree of Weakening and the Semantic Concreteness of “去2” in the “去1+VP+去2” Construction
7 D-AGNet: A Dual-branch Network with Attention Guidance for Speech Emotion Recognition
8 An End-to-End Spoofed Speech Detection Method Based on Cross-Lingual Data Transfer
9 Zero-Shot Personalized Voice Synthesis with Cross-Attention Speaker Embeddings
10 ASD-Diff: Unsupervised Anomalous Sound Detection With Masked Diffusion Model
11 A longitudinal study on the acquisition of standard Chinese monophthongs by intermediate- and advanced- level Korean Learners
12 PadAug: Robust Speaker Verification with Simple Waveform-Level Silence Padding
14 A Comprehensive Data Processing Pipeline for LLM based Text-to-Speech Models
15 A Study of Duration Patterns at Prosodic Boundaries in Guangzhou Cantonese
16 MIXDIFF: Mixture Diffusion Model for Efficient Text-to-Speech
17 Analysis and Construction of Corpus Based on Kazakh Text Character Encoding
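19 A Preliminary Exploration of Methods for Handling the Tone Onset in Tonal Pitch Contours
21 A Study of Duration Thresholds for Long-Term Fundamental Frequency Features in Mandarin and the Chongqing Dialect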
22 StyleSVC: Singing Voice Conversion with Multi-Scale Style Transfer
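25 Emotion-Consistent Voice Conversion Based on Mutual-Information Feature Disentanglement
29 A Study of Contrastive Focus Perception in Mandarin Based on Pitch Cues: Taking Yangping (Tone 2) and Qusheng (Tone 4) as Examples
30 An Adversarial Attack Method Based on Denoising Diffusion Probabilistic Models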
31 Contrastive focus perception in Mandarin from pitch -- Take Tone 1 and Tone 3 as an example
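33 An Acoustic Analysis of Disyllabic Tone Patterns with Yangping (Tone 2) in Chinese Produced by Intermediate-Level Native Russian Speakers
36 A Study of the Acoustic Characteristics in the Acquisition of the Uyghur Voiceless Stop k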
37 Speech emotion recognition based on multi acoustic feature fusion
38 A Study of Cross-Linguistic Phonological Contrast and Error Analysis
40 Data Augmentation and Progressive Learning in Acoustic Echo Cancellation for Duplex conversations on Mobile Devices
41 Emergence of Hemispheric Asymmetries and Predictive Coding in the Neural Mechanism of Speech Perception
42 A pilot study on the perception of "dearing" emotional speech
43 Phoneme Semantic Backdoor Attacks with Multiple Task Learning for Speech Classification Task
45 Burmese Speech Synthesis Based on Diffusion Model
46 IUMSS-CETL: Low-Resource Iu Mien Speech Synthesis based on Transfer Learning
50 AESR: Speech Recognition With Speech Emotion Recognition Learning
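51 A Speech Separation Network Based on Brain-Controlled Embedding Vectors
55 A Study of Emotional Expression in Mongolian Khoomei Based on Acoustic Parameters
56 Acoustic Manifestation of Tongue-Root Position in the Yin and Yang Vowels of Evenki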
57 A Comparative Analysis of Diphthong Acquisition in Standard Chinese by Learners from ‘the Belt and Road’
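58 An Empirical Study of the Rate of Variation of the Entering Tone in the Zouping Dialect from a Complex Dynamic Systems Perspective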
60 Domain Adaptation for Front-End Module in TTS with LoRAs
63 An Acoustic-Statistical Analysis of the Vowel-Space Distribution of Long and Short Vowels in Eastern Yugur
64 ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram
65 Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
66 M-CMGAN: Attempting to Use Mamba on Speech Enhancement
68 DA-KWFormer: A Domain Adaptation Network with K-Weight Transformer for Speech Emotion Recognition
69 Anomalous Sound Detection Using Time-Frequency Feature and Mixbatch
70 Improving Speaker Verification Back-end with Graph Neural Networks
71 Integrating Time-Frequency Domain Shallow and Deep Features for Speech-EEG Match-Mismatch of Auditory Attention Decoding
72 A Tap-Based Anomalous Sound Detection System for Wine Bottle Cracks Using an Attention Mechanism and Data Oversampling
74 Dual-Path Spectrogram Refinement Network for Robust Speaker Verification
75 Transformer-based Model for Auditory EEG Decoding
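76 Speaker Verification under Deliberate Voice Disguise
77 A Multi-Channel Speech Enhancement Network Based on the Harmonic Structure of Speech
78 Cross-Lingual Speech Emotion Recognition Based on Cultural Analysis
79 A Study of Tone Acquisition in the National Common Language by Blang-Speaking School-Age Children
80 A Study of Automatic Description Generation for Cultural Relics Incorporating Multi-Source Knowledge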
81 Enhancing Transducers for Robust Keyword Spotting by Duration Modeling
82 A Backend-friendly On-device Multi-channel Speech Enhancement System with IPD and PHM
83 SESNet: A Speech Enhancement and Separation Network in Noisy Reverberant Environments
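85 A Study of the Information Mechanism of Tone 3 Sandhi in Mandarin Disyllabic Words
86 A Study of Robust Speaker Verification Based on Disentangled Learning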
87 A Neural Denoising Vocoder for Clean Waveform Generation from Noisy Mel-Spectrogram based on Amplitude and Phase Predictions
89 BER: Balanced Error Rate For Speaker Diarization
90 Adaptive Context Biasing in Transformer-based ASR Systems
91 Baseline Systems for Chinese Continuous Visual Speech Recognition Challenge 2024
92 Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations
93 Prosodic Boundary Effects on the Entering Tones of the Chaozhou Dialect
96 Enhancing Mispronunciation Detection with Multi-Speaker Text-to-Speech and Mixture-of-Experts Network
97 A study on the allocations of prosody boundaries in L2 Mandarin speech by Vietnamese learners based on dependency syntax
99 MHAN: Bottleneck Fusion Model Based on Hybrid Attention Network for Multimodal Emotion Recognition
100 Paraformer-v2: An improved single step non-autoregressive transformer for noise-robust speech recognition
101 Sound Zone Control Based On A Kronecker Second-order Tensor Decomposition
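102 A Survey of Research Progress in Speech-to-Speech Translation for Multi-Ethnic Languages
103 An Empathetic Dialogue System for Psychotherapy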
104 A Brief Survey on the Explainability for Deep Speech Processing Models
105 Are Transformers in pretrained LM A Good ASR Encoder? An Empirical Study
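107 A Study of Five-Level Tone Value Classification for Level 1-A (一级甲等) Putonghua Based on Statistical Classification
108 Perceptual Assimilation and Discrimination of the Mandarin Consonants /t/ and /tɕ/ by Russian Students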
111 MCDubber: Multimodal Context-Aware Expressive Video Dubbing
114 A Nearest-Neighbor Penalized Circle Loss Function for Speaker Recognition
115 TeleSpeechPT: Large-Scale Chinese Multi-Dialect And Multi-Accent Speech Pre-Training
116 Investigation into the Impact of Speaker Adversarial Perturbation on Speech Recognition
117 Speaker extraction with verification of present and absent target speakers
118 Exploring Discrete Tokens Suitable for Speech Synthesis
119 Stage-Wise and Prior-Aware Neural Speech Phase Prediction
120 Pruning and Quantization Enhanced Densely Connected Neural Network for Efficient Acoustic Echo Cancellation
122 MDCTCodec: A Lightweight MDCT-based Neural Audio Codec towards High Sampling Rate and Low Bitrate Scenarios
123 LightAC: Lightweight Accent Conversion via Module-Wise Distillation
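125 Multimodal Emotion Recognition Based on Dual Alignment and Contrastive Learning
126 Perceptual Assimilation Patterns of Chinese and Vietnamese Tones among Vietnamese Students
127 Prosodic Realization of Focus in the De'ang Language
128 Neural-Network-Based Stress Extraction and Generation of Stress Description Prompts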
129 Improved DOA Estimation of Sound Source of Small Amplitudes Using a Single Acoustic Vector Sensor
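130 An Anomalous Sound Detection Method Based on Channel-Attention Feature Fusion
133 An End-to-End Multilingual Speech Recognition Method Incorporating Language Identity Information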
135 Investigation on Training Strategy for Cross-Modal Large Language Models with Speech and Text
136 XCB: an effective contextual biasing approach to bias cross-lingual phrases in speech recognition
137 A local-aggregation based method for robust speaker verification
138 An acoustic study of Mandarin Chinese vowels produced by Uyghur-speaking learners
139 ExARN: Target Speaker Extraction with Attentive Recurrent Networks
141 Emotional Speech Synthesize via Visual Context Perception
143 Tone Perception by Putonghua-Learning Preschool Children in South Xinjiang Uyghur Autonomous Region
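145 A Study of Music Generation Based on Multi-Stream Sequences and Contrastive Learning
147 Effects of the Personality Traits Agreeableness and Openness on Emotional Speech Features
148 Effects of the Personality Traits Extraversion and Neuroticism on Emotional Speech Features
150 An Investigation of the Relationship between the Turning-Point Position of the Monosyllabic Yangping Tone (Tone 2) and Syllable Structure
151 An Exploration of Mispronunciation Detection Based on Attention-Based Soft Segmentation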
152 Study on Prosodic Disambiguation of VP/NP syntactic structure by Chinese EFL Learners
153 An electroencephalogram-based study of neural responses to imagined speech in Mandarin
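154 An Annotation System for Disfluent Speech Segments Based on Speech Pathology Features
155 A Study of Mandarin Chinese Speech Classification Based on EEG Signals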
156 A Speech Corpus of Putonghua-Learning Preschoolers From the Uygur Ethnic Group in South Xinjiang Uygur Autonomous Region
157 Evaluation of Data Inconsistency for Multi-modal Sentiment Analysis
159 A Multi-Task Learning Method for Accent Recognition and Speech Recognition Based on Spike Features
160 A Study of Phonetic Differences in Hankou Dialect in the late Qing Dynasty -- Based on the records of A Chinese-English Dictionary and The Hankow Syllabary
163 Efficient Singular Spectrum Mode Ensembler Capable of Extracting Wide-band Components in Spectrum Overlapping Scenarios
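167 A Study of Production Training on the Mandarin Apical Vowels [ɿ] and [ʅ] for Native Russian Speakers
168 Aerodynamic Characteristics of Mandarin Fricatives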
171 LDMME: Latent Diffusion Model for Music Editing
172 An Investigation of Stress and Break Features Based on Pitch Normalization Anchored to the Preceding Syllable
174 Analyzing the Improvement of Supplementary Features on Voice Conversion Using Disentangled Representation Learning
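178 A Study of Speaker-Independent Prosody Representations Based on Scheduled Gradient Reversal
180 A Single-Channel Speech Enhancement Method Using a GCRN Network Improved with Transformer Encoding
181 An Automated Method for Predicting the Expressive Language Ability of Children with Autism Spectrum Disorder for CPEP3 Assessment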
182 Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech
183 Multilingual Speech Recognition Using Discrete Tokens with a Two-step Training Strategy
184 Mongolian Speech Recognition Based on Semi-Supervised Learning and Syllable Subword Modeling Units
185 Singing Melody Extraction Assisted by Video Information
186 Dynamic properties of diphthongs in Hefei Mandarin
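187 A Study of the Acoustic Features of Clipped Speech
188 A Study of the Acquisition of Chinese Pausing and Lengthening by Intermediate-Level Native Russian Speakers
189 An Analysis of the Acquisition of Consonant Initials by Learners of Chinese from Kazakhstan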
190 On the effectiveness of enrollment speech augmentation for Target Speaker Extraction
191 A Study of Tone Production in Chinese Same-Tone Sentences by Native Russian Speakers
193 Demystifying the Robustness of Deep Speaker Recognition Against Non-Speech Segments
196 An Unsupervised Domain Adaptation Method based on Distribution Alignment for Speaker Verification
199 Improving Emotion Recognition with Pre-trained Models, Multimodality, and Contextual Information
200 A Comparative Analysis of Voice Quality between Mandarin Learners and Native Speakers
202 Cross-Model Knowledge Distillation and Metadata Fusion for Respiratory Sound Classification
203 Tibetan-Chinese Speech-to-Speech Translation Based on Large Speech Models
204 A Study on the Effect of Focus on Vowel Duration and Formant in Cantonese
205 Tibetan Speech Synthesis Based on Pre-trained Mixture Alignment FastSpeech2
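206 A Study of Factors Influencing the Perception of Erhua (Rhotacization)
207 An Exploration of a Scoring Model for the Chinese Read-Aloud Fluency of Native Russian Speakers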
210 A New Parameter to Indicate the Syllable Stress Levels in Mandarin
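212 An Acoustic and Electroglottographic Study of Stop Phonation in the Amdo Dialect of Tibetan
213 A Study of Stops in the Amdo Dialect of Tibetan Based on Ultrasound Imaging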
214 Towards Reliable and Empathetic Depression-Diagnosis-Oriented Chats
215 Task-Specific Dementia Risk Detection Based on Psycholinguistic Knowledge and Linguistic Features
216 Dementia Risk Detection via Text Augmentation-based Multi-Task Learning
218 Language-independent Prosody-enhanced Speech Representations for Multilingual Speech Synthesis
219 Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
220 Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
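226 The Effect of Sample Size on Evaluating the Speaker-Discriminating Power of the Diphthong /ei/
227 The Effect of a 16-Year Span on the Acoustic Features of Young Female Mandarin Voices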
230 NDVQ: Robust Neural Audio Codec with Distribution-Based Vector Quantization
231 CTC-Assisted LLM-Based Contextual ASR
233 Construction and Analysis of a Multimodal Physiological Database of Emotional Expression in Depression