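Following rigorous review by the program committee and the reviewers, the second-round review results for this conference have been released, and acceptance notifications have been sent to authors by email. Authors are asked to check their inboxes and to submit the final version (Camera Ready Paper) within the specified deadline, in accordance with the revision requirements.
Thank you all for your support!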
List of accepted papers (including the first round)
Paper ID | Title of accepted paper |
2 | the analysis for vowels in neutral tone syllable |
3 | 面向心音识别的自监督联邦学习 |
4 | The attention-based fusion of master-auxiliary network for speech enhancement |
5 | “去1+VP+去2”中“去2”的轻化程度及语义虚实分析 |
7 | D-AGNet: A Dual-branch Network with Attention Guidance for Speech Emotion Recognition |
8 | 基于跨语言数据迁移的端到端伪造语音检测方法 |
9 | Zero-Shot Personalized Voice Synthesis with Cross-Attention Speaker Embeddings |
10 | ASD-Diff: Unsupervised Anomalous Sound Detection With Masked Diffusion Model |
11 | A longitudinal study on the acquisition of standard Chinese monophthongs by intermediate- and advanced-level Korean Learners |
12 | PadAug: Robust Speaker Verification with Simple Waveform-Level Silence Padding |
14 | A Comprehensive Data Processing Pipeline for LLM based Text-to-Speech Models |
15 | 广州话韵律边界时长模式研究 |
16 | MIXDIFF: Mixture Diffusion Model for Efficient Text-to-Speech |
17 | Analysis and Construction of Corpus Based on Kazakh Text Character Encoding |
19 | 声调音高曲线的调头处理办法初探 |
21 | 普通话与重庆话长时基频特征的时长阈限研究 |
22 | StyleSVC: Singing Voice Conversion with Multi-Scale Style Transfer |
25 | 基于互信息特征解耦的情感一致性语音转换 |
29 | 基于音高线索的普通话对比焦点感知研究 ——以阳平、去声为例 |
30 | 基于去噪扩散概率模型的对抗攻击方法 |
31 | Contrastive focus perception in Mandarin from pitch -- Take Tone 1 and Tone 3 as an example |
33 | 中级俄语母语者汉语阳平双字调声学分析 |
36 | 维吾尔语清塞音k习得的声学特征研究 |
37 | Speech emotion recognition based on multi acoustic feature fusion |
38 | 跨语言音系对比及错误分析研究 |
40 | Data Augmentation and Progressive Learning in Acoustic Echo Cancellation for Duplex conversations on Mobile Devices |
41 | Emergence of Hemispheric Asymmetries and Predictive Coding in the Neural Mechanism of Speech Perception |
42 | A pilot study on the perception of "dearing" emotional speech |
43 | Phoneme Semantic Backdoor Attacks with Multiple Task Learning for Speech Classification Task |
45 | Burmese Speech Synthesis Based on Diffusion Model |
46 | IUMSS-CETL: Low-Resource Iu Mien Speech Synthesis based on Transfer Learning |
50 | AESR: Speech Recognition With Speech Emotion Recogniting Learning |
51 | 基于脑控嵌入向量的语音分离网络 |
55 | 基于声学参数的蒙古呼麦情感表现研究 |
56 | 鄂温克语阴阳元音舌根位置的声学表现 |
57 | A Comparative Analysis of Diphthong Acquisition in Standard Chinese by Learners from ‘the Belt and Road’ |
58 | 复杂动态系统背景下邹平方言入声调变异速率实证研究 |
60 | Domain Adaptation for Front-End Module in TTS with LoRAs |
63 | 东部裕固语长短元音空间分布的声学统计分析 |
64 | ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram |
65 | Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech |
66 | M-CMGAN: Attempting to Use Mamba on Speech Enhancement |
68 | DA-KWFormer: A Domain Adaptation Network with K-Weight Transformer for Speech Emotion Recognition |
69 | Anomalous Sound Detection Using Time-Frequency Feature and Mixbatch |
70 | Improving Speaker Verification Back-end with Graph Neural Networks |
71 | Integrating Time-Frequency Domain Shallow and Deep Features for Speech-EEG Match-Mismatch of Auditory Attention Decoding |
72 | 基于注意力机制和数据过采样的酒瓶裂纹敲击异常声音检测系统 |
74 | Dual-Path Spectrogram Refinement Network for Robust Speaker Verification |
75 | Transformer-based Model for Auditory EEG Decoding |
76 | 刻意伪装场景下的说话人确认 |
77 | 基于语音谐波结构的多通道语音增强网络 |
78 | 基于文化分析的跨语言语音情感识别 |
79 | 布朗语学龄儿童国家通用语声调习得研究 |
80 | 融合多源知识的文物描述自动生成方法研究 |
81 | Enhancing Transducers for Robust Keyword Spotting by Duration Modeling |
82 | A Backend-friendly On-device Multi-channel Speech Enhancement System with IPD and PHM |
83 | SESNet: A Speech Enhancement and Separation Network in Noisy Reverberant Environments |
85 | 普通话双音节词连上变调的信息机制研究 |
86 | 基于解耦学习的鲁棒说话人验证研究 |
87 | A Neural Denoising Vocoder for Clean Waveform Generation from Noisy Mel-Spectrogram based on Amplitude and Phase Predictions |
89 | BER: Balanced Error Rate For Speaker Diarization |
90 | Adaptive Context Biasing in Transformer-based ASR Systems |
91 | Baseline Systems for Chinese Continuous Visual Speech Recognition Challenge 2024 |
92 | Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations |
93 | 潮州话入声调的韵律边界效应 |
96 | Enhancing Mispronunciation Detection with Multi-Speaker Text-to-Speech and Mixture-of-Experts Network |
97 | A study on the allocations of prosody boundaries in L2 Mandarin speech by Vietnamese learners based on dependency syntax |
99 | MHAN: Bottleneck Fusion Model Based on Hybrid Attention Network for Multimodal Emotion Recognition |
100 | Paraformer-v2: An improved single step non-autoregressive transformer for noise-robust speech recognition |
101 | Sound Zone Control Based On A Kronecker Second-order Tensor Decomposition |
102 | 多民族语言语音到语音翻译研究进展综述 |
103 | 面向心理治疗的共情对话系统 |
104 | A Brief Survey on the Explainability for Deep Speech Processing Models |
105 | Are Transformers in pretrained LM A Good ASR Encoder? An Empirical Study |
107 | 基于统计分类的一级甲等普通话五度调值划分研究 |
108 | 俄罗斯学生普通话辅音/t/、/tɕ/的感知同化与区分 |
111 | MCDubber: Multimodal Context-Aware Expressive Video Dubbing |
114 | 面向说话人识别的最近邻惩罚圆损失函数 |
115 | TeleSpeechPT: Large-Scale Chinese Multi-Dialect And Multi-Accent Speech Pre-Training |
116 | Investigation into the Impact of Speaker Adversarial Perturbation on Speech Recognition |
117 | Speaker extraction with verification of present and absent target speakers |
118 | Exploring Discrete Tokens Suitable for Speech Synthesis |
119 | Stage-Wise and Prior-Aware Neural Speech Phase Prediction |
120 | Pruning and Quantization Enhanced Densely Connected Neural Network for Efficient Acoustic Echo Cancellation |
122 | MDCTCodec: A Lightweight MDCT-based Neural Audio Codec towards High Sampling Rate and Low Bitrate Scenarios |
123 | LightAC: Lightweight Accent Conversion via Module-Wise Distillation |
125 | 基于双对齐和对比学习的多模态情感识别 |
126 | 越南学生汉语-越南语声调的感知同化模式 |
127 | 德昂语的焦点韵律实现 |
128 | 基于神经网络的重音提取及重音描述提示生成 |
129 | Improved DOA Estimation of Sound Source of Small Amplitudes Using a Single Acoustic Vector Sensor |
130 | 基于通道注意力特征融合的异常音检测方法 |
133 | 融合语种信息的端到端多语种语音识别方法 |
135 | Investigation on Training Strategy for Cross-Modal Large Language Models with Speech and Text |
136 | XCB: an effective contextual biasing approach to bias cross-lingual phrases in speech recognition |
137 | A local-aggregation based method for robust speaker verification |
138 | An acoustic study of Mandarin Chinese vowels produced by Uyghur-speaking learners |
139 | ExARN: Target Speaker Extraction with Attentive Recurrent Networks |
141 | Emotional Speech Synthesize via Visual Context Perception |
143 | Tone Perception by Putonghua-Learning Preschool Children in South Xinjiang Uyghur Autonomous Region |
145 | 基于多流序列和对比学习的音乐生成研究 |
147 | 宜人性与开放性人格特质对情绪语音特征的影响 |
148 | 外倾性和神经质人格特质对情绪语音特征的影响 |
150 | 单音节阳平声调拐点位置与音节结构关系考察 |
151 | 基于注意力机制软切分的发音偏误检测探究 |
152 | Study on Prosodic Disambiguation of VP/NP syntactic structure by Chinese EFL Learners |
153 | An electroencephalogram-based study of neural responses to imagined speech in Mandarin |
154 | 基于语音病理特征的不流畅语音片段标注系统 |
155 | 基于脑电信号的汉语普通话语音分类研究 |
156 | A Speech Corpus of Putonghua-Learning Preschoolers From the Uygur Ethnic Group in South Xinjiang Uygur Autonomous Region |
157 | Evaluation of Data Inconsistency for Multi-modal Sentiment Analysis |
159 | 基于尖峰特征的口音识别和语音识别多任务学习方法 |
160 | A Study of Phonetic Differences in Hankou Dialect in the late Qing Dynasty -- Based on the records of A Chinese-English Dictionary and The Hankow Syllabary |
163 | Efficient Singular Spectrum Mode Ensembler Capable of Extracting Wide-band Components in Spectrum Overlapping Scenarios |
167 | 俄语母语者普通话舌尖元音[ɿ]、[ʅ]产出训练研究 |
168 | 普通话擦音的空气动力学特性 |
171 | LDMME: Latent Diffusion Model for Music Editing |
172 | 基于前音节锚点音高规整的重音和间断特征考察 |
174 | Analyzing the Improvement of Supplementary Features on Voice Conversion Using Disentangled Representation Learning |
178 | 基于计划梯度反转的说话人无关韵律表征研究 |
180 | 基于Transformer编码改进GCRN网络的单通道语音增强方法 |
181 | 面向CPEP3评估的孤独症谱系障碍儿童语言表达能力自动化预测方法 |
182 | Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech |
183 | Multilingual Speech Recognition Using Discrete Tokens with a Two-step Training Strategy |
184 | Mongolian Speech Recognition Based on Semi-Supervised Learning and Syllable Subword Modeling Units |
185 | 视频信息辅助的歌声旋律提取 |
186 | Dynamic properties of diphthongs in Hefei Mandarin |
187 | 削波语音声学特征研究 |
188 | 中级水平俄语母语者汉语停延习得研究 |
189 | 哈萨克斯坦汉语学习者辅音声母习得分析 |
190 | On the effectiveness of enrollment speech augmentation for Target Speaker Extraction |
191 | 俄语母语者汉语同声调句中的声调产出研究 |
193 | Demystifying the Robustness of Deep Speaker Recognition Against Non-Speech Segments |
196 | An Unsupervised Domain Adaptation Method based on Distribution Alignment for Speaker Verification |
199 | Improving Emotion Recognition with Pre-trained Models, Multimodality, and Contextual Information |
200 | 普通话学习者与母语者的嗓音质量对比分析 |
202 | Cross-Model Knowledge Distillation and Metadata Fusion for Respiratory Sound Classification |
203 | Tibetan-Chinese Speech-to-Speech Translation Based on Large Speech Models |
204 | A Study on the Effect of Focus on Vowel Duration and Formant in Cantonese |
205 | Tibetan Speech Synthesis Based on Pre-trained Mixture Alignment FastSpeech2 |
206 | 儿化感知影响因素研究 |
207 | 俄语母语者汉语朗读流利度评分模型探究 |
210 | A New Parameter to Indicate the Syllable Stress Levels in Mandarin |
212 | 藏语安多方言塞音发声的声学和电声门图研究 |
213 | 基于超声成像的藏语安多方言塞音研究 |
214 | Towards Reliable and Empathetic Depression-Diagnosis-Oriented Chats |
215 | Task-Specific Dementia Risk Detection Based on Psycholinguistic Knowledge and Linguistic Features |
216 | Dementia Risk Detection via Text Augmentation-based Multi-Task Learning |
218 | Language-independent Prosody-enhanced Speech Representations for Multilingual Speech Synthesis |
219 | Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation |
220 | Amphion: An Open-Source Audio, Music and Speech Generation Toolkit |
226 | 样本量大小对双元音/ei/话者区分能力评判的影响 |
227 | 16年跨度对普通话青年女声声学特征的影响 |
230 | NDVQ: Robust Neural Audio Codec with Distribution-Based Vector Quantization |
231 | CTC-Assisted LLM-Based Contextual ASR |
233 | 抑郁症情感表达的多模态生理数据库构建及分析 |