Tutorial 1: Applications of Deep Generative Models in Speech Synthesis
Speaker: Xu Tan
Speaker biography: Xu Tan is a Senior Research Manager at Microsoft Research Asia. His research interests include deep learning, natural language/speech/music processing, and AI content generation, and he has published over 100 papers at academic conferences. The machine translation and speech synthesis systems he developed have won multiple competition championships and reached human parity on academic benchmarks. His research, including the pre-trained language model MASS, the speech synthesis models FastSpeech/NaturalSpeech, and the AI music project Muzic, has attracted wide attention in the community, and several of his research results have been deployed in Microsoft products (Azure, Bing, etc.). He serves as a program committee member, senior program committee member, and action editor for multiple academic conferences and journals, including NeurIPS, AAAI, ICASSP, and TMLR. Homepage: https://tan-xu.github.io/.
Abstract: In text-to-speech (TTS) synthesis, speech data follow a probability distribution conditioned on the given text, which can be modeled with generative models. With the recent development of deep learning, deep generative models (e.g., autoregressive models, GANs, VAEs, flows, and diffusion models) have been widely applied to TTS and have achieved significant quality improvements. This talk first briefly introduces the background of TTS and deep generative models, then presents representative works that apply deep generative models to TTS, compares the strengths and weaknesses of the different generative models, and finally discusses potential research directions for deep generative models in TTS.
Tutorial 2: Speaker Recognition Methods for Complex Scenarios
Speakers: Dong Wang, Lantian Li
Speaker biographies:
Dong Wang received his Ph.D. from the University of Edinburgh and is an Associate Researcher at Tsinghua University, an IEEE Senior Member, and an APSIPA Distinguished Lecturer. He has published over 150 papers in speech signal processing, won four best paper awards, and has over 4,000 Google Scholar citations. He has authored books including 《人工智能》 (Artificial Intelligence) and 《机器学习导论》 (Introduction to Machine Learning). The THCHS30 database he released, together with its accompanying Kaldi recipes, constituted the first fully open-source Chinese speech recognition system.
Lantian Li received his Ph.D. from Tsinghua University, where he also completed postdoctoral research, and is an Associate Professor at Beijing University of Posts and Telecommunications and a member of the APSIPA SLA committee. He has published over 50 papers in speech recognition, speaker recognition, and related areas, won two best paper awards, and has authored books including Robust Speaker Recognition and 《语音识别基本法》 (Fundamentals of Speech Recognition). Dr. Li is a principal initiator of the multi-scenario speaker recognition database CNCeleb, and the founder and organizer of the CNSRC multi-scenario speaker verification challenge.
Abstract: With advances in deep learning and the accumulation of data, speaker recognition has made great progress in recent years; however, performance still degrades significantly in real-world applications. Speaker recognition in complex real-world scenarios has therefore become a current research hotspot, covering topics such as noise robustness, cross-device recognition, and cross-scenario recognition. This tutorial introduces the fundamentals of speaker recognition and the current mainstream methods based on deep neural networks, then summarizes frontier techniques for handling complex scenarios, and presents technical solutions that have proven effective in the cross-scenario speaker verification challenge.
Tutorial 3: Quantum Machine Learning: Theoretical Foundations and Applications on NISQ Devices
Speaker: Jun Qi
Speaker biography: Dr. Jun Qi received his Ph.D. from the School of Electrical and Computer Engineering at the Georgia Institute of Technology, Atlanta, GA, in 2022, advised by Prof. Chin-Hui Lee and Prof. Xiaoli Ma. He is currently an Assistant Professor in the Department of Electronic Engineering at Fudan University. Previously, he obtained Master's degrees in Electrical Engineering from the University of Washington, Seattle, and from Tsinghua University, Beijing, in 2013 and 2017, respectively. He was also a research intern at the Deep Learning Technology Center of Microsoft Research, Redmond, WA; Tencent AI Lab, WA; and MERL, MA, USA. Dr. Qi won first prize in the Xanadu AI Quantum Machine Learning Competition 2019, and his ICASSP paper on quantum speech recognition was nominated as a best paper candidate in 2022. In addition, he gave two tutorials on Quantum Neural Networks for Speech and Language Processing, at IJCAI'21 and ICASSP'22.
Abstract: Quantum computing has undergone rapid development in recent years: from its first conceptualization in the 1980s and early hardware proofs of principle in the 2000s, quantum computers can now be built with hundreds of qubits. While the technology remains in its infancy, the fast progress of quantum hardware has led many to assert that so-called Noisy Intermediate-Scale Quantum (NISQ) devices could outperform conventional computers in the near future. Remarkably, the Variational Quantum Eigensolver (VQE) has been put forth as one of the most promising algorithms on NISQ devices, because it requires only a small number of qubits and exhibits some degree of noise resilience. VQE-style mechanisms are often cast as hybrid algorithms that combine a variational quantum classifier (VQC) with classical machine learning and signal processing models. Moreover, quantum kernel algorithms realize non-linearity in a quantum feature space and have been regarded as an alternative to the VQC for quantum machine learning. This tutorial therefore surveys state-of-the-art quantum machine learning algorithms, investigating VQC-based quantum algorithms in depth and exploring their applications to machine learning and signal processing problems.