Youth Forum

Youth Forum Speakers
Wanxiang Che (车万翔)
Biography: Wanxiang Che is a tenured professor and doctoral supervisor at the Faculty of Computing, Harbin Institute of Technology, deputy dean of its Artificial Intelligence Research Institute, a national-level young talent, a Longjiang Scholar (Young Scholar), and a former visiting scholar at Stanford University. He is a council member of the Chinese Information Processing Society of China, deputy director and secretary-general of its Computational Linguistics Technical Committee, and an executive committee member and secretary-general of the Asia-Pacific Chapter of the Association for Computational Linguistics (AACL). He currently leads several research projects, including a key project of the National Natural Science Foundation of China and a subtopic of the 2030 "New Generation Artificial Intelligence" major program. He authored the book 《自然语言处理:基于预训练模型的方法》 (Natural Language Processing: A Pretrained-Model Approach) and received an AAAI 2013 Best Paper Honorable Mention. The Language Technology Platform (LTP) developed under his leadership has been licensed for paid use by companies including Baidu, Tencent, and Huawei. He won the first prize of the Heilongjiang Provincial Science and Technology Progress Award in 2016 (ranked second) and the Heilongjiang Provincial Young Science and Technology Award in 2020.
Title: Principles, Implementation, and Applications of Large Language Models
Abstract: Large language models (LLMs) such as ChatGPT have shown astonishing capabilities in language understanding, generation, and knowledge reasoning, revealing a possible path toward solving natural language processing, a core problem of cognitive intelligence. They are widely regarded as a solid step toward artificial general intelligence and may even take over much human work. What scientific problem, then, do LLMs actually solve? How do they solve it, and how can they be applied across domains and industries? Drawing on our own analysis and our experience applying LLMs in related fields, this talk will offer partial answers to these questions.
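For readers who want a concrete anchor for the "principles" part of the talk, here is a minimal sketch of the autoregressive next-token-prediction objective that underlies LLMs. It assumes the Hugging Face transformers library; the "gpt2" checkpoint is only an illustrative stand-in for a modern LLM.

```python
# Minimal sketch of the core LLM principle: autoregressive next-token
# prediction. "gpt2" is an illustrative stand-in, not the talk's model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Natural language processing is"
inputs = tokenizer(prompt, return_tensors="pt")

# The model assigns a probability to every possible next token.
with torch.no_grad():
    logits = model(**inputs).logits            # (batch, seq_len, vocab)
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

# Greedy decoding: repeatedly append the most probable next token.
output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```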

Li Liu
Biography: Dr. Li Liu is an assistant professor at the Artificial Intelligence Thrust, Information Hub, Hong Kong University of Science and Technology (Guangzhou). She obtained her Ph.D. degree in 2018 from Gipsa-lab, Université Grenoble Alpes, France. Her main research interests include multi-modal audio-visual speech processing, few-shot transfer learning, AI robustness, and AI for healthcare. As first or corresponding author, she has published over 30 papers in top journals and conferences in related fields, including IEEE TPAMI, IEEE TMM, IEEE TMI, NeurIPS, ICCV, ACM MM, ECCV, MICCAI, and ICASSP. In 2017, she won the French Sephora Berrebi Scholarship for Women in Mathematics and Computer Science (four awarded worldwide: two in France and two in Israel), and in the same year she was awarded a young-researcher scholarship by the French Speech Association. Her research has been funded by several projects, including the National Natural Science Foundation of China Youth Fund, the Guangdong Provincial Natural Science Foundation Youth Fund, the Alibaba Innovation Research Fund, and the Tencent Technology Venture Philanthropy Program. She received the Tencent AI Lab Rhino-Bird Grant for 2023, and in 2022 two of her papers were selected as Shenzhen Excellent Science and Technology Academic Papers.
Title: AI for Cross-Modal Interaction: Speech-Driven Human Body Language Generation
Abstract: Speech/text-driven human body pose generation is a critical research direction in computer vision, aiming to transform multimodal information (speech and text) into facial expressions, lip movements, and hand gestures. This technology enables vivid communication between virtual or animated characters and audiences, and finds wide application in multimodal speech gesture and facial expression generation, robotics, digital avatars, and the metaverse. Traditional pose generation methods have limitations, but in recent years the emergence of deep learning and large-scale pretrained models has provided new opportunities to address these challenges. Large-scale pretrained models offer several advantages in speech/text-driven pose generation, including improved feature extraction, semantic alignment, context modeling, and multimodal fusion. In this talk, I will delve into our research on generating high-quality human body pose movements from speech and text inputs.
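As a rough illustration of the task setup (not the speaker's actual model), here is a minimal PyTorch sketch that maps a sequence of audio features to a sequence of per-frame pose vectors; the Speech2Pose class and all dimensions are hypothetical.

```python
# Deliberately simplified sketch of speech-to-pose mapping: a recurrent
# encoder over audio features with a per-frame linear pose head.
import torch
import torch.nn as nn

class Speech2Pose(nn.Module):
    def __init__(self, audio_dim=80, hidden_dim=256, pose_dim=165):
        super().__init__()
        self.encoder = nn.GRU(audio_dim, hidden_dim, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, pose_dim)  # per-frame pose

    def forward(self, audio_feats):            # (B, T, audio_dim)
        h, _ = self.encoder(audio_feats)       # (B, T, hidden_dim)
        return self.decoder(h)                 # (B, T, pose_dim)

model = Speech2Pose()
mel = torch.randn(2, 100, 80)    # e.g. 100 frames of 80-bin log-mels
poses = model(mel)               # predicted pose sequence
# Placeholder regression loss against (random) ground-truth poses.
loss = nn.functional.mse_loss(poses, torch.randn_like(poses))
```

Real systems add temporal smoothness terms and adversarial or diffusion-based decoders; the point here is only the sequence-to-sequence framing.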

Chenyang Zhu (朱晨阳)
Biography: Chenyang Zhu is an associate researcher at the School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, and was a postdoctoral researcher at the Massachusetts Institute of Technology. His research covers signal processing, underwater acoustics, machine learning, and remote sensing. He has published a number of papers in leading international remote sensing and ocean acoustics journals such as Remote Sensing, ICES Journal of Marine Science, Journal of the Acoustical Society of America, and Journal of Marine Science and Engineering, and has presented at well-known international conferences including meetings of the Acoustical Society of America, IEEE OCEANS, and the IEEE International Conference on Machine Learning and Applications.
Title: Wide-Area Passive Acoustic Sensing of Multiple Underwater Targets in the Ocean
Abstract: Under the strategy of building a strong maritime nation, effective passive acoustic sensing of diverse underwater targets over wide ocean areas is particularly important. This talk centers on the sensing of ship noise and biological sound in the ocean, analyzing underwater survey data from the North Atlantic (including the Gulf of Maine off the U.S. East Coast and Norwegian waters). For ship noise, targets are detected with coherence- and energy-based features, localized by triangulation from a moving array, and classified with unsupervised learning; effective acoustic signatures are then extracted to estimate parameters of a ship's propulsion system. For marine life, ocean acoustic waveguide principles are combined with active and passive sonar signal processing to sense various marine organisms and map the acoustic soundscape of wide-area underwater ecosystems. In addition, a wide-area underwater acoustic remote sensing observation system is built by developing a large-aperture towed passive sonar array and real-time signal processing software.
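One ingredient of the described pipeline, coherence-based detection, can be sketched in a few lines: machinery tones from a ship are coherent across hydrophones while ambient noise largely is not. The example below assumes two hydrophone channels and uses scipy.signal.coherence; the 60 Hz tone, sampling rate, and 0.5 threshold are all illustrative.

```python
# Hedged sketch of coherence-based ship-noise detection between two
# hydrophone channels sharing one tonal source.
import numpy as np
from scipy.signal import coherence

fs = 8000                                   # assumed sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)
ship = 0.5 * np.sin(2 * np.pi * 60 * t)     # illustrative 60 Hz tone
x = ship + np.random.randn(t.size)          # hydrophone 1
y = ship + np.random.randn(t.size)          # hydrophone 2 (shared source)

f, Cxy = coherence(x, y, fs=fs, nperseg=4096)
detected = f[Cxy > 0.5]                     # bins with high coherence
print("coherent tones near (Hz):", detected[:5])
```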

Jiangyan Yi (易江燕)
Biography: Jiangyan Yi is an associate researcher at the State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, a recipient of the National Science Fund for Excellent Young Scholars, and a former senior algorithm engineer at Alibaba iDST (DAMO Academy). She received her Ph.D. from the Institute of Automation, Chinese Academy of Sciences in 2018. Her main research interests are speech information processing and speech generation and detection. She has led eight projects, including National Natural Science Foundation of China grants, a subtopic of a Ministry of Science and Technology major program, and international cooperation projects. She has repeatedly served as Area Chair and Session Chair for the top international conferences Interspeech and ICASSP, launched the international Audio Deepfake Detection challenges ADD 2022 and ADD 2023 at the major conferences IJCAI and ICASSP, and served as publication chair of APSIPA ASC 2019 and NCMMSC 2019. She has published over 70 papers in important international journals and conferences such as IEEE TASLP, ICML, ICASSP, and Interspeech, holds 50 granted invention patents (including 9 international patents), has won paper awards and competition championships at major conferences more than 10 times, and received the Special Prize for Technological Invention of the 2022 Wu Wenjun AI Science and Technology Award. Her research results have been deployed in key national departments and at well-known companies including Baidu, Huawei, Tencent, China Mobile, and China Construction Bank.
Title: Detection, Forensics, and Source Tracing of Deep Generative Speech
Abstract: In recent years, deep-learning-based speech synthesis and voice conversion have advanced rapidly, and the generated speech has reached human-level quality. This technology is double-edged, however: while it benefits people, it inevitably introduces security risks, and malicious use of deep synthesis and conversion can do great harm to society, so countermeasures are urgently needed. This talk will survey the state of research on detecting, forensically analyzing, and tracing the source of deep generative speech, focus on the strengths and weaknesses of existing key techniques, and further discuss possible ideas and methods for tackling the main challenges.
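For orientation, here is a minimal sketch of a generic spoofed-speech countermeasure baseline (not the speaker's system): a small CNN that scores log-mel spectrograms as bona fide vs. spoofed. It assumes PyTorch and torchaudio, and the architecture and dimensions are illustrative only.

```python
# Generic two-class countermeasure sketch: log-mel features + tiny CNN.
import torch
import torch.nn as nn
import torchaudio

melspec = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64)

classifier = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 2),                    # bona fide vs. spoofed
)

wav = torch.randn(1, 16000)              # 1 s of placeholder audio
feats = torch.log(melspec(wav) + 1e-6).unsqueeze(1)  # (B, 1, mels, T)
scores = classifier(feats)               # unnormalized class scores
print(scores.softmax(dim=-1))
```

Production countermeasures replace the toy CNN with stronger front ends and architectures; source tracing additionally turns the binary head into a multi-class one over candidate generators.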

Yi Luo (罗艺)
Biography: Yi Luo is a senior researcher at Tencent AI Lab. He received his Ph.D. from Columbia University in 2021, and his research interests span speech front-end processing, music signal processing, microphone array processing, and deep learning. He has published more than 40 papers in top journals and conferences in the speech field (e.g., TASLP, ICASSP, Interspeech) and received the 2021 IEEE SPS Best Paper Award.
Title: Real-World Speech Front-Ends: From Data Simulation to Model Design
Abstract: When speech front-end systems are deployed in the real world, they often suffer from unrealistic data simulation, mismatched training pipelines, or non-robust model designs. This talk presents Tencent AI Lab's progress in training real-world speech front-end systems, including a more realistic data simulation pipeline and more general audio front-end model designs, and demonstrates the systems' performance on real-world tasks such as music separation, dereverberation, echo cancellation, single-channel speech enhancement, and multi-channel sound-zone extraction.
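The standard simulation recipe that such work refines can be sketched in a few lines: convolve clean speech with a room impulse response (RIR) and mix in noise at a target SNR. The simulate helper below is hypothetical and all signals are random placeholders.

```python
# Baseline front-end data simulation: reverberation + noise at a set SNR.
import numpy as np
from scipy.signal import fftconvolve

def simulate(clean, rir, noise, snr_db):
    reverberant = fftconvolve(clean, rir)[: len(clean)]
    # Scale the noise so the mixture reaches the requested SNR.
    speech_pow = np.mean(reverberant ** 2)
    noise_pow = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_pow / (noise_pow * 10 ** (snr_db / 10)))
    return reverberant + scale * noise[: len(clean)]

fs = 16000
clean = np.random.randn(fs)           # 1 s placeholder "speech"
rir = np.exp(-np.linspace(0, 8, fs // 4)) * np.random.randn(fs // 4)
noise = np.random.randn(fs)
mixture = simulate(clean, rir, noise, snr_db=10)
```

The gap the talk targets is precisely that measured RIRs, devices, and noise fields behave differently from such synthetic ones.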

Qiuqiang Kong
Biography: Qiuqiang Kong received his Ph.D. degree from the University of Surrey, Guildford, UK, in 2019. Following his Ph.D., he joined ByteDance as a research scientist. His research topics include the classification, detection, separation, and generation of general sounds and music. He was listed among the top 2% of scientists in 2021 in the “Updated science-wide author databases of standardized citation indicators”. He is known for developing pretrained audio neural networks (PANNs) for audio tagging, for which he received the IEEE SPS Young Author Best Paper Award in 2023, and for transcribing GiantMIDI-Piano, the largest piano MIDI dataset in the world. He won the detection and classification of acoustic scenes and events (DCASE) challenge in 2017. He has co-authored over 50 papers in journals and conferences, including IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), ICASSP, INTERSPEECH, IJCAI, DCASE, EUSIPCO, and LVA-ICA, and as of Feb. 2023 had been cited 2,588 times with an h-index of 27. He is a frequent reviewer for well-known international journals and conferences, including TASLP, TMM, SPL, TKDD, JASM, EURASIP, Neurocomputing, Neural Networks, ISMIR, and CSMT. He assisted with organizing LVA-ICA 2018 in Guildford, UK, and the DCASE 2018 Workshop in Woking, UK. He serves as a co-editor for the journal Frontiers in Signal Processing.
Title: AI for Sound Understanding: Classification, Detection, and Separation
Abstract: AI for sound understanding has been a popular topic in recent years. Audio pattern recognition is an essential topic in machine learning, including audio tagging, acoustic scene classification, music classification, speech emotion classification, and sound event detection. Previous audio pattern recognition systems were built on specific datasets with limited durations. We propose pretrained audio neural networks (PANNs) (IEEE SPS Young Author Best Paper 2023), trained on the large-scale AudioSet dataset, to address the general audio pattern recognition problem. PANNs have achieved state-of-the-art performance on several downstream audio pattern recognition tasks. Beyond PANNs, we propose a weakly labelled learning framework that addresses sound event detection and source separation using only large-scale weakly labelled data. We propose a universal source separation system for the computational auditory scene analysis (CASA) problem that automatically detects and separates arbitrary sounds, and we further propose using natural language queries to specify which sounds to separate. We apply the proposed sound understanding techniques to music tasks, building a state-of-the-art piano transcription system and GiantMIDI-Piano, the largest piano dataset in the world. Finally, we discuss future directions for sound understanding in relation to vision, language, robotics, and security.
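In the spirit of PANNs (though not the released model), here is a minimal sketch of AudioSet-style tagging: a CNN over log-mel spectrograms with a sigmoid multi-label head. It assumes PyTorch and torchaudio; the layer sizes are illustrative, while the 527-class output and 32 kHz / 64-mel front end follow the AudioSet/PANNs convention.

```python
# Hedged sketch of AudioSet-style multi-label audio tagging.
import torch
import torch.nn as nn
import torchaudio

melspec = torchaudio.transforms.MelSpectrogram(sample_rate=32000, n_mels=64)

tagger = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 527), nn.Sigmoid(),   # per-class presence probability
)

wav = torch.randn(1, 32000)             # 1 s placeholder clip
feats = torch.log(melspec(wav) + 1e-6).unsqueeze(1)  # (B, 1, mels, T)
probs = tagger(feats)                   # (1, 527) tag probabilities
print(probs.topk(3))
```

Language-queried separation then conditions a separator on an embedding of a text query in place of the fixed class head, which is the step the abstract describes last.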