Yucheng Wang
I am a second-year master's student at ETH Zürich, majoring in Computer Science with a specialization in Machine Intelligence. My research focuses on multimodal large language models, with a particular emphasis on audio. I received my Bachelor's degree in Computer Science from Columbia University and City University of Hong Kong in 2024 through the Joint Bachelor's Degree Program between the two institutions. Previously, I interned at Tencent AI Lab and AISpeech.
MSc CS @ ETH Zürich
Multimodal LLMs (Speech/Audio)
Suzhou · Hong Kong · New York · Zürich
Selected Publications
COMPOSE AND FUSE: Revisiting the Foundational Bottlenecks in Multimodal Reasoning
Accepted at ICLR 2026
A Survey on Speech Large Language Models for Understanding
IEEE Journal of Selected Topics in Signal Processing (JSTSP), 2025
Making Sense of Post-match Fan Behaviors in the Online Football Communities
Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems
Selected Projects
Course projects and applied research systems.
Identity-Preserving Comic Story Generator
Course project for COMS4995 Applied Computer Vision at Columbia University. The system combines StableIdentity with Stable Diffusion XL to generate personalized comic stories whose images preserve the subject's identity.
Refined Prototypical Network for Hierarchical Few-shot Image Classification
Course project for COMS4995 Neural Networks & Deep Learning at Columbia University. We refined the prototypical network so that it generalizes to unseen classes in hierarchical few-shot image classification.
Reddit Visualizer
An extended visualization system built for my Final Year Project on user behavior patterns in online communities. This work supports the CHI 2023 publication and was selected as a CityU Outstanding Academic Paper in 2022.
Mahjong Legend [麻雀传奇]
Group project for CS3343 Software Engineering Practice at CityU. A single-player mahjong game following Hong Kong rules, designed for beginners, in which the player competes against an AI opponent.
Message Animation: A Facial Animation-Based Instant Messaging App
Group project for CS3483 Multimodal Interface Design at CityU. The app converts voice or text messages into expressive cartoon animations for richer messaging.
Daily Hang Seng Index Trend Prediction Using Deep Learning
A mentored research study on predicting daily Hang Seng Index trends using LSTM networks and their variants.
Music
Original compositions and experiments.
Escape Nowhere
Original beat. Wake up from the indulgence of your dreams and face the repression of your real life.