Understanding Neural Networks
Analysis of neural network representations and capabilities
Despite significant advances in speech technology, much remains unknown about what neural models truly learn and what they fail to capture. Understanding the representations and limitations of these models is crucial for designing robust, interpretable systems. Through this analysis, we aim to uncover how neural networks process information and how these insights can improve their performance and fairness across diverse populations. My research focuses specifically on computational phonetics and phonology, exploring how these models encode fine-grained linguistic details and variability.
In (Choi* & Yeo*, 2022), we analyzed what phonetic and phonemic information is encoded in wav2vec 2.0 variants. Follow-up studies are underway, so please stay tuned!
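As a rough illustration of the kind of layer-wise probing used in such analyses, the sketch below fits a linear phone classifier on frame-level wav2vec 2.0 hidden states. The checkpoint name, the random waveform, and the random phone labels are all placeholders for illustration, not the actual experimental setup of the paper.

```python
import torch
from transformers import Wav2Vec2Model
from sklearn.linear_model import LogisticRegression

# Load a pretrained wav2vec 2.0 model (checkpoint name is illustrative).
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
model.eval()

# A one-second, 16 kHz random waveform stands in for a real utterance here.
waveform = torch.randn(1, 16000)

# Request hidden states from every transformer layer.
with torch.no_grad():
    outputs = model(waveform, output_hidden_states=True)

# hidden_states is a tuple: the CNN feature projection plus each transformer layer,
# each with shape (batch, frames, hidden_dim).
layer_reps = outputs.hidden_states

# Placeholder frame-level phone labels aligned to the model's ~20 ms frames;
# in a real study these would come from forced alignments.
num_frames = layer_reps[0].shape[1]
labels = torch.randint(0, 40, (num_frames,)).numpy()  # e.g., 40 phone classes

# Fit a linear probe per layer; probe accuracy gives a rough indication of how
# linearly decodable phone identity is at each depth.
for i, rep in enumerate(layer_reps):
    X = rep[0].numpy()
    probe = LogisticRegression(max_iter=1000).fit(X, labels)
    print(f"layer {i:2d}: probe accuracy = {probe.score(X, labels):.3f}")
```

In practice, probes of this kind are trained and evaluated on held-out aligned speech rather than a single utterance; the sketch only shows the mechanics of extracting per-layer representations and attaching a simple classifier.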
In (Huang et al., 2024), the CMU Linguistics team, consisting of Kwanghee Choi, Kalvin Chang, and myself, proposed a range of tasks covering phonetics, phonology, and prosody, focusing on how well speech foundation models handle linguistic theory-based tasks.
References
2024
- Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks. ICLR, 2024.
2022
- Opening the black box of wav2vec feature encoder. arXiv preprint arXiv:2210.15386, 2022.