Multimodal and Social Grounding of Speech Language Models for Studying Human Language Acquisition
ABG-139236
ADUM-75091
Thesis topic
2026-05-22 | Other public funding
Université Grenoble Alpes
Saint Martin d'Hères cedex - Auvergne-Rhône-Alpes - France
- Computer science
speech, natural language processing, AI, cognition, computer vision
Topic description
This PhD project aims to investigate how multimodal and social interactions contribute to human language acquisition through the development of grounded Speech Language Models (SpeechLMs) trained directly from raw speech. The first objective is to study how realistic audiovisual environments can support speech segmentation, lexical discovery, and the emergence of robust speech representations. The second objective is to model language acquisition as an interactive child–caregiver learning process, where communicative feedback and adaptive social behaviors guide lexical learning. Finally, an exploratory direction will investigate the integration of these models into humanoid robots to study grounded communication in real-world interaction settings.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Thesis start date: 1 October 2026
Web: https://ultraspeech.com/devai/Sujet_Thèse_DevAI_Speech___Multimodal_and_Social_Grouding_of_SpeechLM.pdf
Funding category
Other public funding
Funding further details
ANR (French National Research Agency): funding from a research funding agency
Presentation of host institution and host laboratory
Université Grenoble Alpes
Institution awarding doctoral degree
Université Grenoble Alpes
Graduate school
220 EEATS - Electronique, Electrotechnique, Automatique, Traitement du Signal (Electronics, Electrical Engineering, Automatic Control, Signal Processing)
Candidate's profile
Applicants should hold a Master's degree (or equivalent) in one or more of the following fields: natural language and speech processing, computer vision, computational linguistics, computer science, data science, machine learning, or related areas. Good programming skills in Python and experience with deep learning frameworks such as PyTorch are expected.
The candidate should also demonstrate a strong interest in interdisciplinary research at the intersection of artificial intelligence, speech technologies, and cognitive science (an interest in bridging AI and human cognition is highly desirable).
Strong communication and organizational skills are important, as the PhD student will be expected to work collaboratively within an interdisciplinary research environment and to participate actively in scientific dissemination activities. A good level of spoken and written English is required, including the ability to present research results clearly at conferences and to write scientific publications.
2026-06-16