Multimodal and Social Grounding of Speech Language Models for Studying Human Language Acquisition

Ref. ABG-139236 ADUM-75091	Thesis topic
2026-05-22		Other public funding

Université Grenoble Alpes

Workplace

Saint Martin d'Hères cedex - Auvergne-Rhône-Alpes - France

Topic title

Multimodal and Social Grounding of Speech Language Models for Studying Human Language Acquisition

Scientific expertise

Computer science

Keywords

speech, natural language processing, AI, cognition, computer vision

Topic description

This PhD project aims to investigate how multimodal and social interactions contribute to human language acquisition through the development of grounded Speech Language Models (SpeechLMs) trained directly from raw speech. The first objective is to study how realistic audiovisual environments can support speech segmentation, lexical discovery, and the emergence of robust speech representations. The second objective is to model language acquisition as an interactive child–caregiver learning process, where communicative feedback and adaptive social behaviors guide lexical learning. Finally, an exploratory direction will investigate the integration of these models into humanoid robots to study grounded communication in real-world interaction settings.

Thesis start date: 1 October 2026
Web: https://ultraspeech.com/devai/Sujet_Thèse_DevAI_Speech___Multimodal_and_Social_Grouding_of_SpeechLM.pdf

Funding category

Other public funding

Funding further details

ANR (funding from research funding agencies)

Presentation of host institution and host laboratory

Université Grenoble Alpes

Institution awarding doctoral degree

Université Grenoble Alpes

Graduate school

220 EEATS - Electronique, Electrotechnique, Automatique, Traitement du Signal

Candidate's profile

Applicants should hold a Master's degree (or equivalent) in one or several of the following fields: natural language and speech processing, computer vision, computational linguistics, computer science, data science, machine learning, or related areas. Good programming skills in Python and experience with deep learning frameworks such as PyTorch are expected. The candidate should also demonstrate a strong interest in interdisciplinary research at the intersection of artificial intelligence, speech technologies, and cognitive science (an interest in bridging AI and human cognition is highly desirable). Strong communication and organizational skills are important, as the PhD student will be expected to work collaboratively within an interdisciplinary research environment and actively participate in scientific dissemination activities. A good level of spoken and written English is required, including the ability to present research results clearly at conferences and to write scientific publications.

Application deadline

2026-06-16
