SpeechRepair : Restauration de la parole pathologique par inpainting audiovisuel // SpeechRepair: Audiovisual Inpainting for Pathological Speech Restoration
ABG-132257
ADUM-66099
Thesis topic
2025-05-28 | Public funding alone (i.e. government, region, European, international organization research grant)
Université Grenoble Alpes
Saint Martin d'Hères cedex - Auvergne-Rhône-Alpes - France
- Computer science
speech processing, computer vision, artificial intelligence, natural language processing, pathology, deep learning
Topic description
The proposed PhD project addresses the challenge of speech inpainting, that is, the automatic reconstruction of masked or corrupted portions of a speech audio signal (e.g., portions degraded by competing audio events). To tackle this complex task, we propose: (1) leveraging recent self-supervised learning (SSL) techniques to extract rich contextual information from the uncorrupted speech segments, together with neural vocoders for high-quality audio synthesis; and (2) investigating the potential benefits of incorporating visual input, such as the speaker's lip movements during a dialogue or an image depicting the surrounding environment. After initial evaluations on laboratory speech recordings, we aim to extend the inpainting framework to pathological voices, particularly hypophonic voices, which are characterized by abnormally weak vocal output and are commonly observed in individuals with Parkinson's disease or in people who have undergone a laryngectomy.
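To make the task definition concrete, the following is a minimal sketch of how a speech inpainting problem might be set up for evaluation: a gap is masked out of a reference signal, and the reconstruction error is measured only over the missing samples. It is an illustration under stated assumptions, not the project's method. The function names (`make_gap_mask`, `apply_mask`, `gap_mse`), the synthetic sine-wave "signal", the 16 kHz sampling rate, and the 10 ms gap are all hypothetical choices made for this example.

```python
import math


def make_gap_mask(num_samples, gap_start, gap_length):
    """Return a list of booleans: True where the signal is observed, False inside the gap."""
    gap_end = min(gap_start + gap_length, num_samples)
    return [not (gap_start <= i < gap_end) for i in range(num_samples)]


def apply_mask(signal, mask):
    """Zero out the masked samples to simulate a corrupted recording."""
    return [x if keep else 0.0 for x, keep in zip(signal, mask)]


def gap_mse(reference, estimate, mask):
    """Mean squared error computed only over the masked (missing) samples."""
    errors = [(r - e) ** 2 for r, e, keep in zip(reference, estimate, mask) if not keep]
    return sum(errors) / len(errors) if errors else 0.0


# Toy stand-in for speech (assumption): a 100 Hz sine sampled at 16 kHz, 50 ms long.
sample_rate = 16000
signal = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(800)]

# Mask a 10 ms gap (160 samples) starting at sample 300.
mask = make_gap_mask(len(signal), gap_start=300, gap_length=160)
corrupted = apply_mask(signal, mask)

# Error of the trivial "leave the gap silent" baseline, measured inside the gap only.
baseline_error = gap_mse(signal, corrupted, mask)
```

An inpainting model would replace the zeroed samples in `corrupted` with its own estimate, and `gap_mse` (or a perceptual metric) would then compare that estimate with the reference inside the gap. Restricting the error to the masked region keeps the untouched context from diluting the score.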
Thesis start date: 1 October 2025
Funding category
Public funding alone (i.e. government, region, European, international organization research grant)
Funding further details
Competitive selection for a doctoral contract
Presentation of host institution and host laboratory
Université Grenoble Alpes
Institution awarding doctoral degree
Université Grenoble Alpes
Graduate school
220 EEATS - Electronique, Electrotechnique, Automatique, Traitement du Signal
Candidate's profile
For this project, we are looking for candidates with a Master's degree (Master 2 or equivalent) or a degree from an engineering school, in fields such as signal processing, natural language processing, computer vision, computer science, or data science (this list is not exhaustive). The candidate should have a solid foundation in deep learning techniques, strong programming skills in Python, and preferably experience with the PyTorch framework. Previous research experience, whether in academia or industry, is not required but is a plus.
The candidate should also have a good level of English, both spoken and written, for presenting their work. In addition to technical skills, the candidate is expected to demonstrate curiosity, autonomy, critical thinking, and good communication and collaboration skills.
2025-05-30