Recognition and Translation of French Sign Language (LSF) Using Few-Shot Prototypical Networks
ABG-139115 / ADUM-75084
Thesis topic
13/05/2026 | Doctoral contract
Université Grenoble Alpes
Saint Martin d'Hères cedex - Auvergne-Rhône-Alpes - France
- Computer Science
Recognition, Translation, Sign Language, Video images, Deep Learning, Few-Shot Learning
Subject description
Automatic recognition of French Sign Language (LSF) and its translation into written French remains a major scientific challenge. Unlike written languages, sign language is conveyed through a continuous visual signal involving the coordinated use of the hands, face, and body. Several specific issues make this task particularly complex: the coexistence of highly lexicalized and more illustrative signs, the segmentation of continuous signing into linguistic units, syntactic differences between LSF and spoken French, and the need to map visual sequences directly to text. Pioneering work by Necati Cihan Camgöz demonstrated the effectiveness of encoder-decoder architectures based on Transformers for sign language translation. In particular, the use of an intermediate gloss representation (Sign2Gloss2Text) significantly improves performance compared with direct video-to-text translation. These approaches were mainly evaluated on the RWTH-PHOENIX-Weather 2014 corpus, which contains German Sign Language weather forecasts in a relatively constrained lexical domain. For French Sign Language, research conducted at LISN and GIPSA-lab has highlighted the linguistic richness of the data, including a substantial proportion of non-lexical and illustrative signs. Recognition models based on 3D skeletal features extracted with MediaPipe Holistic and recurrent or multi-stream neural architectures have achieved promising results for lexical sign recognition. In parallel, the MediaPi-RGB dataset, comprising more than 80 hours of captioned LSF videos, provides a large-scale and ecologically valid benchmark for developing translation systems. The objective of this PhD project is to design novel methods for the recognition and translation of LSF into written French using advanced neural architectures. 
The first research direction will focus on automatic clustering techniques, such as K-means and autoencoders, to group visually similar sign forms and generate intermediate representations that facilitate learning. These representations will be integrated into Transformer-based encoder-decoder models to improve sign-to-text translation. A second research direction will investigate the internal representations learned by the models in order to better understand the relationships between cluster assignments, spatiotemporal features, and linguistic sign categories. Visualization tools and ablation studies will be used to assess the contribution of the different articulators, including hands, face, and body. The thesis will also explore Few-Shot Learning approaches, particularly Prototypical Networks, to improve recognition of rare signs that are underrepresented in training corpora. These methods are especially relevant for addressing the long-tail distribution of sign vocabularies, where a small number of signs are frequent while many others have very few examples. Finally, the potential of Visual Language Models and Large Language Models will be investigated to regularize and improve the fluency and consistency of the generated text. Overall, this work aims to advance robust and scalable sign language translation systems capable of handling the linguistic diversity of French Sign Language in realistic communication settings.
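The few-shot direction above can be illustrated with a minimal sketch of the prototypical-network decision rule (class prototypes as mean support embeddings, nearest-prototype classification). This is an assumption-laden toy: the random 16-dimensional vectors stand in for embeddings that a real video encoder (e.g. over MediaPipe Holistic keypoints) would produce, and `make_embeddings` is a hypothetical helper, not part of any existing codebase.

```python
# Minimal sketch of few-shot classification with class prototypes,
# reduced to its core distance rule. The embeddings are synthetic
# stand-ins for features a trained sign-video encoder would output.
import numpy as np

rng = np.random.default_rng(0)

def make_embeddings(center, n, dim=16, noise=0.1):
    # Hypothetical embeddings scattered around a class-specific center.
    return center + noise * rng.standard_normal((n, dim))

# 3-way, 5-shot support set: three rare signs, five examples each.
centers = rng.standard_normal((3, 16))
support = {c: make_embeddings(centers[c], 5) for c in range(3)}

# A prototype is simply the mean of a class's support embeddings.
prototypes = np.stack([support[c].mean(axis=0) for c in range(3)])

def classify(query):
    # Assign the query to the nearest prototype (squared Euclidean distance).
    d = ((prototypes - query) ** 2).sum(axis=1)
    return int(np.argmin(d))

query = make_embeddings(centers[1], 1)[0]  # an unseen example of sign 1
print(classify(query))  # → 1
```

Because the prototype is a mean, a new sign class can be added from a handful of examples without retraining the encoder, which is precisely what makes this family of methods attractive for the long-tail sign vocabulary described above.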
Thesis start date: 01/10/2026
Funding type
Doctoral contract
Funding details
Competitive selection for a doctoral contract
Host institution and laboratory
Université Grenoble Alpes
Institution awarding the doctorate
Université Grenoble Alpes
Doctoral school
220 EEATS - Electronique, Electrotechnique, Automatique, Traitement du Signal
Candidate profile
This PhD project requires a strong knowledge of the Python ecosystem for deep learning, particularly the Keras, PyTorch, and TensorFlow libraries, as well as the ability to integrate, understand, and adapt research code. Strong scientific writing skills in both French and English, along with good oral communication abilities, are also expected. The candidate should have a solid background in mathematics, statistics, and computer science applied to Artificial Intelligence and Deep Learning. Knowledge of French Sign Language (LSF) and prior experience in sign language processing would be positively considered, but are not mandatory. Specific training opportunities may be undertaken during the PhD. However, the candidate must demonstrate a strong interest in French Sign Language and its linguistic and technological challenges.
Application deadline: 31/05/2026