Foundation Models and Predictive AI for Video-Based Analysis of Social Synchronization in Autism Therapy
Reference: ABG-136425 (Thesis topic)
Starting date: 2026-03-09
Funding category: Public funding alone (i.e. government, regional, European, or international organization research grant)
Disciplines
- Computer science
- Health, human and veterinary medicine
- Engineering sciences
Topic description
Context
Autism Spectrum Disorder (ASD) affects approximately 1 in 100 children worldwide and represents a major public health challenge. Early interventions are critical to improve developmental outcomes, particularly therapies aimed at enhancing social communication and interaction. Exchange and Development Therapy (EDT), developed at the University Hospital of Tours, focuses on promoting social synchronizations between the child and the therapist—such as eye contact, imitation, and coordinated gestures—through structured play-based sessions. These synchronizations are considered key markers of therapeutic progress.
EDT sessions are routinely video-recorded to support clinical supervision and behavioral assessment. Over the years, a large archive of videos documenting interactions between therapists and children with ASD has been collected. These recordings constitute a unique dataset capturing multimodal social behaviors, including body posture, gaze direction, facial expressions, speech, and prosody.
The TEDIA project (https://exac-t.univ-tours.fr/tedia) aims to leverage these videos using artificial intelligence (AI) in order to assist therapists and improve understanding of the mechanisms underlying successful therapy. In particular, two major clinical questions arise:
- Can early indicators of social synchronization between the child and the therapist be detected in advance?
- Which behavioral patterns observed during therapy predict long-term developmental progress?
Answering these questions requires advanced machine learning techniques capable of analyzing multimodal, temporal, and highly heterogeneous data.
Research Objectives
The objective of this PhD is to develop foundation models (FMs) and predictive AI methods for the analysis of therapy videos involving children with ASD. More specifically, the thesis will address the following research questions:
- Learning representations of social interaction in videos. Develop self-supervised learning (SSL) approaches to learn rich representations of multimodal interactions (visual, audio, and linguistic cues) from large collections of unlabeled therapy and diagnostic videos.
- Prediction of social synchronization events. Design predictive models capable of identifying behavioral cues that precede synchronization events between the child and the therapist several seconds before they occur.
- Modeling longitudinal therapeutic progression. Develop models capable of analyzing sequences of therapy sessions to identify patterns associated with improvements in behavioral assessments, such as the Behavioral Summarized Evaluation (BSE) scale.
- Explainable AI for clinical interpretation. Develop interpretable models that highlight the behavioral features used for prediction (e.g., gaze patterns, posture, speech prosody), allowing clinicians to interpret the results and derive therapeutic recommendations.
The ultimate goal is not only to build predictive tools, but also to discover new behavioral markers that explain why certain therapy sessions are more successful than others.
Methodology
Dataset
The PhD will rely on a unique dataset collected at CHU Tours, including:
- videos of EDT therapy sessions;
- additional videos from diagnostic sessions (e.g., ADOS assessments);
- a subset of videos manually annotated by clinicians to identify synchronization events.
The videos contain multimodal signals such as body posture, gaze, facial expressions, and speech, complemented by contextual information from clinical assessments.
Representation Learning
A key challenge is the scarcity of labeled data compared with the large volume of unlabeled videos. The thesis will therefore rely on self-supervised learning (SSL) to pretrain models on large collections of unlabeled videos before fine-tuning them on specific tasks. Self-supervised objectives may include contrastive learning, masked modeling, and cross-modal representation learning.
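As a concrete illustration of the contrastive option, the core of an InfoNCE-style objective can be sketched in a few lines. The sketch below uses NumPy and random vectors purely for clarity; in the actual project the embeddings would come from trained encoders (e.g., in PyTorch) over video, audio, or text views of the same moment, and all names here are illustrative assumptions, not project code:

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE contrastive loss: each anchor embedding should be closer
    to its own positive (e.g. an augmented view, or the co-occurring
    audio clip) than to every other positive in the batch."""
    # L2-normalise so dot products are cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # the correct "class" for anchor i is positive i (the diagonal)
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 32))
# identical views should give a near-minimal loss; unrelated pairs a higher one
aligned = info_nce_loss(z, z)
shuffled = info_nce_loss(z, rng.normal(size=(8, 32)))
```

The same loss shape carries over to cross-modal learning by letting anchors and positives come from different modalities of the same time window.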
Predictive Modeling
Two predictive tasks will be investigated:
- Synchronization prediction. The model will estimate the probability of a synchronization event occurring within the next few seconds based on the ongoing interaction between the child and the therapist.
- Prediction of developmental trajectories. Using repeated therapy sessions and behavioral assessments, the model will predict future changes in clinical scores, providing insights into which behaviors during therapy are associated with long-term improvements.
Longitudinal modeling techniques will be explored to capture temporal dynamics across sessions.
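One way to frame the anticipatory synchronization task is to label each frame by whether an annotated synchronization event begins within a short horizon, then train a model to predict that label from the ongoing interaction. A minimal sketch of the labeling step, assuming a 3-second horizon and toy timestamps (the actual horizon, frame rate, and annotation format are project choices not fixed here):

```python
import numpy as np

def sync_event_labels(event_times, frame_times, horizon=3.0):
    """Label each frame 1 if an annotated synchronization event starts
    within the next `horizon` seconds -- the target an anticipatory
    model is trained to predict from behavior observed so far."""
    event_times = np.asarray(event_times)
    return np.array([
        np.any((event_times > t) & (event_times <= t + horizon))
        for t in frame_times
    ]).astype(int)

frame_times = np.arange(0.0, 10.0, 0.5)   # 20 frames sampled at 2 fps
events = [2.4, 7.1]                       # toy synchronization onsets (seconds)
y = sync_event_labels(events, frame_times)
```

A temporal model (e.g., a transformer over per-frame multimodal features) would then output the probability of `y = 1` at each frame; the same windowed framing extends to session-level sequences for trajectory prediction.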
Explainability
Since the goal is to generate clinically meaningful insights, explainable AI methods will be integrated into the models. These explanations will be analyzed by clinicians to identify new predictors of therapeutic success.
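A simple, model-agnostic technique that fits this setting is occlusion-based attribution: zero out one behavioral feature group at a time (gaze, posture, prosody, ...) and measure how much the model's prediction drops. A minimal sketch with a toy linear model; the feature groups and the model are illustrative assumptions, not the project's actual features:

```python
import numpy as np

def occlusion_importance(predict, x, groups):
    """Occlusion attribution: zero out one named feature group at a time
    and report the drop in predicted probability relative to baseline."""
    base = predict(x)
    scores = {}
    for name, idx in groups.items():
        x_occ = x.copy()
        x_occ[idx] = 0.0                 # remove this behavioral channel
        scores[name] = base - predict(x_occ)
    return scores

# toy model that relies only on the "gaze" features
w = np.zeros(6)
w[:2] = 2.0
predict = lambda x: 1.0 / (1.0 + np.exp(-w @ x))   # logistic output

x = np.ones(6)
groups = {"gaze": [0, 1], "posture": [2, 3], "prosody": [4, 5]}
scores = occlusion_importance(predict, x, groups)
```

Group-level scores of this kind map directly onto the clinical vocabulary (gaze, posture, prosody), which is what makes them discussable with therapists.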
Presentation of host institution and host laboratory
The PhD student will be hosted by LaTIM ("Laboratoire de Traitement de l'Information Médicale", the laboratory of medical information processing) in Brest, France, which leads AI development in the TEDIA project. Born from the complementarity between health and communication sciences, LaTIM conducts multidisciplinary research driven by members from the University of Western Brittany (UBO), IMT Atlantique, INSERM, and Brest University Hospital (CHRU de Brest). Information is at the heart of the unit's research project: multimodal, complex, heterogeneous, shared, and distributed by nature, it is integrated by researchers into methodological solutions for improving medical care. Because part of the unit is based within the CHRU, the joint research unit (UMR) enjoys privileged access, in addition to its own platforms, to the hospital's technical platforms, clinical data, and patients, within a strong dynamic of translational research.
Candidate's profile
- Training in AI, ideally in computer vision or multimodal machine learning
- Experience in Python programming
- Familiarity with deep learning libraries (especially PyTorch)
- Interest in interdisciplinary research at the intersection of AI and healthcare
