
Foundation Models and Predictive AI for Video-Based Analysis of Social Synchronization in Autism Therapy

ABG-136425 Thesis topic
2026-03-09 Public funding alone (i.e. government, region, European, international organization research grant)
Université de Bretagne Occidentale, Bretagne, France
  • Computer science
  • Health, human and veterinary medicine
  • Engineering sciences
Keywords: artificial intelligence, multimodal machine learning, video analysis, self-supervised learning, foundation models, autism, social interaction analysis, computer vision, speech processing

Topic description

Context

Autism Spectrum Disorder (ASD) affects approximately 1 in 100 children worldwide and represents a major public health challenge. Early interventions are critical to improve developmental outcomes, particularly therapies aimed at enhancing social communication and interaction. Exchange and Development Therapy (EDT), developed at the University Hospital of Tours, focuses on promoting social synchronizations between the child and the therapist—such as eye contact, imitation, and coordinated gestures—through structured play-based sessions. These synchronizations are considered key markers of therapeutic progress.

EDT sessions are routinely video-recorded to support clinical supervision and behavioral assessment. Over the years, a large archive of videos documenting interactions between therapists and children with ASD has been collected. These recordings constitute a unique dataset capturing multimodal social behaviors, including body posture, gaze direction, facial expressions, speech, and prosody.

The TEDIA project (https://exac-t.univ-tours.fr/tedia) aims to leverage these videos using artificial intelligence (AI) in order to assist therapists and improve understanding of the mechanisms underlying successful therapy. In particular, two major clinical questions arise:

  • Can early indicators of social synchronization between the child and the therapist be detected in advance?
  • Which behavioral patterns observed during therapy predict long-term developmental progress?

Answering these questions requires advanced machine learning techniques capable of analyzing multimodal, temporal, and highly heterogeneous data.

Research Objectives

The objective of this PhD is to develop foundation models (FMs) and predictive AI methods for the analysis of therapy videos involving children with ASD. More specifically, the thesis will address the following research questions:

  • Learning representations of social interaction in videos. Develop self-supervised learning (SSL) approaches to learn rich representations of multimodal interactions (visual, audio, and linguistic cues) from large collections of unlabeled therapy and diagnostic videos.
  • Prediction of social synchronization events. Design predictive models capable of identifying behavioral cues that precede synchronization events between the child and the therapist several seconds before they occur.
  • Modeling longitudinal therapeutic progression. Develop models capable of analyzing sequences of therapy sessions to identify patterns associated with improvements in behavioral assessments, such as the Behavioral Summarized Evaluation (BSE) scale.
  • Explainable AI for clinical interpretation. Develop interpretable models that highlight the behavioral features used for prediction (e.g., gaze patterns, posture, speech prosody), allowing clinicians to interpret the results and derive therapeutic recommendations.

The ultimate goal is not only to build predictive tools, but also to discover new behavioral markers that explain why certain therapy sessions are more successful than others.

Methodology

Dataset

The PhD will rely on a unique dataset collected at CHU Tours, including:

  • videos of EDT therapy sessions;
  • additional videos from diagnostic sessions (e.g., ADOS assessments);
  • a subset of videos manually annotated by clinicians to identify synchronization events.

The videos contain multimodal signals such as body posture, gaze, facial expressions, and speech, along with contextual information from clinical assessments.

Representation Learning

A key challenge is the scarcity of labeled data compared with the large volume of unlabeled videos. The thesis will therefore rely on self-supervised learning (SSL) to pretrain models on large collections of unlabeled videos before fine-tuning them on specific tasks. Self-supervised objectives may include contrastive learning, masked modeling, and cross-modal representation learning.
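To make the cross-modal objective concrete, here is a minimal sketch of an InfoNCE-style contrastive loss, in which the video embedding of a clip is pulled toward the audio embedding of the same clip and pushed away from those of other clips. This is an illustrative, framework-agnostic example in NumPy (the project would more likely use PyTorch); the function name and shapes are assumptions, not part of the TEDIA codebase.

```python
import numpy as np

def info_nce_loss(video_emb, audio_emb, temperature=0.1):
    """Cross-modal InfoNCE: clip i's video embedding should match its own
    audio embedding (the diagonal) against all other clips (negatives).

    video_emb, audio_emb: arrays of shape (N, D), one row per clip.
    """
    # L2-normalize so dot products become cosine similarities
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    logits = v @ a.T / temperature            # (N, N) similarity matrix
    # numerically stable softmax cross-entropy, diagonal = target class
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

When the two modalities of the same clip are aligned, the loss is low; if the pairing is scrambled, it rises, which is exactly the signal that lets the model learn from unlabeled recordings.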

Predictive Modeling

Two predictive tasks will be investigated:

  • Synchronization prediction. The model will estimate the probability of a synchronization event occurring within the next few seconds based on the ongoing interaction between the child and the therapist.
  • Prediction of developmental trajectories. Using repeated therapy sessions and behavioral assessments, the model will predict future changes in clinical scores, providing insights into which behaviors during therapy are associated with long-term improvements.

Longitudinal modeling techniques will be explored to capture temporal dynamics across sessions.
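One common way to frame the first task is as sliding-window classification: given the features of the last few seconds of interaction, predict whether an annotated synchronization event starts within a short horizon. The sketch below shows how such (window, label) training pairs could be built from a frame-level feature sequence; it is a hypothetical illustration (function name, frame rate, and window lengths are assumptions, not project specifications).

```python
import numpy as np

def make_prediction_samples(features, event_frames, fps=25,
                            history_s=4.0, horizon_s=3.0):
    """Turn a frame-level feature sequence into (window, label) pairs.

    features: array (T, D) of per-frame behavioral features.
    event_frames: frame indices where a synchronization event starts.
    label = 1 if an event starts within `horizon_s` seconds after the
    end of the history window.
    """
    hist, hor = int(history_s * fps), int(horizon_s * fps)
    events = np.zeros(len(features), dtype=bool)
    events[list(event_frames)] = True
    X, y = [], []
    for end in range(hist, len(features) - hor):
        X.append(features[end - hist:end])          # past `history_s` seconds
        y.append(int(events[end:end + hor].any()))  # event in the horizon?
    return np.stack(X), np.array(y)
```

Any sequence model (temporal convolution, transformer, recurrent network) can then be trained on these pairs; the anticipation horizon directly controls how far in advance the cues must appear.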

Explainability

Since the goal is to generate clinically meaningful insights, explainable AI methods will be integrated into the models. These explanations will be analyzed by clinicians to identify new predictors of therapeutic success.
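As one simple, model-agnostic example of the kind of explanation involved, occlusion-based attribution replaces each behavioral feature channel (e.g., gaze, posture, prosody) with a neutral baseline and measures how much the prediction drops. This is a generic sketch, not the project's chosen method; the function name and interface are assumptions.

```python
import numpy as np

def occlusion_importance(predict, x, baseline=0.0):
    """Model-agnostic attribution: for each feature channel, replace it
    with a baseline value and record how much the prediction drops.

    predict: callable mapping an input array to a scalar score.
    x: input array whose last axis indexes feature channels.
    Returns one importance score per channel (larger = more influential).
    """
    ref = predict(x)
    scores = []
    for j in range(x.shape[-1]):
        x_occ = x.copy()
        x_occ[..., j] = baseline   # knock out channel j
        scores.append(ref - predict(x_occ))
    return np.array(scores)
```

Ranking channels by these scores gives clinicians a first-pass answer to "which behaviors drove this prediction?", which can then be cross-checked against clinical expertise.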

 

Starting date

2026-09-01

Funding category

Public funding alone (i.e. government, region, European, international organization research grant)

Funding further details

Funding from the ANR TEDIA project.

Presentation of host institution and host laboratory

Université de Bretagne Occidentale

The PhD student will be hosted by LaTIM ("laboratoire de traitement de l'information médicale", the laboratory of medical information processing) in Brest, France, which leads AI development in the TEDIA project. Born from the complementarity between health and communication sciences, LaTIM conducts multidisciplinary research driven by members from the University of Western Brittany (UBO), IMT Atlantique, INSERM, and Brest University Hospitals (CHRU de Brest). Information is at the heart of the unit's research project: being by nature multimodal, complex, heterogeneous, shared, and distributed, it is integrated by researchers into methodological solutions for improving medical care. Thanks to a unit located within the CHRU, the UMR (joint research unit) has privileged access to hospital technical platforms, clinical data, and patients, in addition to its own platforms, within a strong dynamic of translational research.

Candidate's profile

  • Training in AI, ideally in computer vision or multimodal machine learning
  • Experience in Python programming
  • Familiarity with deep learning libraries (especially PyTorch)
  • Interest in interdisciplinary research at the intersection of AI and healthcare
2026-04-30