Human behavior understanding in videos using multimodal foundation models
| ABG-135841 | Master internship | 4 months | internship stipend |
| 2026-02-18 |
- Computer science
Employer organisation
The Laboratoire d’InfoRmatique en Image et Systèmes d’information (LIRIS) is a joint research unit (UMR 5205) of the CNRS, INSA de Lyon, Université Claude Bernard Lyon 1, Université Lumière Lyon 2, and École Centrale de Lyon.
Description
Context of the study:
Human behavior understanding is a key task in several application fields, from assisted living and disease diagnosis in healthcare to industrial problems such as task training and completion assessment. Deep neural networks and, more recently, multimodal foundation models have brought a new level of performance to research problems in video understanding (e.g., DINOv3, VideoLLaMA, InternVideo2). However, the performance of such methods in behavior understanding tasks, such as emotion recognition, remains limited compared to generic scene understanding (Lian et al., 2024).
This internship will study and evaluate the latest multimodal foundation models as building blocks of a pipeline for human behavior understanding. We will focus on methods capable of recognizing emotions and gestures in long videos and explore their performance outside datasets with controlled conditions (i.e., in the wild).
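As a rough illustration of this kind of building block, the minimal sketch below performs zero-shot emotion recognition over frames sampled with OpenCV. CLIP is used here only as a lightweight stand-in for the video foundation models cited in the references; the checkpoint, the video path ("clip.mp4"), and the emotion labels are illustrative assumptions, not the internship's prescribed method.

```python
# Minimal sketch: zero-shot emotion recognition on sampled video frames.
# CLIP stands in for the larger video foundation models cited below
# (VideoLLaMA 3, InternVideo2); the emotion labels are illustrative only.
import cv2
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
labels = ["a happy person", "a sad person", "an angry person", "a neutral person"]

def sample_frames(path, every_n=30):
    """Decode one frame every `every_n` frames from a video file."""
    cap = cv2.VideoCapture(path)
    frames, i = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % every_n == 0:
            frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
        i += 1
    cap.release()
    return frames

frames = sample_frames("clip.mp4")  # hypothetical input video
inputs = processor(text=labels, images=frames, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
# Average per-frame scores into a clip-level emotion estimate.
print(dict(zip(labels, probs.mean(dim=0).tolist())))
```

Frame-level averaging like this ignores temporal structure, which is precisely the gap the internship targets with long-video models.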
Tasks:
- Review the state of the art in multimodal video understanding methods applicable to behavior understanding, identifying their limitations in characterizing the target behavioral aspects.
- Propose a spatio-temporal deep neural pipeline that can detect the target behavioral events in space and time (a minimal temporal-localization sketch follows this list).
- Write a research article to share the developed work with the computer vision community, accompanied by an open-source repository to foster reproducible research.
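To make the temporal side of the second task concrete, the sketch below turns per-frame scores (e.g., produced by a foundation-model backbone as above) into (start, end) event intervals by thresholding. The scores, threshold, and minimum event length are placeholder assumptions; a real pipeline would also smooth the scores and add spatial localization (e.g., per-person tracks).

```python
# Minimal sketch of temporal event localization: convert per-frame scores
# into (start, end) frame intervals. Inputs here are synthetic placeholders.
import numpy as np

def localize_events(scores, threshold=0.5, min_len=5):
    """Return (start, end) frame indices where scores stay above threshold."""
    above = scores > threshold
    events, start = [], None
    for t, flag in enumerate(above):
        if flag and start is None:
            start = t                      # event opens
        elif not flag and start is not None:
            if t - start >= min_len:       # keep only sufficiently long events
                events.append((start, t))
            start = None
    if start is not None and len(scores) - start >= min_len:
        events.append((start, len(scores)))  # event still open at video end
    return events

scores = np.zeros(300)
scores[120:180] = 0.9                      # synthetic burst standing in for an emotion episode
print(localize_events(scores))             # -> [(120, 180)]
```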
Related bibliographic references:
- Zheng Lian et al., "GPT-4V with Emotion: A Zero-shot Benchmark for Generalized Emotion Recognition," Information Fusion, vol. 108, 2024, 102367, ISSN 1566-2535, https://doi.org/10.1016/j.inffus.2024.102367
- Boqiang Zhang et al., "VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding," 2025, https://arxiv.org/abs/2501.13106
- Yi Wang et al., "InternVideo2: Scaling Foundation Models for Multimodal Video Understanding," in Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LXXXV, Springer-Verlag, Berlin, Heidelberg, pp. 396–416, https://doi.org/10.1007/978-3-031-73013-9_23
- Oriane Siméoni et al., "DINOv3," 2025, https://arxiv.org/abs/2508.10104
Profile
Profile of the candidate:
We are looking for a motivated candidate with a strong background in computer science or applied mathematics.
- The candidate must currently be enrolled in a Master 1 or 2 program, or be in the final years of an engineering school (Bac+4 or +5 in France)
- Experience in image processing, computer vision, and/or machine learning will be a plus.
If the internship leads to an international publication, we may explore opportunities to continue this research as a PhD on a similar topic.
Language: French or English
Expected skills:
- Proficiency in the Python language
- OpenCV library
- Version control tools (Git)
The following skills would be considered as a plus:
- PyTorch or TensorFlow frameworks
- Docker-like containerization tools and platforms
Duration: 4-6 months
Expected internship period: late April to October, with a mandatory summer break