Human behavior understanding in videos using multimodal foundation models

ABG-135841 Master 2 / Engineering internship 4 months Internship stipend
18/02/2026
LIRIS
Bron Auvergne-Rhône-Alpes France
  • Computer science
computer vision, automatic video analysis, human behavior understanding, deep learning
05/03/2027

Recruiting institution

The Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS) is a joint research unit (UMR 5205) of CNRS, INSA Lyon, Université Claude Bernard Lyon 1, Université Lumière Lyon 2, and École Centrale de Lyon.

Description

Context of the study:

Human behavior understanding is a key task in several fields of application, from assisted living and disease diagnosis in healthcare to industrial problems such as task training and task completion evaluation. Deep neural networks and, more recently, multimodal foundation models (e.g., DINOv3, VideoLLaMA, InternVideo2) have brought a new level of performance to research problems in video understanding. However, the performance of such methods on behavior understanding tasks, such as emotion recognition, is still limited compared to generic scene understanding (Lian et al., 2024).

This internship will study and evaluate the latest multimodal foundation models as building blocks of a pipeline for human behavior understanding. We will focus on methods capable of recognizing and describing emotions and gestures in long videos, and explore their performance outside datasets recorded under controlled conditions (i.e., in the wild).

Tasks:

  • Review the state of the art on multimodal video understanding methods applicable to behavior understanding, identifying their limitations in characterizing the target behavioral aspects.
  • Propose a spatio-temporal deep neural pipeline that detects the target behavioral events in space and time.
  • Write a research article to share the developed work with the computer vision community, accompanied by an open-source repository to foster reproducible research.

Profile

Profile of the candidate:

We are looking for a motivated candidate with a strong background in computer science or applied mathematics.

  • The candidate must currently be enrolled in a Master 1 or 2 program, or be in the final years of engineering school (Bac+4 or +5 in France).
  • Experience in image processing, computer vision, and/or machine learning will be a plus.

If the internship leads to an international publication, we may explore opportunities to continue this research through a PhD on a similar topic.

Language: French or English

Expected skills:

  • Proficiency in Python
  • OpenCV library
  • Version control tools (Git)

The following skills would be considered as a plus:

  • PyTorch or TensorFlow framework
  • Docker-like tools and platforms

Duration: 4-6 months

Expected internship period: late April to October, with a mandatory summer break

Start date

01/05/2026