
COFUND PhD position - Document Understanding

ABG-133942 Thesis topic
2025-10-21 EU funding
La Rochelle Université
La Rochelle - Nouvelle Aquitaine - France
  • Computer science
antimicrobial, biomaterials, microbial resistance, artificial intelligence, deep learning, graph neural networks

Topic description

Title of the thesis project: Towards Fair and Explainable Lightweight Multimodal Learning Models for Effective Document Understanding

Scientific description of the research project 

The proposed research project aims to develop lightweight, generalizable, and multimodal learning models for document analysis. The integration of deep learning (DL) has greatly advanced the field, allowing for the analysis of complex documents by incorporating vision, text, and layout information.

Multimodal learning has become an essential strategy for understanding various document types such as legal, medical, administrative, and historical archives. Despite these advancements, current models face limitations in terms of size, computational efficiency, generalizability, and adaptability to different domains. Additionally, addressing social biases, ensuring fairness, and providing explainability in these models remain significantly challenging.

The main objective of this research is to create multimodal, multitask learning models that are lightweight and can effectively process multimodal data while preserving fairness and transparency. The focus will be on developing innovative compression and quantization techniques to reduce model size, ensuring that the models can be deployed in environments with limited resources. The project will explore knowledge distillation methods to transfer knowledge from large, complex teacher models to smaller, efficient student models. This research introduces an innovative paradigm by enabling resource-efficient AI systems to handle complex medical, administrative, and legal data, where multimodal processing, fairness, and interpretability are critical for accuracy, ethical compliance, and real-world applicability.
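For readers less familiar with knowledge distillation, the teacher-to-student transfer mentioned above can be illustrated with a minimal sketch. It is not the project's method: it assumes the common soft-target formulation, in which the student is trained to match the teacher's temperature-softened output distribution. The function names and the temperature value are hypothetical choices made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, stabilized by subtracting the max logit."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: soft-target distillation; the T^2 factor is the usual
    scaling that keeps the loss magnitude comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]
    print(distillation_loss(teacher, teacher))          # identical outputs
    print(distillation_loss([3.0, 2.0, 1.0], teacher))  # disagreeing student
```

In practice this term is usually combined with a standard supervised loss on the ground-truth labels; the weighting between the two is a design choice the posting does not specify.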

Scientific challenges in this project include enhancing the generalizability of models so that they perform well across various document types without overfitting or requiring excessive retraining. For instance, medical documents often include dense, domain-specific terminologies, structured tables, and diagnostic imagery, while legal documents are characterized by long, text-heavy clauses with complex semantic structures. Adapting a single model to excel across these diverse formats requires innovative approaches to avoid performance degradation in one domain while optimizing for another. The project will tackle modality biases and improve cross-modal interactions between vision, text, and layout information, which are critical for accurate document analysis. Another key challenge is adapting models to diverse document domains, such as legal or medical documents, with minimal fine-tuning. Compression and quantization will be explored to develop lightweight models suitable for fast and adaptive inference, ensuring computational efficiency without sacrificing accuracy. This is particularly critical in real-time medical triage systems or portable legal aid tools, where rapid responses are essential, and computational resources may be constrained. Finally, ensuring fairness by addressing social biases and enhancing explainability will be pivotal, allowing users to trust the model's decisions and insights. For example, when analyzing loan applications, ensuring that the model does not unfairly disadvantage applicants based on gender or ethnicity is essential. Similarly, providing clear rationales for extracted insights from medical records or legal agreements can foster trust and compliance in highly sensitive and regulated environments.

The state-of-the-art methods chosen for this research include a comprehensive combination of model compression and quantization strategies such as pruning, weight-sharing, and mixed-precision representations. These techniques aim to maintain accuracy while significantly reducing model size. Knowledge distillation will be utilized to train smaller student models that replicate the capabilities of larger teacher models. For the multimodal learning component, the project will design unified models that process visual, textual, and layout data cohesively, using shared parameter spaces and cross-modal attention mechanisms to facilitate seamless integration of information. Techniques like adversarial training and transfer learning will be employed for domain adaptation, ensuring the model's ability to adapt to new document types. Meta-learning approaches will be incorporated to enhance few-shot and zero-shot learning, boosting generalizability with minimal data. To ensure fairness and interpretability, the project will integrate metrics and loss functions that detect and mitigate social biases during training. Explainable AI tools, such as attention visualization and layer-wise relevance propagation (LRP), will be used to make the model’s decision-making process transparent. Counterfactual fairness algorithms will also be explored to guarantee that the model provides unbiased results across different demographics.
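As one concrete example of a fairness metric of the kind mentioned above, the sketch below computes a demographic parity gap: the difference between the highest and lowest positive-prediction rates across groups. This is only an illustration under stated assumptions; the posting does not name the metrics the project will use, and the function name and toy data are hypothetical.

```python
def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups.

    predictions: iterable of 0/1 model decisions (e.g. loan approved or not)
    groups: iterable of group labels, aligned with predictions
    A gap of 0.0 means every group receives positive decisions at the same rate.
    """
    rates = {}
    for group in set(groups):
        members = [p for p, g in zip(predictions, groups) if g == group]
        rates[group] = sum(members) / len(members)
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Hypothetical toy data: four decisions, two groups.
    decisions = [1, 1, 0, 0]
    labels = ["a", "a", "b", "b"]
    print(demographic_parity_gap(decisions, labels))
```

A gap like this can be monitored during evaluation or added as a penalty term during training; which of the two applies here is left open by the project description.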

The expected outcomes of this project include scalable and efficient models that achieve state-of-the-art results with significantly reduced size, enabling deployment in real-world, resource-constrained environments. The research aims to produce a multimodal learning model capable of generalizing across diverse document types with minimal retraining while ensuring fairness and transparency. The project will also contribute to the development of multitask learning frameworks that can handle multiple related tasks, such as machine translation, content summarization, and document visual question answering (DocVQA), within a single unified system. By leveraging knowledge distillation, smaller models will effectively inherit the capabilities of their larger counterparts, providing practical solutions without sacrificing performance.

This research has the potential to transform the field of document analysis by creating models that are lightweight, fair, and adaptable to various domains. Such advancements will benefit industries dealing with vast quantities of documents, including legal, healthcare, and administrative sectors, by offering AI solutions that are both cost-effective and trustworthy. Furthermore, by emphasizing fairness and transparency, the project will set new benchmarks for ethical AI practices in document understanding, promoting broader adoption and trust in AI technologies.

Starting date

2026-09-15

Funding category

EU funding

Funding further details

Horizon Europe – COFUND

Presentation of host institution and host laboratory

La Rochelle Université

Since its creation in 1993, La Rochelle Université has been on a path of differentiation.

Thirty years later, as the university landscape recomposes itself, it continues to assert an original proposition, based on a strong identity and bold projects, in a human-scale establishment located in an exceptional setting.

Anchored in a region with highly distinctive coastal features, La Rochelle Université has turned this singularity into a veritable signature, in the service of a new model. Its research addresses the societal challenges related to Smart Urban Coastal Sustainability (SmUCS).

The new recruit will join the L3i laboratory, in the Computer Science Department.

Cotutelle: South East Technological University (SETU), Ireland. CompuCore-Lab, Department of Computing.

Institution awarding doctoral degree

UNIVERSITE DE LA ROCHELLE

Candidate's profile

The ideal candidate for this PhD project will have a strong academic background and demonstrated motivation for research at the intersection of artificial intelligence, machine learning, and document analysis. The student should be able to combine technical expertise with an interest in addressing ethical, societal, and practical challenges in AI.

 

Educational Background

  • Master’s degree (or equivalent) in Computer Science, Artificial Intelligence, Machine Learning, Data Science, or a closely related field.
  • Solid understanding of deep learning techniques and neural architectures, with prior coursework or research experience in natural language processing (NLP), computer vision, or multimodal learning.

Technical Skills

  • Strong programming skills in Python, with experience using ML/DL frameworks such as PyTorch or TensorFlow.
  • Knowledge of multimodal architectures (vision, text, layout) and related methods such as attention mechanisms and transformers.
  • Experience with data preprocessing, annotation, and large-scale dataset handling.
  • Ability to use high-performance computing resources, and an interest in deploying models on resource-constrained devices.

Research and Analytical Skills

  • Strong problem-solving abilities and a capacity to work on open-ended research problems.
  • Familiarity with scientific writing, experimentation protocols, and reproducible research practices.
  • Interest in explainable AI, bias detection/mitigation, and fairness evaluation in machine learning.

Soft Skills

  • Motivation to work independently while collaborating effectively within a multidisciplinary research team.
  • Strong communication skills in English, both written and oral, for research dissemination (publications, conferences, seminars).
  • Openness to interdisciplinary collaboration, particularly with domains such as law, healthcare, and administrative sciences, where document analysis is crucial.
  • Curiosity, adaptability, and commitment to advancing both the scientific and societal impact of AI.

Desired but not Mandatory

  • Experience with transfer learning or meta-learning approaches.
  • Interest in sustainable computing, energy-efficient AI, or applications of AI for societal good.

This PhD offers the candidate an opportunity to contribute to cutting-edge AI research while addressing real-world societal and environmental challenges. The student will develop strong expertise in multimodal learning, model efficiency, and ethical AI, positioning them as a future leader in both academic and applied AI research.

2025-12-12