Taming Linux memory management for data science: process placement, NUMA topology, and runtime interaction
ABG-138650 / ADUM-74411
Thesis topic
2026-04-22
Université Côte d'Azur
Sophia Antipolis Cedex - Provence-Alpes-Côte d'Azur - France
- Computer science
Memory management, process placement, data science, experimental science
Topic description
The subject is only described in English. B2/C1 level of English is mandatory to apply.
Large-scale data science workloads are increasingly constrained not by algorithmic complexity or model architecture, but by the physical limits of memory hierarchies and processor topology. On modern servers, Non-Uniform Memory Access (NUMA) architectures and GPU accelerators introduce asymmetric memory access costs that remain largely invisible to application-level code yet have a decisive impact on performance. The Linux kernel mediates access to these resources through a set of scheduling, memory placement, and migration policies that were designed and configured for general-purpose workloads in standard distributions, and that interact in poorly documented ways with the Python runtime, its garbage collector, and the memory allocation patterns of data science libraries such as NumPy, Pandas, and Polars. The result is a class of performance pathologies that are reproducible in practice but have not yet been systematically characterized or addressed.
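The topology the kernel exposes can be inspected directly from userspace. The following is a minimal illustrative sketch (not part of the topic description), assuming a Linux host: it lists the NUMA nodes published under sysfs and the CPUs each node owns, and returns an empty list where the path is absent (non-Linux systems, or kernels built without CONFIG_NUMA).

```python
import glob
import re


def numa_nodes():
    """List NUMA node IDs exposed by the kernel via sysfs.

    Returns an empty list on non-Linux systems or kernels built
    without CONFIG_NUMA, where /sys/devices/system/node is absent.
    """
    nodes = []
    for path in glob.glob("/sys/devices/system/node/node[0-9]*"):
        m = re.search(r"node(\d+)$", path)
        if m:
            nodes.append(int(m.group(1)))
    return sorted(nodes)


def node_cpus(node):
    """Read the CPU list owned by one NUMA node (e.g. '0-7,16-23')."""
    try:
        with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
            return f.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    for n in numa_nodes():
        print(f"node {n}: cpus {node_cpus(n)}")
```

On a two-socket server this typically prints one line per socket; on a laptop it prints a single `node 0` line, which is part of why NUMA effects go unnoticed during development.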
The central research question is: 'How do Linux process and thread placement policies interact with NUMA topology and GPU memory hierarchies to produce performance pathologies in memory-intensive data science workloads, and what kernel-level or runtime-level interventions can systematically eliminate them?'
Two axes structure the investigation. The first, system-level, characterizes and models the interaction between the Linux memory management subsystem, NUMA placement policies, and the Python runtime through kernel instrumentation, source code analysis, and controlled experiments designed to reproduce and bound the conditions under which pathological behaviour emerges.
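Controlled experiments of this kind usually begin with small, self-contained microbenchmarks. The sketch below is an illustrative example, not taken from the thesis (the buffer size and stride are arbitrary choices): it times sequential versus strided traversal of the same buffer, the access-pattern contrast that exposes cache- and page-locality effects before any NUMA placement is even involved.

```python
import time
from array import array


def traverse(buf, stride):
    """Sum every element of buf in stride-order; returns (checksum, seconds).

    Every element is visited exactly once regardless of stride, so the
    checksum is invariant and only the access pattern changes.
    """
    t0 = time.perf_counter()
    total = 0
    for start in range(stride):
        total += sum(buf[start::stride])
    return total, time.perf_counter() - t0


if __name__ == "__main__":
    data = array("q", range(1 << 20))          # ~8 MiB of 64-bit ints
    seq_sum, seq_t = traverse(data, 1)
    str_sum, str_t = traverse(data, 4096)      # page-sized stride, in elements
    assert seq_sum == str_sum                  # same work, different locality
    print(f"sequential: {seq_t:.3f}s  strided: {str_t:.3f}s")
```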
The second, intervention-level, proposes, implements, and evaluates concrete mechanisms for process and thread placement optimization, ranging from numactl-based static binding strategies to dynamic kernel-level migration policies and runtime-aware allocation hooks, with evaluation on realistic data science workloads.
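On the static-binding end of that spectrum, CPython already exposes the Linux sched_setaffinity(2) syscall, which covers the CPU side of what `numactl --physcpubind` does (memory binding would additionally require mbind(2)/set_mempolicy(2) via libnuma, which the standard library does not wrap). A minimal sketch, assuming a Linux host; the guard returns None elsewhere, where the call does not exist:

```python
import os


def pin_to_cpus(cpus):
    """Pin the calling process to the given CPU set.

    CPU side only, like `numactl --physcpubind`; it does not bind
    memory. Returns the resulting affinity set, or None where the
    Linux-only sched_setaffinity call is unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, cpus)        # pid 0 = the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        allowed = os.sched_getaffinity(0)
        print("pinned to:", pin_to_cpus({min(allowed)}))
        pin_to_cpus(allowed)             # restore the original mask
```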
The thesis sits at the intersection of three active research communities: operating systems memory management (NUMA policy, page migration, swap behaviour), high-performance and GPU computing (memory coalescing, unified memory, PCIe transfer costs), and Python runtime internals (the CPython allocator, garbage collector, and their interaction with the OS virtual memory subsystem). Its distinguishing contribution is the exclusive focus on the data science execution context, where large working sets, irregular access patterns, and the overhead of interpreted runtimes create a qualitatively different performance profile from the HPC or database workloads that dominate the existing NUMA literature.
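One readily reproducible slice of this runtime interaction is CPython's generational garbage collector, whose collection passes are triggered by allocation/deallocation counts and therefore fire repeatedly during bulk object creation. A stdlib-only sketch (the workload is an arbitrary illustration, not a benchmark from the thesis) comparing bulk allocation with the collector enabled and disabled:

```python
import gc
import time


def bulk_alloc(n):
    """Allocate n small dict objects and return the elapsed seconds."""
    t0 = time.perf_counter()
    data = [{"i": i} for i in range(n)]
    dt = time.perf_counter() - t0
    del data
    return dt


if __name__ == "__main__":
    N = 500_000
    print("per-generation GC thresholds:", gc.get_threshold())
    t_on = bulk_alloc(N)                 # collector may fire many times here
    gc.disable()
    try:
        t_off = bulk_alloc(N)            # no automatic collection passes
    finally:
        gc.enable()
    print(f"gc enabled: {t_on:.3f}s  gc disabled: {t_off:.3f}s")
```

Dicts are tracked container objects, so each batch drives the generation-0 counter; disabling the collector during a known-acyclic bulk load is a common mitigation, and the kind of runtime-level intervention the intervention axis would evaluate systematically.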
What makes this thesis distinctive is its methodological depth: it combines low-level kernel instrumentation and source code analysis with controlled performance experiments, bridging the gap between operating systems research and the practical realities of data science at scale. This combination is rare in the systems performance literature and constitutes the core scientific bet of the thesis.
The specific research directions, experimental designs, and target systems will be refined throughout the thesis in response to emerging results and ongoing collaboration with supervisors. The above framing defines the problem space and initial methodology, not a fixed programme.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Start of the thesis: 01/10/2026
Funding category
Funding further details
Contrat doctoral EDSTIC-DS4H
Presentation of host institution and host laboratory
Université Côte d'Azur
Institution awarding doctoral degree
Université Côte d'Azur
Graduate school
84 STIC - Sciences et Technologies de l'Information et de la Communication
Candidate's profile
The profile is only described in English. B2/C1 level of English is mandatory to apply.
The candidate must hold a Master or equivalent degree when starting the PhD.
The required skills are:
- C1 level in English (possibly B2 close to reach C1)
- Excellent programming and systems skills (including C programming). We will work in a Linux environment with Python and its data science libraries (numpy, pandas, polars, seaborn, scikit-learn, statsmodels). We also use Git. If the candidate is not fluent in Python, they must be *fluent* in another language and able to learn Python quickly.
- The ideal candidate will have a solid grasp of operating systems architecture. Experience in kernel development will be considered a strong asset.
- Excellent communication skills. An important part of the Ph.D. is communicating the results. The candidate must be ready to write high-quality papers and give stunning talks. These skills will be nurtured during the Ph.D. thesis.
- Curious, highly motivated, hard-working, autonomous, a perfectionist. A good sign that you have the profile to produce an excellent Ph.D. thesis is that you cannot stand not understanding something and will work hard until you do (or until your system works).
Before deciding to pursue a Ph.D., you must read the references on this page to be sure you are making the right decision:
http://www-sop.inria.fr/members/Arnaud.Legout/phdstudents.html
If you apply, we expect you to get in touch with us early in the process (arnaud.legout@inria.fr, damien.saucez@inria.fr) to discuss whether you are a good fit for the subject and we are a good fit as supervisors. Discussing the subject and the supervision style with a potential supervisor early (even if you are not sure you will apply) is a sign of maturity and will be highly appreciated.
2026-05-03