Gaël Varoquaux: AI for everyone
Gaël Varoquaux is a researcher in computer science and head of the Soda project team at the Inria Paris-Saclay Centre. A key player in the field of artificial intelligence (AI), he is one of the Highly Cited Researchers for 2022, the scientists whose publications are the most cited by their peers.
Gaël Varoquaux is an accomplished researcher whose work in the field of AI is pioneering. He is particularly well known for his work on statistical learning. The academic community also recognises his commitment to open science and to the development of free software, particularly through the scikit-learn project. Curious about many scientific fields, he constantly questions the boundaries of science, while promoting cooperation and exchange among scientists.
From a classical trajectory to a unique pathway
After a preparatory scientific course with the MathsSup option and a course at the ENS Paris, Gaël Varoquaux began a PhD in Quantum Physics in 2005. Under the supervision of Alain Aspect, winner of the 2022 Nobel Prize in Physics, he focused his research on the cooling of atoms and the use of atom interferometry to measure gravitational fields.
The software challenges he encountered during his thesis prompted Gaël Varoquaux to switch from quantum physics to machine learning. Passionate about computer science, mathematics and abstraction, he was quickly attracted to data processing and, while still working on his thesis, became involved in the development of open-source software, more specifically Mayavi, a 3D data visualisation tool. After various experiences abroad, including a post-doctorate in Quantum Physics in Italy in 2007 and a stint at the scientific computing start-up Enthought in the United States in 2008, he came back to France that same year to work on brain imaging in collaboration with cognitive science researchers from the Inria Parietal team.
Very quickly, Gaël Varoquaux's search for meaning and utility led him to machine learning. His interest in real-life problems and in health is a striking illustration of this. As part of the Parietal and Soda teams, he develops tools to make people's lives easier. He emphasises that, in order to be truly useful, research should go beyond quantitative aspects and address qualitative ones related to quality of life.
The problem of the machinery of science
The importance given to qualitative questions in research is a key element for Gaël Varoquaux. “I am convinced that it is possible to apply a scientific approach without resorting to a mechanistic model, while maintaining a quantitative approach thanks to statistics,” he explains. A mechanistic model describes the underlying mechanisms of a phenomenon by specifying the mathematical relationships between its constituent elements. For Gaël Varoquaux, there is a real need to integrate human and social sciences into AI research, in order to overcome the inherent limitations of numbers. “For example, if we look at the Covid-19 prevalence data, the analysis reveals that it is influenced by the testing policy implemented. It's similar with polls, where people tend to self-censor.” At the heart of the researcher's thinking is the idea of a science that is not only quantitative but that also integrates observational data, i.e. data collected outside the framework of experiments in which objects can be freely manipulated.
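The remark about Covid-19 data can be illustrated with a toy calculation. The minimal Python sketch below is not taken from the researcher's work: the helper `observed_positivity` is hypothetical, and every number in it (a 5% true prevalence, the symptom rates, a perfectly accurate test) is an assumption chosen purely for illustration. It suggests how the share of positive tests can shift with who gets tested, even when the underlying prevalence stays the same.

```python
def observed_positivity(prevalence, p_symptoms_infected, p_symptoms_healthy,
                        test_rate_symptomatic, test_rate_asymptomatic):
    """Share of positive results among the people tested (hypothetical helper).

    All arguments are fractions between 0 and 1. The test is assumed to be
    perfectly accurate, which real tests are not.
    """
    # Split the population into four groups, expressed as fractions of the whole.
    infected_sym = prevalence * p_symptoms_infected
    infected_asym = prevalence * (1 - p_symptoms_infected)
    healthy_sym = (1 - prevalence) * p_symptoms_healthy
    healthy_asym = (1 - prevalence) * (1 - p_symptoms_healthy)

    # Positive results come only from infected people who were tested.
    positives = (infected_sym * test_rate_symptomatic
                 + infected_asym * test_rate_asymptomatic)
    # Everyone tested, infected or not, counts in the denominator.
    tested = ((infected_sym + healthy_sym) * test_rate_symptomatic
              + (infected_asym + healthy_asym) * test_rate_asymptomatic)
    return positives / tested


# Made-up parameters: the same population is observed under two testing policies.
scenario = dict(prevalence=0.05, p_symptoms_infected=0.6, p_symptoms_healthy=0.05)

symptomatic_only = observed_positivity(
    **scenario, test_rate_symptomatic=1.0, test_rate_asymptomatic=0.0)
everyone = observed_positivity(
    **scenario, test_rate_symptomatic=1.0, test_rate_asymptomatic=1.0)

print(f"Positivity when only symptomatic people are tested: {symptomatic_only:.1%}")
print(f"Positivity when everyone is tested: {everyone:.1%}")
```

With these made-up numbers, testing only symptomatic people yields a positivity rate of about 39%, while testing everyone recovers the assumed 5%. The figure read off the data therefore reflects the testing policy as much as the epidemic itself.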
An artisan of open science...
Since the development of scikit-learn began in 2010, Gaël Varoquaux and his team have been working flat out for open science. This Python data analysis library, originally conceived for brain imaging, is designed to be independent of its possible areas of use. Intended for a wide audience, it makes statistical tools easier to use for people less familiar with these methods. “Our approach differs from other tools that simplify things so much that they end up being opaque. The goal is to address people who are competent but in a hurry,” says Varoquaux. The choice of the Python programming language is deliberate: being neither complex nor tied to a specific field, it can reach a wide audience.
Another of the strongest ambitions of open science is to make data easier to share and store. Among Gaël Varoquaux's most cited works are his reflections on data infrastructures. “We fought to make it easier to share and store data publicly,” he insists. As head of the Soda team at the Inria Paris-Saclay Centre, he works at the intersection of computer science, statistics, human and social sciences, and health. His current research focuses on complex data processing in public health. Future projects will explore the adaptation of AI and deep learning to tabular data, advances that could have a significant impact on individual and collective health decisions.
... and defender of free and fair science
Realistic and modest, Gaël Varoquaux regrets that all too often the attention in science is focused on personalities, whereas the knowledge acquired on a given subject is essentially built in teams. In his opinion, science depends above all on the strength of the collective, on exchanges and on mutual aid. He emphasises the need to give young researchers more recognition for their contributions and to distribute funding more equitably, so as to support atypical and innovative projects. “It is essential to avoid giving resources only to those who are already successful or in the mainstream. We also have to support projects, even if they are unusual, that seek to achieve something interesting. It is crucial for research to go beyond funding projects such as scikit-learn,” he concludes.