Data Engineering team

The research activities of the IDD team fall within the broad domain of Data Management and Artificial Intelligence (AI). The targeted data types include semantic data, multi-sensor data, scientific and industrial data, and complex data such as knowledge graphs.

Knowledge and expertise

The team's work covers the main operations of data science pipelines, aiming to transform raw data into understandable and actionable information:

  • Modeling and Persistence: Data collection and preparation, selection of data models, ontological modeling, data/model persistence, representation of various types of data imperfections.
  • Storage and Optimization: Selection of data/knowledge storage systems, optimization structures tailored to the target system, consideration of energy constraints.
  • Interaction, Analysis, and Valorization: Multifaceted exploitation of massive data, anomaly detection, value/knowledge extraction and discovery from massive datasets, machine learning methods.

Application Domains

  • Transport, Aeronautics, Space,
  • Energy, Environment, Smart Cities,
  • Cultural Heritage, Healthcare.

Added Value

  • Improved Data Quality: By explicitly handling three key quality dimensions:
    • Heterogeneity of data stemming from multiple sources (handled using ontologies as a formal framework);
    • Various imperfections (uncertainty, imprecision, missing values, etc.) in real-world data (addressed using computational intelligence as a theoretical foundation);
    • Data volume: A distinctive feature of our approaches is their scalability and their ability to handle large volumes of data while ensuring Energy Efficiency (EE) in both storage and processing.
  • Strong synergy between Databases and AI: Integration of AI methods and tools into the solutions developed for each step of the data processing chain.
  • Reduced carbon footprint and environmental impact of the algorithms we develop: By optimizing EE criteria through the development of sophisticated cost models.
  • Advanced anomaly detection in multivariate time series: Leveraging AI tools, particularly in the context of graphs and aerial images.
  • Development of innovative and robust solutions, both theoretically and methodologically, through the integration of recent AI trends, notably Large Language Models (LLMs), to support task automation and decision-making, and Trustworthy AI, for result explainability and reliability assessment.

In summary, the team's work covers the entire data processing chain for the development of innovative systems for large-scale data and knowledge management. These systems are designed to interact with both domain experts and end users.

Keywords: Data Science, Massive Data, Heterogeneity, Imperfection, Scalability, Energy Efficiency, Anomaly Detection, Artificial Intelligence, Machine Learning.

Contacts