Talks
1
Changing Large Tables
“Everything changes and nothing stays the same.” Yet somehow, when dealing with datasets, we often treat change as a mere afterthought. But the world quickly moves on, and the dataset needs to catch up to remain useful. Rows have to be inserted, deleted, or updated. For a data management environment, managing change is thus not optional. However, managing change correctly is difficult. All too common are wild collections of CSV and Parquet files that are somehow derived from each other. We can do better.
Recent developments like the Lakehouse formats and the various initiatives in schema management aim to improve things, but it's not yet entirely clear where this road will lead. In my talk, I will discuss the benefits and challenges of bringing traditional transactional semantics to large-scale data analysis workflows. We will see data and schema changes, and even actual time travel, in action.
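To make the "time travel" idea concrete, here is a minimal sketch using the `deltalake` Python package, one of the Lakehouse formats alluded to above; the table path and contents are illustrative, and Iceberg offers equivalent version-based reads.

```python
# A minimal sketch of managed change + time travel with the deltalake package;
# the table path and data are illustrative, not from the talk.
import pandas as pd
from deltalake import DeltaTable, write_deltalake

path = "/tmp/events"  # illustrative local table location

# Version 0: initial load
write_deltalake(path, pd.DataFrame({"id": [1, 2], "status": ["new", "new"]}))

# Version 1: the world moves on, so the table changes
write_deltalake(path, pd.DataFrame({"id": [3], "status": ["new"]}), mode="append")

# Time travel: read the table as it was before the change
print(DeltaTable(path, version=0).to_pandas())  # two rows
print(DeltaTable(path).to_pandas())             # three rows (latest version)
```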
2
Mixed Model Arts - The Convergence of Data Modeling Across Apps, Analytics, and AI
For decades, data modeling has been fragmented by use case: applications, analytics, and machine learning/AI. This leads to data siloing and “throwing data over the wall.” With AI, streaming data, and “shifting left” now changing data modeling, these siloed approaches are insufficient for the diverse world of data use cases. Today's practitioners must possess an end-to-end understanding of the myriad techniques for modeling data throughout the data lifecycle. This presentation covers “mixed model arts,” which advocates converging the various data modeling methods and innovating new ones.
3
(Gen)AI at the Heart of Mirakl's Product: From Inception to Deployment of the Mirakl Catalog Transformer
Mirakl was founded in 2012 with the idea of enabling any retailer to offer the best deals to their customers by developing a marketplace activity. Since 2020, the Data Science team has introduced AI features to enhance the user experience by enriching the product's functionalities with AI (automatic recategorization, price anomaly detection, etc.), and in 2023, following the arrival of ChatGPT, with the enrichment of product descriptions. With the arrival of GenAI, we saw the opportunity to rethink the engineering of our products by rebuilding them with AI at their core. This challenged us to transition from a Data Science project mode to an AI product approach with AI squads. In this talk, we will break down the steps that allowed us to move from three features in the product to a "One Click Transformer" in just six months:
- The Discovery/Product Research phase
- How to reach a Beta with a Generative AI feature in the hands of our customers
- Iteration and continuous improvement based on user problems
- The scaling phase, with layering and fine-tuning to improve quality while controlling costs at scale
- Extending the AI product approach to the whole team
4
Break
Recharge your batteries, enjoy a coffee, exchange ideas, and, if you’re curious, explore the conference stands
Break
Recharge your batteries, enjoy a coffee, exchange ideas, and, if you’re curious, explore the conference stands
Break
Recharge your batteries, enjoy a coffee, exchange ideas, and, if you’re curious, explore the conference stands
Comment le département du Gard conjugue Modern Data Stack et Géomatique ?
The Gard Department is a local authority employing nearly 3,000 agents. Every day, they serve the public interest in various areas: public health and social action, departmental roadworks, middle schools, high-speed broadband, etc.
Through these activities, the Gard Department generates and processes large amounts of data daily, which the Department, through its Innovation and Information Systems Directorate, aims to leverage. To this end, a study was conducted with fellow local authorities and more broadly within the data ecosystem to identify the state of the art and define a way forward.
This study revealed that the Modern Data Stack is widely adopted by both large corporations and smaller companies. Dedicated websites, podcasts, communities, and conferences extol its virtues. However, despite this widespread enthusiasm, implementations in local authorities are scarce: is the Modern Data Stack unsuitable for the challenges faced by public local authorities?
A local authority is primarily defined by its territory. Consequently, the data we handle is largely geolocated:
Are there accident-prone areas that would require changes to road layouts? Where are the populations most in need of assistance, and how should we distribute our resources across the territory? What is the most relevant location for the future middle school, considering where students live? Therefore, the stack must be geographical! With this constraint established, which tools should be chosen to build our Modern Data Stack?
This talk will present the stack implemented by the Gard Department and the reasons behind these choices. We will see that while some reference components, such as dbt, handle geographical data well, others struggle, leading us to rely on tools that are lesser known in data science but widely used by geomaticians. We will also explore some use cases illustrating how geography enhances data analysis and decision-making. Finally, we will conclude with the prospects for evolving our stack and organization.
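As a taste of what a "geographical" Modern Data Stack can do, here is a hedged sketch using DuckDB's spatial extension; the siting question echoes the abstract, but the coordinates and tables are invented.

```python
# A hedged sketch of geospatial analysis inside a Modern Data Stack engine
# (DuckDB + its spatial extension); all coordinates and data are invented.
import duckdb

con = duckdb.connect()
con.execute("INSTALL spatial")
con.execute("LOAD spatial")

con.execute("""
    CREATE TABLE students AS
    SELECT * FROM (VALUES
        (1, ST_Point(4.36, 43.84)),   -- lon/lat; planar distance for simplicity
        (2, ST_Point(4.08, 43.95)),
        (3, ST_Point(4.35, 44.10))
    ) AS t(student_id, geom)
""")

# Two candidate sites for the future middle school: which is closer on average?
con.sql("""
    SELECT avg(ST_Distance(geom, ST_Point(4.30, 43.95))) AS avg_dist_site_a,
           avg(ST_Distance(geom, ST_Point(4.10, 44.00))) AS avg_dist_site_b
    FROM students
""").show()
```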
5
Building a Robust Data Platform on the Road to Self-Service Analytics
At Malt, our central data team sits at the core of stakeholder data requests, often becoming a bottleneck. Enabling self-service is crucial to removing this bottleneck and to scaling, by providing the right tools for the right personas and usages.
To achieve this, we built strong and clean foundations in our data warehouse, organized in layers with a clean exposition layer. This requires collaboration between data engineers and analytics engineers.
We approached self-service by addressing different personas and needs:
- A self-service layer for business users in Looker through a generative AI app.
- Tools for the data team to unlock ad-hoc analysis and sharing.
- A self-service layer for data users via a dedicated AI assistant to help navigate the data warehouse.
The objective is to share our journey and our challenges. We'll detail how we tackled challenges from an organizational perspective, and the tools we implemented. This should give keys and ideas to anyone moving down the self-service road.
Comment construire une vision et une stratégie Data & IA impactantes ?
A reservation-only workshop sharing advice and concrete examples of Data & AI visions and strategies from businesses of different sizes and contexts. The aim is to give examples of documented strategies and visions, but also (and especially) the process for creating a meaningful and inspiring Data vision and strategy.
6
The Intrinsic Limitations of Large Language Models: Understanding Hallucinations and Their Impact on Data Workflows
Large Language Models (LLMs) have revolutionized natural language processing and opened new frontiers in data applications. However, they are not without limitations.
This talk will delve into the primary constraints of LLMs, focusing on the phenomenon of hallucinations: instances where models generate incorrect or nonsensical information. Contrary to common perception, these hallucinations are not mere bugs but an inherent feature of how LLMs are designed and trained. In other words, hallucinations will never disappear from LLMs, even ten years from now. Moreover, by the very design of LLMs, hallucinations are very convincing and sometimes hard to detect!
We will explore the underlying reasons for these limitations, rooted in the probabilistic and auto-regressive nature of LLMs. Understanding why hallucinations occur is crucial for recognizing that they cannot be entirely eliminated. Instead, they must be managed effectively, especially when integrating LLMs into data pipelines.
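A toy illustration of that probabilistic, auto-regressive root cause (not any specific model; the vocabulary and logits below are invented):

```python
# A toy sketch: an auto-regressive LM samples each next token from a
# probability distribution, so a fluent but wrong continuation is always
# a possible, confident-sounding output. Logits are invented.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["1889", "1887", "1902", "unknown"]
# Hypothetical next-token logits after "The Eiffel Tower was completed in"
logits = np.array([2.0, 1.6, 0.5, -1.0])
probs = np.exp(logits) / np.exp(logits).sum()

for _ in range(3):
    print(rng.choice(vocab, p=probs))
# "1887" (construction start, not completion) keeps a sizeable probability,
# and the model has no internal notion of "true" to filter it out.
```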
The talk will address the concrete implications of LLM limitations for data engineers, data analysts, and business users. We will examine scenarios where hallucinations can lead to data misinterpretation, flawed analysis, and erroneous business decisions.
Additionally, practical strategies for mitigating the impact of these limitations will be discussed, including model fine-tuning, incorporating human-in-the-loop approaches, and leveraging complementary technologies to enhance reliability.
How to create a GDPR-compliant Iceberg Lakehouse
Before open table formats became popular, European tech companies had two options for implementing GDPR compliance on their data: moving all data to the warehouse and deleting some of it when necessary, or implementing a data retention policy in the data lake, using lifecycle rules to delete all data after a certain amount of time.
In this talk, I will introduce the Lakehouse architecture and explain how to design and implement an Iceberg Lakehouse for your data to reduce costs and increase throughput while maintaining GDPR compliance.
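As a hedged sketch of what a "right to erasure" can look like on such a Lakehouse: the catalog, table, and column names below are illustrative, and the cluster is assumed to have an Iceberg catalog named `lake` already configured.

```python
# A hedged sketch of a GDPR erasure on an Iceberg table via Spark SQL;
# all names are illustrative. Iceberg rewrites only the affected data files.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gdpr-erasure").getOrCreate()

user_id = 42  # the data subject requesting erasure
spark.sql(f"DELETE FROM lake.prod.events WHERE user_id = {user_id}")

# Old snapshots still reference the deleted rows (that is what enables time
# travel), so expire them to physically remove the data and complete the
# GDPR obligation.
spark.sql(
    "CALL lake.system.expire_snapshots("
    "table => 'prod.events', older_than => TIMESTAMP '2024-01-01 00:00:00')"
)
```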
7
Maman, j’ai raté la mise en production de mon algo d’IA!
It’s a well-known statistic in the field: the vast majority of AI projects fail. But the most discouraging failures, with the greatest business impact, are those that occur once the model is in production: a poor recommendation model erodes user trust in the algorithm, a faulty facial recognition model prevents phone unlocking, a bad pedestrian detection model can cause a fatal accident…
Over the past four years, I have deployed several AI algorithms in different contexts: each presented challenges and lessons that have shaped my beliefs about how to properly deploy AI algorithms (MLOps). I will share the most interesting insights from these experiences in this talk:
1) The importance of load testing
2) Watch out for data drift!
3) How to know if my model works in real life?
4) Let’s (re)discover together through these experiences the fundamentals of ML monitoring!
Team meetings #1
Team meetings #1
8
Why can’t LLMs do analytics?
In this talk, we'll see why large language models (LLMs) aren't really made for data analysis. Even though they promise quick answers thanks to AI, they often lack the precision and reliability needed to make good decisions. We'll discuss the limitations of LLMs for performing comprehensive analyses, and we'll show you a better option: using them to find and exploit existing analyses. You'll see why relying on reliable data is the best way to get useful insights.
Le Domain-Driven Design : Une approche révolutionnaire pour l'ingénierie des données
In a world where data is at the heart of strategic decisions, it is crucial to ensure that data models accurately reflect the realities of the business domain. Domain-Driven Design (DDD) offers an innovative approach to solving common problems such as data silos, misunderstandings between technical and business teams, and increasing system complexity.
In this 5-minute talk, we'll explore how DDD can transform data engineering by improving data quality, system maintainability, and process efficiency. Through concrete examples, we will demonstrate the tangible benefits of this approach and offer keys for its adoption. Join us to discover how DDD can revolutionize your data projects and maximize their impact.
10
Unlock new SQL capabilities with BigFunctions!
Explore a framework for building and using 100+ powerful BigQuery functions to supercharge your data analysis. Learn how to effortlessly collect data, perform advanced transformations, and activate your data without leaving SQL.
11
Lunch
Enjoy a friendly and gourmet lunch break to recharge your batteries. The meal will be prepared by Meet My Mama, a committed caterer highlighting the world's cuisines through the talent of “Mamas”, women entrepreneurs who share their culinary passion while promoting a more inclusive and sustainable society. Vegetarian and vegan options will be offered, delighting your taste buds while respecting the environment.
This break is also a perfect opportunity to exchange with other participants, expand your network or even visit the stands of the organizers and sponsors.
12
Product Analytics: making sense of unreliable data
In today's fast-paced digital environment, product analytics is essential for optimising user experiences and driving revenue.
However, data often comes with reliability issues, such as inconsistencies, incompleteness, and noise.
This talk will detail the challenges product analytics faces and will propose several strategies for extracting valuable insights despite imperfect data quality.
Lunch
Enjoy a friendly and gourmet lunch break to recharge your batteries. The meal will be prepared by Meet My Mama, a committed caterer highlighting the world's cuisines through the talent of “Mamas”, women entrepreneurs who share their culinary passion while promoting a more inclusive and sustainable society. Vegetarian and vegan options will be offered, delighting your taste buds while respecting the environment.
This break is also a perfect opportunity to exchange with other participants, expand your network or even visit the stands of the organizers and sponsors.
Lunch
Enjoy a friendly and gourmet lunch break to recharge your batteries. The meal will be prepared by Meet My Mama, a committed caterer highlighting the world's cuisines through the talent of “Mamas”, women entrepreneurs who share their culinary passion while promoting a more inclusive and sustainable society. Vegetarian and vegan options will be offered, delighting your taste buds while respecting the environment.
This break is also a perfect opportunity to exchange with other participants, expand your network or even visit the stands of the organizers and sponsors.
13
Reducing Production Incidents with Data Contracts
At BlaBlaCar, data teams consume 1 billion events per day through Kafka, produced by backend and frontend teams. To enhance the reliability of our data pipelines, we use data contracts to provide a standardized structure for our data. This closed the feedback loop between data producers and data consumers, massively improving the quality of the data inflow.
This talk will present how we set up this organization at BlaBlaCar and which issues we could solve. Technologies involved: Kafka, Avro, BigQuery, OpenAPI, Java.
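A minimal sketch of the idea (the schema, field names, and event are invented, not BlaBlaCar's actual contracts): a data contract expressed as an Avro schema, enforced on the producer side before anything reaches Kafka.

```python
# A minimal sketch of a data contract as an Avro schema, validated before
# publishing to Kafka; schema and event contents are invented.
from fastavro.validation import validate

RIDE_PUBLISHED_V1 = {
    "type": "record",
    "name": "RidePublished",
    "fields": [
        {"name": "ride_id", "type": "string"},
        {"name": "seats", "type": "int"},
        {"name": "published_at", "type": "long"},  # epoch millis
    ],
}

event = {"ride_id": "r-123", "seats": 3, "published_at": 1718000000000}

# Reject the event before it reaches the topic if it breaks the contract;
# downstream consumers can then trust the structure of everything they read.
validate(event, RIDE_PUBLISHED_V1)  # raises ValidationError on violation
```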
Team meetings #2
Team meetings #2
14
Analytics for All: Creating a Self-Service Culture in a Scaling Organization
Brevo’s Data Team was inundated with requests for insights, which often slowed down decision-making processes. By leveraging a robust modern data stack, the company shifted to a self-service model that enables every employee, regardless of technical expertise, to explore and analyze data independently. Taha will cover the key strategies implemented to facilitate this transformation, including the integration of AI tools that streamline data interpretation and enhance the user experience.
15
Une stack légère pour suivre la standardisation de l'Analytics à l'échelle : dbt + duckdb + Observable Framework
At Decathlon, we moved from a centralized to a decentralized Analytics organization within a business domain this year. Last year, we also set ourselves the ambition of reducing the number of dbt repositories on our GitHub from 250 to 15 (1 per domain), to allow for simpler governance and to be able to converge dbt architectures.
To continue our standardization work at scale and monitor the evolution of these 15 repositories, we chose to apply DORA metrics to each repository. We have developed an analytics stack that is fully embedded in GitHub.
We use:
- DuckDB as an execution engine, which recently added support for reading Delta tables
- dbt-duckdb to orchestrate modeling
- Observable Framework to build a dataviz application, deployed statically on GitHub Pages
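A hedged sketch of this stack's read path: DuckDB (with its delta extension) querying a Delta table to compute one DORA-style metric; the table path and columns are invented.

```python
# A hedged sketch: DuckDB reading a Delta table to compute deployment
# frequency, one of the DORA metrics; path and columns are invented.
import duckdb

con = duckdb.connect()
con.execute("INSTALL delta")
con.execute("LOAD delta")

con.sql("""
    SELECT repo,
           count(*) / 30.0 AS deploys_per_day   -- deployment frequency
    FROM delta_scan('s3://analytics/deployments')
    WHERE event_date >= current_date - INTERVAL 30 DAY
    GROUP BY repo
""").show()
```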
The different synchronization methods for Data Platforms at Carrefour
The largest companies, spanning multiple countries and/or subsidiaries, have several local data platforms. At Carrefour, we have a federated model where each country has its own data platform. As long as the data platforms are used locally, there is no issue. However, when headquarters wishes to consolidate and aggregate data from different platforms to create global applications, analytical dashboards, or centralized operations, many challenges arise.
Beyond governance and data documentation, data synchronization is a real challenge. When should data be retrieved from each platform so it can be cross-referenced and aggregated? Several solutions are possible: using a scheduler, using an orchestrator, or using an event-driven architecture, among others. We will review these solutions and their advantages and disadvantages based on use cases and requirements.
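To make the event-driven option concrete, here is a toy sketch (the platform names and the "dataset ready" event shape are invented): headquarters consolidates only once every expected platform has reported, instead of polling on a fixed schedule.

```python
# A toy sketch of event-driven synchronization across local data platforms;
# platform names and the event shape are invented.
ready: set[str] = set()
EXPECTED = {"france", "spain", "brazil"}

def consolidate() -> None:
    print("running global aggregation over", sorted(EXPECTED))

def on_dataset_ready(event: dict) -> None:
    ready.add(event["country"])
    if ready == EXPECTED:        # every platform has loaded: safe to aggregate
        consolidate()
        ready.clear()

# With a scheduler you would instead poll on a fixed cron schedule and risk
# reading a platform that has not finished loading yet.
for country in ["spain", "france", "brazil"]:
    on_dataset_ready({"country": country})
```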
A demo and interactions with participants are planned for this session.
16
Le futur du CDO et des équipes Data & IA
Join us for an exclusive panel discussion on the evolving roles of Chief Data Officers (CDOs) and Data & AI teams, moderated by Robin Conquet, founder of DataGen. Virginie Cornu (MyFenix, ex-Jellysmack) and Claire Lebarz (Malt, ex-Airbnb) will share their insights on the future of strategic roles in product, technology, and data fields.
We will explore the evolution of the CDO role, its impact on the role of CPO, key skills for future Data & AI leaders, and the new organizational dynamics of Data teams within mature companies.
Quand la culture Produit rencontre l’IA Générative
For several years, AI has been serving products to meet user needs. With the rise of Generative AI, one might have expected these new models to integrate seamlessly and fluidly into the Product approach. Many believed that Generative AI was merely a 'new tool' in the technological revolution that is AI. However, it is not that simple and immediate! It is clear that Generative AI is a revolution in itself, disrupting our achievements, as well as our convictions, habits, and processes.
How does the world of Product Management adapt to this disruption to continue providing products that are always useful, usable, and used?
The SNCF Connect & Tech teams will detail the launch of their first use case leveraging the capabilities of Generative AI. Beyond the technical setup of our Generative AI for the Customer Relationship teams, we will discuss the path we chose to take to build a tailored tool. Our ambition: to combine the best of available technologies with our human expertise, while designing a relevant, fast, and integrated user experience.
In this talk, we will explore together what remains unchanged, what had to change, and what needs to change to reach production smoothly.
17
What’s Next After SQL?
It sounds like a hot take. But is a language created more than 30 years ago still relevant to our analytics needs?
SQL was designed for OLTP (Online Transaction Processing). For CRUD operations (Create, Read, Update, Delete).
In the advent of data analytics, we now use SQL to transform data. To create ad-hoc analysis. To create business intelligence dashboards.
We have created tools (e.g. dbt) to streamline this process. To bring in “software best practices”. We have made SQL our de facto lingua franca for anything analytics-related.
SQL doesn’t need to change. It has worked fine for decades. It’s the keystone of most of the modern world's databases.
It’s the data, and what we do with it, that has changed. Still, we rely on quite low-level frameworks (Spark with Hadoop/MapReduce), and we have built our analytics semantics on top of SQL to deal with data that is not rectangular anymore.
Are we missing something? What’s next after SQL?
Drawing on my experience as a DataOps engineer supporting data teams at companies like Deezer, Olympique de Marseille, and Maisons du Monde, this talk will look at the overlooked flaws SQL introduced into the analytics world, how they can be managed, and how new frameworks are leading the way in this problem space.
How to scale Machine Learning Operations with Feature Stores?
A Feature Store is a central component of Machine Learning at scale in mature organizations, providing increased operational efficiency, consistency, and scalability.
More and more organizations are reaching a higher level of maturity with ML in production. We have discussed this with many of them and observed a trend: many people wonder how a Feature Store would help them overcome critical challenges.
This talk aims to increase the audience's understanding of Feature Stores by providing a broad overview of their anatomy, key benefits, and pitfalls, how they work under the hood, and different possible architectures. Aside from theoretical content, practical examples from real-world applications will be given along the way.
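A simplified sketch of the anatomy in question (all names are invented, not a specific vendor's API): the same feature definitions feed an offline store for training and an online store for low-latency serving.

```python
# A simplified sketch of a feature store's online face; everything here is
# invented for illustration, not a specific product's API.
from dataclasses import dataclass, field

@dataclass
class FeatureStore:
    online: dict[str, dict[str, float]] = field(default_factory=dict)

    def materialize(self, entity_id: str, features: dict[str, float]) -> None:
        # A batch job pushes computed features to the online store
        # (in practice a low-latency key-value store such as Redis).
        self.online[entity_id] = features

    def get_online_features(self, entity_id: str) -> dict[str, float]:
        # What the model calls at inference time, using the exact same
        # feature definitions as training: the consistency guarantee.
        return self.online[entity_id]

store = FeatureStore()
store.materialize("user:42", {"rides_30d": 7.0, "avg_rating": 4.8})
print(store.get_online_features("user:42"))
```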
18
Break
Recharge your batteries, enjoy a coffee, exchange ideas, and, if you’re curious, explore the conference stands
Break
Recharge your batteries, enjoy a coffee, exchange ideas, and, if you’re curious, explore the conference stands
Break
Recharge your batteries, enjoy a coffee, exchange ideas, and, if you’re curious, explore the conference stands
19
Back to the future, forward to the past: how the lessons of yesterday shape the data PM role moving forward
In this talk, we will uncover the critical role of Data Product Managers (DPMs) in making your data strategy a reality.
We will start by discussing the "data as a product" approach, emphasizing its focus on user-centricity. Next, we will explore how this approach can yield significant benefits even without adopting a full data mesh.
We will then delve into the specific responsibilities of DPMs, from managing data products to aligning data goals with business objectives. You will also discover practical tips for smoothly integrating DPMs into your organisation, along with strategies for training and recruiting.
One Thousand and One dbt Models: How BlaBlaCar Moved to dbt in 12 Months
Over 12 months, we migrated 1,000 dbt models. We introduced a new paradigm and tool in our stack. This required training, new frameworks, testing, transversal collaboration, squad implementation, etc. We want to share our migration journey, targeting companies that plan such a move and want to design their strategy / supporting tooling.
20
Closing the Loop: Alerting your Stakeholders on Data Quality
Data Quality is one of our biggest priorities, whether to ensure the accuracy of our dashboards or the reliability of ML / GenAI models. It is also a responsibility of the Data Team, while data producers are typically business and tech squads. How can we automatically alert them on quality across 5+ different teams in 3 different countries? How do we close this loop? We offer solutions, show the audience how to implement them in their own context, and showcase the possibilities we unlocked.
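A minimal sketch of what "closing the loop" can look like (the ownership map, webhook URLs, and check names are invented): each failed quality check is routed to the channel of the squad that produces the dataset, not to the data team.

```python
# A minimal sketch of routing data quality alerts to producing teams;
# the ownership map, URLs, and check names are invented.
import json

OWNERS = {  # dataset -> producing squad's (illustrative) Slack webhook
    "orders": "https://hooks.slack.com/services/T000/B000/orders-squad",
    "rides": "https://hooks.slack.com/services/T000/B000/rides-squad",
}

def alert(dataset: str, check: str, detail: str) -> None:
    payload = {"text": f":rotating_light: {dataset} failed {check}: {detail}"}
    # In production, POST json.dumps(payload) to OWNERS[dataset] so the
    # producing squad, not the data team, gets paged; printed here instead.
    print(f"-> {OWNERS[dataset]}\n   {json.dumps(payload)}")

alert("orders", "not_null(order_id)", "37 null ids in yesterday's partition")
```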
Gérer les carrières dans la data : Comment ne plus jamais avoir peur de l’évaluation annuelle !
Does any of this sound familiar? “I've been working here for six months and I feel stuck: I'm not learning anything anymore, and I feel like I'm stagnating.” Or maybe: “When will I get my next raise? X just got one, what about me? Besides, Competitor 1 would pay me double.” If that resonates with you, join the workshop “Managing Data Careers: How to Never Fear the Annual Review Again!”, a collaborative workshop where we will build a skills framework together, using a methodology proven in companies of 200 and of 20 people, designed specifically for data teams. This framework will provide a clear guide to help data professionals evolve throughout their careers, ensure fair promotions, and streamline salary decisions, without turning you into a robot applying a rigid and inadequate grid.
21
Comment le sport de haut niveau s’est emparé des données pour les Jeux de Paris et après
How, after Paris was awarded the Olympic and Paralympic Games, the French sports ecosystem organized itself so that data could offer a competitive advantage to French athletes. This conference will walk through the stages that built the Sport Data Hub, up to concrete examples of using data to estimate potential, predict performance levels, and analyze the competition.
Modelling your Business in a Spreadsheet in Just 30 Minutes
Since 2023, as software businesses aim for profitability and growth at all costs is no longer viable, there has been a pressing need to ruthlessly prioritise and identify the most promising revenue-generating opportunities.
Data Analysts play a crucial role in helping business leaders identify and evaluate the best opportunities. They are now expected to develop tools that facilitate informed business and product tradeoff decisions.
Given these macro-economic changes, Data Analysts must adapt their skills to align closely with business value. They must now support business leaders by identifying and sizing the best opportunities, and one key competency they need to develop is linking any initiative to business outcomes. This includes performing "what if" scenarios and sensitivity analyses to enable effective business and product tradeoff decisions.
In this presentation, we will walk through the step-by-step process of building a “Growth Model,” a powerful tool for understanding business mechanics and determining where to allocate resources for growth. We will demonstrate an example model for a B2B SaaS business, sharing lessons from our experience at Dashlane.
We will emphasise that the process of building the tool is as important as the final output. It involves figuring out how all the metrics interconnect to produce sensible results, tracking down baseline rates for each assumption, and applying excellent business judgment. To develop this tool, one must have an intuitive sense of the company’s strategy, be an unbiased observer, understand the business at a molecular level, and be capable of obtaining accurate data for each input.
For analysts, developing this tool will deepen their understanding of the business at a granular level, positioning them as a top strategic resource within their company.
By the end of the presentation, Data Analysts will have a clear vision of the type of data products they should learn to build to advance their analytics career, transitioning from a tactical role to a strategic advisor with data expertise.
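A toy version of the idea (every rate below is an invented assumption, not Dashlane data): chain funnel assumptions into revenue so that a "what if" scenario is a one-line change.

```python
# A toy growth model: invented baseline assumptions chained into cohort
# value, with a simple sensitivity analysis over each lever.
assumptions = {
    "monthly_visitors": 100_000,
    "visitor_to_trial": 0.04,
    "trial_to_paid": 0.25,
    "arpa_monthly": 40.0,    # average revenue per account, per month
    "monthly_churn": 0.02,
}

def cohort_value(a: dict) -> float:
    new_customers = a["monthly_visitors"] * a["visitor_to_trial"] * a["trial_to_paid"]
    lifetime_months = 1 / a["monthly_churn"]   # simple geometric lifetime
    return new_customers * a["arpa_monthly"] * lifetime_months

baseline = cohort_value(assumptions)

# "What if" each lever improves by 10%? (churn improves by going down)
for lever, factor in [("visitor_to_trial", 1.10), ("trial_to_paid", 1.10),
                      ("arpa_monthly", 1.10), ("monthly_churn", 0.90)]:
    scenario = {**assumptions, lever: assumptions[lever] * factor}
    print(f"{lever:>17}: +{cohort_value(scenario) - baseline:,.0f} cohort value")
```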
22
AI on Data - snake oil or actually useful?
Many recent text-to-SQL solutions claim to replace data analysts, but they result in untrustworthy data and inaccurate queries on anything but the most simple datasets. In reality, data engineering is needed to define the semantics of the data and provide a way for LLMs to naively request what they want without needing to generate complex code. Deterministic systems are needed to convert these requests into accurate SQL using the well-defined data models built by engineers. This talk will explore why data engineering remains critical for successful AI implementations.
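A hedged sketch of the deterministic half of that argument (the metric definitions and table names are invented): the LLM produces a small structured request against a semantic layer, and plain code, not the model, compiles it into SQL.

```python
# A hedged sketch: deterministic SQL generation from a structured request,
# so the LLM never writes SQL itself; metrics and tables are invented.
SEMANTIC_LAYER = {
    "revenue": {"table": "analytics.orders", "expr": "sum(amount)"},
    "orders": {"table": "analytics.orders", "expr": "count(*)"},
}

def compile_sql(request: dict) -> str:
    metric = SEMANTIC_LAYER[request["metric"]]     # unknown metric -> KeyError,
    dims = ", ".join(request.get("group_by", []))  # never hallucinated SQL
    select = f"{dims + ', ' if dims else ''}{metric['expr']} AS {request['metric']}"
    sql = f"SELECT {select} FROM {metric['table']}"
    return sql + (f" GROUP BY {dims}" if dims else "")

# What the LLM is asked to produce instead of raw SQL:
llm_output = {"metric": "revenue", "group_by": ["country"]}
print(compile_sql(llm_output))
# SELECT country, sum(amount) AS revenue FROM analytics.orders GROUP BY country
```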
23
Humanizing Data Strategy
People are emotional, irrational and unpredictable – and yet they are the most important aspect of any data strategy. Tiankai introduces his framework of the 5 Cs – competence, collaboration, communication, creativity and conscience – with actionable examples to help you put the human being truly at the center of your data efforts, and to turn your team members and employees into active advocates for your data strategy.