Changing Large Tables Keynote
About the talk
Everything changes and nothing stays the same. Yet when it comes to managing a dataset, we often treat change as an afterthought. The world evolves rapidly, and a dataset must keep pace to remain useful: rows must be inserted, deleted, or updated. In a data management environment, managing change is therefore not optional, but doing it well is difficult. It is all too common to see scattered collections of CSV and Parquet files that are somehow derived from one another. We can do better.
Recent advances, such as lakehouse formats and various schema management initiatives, aim to improve this state of affairs, but the exact direction of this evolution remains uncertain. In this talk, I will discuss the advantages and challenges of bringing traditional transactional semantics to large-scale data analysis workflows. We will see data and schema changes in action, and even actual time travel.