Changing Large Tables
About the talk
“Everything changes and nothing stays the same.” Yet somehow, when dealing with datasets, we often treat change as merely an afterthought. But very quickly, the world moves on, and the dataset needs to catch up to remain useful. Rows have to be inserted, deleted, or updated. For a data management environment, managing change is thus not optional. However, managing changes correctly is difficult. All too common are the wild collections of CSV and Parquet files that are somehow derived from each other. We can do better.
Recent developments like the Lakehouse formats and the various schema management initiatives aim to improve things, but it's not yet entirely clear where this road will lead. In my talk, I will discuss the benefits and challenges of bringing traditional transactional semantics to large-scale data analysis workflows. We will see data and schema changes and even actual time travel in action.