Tutorials

Tutorial 1 - Utilizing Quantum Computing to Improve the Quality of Data (2-hour tutorial)

Abstract

In today's data-driven world, ensuring data quality has become key to success for organizations across industries and academia. Hence, this tutorial begins by exploring the foundational principles of data quality, emphasizing dimensions such as accuracy, completeness, consistency, timeliness, and validity. We will dive into strategies for identifying and addressing common data quality issues, including data duplication, missing values, and errors in data entry or processing. However, many strategies for improving data quality are costly in terms of processing time, computational resources, and/or the amount of training data required. Furthermore, the heuristic methods applied in practice often return suboptimal results. Current quantum computing research explores whether these types of computational challenges could benefit from quantum computers. Quantum computing is rapidly emerging as a transformative technology, marked by recent breakthroughs and a potentially revolutionary computational paradigm. Although it is still in its early stages, improved quantum hardware, growing global awareness, and investments underscore its potential to transform various industries. This tutorial will briefly introduce quantum computing and focus on its applications to data quality, including quantum machine learning. We will discuss the potential of quantum computing and show use cases that apply quantum machine learning and quantum optimization to data quality issues. We will also point out open challenges that might lead to new research directions and new business opportunities in improving data quality.

Presenters

Valter Uotila is a Ph.D. student at the University of Helsinki, specializing in quantum computing applications for databases and data management. He has previously presented tutorials on these topics at SIGMOD 2023 and IEEE Quantum Week 2024. He has been a proceedings chair for the Quantum Data Science and Management Workshops co-located with VLDB. Beyond his primary research, Valter is also interested in distributed quantum computing, quantum information theory, and applied category theory, as well as their synergies with data management systems. He has secured top-three placements in several international hackathons, such as QHack 2022 and 2023, the Quantum Internet Application Challenge by QIA, and BMW's Quantum Computing for Automotive Challenges.

Soror Sahri is an Associate Professor at Université Paris Cité. Her research interests focus on data management, including big data analytics, data quality, and query processing. She has been involved in various international and interdisciplinary projects and is the project coordinator of the ANR-DFG QualityOnt project "High-Quality Knowledge Graphs from Recent English, French, and German Emergent Trends with the Example of COVID-19". She is a member of the editorial board of the TLDKS journal and has served as local organization chair of IEEE-MASCOTS 2014 and general co-chair of KGSWC 2024. She has served as PC member and session chair at many national and international conferences.

Sven Groppe is a Professor at the University of Lübeck and the project coordinator of the BMBF-funded project QC4DB – Accelerating Relational Database Management Systems via Quantum Computing. Furthermore, he is principal investigator in the ANR-DFG funded project "High Quality Knowledge Graphs from recent English, French and German Emergent Trends with the example of COVID-19" (QualityOnt). His previous projects cover Semantic Web databases, GPU and FPGA hardware acceleration of relational and Semantic Web databases, and advanced data management techniques for the Semantic Internet-of-Things. He is a full member of the International Federation for Information Processing (IFIP) Working Group WG2.6 Database. He has been a member of the RDF Data Access Working Group, the World Wide Web Consortium (W3C) working group that specified SPARQL, and of the W3C Rule Interchange Format Working Group. His integration into the scientific community is reflected in over 125 program committee memberships in international conferences and workshops; reviewing activities for over 40 internationally recognized journals and 8 funding organizations; editorial activities in 4 journals; chairing the Quantum Data Science and Management, Semantic Big Data, Big Data in Emergent Distributed Environments, and Very Large Internet of Things workshops at the first-class ACM SIGMOD and VLDB conferences; serving as general chair of the International Semantic Intelligence Conference, the International Conference on Applied Machine Learning and Data Analytics, the International Health Informatics Conference (IHIC), and other conferences; and co-authorship with over 190 scientists from 28 countries on 6 continents. For more details about his academic career, visit https://www.ifis.uni-luebeck.de/~groppe

Tutorial 2 - Graph Analytics for Bridging Human and Data Sciences (1-hour tutorial)

A hands-on tutorial exploring epistemic influence, intellectual history, and data justice through interdisciplinary graph analysis

Abstract

This tutorial invites data and computer scientists to step beyond conventional technical challenges and engage with questions of epistemic justice. Participants will apply tools like NLP, graph analytics, and embeddings to culturally complex datasets—such as Wikipedia entries and literary corpora from Latin America and Eastern Europe. Through hands-on exercises, they will explore how data science can be used to surface epistemic violence, recognize marginalized contributions to knowledge, and support decolonial approaches to information systems.

Presenters

Alejandra Josiowicz is Assistant Professor and Coordinator of Internationalization at the Institute of Languages and Literatures of the State University of Rio de Janeiro (UERJ). She is also a Prociencia Fellow (2021-2024) at UERJ and Jovem Cientista do Nosso Estado (2023-2025) at FAPERJ. She was a Researcher at the National Council for Science and Technological Research of Argentina (CONICET) and a Post-doctoral Fellow at the Digital Humanities Laboratory in the School of Social Sciences of the Getulio Vargas Foundation. She is a member of the Center for Critical Race + Digital Studies and participates in the 2022-2023 cohort of the Latin American Hub (LAC) of the Feminist AI Research network.

Genoveva Vargas-Solar is a principal scientist at CNRS, LIRIS Laboratory, and an IEEE Senior Member. She holds PhDs in Computer Science (Univ. Joseph Fourier) and Literature (Univ. Stendhal), and an HDR from Grenoble. She is a member of the Mexican Academy of Computing and promotes gender equality in science. Her research focuses on data science management systems for just-in-time processing. She leads projects on decolonial data, inclusion, and algorithmic fairness. Genoveva coordinates the inter-conference DEI initiative and serves on the EDBT, SIGMOD, ADBIS, and AMW steering/executive committees. She is an activist in Tierra Común and FeministA+I, advocating inclusive tech. She fosters Latin America–Europe collaboration, especially between France and Mexico.

Tutorial 3 - Vector Representations of Multi-Modal Data (2-hour tutorial)

Abstract

Multi-modal data processing explores the interactions between various types of data to produce a more comprehensive or accurate understanding of a phenomenon such as health, emotions, or circumstances. Vectors have emerged as an important data representation method in modern data management, driven by the growing need to computationally describe multi-modal data such as text, images, and video in various domains. In this tutorial, we provide a fundamental introduction to vector representations of multi-modal data, covering both intra-modal and inter-modal representation. The goal of our tutorial is to provide a centralized and condensed introduction to the theories and applications of multi-modal data vectorization technologies for both database researchers and practitioners. We also discuss how to use vector database management systems to manage multi-modal data.

Presenters

Toni Taipalus is an Assistant Professor at Tampere University, and the leader of the Database Systems research group. His current research interests focus on vector databases, query languages, and sustainable data management.

Jiaheng Lu is a Professor at the University of Helsinki. His research focuses on multi-model data management and database management systems, particularly addressing challenges in processing massive, heterogeneous data repositories. He has authored four books on XML and NoSQL databases and published over 130 papers in leading venues such as SIGMOD, VLDB, TODS, and TKDE.

Tutorial 4 - Data Warehousing: The Industrial Perspective (2-hour tutorial)

Abstract

As with many other concepts, the theoretical ideas regarding how data warehouses should be organized are rarely followed in real-life systems. The notion of data warehouses as storage for historical, rarely or never changing records used for summary reports, slicing and dicing, and trend analysis does not reflect the most common business needs. In this tutorial, we will closely examine existing business demands in the data warehousing industry, discuss how to support them, and ask whether they are covered by existing commercial offerings. We will outline some known challenges and discuss how new research could address them. We want to convince attendees that data warehouses are more complicated than is usually thought and are therefore interesting as a source of research problems.

Presenters

Henrietta Dombrovskaya is a database researcher and practitioner with over 40 years of academic and industrial experience. She holds a Ph.D. in Computer Science from the University of Saint Petersburg, Russia. At present, she is a Database Architect at DRW Holdings in Chicago, IL. Her research interests focus on developing efficient interactions between applications and databases and on the implementation of temporal data. Henrietta is an active community member, a PostgreSQL Contributor, and a frequent speaker at PostgreSQL conferences. She is a founder of Prairie Postgres, a not-for-profit that aims to promote Postgres education in the Midwest states of the USA.