Data Lineage w. Graph DBs / From active DBs to NoSQL
Details
Session 1: Data Lineage made easy with Graph Databases
Speaker: Gianni Ceresa, DATAlysis GmbH, Analytics & EPM Consultant
Abstract: Data lineage has always been a topic, at least for auditing, and it has come back as a key element with regulations such as GDPR. The problem is that with the multiplication of tools, sources, transformations and movements of data, it is getting harder and harder to get a clear picture of the whole data lineage in a company, and even more complicated to use that information for auditing. This is where graph databases come in to make things easier: data lineage is by nature a graph. It is possible to model every single flow and every single component, down to a column in a database or a dashboard. Add the whole corporate security on top, with its abstraction layers of groups and roles above users, and your graph is ready for analysis. This talk will cover why graph databases are a perfect match for data lineage and use an enterprise analytics platform as an example, tracking a single column from a database table all the way to dashboards and reports, together with the respective security. (Based on the Oracle property graph engine PGX with Cytoscape for visualization, and OAC/OBIEE for the data lineage example.)
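To illustrate the idea that lineage is naturally a graph, here is a minimal sketch in plain Python. It does not use PGX, Cytoscape, or OAC/OBIEE, and every node name, edge, and function name in it is a made-up assumption for illustration only. It follows one column downstream to the dashboards and reports it feeds, then walks the role and group layers to find the users who can see it.

```python
from collections import deque

# Hypothetical data-flow edges: each key feeds the nodes listed as its value.
FLOWS = {
    "db.sales.amount": ["etl.load_sales"],
    "etl.load_sales": ["dwh.fact_sales.amount"],
    "dwh.fact_sales.amount": ["model.revenue"],
    "model.revenue": ["dashboard.exec_overview", "report.monthly_revenue"],
}

# Hypothetical security edges: object -> role -> group -> user.
ACCESS = {
    "dashboard.exec_overview": ["role.executive"],
    "report.monthly_revenue": ["role.finance"],
    "role.executive": ["group.board"],
    "role.finance": ["group.controlling"],
    "group.board": ["user.alice"],
    "group.controlling": ["user.bob", "user.alice"],
}


def reachable(graph, start):
    """Return every node reachable from start, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def users_exposed_to(column):
    """Return the sorted users who can see any end product fed by column."""
    # End products are downstream nodes with no outgoing data flow.
    endpoints = [n for n in reachable(FLOWS, column) if n not in FLOWS]
    users = set()
    for endpoint in endpoints:
        users.update(
            n for n in reachable(ACCESS, endpoint) if n.startswith("user.")
        )
    return sorted(users)


if __name__ == "__main__":
    print(reachable(FLOWS, "db.sales.amount"))
    print(users_exposed_to("db.sales.amount"))
```

A graph database would answer the same two questions with path queries instead of hand-written traversals, which matters once the lineage covers thousands of columns and flows.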
Session 2: Database Programming - From Active Databases to NoSQL and Cloud Computing
Speaker: Bastian Hossbach, Principal Member of Technical Staff at Oracle
Abstract: Database applications define highly complex business logic that cannot be expressed purely in SQL. This prevented them from being moved completely into the database in order to achieve high performance. Therefore, vendors of relational databases eventually started to provide procedural extensions to SQL such as PL/SQL (Oracle Database), SQL PL (IBM DB2), T-SQL (Microsoft SQL Server), or PL/pgSQL (PostgreSQL). With the introduction of server-side programming languages, databases began to shift from being entirely passive to becoming more active. Along with the rapid growth of some Web companies, radically new approaches to data management, known as NoSQL, emerged. NoSQL databases broke not only with how data is stored and maintained, but also with how it is accessed and processed. Everyday programming languages such as Java (e.g., Apache Hadoop), Erlang (e.g., Riak), or JavaScript (e.g., MongoDB, CouchDB) were simply used as the query language. While the need for stronger consistency or SQL struck back in many cases, the demand for supporting modern programming languages in databases remained. With the ongoing transition to cloud computing, developers now want to keep their preferred programming languages and environments when switching to database-side programming. Consequently, vendors of cloud databases try to support multiple popular programming languages such as JavaScript, R, and Python (e.g., Microsoft Azure Cosmos DB [NoSQL], Amazon Redshift [SQL]). In this lecture, we will review the common history of databases and programming languages, give an introduction to database-side programming for some relevant systems, and discuss the challenges of effectively and efficiently integrating modern programming languages with databases.