Interactive and Interpretable Machine Learning and Graph-Based Data Science


Details
Hi Everyone,
Agenda:
11:30 - 11:45 am: Introductions / Meet and greet
11:45 am - 12:45 pm: Interactive Visualization for Interpretable and Interactive Machine Learning with Professor Enrico Bertini
12:45 - 1:45 pm: Graph-Based Data Science with `kglab` with Paco Nathan
1:45 - 1:50 pm: Final thoughts
First Guest
BIO:
Enrico Bertini is an Associate Professor in the Department of Computer Science and Engineering at NYU Tandon School of Engineering, where he teaches and conducts research in information visualization and visual analytics. His latest research focuses on interpretable and interactive machine learning, with an emphasis on data visualization to make sense of machine learning model behavior. Dr. Bertini is also the co-host of Data Stories, a podcast on data visualization and the role data plays in our lives.
http://enrico.bertini.io
http://datastori.es
Research Interests: Visual Analytics, Information/Data Visualization, User Interfaces
Second Guest
BIO:
Paco Nathan
Known as a "player/coach", with core expertise in data science, natural language, and cloud computing; ~40 years of tech industry experience, ranging from Bell Labs to early-stage start-ups. Advisor for Amplify Partners, IBM Data Science Community, Recognai, KUNGFU.AI, and Primer. Lead committer on PyTextRank and kglab. Formerly Director, Community Evangelism @ Databricks and Apache Spark. Cited in 2015 as one of the Top 30 People in Big Data and Analytics by Innovation Enterprise.
Abstract: Graph-Based Data Science
Python has a number of excellent libraries for working with graphs which provide: semantic technologies, graph queries, interactive visualizations, graph algorithms, probabilistic graph inference, plus embedding and other integrations with deep learning. However, few of these share integration paths – other than writing lots of custom code – and most do not share common file formats.
This talk explores `kglab` (https://github.com/DerwenAI/kglab), a recently launched open-source project that pursues integration in three dimensions:
- Classes and transforms making it simple to combine semantic technologies (RDFlib, OWL-RL, pySHACL, csvwlib, etc.) with graph libraries that expect a matrix or tensor representation (NetworkX, iGraph, pslpython, node2vec, PyVis, etc.)
- An abstraction layer that integrates effectively with popular data science tools: pandas, scikit-learn, PyTorch, spaCy, and so on.
- Architecture for scale-out with popular data engineering infrastructure (Apache Parquet, fsspec, Apache Spark, Ray, RAPIDS, etc.) on cloud computing.
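As a rough illustration of the first dimension, the sketch below maps RDF-style triples onto the integer-indexed adjacency matrix that matrix-oriented graph libraries typically expect. It uses only the Python standard library and is not `kglab`'s API; the sample triples, the `ex:` prefix, and the function name `triples_to_adjacency` are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def triples_to_adjacency(
    triples: List[Triple],
) -> Tuple[Dict[str, int], List[List[int]]]:
    """Assign each distinct subject/object an integer id (in order of first
    appearance) and build a dense adjacency matrix with one directed edge
    per triple. Predicates are ignored in this simplified sketch."""
    index: Dict[str, int] = {}
    for subj, _pred, obj in triples:
        for node in (subj, obj):
            if node not in index:
                index[node] = len(index)

    size = len(index)
    matrix = [[0] * size for _ in range(size)]
    for subj, _pred, obj in triples:
        matrix[index[subj]][index[obj]] = 1
    return index, matrix


# Hypothetical example data, not taken from the kglab tutorials.
triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:alice", "ex:worksAt", "ex:acme"),
]
index, matrix = triples_to_adjacency(triples)
```

Note that this naive transform discards the predicate labels, so the semantic side and the matrix side of the graph would need to be kept in sync by some additional bookkeeping.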
## Key takeaways
- `kglab` attempts to integrate with many different graph-based libraries instead of trying to supersede them, while remaining agnostic about graph databases.
- Popular use cases benefit from alternating between graph methods. For example, embedding combined with SHACL validation and statistical relational learning (e.g., PSL) allows for more effective data quality checks of annotations and entity linking into graphs (now the #4 top NLP use case in the industry).
- Graph databases tend to promise "One Size Fits All" (OSFA) solutions, though lessons from the history of data science disprove this notion: proprietary OSFA generally fails, while open-source integration and interoperability win out. Instead, the diverse graph-based techniques tend to complement each other, e.g., alternating between SHACL validation on an RDF graph and PSL inference on a probabilistic graph to test the data quality of added annotations.
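To make the validation half of that alternation concrete, here is a minimal sketch of a shape-style data quality check over added annotations. It is a hand-rolled stand-in for a SHACL "at least one value" constraint, written in plain Python rather than with pySHACL, and the PSL inference step is not shown. The triples, the `ex:` names, and the function `missing_required` are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def missing_required(
    triples: List[Triple], target_class: str, required_pred: str
) -> List[str]:
    """Return, sorted, every node typed as target_class that has no
    value for required_pred (a simplified minimum-count check)."""
    typed: Set[str] = set()
    has_pred: Set[str] = set()
    for subj, pred, obj in triples:
        if pred == "rdf:type" and obj == target_class:
            typed.add(subj)
        if pred == required_pred:
            has_pred.add(subj)
    return sorted(typed - has_pred)


# Hypothetical annotations: ex:bob is typed as a person but has no name.
annotations = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "rdf:type", "ex:Person"),
]
violations = missing_required(annotations, "ex:Person", "ex:name")
```

In a workflow like the one described above, nodes flagged this way could then be passed to a probabilistic step for further scoring, though how that hand-off works in practice is outside this sketch.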
The `kglab` docs include notebook-based tutorials to explore concepts and provide sample code. We'll cover portions of those, plus the integration roadmap and challenges.
https://www.linkedin.com/in/ceteri/
https://twitter.com/pacoid
https://www.linkedin.com/groups/6725785/
More details coming soon.
Sponsors:
NUMFOCUS, h2o.ai
