Quick Graphs: Extracting Taxonomies, Strava, Wikipedia, Python Dependencies

Hosted By
Mark N. and Neo4j

Details

In this session Jesús Barrasa and Mark Needham will present four 15-minute lightning talks showing you how to quickly analyse some fun datasets with Neo4j.

We'll show how to extract taxonomies from tagged data, analyse Strava runs, build a Wikipedia Knowledge Graph, and look into Python dependencies.

Please sign up on the Skillsmatter page (https://skillsmatter.com/meetups/11220-neo4j-july)

Taxonomies from tagged data

Say we have a dataset of multi-tagged items: books with multiple genres, articles with multiple topics, products with multiple categories.
We want to organise these tags (the genres, the topics, the categories) logically, in a way that is descriptive but also actionable.
A typical organisation will be hierarchical, like a taxonomy.

But rather than building it manually, we are going to learn it from the data in an automated way using Neo4j.
Jesús will show how this taxonomy can be used and will present an example applying it to content recommendation and enhanced search.
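
The description does not say which technique the talk uses, so the following is only a minimal sketch of one common heuristic, written in plain Python rather than Neo4j: tag A is treated as narrower than tag B when most items tagged A are also tagged B, and B is used more widely. The `learn_taxonomy` function, the `threshold` parameter, and the sample books are all hypothetical.

```python
from collections import defaultdict


def learn_taxonomy(items, threshold=0.8):
    """Return sorted (narrower, broader) tag pairs inferred from co-occurrence.

    `items` maps an item name to the set of tags it carries.
    `threshold` is an assumed cut-off: the share of A-tagged items that
    must also carry B before A is considered narrower than B.
    """
    # Invert the data: tag -> set of items carrying that tag.
    tagged = defaultdict(set)
    for item, tags in items.items():
        for tag in tags:
            tagged[tag].add(item)

    edges = []
    for a in tagged:
        for b in tagged:
            if a == b:
                continue
            overlap = len(tagged[a] & tagged[b])
            mostly_contained = overlap / len(tagged[a]) >= threshold
            b_is_wider = len(tagged[b]) > len(tagged[a])
            if mostly_contained and b_is_wider:
                edges.append((a, b))
    return sorted(edges)


# Made-up sample data, for illustration only.
books = {
    "Dune": {"fiction", "science fiction"},
    "Neuromancer": {"fiction", "science fiction", "cyberpunk"},
    "Snow Crash": {"fiction", "science fiction", "cyberpunk"},
    "Emma": {"fiction", "romance"},
}

taxonomy = learn_taxonomy(books)
for narrower, broader in taxonomy:
    print(f"{narrower} -> {broader}")
```

Note that this sketch also emits transitive pairs (for example cyberpunk under fiction as well as under science fiction); a real taxonomy would probably prune those.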

Strava

Mark is an avid runner and tracks his runs using the popular Strava application.
In this talk we'll learn how to load data into Neo4j using APOC's Load JSON procedure and then slice and dice the data using the temporal data types introduced in Neo4j 3.4.

We'll be able to answer questions such as:

  • How many runs were there with a pace under 7:30 minutes per mile?
  • What's my quickest 10k run?
  • How many runs have I done in a given month?
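
To make these questions concrete, here is a rough sketch of the same calculations in plain Python. It is not the Neo4j approach from the talk; the field names (`start`, `distance_m`, `moving_time`), the 200-metre tolerance for what counts as a 10k, and the runs themselves are all made up for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

METRES_PER_MILE = 1609.344

# Made-up runs; the field names are hypothetical, not Strava's schema.
runs = [
    {"start": datetime(2018, 6, 2, 7, 30), "distance_m": 10000.0,
     "moving_time": timedelta(minutes=46)},
    {"start": datetime(2018, 6, 9, 7, 45), "distance_m": 5000.0,
     "moving_time": timedelta(minutes=24)},
    {"start": datetime(2018, 7, 1, 8, 0), "distance_m": 10050.0,
     "moving_time": timedelta(minutes=45, seconds=10)},
    {"start": datetime(2018, 7, 8, 8, 15), "distance_m": 16093.44,
     "moving_time": timedelta(minutes=80)},
]


def pace_per_mile(run):
    """Average pace as a timedelta per mile."""
    miles = run["distance_m"] / METRES_PER_MILE
    return run["moving_time"] / miles


# How many runs had a pace under 7:30 minutes per mile?
target = timedelta(minutes=7, seconds=30)
fast_runs = [run for run in runs if pace_per_mile(run) < target]

# What's the quickest 10k? (Assumption: 10.0 to 10.2 km counts as a 10k.)
ten_ks = [run for run in runs if 10000.0 <= run["distance_m"] <= 10200.0]
quickest_10k = min(ten_ks, key=lambda run: run["moving_time"])

# How many runs were there in each month?
runs_per_month = Counter(
    (run["start"].year, run["start"].month) for run in runs
)

print(len(fast_runs), quickest_10k["start"], dict(runs_per_month))
```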

Wikipedia

For this QuickGraph Jesús will use data about Wikipedia Categories.
You may have noticed at the bottom of every Wikipedia article a section listing the categories it’s classified under.
Every Wikipedia article will have at least one category, and categories branch into subcategories forming overlapping trees.
A category can be a subcategory of more than one parent category, and the Wikipedia hierarchy is an example of this, so the hierarchy is effectively a graph rather than a tree.
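
As a small illustration of why multiple parents turn the hierarchy into a graph, the sketch below stores each category's parents in a dictionary and walks upwards to collect every ancestor. The category names are invented for the example and are not taken from the real Wikipedia data; `ancestors` is a hypothetical helper.

```python
# child category -> set of parent categories (made-up sample data)
parents = {
    "Graph databases": {"Database management systems", "Graph theory"},
    "Database management systems": {"Databases"},
    "Graph theory": {"Mathematics"},
    "Databases": {"Computing"},
}


def ancestors(category, parents):
    """Collect every category reachable by following parent links."""
    seen = set()
    stack = [category]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


# Categories with more than one parent are what break the tree shape.
multi_parent = sorted(c for c, ps in parents.items() if len(ps) > 1)

print(multi_parent)
print(sorted(ancestors("Graph databases", parents)))
```

In a strict tree, `ancestors` would return a single chain; here it returns two branches that lead to different roots.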

Python Dependencies

In this QuickGraph Mark will show you how to find the dependencies between your pip modules and import them into Neo4j.
We'll import the dependency graph of a few popular libraries (scikit-learn, tensorflow, pandas, and neo4j) and see which dependencies they have in common.
If we get time we'll even run graph algorithms over the dependency graph to see what it reveals.
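
The description does not say how the dependencies are collected, so the following is just one possible sketch, assuming Python 3.8+ and the standard library's `importlib.metadata`. The helper names `requirement_name` and `dependency_edges` are hypothetical, and the check for optional extras is a rough string filter rather than a full parse of the requirement syntax.

```python
import re
from importlib import metadata


def requirement_name(requirement):
    """Extract the bare project name from a requirement string,
    e.g. 'numpy>=1.13.3' -> 'numpy'."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    return match.group(0).lower() if match else None


def dependency_edges(packages):
    """Return sorted (package, dependency) pairs for installed packages.

    Packages that are not installed are skipped silently.
    """
    edges = set()
    for package in packages:
        try:
            # requires() returns a list of requirement strings, or None.
            requirements = metadata.requires(package) or []
        except metadata.PackageNotFoundError:
            continue
        for requirement in requirements:
            # Rough filter: ignore dependencies that only apply to extras.
            if "extra ==" in requirement:
                continue
            name = requirement_name(requirement)
            if name:
                edges.add((package.lower(), name))
    return sorted(edges)


if __name__ == "__main__":
    for package, dependency in dependency_edges(
        ["scikit-learn", "tensorflow", "pandas", "neo4j"]
    ):
        print(f"{package} -> {dependency}")
```

Each printed pair corresponds to one edge of the dependency graph; the output depends entirely on which packages and versions are installed locally.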

Graph Database - United Kingdom