Details

Dear Data Lovers,

Our very first Data Science MeetUp is planned! I am happy to announce that we will have a talk from Natalino Busa (https://www.linkedin.com/in/natalinobusa), head of Applied Data Science at Teradata, and a very enthusiastic data lover. From his bio:

All-round CTO, IT Architect, Data Scientist, and Digital Innovator with 15+ years experience in development, management and research of distributed architectures and scalable services and applications.

Here is a description of what he will show during his talk:

Geolocated clustering and predictive services with Python and Scikit-Learn

Machine learning, and in particular clustering algorithms, can be used to determine which geographical areas are commonly visited and “checked into” by a given user and which areas are not. Such geographical analyses enable a wide range of services, from location-based recommenders to advanced security systems, and in general provide a more personalized user experience.

I will use these techniques to provide two flavours of predictive analytics:

First, I will build a simple recommender system which will provide the most trending venues in a given area. In particular, k-means clustering can be applied to the dataset of geolocated events to partition the map into regions. For each region, we can rank the venues which are most visited. With this information, we can recommend venues and landmarks such as Times Square or the Empire State Building depending on the location of the user.
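
As a rough illustration of this first flavour (not the notebook from the talk), here is a minimal sketch with scikit-learn. The check-ins are made-up sample data with approximate coordinates, the helper names `top_venues_by_region` and `recommend` are hypothetical, and treating latitude/longitude as flat Euclidean coordinates is a simplifying assumption that is only reasonable over a small area.

```python
from collections import Counter, defaultdict

import numpy as np
from sklearn.cluster import KMeans

# Made-up check-ins: (latitude, longitude, venue). Coordinates are approximate.
CHECKINS = [
    (40.7580, -73.9855, "Times Square"),
    (40.7582, -73.9851, "Times Square"),
    (40.7578, -73.9858, "Times Square"),
    (40.7484, -73.9857, "Empire State Building"),
    (40.7486, -73.9855, "Empire State Building"),
    (40.7061, -73.9969, "Brooklyn Bridge"),
    (40.7063, -73.9967, "Brooklyn Bridge"),
    (40.7127, -74.0134, "One World Trade Center"),
]


def top_venues_by_region(checkins, n_regions=2, top_n=1):
    """Partition check-ins into regions with k-means, then rank venues per region."""
    coords = np.array([(lat, lon) for lat, lon, _ in checkins])
    model = KMeans(n_clusters=n_regions, n_init=10, random_state=0).fit(coords)

    # Count how often each venue appears in each region.
    counts = defaultdict(Counter)
    for label, (_, _, venue) in zip(model.labels_, checkins):
        counts[int(label)][venue] += 1

    ranking = {
        region: [venue for venue, _ in counter.most_common(top_n)]
        for region, counter in counts.items()
    }
    return model, ranking


def recommend(model, ranking, lat, lon):
    """Return the top venues of the region that contains the given location."""
    region = int(model.predict(np.array([[lat, lon]]))[0])
    return ranking[region]


model, ranking = top_venues_by_region(CHECKINS)
midtown_pick = recommend(model, ranking, 40.7570, -73.9860)
downtown_pick = recommend(model, ranking, 40.7070, -73.9990)
```

With real data the number of regions would need tuning, and the ranking could be weighted by recency to capture what is "trending" rather than what is most visited overall.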

Second, I’ll determine geographical areas that are specific and personal to each user. In particular, I will use a density-based clustering technique such as DBSCAN to extract the areas where a user usually goes. This analysis can be used to determine whether a given data point is an outlier with respect to the areas where a user normally checks in, and can therefore be used to score a "novelty" or "anomaly" factor given the location of an event.
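
To give a feel for this second flavour, here is a minimal sketch (again, not the notebook from the talk) using scikit-learn's DBSCAN with the haversine metric. The check-ins are made up, the 0.5 km radius and `min_samples=3` are arbitrary assumptions, and `fit_user_areas` and `novelty_km` are hypothetical helper names. Since DBSCAN has no built-in way to score unseen points, the sketch uses the distance to the nearest core sample as a simple stand-in for a novelty score.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics.pairwise import haversine_distances

EARTH_RADIUS_KM = 6371.0

# Made-up check-ins for one user: (latitude, longitude) in degrees.
USER_CHECKINS = [
    (40.7580, -73.9855), (40.7583, -73.9852), (40.7577, -73.9858), (40.7581, -73.9850),
    (40.7127, -74.0134), (40.7130, -74.0130), (40.7124, -74.0137), (40.7128, -74.0139),
    (40.6413, -73.7781),  # a single isolated check-in far from the usual areas
]


def fit_user_areas(latlon_deg, eps_km=0.5, min_samples=3):
    """Cluster a user's check-ins; haversine expects [lat, lon] in radians."""
    points = np.radians(np.asarray(latlon_deg, dtype=float))
    return DBSCAN(
        eps=eps_km / EARTH_RADIUS_KM,  # convert the radius from km to radians
        min_samples=min_samples,
        metric="haversine",
        algorithm="ball_tree",
    ).fit(points)


def novelty_km(model, lat, lon):
    """Distance in km from a new location to the nearest core sample."""
    if len(model.components_) == 0:
        return float("inf")  # the user has no dense areas at all
    query = np.radians([[lat, lon]])
    distances = haversine_distances(query, model.components_) * EARTH_RADIUS_KM
    return float(distances.min())


model = fit_user_areas(USER_CHECKINS)
n_areas = len(set(model.labels_)) - (1 if -1 in model.labels_ else 0)
familiar = novelty_km(model, 40.7581, -73.9854)  # inside a usual area
unusual = novelty_km(model, 51.5074, -0.1278)    # roughly London, far away
```

An event could then be flagged as anomalous when its distance exceeds the clustering radius. Points that DBSCAN labels `-1` in the training data are the outliers among the user's own past check-ins.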

We will analyze these events using a public dataset, shared by Gowalla, of venue check-ins registered between 2008 and 2010. This notebook will cover some typical data science steps:

- data acquisition
- data preparation
- data exploration

Thereafter, we will dive into some unsupervised learning techniques: k-means and DBSCAN clustering, for recommending popular venues and for determining outliers, respectively.

For the remainder of the evening, I was thinking of hosting several informal lightning talks by audience members, where anyone (you?) can talk for 2 to 15 minutes about something you are currently working on, share some valuable experience, or ask the crowd for insight into a problem you are facing. If you already know what you would like to present, feel free to contact me.

Of course all this will be done while enjoying a cold beer and some good food, what's not to love? Hope to see you guys there!