D cubed: Decision Trees, Docker and Data Science in the Cloud


Details
For our last meetup of 2016, we are going to look deeper into Decision Trees and examine the ease of analysing third-party data in the Cloud. As always, please RSVP here AND register at SkillsMatter (https://skillsmatter.com/meetups/8262-datapalooza-nights-meetup).
Understanding and Using Decision Trees
This talk is part of our Machine Learning education series. In this session we will look at our first logical model – Decision Trees. We will cover topics like organisation of a tree structure, using machine learning to construct decision trees, and employing decision trees to make predictions for classification tasks. We will also introduce metrics like entropy and information gain, and we will talk about advantages and disadvantages of the decision tree model.
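As a taste of the metrics mentioned above, here is a minimal sketch of entropy and information gain in plain Python. It is not taken from the talk: the function names, the toy weather-style rows and the labels are all hypothetical, chosen only to illustrate how a candidate split can be scored.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy of the labels minus the weighted entropy after splitting on `feature`."""
    total = len(labels)
    groups = {}
    # Group the labels by the value each row takes for the chosen feature.
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data set (an assumption for illustration only).
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["yes", "yes", "no", "no"]

# A tree learner would typically split on the feature with the highest gain.
best = max(["outlook", "windy"], key=lambda f: information_gain(rows, labels, f))
print(best)
```

In this toy data, splitting on "outlook" separates the two classes perfectly, while "windy" leaves each branch as mixed as the original set, so a greedy tree builder would choose "outlook" for the first split.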
If you have missed any of the previous talks in the series, there are recordings available on YouTube (https://www.youtube.com/playlist?list=PLKryvmknjpgPLh3kS_t1_Z1DlmcYPlQhZ).
Nikolay Manchev has over 10 years of database experience and has been involved in large-scale migration, consolidation, and data warehouse deployment projects in the UK and abroad. He is a speaker, blogger, and author of numerous articles and a book on advanced database topics. For the last three years Nikolay has been working exclusively in the big data (Hadoop) space with a focus on Spark and machine learning. He has an M.Sc. in Software Technologies and is working towards an M.Sc. in Data Science.
Data Science in the Cloud
Data Science covers the complete workflow: defining a question, finding the most suitable data source, identifying the right tools and finally presenting the best possible answer in a clear, engaging manner. But it all starts with having access to the data. I will walk you through some examples of how to collect, store and access data in the Cloud with the use of different APIs:
- Store Twitter and weather data in a NoSQL database (Cloudant) and enrich the data with sentiment using Watson
- Collect and store daily weather observations
- Access this data in a Jupyter notebook using Python, Pandas and Spark
- Real-time Twitter sentiment in a Jupyter notebook using PixieDust (https://developer.ibm.com/clouddataservices/2016/10/11/pixiedust-magic-for-python-notebook/)
Margriet Groenendijk is a Developer Advocate at IBM Cloud Data Services. Currently she is all about data, from storing, cleaning, munging and analysing through to visualisation, all to create clear narratives and figures that show new insights from diverse data sources. She uses a range of tools for this, such as Cloudant, dashDB, Spark and Python notebooks.
Docker - all you need to know in 30 minutes
Docker is the new way to package, distribute and run your application or microservice. It's the bee's knees. It's the best thing since sliced bread. It's the king's pajamas, etc., etc. In this short but packed talk I'll explain what Docker actually is and how it works. I'll show you how much of the hype is real and give a practical demo or two to illustrate the power of Docker. Is Docker the panacea for all ills? This talk will help you make up your mind.
Steve Poole is a DevOps practitioner (leading a team of engineers on cutting-edge DevOps exploitation) and a long-time IBM Java developer, leader and evangelist. He's been working on IBM Java SDKs and JVMs since Java was less than 1. He's also had time to work on other things, including representing IBM on various JSRs and being a committer on various open source projects, including ones at Apache, Eclipse and OpenJDK. He's also a member of the Adopt OpenJDK group, championing community involvement in OpenJDK. Steve is a seasoned speaker and a regular presenter at JavaOne and other conferences on technical and software engineering topics.
