Production Quality Data Science. Building Rapid Ingestion Data Pipelines


Details
HUG Ireland has the pleasure of announcing two fantastic speakers for our June Meetup, which is hosted by Bank of Ireland @boistartups at Grand Canal Square. Cronan McNamara, CEO of Creme Global, will speak about setting up an online production quality data science environment, and Guglielmo Iozzia, Big Data Infrastructure Engineer with IBM, will talk about rapid data ingestion with a FOSS tool called StreamSets Data Collector. As you can see, this Meetup is themed around the practicalities of getting things done in Data Science, which should interest all members, especially those working in Data Science, Data Analysis and Architecture.
The agenda is as follows:
Production Quality Data Science is a Team Sport by Cronan McNamara (https://ie.linkedin.com/in/cronanmcnamara), CEO of Creme Global (http://www.cremeglobal.com/).
It takes a team to deliver production quality data analytics. This creates a barrier for organisations trying to get started with extracting value from their data: they must choose the right technologies and set up a development environment before the real data science work can begin. Cronan will explore these challenges and demonstrate how Creme Global has developed a platform, called Expert Models, which helps teams to overcome some of the challenges involved in taking data science from research to a product. Expert Models is an online platform which facilitates data storage, data editing, model development, model deployment and computation, all from a standard web browser. It is built on technologies such as Linux, AWS, MySQL, Python, REST APIs, JSON and JS. Team members can invite and work with their colleagues; upload, edit and connect to data sets; create and share data science models; and output results amongst the team or with other organisations and the general public. The talk will be illustrated with examples of Creme Global's predictive models that are being used by the pesticide and fragrance industries internationally.
Building a data pipeline to ingest data into Hadoop in minutes using StreamSets Data Collector by Guglielmo Iozzia (https://ie.linkedin.com/in/giozzia), Big Data Infrastructure Engineer, Security/Ethical Hacking Team @ IBM Ireland (http://www.ibm.com/ie-en/).
In a modern Big Data Analytics infrastructure, collecting raw data from different sources (such as relational databases, MongoDB, log files, Amazon S3, JMS consumers and many others) is often a complex challenge, and you usually need to write code to adapt the flow to your specific analytics needs. This talk will walk through a real-world case: building a pipeline to continuously ingest data into a Hadoop ecosystem using an Open Source tool called StreamSets Data Collector. This approach saved us a lot of time because everything, including data clean-up, error handling, and performance and data quality monitoring, is managed through a web UI.
As usual, our event hashtag is #HUGIreland. Joining HUG Ireland will allow you to RSVP for a great Data Science themed event for the "sunny" month of June in Ireland! Hope to see you all there!
