Big Data Montreal would like to invite you to its 44th meeting!
Join us on Tuesday, January 5th, 2016, at 18h00 to attend the presentations and to network with other Big Data enthusiasts from Montreal!
The meeting will take place at the Cloud.ca Center (http://centre.cloud.ca/), formerly the RPM Startup Centre, located at 420 Guy Street.
All are welcome, whether you already have some experience with Big Data technologies or are simply curious to learn more.
We have 3 presentations scheduled:
• PatchWork by Thomas Triplet, Researcher – Data Scientist at the CRIM
While hundreds of clustering algorithms have been proposed, many are complex and do not scale well as more data become available, making them inadequate for analyzing very large datasets. Many clustering algorithms are also sequential, and thus inherently difficult to parallelize. We propose PatchWork, a novel clustering algorithm to address those issues. PatchWork is a distributed density clustering algorithm with linear computational complexity and linear horizontal scalability. It relies on the map/reduce paradigm to parallelize computations and was implemented using Apache Spark. In our experiments using commodity hardware, we could cluster a billion points in only a few minutes, a 40x improvement over the k-means implementation in Spark MLlib.
• Towards a time series library for Apache Spark by Simon Ouellette, CEO of Nabla Analytics, Inc.
spark-timeseries (https://github.com/cloudera/spark-timeseries) is a financial and time series library for Apache Spark that is currently in development. We will go over the current design and functionality with examples, and discuss the challenges and expected future developments.
• LinkedIn's Pinot by Jean-François Im, Data analytics infrastructure engineer at LinkedIn
Finally, you are also welcome to join us for some casual networking in the same room after the presentations, followed by a beer at Brasseurs de Montreal.