• Data lake optimization and Attention models in deep learning

    Hello everyone, I would like to welcome you all to the June meetup of KW Intersections on June 11th. Thank you very much to Kik for hosting us. In this meetup we will have two talks. The first talk, on data lake optimization, will be given by Marco Albuquerque. The second talk, on attention models in neural nets, will be given by Mahtab Kamali.

    Title: 10 design principles for a muddy data lake with a fragile, expensive, and inefficient data pipeline
    Abstract: TBA
    Speaker: Marco Albuquerque

    Title: Why we need attention
    Abstract: The attention mechanism has become one of the main building blocks of deep neural networks. It has significantly improved performance on tasks involving long sequences of data, such as machine translation, image captioning and sentiment analysis. Its main concept revolves around allowing RNN models to focus (pay attention) more on some parts of the data sequence than on others. This talk provides an overview of the main building blocks behind attention models, such as seq2seq models and the vanishing gradient problem, followed by their shortcomings and how the attention model addresses them. The theory of attention and some of its further improvements will be touched on lightly, followed by some examples.
    Speaker: Maria (Mahtab) Kamali is a data scientist currently working with Thomson Reuters. She has a PhD in Systems Design Engineering from the University of Waterloo.

    Kind regards, Mary
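The idea in the abstract above, letting a model weight some positions of a sequence more than others, can be illustrated with a minimal sketch. It assumes dot-product scoring, which is only one of several scoring functions and is not necessarily the one covered in the talk. The names `attend`, `encoder_states` and `query` are hypothetical, and the vectors are made up.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, encoder_states):
    """Dot-product attention over a list of encoder state vectors.

    Returns the attention weights and the context vector, which is the
    weighted sum of the encoder states.
    """
    scores = [dot(query, h) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context

# Made-up 2-dimensional states for a 3-step input sequence.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.0, 2.0]  # stands in for the current decoder state
weights, context = attend(query, encoder_states)
```

Here the second and third states align with the query more strongly than the first, so they receive most of the weight and dominate the context vector.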

  • The EHT: How Data Science helped capture the 1st Image of a Black Hole

    I would like to welcome you all to the next meetup on May 14th at Auvik Networks. Jorge Alejandro (Alex) Preciado, who was part of the Event Horizon Telescope (EHT) collaboration, will be the speaker for this month. He will discuss the work he did as a data scientist to help capture the first image of a black hole.

    Title: The Event Horizon Telescope: How Data Science helped capture the First Image of a Black Hole
    Speaker: Jorge Alejandro (Alex) Preciado, Data Scientist at the Event Horizon Telescope (https://www.linkedin.com/in/alexpreciado/)
    Abstract: The Event Horizon Telescope (EHT), an international collaboration coordinating the observations of a global network of radio telescopes, has recently captured the first image of a black hole. I will show how the EHT acquired, processed and analyzed astrophysical data to generate the images and estimate the parameters of the black hole. The problem is an example of Bayesian parameter estimation in which many different models and comparison schemes were used; these will be discussed in this talk.
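For readers new to Bayesian parameter estimation, here is a toy sketch of the general idea. It is not the EHT pipeline: it assumes a single unknown parameter, Gaussian noise with a known standard deviation, a flat prior, and a posterior evaluated on a grid. The measurements and the function name `grid_posterior` are made up for illustration.

```python
import math

def grid_posterior(data, grid, sigma):
    """Posterior over candidate values in `grid` for the mean of Gaussian
    data with known noise `sigma`, assuming a flat prior on the grid."""
    log_post = []
    for mu in grid:
        # Gaussian log-likelihood, up to an additive constant
        log_post.append(-sum((x - mu) ** 2 for x in data) / (2.0 * sigma ** 2))
    m = max(log_post)  # subtract the max before exponentiating
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

data = [41.0, 43.5, 42.2, 40.8, 42.5]        # made-up noisy measurements
grid = [38.0 + 0.1 * i for i in range(81)]   # candidate values 38.0 to 46.0
post = grid_posterior(data, grid, sigma=1.5)

map_estimate = grid[post.index(max(post))]               # most probable grid value
posterior_mean = sum(g * p for g, p in zip(grid, post))  # probability-weighted average
```

With a flat prior the most probable value sits at the sample mean of the data. An informative prior or a different noise model would shift and reshape the posterior.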

  • Data Science in reinsurance & Textbased class of Freedom of Information Request

    Hello all, I would like to welcome you all to the April meetup of KW Intersections on April 9th. We will have two talks. The first talk will be by Slava Kohut and the second talk will be by Scott Jones. We will announce the location of the meetup soon.

    Title: Using Data Science to Create Value in Reinsurance
    Speaker: Slava Kohut (https://www.linkedin.com/in/slavakohut/), Research Scientist in the Data Science team at Validus Research
    Abstract: The rapid rise in data production and the availability of robust tools for data processing are transforming the insurance industry. Slava will present an overview of a particular branch of insurance called reinsurance, explain how reinsurers can use data analytics to improve their risk management, and finally talk about a few illustrative examples from his experience.
    Disclaimer: Slava's talk represents his own opinions and does not reflect the views of the company he is currently affiliated with.

    Title: Text-Based Classification of Freedom of Information Requests
    Speaker: Scott Jones
    Abstract: All citizens in the province of Ontario are permitted to make a request for information to either their municipality or the provincial government under the Freedom of Information and Protection of Privacy Act (FIPPA). Thereafter, they should expect a timely response. In Kitchener-Waterloo and Toronto, these requests are publicly available through an open data portal. Using the free text of the submitted requests, I have attempted to classify, and ultimately predict, which response will be given to a particular request. The problem is a nice example of multi-class classification. Many different algorithms have been applied to test whether there is predictive capability in the text. I will present some of these, along with a few useful NLP techniques to clean and prepare the data.

    Mary
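As a rough illustration of classifying request text, the sketch below trains a tiny multinomial naive Bayes classifier with add-one smoothing. This is an assumed stand-in, not one of the algorithms named in the talk. The request texts and the two response labels (`disclosed`, `denied`) are invented, and the real problem has more than two response classes.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def train(examples):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return label_counts, word_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocab = model
    n_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / n_examples)  # log prior
        total = sum(word_counts[label].values())
        for token in tokenize(text):
            if token in vocab:  # ignore words never seen in training
                # add-one (Laplace) smoothing
                score += math.log((word_counts[label][token] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented training data with hypothetical response labels.
examples = [
    ("copy of building permit for my property", "disclosed"),
    ("building inspection reports for address", "disclosed"),
    ("personal records of another employee", "denied"),
    ("employee disciplinary records and emails", "denied"),
]
model = train(examples)
```

Calling `predict(model, "inspection report for building permit")` returns `disclosed` on this toy data, because the permit-related words occur only in that class.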

  • Intro to common NLP practices & Reinforcement learning in scientific research

    Hello all, we would like to invite you all to the March meetup of KW Intersections on March 12th. This time the meetup will be hosted and sponsored by Sonoma. As always, we will have two talks. The first talk will be given by Lucas Meng, Dawar Ahmad and Bikramjeet Singh, and the second talk will be given by Qasim Ali.

    Title: Intro to common NLP practices with an uncommon dataset
    Speakers: Lucas Meng (https://www.linkedin.com/in/zhixiang-meng/), Dawar Ahmad (https://www.linkedin.com/in/dawar-ahmad/) and Bikramjeet Singh (https://www.linkedin.com/in/bikramsingh95/)
    Abstract: Join us as we perform language analysis using concepts such as word embeddings, clustering and topic modelling on a very peculiar dataset: the last statements of inmates on death row.

    Title: Application of reinforcement machine learning in scientific research problems
    Speaker: Qasim Ali (https://www.linkedin.com/in/qasimch86/)
    Abstract: Reinforcement machine learning is a kind of adaptive predictive modelling that is frequently used as a feature engineering tool to optimize the values of selected parameters. My talk includes an application of reinforcement machine learning in the theory of aging. I developed an adaptive model to optimize the altruistic behaviour of a mother yeast cell so that it produces the maximum number of healthy daughters. I will give a brief description of the biological problem, its mathematical formulation, its implementation and finally the simulation results.

  • Blockchain instead of db to store information & Supply-Chain blockchain platform

    Hello all, I would like to invite you all to the first meetup of 2019 on February 19th. We will also provide an online streaming option in case of bad weather. Thank you very much to Gheorghe and Nife for giving talks at this meetup. The theme this time will be blockchain. Gheorghe will discuss why you could use blockchain instead of a database to store information. Next, Nife will walk us through his journey developing a supply-chain blockchain platform.

    Why use blockchain instead of a database to store information? by Gheorghe Curelet-Balan
    This talk will investigate the fundamental computer science concept that differentiates blockchain from a database when it comes to implementing the features blockchain technology claims as a trusted, decentralized and persistent technology for value transfer and business process automation.

    My journey developing a supply-chain blockchain platform by Nife Oluyemi
    Explore my journey through creating a supply chain transparency platform using blockchain. I will talk about knowledge gained, resources used, design decisions, system architecture, and tools used.

    Mary

  • Bit Manipulation Hacks in the Wild & Fast clustering & visualization of big data

    Hello all, I would like to welcome you all to the next meetup of KW Intersections on November 13th at 7 pm. This will be our 50th meetup. Incredible how time flies. We will have two talks. The first talk, titled "Bit Manipulation Hacks in the Wild; Time Series and Granger Causality", will be given by Avishalom (Vish) Shalit, Head of Data Science at Kik. The second talk, titled "Fast clustering and visualization of big data", will be given by Daniel Ashlock, Professor of Mathematics at the University of Guelph. I am looking forward to seeing you all on November 13th.

    Title: Bit Manipulation Hacks in the Wild; Time Series and Granger Causality. (Featuring SQL and math)
    Speaker: Avishalom (Vish) Shalit, Head of Data Science, Kik
    Bits are back! A favourite interview topic from a decade ago turns out to have been important all along. When your terabytes of data are stored in the cloud, you can push a lot of the data crunching right into the query retrieving your data, with MATH(!!!). In this session we will go over some representation and manipulation methods and their uses in time series analyses. We will discuss real-world applications, such as calculating the Granger causality between various time series very efficiently, several orders of magnitude better. We explore fast alternatives to windowing aggregate functions. Cooler than 0x5F3759DF.

    Title: Fast clustering and visualization of big data.
    Speaker: Daniel Ashlock, Professor of Mathematics, University of Guelph
    The talk presents an off-line/on-line technique for clustering data sets that scales well to big data sets and can work transparently with high-dimensional data. The technique is based on point packing, a technique that arises from the theory of error correcting codes. A set of data points of the sort being clustered is selected so that the points are well spaced out in the data space; this selection process, point packing in the data space, is the off-line part of the process. These points are called the cluster centers. Clustering is then performed by binning each data point by the selected point it is nearest to. The resulting clustering is of moderate quality but requires only linear time to generate, making it very fast; this is the on-line portion of the process.
    Clustering of this sort can then be used to generate a simple visualization of the data. Clusters become nodes in a network, with the links in the network generated between small numbers of nearest neighbors among the cluster centers. The cluster centers are placed in the plane via non-linear projection or any of a variety of other algorithms. The network then forms a 2D picture of the data. These techniques are based on sophisticated mathematics but are accessible to anyone familiar with simple coding. This technique not only visualizes the data but also highlights density anomalies in the data.
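The off-line/on-line split described in the clustering abstract above can be sketched in a few lines. The abstract does not say how the packing is actually done, so this is a loose illustration under two assumptions: the packing step is a greedy pass that keeps a point only if it lies at least `min_dist` from every center chosen so far, and distances are Euclidean. The function names, the threshold and the data are all hypothetical.

```python
import math

def pack_points(data, min_dist):
    """Off-line step: greedily pick well-spaced points to act as cluster centers."""
    centers = []
    for p in data:
        # keep p only if it is far enough from every center chosen so far
        if all(math.dist(p, c) >= min_dist for c in centers):
            centers.append(p)
    return centers

def assign(data, centers):
    """On-line step: bin each point by the index of its nearest center.

    For a fixed set of centers this is a single pass over the data.
    """
    return [
        min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        for p in data
    ]

# Made-up 2D data with three loose groups.
data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (0.2, 0.1), (9.0, 0.0)]
centers = pack_points(data, min_dist=2.0)
labels = assign(data, centers)
```

On this toy data the packing step keeps `(0.0, 0.0)`, `(5.0, 5.0)` and `(9.0, 0.0)` as centers, and every other point is binned with whichever of those it lies closest to.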

  • Security information Event Management applications of ML & Scalable ML pipelines

    Hello, I would like to welcome you all to the October meetup of KW Intersections on October 9th. This meetup will be hosted at Trustwave and we will have two talks. The first talk, titled "Security information event management applications of ML", will be presented by Rudra Sharma of Trustwave, and the second talk, titled "Understanding scalable ML pipelines", will be presented by Mary Loubele.

    Security information event management applications of ML by Rudra Sharma
    A summary of this talk will follow shortly.

    Understanding scalable ML pipelines by Mary Loubele
    In this talk we discuss what happens to a machine learning model when the data scientist hands it over to the data engineering team for productionization. We will explain where Docker and Kubernetes come into play and how you host such a scalable solution with your cloud provider.

  • Evolutionary computations and Word embeddings: Feature learning in NLP

    I would like to welcome you all to the September meetup of KW Intersections. In this session we will have two talks. Justin Schonfield will talk about evolutionary computation and Mandy Gu from Kiite will talk about word embeddings. Thank you very much to Kiite for sponsoring our pizza for the meetup.

    Evolutionary Computation: What, When, and How? by Justin Schonfield
    Evolutionary computation comprises a family of biologically inspired, population-based optimization techniques. This talk has two goals: to provide an overview of evolutionary computation and to dive a bit more deeply into the advantages and mechanics of these algorithms. The first part of the talk will introduce a few of the major approaches in evolutionary computation (genetic algorithms, genetic programming, and differential evolution, among others), showing how each technique works and when you might want to use it. The second part of the talk will look at fitness landscapes and discuss how evolutionary search can find robust solutions, as well as the role representation plays in shaping these evolutionary search landscapes.

    Word Embeddings: Feature Learning in NLP by Mandy Gu
    "Word embeddings" are a collection of methods used to map vocabulary words and phrases onto vectors of real numbers. Ranging from methods rooted in deep learning and unsupervised learning to probabilistic modelling, these embeddings serve as the feature set for natural language processing.

  • Predict Text Similarity using Sentence Repr. & Privacy focused decentralized db

    Hello all, I would like to invite you all to the next meetup on August 14th at Trustwave. We will have two talks. The first talk, presented by Angela Zhao, will be about predicting text similarity using sentence representations. The second talk, presented by David Rusu, will be about HermitDB, a privacy focused decentralized database replicated over Git.

    Predicting Text Similarity using Sentence Representations by Angela Zhao
    In this talk, we will go over a model that predicts text similarity using an unsupervised approach. Our work is largely based on the 2018 paper “An Efficient Framework for Learning Sentence Representations” authored by Lajanugen Logeswaran & Honglak Lee, who created the Quick-Thoughts (QT) model. The QT model is an enhanced version of the Skip-Thoughts (ST) model proposed by Ryan Kiros et al. in 2015. We will go over the differences between the models and show a comparison of the results using each one. This talk is for anyone interested in NLP and/or machine learning in general.
    Link to paper: https://arxiv.org/abs/1803.02893
    Link to their code: https://github.com/lajanugen/S2V

    About Messagepoint
    Omni-channel customer communication is a fundamental requirement for any business across all industry verticals. Communication artifacts (such as letters, notices and statements) are regularly produced at scale by organizations to communicate with their end users. These artifacts are data driven, and must be 'correct' when measured against a number of dimensions such as content, message and language. The volume and variety of information make it difficult to ensure consistency across the client communication process, where the main motivation is to preserve the semantics of client messages across a set of variations (such as vocabulary and language). Messagepoint's AI team is mandated to create a content analytics engine tailored for the CCM/CX industry.

    HermitDB - A privacy focused decentralized database replicated over Git by David Rusu
    We now understand how to build decentralized systems better than we ever did, but the kinds of applications that we use to organize our lives (like password managers) have not really improved yet. HermitDB is a database that replicates over a user-provided distributed log (like Git), meaning we can build applications that give users agency over their data. In this talk I'll be going over the design of HermitDB, focusing on CRDTs and log replication.
    http://hermitdb.com
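For readers unfamiliar with CRDTs, the sketch below shows one of the simplest, a grow-only counter. It is a generic textbook structure chosen for illustration and does not reflect HermitDB's actual data types or API. The class name `GCounter` and the replica names are hypothetical.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments seen from that replica

    def increment(self, amount=1):
        """Record local increments in this replica's own slot only."""
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        """Fold in another replica's state. Taking the max per slot makes
        merging commutative, associative and idempotent."""
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

# Two devices update independently, then exchange state in any order.
laptop = GCounter("laptop")
phone = GCounter("phone")
laptop.increment()
laptop.increment()
phone.increment()

laptop.merge(phone)
phone.merge(laptop)
phone.merge(laptop)  # merging the same state again changes nothing
```

Both replicas end up with the same value without any coordination, which is the property that lets state be replicated over a log and merged in whatever order it arrives.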

  • AutoFraud Detection w Unsupervised ML & Data Driven Monitoring

    I would like to welcome you all to the July meetup of KW Intersections on July 10th. This meetup is hosted at Sortable. Thank you very much to Sortable for providing us with the space. We will have two talks. The first talk, titled "Auto Fraud Detection using Unsupervised Machine Learning", will be given by Varuna Manevannan, and the second talk, titled "Data Driven Monitoring in Safety Critical Infrastructure", will be given by Jinane Harmouche.

    Auto Fraud Detection using Unsupervised Machine Learning by Varuna Manevannan
    Obtaining labels for data mining problems is costly and time consuming, and sometimes even infeasible. I explain how one applies an unsupervised spectral ranking method for anomaly detection, using an auto insurance claim dataset. In this talk, I shall discuss how spectral optimization can be used to solve an unsupervised SVM problem. I demonstrate that the first non-principal eigenvector of a Laplacian matrix is linked to a bi-class classification strength measure which can be used to rank anomalies. While ignoring the labels from the auto insurance claim dataset when generating the ranking, we notice that our proposed SRA significantly surpasses existing outlier-based fraud detection methods. Prior knowledge of the algorithm is not necessary, but familiarity with SVM and LU decomposition is helpful.

    Data Driven Monitoring in Safety Critical Infrastructure by Jinane Harmouche, PhD
    The autonomous monitoring and control of critical infrastructure and industrial systems through sensor measurements is one of the most important applications of a smart city. The objective is to develop low-cost, accurate and real-time event monitoring solutions, a goal challenged by increasing system complexity and scale. Innovative solutions are centered around two main axes: automated inspection and advanced sensing technologies, and efficient data processing and modelling. Good examples of applications are (1) leak detection in water distribution networks, (2) damage detection in bridges, and (3) machinery diagnostics. I will talk about the work we are doing at the Structural Dynamics, Identification and Control group at the University of Waterloo to develop data-driven monitoring solutions for infrastructure.
