- When Customers Organize Products - Graph Theory in Practice
Abstract: Retailers have many methodologies for grouping their products together: some use a merchant-driven hierarchy, while others use a hierarchy dictated by marketing strategy. Either way, these product groupings influence a number of critical decisions that each retailer must make, such as how products are brought to market, how they are advertised, and how they are discounted. This talk will focus on an alternative method for developing a product hierarchy: a customer-driven approach. For demonstration purposes, we will construct a toy data set to represent customer sales data, from which we will construct Random Intersection Graphs. These graphs relate products to one another via transactional history, and their projections will be used to create graphs that provide alternative underlying structures for product relationships. The insights derived from uncovering these latent structures in product relationships can assist in driving strategies throughout the business. In particular, this talk will focus on how one can determine a customer’s level of brand loyalty when making their purchases. Bio: Tipan Verella is a Data Scientist in Marketing’s Advanced Analytics organization at Kohl’s, where his coffee-fueled days are spent doing data engineering/wrangling/analysis, as well as building models that serve as the foundation for executive strategy. Prior to his tenure at Kohl’s, Tipan worked in AdTech for companies such as Millennial Media and AOL (both now Verizon), primarily focusing on the performance prediction of click-through and conversion rates of online advertisements. Tipan is finishing his PhD work in Systems and Information Engineering at the University of Virginia, where he researched latent structures of complex behavioral systems using tools and techniques from probability and graph theory. Tipan is a proud Marquette University faculty spouse and Highland Community School parent, with a penchant for mathematics and programming in Python.
Sponsors: I would like to thank American Family for the food and Cloudera for an after-meetup round of drinks.
- Managing and Governing your Data Assets from Edge to AI
Abstract: From autonomous vehicles and surgical robots to churn prevention and fraud detection, enterprises rely on data to uncover new insights and power world-changing solutions. Organizations today require multi-function analytics capabilities across all data types and sources to eliminate silos and speed the discovery of data-driven insights. With this flexibility also comes the responsibility of ensuring that your data is secured and governed across any infrastructure, whether on-prem, cloud, or hybrid, from Edge to AI. In this discussion, we will walk through how organizations are leveraging open source Apache projects such as Ranger and Atlas to provide granular, dynamic, role- and attribute-based security policies that prevent unauthorized access to sensitive or restricted data. Along with security policies, we will also discuss how enterprises are leveraging auditing and governance capabilities to drive compliance with GDPR, CCPA, and other privacy regulations. Bio: Muki Soomar is a solution engineer with Cloudera, helping organizations with their digital transformation journey using cutting-edge technologies within the Big Data open source ecosystem for real-time and historical analytics. Muki is passionate about using innovative technologies that can help solve complex business problems, enabling businesses to derive real-time actionable insights through Predictive and Prescriptive Analytics using CEP and ML, analyzing business events that are generated not only from internal transactional systems but also from external data sources such as social media, mobile devices, and many others that businesses have to react to every day. Muki has over 20 years of experience in the IT industry and has worked in many different roles. He has successfully delivered many complex projects across many different verticals, including Automotive, Finance, Insurance, Airlines, Medical Device Manufacturing, Healthcare, Market Research, and Retail, to name a few.
Across these verticals, Muki has designed and implemented business solutions using SOA, data integration, complex event processing (CEP), Master Data Management (MDM), and Modern Data Architectures using technologies from the Hadoop ecosystem. Prior to Cloudera, Muki worked at Hortonworks, Software AG, TIBCO Software, Rush Medical University, Allstate Insurance, CNA Insurance, RouteOne, Ford Motor, and Chrysler. Muki has three graduate degrees: an M.Sc in Mechanical Engineering from Queen's University, Canada; an MS in Engineering Mechanics from Michigan State University, East Lansing; and an MS in Computer Science from University of Michigan, Dearborn. Sponsors: I would like to thank Cloudera for both the food and an after-meetup round of drinks.
- AI Research at American Family Insurance
Abstract: American Family Insurance has a strong relationship with data science and AI research. We've collaborated with partners at various universities and presented many papers at conferences on topics from computer vision to natural language processing. Recently, we announced major support of the new Data Science Institute at the University of Wisconsin-Madison. As a research data scientist, I am in a unique position mediating between business sponsors, product owners, partners at universities, and cutting-edge research. For this talk, I will offer insight into the research process at American Family and how we integrate new research into products. After introducing the general process, I will talk about some recent work involving knowledge graph-driven workflows for entity refinement for chatbots. Finally, I will speak to some general lessons and best practices (borrowed from software engineering) for bringing ideas from research into production at enterprise scale. Bio: Devin Conathan joined the machine learning research team at American Family Insurance as an intern in the summer of 2016 and came on full-time the following year. He has undergraduate degrees in mathematics and philosophy from Cornell University and a master's in electrical engineering with a focus on optimization and active learning research from the University of Wisconsin-Madison. At work, he enjoys developing full-stack solutions that use state-of-the-art machine learning algorithms for industry-quality applications and research. His recent work includes implementing active learning for text annotation, building CNNs for chatbot intent classifiers, and building out an AI-driven knowledge graph platform for powering knowledge-rich applications at American Family. In his free time, he enjoys riding his bike, playing music, and reading sci-fi novels. Sponsors: I would like to thank American Family for the food at the meetup and Cloudera for an after-meetup round of drinks.
- Predicting and hiding personal information from face images using deep learning
Dr. Sebastian Raschka, Assistant Professor of Statistics at UW-Madison and author of the book Python Machine Learning, will speak on the following topic. Abstract: In the modern digital age, researchers have developed a vast array of genuinely fascinating techniques that can enhance our everyday life. However, as more and more data is collected and extracted, the protection of and respect for users' privacy have become a big concern. In the first portion of this talk, I will demonstrate methods for extracting soft-biometric attributes from facial images; soft-biometric characteristics include a person's age, gender, race, and health status. In particular, as a case study, I will present a new method for predicting age reliably from face images using a convolutional neural network architecture designed for ordinal regression. The second portion of this talk will then focus on a series of convolutional neural network architectures designed to conceal soft-biometric information. To respect and enhance the privacy of users, data sharing and the risk of unsolicited use of private information should be minimized. However, many useful security-related applications rely on face recognition technology for user verification and authentication. Hence, the approaches being presented focus on a dual objective: concealing personal information that can be obtained from face images while preserving the utility of these images for face matching. Sponsors: I would like to thank American Family for the food at the meetup and Cloudera for an after-meetup round of drinks.
- Data-Driven Wisconsin Conference Day 2 (Presentations)
This is an announcement for the fourth annual Data Driven Wisconsin Conference (formerly BigDataWisconsin). Our mission is to foster and grow the data ecosystem here in Wisconsin and the upper Midwest. Our experienced speakers and tutorial leaders will guide software engineers, data scientists, and data professionals through new tools and techniques to grow their careers! Join us in our fourth year as the State's premier big data and advanced analytics conference! This is a two-day event that promises to be fun and thought-provoking. Please see the Conference website: https://www.datadrivenwi.org for more information and to register. Please note that RSVP'ing to the meetup does not guarantee you a spot at the conference. Cheers! Pitt Fagan
-----------------
Keynote Presentations:
  - How Not to Be Wrong: The Power of Mathematical Thinking Jordan Ellenberg
  - Bad Algorithms & The Ethical Matrix Cathy O’Neil
Data Science Presentations
  - Statistical Learning Tools for Catastrophic Pediatric Illness Decision Making Natasha Sahr, PhD
  - Some New Thoughts on Anomaly Detection Rachel Traylor, PhD
  - Building large ML pipelines in Post GDPR world Mansur Ashraf
  - Integration of Multiple Data Sets for Disease Forecasting in Wildlife Ecology Alison C. Ketz, PhD
  - Why Model Fit Statistics Conceal Important Relationships, and What You Can Do About It Brian Barkley, PhD
  - Non-Probabilistic Recruitment Polling Parker Quinn
Data Engineering Presentations
  - Data-driven Agriculture to Feed the World Yalda Zare, PhD
  - Scaling with Healthcare: Predictions Meeting Clinicians Where They Are Drew McCombs
  - Scaling Dynamic SQL Workloads in the Cloud with Presto Harsha Gopu
  - Predicting and Interpreting Respiratory Risk from Inhaler Sensors and Environmental Data Nicholas Hirons
  - Parallel Computing in Python with Dask James Bourbeau, PhD
  - A Self-Healing SolrCloud Built Using Cloud Design Patterns James Strassburg
Data in Business Presentations
  - How Automated AI is Reshaping Analytics Rajiv Shah, PhD
  - Data Rich to Data Driven: The Amfam Story Susannah Barnes
  - Robotic Process Automation in the Real World - Designing for Success Creamheld Pepito
  - ElectionVR: An Experiment in Visualizing Election Data in Virtual Reality Steve Brudz
  - Lingua Frankness: Effective Communication Between Analytics and Strategy Mary Willcock
  - Technology Alone is Not Enough Brent Leland
- Data-Driven Wisconsin Conference - Day 1 (Tutorials and dinner)
This is an announcement for the fourth annual Data Driven Wisconsin Conference (formerly BigDataWisconsin). Our mission is to foster and grow the data ecosystem here in Wisconsin and the upper Midwest. Our experienced speakers and tutorial leaders will guide software engineers, data scientists, and data professionals through new tools and techniques to grow their careers! Join us in our fourth year as the State's premier big data and advanced analytics conference! This is a two-day event that promises to be fun and thought-provoking. Please see the Conference website: https://www.datadrivenwi.org for more information and to register. Please note that RSVP'ing to the meetup does not guarantee you a spot at the conference. Cheers! Pitt Fagan
---------------
Here are the currently planned Tutorials:
Full Day Tutorials
  - Artificial Intelligence Through Reinforcement Learning Cary Walker
  - Current Topics in Deep Learning for Natural Language Processing (NLP) Jay Urbain, PhD
  - Building a Web Clickstream Analytics Application using Open Source Streams Processing Technologies Muki Soomar
  - Shiny-R for Interactive Data Visualization Kamrul Hasan, PhD
Half Day Tutorials
  - Mass Criminalization, Big Data & Digital Technology Joshua Riebe
  - A Deep Dive in the Data Lake Mike McWhorter
  - Big Data & the NBA: Creating a Scalable ML Service Adam Converse
  - Learn to Build your Own Data Strategy David Williams
- High Performance Networking at Google
Abstract: This is a discussion of High Performance Networking concepts and is targeted towards a technical audience. We start with a quick discussion of the OS implications of ever-increasing network speeds. The next section describes a typical Google datacenter, as motivation for thinking about networking challenges at scale. A brief overview of Linux’s general-purpose network stack highlights some opportunities for different tradeoffs. Finally, RDMA (Remote Direct Memory Access) is discussed in detail. libibverbs examples are used as exemplars for common concepts (out-of-band establishment, RMA, messaging). Expected benefits of attending this talk are 1) insight into some of the networking problems being addressed by large providers (and, by proxy, the gains from using a solution like BigQuery) and 2) exposure to RDMA (knowing what it is, what it can do, etc.). Bio: Kevin Springborn has been working on Google Madison's host networking team since 2015, helping to make billions of lives incrementally better by getting bytes where they need to be as quickly and efficiently as possible. Before Google, he got to participate in a couple of local startups and a high-frequency trading firm. He is a proud graduate of the UW Computer Science department. Sponsors: I would like to thank Google for the food at the meetup and Cloudera for a post-meetup round of drinks.
- Data Science at Comscore
Hi Everyone! I would like to thank Ashish Thusoo of Qubole for a great talk in April! Here is a link to his presentation: https://github.com/Pshrub/Talks-and-Sample-Code/tree/master/2019-04-30_Qubole_Cloud Below is the info on the May meetup. I hope to see you there! Abstract: Comscore records a million internet events each second. Its servers communicate with more than a billion mobile devices and PCs each day, making it the second most called domain in the US on a daily basis. Comscore uses this data to report on digital advertisement, website, mobile app, TV, and video consumption around the globe. Big data problems abound. In this talk, I’ll give a fly-by of some of these problems and discuss the statistical tools and infrastructure we use to tackle them. I’ll give a deep dive into how measurement entities (i.e., Comscore), ad networks, and data aggregators measure users across different devices, even when they aren’t logged into a website or app (hint: a data structure called a ‘device graph’). Bio: Dr. Malloy is a Principal Data Scientist and Director at Comscore. Sponsors: I would like to thank Comscore for the food at the meetup and Cloudera for a post-meetup round of drinks.
- CEO/Founder of Qubole discusses Big Data in the Cloud: From Facebook to IoT
Abstract: This presentation will discuss the gap that enterprises face today when trying to monetize their data and scale up to achieve data maturity. We will present several ways enterprises can measure their own data activation readiness, and take a technical dive into the role that Data Ops plays in achieving a data-driven culture. There will be an exploration of the stages that Facebook went through to democratize and self-serve data across the company. The session will also cover usage trends of Qubole's cloud-native big data optimization platform, including customer success stories. About the Presenter: A 1997 graduate of UW-Madison (MS, Computer Science), Ashish Thusoo co-founded Qubole in 2013 with a mission to close the data accessibility gap. In launching Qubole, he leveraged his experience as part of the original Facebook Data Service Team from 2007 to 2011 to democratize data across an enterprise. Ashish authored many prominent data industry tools during his time at Facebook, including the Apache Hive Project. His goal was to deliver massive speed and scale to the data platform, while providing better self-service access to the data for all users. Ashish and his co-founder Joydeep Sen Sarma built these product principles of speed, scale, and accessibility into the foundation of Qubole. Sponsors: I would like to thank Qubole for the food at the meetup and Cloudera for an after-meetup round of drinks.
- Does this AI Look Good on Me? (joint event with Women in Big Data group)
Greetings everyone! Time to announce the next event, which is a special collaboration between this meetup and the Women in Big Data group. Please see below for details on the presentation and the speaker. I would also like to thank the presenters from the prior meetup, Travis Kerr and Justin Stadler, for giving a couple of excellent presentations. Here is a link to Justin's presentation on data architecture at ABC Supply: https://97a06988c44343b78b39-my.sharepoint.com/:p:/g/personal/justin_stadler_abcsupply_com/ETgP1NTBvLJNqqKE4OHcGIgBTOpnzjSl4yCSIdbIMJX_vQ?e=GbjmAm Cheers, Pitt Abstract: Does this AI Look Good on Me? Some Ethical and Cultural Concerns About Wearables, Cheating Detection, and Autonomous Vehicles by Jo Ann Oravec, Professor, University of Wisconsin at Whitewater (Information Technology and Supply Chain Management, College of Business and Economics) and Robert F. and Jean E. Holtz Center for Science, Technology, & Society Studies, University of Wisconsin at Madison Bio: Dr. Jo Ann Oravec received her MBA, MS, MA, and PhD degrees at the University of Wisconsin at Madison. She taught computer information systems and public policy at Baruch College of the City University of New York; she also taught in the School of Business and the Computer Sciences Department at UW-Madison as well as at Ball State University. She chaired the Privacy Council of the State of Wisconsin, the nation's first state-level council dealing with information technology and privacy issues. She has written books (including "Virtual Individuals, Virtual Groups: Human Dimensions of Groupware and Computer Networking," Cambridge University Press) and dozens of articles on futurism, ethics, film, artificial intelligence, disability, mental health, technological design, privacy, computing technology, management, and public policy issues. She has worked for public television and developed software along with her academic ventures.
She has held visiting fellow positions at both Cambridge and Oxford, was recently a featured speaker at conferences in Japan and Australia, and was recently covered by BBC News and Healthline on cyberhealth issues. Event co-organizer: Women in Big Data: https://www.womeninbigdata.org/ Food Sponsor: I would like to thank Google for the food and drink for the meetup. Space Sponsor: I would like to thank American Family for the venue.