• Welcome HDP 3.0

    WeWork 31 St. James Ave

    ** WeWork location requires a full name and a valid photo ID for entry. **

    Registration is required for this event: https://www.eventbrite.com/e/future-of-data-boston-welcome-hdp-30-tickets-48495457218

    DESCRIPTION
    Come learn what’s new and improved in HDP 3.0, the just-released open source Hadoop 3 distribution! Containerization, GPU pooling, NameNode federation for huge clusters, major improvements to Atlas, Hive 3.0, LLAP configuration, management through Ambari 2.7, and up to 50% reduction in storage costs for archive data with erasure coding in Hadoop 3.1 are just some of the high-impact updates in this major release of the connected data platform.

    AGENDA
    6:00 – 6:30pm: Networking and Food
    6:30pm: Welcome to Future of Data Meetup
    6:35pm: Welcome HDP 3.0 presentation, Q&A
    8:00pm: Wrap up

    About the Speaker
    William Brooks, Solutions Engineer, Hortonworks
    Bill Brooks has been modeling, managing, and integrating data since 1995, beginning at CID Associates developing application databases, then at Children's Hospital Boston as manager of the Decision Support Systems Group. He managed data integration before becoming Enterprise Data Architect for MFS Investment Management, then served as Global Chief Data Architect for Mercer, where he developed a firmwide data architecture practice and drove the creation of a shared big data and advanced analytics program. Bill's background includes traditional relational database design, data warehouse design and implementation, ETL, messaging and ESBs, and Hadoop and Spark-based analytics. Bill is now a Solutions Engineer at Hortonworks, specializing in data governance and architecture for Hadoop and Spark solutions, and serves on the board of Information Quality International (IQI/IAIDQ).

    WeWork Location
    * Security requirements: Register at the Eventbrite link above and bring a photo ID to the meetup.
    * The nearest "T" stop is Arlington Station.
    * Location info: https://www.wework.com/buildings/st-james--boston--MA

  • Enterprise Data Science at Scale

    IBM Watson Health

    Due to the venue being a highly secure facility, all attendees MUST register via this Eventbrite link (https://www.eventbrite.com/e/enterprise-data-science-at-scale-tickets-39928064917). Thanks for your cooperation!

    Data science holds tremendous potential for organizations to uncover new insights and drivers of revenue and profitability. Big Data has brought the promise of doing data science at scale to enterprises; however, this promise also comes with challenges for data scientists to continuously learn and collaborate. Data scientists have many tools at their disposal: notebooks such as Jupyter and Apache Zeppelin, IDEs such as RStudio, languages like R, Python, and Scala, and frameworks like Apache Spark. Given all these choices, how do you best collaborate to build your model and then work through the development lifecycle to deploy it from test into production?

    Why Data Science on Big Data?
    In this meetup we will cover the attributes of a modern data science platform that empowers data scientists to build models using all the data in their data lake and fosters continuous learning and collaboration. We will show a demo of Apache Zeppelin, Apache Spark, Apache Livy, and Apache Hadoop with a focus on integration, security, and model deployment and management.

    Data Science at Scale DEMO
    The demo will cover the data science life cycle: develop a model in a team environment, train the model with all the data on a Hadoop cluster, and deploy the model into production. The model will be a Spark ML model (see the sketch at the end of this listing).

    Practical ML Topic: TBD

    Agenda:
    6:00 – 6:30pm: Networking and Pizza
    6:30pm: Introducing Data Science at Scale
    Building and Deploying Models Collaboratively with DSX
    Training Models with all the Data
    Putting Models to Work in a Streaming Application
    Q&A

    BIOs:
    Rich Tarro, Solutions Architect, IBM Corporation
    Rich Tarro helps clients gain insight into data through the application of Analytics, Data Science, Information Governance, and Cloud technologies. Rich has an MS in Electrical Engineering from Rensselaer Polytechnic Institute and has worked his entire professional career at IBM. His roles at IBM have encompassed chip design, hardware architecture, data warehousing, information management architecture, Big Data, Apache Spark, and machine learning.

    Carolyn Duby, Solutions Engineer, Hortonworks
    Carolyn Duby helps organizations harness the power of their data with Apache open source platforms. Prior to joining Hortonworks she was the architect for cyber security event correlation at SecureWorks. Ms. Duby earned an ScB magna cum laude and an ScM in Computer Science from Brown University. She recently completed the Johns Hopkins University Coursera Data Science Specialization. With diverse experience working for small companies, startups, large companies, and herself, she has a passion for challenging data-intensive systems.

    *Parking is not available at this location. Attendees can park at the CambridgeSide Galleria or one of the parking lots in the area. The facility is also accessible using public transportation.
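    The demo centers on the Spark ML lifecycle: build a model collaboratively, train it against all the data on the cluster, then deploy it. As a rough, minimal sketch of the train-and-persist step in PySpark (the input path, column names, and model output path are illustrative assumptions, not details from the talk):

        # Minimal PySpark ML sketch: train a model and persist it for later deployment.
        # Paths and column names are illustrative assumptions, not from the talk.
        from pyspark.sql import SparkSession
        from pyspark.ml import Pipeline
        from pyspark.ml.feature import VectorAssembler
        from pyspark.ml.classification import LogisticRegression

        spark = SparkSession.builder.appName("data-science-at-scale-sketch").getOrCreate()

        # Load training data from the data lake (hypothetical HDFS path).
        df = spark.read.parquet("hdfs:///data/lake/training_events")

        # Assemble feature columns into a single vector and fit a simple classifier.
        assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
        lr = LogisticRegression(featuresCol="features", labelCol="label")
        model = Pipeline(stages=[assembler, lr]).fit(df)

        # Persist the fitted pipeline so a production scoring job can reload it.
        model.write().overwrite().save("hdfs:///models/demo_lr")

        spark.stop()

    A scoring job could later reload the same pipeline with PipelineModel.load() and apply it to new records; how that step is wired through Livy and DSX is what the live demo itself covers.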

  • Unlocking Insights in Streaming Data with Open Source Solutions

    Streaming data is rich with insights, but those insights can be hard to reach because streaming applications are difficult to develop and deploy. During this presentation we will show how to build and deploy a complex streaming application in a few minutes using open source tools. First we will build an application using Streaming Analytics Manager and Schema Registry that ingests data into Apache Druid. Then we will use Apache Superset to build beautiful, informative dashboards.

    Agenda:
    6:30 PM: Networking, food and drink
    6:45 PM: Announcements
    7:00 PM - 8:00 PM: Carolyn Duby, Presentation and Demo: Unlocking Insights in Streaming Data with Open Source Solutions
    8:00 PM - 8:30 PM: Networking and wrap up

    Biography:
    Carolyn Duby, Solutions Engineer, Northeast, Hortonworks
    Carolyn Duby helps organizations harness the power of their data with Apache open source platforms. Prior to joining Hortonworks she was the architect for cyber security event correlation at SecureWorks. Ms. Duby earned an ScB magna cum laude and an ScM in Computer Science from Brown University. She recently completed the Johns Hopkins University Coursera Data Science Specialization. With diverse experience working for small companies, startups, large companies, and herself, she has a passion for challenging data-intensive systems.

    For more information about the location see: https://pivotal.io/locations/boston
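    Streaming Analytics Manager, Schema Registry, and Superset are all driven from their web UIs, but the data that lands in Druid can also be queried directly over Druid's SQL HTTP endpoint. A minimal sketch, assuming a local broker on its default port and a hypothetical "events" datasource:

        # Minimal sketch: query an Apache Druid datasource over its SQL HTTP API.
        # The host/port and the "events" datasource name are assumptions for illustration.
        import requests

        DRUID_SQL_URL = "http://localhost:8082/druid/v2/sql"

        sql = """
            SELECT TIME_FLOOR(__time, 'PT1H') AS hour_bucket, COUNT(*) AS row_count
            FROM "events"
            GROUP BY TIME_FLOOR(__time, 'PT1H')
            ORDER BY hour_bucket DESC
            LIMIT 24
        """

        resp = requests.post(DRUID_SQL_URL, json={"query": sql})
        resp.raise_for_status()

        # Druid returns one JSON object per result row.
        for row in resp.json():
            print(row)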

  • The Future of Spark, Spark Intro, and Spark with RapidMiner

    The Future of Spark, Spark Intro, and Spark with RapidMiner

    Agenda:
    6:15 PM - 6:30 PM: Networking, food and drink
    6:30 PM: Announcements
    6:30 PM - 7:15 PM: Carolyn Duby. Brief introduction to Spark. See Apache Spark at work detecting credit card fraud.
    7:15 PM - 8:00 PM: Yuanyuan Huang. Spark with RapidMiner demo. Wrap up, Q&A
    8:00 PM - 8:30 PM: Networking

    Biographies:
    Carolyn Duby, Solutions Engineer, Northeast, Hortonworks
    Carolyn Duby helps organizations harness the power of their data with Apache open source platforms. Prior to joining Hortonworks she was the architect for cyber security event correlation at SecureWorks. Ms. Duby earned an ScB magna cum laude and an ScM in Computer Science from Brown University. She recently completed the Johns Hopkins University Coursera Data Science Specialization. With diverse experience working for small companies, startups, large companies, and herself, she has a passion for challenging data-intensive systems.

    Yuanyuan Huang, Sales Engineer and Resident Data Scientist, RapidMiner
    Yuanyuan (YY) Huang is a resident data scientist for RapidMiner. She received her PhD from Iowa University in Biomathematics, Bioinformatics and Computational Biology, and has previously written on the simulation of yeast cooperation in 2D. Currently YY is working on projects in text mining, predictive maintenance, fraud detection, customer prediction, and web analytics.

  • Interactive Data Science with Spark and Zeppelin

    Join us for an introduction to modern data science at scale. We will review the concepts and drivers behind the Apache Spark project before diving into an interactive lab. The lab will be done through the Zeppelin web notebook. If you want to join in the interactive portion of this demo, please download the Hortonworks Sandbox and VirtualBox ahead of time: http://hortonworks.com/products/sandbox/

    APACHE SPARK
    Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning, or SQL workloads requiring fast iterative access to datasets. http://hortonworks.com/apache/spark/

    APACHE ZEPPELIN
    Apache Zeppelin is a web-based notebook that brings data exploration, visualization, sharing, and collaboration features to Spark. It supports Python, Scala, Java, Hive, SparkSQL, shell, markdown, and several other languages. http://hortonworks.com/apache/zeppelin/

    Objectives of the lab (see the sketch below):
    • Understand how to load data
    • Clean data to produce an easy-to-use dataset
    • Graphing
    • Machine learning - Review
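    As a rough flavor of the first two lab objectives, here is a minimal PySpark snippet of the kind you might run in a Zeppelin paragraph; the file path and column names are illustrative assumptions, not the lab's actual dataset:

        # Minimal sketch of the "load" and "clean" lab objectives in PySpark.
        # The file path and column names are illustrative assumptions.
        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F

        spark = SparkSession.builder.appName("zeppelin-lab-sketch").getOrCreate()

        # Load: read a CSV file with a header row and inferred column types.
        raw = spark.read.csv("hdfs:///tmp/lab/measurements.csv", header=True, inferSchema=True)

        # Clean: drop rows missing key fields and keep only sensible values.
        clean = (
            raw.dropna(subset=["sensor_id", "value"])
               .filter(F.col("value") >= 0)
               .withColumn("value", F.col("value").cast("double"))
        )

        # In a Zeppelin notebook, z.show(clean) would render the result as a
        # table or chart for the graphing objective; here we just summarize it.
        clean.describe("value").show()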

  • Apache NiFi - MiNiFi: Taking Dataflow Management to the Edge

    Microsoft New England Conference Center

    The meeting is at 255 Main Street in Cambridge, MA. Please use the MSFT entrance through the glass door on the corner of Main and Broadway, and take the elevator to the second floor.

    Target Audience: Intermediate

    Agenda:
    6:00 PM - 6:30 PM: Food, drinks, mingling
    6:30 PM - 6:45 PM: Dan Rice. Announcements, call for presenters, future events
    6:45 PM - 7:45 PM: Joseph Percivall. MiNiFi is a recently started sub-project of Apache NiFi and a complementary data collection approach that supplements NiFi's core tenets of dataflow management by focusing on collecting data at the source of its creation. Simply put, MiNiFi agents take the guiding principles of NiFi and push them to the edge in a purpose-built design-and-deploy manner. This talk will focus on MiNiFi's features, go over recent developments and prospective plans, and give a live demo of MiNiFi. Wrap up, Q&A

    Bio:
    Joseph Percivall, Software Engineer and ASF PMC Member
    After spending multiple years as a government contractor (US DoD), Joseph joined Hortonworks to create 100% open-source solutions. He has a passion for Enterprise Dataflow, the Internet of Things, and the intersection of the two. As an emerging speaker, Joseph has presented at various venues including local user groups and an international conference.

  • Apache Phoenix and HBase: Past, Present and Future of SQL over HBase

    7:00 PM - 7:15 PM: Chris Gambino. Announcements, call for presenters, future events
    7:15 PM - 8:15 PM: Enis Soztutar, Member of Technical Staff at Hortonworks / Apache HBase PMC

    Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
    HBase, the NoSQL database of choice in the Hadoop ecosystem, has already proven itself at scale and in many mission-critical workloads at hundreds of companies. Phoenix, the SQL layer on top of HBase, is increasingly becoming the tool of choice as the perfect complement to HBase. Phoenix is now being used more and more for super-low-latency querying and fast analytics across a large number of users in production deployments. In this talk, we will cover what makes Phoenix attractive to current and prospective HBase users, such as SQL support, JDBC, data modeling, secondary indexing, and UDFs, and also go over recent improvements like the Query Server, ODBC drivers, ACID transactions, Spark integration, etc. We will conclude by looking at items in the pipeline and how Phoenix and HBase interact with other engines like Hive and Spark.

    Wrap up, Q&A

    Bio:
    Enis Soztutar is a committer and PMC member of the Apache HBase, Phoenix, and Hadoop projects and a member of the Apache Software Foundation. He has been using and developing Hadoop ecosystem projects since 2007. He currently works at Hortonworks as a lead on HBase engineering.
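    Since the talk highlights SQL support and the Query Server, here is a minimal sketch of querying Phoenix from Python through the Phoenix Query Server's HTTP endpoint using the python-phoenixdb client; the URL and the EVENTS table are assumptions for illustration:

        # Minimal sketch: talk to Apache Phoenix via the Phoenix Query Server
        # using the python-phoenixdb client (pip install phoenixdb).
        # The URL and the EVENTS table are illustrative assumptions.
        import phoenixdb

        conn = phoenixdb.connect("http://localhost:8765/", autocommit=True)
        cur = conn.cursor()

        # Phoenix maps SQL tables onto HBase tables; UPSERT covers insert and update.
        cur.execute(
            "CREATE TABLE IF NOT EXISTS events "
            "(id BIGINT PRIMARY KEY, source VARCHAR, score DOUBLE)"
        )
        cur.execute("UPSERT INTO events VALUES (1, 'sensor-a', 0.97)")

        cur.execute("SELECT id, source, score FROM events WHERE score > 0.5")
        for row in cur.fetchall():
            print(row)

        conn.close()

    The same statements could equally be issued over JDBC; the Query Server is what makes thin clients such as this one (and the ODBC driver mentioned above) possible.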
