• Containerized Architectures for Deep Learning + KubeFlow, TFX, TensorFlow, Airflow

    Agenda

    Talk 1: Containerized Architectures for Deep Learning
    Container and cloud native technologies around Kubernetes have become the de facto standard in modern ML and AI application development. And while many data scientists and engineers tend to focus on tools, the platform that enables these tools is equally important and often overlooked. Let's examine some common architecture blueprints and popular technologies used to integrate AI into existing infrastructures, and learn how you can build a production-ready containerized platform for deep learning. In particular, the speaker explores Docker and Kubernetes, along with their associated cloud native technologies, and their use and advantages in ML/AI environments.

    Talk 2: Hands-on Learning with KubeFlow, TFX, TensorFlow, GPU/TPU, Kafka, Scikit-Learn and JupyterLab running on Kubernetes.

    Date/Time: 9-10am US Pacific Time (Third Monday of Every Month)

    ** RSVP & LOGIN HERE **
    Eventbrite: https://www.eventbrite.com/e/1-hr-free-workshop-pipelineai-gpu-tpu-spark-ml-tensorflow-ai-kubernetes-kafka-scikit-tickets-45852865154
    Meetup: https://www.meetup.com/Advanced-KubeFlow/
    Zoom: https://zoom.us/j/690414331
    Webinar ID: [masked]
    Phone: [masked] (US Toll) or [masked] (US Toll)

    Related Links
    PipelineAI Home: https://pipeline.ai
    PipelineAI Community Edition: https://community.pipeline.ai
    PipelineAI GitHub: https://github.com/PipelineAI/pipeline
    PipelineAI Quick Start: https://quickstart.pipeline.ai
    Advanced KubeFlow Meetup (SF-based, Global Reach): https://www.meetup.com/Advanced-KubeFlow/
    YouTube Videos: https://youtube.pipeline.ai
    SlideShare Presentations: https://slideshare.pipeline.ai
    Slack Support: https://joinslack.pipeline.ai
    Email Support: [masked]
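    For a taste of the Kubernetes side of such a platform, here is a minimal, illustrative Python sketch (not from the talk) that lists the GPUs a cluster exposes to deep learning workloads. It assumes the official kubernetes Python client, a reachable kubeconfig, and the NVIDIA device plugin advertising the nvidia.com/gpu resource.

    ```python
    # Minimal sketch (not from the talk): inspect a cluster's GPU capacity from Python.
    # Assumes the official `kubernetes` client, a kubeconfig on the local machine,
    # and the NVIDIA device plugin exposing the `nvidia.com/gpu` resource name.
    from kubernetes import client, config

    def list_gpu_nodes():
        config.load_kube_config()  # use config.load_incluster_config() when running inside a pod
        v1 = client.CoreV1Api()
        for node in v1.list_node().items:
            allocatable = node.status.allocatable or {}
            gpus = allocatable.get("nvidia.com/gpu", "0")
            print(f"{node.metadata.name}: {gpus} GPU(s) allocatable")

    if __name__ == "__main__":
        list_gpu_nodes()
    ```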

  • KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + Airflow + PyTorch

    **Title**
    Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTorch + XGBoost + Airflow + MLflow + Spark + Jupyter + TPU

    RSVP: https://www.eventbrite.com/e/full-day-workshop-kubeflow-kerastensorflow-20-tf-extended-tfx-kubernetes-pytorch-xgboost-airflow-tickets-63362929227

    **Description**
    In this workshop, we build real-world machine learning pipelines using TensorFlow Extended (TFX), KubeFlow, and Airflow. First described in a 2017 paper, TFX is used internally by thousands of Google data scientists and engineers across every major product line within Google. KubeFlow is a modern, end-to-end pipeline orchestration framework that embraces the latest AI best practices, including hyper-parameter tuning, distributed model training, and model tracking. Airflow is the most widely used pipeline orchestration framework in machine learning.

    **Pre-requisites**
    A modern browser - and that's it! Every attendee will receive a cloud instance. Nothing will be installed on your local laptop, and everything can be downloaded at the end of the workshop.

    **Location**
    Online

    **Agenda**
    1. Create a Kubernetes Cluster
    2. Install KubeFlow, Airflow, TFX, and Jupyter
    3. Set Up ML Training Pipelines with KubeFlow and Airflow
    4. Transform Data with TFX Transform
    5. Validate Training Data with TFX Data Validation
    6. Train Models with Jupyter, Keras/TensorFlow 2.0, PyTorch, XGBoost, and KubeFlow
    7. Run a Notebook Directly on the Kubernetes Cluster with KubeFlow
    8. Analyze Models using TFX Model Analysis and Jupyter
    9. Perform Hyper-Parameter Tuning with KubeFlow
    10. Select the Best Model using KubeFlow Experiment Tracking
    11. Reproduce Model Training with the TFX Metadata Store and Pachyderm
    12. Deploy the Model to Production with TensorFlow Serving and Istio
    13. Save and Download your Workspace

    **Key Takeaways**
    Attendees will gain experience training, analyzing, and serving real-world Keras/TensorFlow 2.0 models in production using modern frameworks and open-source tools.

    RSVP: https://www.eventbrite.com/e/full-day-workshop-kubeflow-kerastensorflow-20-tf-extended-tfx-kubernetes-pytorch-xgboost-airflow-tickets-63362929227
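    As a flavor of agenda steps 6 and 12, here is a minimal sketch (not the workshop's actual notebooks) that trains a tiny Keras/TensorFlow 2.0 model on synthetic data and exports it as a SavedModel, the format TensorFlow Serving loads from a numbered version directory.

    ```python
    # Minimal sketch, not the workshop notebooks: train a tiny Keras/TensorFlow 2.0
    # model on synthetic data and export it for TensorFlow Serving.
    import numpy as np
    import tensorflow as tf

    # Synthetic binary-classification data standing in for the workshop dataset.
    X = np.random.rand(1000, 20).astype("float32")
    y = (X.sum(axis=1) > 10).astype("int32")

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, y, epochs=3, batch_size=32, validation_split=0.2)

    # TensorFlow Serving expects a numbered version directory, e.g. exported_model/1/.
    model.save("exported_model/1", save_format="tf")
    ```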

  • Using SQL-compliant applications and code to get the most out of Hadoop data

    Using SQL-compliant applications and code to get the most out of Hadoop data

    Abstract:
    SQL has long been the most widely used language for big data analysis. The SQL-on-Hadoop ecosystem is loaded with both commercial and open source alternatives, each offering tools optimized for various use cases. Fledgling analytical engines are in incubation, but are they ready to become full-fledged members of your enterprise infrastructure? Are they ready to fly? In the real world, enterprises must understand their needs and select a SQL-on-Hadoop solution that addresses them. Points to consider:
    - What are your analytics use cases: will a single user be working on data discovery, or will multiple users perform daily analytics?
    - Will you need to modify SQL to adjust to different deployment scenarios, or does a single solution exist for on-premises, cloud, and Hadoop?
    - Can a single solution support a variety of workloads, from quick-hit dashboards to complex, resource-intensive, join-filled queries?
    Deriving value from open source innovation, while keeping your deployment options open, requires a field-proven and extensive SQL analytical database.

    Bios:
    Ben Smith is a Manager on the Vertica Product Marketing team, where he supports go-to-market planning, content development, marketing campaigns, competitive research, and more. Ben is a Vertica product evangelist with a background in IoT and a passion for clean energy and smart building analytics.

    Bryan Whitmore, Field Chief Technologist, Vertica
    Bryan credits smart customers, who challenge vendors and tackle tough applications, with driving innovation. Bryan leverages 6+ years’ experience with Vertica’s most innovative and technically demanding customers, and 20+ years of Presales and Product Management experience gained from successful startups (Vertica Systems, Acoria Networks, and Arrowpoint), to identify and influence industry trends and ongoing product development. Bryan’s prior experience spans Networking, Systems, Storage, Applications, and high-value data architectures within Financial Services, Telecom, Gaming, and Social Media.

    Agenda:
    6:30 - Drinks/Apps and Networking
    7 - Introduction
    7:15 - Talks
    8:30 - Wrap up/shut down

    Getting Here: AddThis HQ is located next to the Silver Line's Spring Hill Metro station. Free parking is also available. If you have trouble getting in, please call Brad at [masked].

    Food: AddThis will provide beer and sodas, and HPE Vertica will sponsor the food. There are plenty of local bars available if people would like to continue the discussion after the talk.

    Sponsors: HPE Vertica
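    To make the "SQL analytical database" angle concrete, here is a minimal, hypothetical Python sketch that runs an analytical query against Vertica with the vertica-python driver and loads the result into pandas; the host, credentials, and clickstream table are placeholders, not anything from the talk.

    ```python
    # Hypothetical sketch (placeholder host, credentials, and table): run analytical
    # SQL against Vertica with the vertica-python driver and load it into pandas.
    import pandas as pd
    import vertica_python

    conn_info = {
        "host": "vertica.example.com",   # placeholder
        "port": 5433,
        "user": "dbadmin",
        "password": "changeme",
        "database": "analytics",
    }

    query = """
        SELECT region, COUNT(*) AS events, AVG(latency_ms) AS avg_latency_ms
        FROM clickstream                  -- hypothetical table
        GROUP BY region
        ORDER BY events DESC
    """

    with vertica_python.connect(**conn_info) as conn:
        cur = conn.cursor()
        cur.execute(query)
        columns = [col[0] for col in cur.description]
        df = pd.DataFrame(cur.fetchall(), columns=columns)

    print(df.head())
    ```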

  • Easy Time Series Analysis with Riak TS, Python, Pandas & Jupyter

    Summary:
    Handling time series data in a highly available and scalable way has always been a significant challenge. In this session we will introduce you to the newly open-sourced Riak TS (Time Series) database, show you how to get started, and show you how to analyze time series data with Python, Pandas, and Jupyter. We will explain how Riak TS has been designed specifically for handling vast amounts of time series data, and will also show you how to utilize Python for time series data analysis. During the demonstration phase we will use Jupyter to write and execute Python code that:
    • Creates a table in Riak TS
    • Populates the table with data
    • Queries the data that we added
    • Uses Pandas to visualize the results of our queries

    Biography:
    Craig Vitter, Solutions Architect, Basho
    Craig has been designing and building data-driven applications for more than twenty years. At Basho, Craig works with customers looking to architect and build applications on top of Riak KV and Riak TS.

    Agenda:
    6:30 - Drinks/Apps and Networking
    7 - Introduction
    7:15 - Talks
    8:30 - Wrap up/shut down

    Getting Here: AddThis HQ is located next to the Silver Line's Spring Hill Metro station. Free parking is also available. If you have trouble getting in, please call Brad at [masked].

    Food: AddThis will provide beer and sodas, and Basho will sponsor the food. There are plenty of local bars available if people would like to continue the discussion after the talk.

    Sponsors: AddThis, Basho
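    A rough sketch of the four demo steps is below. It assumes the Basho Riak Python client's table()/new()/store()/ts_query() interface and a local Riak TS node; the table name, schema, and data are placeholders.

    ```python
    # Rough sketch of the demo flow (create, populate, query, visualize).
    # Assumes the Basho Riak Python client's table()/new()/store()/ts_query()
    # interface; host, table name, schema, and data are placeholders.
    import pandas as pd
    import riak

    client = riak.RiakClient(host="127.0.0.1", pb_port=8087)  # placeholder host/port

    # 1. Create a TS table (DDL submitted as a query; assumes a release that supports this).
    client.ts_query("SensorData", """
        CREATE TABLE SensorData (
            device      VARCHAR   NOT NULL,
            ts          TIMESTAMP NOT NULL,
            temperature DOUBLE,
            PRIMARY KEY ((device, QUANTUM(ts, 15, 'm')), device, ts)
        )
    """)

    # 2. Populate the table with a few rows (timestamps in epoch milliseconds).
    table = client.table("SensorData")
    rows = [
        ["dev-1", 1461000000000, 21.5],
        ["dev-1", 1461000060000, 21.7],
    ]
    table.new(rows).store()

    # 3. Query the data back.
    result = client.ts_query(
        "SensorData",
        "SELECT ts, temperature FROM SensorData "
        "WHERE device = 'dev-1' AND ts > 1460999999999 AND ts < 1461000100000",
    )

    # 4. Visualize with pandas (requires matplotlib, e.g. inside a Jupyter notebook).
    df = pd.DataFrame(result.rows, columns=["ts", "temperature"])
    df["ts"] = pd.to_datetime(df["ts"], unit="ms")
    df.plot(x="ts", y="temperature")
    ```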

  • Creating Your First Predictive Model in Python

    Summary:
    If you’ve been reading books and blog posts on machine learning and predictive analytics and are still left wondering how to create a predictive model in Python and apply it to your own data, this presentation will give you the steps and code you need to do just that. You'll learn how to go from raw data to a trained predictive model, and then how to implement it in a production system.

    Biography:
    Robert Dempsey is a tested leader and technology professional delivering solutions and products to solve tough business challenges. His experience forming and leading agile teams, combined with more than 15 years of technology experience, enables him to solve complex problems while always keeping the bottom line in mind. He's founded and built three startups in tech and marketing, developed and sold online applications, consulted to Fortune 500 and Inc. 500 companies, and spoken nationally and internationally on software development and agile project management. In addition, he's the author of the soon-to-be-released "Python Business Intelligence Cookbook".

    Agenda:
    6:30 - Drinks/Apps and Networking
    7 - Introduction
    7:15 - Talks
    8:30 - Wrap up/shut down

    Getting Here: AddThis HQ is located next to the Silver Line's Spring Hill Metro station. Free parking is also available. If you have trouble getting in, please call Brad at [masked].

    Food: AddThis will provide beer, sodas, and food. There are plenty of local bars available if people would like to continue the discussion after the talk.

    Sponsors: AddThis (http://www.addthis.com/)
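    As an illustration of that raw-data-to-production path (not the presenter's code), here is a minimal scikit-learn sketch using a placeholder customers.csv file and churned target column.

    ```python
    # Minimal sketch of the raw-data-to-production-model path described above.
    # The CSV file and target column are placeholders, not the presenter's data.
    import joblib
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # 1. Load raw data (placeholder file and columns).
    df = pd.read_csv("customers.csv")
    X = df.drop(columns=["churned"])
    y = df["churned"]

    # 2. Train and evaluate a model.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    print("accuracy:", accuracy_score(y_test, model.predict(X_test)))

    # 3. Persist the trained model so a production service can load it.
    joblib.dump(model, "churn_model.joblib")
    # Later, in production: model = joblib.load("churn_model.joblib"); model.predict(new_rows)
    ```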

  • Analyzing Semi-Structured Data at Volume in the Cloud

    Summary: Analyzing Semi-Structured Data at Volume in the Cloud
    The cloud, mobile, and web applications are producing semi-structured data at an unprecedented rate. IT professionals continue to struggle to capture, transform, and analyze these complex data structures, mixed with traditional relational-style datasets, using conventional MPP and/or Hadoop infrastructures. Public cloud infrastructures such as Amazon and Azure provide almost unlimited resources and scalability to handle both structured and semi-structured data (XML, JSON, Avro) at petabyte scale. These new capabilities, coupled with traditional data management access methods such as SQL, give organizations and businesses new opportunities to leverage analytics at an unprecedented scale while greatly simplifying data pipeline architectures and providing an alternative to the "data lake". Please join Big Data DC and Snowflake Computing for a discussion of these topics and a demonstration of this game-changing technology. The demonstration will focus on analyzing structured and semi-structured data together, using a commercially available cloud-based platform and a standards-based SQL language to provide insights on petabyte-scale data sets.

    Biography:
    Kevin Bair is a Solution Architect with extensive experience working with both federal and large commercial organizations over the last 25 years. He has a background in application development, database and content management, virtualization, and operational analytics. His career includes 15 years working for IBM Software Group, ITIL certification, and development of a patent related to Big Data on a virtualized network. Kevin is currently a Solution Architect at Snowflake Computing, helping clients and business partners develop enterprise-class solutions on AWS using Snowflake's cloud-based Elastic Data Warehouse.

    Agenda:
    6:30 - Drinks/Apps and Networking
    7 - Introduction
    7:15 - Talks
    8:30 - Wrap up/shut down

    Getting Here: AddThis HQ is located next to the Silver Line's Spring Hill Metro station. Free parking is also available. If you have trouble getting in, please call Brad at [masked].

    Food: AddThis will provide beer and sodas, and Snowflake will provide food. There are plenty of local bars available if people would like to continue the discussion after the talk.

    Sponsors: AddThis (http://www.addthis.com/), Snowflake (http://www.snowflake.net/)
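    As a small, hypothetical illustration of querying semi-structured JSON in Snowflake from Python (not the demo itself), the sketch below uses the snowflake-connector-python package and Snowflake's path/cast syntax on a VARIANT column; the account, credentials, and click_events table are placeholders.

    ```python
    # Hypothetical sketch (placeholder account, credentials, and table): query
    # semi-structured JSON stored in a Snowflake VARIANT column with standard SQL.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="xy12345",        # placeholder account identifier
        user="analyst",
        password="changeme",
        warehouse="ANALYTICS_WH",
        database="EVENTS_DB",
        schema="PUBLIC",
    )

    # JSON fields are addressed with Snowflake's path syntax and cast with '::'.
    query = """
        SELECT raw:device.type::string AS device_type,
               COUNT(*)                AS events
        FROM   click_events            -- hypothetical table with a VARIANT column 'raw'
        GROUP  BY 1
        ORDER  BY events DESC
    """

    cur = conn.cursor()
    try:
        cur.execute(query)
        for device_type, events in cur.fetchall():
            print(device_type, events)
    finally:
        cur.close()
        conn.close()
    ```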

  • Data Science meets Software Development AND Is Apache Spark Ready for the Cloud?

    Hi Big Data DC, we have a double feature co-organized with Washington DC Area Apache Spark Interactive (http://www.meetup.com/Washington-DC-Area-Spark-Interactive/) for you.

    Summary: Data Science meets Software Development
    Alexis works in a Data Innovation Lab with a horde of data scientists. Data scientists gather data, clean data, apply machine learning algorithms, and produce results, all of that with specialized tools (Dataiku, Scikit-Learn, R...). These processes run on a single machine, on data that is fixed in time, and they have no constraint on execution speed. With his fellow developers, Alexis' goal is to bring these processes to production. Developers and scientists have very different constraints: developers want the code to be versioned, tested, deployed automatically, and to produce logs. Developers also need it to run in production on distributed architectures (Spark, Hadoop, …), with fixed versions of languages and frameworks (Scala…), and with data that changes every day. In this talk, Alexis will explain how developers work hand-in-hand with data scientists to shorten the path to running data workflows in production.

    Biography: Alexis Seigneurin (@ASeigneurin)
    Software engineer for 15 years and consultant for Ippon Technologies (http://www.ipponusa.com). Ippon delivers Digital, Big Data, and Cloud applications on top of proven Java expertise in the US and France. Throughout many projects, Alexis has explored many aspects of data management - cleansing, processing, indexing, reporting… - with many languages, frameworks, systems, and databases. Alexis has used Spark since early 2014, and specifically Spark with Cassandra when working on real-time reporting applications.

    Summary: Is Apache Spark Ready for the Cloud?
    Many technology companies are turning to the cloud for scalable, elastic infrastructure to store and analyze user behavior data, information from wearables, sensor data, and more. However, running big data tools, like Apache Spark, in the cloud presents a host of challenges. Spark is typically deployed in a dedicated data center as a next step in an organization's big data deployment strategy, to gain deeper and faster insights. However, as the advantages of big data in the cloud become more apparent and gain wider adoption, can organizations also reap the benefits of Spark as a service without sacrificing its primary benefit, speed? In other words, is Spark ready for the cloud? In this presentation, Praveen Seluka, a software engineer and Apache Spark expert at Qubole, will outline how the combination of Apache Spark and AWS can be implemented to ensure high performance, based on real-world experience. The audience will learn how to effectively deploy Spark in the cloud, the key technological challenges with delivering it as a service that can scale and deliver the performance Spark was designed for, and the important benefits that can be achieved through Spark as a Service.

    Speaker Bio: Praveen Seluka
    Praveen Seluka is a software engineer at Qubole. Prior to Qubole, Praveen worked as a software engineer at Microsoft and Yahoo. Praveen has won several coding competitions, such as Kaggle and Codechef. He has a master’s degree in information systems from the Birla Institute of Technology and Science, Pilani.

    Agenda:
    6:30 - Drinks/Apps and Networking
    7 - Introduction
    7:15 - Talks
    8:30 - Wrap up/shut down

    Getting Here: AddThis HQ is located next to the Silver Line's Spring Hill Metro station. Free parking is also available. If you have trouble getting in, please call Brad at [masked].

    Food: AddThis will provide beer and sodas, and Ippon and Qubole will provide food. There are plenty of local bars available if people would like to continue the discussion after the talk.

    Sponsors: AddThis (http://www.addthis.com/), Ippon (http://www.ippon.fr/), Qubole (http://www.qubole.com/)
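    For a flavor of running Spark analytics in the cloud, here is a minimal PySpark sketch (placeholder S3 bucket, field names, and paths) that reads JSON event data from S3, aggregates it by day, and writes the result back out; it assumes the cluster has the hadoop-aws connector and AWS credentials configured.

    ```python
    # Minimal PySpark sketch (placeholder S3 bucket, field names, and paths):
    # read JSON events from S3, aggregate by day, and write results back out.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("cloud-spark-demo")
             .getOrCreate())

    # Reading from S3 assumes the cluster has the hadoop-aws connector and
    # credentials configured (e.g. via instance roles on AWS).
    events = spark.read.json("s3a://example-bucket/events/2016/")

    # Assumes the events carry 'timestamp' and 'event_type' fields (placeholders).
    daily_counts = (events
                    .withColumn("day", F.to_date(F.col("timestamp")))
                    .groupBy("day", "event_type")
                    .count())

    daily_counts.write.mode("overwrite").parquet("s3a://example-bucket/reports/daily_counts/")
    spark.stop()
    ```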
