• Hands-on Workshop: Introduction to Red Hat's OpenShift Kubernetes distribution

    An introduction to OpenShift, with a lab constructing an n-tier web application on the platform, and details about OpenShift's relationship to upstream Kubernetes.
    SPEAKER: Josh Wood, Developer Advocate, Red Hat
    LinkedIn: https://www.linkedin.com/in/joshix/
    ABSTRACT: This workshop prepares web and application developers to build applications using containers, Kubernetes, and OpenShift. We'll start with an introduction to containers and Kubernetes, the foundation of OpenShift. Using hands-on exercises, we will walk through OpenShift use cases: how easy it is to deploy existing containers, and how to build containers by simply providing a git repository with your application source. Want to see easy application scaling? No problem! Wish you could do A/B (aka blue/green) deployments? Your wish is our command. Bring your curiosity and learn what you need to know to start building pure awesomeness on OpenShift.
    NOTE: This is a BRING-YOUR-OWN-LAPTOP lab.
    SPONSOR: Red Hat

  • Beyond SQL: Building a hybrid analytics/insights system on top of Apache Spark

    ABSTRACT: Over the years, the one tool that has remained constant across Data X (Engineering, Science, Analysis, etc.) has been SQL. What we learned from SQL was how to load, join, project, and fetch rows of structured data in the hope of answering questions across many fields of study and industry. During this tech talk you will learn how to go beyond the traditional capabilities of SQL, and how to utilize core and lesser-known features of Spark SQL to build a modern analytics and insights engine on top of Spark.
    - Learn how to load data from various backing sources (e.g. the common "data lake with mixed formats" issue)
    - Learn how to easily join data between streaming and batch jobs, giving you the flexibility of working in real time and with historic data
    - Level up your Spark skills and learn how to gain insights from your data
    SPEAKER: Scott Haines, Principal Software Engineer at Twilio Inc.
    LinkedIn: https://www.linkedin.com/in/scotthaines/
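    The streaming-plus-batch join mentioned above is a native capability of Spark Structured Streaming (stream-static joins). A rough PySpark sketch of the idea, with invented paths, topic, and column names, not code from the talk:

```python
# Sketch of a stream-static join in Spark Structured Streaming.
# All paths, topic names, and columns are hypothetical.

def normalize_column(name: str) -> str:
    """Normalize column names when unifying data from mixed-format sources."""
    return name.strip().lower().replace(" ", "_")


def build_enriched_stream(spark):
    """Join a Kafka stream of events against a static user dimension table."""
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType

    # Static (batch) side: a dimension table read from the data lake.
    users = spark.read.parquet("s3://lake/users/")  # hypothetical path

    # Streaming side: raw JSON events from Kafka.
    schema = StructType([
        StructField("user_id", StringType()),
        StructField("action", StringType()),
    ])
    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "events")
        .load()
        .select(from_json(col("value").cast("string"), schema).alias("e"))
        .select("e.*")
    )

    # Spark allows joining a streaming DataFrame with a static one directly.
    return events.join(users, on="user_id", how="left")


print(normalize_column("  User ID "))  # -> user_id
```

    The static side is re-read per micro-batch by Spark, which is what makes mixing real-time and historic data straightforward.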

  • Exploring Data Analytics, Image Recognition w/Spark, Keras/TensorFlow & RedisAI

    This is a FREE one-day hands-on workshop event with three sessions. Due to unexpected reasons, we could not make the PyTorch session available in time for the event. Our sincere apologies for that. Here is the updated agenda for the event:
    TECH TALK 1:
    TITLE: "Principles of Predictive Analytics and the Path to Time-Series Predictions"
    SPEAKER: Scott Haines, Principal Software Engineer, Twilio Inc.
    LinkedIn: https://www.linkedin.com/in/scotthaines/
    ABSTRACT: Statistical data mining is a useful art that is often skipped in the race to throw ML/DL at a data science problem. However, this initial exploration step is critical to ensuring that you understand the important properties your data can uncover. In this session we will learn how to explore the Kaggle Wine Reviews dataset. We will look for statistical trends in the data, learn how to find and fix missing-data issues, and impute values to fill in important gaps. We will then move into using unsupervised learning methods to find clusters of similar wines, and look at using the Apriori algorithm via Spark's FPGrowth model. We will be graphing our findings along the way and having fun looking at wine.
    PRE-REQUISITES FOR HANDS-ON: Workshop code base: https://github.com/newfront/odsc-east2019-warmup
    TECH TALK 2:
    TITLE: "Creating and Deploying Models with Jupyter, Keras/TensorFlow 2.0 & RedisAI"
    SPEAKER: Chris Fregly, Founder and CEO, PipelineAI, a real-time machine learning and artificial intelligence startup based in San Francisco.
    LinkedIn: https://linkedin.com/in/cfregly/
    ABSTRACT: In this session, you will learn how to create a TensorFlow 2.0 model using Keras in Jupyter notebooks. You will also learn how to access models and interact with them using RedisAI.
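    The missing-data and imputation step from the first session can be illustrated with a small pandas sketch (a toy frame standing in for the Kaggle wine data; the column names are assumptions):

```python
import pandas as pd

# Toy stand-in for the wine reviews data; real columns differ.
df = pd.DataFrame({
    "variety": ["Pinot Noir", "Riesling", "Pinot Noir", "Malbec"],
    "points": [91, None, 88, None],
    "price": [45.0, 20.0, None, 15.0],
})

# Find missing data: count nulls per column.
missing = df.isna().sum()

# Impute: fill numeric gaps with a per-column mean (one simple strategy).
df["points"] = df["points"].fillna(df["points"].mean())
df["price"] = df["price"].fillna(df["price"].mean())

print(missing["points"], df["points"].tolist())
# -> 2 [91.0, 89.5, 88.0, 89.5]
```

    Mean imputation is only one option; the right strategy depends on what the exploration step reveals about the data.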
    TECH TALK 3:
    TITLE: "An Intro to Redis Streams"
    SPEAKER: Dave Nielsen, Head of Community & Ecosystem Programs at Redis Labs
    LinkedIn: https://linkedin.com/in/dnielsen/
    ABSTRACT: In this session, you will learn the basics of Redis Streams. You will also learn how to create and access a Redis Streams publisher, channel, and subscriber.
    SCHEDULE:
    9:30am: Networking
    9:45am: Introduction by Arivoli & Dave Nielsen
    10:00am-12:30pm: "Principles of Predictive Analytics and the Path to Time-Series Predictions", by Scott Haines
    12:30pm-1:15pm: LUNCH
    1:15pm-2:30pm: "Creating and Deploying Models with Jupyter, Keras/TensorFlow 2.0 & RedisAI", by Chris Fregly
    2:30pm-3:15pm: "An Intro to Redis Streams", by Dave Nielsen
    VENUE: Hacker Dojo, 3350 Thomas Road, Suite 150, Santa Clara, CA 95054
    FOOD SPONSOR: Redis Labs
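    The Redis Streams basics from talk 3 can be sketched with the redis-py client. The stream name and fields below are invented, and publishing requires a running Redis server, so that part lives in a function; the ID-parsing helper is pure Python:

```python
# Sketch of Redis Streams basics with the redis-py client.
# The stream name and fields are hypothetical; the publish/read calls
# need a live Redis server, so they are wrapped in a function.

def split_stream_id(entry_id: str) -> tuple:
    """Parse a Redis stream entry ID ("<ms-timestamp>-<sequence>")."""
    ms, seq = entry_id.split("-")
    return int(ms), int(seq)


def publish_and_read():
    import redis

    r = redis.Redis(host="localhost", port=6379)

    # Publisher: XADD appends an entry and returns its auto-generated ID.
    entry_id = r.xadd("sensor:temps", {"celsius": "21.5"})

    # Subscriber: XREAD fetches entries after a given ID ("0" = from start).
    entries = r.xread({"sensor:temps": "0"})
    return entry_id, entries


print(split_stream_id("1526919030474-0"))  # -> (1526919030474, 0)
```

    Entry IDs are millisecond timestamps plus a sequence number, which is what makes streams naturally ordered and range-queryable.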

  • Automating Stateful Applications with Kubernetes Operators

    ABSTRACT: Kubernetes scales and manages stateless applications quite easily. Stateful applications can require more work. Databases, caching systems, and file stores are harder to manage dynamically with data intact, and sometimes come with their own notion of clustering. Operators are Kubernetes agents that know how to deploy, scale, manage, back up, and even upgrade complex, stateful applications. This tutorial will provide an update on the Operator pattern introduced by CoreOS, adopted by many community projects like Rook and Prometheus, and supported by this spring's release of the Operator Framework by Red Hat. Demonstrations will show the installation and use of Operators on an OpenShift Kubernetes cluster. With an understanding of Operators in place, the session will go on to detail the Operator Framework and its main components, the Operator SDK and the lifecycle-management backplane, concluding with a demonstration of building an Operator with the Operator SDK.
    SPEAKER: Josh Wood, Developer Advocate for Red Hat's OpenShift Container Platform.
    LinkedIn: https://www.linkedin.com/in/joshix/
    Josh has worked in a variety of roles in innovative startups throughout his career, holding diverse titles from systems admin to product director and CTO. He was formerly responsible for documentation at CoreOS. He is passionate about constructing the future of utility computing with open source technologies like Kubernetes. When procrastinating, Josh enjoys photographing polydactyl cats and writing short autobiographies.
    VENUE: Rakuten USA, 800 Concar Drive, Suite 175, San Mateo, CA 94404
    SPONSORS: Venue and Food: Rakuten USA
    AGENDA:
    6:00pm: Networking
    6:45pm: Introduction from Data Riders and Rakuten USA
    6:50pm: Red Hat tech talk by Josh Wood
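    The Operator SDK demonstrated in the session generates Go projects, but the reconcile loop at an Operator's core is language-agnostic and can be sketched in a few lines of Python (the resource shapes here are invented for illustration):

```python
# A minimal sketch of the reconcile loop at the heart of an Operator.
# The real Operator SDK produces Go code; this Python version only
# illustrates the pattern, and the resource fields are made up.

def reconcile(desired: dict, observed: dict) -> list:
    """Compare the desired spec against observed state and emit actions.

    An operator runs this repeatedly ("level-triggered"): it never
    assumes a previous action succeeded, it just converges on the spec.
    """
    actions = []

    want = desired.get("replicas", 1)
    have = observed.get("ready_replicas", 0)
    if have < want:
        actions.append(("scale_up", want - have))
    elif have > want:
        actions.append(("scale_down", have - want))

    # Stateful-app knowledge lives here too, e.g. taking a backup
    # before an upgrade -- the operational logic Operators encode.
    if desired.get("version") != observed.get("version"):
        actions.append(("backup", observed.get("version")))
        actions.append(("upgrade", desired.get("version")))

    return actions


print(reconcile({"replicas": 3, "version": "5.7"},
                {"ready_replicas": 1, "version": "5.6"}))
# -> [('scale_up', 2), ('backup', '5.6'), ('upgrade', '5.7')]
```

    In a real Operator this function would be triggered by watches on a custom resource and would issue Kubernetes API calls instead of returning a list.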

  • Managing Modern Databases for Modern Cloud Native/Microservices based Apps

    Hello! We are excited to have you join us on March 5th at Oracle Santa Clara for a workshop on "Managing Modern Databases". Food and networking will begin at 6:00 pm. The workshop will start promptly at 6:30 pm. Doors close at 6:45 pm. Please plan to be at the venue accordingly.
    IMPORTANT: Please fill out this form, https://goo.gl/forms/0zdDFleNoH4VLTzn1, to expedite the registration process and so that we can provision your $500 Cloud Platform Trial account, which you will need to run this workshop smoothly. Please choose an email address that you have never used for an Oracle Cloud Platform Trial account before. You will not be required to input your credit card to create this account.
    PRE-REQUISITES: Bring a laptop (Windows or Mac) and pre-install:
    1. SQL Developer: https://www.oracle.com/technetwork/developer-tools/sql-developer/downloads/index.html Select the OS for your computer. (This page also has instructions on how to install SQL Developer on Windows, Mac OS X, and Linux.) If you already have SQL Developer installed on your computer, note that the minimum version required to connect to an Oracle ADW Cloud is SQL Developer 17.4. Mac: https://www.oracle.com/technetwork/developer-tools/sql-developer/downloads/sqldev-install-mac-1969675.html
    2. Data Visualization Desktop: Oracle Data Visualization Desktop makes it easy to visualize your data so you can focus on exploring interesting data patterns. Choose from a variety of visualizations to look at data in a specific way. Data Visualization Desktop comes with Oracle ADW. To download and install Data Visualization Desktop, please follow: https://www.oracle.com/technetwork/middleware/oracle-data-visualization/downloads/oracle-data-visualization-desktop-2938957.html Select the OS for your computer. This page also has instructions on how to install DVD on Windows and Mac OS X. If you already have Data Visualization Desktop installed on your computer, please check the version.
    The minimum version required to connect to an Oracle ADW Cloud is 12c[masked]. Here is the link to the learning library you will need during the workshop: bit.ly/atpLabs
    ABSTRACT: How is data management different today than it used to be? Well, to begin with, there is a gazillion times more data today than there was just a decade ago. Clearly, traditional tools of data management are not going to work. Let's take a look at ways to manage enterprise data at scale without getting a PhD in database administration. We all wear multiple hats in this cloud era, so whether you are a developer, DBA, IT Ops, or DevOps, this session is for you. We'll also look at Terraform-based orchestration, ways to deploy microservices on a Kubernetes cluster, and how to whip up some Python/Java apps quickly in the Oracle cloud.
    The lab flow:
    • Lab 1: Provisioning an Autonomous Database.
    • Lab 2: Managing and Scaling Data using Autonomous Features.
    • Lab 3: Configuring a Node.js app (as an example) to work with the Autonomous Database.
    • Lab 4: Understanding and Working with REST APIs.
    • Lab 5: Configuring Infrastructure.
    • Lab 6: Modern App Dev on Oracle's Cloud Native Framework with Oracle Autonomous as the Backend Database Layer.
    WORKSHOP CREW:
    1) Kri Bhanushali, Senior Principal Product Manager for Oracle's Database Cloud Service
    2) Gopikrishna Manchala, Certified Principal Enterprise Cloud Architect (OCI) for the Northern California Region
    3) Santosh Kumar Ramarathnam, Senior Technical Consultant at Oracle and an OCI Certified Associate
    4) Vishal Atreja, cloud technology evangelist and member of the Oracle Digital North America Technology Division Solution Engineering team
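    For a feel of Lab 4's REST API work, here is a small stdlib-only sketch of building an authenticated GET request against a database REST endpoint. The host, path, and credentials are placeholders (not a real ORDS endpoint), and the request is deliberately built but not sent:

```python
# Sketch of constructing an authenticated REST call (Lab 4 territory).
# Host, path, and credentials are placeholders; nothing is sent.
import base64
import urllib.request


def build_rest_request(host: str, path: str, user: str, password: str):
    """Build an authenticated GET request for a REST data API."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"https://{host}{path}",
        headers={
            "Authorization": f"Basic {token}",
            "Accept": "application/json",
        },
    )


req = build_rest_request("example-db.adb.oraclecloud.com",
                         "/ords/admin/employees/", "admin", "secret")
print(req.get_full_url())
```

    Sending it would be a one-liner with `urllib.request.urlopen(req)` against a real endpoint.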

  • ML Day

    Hacker Dojo

    This event is a double header.
    TECH TALK #1: "Data and ML at Scale Using Kubernetes"
    SPEAKER: Jayant Thomas, Director/Head of AI and Machine Learning, Veritas Technologies
    LinkedIn: https://www.linkedin.com/in/jayantthomas/
    ABSTRACT: In this talk, we will go over an ML and data platform architecture used at scale, with Kubernetes at the heart of the compute layer and a NoSQL database for managing data. We will illustrate the platform using use cases from storage and backup: computing a System Reliability Score (SRS), storage forecasting, configuration drift, and other algorithms used across many hundreds of thousands of storage appliances. We will analyze the pros and cons of this architecture and close with lessons learned.
    BIOGRAPHY: Jayant Thomas (JT) has a passion for AI, IoT, machine learning, and cloud-native architectures at scale. His passion has led him to many successful adventures at Veritas, GE, Oracle, AT&T, Nuance, and other startups, building platforms at scale. JT holds an MBA from UC Davis and an M.Tech from NIIT, and has more than 15 patents in IoT, NLP processing, and cloud architectures. JT is also an enthusiastic speaker and writer, contributing stimulating ideas across many conferences and meetups. In addition, JT is an active fitness and health enthusiast, dabbling in various diets and health fads. He is the author of the best-selling IIoT application development book: https://www.amazon.com/gp/product/B075V92JW7/ref=dbs_a_def_rwt_hsch_vapi_taft_p1_i0
    TECH TALK #2: "Automated Time Series Forecasting, Backtesting, and Optimization"
    SPEAKER: Marcello Tomasini, PhD, Sr. Data Scientist, Veritas Technologies
    LinkedIn: https://www.linkedin.com/in/marcellotomasini/
    ABSTRACT: Traditionally, time series forecasting has been a popular discipline in the finance realm but often neglected as an area of machine learning.
    This changed dramatically with the growing popularity of IoT, sensor networks, and streaming data, which led to the collection of vast amounts of time series data, often stored in ad-hoc time series databases. Forecasting is now used for a multitude of use cases, from application performance optimization to workload anomaly detection. The challenge, then, is to automate a process that was handcrafted for the analysis of a single data series constituted of just tens of data points, and scale it to processing thousands of time series and millions of data points. In this talk, we will tackle some of the issues and solutions in dealing with time series forecasting at scale, including continuous accuracy evaluation and algorithm hyperparameter optimization. As a real-world example, we will discuss the solution implemented in Veritas Predictive Insight, which is capable of training, evaluating, and forecasting over 70,000 time series daily.
    BIOGRAPHY: Marcello Tomasini is a "Computer Engineer and Scientist interested in Machine Learning, Computer Security, Complex Networks, and Biology with a Think Different life style". Marcello holds a B.S. and an M.S. in Computer Engineering from the University of Modena and Reggio Emilia, Italy, and a Ph.D. in Computer Science from the Florida Institute of Technology, USA. He has several papers published in international peer-reviewed conferences and journals in the areas of mobile sensor networks, human mobility modeling, and machine learning. He currently works as a Sr. Data Scientist at Veritas Technologies, where he designed and developed the system reliability score and the storage forecasting algorithms implemented in Veritas Predictive Insight. His free time is a mix of gym/bootcamps, machine learning meetups, and traveling.
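    The "continuous accuracy evaluation" the abstract mentions usually means a rolling-origin backtest: train on the past, score one step ahead, extend the window, repeat. A minimal pure-Python sketch, using a naive forecaster and a toy series as stand-ins for whatever models the talk covers:

```python
# A minimal rolling-origin backtest. The naive forecaster and the toy
# series are illustrative stand-ins, not the talk's actual models.

def naive_forecast(history):
    """Forecast the next point as the last observed value."""
    return history[-1]


def mape(actual, predicted):
    """Absolute percentage error for a single point."""
    return abs((actual - predicted) / actual)


def backtest(series, min_train=3):
    """Walk forward through the series: fit on the past, score one
    step ahead, then expand the training window and repeat."""
    errors = []
    for t in range(min_train, len(series)):
        pred = naive_forecast(series[:t])
        errors.append(mape(series[t], pred))
    return sum(errors) / len(errors)


series = [100, 102, 101, 104, 108, 110]
print(round(backtest(series), 4))  # -> 0.028
```

    At the scale described (70,000 series daily), this loop is what gets parallelized, with the forecaster and its hyperparameters chosen per series.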

  • Demystifying AWS Analytics & Datamart-in-a-box with Hadoop and Druid

    This event is a double header.
    TECH TALK #1: "Demystifying AWS Analytics"
    SPEAKER: Vinayak Datar, Product Manager, ShareInsights
    LinkedIn: https://www.linkedin.com/in/vinayakdatar/
    Vinayak is an expert in the analytics space with experience in both BI and big data. Previously he held senior delivery manager positions with Persistent Systems. He holds a Master's degree in Computer Application from VJTI, Mumbai.
    ABSTRACT: AWS serverless analytics services are changing the way analytics is done in the cloud and are clearly challenging on-premise Hadoop deployments. There are many services: Athena, EMR, Glue, and so on. Let's explore what their strengths and weaknesses are, how they co-exist and co-operate, what the time/cost/feature implications are, and how ShareInsights seamlessly allows you to switch between them and get the best out of them.
    TECH TALK #2: "Datamart-in-a-box" with Hadoop and Druid
    SPEAKER: Josh Walters, Principal Software Engineer at Splunk. He has experience designing large-scale data applications. He has presented his work at Hadoop Summit, XLDB, and IEEE Big Data. Prior to Splunk, he worked at Yahoo. Josh holds a BS in Computer Science from UC San Diego.
    LinkedIn: https://www.linkedin.com/in/joshwalters/
    ABSTRACT: Data engineers face an onslaught of analysts and product managers who want to ask hard questions of their data. They want to find key performance indicators like daily active users, total app installs, and user retention rates. They also want to perform complex longitudinal analysis over large time windows. The optimal solution has to take into account performance, cost, and ease of infrastructure management. This talk will examine the use of Hadoop, Hive, and Druid to solve this problem by building fast and efficient datamart-in-a-box solutions.
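    The KPIs named in the second abstract (daily active users, retention) reduce to set algebra over (day, user) events; Hive and Druid industrialize exactly this kind of rollup. A toy pure-Python illustration with made-up event data:

```python
# DAU and day-over-day retention as set algebra over (day, user) events.
# The event data is invented for illustration.
from collections import defaultdict

events = [
    ("2019-03-01", "alice"), ("2019-03-01", "bob"),
    ("2019-03-02", "alice"), ("2019-03-02", "carol"),
]

# Daily active users: distinct users per day.
dau = defaultdict(set)
for day, user in events:
    dau[day].add(user)

# Day-over-day retention: share of day-1 users seen again on day 2.
d1, d2 = dau["2019-03-01"], dau["2019-03-02"]
retention = len(d1 & d2) / len(d1)

print(len(d1), len(d2), retention)  # -> 2 2 0.5
```

    At datamart scale the sets become sketches (e.g. HyperLogLog) and the per-day rollups become Druid segments, but the query shape stays the same.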

  • Training and Deploying TensorFlow Models at scale on Kubernetes with Kubeflow

    ABSTRACT: Machine learning needs an infrastructure to support all the underlying operations required to build a model and push it into production. Such operations include infrastructure serving, machine resource management, configuration, monitoring, etc. To ensure composability, portability, and scalability of a machine learning model, a data scientist has to go through the extra pain of all these ceremonial activities. Containerization is one of the best ways to automate all the DevOps operations, so a data scientist can focus purely on machine learning. Alvin will share an example of a machine learning workflow from a data scientist's perspective. His demo will cover training and serving TensorFlow models at scale using Kubeflow.
    SPEAKER BIO: Alvin Henrick (https://www.linkedin.com/in/alvinhenrick/) is a Principal Engineer at Change Healthcare. He has a strong technical background in big data, Spark, Hadoop, machine learning, parallel database systems, AWS, containers, Linux, Map/Reduce, interactive querying, distributed query execution, and query scheduling. He is an ASF member and a committer on Apache Tajo.
    The tech talk will cover the following technologies:
    https://kubernetes.io/
    https://www.tensorflow.org/
    https://jsonnet.org/
    https://ksonnet.io/
    https://www.kubeflow.org/
    https://www.docker.com/
    https://www.seldon.io/
    https://github.com/SeldonIO/seldon-core/blob/master/docs/reference/internal-api.md
    https://github.com/openshift/source-to-image
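    Kubeflow expresses distributed TensorFlow training as a TFJob custom resource. The sketch below builds one as a plain Python dict; the field names follow the TFJob API as commonly documented in the Kubeflow v1beta1 era, but the image and replica counts are placeholders, so check the docs for the exact schema of the version you run:

```python
# Rough sketch of a Kubeflow TFJob custom resource built as a dict.
# Field names follow the kubeflow.org/v1beta1-era TFJob API; image and
# replica counts are placeholders -- verify against your Kubeflow docs.

def tfjob(name: str, image: str, workers: int) -> dict:
    def replica(count: int) -> dict:
        return {
            "replicas": count,
            "restartPolicy": "OnFailure",
            "template": {"spec": {"containers": [
                {"name": "tensorflow", "image": image},
            ]}},
        }

    return {
        "apiVersion": "kubeflow.org/v1beta1",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {"tfReplicaSpecs": {
            "Chief": replica(1),
            "Worker": replica(workers),
        }},
    }


job = tfjob("mnist-train", "example/mnist:latest", workers=3)
print(job["spec"]["tfReplicaSpecs"]["Worker"]["replicas"])  # -> 3
```

    In practice you would serialize this to YAML (or generate it with ksonnet/jsonnet, as the talk's technology list suggests) and apply it to the cluster, where the TFJob operator schedules the chief and worker pods.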

  • Microservices Day - Tech talks from Amazon, YugaByte and Oracle

    SPECIAL OFFER !!! Oracle is offering a free $500 Oracle Cloud Platform trial for interested attendees to explore the Oracle Cloud Platform. Registration is required to take advantage of this offer. Here is the event schedule. Since there are 3 tech talks, we will be following the schedule closely.
    SCHEDULE:
    6:00pm: Check-in, food, and networking
    6:30pm: Start time
    6:35pm-7:10pm: Tech talk #1: "Building a modern app at cloud scale with containers and managed services using 12-factor app principles" Speaker: Asif Khan, Special Projects at Amazon Web Services, Amazon. https://www.linkedin.com/in/asifkhan00/
    7:10pm-7:15pm: Q & A for Tech talk #1
    7:15pm-7:50pm: Tech talk #2: "5 Best Practices for Running Mission-Critical Stateful Apps on Kubernetes" Speaker: Bogdan-Alexandru Matican, Founding Engineer, YugaByte. https://www.linkedin.com/in/bmatican/
    7:50pm-7:55pm: Q & A for Tech talk #2
    7:55pm-8:30pm: Tech talk #3: "GraphPipe: Blazingly Fast Machine Learning Inference" Speaker: Vish Abrams, Architect of Cloud Development, Oracle. https://www.linkedin.com/in/vishvananda/
    8:30pm-8:35pm: Q & A for Tech talk #3
    8:40pm: End time
    TECH TALK #1:
    Abstract: As compute evolves from bare metal to virtualized environments to containers and toward serverless, the efficiency gains have enabled a wide variety of use cases. Organizations have used containers to run long-running services, batch processing at scale, control planes, Internet of Things, and artificial intelligence workloads. Further, methodologies for software as a service, such as the twelve-factor app, emphasize a clean contract with the underlying operating system and maximum portability between execution environments, in the process transforming developer experience, continuous delivery, and service deployment. We will cover the design patterns, tools, and technologies enabling the transformation.
    TECH TALK #2:
    Abstract: Docker containers are great for running stateless microservices, but what about stateful applications such as databases and persistent queues? Kubernetes provides the StatefulSets controller for such applications, which have to manage data in some form of persistent storage. While StatefulSets is a great start, a lot more goes into ensuring high performance, data durability, and high availability for stateful apps in Kubernetes. Following are 5 best practices that developers and operations engineers should be aware of:
    1. Ensure high performance with local persistent volumes and pod anti-affinity rules.
    2. Achieve data resilience with auto-failover and multi-zone pod scheduling.
    3. Integrate StatefulSet services with other application services through NodePort & LoadBalancer services.
    4. Run Day 2 operations such as monitoring, elastic scaling, capacity resizing, and backups with caution.
    5. Automate operations through Kubernetes Operators that extend the StatefulSets controller.
    We will demonstrate how to run a complete e-commerce application powered by YugaByte DB, with all services deployed in Kubernetes.
    TECH TALK #3:
    Abstract: Machine learning needs a means for efficient remote deployment. This allows models to be glued together into larger ensembles, and helps support both collaborative research and connected deep learning for mobile devices. We have developed a lightweight standard for deploying machine learning models called GraphPipe. GraphPipe is designed to bring the efficiency of a binary, memory-mapped format while remaining simple and light on dependencies. We will present a performance comparison to illustrate that GraphPipe achieves state-of-the-art performance while still achieving our goals of simplicity and minimal dependencies. In addition, we will show some examples of how you can use GraphPipe to simplify model deployment today.
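    Best practice #1 from talk 2, spreading a stateful database's pods across nodes, is expressed in Kubernetes as a pod anti-affinity rule. The sketch below builds that fragment of a StatefulSet pod spec as a plain dict; the label value is a placeholder inspired by YugaByte DB's tablet servers, and the topology key follows the standard Kubernetes node-label convention:

```python
# Fragment of a StatefulSet pod spec: a pod anti-affinity rule that
# keeps replicas on separate nodes. The app label is a placeholder;
# "kubernetes.io/hostname" is the standard per-node topology key.

def anti_affinity(app_label: str) -> dict:
    """Require that no two pods with this label land on the same node."""
    return {
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [{
                "labelSelector": {"matchLabels": {"app": app_label}},
                # One pod per failure domain; a zone key here instead
                # would give the multi-zone spread of best practice #2.
                "topologyKey": "kubernetes.io/hostname",
            }],
        },
    }


spec = anti_affinity("yb-tserver")
rule = spec["podAntiAffinity"]["requiredDuringSchedulingIgnoredDuringExecution"][0]
print(rule["topologyKey"])  # -> kubernetes.io/hostname
```

    This dict would sit under the StatefulSet's pod template at `spec.template.spec.affinity`, alongside the local persistent volume claims the same best practice calls for.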