Note: ML Model Versioning, Deployment, and Monitoring are core themes of Scale By the Bay 2019 (https://scale.bythebay.io), 11/14-15, Oakland. Reserve your seat today using the code MEETSALTOPS15 for 15% off all passes, including the complete Serverless workshop! Joint meetup: please RSVP at http://bay.area.ai!

(1) MODEL VERSIONING: WHY, WHEN, AND HOW

Models are the new code. While machine learning models are increasingly being used to make critical product and business decisions, the process of developing and deploying ML models remains ad hoc. In the “wild west” of data science and ML tools, versioning, management, and deployment of models are massive hurdles in making ML efforts successful. As creators of ModelDB, an open-source model management solution developed at MIT CSAIL, we have helped manage and deploy a host of models ranging from cutting-edge deep learning models to traditional ML models in finance. In each of these applications, we have found that the key to enabling production ML is an often-overlooked but critical step: model versioning. Without a means to uniquely identify, reproduce, or roll back a model, production ML pipelines remain brittle and unreliable. In this talk, we draw upon our experience with ModelDB and Verta to present best practices and tools for model versioning, and show how a robust versioning solution (akin to Git for code) can streamline DS/ML, enable rapid deployment, and ensure high quality of deployed ML models.

Speakers: Manasi Vartak, CEO, Verta.ai, and Conrado Miranda, CTO, Verta.ai

Manasi Vartak is the founder and CEO of Verta.ai (www.verta.ai), an MIT spinoff building software to enable high-velocity machine learning. Manasi previously worked on deep learning for content recommendation as part of the feed-ranking team at Twitter and on dynamic ad targeting at Google.

Conrado Miranda is the CTO at Verta.ai. Conrado has a PhD in Machine Learning and a focus on building platforms for AI. He was the tech lead for the Deep Learning platform at Twitter’s Cortex, where he designed and led the implementation of TensorFlow for model development and PySpark for data analysis and engineering. He also led efforts on NVIDIA’s self-driving car initiative, including the Machine Learning platform, large-scale inference for the Drive stack, and build and CI for Deep Learning models.

(2) Model Monitoring in Production

In production, machine learning models continuously encounter data patterns they never saw during training and testing. The best offline experiment can lose in production. The most accurate model is not always tolerant of minor data drift or adversarial input. Neither prodops, data science, nor engineering teams are equipped to detect, monitor, and debug model degradation. Real mission-critical AI systems require an advanced monitoring and model observability ecosystem that enables continuous and reliable delivery of machine learning models into production.

Common production incidents include:
- Data anomalies
- Data drifts, new data, wrong features
- Vulnerability issues, adversarial attacks
- Concept drifts, new concepts, expected model degradation
- Domain drift
- Biased training set

In this demo-based talk we discuss algorithms for monitoring text and image use cases as well as classical tabular datasets. The demo will cover the full cycle of a machine learning model in production:
- Model training and deployment with Kubeflow pipelines
- Production traffic simulation
- Model monitoring metrics configuration
- Data drift detection
- Drift exploration and monitoring metadata mining
- New training dataset generation from production feature store
- Model retraining and redeployment

Stepan Pushkarev is the CTO of Hydrosphere.io, a model management platform, and co-founder of Provectus, an AI solutions provider and consultancy and the parent company of Hydrosphere.io.
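
For readers new to the data drift detection step named in the monitoring talk, here is a minimal sketch of one common approach for a single numeric feature: compare the training distribution with recent production values using the two-sample Kolmogorov-Smirnov statistic. This is an illustration only, not the algorithm used by Hydrosphere.io or shown in the talk; the function names and the 0.2 threshold are assumptions chosen for the example.

```python
from bisect import bisect_right


def ks_statistic(reference, production):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest gap between the empirical CDFs of the two
    samples: 0.0 means identical distributions, 1.0 means no overlap.
    """
    ref = sorted(reference)
    prod = sorted(production)
    if not ref or not prod:
        raise ValueError("both samples must be non-empty")
    max_gap = 0.0
    # The empirical CDFs only change at sample points, so checking
    # those points is enough to find the maximum gap.
    for x in ref + prod:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_prod = bisect_right(prod, x) / len(prod)
        max_gap = max(max_gap, abs(cdf_ref - cdf_prod))
    return max_gap


def drifted(reference, production, threshold=0.2):
    """Flag drift when the KS statistic exceeds an assumed threshold."""
    return ks_statistic(reference, production) > threshold


if __name__ == "__main__":
    training = [0.1 * i for i in range(100)]          # feature values seen in training
    shifted = [0.1 * i + 5.0 for i in range(100)]     # simulated production shift
    print("same data drifted:", drifted(training, training))
    print("shifted data drifted:", drifted(training, shifted))
```

In practice the threshold would be tuned per feature, and text or image inputs would need a different comparison (for example, on embeddings) rather than raw values.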

  • Scale By the Bay 2018: Regular Admissions End on 10/31

    Dear Friends, we are proud to announce the program of Scale By the Bay 2018, our sixth year of the flagship, and by now iconic, independent developer conference By the Bay. (TL;DR: get your spot at http://scale.bythebay.io while supplies last, ideally while Early Bird pricing is in effect, until August 31.)

    The conference follows the established three-day, three-track structure, hosted for the third year in a row by Twitter HQ in its wonderful modern building, with all of its spacious tracks, community spaces, cozy booths, and the commons area where so many connections are made during the hallway track. This year, Martin Odersky, the creator of Scala, opens the main conference on November 15. Neha Narkhede, the co-creator of Kafka and cofounder of Confluent, keynotes day 2.

    The three tracks are:
    • Functional and Thoughtful Programming
    • Reactive Microservices and Streaming Architectures
    • End-to-end Data Pipelines all the way up to Machine Learning and AI

    The 100 sessions include technology leaders such as Twitter, IBM, Microsoft, Salesforce, Fauna, DataStax, Databricks, Confluent, Credit Karma, Sumo Logic, GoPro, Buoyant, Workday, Zignal Labs, and many more. We cover your tools with JetBrains, your shopping with Best Buy and Target, your vacations with HomeAway, your listening with Spotify, your viewing with Netflix, your reading with Medium, and your banking with JP Morgan Chase. The list goes on and on: we have most of the advanced stacks and approaches employed by the best that Silicon Valley offers to the world at scale, shared as best practices, with code, yours to learn, take home, and build upon.

    Our speakers span the whole spectrum, from first-time presenters with leading companies to veterans of SBTB going all the way back to 2013, evolving their craft before our eyes. You can follow their progress by watching their previous talks on http://functional.tv and the photos of the past conferences at https://meetup.bythebay.photo/Conferences/Scale-By-the-Bay

    The three panels, closing each day, are:
    • Thoughtful Software Engineering
    • Data Engineering for AI
    • Cloud, Edge, and Silver Lining

    Each day begins with a hot breakfast and an uninterrupted supply of Philz coffee through the whole day, and lunch is provided. On the first two days, the closing panels are followed by our signature happy hours, with great drinks, food, and conversation. The hallway tracks are legendary.

    SBTB is famous for its bespoke, all-day, build-yourself-a-company training. This year, we double it. Cliff Click, the legend of software engineering, is teaching a full-day Advanced Software Engineering workshop on 11/13, followed by Ryan Knight, now of Fauna, leading cloud-native data pipelines on 11/14. The workshops are limited to 80 participants each. As last year, we’ll plan an unconference track for those who want to share their ideas in an intimate setting for joint brainstorming.

    The only thing moderate about SBTB is its size: we cap at 600 attendees to preserve the immediate and direct nature of the communication that happens, the sparks that fly, and the serendipity that always occurs. We are always sold out by the time the conference begins in November, so reserve your seat early at http://scale.bythebay.io! And enjoy the Early Bird that is in effect until August 31.

  • SMACK 2.0: Emerging Data Pipelines Panel at Index

    IBM Index (http://chief.sc/index-2018) is a fantastic new developer conference. Register (http://chief.sc/index-2018-register) by 2/20 with the code CD3ALEXY to attend the Community Day for free and the main program for just $280. The SMACK 2.0 panel (http://chief.sc/index-2018-smack20-panel) is held on 2/22 at 2pm, preceded by the SMACK 2.0 workshop (http://chief.sc/iindex-2018-smack20-workshop) on the Community Day (day 0, 2/20, 3-5:30pm).

    In this panel, we discuss SMACK (http://smackstack.org/), the popular framework to describe and compare data pipelines. SMACK 1.0 was often composed of Spark, Mesos, Akka, Cassandra and Kafka. In SMACK 2.0, we explore emerging ways to build scalable data-heavy applications for Machine Learning, relying on Streaming, in-Memory computing (including Spark), Model-serving, API, Cloud/Cassandra/Containers, and Kubernetes (with Kafka often being the source). Instead of fixing the SMACK components as we did for SMACK 1.0 (Data source, API, Compute, Persistence, Operationalization), we consider alternatives for various use cases. For instance, S will increasingly be Serverless. What are the emerging patterns, and when do some approaches make more sense than others? Certain applications, such as Fintech, inform in-Memory computing, while others, such as IoT, favor streaming with real-time AI feedback.

    Panelists:
    • Nikita Ivanov, co-founder and CTO, GridGain
    • Sijie Guo, co-founder, Streamlio
    • Anya Bida, DevOps Engineer, Salesforce
    • Hugh McKee, Developer Advocate, Lightbend
    • Tathagata Das, Software Engineer, Databricks

    The SMACK 2.0 panel is preceded by the workshop (http://chief.sc/iindex-2018-smack20-workshop) during the Community Day. Both sessions are curated and moderated by Dr. Alexy Khrabrov (http://chiefscientist.org/), the founder and organizer of Scale By the Bay and the creator of the original SMACK Stack (http://smackstack.org/) training.

  • Streamlio, GridGain, Cassandra+Spark: FREE Workshop at Index

    We're happy to announce two new Index (http://chief.sc/index-2018) sessions. 2/20 is the free SMACK 2.0 workshop. Moscone West, 3-5:30pm. Register (http://chief.sc/index-2018-register) by 2/20 with the code CD3ALEXY to attend the Community Day for free and the main program for just $280.

    (1) Streaming: Streamlio
    (2) Memory computing: GridGain
    (3) Cassandra+Spark

    (1) Building modern data pipelines by unifying Apache Pulsar, Apache Heron, Apache BookKeeper

    For today’s enterprises, ensuring that data pipelines are available to every corner of the organization is key to building next-generation data-driven applications. In this talk Karthik Ramasamy of Streamlio will present how to combine three best-of-breed open-source projects into a solid data infrastructure that is easy to develop against and simple to operate at scale in production. He will provide an overview of the merits of the three open-source systems and the benefits they bring when integrated:
    • Apache Pulsar: unified queuing and streaming
    • Apache Heron: stream processing
    • Apache BookKeeper: distributed stream storage

    Karthik Ramasamy is the co-founder of Streamlio, which focuses on building next-generation real-time processing engines. Before Streamlio, he was the engineering manager and technical lead for real-time analytics at Twitter, where he co-created Twitter Heron. He has two decades of experience working in parallel databases, big data infrastructure, and networking. Karthik is the author of several publications and patents, and of "Network Routing: Algorithms, Protocols and Architectures". He has a Ph.D. in computer science from the University of Wisconsin, Madison with a focus on big data and databases.

    (2) Apache Spark and Apache Ignite: Where Fast Data Meets the IoT

    It is not enough to build a mesh of sensors or embedded devices to obtain more insights about the surrounding environment and optimize your production systems. Usually, your IoT solution needs to be capable of transferring enormous amounts of data to storage or the cloud, where the data have to be processed further. Quite often, the processing of the endless streams of data has to be done in real time so that you can react to the IoT subsystem's state accordingly. This session will show attendees how to build a Fast Data solution that receives endless streams from the IoT side and processes them in real time using Apache Ignite's cluster resources. In particular, attendees will learn about data streaming to an Apache Ignite cluster from embedded devices and real-time data processing with Apache Spark.

    Live-Coding Workshop

    (3) Building Your First Spark & Cassandra Application: A Code-Along Adventure w/ Russell Spitzer

    Not sure where to start with Cassandra and Spark? Together let’s walk through starting your first Spark application. We’ll walk through setting up your IDE and integration tests, everything you need to build your first scalable and distributed Spark app. Learn how to use embedded Cassandra and Spark to write your own tests, which are easily debuggable in standard IDEs. This will be a short but interactive adventure! Feel free to bring your own laptop and come code along! We will be using IDEA along with the template provided by DataStax.

    About Russell Spitzer: After earning his Ph.D. in bioinformatics from UCSF, Russell Spitzer took his love of big data to DataStax. There he has worked on all aspects of integrating Cassandra with other Apache technologies like Spark, Hadoop and Solr. Now his main focus is the integration of Cassandra with Apache Spark via the Spark Cassandra Connector.

    We are working with the IBM community teams to make their flagship developer conference, Index ( http://www.indexconf.com/ ), the most meaningful and fun experience for Bay Area developers. Alexy Khrabrov talks about Index with Markus Eisele, Selection Committee Chair and Director of Developer Advocacy, Lightbend: http://chief.sc/index-2018-overview

    In our communities, we created and popularized the SMACK Stack ( http://smackstack.org/ ), a way to reason about end-to-end data pipeline architectures. Building and running such pipelines, and the components comprising them, are the key themes of Index. The conference starts with the free Index Community Day ( https://developer.ibm.com/indexconf/communities/ ) on 2/20, which consists of 14 half-day sessions on the key technologies, many either directly relevant or of strong interest to most of us:
    • Spark
    • Kafka
    • Docker
    • Kubernetes
    • OpenAPI
    • Hyperledger
    • Istio
    • TensorFlow
    • Cloud Foundry

    You can build multiple viable architectures from these technologies, and they are often used together. To explore the progress made since SMACK 1.0, introduced in 2015, we are putting together a SMACK 2.0 panel, brainstorming the emerging SMACK Stacks. There is a wealth of expertise from many of the companies that present By the Bay regularly: Google (TensorFlow), Lightbend, Twilio, Slack, Uber, Facebook, IBM, Eero, Spotify and many others. You can already meet many speakers at the IBM developerWorks TV playlist for Index: http://chief.sc/index-2018-videos

    We’ll update this description as we ramp up our Index + SMACK 2.0 events!

  • Use Case w/ Salt Reactors: Zile Rehman from Nyansa

    Zile Rehman's Bio: Over 10 years of experience in developing automated solutions for Continuous Integration (CI) and Continuous Delivery (CD). Currently at Nyansa as an Automation Infrastructure Architect.

    Nyansa allows organizations to proactively predict problems and optimize their network by analyzing wired and wireless data for every user, in real time and over time, across the entire network-application stack.

    Nyansa Use Case

    In order to keep Nyansa's costs down, all of our dev, staging and POC environments are on AWS spot instances. SaltStack allows us to detect an instance outage and spin up the replacement instances automatically. With more than 100 nodes on spot instances, and growing, we are confident that Salt will scale as we grow our business.

    Agenda
    · Slide 1: Explain the big picture. Salt Master server with staging environments (i.e. minions) and event bus
    · Slide 2: Skeleton Python script outlining reactor tags and pseudo-code for reactions to events
    · Demo: Bring down a node and detect the recovery
    · Q & A

    I will set up a separate GitHub repo to spin up a Salt Master and minions in Vagrant, and provide instructions on setting up and learning about Salt Reactors.
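
    As a rough picture of what "reactor tags and reactions to events" means, the sketch below dispatches events to handlers by matching the event tag against glob patterns. It is plain Python for illustration, not Salt's reactor API and not the speaker's script: in Salt itself, reactors are declared in the master configuration, which maps tag patterns to reactor SLS files. The first pattern follows the shape of Salt's minion start event tag; the second tag, the handler names, and the returned strings are hypothetical.

```python
from fnmatch import fnmatchcase


def on_minion_start(tag, data):
    # Hypothetical reaction: a new minion came up, so configure it.
    return f"configure {data['id']}"


def on_node_lost(tag, data):
    # Hypothetical reaction: a node disappeared, so request a replacement.
    return f"launch replacement for {data['id']}"


# Tag pattern -> handler. Patterns use shell-style globs.
REACTIONS = [
    ("salt/minion/*/start", on_minion_start),
    ("example/node/*/lost", on_node_lost),  # made-up tag for illustration
]


def react(tag, data):
    """Run every handler whose pattern matches the event tag."""
    return [
        handler(tag, data)
        for pattern, handler in REACTIONS
        if fnmatchcase(tag, pattern)
    ]


if __name__ == "__main__":
    print(react("salt/minion/web-1/start", {"id": "web-1"}))
    print(react("example/node/web-2/lost", {"id": "web-2"}))
```

    An event whose tag matches no pattern simply produces no reactions, which mirrors how unmatched events pass through an event bus untouched.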

  • Free Live Webinar: Why DevOps Needs Infrastructure-as-Code?

    Hello,

    We'd like to invite you to an expert live webinar, 'Why DevOps Needs Infrastructure-as-Code?' (http://unbouncepages.com/devops-webinar/), scheduled for 7th December 2016 [ 8:30 PM IST | 11:00 AM EDT ].

    TOPICS:
    > Uses of DevOps
    > What is DevOps and Why DevOps
    > Benefits of DevOps
    > Skills required for DevOps
    > Implementing DevOps Processes
    > Explore DevOps tools and Implement use case of DevOps

    This promises to be an extremely enriching session and we hope you can make it. Register Now (http://unbouncepages.com/devops-webinar/). If you can't make it, sign up anyway and we'll send you the recording.

    For more info [masked] or mail me on sales(at)kratoes(dot)com

    Cheers!

  • SaltConf16

    The Grand America Hotel

    SaltConf16 is coming soon. It is the ultimate SaltStack meetup and there is no better time or place to learn how to effectively get Salted. The SaltConf16 speaker lineup this year includes more than 60 talks from companies like Adobe, Aetna, Dun & Bradstreet, Lyft, LinkedIn, Pure Storage, National Instruments, TD Bank, USDA and dozens more. We are also very excited to have sponsors from Adobe, Google, Linode, PagerDuty, StackIQ, Sumo Logic, SUSE, VMware, XebiaLabs, Zenoss and more. For your support and contributions to the SaltStack community, we are offering SaltStack meetup group members a discount on SaltConf16 registration. Use registration code SALTUSER213X to receive $200 off the price of a SaltConf16 main conference pass. Feel free to share this code with your friends, but act fast. The code will expire on March 11, and we fully expect SaltConf16 to sell out. For an even better deal, it is not too late to get FREE SaltConf16 pre-conference training if you stay three nights or more at The Grand America Hotel at the extremely discounted price of $199 per night. Pre-conference training includes all new courses and content this year. Keep in mind, several of the courses are almost at capacity. Go to www.saltconf.com/register for more information or to register using these discounts. We hope you will get Salted with us at SaltConf16. Best regards, SaltStack

  • Live Session: Applying Hadoop in Production Environments

    Hello,

    We'd like to invite you to an expert live session, 'Applying Hadoop in Production Environments' (http://promo.skillspeed.com/webinar-hadoop-development-production-environments/), scheduled for Thursday, October 15th, 9:30PM to 10:30PM EDT.

    The session agenda is as follows:
    • Introduction to Big Data & Hadoop
    • Multi-node Cluster Setup
    • Hadoop Configuration Files
    • Path from Testing to Production
    • Live Programming Tutorial
    • Use-Cases & Applications

    This promises to be an extremely enriching session and we hope you can make it. Register Now (http://promo.skillspeed.com/webinar-hadoop-development-production-environments/). If you can't make it, sign up anyway and we'll send you the recording.

    Cheers!

  • Performance Automation via #DevOps

    Hello,

    We'd like to invite you to an expert live session, ‘Performance Automation via #DevOps’ (http://promo.skillspeed.com/webinar-devops-performance-automation/), scheduled for Thursday, 20th August, 9:00PM to 10:00PM IST.

    The session agenda is as follows:
    • Introduction to DevOps
    • Adoption Process
    • The DevOps Toolbox
    • Performance & Release Optimization
    • Use-Cases & Applications
    • Future & Possibilities of DevOps

    This promises to be an extremely enriching session and we hope you can make it. Register Here (http://promo.skillspeed.com/webinar-devops-performance-automation/). If you can't make it, sign up anyway and we'll send you the recording.

    Cheers!

  • Real-World Saltstack: Running at Scale


    Speaker: Bo Blanton, Senior Software Engineer, MyFitnessPal (30 minutes + Q&A time)

    Topic: Large Scale Self-Service DevOps w/ SaltStack

    Description: At MyFitnessPal, we are dedicated users of SaltStack and use it to manage nearly 800 AWS machines serving over 85 million users. We'll show how we've accomplished this with SaltStack, and also show self-service tools built atop SaltStack that allow developers to manage their own dev and test environments.

    Other speakers TBA. Please email me ( [masked] ) if you would like to give a talk.

    Parking: Parking is available on the street, or at the paid lot at the Bank of America next door.

    Sponsor: MyFitnessPal