• OmniSci-StreamSets F1 Demo + Scaling StreamSets On Azure Kubernetes Service

    This is a joint event with the StreamSets User Group: https://www.meetup.com/San-Francisco-StreamSets-User-Group-Meetup/events/261489457

    Join us for two great presentations, food, and beverages, and take a turn in OmniSci's instrumented F1 simulator!

    Agenda:

    6:30pm - Food, beverages, and a chance to try out OmniSci's F1 simulator!

    7:15pm - Creating the OmniSci F1 Demo: Real-Time Data Ingestion With StreamSets
    Veda Shankar - Senior Developer Advocate - OmniSci - https://www.linkedin.com/in/veda-shankar-6260a516
    Telematics is a rapidly growing use case for IoT and Big Data, and OmniSci hacked an F1 racing game to demonstrate how telematics data can be collected and analyzed in real time. A combination of open-source tools was used to generate, capture, process, analyze, and chart the data from a Formula 1 racing simulation. StreamSets, running in an open-source Docker container, was used to visually architect and implement the data flows. Read this blog for more details: https://streamsets.com/blog/omnisci-f1-demo-real-time-data-ingestion-streamsets/

    7:45pm - Scaling StreamSets On Azure Kubernetes Service
    Speaker TBD
    Provisioning agents are containerized applications that run within a container orchestration framework, such as Kubernetes. You can run Kubernetes on-premises, or use cloud-based services such as Azure Kubernetes Service (AKS) and Google Kubernetes Engine for a "pay-as-you-consume" model without the complexity of implementation, installation, and maintenance. In this session, we will show how to scale StreamSets Data Collector instances on AKS using provisioning agents, which help automate upgrades and scale resources on demand without stopping dataflow pipeline jobs.

    8:30pm - Close

  • Scale By the Bay 2019 CFP Open until May 31

    Needs a location

    Friends, May is the month when the Scale By the Bay (SBTB) CFP always runs, for the conference in November. The CFP is now open at https://scale.bythebay.io

    There are three tracks, as usual:
    -- Functional Programming
    -- Service Architectures
    -- Data Pipelines, including ML/AI

    The theme for this year is the emergence of new distributed systems and their applications, including Edge, IoT, DLT, and AI on the Edge. Helena Edelson led a team at Apple enabling ML/AI with Spark, Joe Beda started Google Compute Engine and Kubernetes, and Heather Miller led the Scala Center at EPFL and now advances distributed and edge systems at CMU.

    We have two talk lengths, 20 minutes and 40 minutes. There are 5-10 minute breaks between some, but not all, talk slots, and excellent coffee is served all day long, so every break is a coffee break. Please check each talk length you can work with: we often ask speakers to shrink 40-minute talks to 20 minutes so we can accommodate all the best talks, and our acceptance rate has dropped to about 1:3 over the years.

    We also serve a hot breakfast and a great lunch, and amazing happy hours follow the main program in the evenings between conference days. The hallway track is legendary, thanks to the high ratio of speakers: 100+ out of the 600 attendees.

    We are committed to community above all and are working with underrepresented groups to send speakers. Please share this CFP with your diversity advocates and community managers, and encourage female engineers, African-American developers, and others to submit talks. If your company could send such speakers, it would help the community a lot. We’re also proactively reaching out to meetups, our core constituents, to support our established diversity program. We also work with companies like Stripe on diversity scholarships; let us know if you’d like to partner on this.

    Submit your best talks at https://scale.bythebay.io by May 31!

  • [Register at Bay.Area.AI] Applied Machine Learning: a Netflix Production

    This is a joint meetup By the Bay: register at http://bay.area.ai!
    -----
    Applied Machine Learning is about as mature as Software Engineering was circa 1998. For Data Scientists, it’s hard to collaborate, hard to be productive, and hard to deploy to production. In the last 20 years, Software Engineers have become far more collaborative thanks to tools like git, far more productive thanks to cloud computing, and far more effective at delivering quality software thanks to CI/CD and agile development practices.

    At Netflix, I get to work on problems like: How do we scale Data Science innovation by making collaboration effortless? How do we enable Data Scientists to single-handedly and reliably introduce their models to production? How do we make it easy to develop ML models that humans trust? More importantly, how do we use ML to make humans BETTER? In this talk, we’ll explore how Netflix is approaching these problems to further our mission of creating joy for our 125 million+ members worldwide!

    Speaker: Julie Pitt leads Machine Learning Infrastructure at Netflix, with the goal of scaling Data Science while increasing innovation. She previously built the streaming infrastructure behind the "play" button while Netflix was transitioning from a domestic DVD-by-mail service to an international streaming service. Julie also co-founded Order of Magnitude Labs, with a mission to build AI capable of doing things that humans find easy and today’s machines find hard: exploration, communication, creativity, and accomplishing long-range goals. Early in her career, Julie developed data processing software at Lawrence Livermore National Laboratory that enabled scientists to study the newly sequenced human genome.
    -----
    Julie is a regular speaker at Scale By the Bay. The 2019 CFP opens May 1 and closes May 31; submit your best talks early, starting May 1, at http://scale.bythebay.io!

  • Managing Globally Distributed Data for Deep Learning using TensorFlow on YARN

    The benefits of large datasets for deep learning are well known. But what if the source of this data is globally distributed? Jagane Sundar shares a system for replicating data across geographically distributed data centers, discusses the benefits of consistently replicating data that is used by TensorFlow for training, and explores the advantages of using a Paxos-based distributed coordination algorithm for replication. Jagane then details the resulting unique capability: maintaining consistent, writable copies of the data in multiple data centers.

    Speaker: Jagane Sundar is the CTO at WANdisco. Jagane has extensive big data, cloud, virtualization, and networking experience. He joined WANdisco through its acquisition of AltoStor, a Hadoop-as-a-service platform company. Previously, Jagane was founder and CEO of AltoScale, a Hadoop- and HBase-as-a-platform company acquired by VertiCloud. His experience with Hadoop began when he was director of Hadoop performance and operability at Yahoo. Jagane’s accomplishments include creating Livebackup, an open source project for KVM VM backup; developing a user-mode TCP stack for Precision I/O; developing the NFS and PPP clients and parts of the TCP stack for JavaOS for Sun Microsystems; and creating and selling a 32-bit VxD-based TCP stack for Windows 3.1 to NCD Corporation for inclusion in PC-Xware. Jagane is currently a member of the technical advisory board of VertiCloud. He holds a BE in electronics and communications engineering from Anna University.

    WANdisco will be giving away three iPad Airs (the new model!) at the meetup. To enter the drawing, take this 3-question quiz https://forms.gle/op8PChvjuz2NXfSK6 by 12pm PST on Wed, 27 Mar, and show up at the meetup for the drawing.

  • The Feature Store: the missing API between Data Engineering and Data Science?

    This is a crosspost from Bay Area AI; please register at https://www.meetup.com/bay-area-ai/events/258164070

    Machine Learning (ML) pipelines are the key building block for productionizing ML code. However, pipelines are often developed as "silos": features tend not to be easily reused across pipelines or even within the same pipeline. Silos lead to duplication, needless reimplementation of features, and, in the worst case, correctness problems, for example when the features used for training and serving have inconsistent implementations. The Feature Store solves the problem of siloed and ad-hoc machine learning pipelines by providing a data layer where feature engineering is separated from the use of features to generate training data. That is, the Feature Store should provide a clean API separating Data Engineering from Data Science.

    In this talk, we will introduce the world's first open-source Feature Store, built on Hopsworks, Apache Spark, and Apache Hive and targeting both TensorFlow/Keras and PyTorch. Using an end-to-end pipeline, we will show how the Feature Store works as a natural interface between Data Engineers and Data Scientists, and how you can program ML pipelines end-to-end in Python only (if you so choose).

    Speaker Bio: Jim Dowling is the CEO of Logical Clocks AB, as well as an Associate Professor at KTH Royal Institute of Technology in Stockholm. He is the lead architect of Hops, the world's fastest and most scalable Hadoop distribution and the first Hadoop platform with support for GPUs as a resource. He is a regular speaker at AI industry conferences and blogs at O'Reilly on AI.

  • Aggregations and knowledge extraction from social data: challenges and lessons

    This is a crosspost from Bay Area AI; register there to attend! https://www.meetup.com/bay-area-ai/events/257791863/

    This talk is about the construction of new data assets from social media using techniques drawn from the areas of information retrieval, machine learning, graphs, and social networks. I’ll describe three projects based on Twitter and Foursquare data sets that use social data in different ways to help users in information-seeking scenarios. The first is a recommender system for recreational queries that uses location-based social networks. The second is a social knowledge graph derived from Twitter, with the goal of discovering relationships between people, links, and topics. The third is an application for archiving and Wikification of stories.

    Omar Alonso is a Principal Applied Scientist with Microsoft, where he works at the intersection of information retrieval, social data, human computation, and knowledge graph generation. He is co-chair of the Human Computation and Crowdsourcing track at WWW'19 and on the organizing committee for HCOMP'19.

  • Scale By the Bay 2018

    Needs a location

    Dear Friends, we are proud to announce the program of Scale By the Bay 2018, the sixth year of our flagship, and by now iconic, independent developer conference By the Bay. (TL;DR: get your spot at http://scale.bythebay.io while supplies last, especially while Early Bird pricing is in effect until August 31.)

    The conference follows the established three-day, three-track structure, hosted for the third year in a row by Twitter HQ in its wonderful modern building, with its spacious tracks, community spaces, cozy booths, and the commons area where so many connections are made during the hallway track. This year, Martin Odersky, the creator of Scala, opens the main conference on November 15. Neha Narkhede, the co-creator of Kafka and co-founder of Confluent, is keynoting day 2.

    The three tracks are:
    -- Functional and Thoughtful Programming
    -- Reactive Microservices and Streaming Architectures
    -- End-to-end Data Pipelines all the way up to Machine Learning and AI

    The 100 sessions include speakers from technology leaders such as Twitter, IBM, Microsoft, Salesforce, Fauna, DataStax, Databricks, Confluent, Credit Karma, Sumo Logic, GoPro, Buoyant, Workday, Zignal Labs, and many more. We cover your tools with JetBrains, your shopping with Best Buy and Target, your vacations with HomeAway, your listening with Spotify, your viewing with Netflix, your reading with Medium, and your banking with JP Morgan Chase. The list goes on and on: we cover most of the advanced stacks and approaches employed by the best that Silicon Valley offers to the world at scale, shared as best practices, with code, yours to learn, take home, and build upon.

    Our speakers span the whole spectrum, from first-time presenters with leading companies to SBTB veterans going all the way back to 2013, evolving their craft before our eyes. You can follow their progress by watching their previous talks on http://functional.tv and browsing the photos of past conferences at https://meetup.bythebay.photo/Conferences/Scale-By-the-Bay

    The three panels, closing each day, are:
    -- Thoughtful Software Engineering
    -- Data Engineering for AI
    -- Cloud, Edge, and Silver Lining

    Each day begins with a hot breakfast and an uninterrupted supply of Philz coffee that lasts the whole day, and lunch is provided. On the first two days, the closing panels are followed by our signature happy hours, with great drinks, food, and conversation. The hallway tracks are legendary.

    SBTB is famous for its bespoke, all-day, build-yourself-a-company training. This year, we're doubling it. Cliff Click, the legend of software engineering, is teaching a full-day Advanced Software Engineering workshop on 11/13, followed by Ryan Knight, now of Fauna, leading a workshop on cloud-native data pipelines on 11/14. The workshops are limited to 80 participants each. As last year, we’ll plan an unconference track for those who want to share their ideas in an intimate setting for joint brainstorming.

    The only thing moderate about SBTB is its size: we cap attendance at 600 to preserve the immediate and direct nature of the communication that happens, the sparks that fly, and the serendipity that always occurs. We always sell out by the time the conference begins in November, so reserve your seat early at http://scale.bythebay.io! And enjoy the Early Bird pricing that is in effect until August 31.

  • The Rise of the Operational Analytic Data Stores

    Orange Silicon Valley

    Note: we need a venue to host us in San Francisco!
    -----
    For more analytics pipelines, join us at http://scale.bythebay.io, 11/14-17, at Twitter HQ.
    -----
    Operational analytic data stores are a newly emerging class of databases that merges ideas from log search systems (Elastic, Splunk, etc.) and traditional analytic databases (Vertica, Teradata, etc.). Popular projects in this class include Apache Druid (incubating), Scuba (from Facebook), ClickHouse (from Yandex), Pinot (from LinkedIn), Palo (from Baidu), and more. We will discuss the motivation behind these databases and then cover in detail the history, architecture, and future of Druid.

    Speaker: Gian Merlino is an Apache Druid (incubating) PMC member and a co-founder of Imply. Previously, Gian led the data ingestion team at Metamarkets (now a part of Snapchat) and held senior engineering positions at Yahoo. He holds a BS in Computer Science from Caltech.

  • Rethink Trust -- Amsterdam, June 29

    Beurs van Berlage

    We’re super excited to share some awesome news: we’re expanding to Europe and introducing our newest conference, Blockchain: Rethink Trust (http://RethinkTrust.org), taking place at Beurs van Berlage, Amsterdam, on June 29th! (Scroll down for 15% off.)

    Blockchain: Rethink Trust is a gathering of top engineering experts and ecosystem-minded leaders, focused on reengineering enterprise trust networks through technology. It will expand your understanding of blockchain, trust mechanisms, and how the corporate world uses them.

    Rethink Trust is a By the Bay conference, brought to you by the creators of Scale By the Bay, AI By the Bay, and Data By the Bay, engineering events held in San Francisco for over five years in partnership with IBM, ING, Apple, Twitter, Salesforce, and dozens of innovative startups and enterprises in the Bay Area and around the world. Dr. Alexy Khrabrov, Founder, By the Bay, is the Program Chair, and he wrote about the philosophy of Rethink Trust on his Medium blog: http://chief.sc/rethinktrust2018-intro

    We invite C-level executives, senior engineers, and technical leaders who want to master the best practices in blockchain to the famous Beurs van Berlage, “the third stock exchange” of Amsterdam. True to the spirit of all of our previous events, Blockchain: Rethink Trust is laser-focused on learning, open-source excellence, and industry-oriented approaches that work.

    SPEAKERS
    We have the world’s top technology leaders speaking at the event. The keynote speakers include:
    -- Christopher Ferris, IBM CTO Open Technology, Chair of the Hyperledger Technical Steering Committee
    -- Mariana Gómez de la Villa, Global Program Manager, ING DLT
    You will also hear from:
    -- Clara Durodie, Founder and CEO, Cognitive Finance Group
    -- Roman Shaposhnik, VP Product & Strategy, Co-founder, ZEDEDA
    -- Yonatan Sompolinsky, Co-founder & Scientist, DAGlabs
    -- Michael Egorov, CTO, NuCypher
    -- Roberto Mancone, Chief Operating Officer at we.trade Innovation DAC, the company developing, deploying, and distributing we.trade, the blockchain-based trade finance platform, reporting to the Board of Directors of the nine European shareholder banks (Deutsche Bank, HSBC, KBC, Natixis, Nordea, Rabobank, Santander, SocGen, UniCredit)
    -- Christopher Georgen, Founder and CEO, Topl, the company developing blockchain solutions for the developing world
    … and more!

    TOPICS
    We will cover a variety of topics at the intersection of engineering and business management, including:
    -- Crypto protocols of today and tomorrow, and the software engineering process required to deploy them for enterprise customers at scale;
    -- Technology behind the key blockchain deployments in FinTech and IoT;
    -- Rigorous software engineering practices required for safe, correct, and performant implementation of blockchain applications and platforms;
    -- Key aspects of hardware-software codevelopment crucial for IoT+blockchain;
    -- ... and more
    Explore the program of the conference at http://rethinktrust.org

    WORKSHOPS
    We’ll have three workshops in the workshop track, available to all attendees. You can freely switch between the main track and the workshop track.
    -- Hyperledger workshop taught by Arnaud Le Hors, core Hyperledger team
    -- Implementation workshop by IntellectSoft
    -- Scala Blockchain, secure and type-safe, by Topl, the developing-world blockchain company

    YOU WILL LEARN HOW TO
    -- Discover ways to implement blockchain technology for reengineering trust in key business verticals
    -- Understand best practices for enterprise adoption of ledger consensus approaches
    -- Conduct strategic partnerships for a consortium of trusted and trustless systems
    -- Invite key developers to collaborate on your blockchain ecosystem

    TICKETS
    For a limited time, we’re offering an additional 15% off to the By the Bay community. To claim the offer, use the code BYTHEBAYOFFER15 at http://rethinktrust.org

  • Scale By the Bay 2018 CFP is Open until May 31

    Needs a location

    It's the sixth year that we are organizing our flagship Scale By the Bay conference, a truly spectacular tech event that many of you know very well. For those who are new to SBTB, I would love to invite you to attend. And if you'd like to present, we'd like to see your talk! The CFP for SBTB 2018 is now open through May 31: http://scale.bythebay.io/cfp.html Give it your best shot, or two, as the rate of high-quality submissions is already very high.

    At Scale By the Bay, returning to Twitter HQ in San Francisco on November 15-17, 2018, you can connect with fellow senior software engineers, CTOs, VPs/Directors of Engineering, developers, and technical founders who never stop learning. Embrace whole end-to-end software stacks and the infrastructure running them, put together your own SMACK Stack, operationalize reactive microservices and data pipelines, build streaming data infrastructure for actionable, real-time insights, and deep-dive into practical aspects of full-stack architectures and developer productivity.

    We'll have a stellar program:
    * the full-stack Scala and Functional Programming conference with world authorities on practical FP, beginning with Martin Odersky, the creator of Scala, who comes back to keynote SBTB!
    * fast data pipelines done right, with Neha Narkhede, the co-creator of Apache Kafka and co-founder of Confluent, keynoting
    * FP+ML: Functional Programming for Machine Learning, a topic even more current now that Swift for TensorFlow has been unveiled

    Throughout the three-track, three-day event, we'll weave the themes of open-source development, type safety, and full-stack architectures together with the emerging areas of ML and AI, so that you can learn all about them if you want to. At the same time, we'll make sure we remain, as always, the best in the software engineering realm, with a solid understanding of distributed systems, from operations up to services to streaming algorithms.

    We firmly believe that thoughtful software engineering with the right reusable abstractions and best practices around development is key to everything. We want to apply this approach to more areas and see more use cases. We especially welcome FP+ML talks this year.

    Please note that at Scale we keep moving forward. We welcome production use cases of all thoughtfully designed software stacks, including Scala, Haskell, Swift, Rust, Clojure, F#, and so on. We welcome Java, C++, Go, and other systems, especially in microservice, polyglot environments. Our SMACK 2.0 plan, unveiled at the Index conference, calls for systems that are Streaming, in-Memory, API-centric, Containerized, and running on Kubernetes. We welcome submissions on all levels of these new systems, starting with orchestration.

    No matter where you are along the full-stack spectrum, you need thoughtful software engineering, reactive and streaming architectures, manageable microservices, and scalable data pipelines that can work together with modern ML frameworks for immediate customer insights. Join the SBTB family at Twitter HQ again this year, see how companies like Twitter are built in software, build your own, and share your findings with others!

    See you in November at Twitter HQ!
    Dr. Alexy Khrabrov, Program Chair, By the Bay

    PS. If you are in Europe and can't wait, By the Bay comes to Amsterdam as RethinkTrust.org, our first signature engineering take on enterprise trust systems with blockchain and Hyperledger in energy, fintech, IoT, and other real-world use cases. Our tech includes Swift and Scala, scalability and security of trust systems, and their performance and enterprise stack integration: topics rarely, if ever, covered at general blockchain events. Use the code TRUSTBYTHEBAY for 15% off and join us in Amsterdam!