• Managing Globally Distributed Data for Deep Learning using TensorFlow on YARN

    The benefits of large datasets for deep learning are well known. But what if the source of this data is globally distributed? Jagane Sundar shares a system for replicating data across geographically distributed data centers, discusses the benefits of consistently replicating data that is used by TensorFlow for training, and explores the advantages of using a Paxos-based distributed coordination algorithm for replication. Jagane then details the resultant unique capability to maintain consistent writable copies of the data in multiple data centers.

    Speaker: Jagane Sundar is the CTO at WANdisco. Jagane has extensive big data, cloud, virtualization, and networking experience. He joined WANdisco through its acquisition of AltoStor, a Hadoop-as-a-service platform company. Previously, Jagane was founder and CEO of AltoScale, a Hadoop- and HBase-as-a-platform company acquired by VertiCloud. His experience with Hadoop began as director of Hadoop performance and operability at Yahoo. Jagane’s accomplishments include creating Livebackup, an open source project for KVM VM backup, developing a user mode TCP stack for Precision I/O, developing the NFS and PPP clients and parts of the TCP stack for JavaOS for Sun Microsystems, and creating and selling a 32-bit VxD-based TCP stack for Windows 3.1 to NCD Corporation for inclusion in PC-Xware. Jagane is currently a member of the technical advisory board of VertiCloud. He holds a BE in electronics and communications engineering from Anna University.

    WANdisco will be giving away 3 iPad Airs (the new model!) at the meetup. To enter the drawing, take this 3-question quiz https://forms.gle/op8PChvjuz2NXfSK6 by 12pm PST on Wed, 27 Mar and show up at the meetup for the drawing.

  • The Feature Store: the missing API between Data Engineering and Data Science?

    This is a crosspost from Bay Area AI; please register at https://www.meetup.com/bay-area-ai/events/258164070

    Machine Learning (ML) pipelines are the key building block for productionizing ML code. However, pipelines are often developed as "silos": features tend not to be easily re-used across pipelines or even within the same pipeline. Silos lead to duplication, unnecessary re-implementation of features and, in the worst case, correctness problems if, for example, the features used for training and serving have inconsistent implementations. The Feature Store solves the problem of siloed and ad-hoc machine learning pipelines by providing a data layer where feature engineering can be separated from the usage of features to generate training data. That is, the Feature Store should provide a clean API separating Data Engineering from Data Science. In this talk, we will introduce the world's first open-source Feature Store, built on Hopsworks, Apache Spark, and Apache Hive and targeting both TensorFlow/Keras and PyTorch. Using an end-to-end pipeline, we will show how the Feature Store works, how you can write end-to-end ML pipelines in Python only (if you so choose), and the role of the Feature Store as a natural interface between Data Engineers and Data Scientists.

    Speaker Bio: Jim Dowling is the CEO of Logical Clocks AB, as well as an Associate Professor at KTH Royal Institute of Technology in Stockholm. He is the lead architect of Hops, the world's fastest and most scalable Hadoop distribution and the first Hadoop platform with support for GPUs as a resource. He is a regular speaker at AI industry conferences, and blogs at O'Reilly on AI.
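
    To make the idea of a data layer between feature engineering and feature usage concrete, here is a minimal in-memory sketch in Python. It is an illustration only, not the Hopsworks API: the class FeatureStore, its methods, and the feature names are all hypothetical. The point it tries to show is that each feature is defined once, and both training and serving read it through the same call, so the two cannot drift apart.

```python
# Hypothetical, in-memory illustration of the Feature Store idea.
# None of these names come from Hopsworks or any other real library.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, float]


@dataclass
class FeatureStore:
    """Maps each feature name to the single function that computes it."""
    _features: Dict[str, Callable[[Row], float]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[[Row], float]) -> None:
        # Data Engineering side: define a feature once, under a unique name.
        if name in self._features:
            raise ValueError(f"feature {name!r} is already registered")
        self._features[name] = fn

    def compute(self, raw: Row, names: List[str]) -> List[float]:
        # Data Science side: request features by name for one raw record.
        return [self._features[n](raw) for n in names]

    def training_data(self, rows: List[Row], names: List[str]) -> List[List[float]]:
        # Training sets are built from the same definitions used for serving.
        return [self.compute(r, names) for r in rows]


store = FeatureStore()
store.register("spend_per_visit", lambda r: r["spend"] / max(r["visits"], 1.0))
store.register("is_frequent", lambda r: 1.0 if r["visits"] >= 10 else 0.0)

raw_rows = [{"spend": 120.0, "visits": 12.0}, {"spend": 45.0, "visits": 3.0}]
wanted = ["spend_per_visit", "is_frequent"]

train = store.training_data(raw_rows, wanted)  # offline training set
served = store.compute(raw_rows[0], wanted)    # online, single request
```

    A real Feature Store would also persist, version, and document features; this sketch leaves all of that out and keeps only the separation of definition from use.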

  • Bill Venners introduces Property-based Testing in ScalaTest 3.1

    This is a joint event with SF Scala -- please register there! https://www.meetup.com/SF-Scala/events/258488711/

    -----

    ScalaTest 3.1 will include built-in support for property-based testing. In this talk, Bill Venners will explain property-based testing, walk you through the design and implementation of ScalaTest's support in 3.1, and compare it to ScalaCheck's approach. In addition, Bill will show a preview of Expectations and Facts, coming in ScalaTest 3.2, and show how Facts and property-based testing can be combined to describe and check contract specifications.

    Bill Venners is president of Artima, Inc., provider of Scala consulting, training, books, and tools. He leads the open source projects for the ScalaTest testing library and the Scalactic library for functional, object-oriented programming. He is coauthor, with Martin Odersky and Lex Spoon, of the book Programming in Scala. He is also a community representative on the Scala Center's Advisory Board.

    -----

    We're lucky to be in Bill's habitat, and he will teach special Scala retreats in March and April. Join us at a Scala Retreat! Step away from your daily routine and gather with other developers for a Scala learning experience surrounded by nature. In March and April, Bill Venners will be leading three Scala Retreats:

    -- March 18-20: Simply Scala Fundamentals, San Damiano, CA
    -- March 21-22: Simply Scala Advanced, San Damiano, CA
    -- April 8-9: Effective Scala, Palm Desert, CA

    Get more details and register here: https://www.artima.com/shop/workshop Enrollment is limited. Please register early.

  • Aggregations and knowledge extraction from social data: challenges and lessons

    This is a crosspost from Bay Area AI -- register there to attend! https://www.meetup.com/bay-area-ai/events/257791863/ This talk is about the construction of new data assets from social media using techniques drawn from the areas of information retrieval, machine learning, graphs, and social networks. I’ll describe three projects based on Twitter and Foursquare data sets that use social data in different ways to help users in information seeking scenarios. The first is a recommender system for recreational queries using location-based social networks. The second is a social knowledge graph derived from Twitter with the goal of discovering relationships between people, links, and topics. The third is an application for archiving and Wikification of stories. Omar Alonso is a Principal Applied Scientist with Microsoft, where he works at the intersection of information retrieval, social data, human computation, and knowledge graph generation. He is the co-chair of the Human Computation and Crowdsourcing track at WWW'19 and on the organizing committee for HCOMP'19.

  • Scale By the Bay 2018

    Twitter HQ

    Dear Friends, we are proud to announce the program of Scale By the Bay 2018, the sixth year of our flagship, and by now iconic, independent developer conference By the Bay. (Tl;dr: get your spot at http://scale.bythebay.io while supplies last, especially while Early Bird is in effect, until August 31.)

    The conference follows the established three-day, three-track structure, hosted for the third year in a row by Twitter HQ in its wonderful modern building, with all of its spacious tracks, community spaces, cozy booths, and the commons area where so many connections are made during the hallway track. This year, Martin Odersky, the creator of Scala, opens the main conference on November 15. Neha Narkhede, the co-creator of Kafka and cofounder of Confluent, keynotes day 2.

    The three tracks are:
    -- Functional and Thoughtful Programming
    -- Reactive Microservices and Streaming Architectures
    -- End-to-end Data Pipelines all the way up to Machine Learning and AI

    The 100 sessions include technology leaders such as Twitter, IBM, Microsoft, Salesforce, Fauna, DataStax, Databricks, Confluent, Credit Karma, Sumo Logic, GoPro, Buoyant, Workday, Zignal Labs, and many more. We cover your tools with JetBrains, your shopping with Best Buy and Target, your vacations with HomeAway, your listening with Spotify, your viewing with Netflix, your reading with Medium, and your banking with JP Morgan Chase. The list goes on and on and on: we have most of the advanced stacks and approaches employed by the best that Silicon Valley offers to the world at scale, shared as best practices, with code, yours to learn, take home, and build upon.

    Our speakers span the whole spectrum, from first-time presenters with leading companies to veterans of SBTB going all the way back to 2013, evolving their craft before our eyes. You can follow their progress by watching their previous talks on http://functional.tv and the photos of the past conferences at https://meetup.bythebay.photo/Conferences/Scale-By-the-Bay

    The three panels, closing each day, are:
    -- Thoughtful Software Engineering
    -- Data Engineering for AI
    -- Cloud, Edge, and Silver Lining

    Each day begins with a hot breakfast and an uninterrupted supply of Philz coffee through the whole day, and lunch is provided. On the first two days, the closing panels are followed by our signature happy hours, with great drinks, food, and conversation. The hallway tracks are legendary.

    SBTB is famous for its bespoke, all-day, build-yourself-a-company training. This year, we double it. Cliff Click, the legend of software engineering, is teaching a full-day Advanced Software Engineering workshop on 11/13, followed by Ryan Knight, now of Fauna, leading cloud-native data pipelines on 11/14. The workshops are limited to 80 participants each. As last year, we’ll plan an unconference track for those who want to share their ideas in an intimate setting for joint brainstorming.

    The only thing moderate about SBTB is its size: we cap at 600 attendees to preserve the immediate and direct nature of the communication that happens, sparks that fly, and serendipity that always occurs. We are always sold out by the time the conference begins in November, so reserve your seat early at http://scale.bythebay.io! And enjoy the Early Bird that is in effect until August 31.

  • The Rise of the Operational Analytic Data Stores

    Orange Silicon Valley

    Note: we need a venue to host us in San Francisco!

    -----

    For more analytics pipelines, join us at http://scale.bythebay.io, 11/14-17, at Twitter HQ.

    -----

    This is a joint meetup with SF Hadoop.

    -----

    Operational analytic data stores are an emerging class of databases that merges ideas from log search systems (Elastic, Splunk, etc.) and traditional analytic databases (Vertica, Teradata, etc.). Popular projects in this class include Apache Druid (incubating), Scuba (from Facebook), ClickHouse (from Yandex), Pinot (from LinkedIn), Palo (from Baidu), and more. We will discuss the motivation behind these databases, and discuss in detail the history, architecture, and future of Druid.

    Speaker: Gian Merlino is an Apache Druid (incubating) PMC member and a co-founder of Imply. Previously, Gian led the data ingestion team at Metamarkets (now a part of Snapchat) and held senior engineering positions at Yahoo. He holds a BS in Computer Science from Caltech.

  • Scale By the Bay 2018 CFP is accepting late submissions by 6/30

    Due to the overwhelming clamor for late submissions and great talks still coming in, the CFP is logarithmically extended as follows. Half of the program will be formed from the submissions received by 5/31. The next quarter will take into account those sent by 6/15 and the rest of the submissions that didn’t make the cut yet. The next part will be selected from all those plus the talks submitted by 6/30. A block of time is reserved for invited talks of exceptionally high quality and importance, expanding the scope of the conference. Submit your talk at scale.bythebay.io!

  • Rethink Trust -- Amsterdam, June 29

    Beurs van Berlage

    We’re super excited to share some awesome news: we’re expanding to Europe and introducing our newest conference! Blockchain: Rethink Trust (http://RethinkTrust.org) takes place at Beurs van Berlage, Amsterdam, on June 29th! (Scroll down for 15% off.)

    Blockchain: Rethink Trust is a gathering of top experts in engineering and ecosystem-minded leaders, focused on reengineering enterprise trust networks through technology. It will expand your understanding of blockchain, trust mechanisms, and how the corporate world uses them. Rethink Trust is a By the Bay conference, brought to you by the creators of Scale By the Bay, AI By the Bay, and Data By the Bay, engineering events held in San Francisco for over five years in partnership with IBM, ING, Apple, Twitter, Salesforce, and dozens of innovative startups and enterprises in the Bay Area and around the world. Dr. Alexy Khrabrov, Founder, By the Bay, is the Program Chair, and he wrote about the philosophy of Rethink Trust on his Medium blog: http://chief.sc/rethinktrust2018-intro

    We invite C-level executives, senior engineers, and technical leaders who want to master the best practices in blockchain to the famous Beurs van Berlage, “the third stock exchange” of Amsterdam. True to the spirit of all of our previous events, Blockchain: Rethink Trust is laser-focused on learning, open-source excellence, and industry-oriented approaches that work.

    SPEAKERS

    We have the world’s top technology leaders speaking at the event. The keynote speakers include:
    -- Christopher Ferris, IBM CTO Open Technology, Chair of the Hyperledger Technical Steering Committee
    -- Mariana Gómez de la Villa, Global Program Manager, ING DLT

    You will also hear from:
    -- Clara Durodie, Founder and CEO, Cognitive Finance Group
    -- Roman Shaposhnik, VP Product & Strategy, Co-founder, ZEDEDA
    -- Yonatan Sompolinsky, Co-founder & Scientist, DAGlabs
    -- Michael Egorov, CTO, NuCypher
    -- Roberto Mancone, Chief Operating Officer at we.trade Innovation DAC, the company developing, deploying, and distributing we.trade, the blockchain-based trade finance platform, reporting to the Board of Directors of the 9 European shareholder banks (Deutsche Bank, HSBC, KBC, Natixis, Nordea, Rabobank, Santander, SocGen, Unicredit)
    -- Christopher Georgen, Founder and CEO, Topl, the company developing blockchain solutions for the developing world
    … and more!

    TOPICS

    We will cover a variety of topics at the intersection of engineering and business management, including:
    -- Crypto protocols of today and tomorrow and the software engineering process required to deploy them for enterprise customers at scale;
    -- Technology behind the key blockchain deployments in FinTech and IoT;
    -- Rigorous software engineering practices required for safe, correct, and performant implementation of blockchain applications and platforms;
    -- Key aspects of hardware-software codevelopment crucial for IoT+blockchain;
    -- ... and more

    Explore the program of the conference at http://rethinktrust.org

    WORKSHOPS

    We’ll have three workshops in the workshop track, available to all attendees. You can freely switch between the main track and the workshop track.
    -- Hyperledger workshop taught by Arnaud Le Hors, core Hyperledger team
    -- Implementation workshop by IntellectSoft
    -- Scala Blockchain -- secure and type-safe -- by Topl, the developing world blockchain company

    YOU WILL LEARN TO

    -- Discover ways to implement blockchain technology for reengineering trust in key business verticals
    -- Understand best practices of enterprise adoption of the ledger consensus approaches
    -- Conduct strategic partnerships for a consortium of trusted and trustless systems
    -- Invite key developers to collaborate on your blockchain ecosystem

    TICKETS

    For a limited time, we’re offering an additional 15% off to the By the Bay community. To claim the offer, use the code BYTHEBAYOFFER15 at http://rethinktrust.org

  • Joint SF Spark, Global Advanced Spark and TensorFlow, and Bay Area AI Megameetup

    This meetup is a housewarming for the new Mesosphere office, formerly the Nitro office, where we held so many SF Scala, SF Spark, Bay Area AI, Reactive Systems, and Advanced Spark meetups. And you know what? We are bringing the sexy back with all of them joining forces at once! Welcome to the first joint meetup at Mesosphere, and remember to check in regularly for the great things to come that are happening here!

  • Spark+AI Summit

    Moscone Center

    As you know, Spark + AI Summit 2018 will be at the Moscone Convention Center, San Francisco. We are providing members of this meetup organization a special 15% discount code: SF2MU. This year, Spark Summit adds the much-needed AI focus. As Databricks matures, and as Spark adoption widens in the industry, we see an exciting series of use cases, new solutions in OSS and SaaS areas, and new entrants in the community, as well as progress reports from many companies that are our regulars. There are two meetups held in conjunction with the summit: before and after! Before: the original BASM (founded in 2012 by Alexy and Matei) on June 4th, 2018, at the Moscone Convention Center, which is open to everyone; you don’t have to be registered to attend! https://www.meetup.com/spark-users/events/250659328/ And after: Joint SF Spark and Friends, Global Advanced Spark and TensorFlow, and Bay Area AI meetup (founded and presented by Alexy, Chris, and Alexy, resp.), https://www.meetup.com/SF-Spark-and-Friends/events/251030715/ -- hosted by Mesosphere in its new office at 225 Bush St., 7th floor, which used to be Nitro, where SF Spark, SF Scala, and Bay Area AI were held before and will continue to be held going forward!