- [Register at https://swift.tf!] Swift as syntactic sugar for MLIR
Please register at https://swift.tf! This is a joint meetup with Swift for TensorFlow. If you RSVP here you'll be waitlisted and nothing else will happen! Swift for TensorFlow is covered at the https://scale.bythebay.io conference in November. Reserve your seat to learn more! ----- We need a video sponsor for this meetup, at $500. If the video happens, you will be mentioned in it and on the meetup page! ----- Swift works great as an infinitely hackable syntactic interface to semantics that are defined by the compiler underneath it. The two options today are LLVM (there's a running joke that Swift is just syntactic sugar for LLVM) and TensorFlow graphs (which is the contribution of early versions of Swift for TensorFlow). Multi-Level Intermediate Representation (MLIR) is a generalization of both the LLVM IR and TensorFlow graphs to represent arbitrary computations at multiple levels of abstraction. This enables domain-specific optimizations and code generation (e.g. for CPUs, GPUs, TPUs, and other hardware targets). In the talk, we'll present some thoughts on how Swift could compile down to MLIR and show a few demos of prototype technologies that we've developed. Eugene Burmako ([masked]) is working on Swift for TensorFlow at Google AI. Before joining Google, he made major contributions to Scala at EPFL and Twitter, founding the Reasonable Scala compiler, Scalameta, and Scala macros. Eugene loves compilers, and his mission is to change the world with compiler technology. Alex Suhan ([masked]) is also working on Swift for TensorFlow at Google AI. He has been using LLVM to accelerate machine learning and data analytics workloads for the last five years. Alex enjoys working at the interface between software and various hardware accelerators. Our work is the result of discussions and collaboration with many folks - our colleagues from Google, the Swift compiler team from Apple, as well as our community members, including Jeremy Howard from http://fast.ai. 
We're very grateful for everyone's input and contributions!
- Scale By the Bay 2019 CFP Open until May 31
Friends, the month of May is when the Scale By the Bay (SBTB) CFP always runs, for the conference in November. The CFP is now open at https://scale.bythebay.io. There are three tracks, as usual: Functional Programming; Service Architectures; and Data Pipelines, including ML/AI. The theme for this year is the emergence of new distributed systems and their applications, including Edge, IoT, DLT, and AI on the Edge. Helena Edelson led a team at Apple enabling ML/AI with Spark, Joe Beda started Google Compute Engine and Kubernetes, and Heather Miller led the Scala Center at EPFL and now advances distributed and edge systems at CMU. We have two talk lengths, 20 minutes and 40 minutes. There are 5-10 minute breaks between some, but not all, talk slots, and excellent coffee is served all day long, so every break is a coffee break. Please check each time length you can work with. We often ask 40-minute talks to shrink to 20 minutes as we try to accommodate all the best talks, and our acceptance rate has been going down over the years, now to 1:3. We also serve a hot breakfast and a great lunch, and amazing happy hours follow the main program between the conference days. The hallway track is legendary, facilitated by the high ratio of speakers: 100+ out of the 600 attendees. We are committed to community above all and are working with underrepresented groups to send speakers. Please share this CFP with your diversity advocates and community managers, and encourage female engineers, African-American developers, and others to submit talks. If you could send such speakers on behalf of your company, it will help the community a lot. We’re also proactively reaching out to meetups, our core constituents, to help our established diversity program. We also work with companies like Stripe on diversity scholarships; let us know if you’d like to partner on this. Submit your best talks at https://scale.bythebay.io by May 31!
- Applied Machine Learning: a Netflix Production, Deep Recommendations at Twitch
This is a megameetup hosted by Twitch! The hosts present their tech as well as the talks from Netflix and Aperture Data engineers. Thank you so much Twitch! This meetup will be twitched -- expect a link shortly! (1) Applied Machine Learning is about as mature as Software Engineering circa 1998. For Data Scientists, it’s hard to collaborate, hard to be productive and hard to deploy to production. In the last 20 years, Software Engineers have become far more collaborative thanks to tools like git, far more productive thanks to cloud computing and far more effective at delivering quality software thanks to CI/CD and agile development practices. At Netflix, I get to work on problems like: how do we scale Data Science innovation by making collaboration effortless? How do we enable Data Scientists to single-handedly and reliably introduce their models to production? How do we make it easy to develop ML models that humans trust? More importantly, how do we use ML to make humans BETTER? In this talk, we’ll explore how Netflix is approaching these problems to further our mission of creating joy for our 125 Million+ members worldwide! Speaker: Julie Pitt leads the Machine Learning Infrastructure at Netflix, with the goal of scaling Data Science while increasing innovation. She previously built streaming infrastructure behind the "play" button while Netflix was transitioning from domestic DVD-by-mail service to international streaming service. Julie also co-founded Order of Magnitude Labs, with a mission to build AI capable of doing things that humans find easy and today’s machines find hard: exploration, communication, creativity and accomplishing long-range goals. Early in her career, Julie developed data processing software at Lawrence Livermore National Laboratory that enabled scientists to study the newly-sequenced human genome. 
(2) Deep Recommendations at Twitch: Twitch is a social video platform that democratizes broadcasting, with 15 million+ daily viewers. In this talk we'll explore some of the difficulties that live content introduces to recommendations, and the recommender we built to personalize many products at Twitch. In particular, we'll explore some of the architecture decisions we made and what informed them. We'll also discuss some of our learnings around offline metrics and things to keep an eye on as you move to online experiments. Speaker: Mark Ally is a Senior Applied Scientist at Twitch, working on deep learning techniques for recommendation systems. (3) Let Us Manage Your Visual Data So You Can Make Machines Learn Better: ApertureData's platform accelerates AI applications through its Data Management solution that redefines how large visual data sets are stored, searched and processed. It exposes a unified interface that allows users to store and search both the data and metadata associated with visual artifacts (images or videos). ApertureData's platform provides several innovative features: the ability to evolve metadata easily without requiring costly schema changes, first-class status for feature vectors and bounding boxes, the ability to perform similarity searches, and the ability to perform common pre-processing operations close to the data. The platform will be pluggable, allowing data to be stored on different backends and to serve any machine learning pipeline. Speaker: Vishakha Gupta is the Founder and CEO at ApertureData. Prior to that, she was at Intel Labs for over 7 years, where she led the design and development of VDMS (the Visual Data Management System), which forms the core of ApertureData's platform. Vishakha graduated from the Georgia Institute of Technology with a Ph.D. in Computer Science, where her work focused on virtualization. 
----- Julie is a regular speaker at Scale By the Bay. The 2019 CFP opens May 1 and closes May 31; submit your best talks early at http://scale.bythebay.io!
- Managing Globally Distributed Data for Deep Learning using TensorFlow on YARN
The benefits of large datasets for deep learning are well known. But what if the source of this data is globally distributed? Jagane Sundar shares a system for replicating data across geographically distributed data centers, discusses the benefits of consistently replicating data that is used by TensorFlow for training, and explores the advantages of using a Paxos-based distributed coordination algorithm for replication. Jagane then details the resultant unique capability to maintain consistent writable copies of the data in multiple data centers. Speaker: Jagane Sundar is the CTO at WANdisco. Jagane has extensive big data, cloud, virtualization, and networking experience. He joined WANdisco through its acquisition of AltoStor, a Hadoop-as-a-service platform company. Previously, Jagane was founder and CEO of AltoScale, a Hadoop- and HBase-as-a-platform company acquired by VertiCloud. His experience with Hadoop began as director of Hadoop performance and operability at Yahoo. Jagane’s accomplishments include creating Livebackup, an open source project for KVM VM backup, developing a user mode TCP stack for Precision I/O, developing the NFS and PPP clients and parts of the TCP stack for JavaOS for Sun Microsystems, and creating and selling a 32-bit VxD-based TCP stack for Windows 3.1 to NCD Corporation for inclusion in PC-Xware. Jagane is currently a member of the technical advisory board of VertiCloud. He holds a BE in electronics and communications engineering from Anna University.
- The Feature Stores: the missing API between Data Engineering and Data Science?
This meetup is focused on Feature Stores, with three talks from Jim Dowling (Logical Clocks), Varant Zanoyan (Airbnb), and Nick Handel (Branch). Thanks to Mesosphere for hosting the event and ArangoDB for sponsoring pizza! *The Feature Store: the missing API between Data Engineering and Data Science?* Machine Learning (ML) pipelines are the key building block for productionizing ML code. However, pipelines are often developed as "silos" - features tend not to be easily re-used across pipelines or even within the same pipeline. Silos lead to duplication, unnecessary re-implementation of features and, in the worst case, correctness problems if, for example, the features used for training and serving have inconsistent implementations. The Feature Store solves the problem of siloed and ad-hoc machine learning pipelines by providing a data layer where feature engineering can be separated from the usage of features to generate training data. That is, the Feature Store should provide a clean API separating Data Engineering from Data Science. In this talk, we will introduce the world's first open-source Feature Store, built on Hopsworks, Apache Spark, and Apache Hive and targeting both TensorFlow/Keras and PyTorch. We will show how ML pipelines can be programmed, end-to-end, in Python, and the role of the Feature Store as a natural interface between Data Engineers and Data Scientists. In an end-to-end pipeline, we will show how the Feature Store works, and how you can write end-to-end ML pipelines in Python only (if you so choose). Speaker Bio: Jim Dowling is the CEO of Logical Clocks AB, as well as an Associate Professor at KTH Royal Institute of Technology in Stockholm. He is the lead architect of Hops, the world's fastest and most scalable Hadoop distribution and the first Hadoop platform with support for GPUs as a resource. He is a regular speaker at AI industry conferences, and blogs at O'Reilly on AI. 
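The separation described in the Feature Store abstract above can be sketched in a few lines of Python. This is only a toy, in-memory illustration under stated assumptions: the `FeatureStore` class, its `register`, `compute`, and `training_data` methods, and the example feature names are all hypothetical and are not the Hopsworks API. It shows one shared definition per feature being used for both training data generation and serving, so the two cannot drift apart.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


class FeatureStore:
    """Toy in-memory feature store (hypothetical; not the Hopsworks API)."""

    def __init__(self) -> None:
        # Maps a feature name to the single function that computes it.
        self._features: Dict[str, Callable[[Row], float]] = {}

    def register(self, name: str, fn: Callable[[Row], float]) -> None:
        # Data Engineering side: define each feature exactly once.
        if name in self._features:
            raise ValueError(f"feature already registered: {name}")
        self._features[name] = fn

    def compute(self, names: List[str], row: Row) -> List[float]:
        # Serving side: build one feature vector from one raw record.
        return [self._features[n](row) for n in names]

    def training_data(self, names: List[str], rows: List[Row]) -> List[List[float]]:
        # Data Science side: training data reuses the same definitions.
        return [self.compute(names, row) for row in rows]


# Example with made-up features over made-up booking records.
store = FeatureStore()
store.register("price_per_night", lambda r: r["total_price"] / r["nights"])
store.register("is_long_stay", lambda r: 1.0 if r["nights"] >= 7 else 0.0)

rows = [
    {"total_price": 700.0, "nights": 7.0},
    {"total_price": 300.0, "nights": 2.0},
]
wanted = ["price_per_night", "is_long_stay"]
train = store.training_data(wanted, rows)
serve = store.compute(wanted, rows[0])
```

Because `training_data` is defined in terms of `compute`, the vector served for a record is by construction identical to the training row built from that record, which is the consistency property the abstract is concerned with.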
*Zipline at Airbnb* Zipline is Airbnb’s soon to be open-sourced data management platform specifically designed for ML use cases. It has taken the task of training data generation from months to days and offers data management solutions from model training to serving. This talk will cover the framework at a high level, focusing on the specific challenges of data engineering for ML, and how Zipline provides a solution. Speaker Bio: Varant Zanoyan is a software engineer on the Machine Learning Infrastructure team at Airbnb where he focuses on Zipline, a data management framework for Machine Learning. Previously, he solved data infrastructure problems at Palantir Technologies. *Machine Learning Infrastructure at an Early Stage* Good machine learning is built on infrastructure but many startups don't have the bandwidth or resources to build this foundation while scaling. It's difficult to prioritize the pieces of ML Infrastructure that data scientists and engineers need to be productive and successful when the scale of these projects can be months or years for small teams of engineers. The dividends are large down the road but the cost of pursuing infrastructure that doesn't work or doesn't solve the right problems can leave a team months down the road without necessary progress. This talk focuses on the foundation that any good machine learning system is built on and the elements of ML infrastructure to focus on first. Speaker Bio: Nick Handel serves as Branch International's Head of Data Science. Prior to joining Branch, he was a Product Manager for Airbnb's machine learning infrastructure teams. Before moving to centralize the company's artificial intelligence efforts, he was an early member of the company's data science team, helping the company expand internationally between 2014 and 2015 and leading a data science team that launched Airbnb's Trips product in 2016. Before joining Airbnb, he was a research economist at BlackRock, focusing on emerging market debt.
- Aggregations and knowledge extraction from social data: challenges and lessons
This talk is about the construction of new data assets from social media using techniques drawn from the areas of information retrieval, machine learning, graphs, and social networks. I’ll describe three projects based on Twitter and Foursquare data sets that use social data in different ways to help users in information-seeking scenarios. The first is a recommender system for recreational queries using location-based social networks. The second is a social knowledge graph derived from Twitter with the goal of discovering relationships between people, links, and topics. The third is an application for archiving and Wikification of stories. Omar Alonso is a Principal Applied Scientist with Microsoft, where he works at the intersection of information retrieval, social data, human computation, and knowledge graph generation. He is the co-chair of the Human Computation and Crowdsourcing track at WWW'19 and on the organizing committee for HCOMP'19.
- Scale By the Bay 2018
Folks -- our flagship yearly gathering, scale.bythebay.io, is fast approaching, and the program is bursting at the seams with amazing talks. We are closing the remaining gaps with the strongest additions, many related to AI, which is fed by the data pipelines we know how to build so well. Clément Farabet, VP of AI Infrastructure at Nvidia, will share the updates from the GPU land on AI for Self-Driving Cars. Alex Sergeev, the creator of Horovod from Uber, will show how to speed up your Deep Learning dramatically with it. Aleksandra Kudriashova, Head of Product at Astro Digital, will show how satellite imagery can help analyze the world food economy. Salesforce will show how data pipelines and cutting-edge R&D connect in production with Salesforce Einstein and graph analysis, with Richard Socher, Chief Scientist of Salesforce, following engineering talks with a fireside chat and a panel. Our Data Pipelines for AI panel this year includes Richard Socher; Peter Bailis, professor at Stanford and member of DAWN lab there, as well as the founder of sisu.ai; Pete Skomoroch, the founder of SkipFlag (acquired by Workday); Lukas Biewald, founder of CrowdFlower and Weights and Biases; and Michelle Casbon, Google Cloud Platform ML/Big Data engineer. Our Thoughtful Software Engineering panel, including Martin Odersky, Julie Pitt, Marius Eriksen, Runar Bjarnason, and Bryan Cantrill, will be moderated by Cliff Click -- the creator of the HotSpot JIT and cofounder of H2O.ai, who is also teaching the bespoke Advanced Software Engineering workshop the day before. The Cloud, Edge and IoT panel now includes Anoop Nannra, the Head of Cisco Blockchain Initiative and Chairman, Trusted IoT Alliance; Roman Shaposhnik, cofounder, Zededa, and board member, Apache Software Foundation; Bernard Golden, Head of Cloud Strategy, Capital One (to be continued) -- looking for strong panelists representing GCP/Azure/AWS as well. 
Other talks on the program include High-Performance Bayesian Inference with Rainier by Avi Bryant (Stripe), Graph Analysis by Alexis Roos (Salesforce), Privacy-Preserving Data Science in Scala by David Andrzejewski (Sumo Logic), Towards Typesafe Deep Learning by Tongfei Chen (Johns Hopkins University), The Evolution of the GoPro data platform by David Winters (GoPro), Labels to Inference by Jeff Fenchel (Zignal Labs), Structured Deep Learning by Jayant Krishnamurthy (Semantic Machines), Hadoop Future in the AI World by Milind Bhandarkar (Ampool), and many, many more -- see the full schedule at http://scale.bythebay.io. Use the code BAYHAREAAI15 for 15% off all passes while they are available! Late Bird only from November 1st.
- Scale By the Bay 2018 CFP is accepting late submissions by 6/30
Due to the overwhelming clamor for late submissions and great talks coming in still, the CFP is logarithmically extended as follows. 1/2 the program will be formed with the submissions added by 5/31. The next quarter will take into account those sent by 6/15 and the rest of the submissions that didn’t make the cut yet. The next part will be selected from all those plus the talks submitted by 6/30. A block of time is reserved for the invited talks of exceptionally high quality and importance, expanding the scope of the conference. Submit your talk at scale.bythebay.io!
- Joint SF Spark, Bay Area AI, Global Advanced Spark and TensorFlow Meetup!
This meetup is a housewarming of the new Mesosphere office, that used to be the Nitro office, where we held so many SF Scala, SF Spark, Bay Area AI, Reactive Systems, and Advanced Spark meetups. And you know what? We are bringing the sexy back with all of them joining forces at once! Welcome the first joint meetup at Mesosphere, and remember to check in regularly for the great things to come that are happening here!
- Scale By the Bay 2018 CFP is Open until May 31
It's the sixth year that we are organizing our flagship Scale By the Bay conference, and it's a truly spectacular tech event many of you know very well. For those who are new to SBTB, I would love to invite you to attend. And if you'd like to present, we'd like to see your talk! The CFP for SBTB 2018 is now open through May 31: http://scale.bythebay.io/cfp.html Give it your best shot, or two, as the rate of high-quality submissions is already very high. At Scale By The Bay, returning to Twitter HQ in San Francisco on November 15-17, 2018, you can connect with fellow senior software engineers, CTOs, VPs/Directors of Engineering, developers and technical founders who never stop learning. Embrace the whole end-to-end software stacks and the infrastructure running them, put together your own SMACK Stack, operationalize reactive microservices and data pipelines, build streaming data infrastructure for actionable, real-time insights, and deep-dive into practical aspects of full-stack architectures and developer productivity. We'll have a stellar program:
* the full-stack Scala and Functional Programming conference with world authorities on practical FP, beginning with Martin Odersky, the creator of Scala, who comes back to keynote SBTB!
* the fast data pipelines done right, with Neha Narkhede, the co-creator of Apache Kafka and co-founder of Confluent, keynoting
* FP+ML: Functional Programming for Machine Learning, a topic even more current today, now that Swift for TensorFlow has been unveiled
Throughout the three-track, three-day event, we'll weave the themes of open-source development, type safety, and full-stack architectures together with the emerging areas of ML and AI so that you can learn all about it if you want to. At the same time, we'll make sure we're still, and always, the best in the software engineering realm with a solid understanding of distributed systems, from operations up to services to streaming algorithms. 
We firmly believe that thoughtful software engineering with the right reusable abstractions and best practices around development is key to everything. We want to link this approach to more things and see more use cases. We especially welcome FP+ML talks this year. Please note that we go forward at Scale. We welcome production use cases of all thoughtfully designed software stacks, including Scala, Haskell, Swift, Rust, Clojure, F#, and so on. We welcome Java, C++, Go, and other systems, especially in the microservice, polyglot environment. Our SMACK 2.0 plan, unveiled at the Index conference, calls for Streaming, in-Memory architectures, API-centric, Containerized and running on Kubernetes. We welcome submissions on all levels of these new systems, starting with orchestration. No matter where you are along the full-stack spectrum, you need thoughtful software engineering, reactive and streaming architectures, manageable microservices, and scalable data pipelines that can work together with modern ML frameworks for immediate customer insights. Join the SBTB family at Twitter HQ again this year, see how companies like Twitter are built in software, build your own, and share your findings with others! See you in November at Twitter HQ! Dr. Alexy Khrabrov, Program Chair, By the Bay PS. If you are in Europe and can't wait, By the Bay comes to Amsterdam as RethinkTrust.org, our first signature engineering take on enterprise trust systems with blockchain and hyperledger in energy, fintech, IoT, and other real-world use cases. Our tech includes Swift and Scala, scalability and security of trust systems, their performance and enterprise stack integration: topics rarely, if ever, covered at general blockchain events. Use the code TRUSTBYTHEBAY for 15% off and join us in Amsterdam!