• Aggregations and knowledge extraction from social data: challenges and lessons

    This is a crosspost from Bay Area AI; register there to attend! https://www.meetup.com/bay-area-ai/events/257791863/ This talk is about the construction of new data assets from social media using techniques drawn from the areas of information retrieval, machine learning, graphs, and social networks. I’ll describe three projects based on Twitter and Foursquare data sets that use social data in different ways to help users in information-seeking scenarios. The first is a recommender system for recreational queries that uses location-based social networks. The second is a social knowledge graph derived from Twitter, built to discover relationships between people, links, and topics. The third is an application for archiving and Wikifying stories. Omar Alonso is a Principal Applied Scientist at Microsoft, where he works at the intersection of information retrieval, social data, human computation, and knowledge graph generation. He is the co-chair of the Human Computation and Crowdsourcing track at WWW'19 and is on the organizing committee for HCOMP'19.

  • Scale By the Bay 2018

    Dear Friends, we are proud to announce the program of Scale By the Bay 2018, the sixth year of our flagship, and by now iconic, independent developer conference By the Bay. (TL;DR: get your spot at http://scale.bythebay.io while supplies last, ideally while Early Bird pricing is in effect until August 31.) The conference follows its established three-day, three-track structure, hosted for the third year in a row by Twitter HQ in its wonderful modern building, with spacious track rooms, community spaces, cozy booths, and the commons area where so many connections are made during the hallway track. This year, Martin Odersky, the creator of Scala, opens the main conference on November 15. Neha Narkhede, the co-creator of Kafka and co-founder of Confluent, is keynoting day 2. The three tracks are: Functional and Thoughtful Programming; Reactive Microservices and Streaming Architectures; and End-to-end Data Pipelines all the way up to Machine Learning and AI. The 100 sessions feature technology leaders such as Twitter, IBM, Microsoft, Salesforce, Fauna, DataStax, Databricks, Confluent, Credit Karma, Sumo Logic, GoPro, Buoyant, Workday, Zignal Labs, and many more. We cover your tools with JetBrains, your shopping with Best Buy and Target, your vacations with HomeAway, your listening with Spotify, your viewing with Netflix, your reading with Medium, and your banking with JP Morgan Chase. The list goes on and on: we have most of the advanced stacks and approaches employed at scale by the best that Silicon Valley offers the world, shared as best practices, with code, yours to learn, take home, and build upon. Our speakers span the whole spectrum, from first-time presenters at leading companies to SBTB veterans going all the way back to 2013, evolving their craft before our eyes.
You can follow their progress by watching their previous talks at http://functional.tv and browsing photos of past conferences at https://meetup.bythebay.photo/Conferences/Scale-By-the-Bay. The three panels closing each day are: Thoughtful Software Engineering; Data Engineering for AI; and Cloud, Edge, and Silver Lining. Each day begins with a hot breakfast and an uninterrupted supply of Philz coffee that lasts all day, and lunch is provided. On the first two days, the closing panels are followed by our signature happy hours, with great drinks, food, and conversation. The hallway tracks are legendary. SBTB is famous for its bespoke, all-day, build-yourself-a-company training, and this year we double it. Cliff Click, the legend of software engineering, is teaching a full-day Advanced Software Engineering workshop on 11/13. He is followed by Ryan Knight, now of Fauna, who leads a cloud-native data pipelines workshop on 11/14. Each workshop is limited to 80 participants. As last year, we’ll plan an unconference track for those who want to share their ideas in an intimate setting for joint brainstorming. The only thing moderate about SBTB is its size: we cap it at 600 attendees to preserve the immediacy of the communication that happens, the sparks that fly, and the serendipity that always occurs. We always sell out by the time the conference begins in November, so reserve your seat early at http://scale.bythebay.io and enjoy Early Bird pricing, in effect until August 31.

  • The Rise of the Operational Analytic Data Stores

    Orange Silicon Valley

    Note: we need a venue to host us in San Francisco! ----- For more analytics pipelines, join us at http://scale.bythebay.io, 11/14-17, at Twitter HQ. ----- Operational analytic data stores are a newly emerging class of databases that merge ideas from log search systems (Elastic, Splunk, etc.) and traditional analytic databases (Vertica, Teradata, etc.). Popular projects in this class include Apache Druid (incubating), Scuba (from Facebook), ClickHouse (from Yandex), Pinot (from LinkedIn), Palo (from Baidu), and more. We will discuss the motivation behind these databases, and then discuss in detail the history, architecture, and future of Druid. Speaker: Gian Merlino is an Apache Druid (incubating) PMC member and a co-founder of Imply. Previously, Gian led the data ingestion team at Metamarkets (now a part of Snapchat) and held senior engineering positions at Yahoo. He holds a BS in Computer Science from Caltech.

  • Rethink Trust -- Amsterdam, June 29

    Beurs van Berlage

    We’re super excited to share some awesome news: we’re expanding to Europe and introducing our newest conference, Blockchain: Rethink Trust (http://RethinkTrust.org). It takes place at Beurs van Berlage, Amsterdam, on June 29th! (Scroll down for 15% off.) Blockchain: Rethink Trust is a gathering of top engineering experts and ecosystem-minded leaders, focused on reengineering enterprise trust networks through technology. It will expand your understanding of blockchain, trust mechanisms, and how the corporate world uses them. Rethink Trust is a By the Bay conference, brought to you by the creators of Scale By the Bay, AI By the Bay, and Data By the Bay. We have held these engineering events in San Francisco for over five years, in partnership with IBM, ING, Apple, Twitter, Salesforce, and dozens of innovative startups and enterprises in the Bay Area and around the world. Dr. Alexy Khrabrov, Founder, By the Bay, is the Program Chair, and he wrote about the philosophy of Rethink Trust on his Medium blog: http://chief.sc/rethinktrust2018-intro We invite C-level executives, senior engineers, and technical leaders who want to master the best practices in blockchain to the famous Beurs van Berlage, “the third stock exchange” of Amsterdam. True to the spirit of all our previous events, Blockchain: Rethink Trust is laser-focused on learning, open-source excellence, and industry-oriented approaches that work. SPEAKERS We have the world’s top technology leaders speaking at the event. The keynote speakers include: -- Christopher Ferris, IBM CTO Open Technology, Chair of the Hyperledger Technical Steering Committee -- Mariana Gómez de la Villa, Global Program Manager, ING DLT You will also hear from: -- Clara Durodie, Founder and CEO, Cognitive Finance Group -- Roman Shaposhnik, VP Product & Strategy, Co-founder,
ZEDEDA -- Yonatan Sompolinsky, Co-founder & Scientist, DAGlabs -- Michael Egorov, CTO, NuCypher -- Roberto Mancone, Chief Operating Officer at we.trade Innovation DAC. That company develops, deploys, and distributes we.trade, the blockchain-based trade finance platform, and reports to the Board of Directors of the nine European shareholder banks (Deutsche Bank, HSBC, KBC, Natixis, Nordea, Rabobank, Santander, SocGen, Unicredit) -- Christopher Georgen, Founder and CEO, Topl, the company developing blockchain solutions for the developing world … and more! TOPICS We will cover a variety of topics at the intersection of engineering and business management, including: -- Crypto protocols of today and tomorrow, and the software engineering process required to deploy them for enterprise customers at scale; -- Technology behind the key blockchain deployments in FinTech and IoT; -- Rigorous software engineering practices required for safe, correct, and performant implementation of blockchain applications and platforms; -- Key aspects of hardware-software codevelopment crucial for IoT+blockchain; -- ... and more. Explore the program of the conference at http://rethinktrust.org WORKSHOPS We’ll have three workshops on the workshop track, available to all attendees. You can freely switch between the main track and the workshop track.
-- Hyperledger workshop taught by Arnaud Le Hors, core Hyperledger team -- Implementation workshop by IntellectSoft -- Scala Blockchain -- secure and type-safe -- by Topl, the developing-world blockchain company YOU WILL LEARN -- Discover ways to implement blockchain technology for reengineering trust in key business verticals -- Understand best practices for enterprise adoption of ledger consensus approaches -- Conduct strategic partnerships for a consortium of trusted and trustless systems -- Invite key developers to collaborate on your blockchain ecosystem TICKETS For a limited time, we’re offering an additional 15% off to the By the Bay community. To claim the offer, use the code BYTHEBAYOFFER15 at http://rethinktrust.org

  • Scale By the Bay 2018 CFP is Open until May 31

    It's the sixth year that we are organizing our flagship Scale By the Bay conference, and it's a truly spectacular tech event many of you know very well. For those who are new to SBTB, I would love to invite you to attend. And if you'd like to present, we'd like to see your talk! The CFP for SBTB 2018 is now open through May 31: http://scale.bythebay.io/cfp.html Give it your best shot, or two, as the number of high-quality submissions is already very high. At Scale By the Bay, returning to Twitter HQ in San Francisco on November 15-17, 2018, you can connect with fellow senior software engineers, CTOs, VPs/Directors of Engineering, developers, and technical founders who never stop learning. Embrace whole end-to-end software stacks and the infrastructure running them. Put together your own SMACK Stack, operationalize reactive microservices and data pipelines, and build streaming data infrastructure for actionable, real-time insights. Then deep-dive into practical aspects of full-stack architectures and developer productivity. We'll have a stellar program: * the full-stack Scala and Functional Programming conference with world authorities on practical FP, beginning with Martin Odersky, the creator of Scala, who comes back to keynote SBTB! * fast data pipelines done right, with Neha Narkhede, the co-creator of Apache Kafka and co-founder of Confluent, keynoting * FP+ML: Functional Programming for Machine Learning, a topic even more current today now that TensorFlow for Swift has been unveiled Throughout the three-track, three-day event, we'll weave together the themes of open-source development, type safety, and full-stack architectures with the emerging areas of ML and AI, so that you can learn all about them if you want to. At the same time, we'll make sure we remain, as always, the best in the software engineering realm, with a solid understanding of distributed systems, from operations up to services to streaming algorithms.
We firmly believe that thoughtful software engineering with the right reusable abstractions and best practices around development is key to everything. We want to apply this approach in more areas and see more use cases. We especially welcome FP+ML talks this year. Please note that we go forward at Scale: we welcome production use cases of all thoughtfully designed software stacks, including Scala, Haskell, Swift, Rust, Clojure, F#, and so on. We welcome Java, C++, Go, and other systems, especially in microservice, polyglot environments. Our SMACK 2.0 plan, unveiled at the Index conference, calls for Streaming, in-Memory, API-centric, Containerized architectures running on Kubernetes. We welcome submissions at all levels of these new systems, starting with orchestration. No matter where you are along the full-stack spectrum, you need thoughtful software engineering, reactive and streaming architectures, and manageable microservices. You also need scalable data pipelines that work together with modern ML frameworks for immediate customer insights. Join the SBTB family at Twitter HQ again this year, see how companies like Twitter are built in software, build your own, and share your findings with others! See you in November at Twitter HQ! Dr. Alexy Khrabrov, Program Chair, By the Bay PS. If you are in Europe and can't wait, By the Bay comes to Amsterdam as RethinkTrust.org. This is our first signature engineering take on enterprise trust systems with blockchain and Hyperledger in energy, fintech, IoT, and other real-world use cases. Our tech includes Swift and Scala, scalability and security of trust systems, their performance, and enterprise stack integration, topics rarely, if ever, covered at general blockchain events. Use the code TRUSTBYTHEBAY for 15% off and join us in Amsterdam!

  • SMACK 2.0: Emerging Data Pipelines Panel at Index

    IBM Index (http://chief.sc/index-2018) is a fantastic new developer conference. Register (http://chief.sc/index-2018-register) by 2/20 with the code CD3ALEXY to attend the Community Day for free and the main program for just $280. The SMACK 2.0 panel (http://chief.sc/index-2018-smack20-panel) is held on 2/22 at 2pm. It is preceded by the SMACK 2.0 workshop (http://chief.sc/iindex-2018-smack20-workshop) on the Community Day (day 0, 2/20, 3-5:30pm). In this panel, we discuss SMACK (http://smackstack.org/), a popular framework for describing and comparing data pipelines. SMACK 1.0 was often composed of Spark, Mesos, Akka, Cassandra, and Kafka. In SMACK 2.0, we explore emerging ways to build scalable, data-heavy applications for Machine Learning. These rely on Streaming, in-Memory computing (including Spark), Model serving, APIs, Cloud/Cassandra/Containers, and Kubernetes, with Kafka often being the source. Instead of fixing SMACK components as we did for SMACK 1.0 (Data source, API, Compute, Persistence, Operationalization), we consider alternatives for various use cases. For instance, S will increasingly be Serverless. What are the emerging patterns, and when do some approaches make more sense than others? Certain applications, such as Fintech, call for in-Memory computing, while others, such as IoT, favor streaming with real-time AI feedback. Panelists: Nikita Ivanov, co-founder and CTO, GridGain; Sijie Guo, co-founder, Streamlio; Anya Bida, DevOps Engineer, Salesforce; Hugh McKee, Developer Advocate, Lightbend; Tathagata Das, Software Engineer, Databricks. The SMACK 2.0 panel is preceded by the workshop (http://chief.sc/iindex-2018-smack20-workshop) during the Community Day. Both sessions are curated and moderated by Dr. Alexy Khrabrov (http://chiefscientist.org/), the founder and organizer of Scale By the Bay and the creator of the original SMACK Stack (http://smackstack.org/) training.

  • Streamlio, GridGain, Cassandra+Spark: FREE Workshop at Index

    We're happy to announce two new Index (http://chief.sc/index-2018) sessions. On 2/20 is the free SMACK 2.0 workshop at Moscone West, 3-5:30pm. Register (http://chief.sc/index-2018-register) by 2/20 with the code CD3ALEXY to attend the Community Day for free and the main program for just $280. (1) Streaming -- Streamlio; (2) Memory computing -- GridGain; (3) Cassandra+Spark. (1) Building modern data pipelines by unifying Apache Pulsar, Apache Heron, and Apache BookKeeper. For today’s enterprises, ensuring that data pipelines are available to every corner of the organization is key to building next-generation data-driven applications. In this talk, Karthik Ramasamy of Streamlio will present how to combine three best-of-breed open-source projects into a solid data infrastructure that is easy to develop against and simple to operate at scale in production. He will provide an overview of the merits of the three open-source systems and the benefits they bring when integrated: Apache Pulsar: unified queuing and streaming; Apache Heron: stream processing; Apache BookKeeper: distributed stream storage. Karthik Ramasamy is the co-founder of Streamlio, which focuses on building next-generation real-time processing engines. Before Streamlio, he was the engineering manager and technical lead for real-time analytics at Twitter, where he co-created Twitter Heron. He has two decades of experience working in parallel databases, big data infrastructure, and networking. Karthik is the author of several publications, patents, and "Network Routing: Algorithms, Protocols and Architectures". He has a Ph.D. in computer science from the University of Wisconsin, Madison, with a focus on big data and databases. (2) Apache Spark and Apache Ignite: Where Fast Data Meets the IoT. It is not enough to build a mesh of sensors or embedded devices to obtain more insights about the surrounding environment and optimize your production systems.
Usually, your IoT solution needs to be capable of transferring enormous amounts of data to storage or the cloud, where the data has to be processed further. Quite often, the endless streams of data have to be processed in real time so that you can react to the IoT subsystem's state accordingly. This session will show attendees how to build a Fast Data solution that receives endless streams from the IoT side and processes them in real time using Apache Ignite's cluster resources. In particular, attendees will learn about streaming data from embedded devices to an Apache Ignite cluster and about real-time data processing with Apache Spark. Live-Coding Workshop (3) Building Your First Spark & Cassandra Application: A Code-Along Adventure w/ Russell Spitzer. Not sure where to start with Cassandra and Spark? Together, let’s walk through starting your first Spark application. We’ll walk through setting up your IDE and integration tests, everything you need to build your first scalable and distributed Spark app. Learn how to use embedded Cassandra and Spark to write your own tests, which are easily debuggable in standard IDEs. This will be a short but interactive adventure! Feel free to bring your own laptop and code along! We will be using IntelliJ IDEA along with the template provided by DataStax. About Russell Spitzer: After earning his Ph.D. in bioinformatics from UCSF, Russell Spitzer took his love of big data to DataStax. There he has worked on all aspects of integrating Cassandra with other Apache technologies like Spark, Hadoop, and Solr. Now his main focus is on the integration of Cassandra with Apache Spark via the Spark Cassandra Connector. We are working with the IBM community teams to make their flagship developer conference, Index ( http://www.indexconf.com/ ), the most meaningful and fun experience for Bay Area developers.
Alexy Khrabrov talks about Index with Markus Eisele, Selection Committee Chair and Director of Developer Advocacy, Lightbend: http://chief.sc/index-2018-overview In our communities, we created and popularized the SMACK Stack ( http://smackstack.org/ ), a way to reason about end-to-end data pipeline architectures. Building and running such pipelines, and the components comprising them, are the key themes of Index. The conference starts with the free Index Community Day ( https://developer.ibm.com/indexconf/communities/ ) on 2/20, which consists of 14 half-day sessions on key technologies. Many are either directly relevant or of strong interest to most of us: Spark, Kafka, Docker, Kubernetes, OpenAPI, Hyperledger, Istio, TensorFlow, and Cloud Foundry. You can build multiple viable architectures from these technologies, and they are often used together. To explore the progress made since SMACK 1.0, introduced in 2015, we are putting together a SMACK 2.0 panel, brainstorming the emerging SMACK Stacks. There is a wealth of expertise from many of the companies that present By the Bay regularly: Lightbend, Twilio, Slack, Uber, Google, Facebook, IBM, Eero, and many others. You can already meet many speakers in the IBM developerWorks TV playlist for Index: http://chief.sc/index-2018-videos We’ll update this description as we ramp up our Index + SMACK 2.0 events!

  • Functional Linear Algebra and Holiday Party By the Bay!

    • What we'll do It is curious that so many Linear Algebra implementations are written as if it were Fortran all the way down. We can do better: a functional implementation, with less mutability and more abstraction, may save effort, space, and processor time. I will show pieces of easy-to-read code that work with vectors and matrices in a functional way, including, as an example, an efficient and practical implementation of PCA. Linear Algebra is how you eventually solve Deep Learning and most other Statistical Machine Learning problems. Vlad Patryshev is the organizer of Scala Bay, a regular speaker By the Bay, and a software engineer at Salesforce. • What to bring Wine and cakes -- it is a Holiday Party too! We can also host unmeetup-style lightning talks, and it will be the holiday party!
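    The talk's own code is not reproduced on this page. As a rough illustration of the "less mutability, more abstraction" idea, here is a minimal sketch in Python, chosen here for brevity; the speaker's examples are likely Scala. In the sketch, vectors and matrices are immutable tuples and every operation is a pure function. The first principal component is found by power iteration on the sample covariance matrix. All function names are hypothetical, and power iteration is just one way to compute a principal component. This is not the talk's PCA implementation.

```python
from functools import reduce
from math import sqrt
from typing import Tuple

# Hypothetical helpers for illustration only; not the talk's code.
# Vectors and matrices are immutable tuples; every function is pure.
Vector = Tuple[float, ...]
Matrix = Tuple[Vector, ...]  # row-major: one row per sample


def add(u: Vector, v: Vector) -> Vector:
    return tuple(a + b for a, b in zip(u, v))


def scale(c: float, v: Vector) -> Vector:
    return tuple(c * a for a in v)


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def norm(v: Vector) -> float:
    return sqrt(dot(v, v))


def transpose(m: Matrix) -> Matrix:
    return tuple(zip(*m))


def mat_vec(m: Matrix, v: Vector) -> Vector:
    return tuple(dot(row, v) for row in m)


def mean(rows: Matrix) -> Vector:
    # Fold the rows into a sum instead of accumulating into a mutable array.
    return scale(1.0 / len(rows), reduce(add, rows))


def center(rows: Matrix) -> Matrix:
    mu = mean(rows)
    return tuple(add(r, scale(-1.0, mu)) for r in rows)


def covariance(rows: Matrix) -> Matrix:
    # Sample covariance on centered data: C[i][j] = sum_k x_ki * x_kj / (n - 1).
    cols = transpose(center(rows))
    n = len(rows)
    return tuple(tuple(dot(ci, cj) / (n - 1) for cj in cols) for ci in cols)


def first_component(rows: Matrix, iterations: int = 100) -> Vector:
    # Power iteration, expressed as a fold: repeatedly apply C and renormalize.
    # Assumes a dominant eigenvalue and a start vector (all ones) that is not
    # orthogonal to its eigenvector.
    c = covariance(rows)

    def step(v: Vector, _: int) -> Vector:
        w = mat_vec(c, v)
        return scale(1.0 / norm(w), w)

    return reduce(step, range(iterations), tuple(1.0 for _ in c))


def project(rows: Matrix, component: Vector) -> Vector:
    # Score of each centered sample along the given component.
    return tuple(dot(r, component) for r in center(rows))
```

Consider points lying on the line y = 2x, such as `((1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0))`. For these, `first_component` returns approximately (0.447, 0.894), that is, (1/√5, 2/√5). A production version would use a proper eigensolver and a numeric library. The point of the sketch is only that the whole pipeline (mean, centering, covariance, iteration) composes from small pure functions and folds, with no in-place updates.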

  • Not Another Big Data Conference


    NABDConf (http://www.criteo.com/events/nabdconf-palo-alto/) is coming to the Bay Area for the first time on December 12! After two successful editions in Paris, we are now bringing NABDConf to the Bay Area. On December 12, the conference by developers for developers will bring together engineers who have spent their careers solving difficult problems at scale with other engineers. Some attendees face similar challenges; others are just plain curious about what’s going on behind the scenes at web-scale companies like Spotify, Cloudera, Uber, and Criteo, and at fast-moving startups like Determined AI. This edition will center on topics ranging from real-world deep learning and data science at scale to the nuts and bolts of making data engineering easier to understand and ever more efficient. We'll deep-dive into technologies like Spark, Parquet and Arrow, TensorFlow, and other OSS projects like Featran, Cuttle, and SLAB. Sessions will include industry heavyweights sharing insights on: Scalable Deep Learning on Big-Data Clusters; Monitoring data production and the escalation process around it; and Examining multiple examples to outline these key capabilities of TensorFlow and Spark. Register for the conference today! Want to know what to expect? Check out the highlight reel from last year's event.

  • H2O World

    • What we'll do H2O World 2017 Date: 4th and 5th December, 2017 Location: Computer History Museum, Mountain View We’re back with our flagship event H2O World 2017 to bring together the best of data science, AI, and business transformation. We’re going to make it rain insights and storm discussions with speakers from industry-leading companies such as ADP, Amazon’s A9, Booking.com, Capital One, Comcast, Experian, Kaiser Permanente, PayPal, and many more. Come learn how they are using data science, machine learning, and AI to transform their business. Enjoy 50+ sessions on topics such as Auto Feature Engineering, Machine Learning Interpretability, NLP, Automatic Machine Learning, Convex Optimization, Auto Visualization, Computer Vision, and many more. In addition to learning from industry experts, we invite you to: * Chat with 5 Kaggle Grandmasters * Compete in the Hackathon * Participate in hands-on lab sessions with the makers behind H2O products * Attend book signings by Professor Rob Tibshirani, Darren Cook, and Michal Malohlava * Make a memory at the digital caricature booth * Let your taste buds loose at our food trucks Register here: h2oworld.h2o.ai To win a ticket to H2O World 2017, tweet your best hashtag about H2O / H2O World to @h2oai; the most impressive hashtags will stand a chance to win. You could also fill out this short survey, but we suggest letting your creative horses loose on Twitter to better your chances. To get a discounted ticket, use discount code ALEXY65 on general admission tickets for a 65% discount! Note: RSVP'ing on meetup.com doesn't secure your ticket to H2O World 2017. Please visit the official website at h2oworld.h2o.ai and save a spot!