- Hadoop User Group: Facebook London
Our next Hadoop User Group meeting is going to be at Facebook's new office in London. We've got three speakers from Hotels.com and Facebook covering a range of topics, plus the usual food and drinks for the evening. Details of the three talks: 1. Tools and approaches for migrating big datasets to the cloud - Adrian Woodhead and Elliot West from Hotels.com This presentation describes the journey taken by the Hotels.com big data platform team when tasked with migrating big data sets and pipelines from on-premises clusters to cloud-based platforms. We present two open source tools that we built to overcome the unexpected challenges we faced. 2. Facebook disaggregated storage for Data Warehouse - Pavel Zakharov from Facebook This talk will give details on the Warm Storage service that we use to replace HDFS. We will go into the design choices behind it, which follow from recent changes in available hardware. We will also summarise how the service fits into our data centre networking setups to achieve the storage and compute tradeoffs. 3. Scribe, the Globally Distributed Message Bus at Facebook - Benjamin Leonhardi from Facebook This talk will give an overview of Scribe, the globally distributed message bus which is the backbone for most log aggregation and realtime use cases at Facebook. We will describe Scribe and its tradeoffs compared to similar open source systems built on Kafka, and how Scribe solved some of the most common issues such as auto scaling, multi tenancy, and failover.
- November Hadoop Users Group MeetUp
Hi Folks, I am pleased to announce our next Hadoop Users Group MeetUp! The date is 15th November, 6-9pm. The venue is Olympia – West Hall at Big Data LDN, Fast Data Theatre. We have three great talks from the following companies: Accelerite: A Case Study of Rapid Big Data App Development at a Media Company Evolution AI: Deduplicating Data at Scale Databricks: Easy, Scalable, Fault-Tolerant Stream Processing with Structured Streaming in Apache Spark Accelerite is a Silicon Valley based company delivering secure business-critical infrastructure software for Global 1000 enterprises. Accelerite’s product suite includes hybrid cloud infrastructure, endpoint security, big data analytics, and the Internet of Things. Evolution AI develops products that read and understand human language. They build enterprise-grade AI solutions which can learn to perform useful ‘knowledge-work’ tasks without any explicit instructions. Their systems can accurately analyse information from very large volumes of text documents and are essential to any NLP work that enterprises undertake. Databricks' mission is to accelerate innovation for its customers by unifying Data Science, Engineering and Business. Founded by the team who created Apache Spark™, Databricks provides a Unified Analytics Platform for data science teams to collaborate with data engineering and lines of business to build data products. Users achieve faster time-to-value with Databricks by creating analytic workflows that go from ETL and interactive exploration to production. NOTE- ALL ATTENDEES MUST REGISTER FOR FREE TO GAIN ACCESS TO THE VENUE HERE: https://bigdatalondon.circdata-solutions.co.uk/rfg/publish/BDL17/simplereg.aspx Big Data LDN is back for a second year and is set to be larger than ever in 2017. The two day event is essential for businesses that want to deliver a data-driven strategy.
Get the latest updates on fast/real-time data, artificial intelligence, machine learning, GDPR, deep learning, self-service analytics and much more. The event will host leading, global data and analytics experts, ready to arm you with the tools to deliver your most effective data-driven strategy. Open to all and free to attend conference and exhibition. Discuss the big questions and share ideas with forward-thinking peers and leading members of the data community. Be in the vanguard of the data revolution, sign up to Big Data London and learn how to build a bright data-driven future for your business. You can expect over 100 expert speakers, more than 80 global exhibitors, use-cases, live demos and 5 theatres full of cutting edge content. Don't miss out; become data-driven at Big Data LDN 2017. Further info link: https://bigdataldn.com
- April Hadoop Users Group MeetUp
Folks, We are pleased to confirm details for our April MeetUp at Facebook's offices on 25th April. PLEASE NOTE: For Security and Access, you must register for this event via the following link to ensure entry: https://hadoopmeetupinlondon.splashthat.com/ We are really excited about this MeetUp with 3 fantastic talks lined up for us from our great presenters: CORENTIN GUILLO - CEO AND FOUNDER OF BIRD.I Corentin Guillo is CEO and founder of Bird.i (hibirdi.com), a platform for up-to-date earth observation data supplied by satellite operators, drones and aircraft - all from a scalable cloud platform accessible either through a web frontend or through APIs. Corentin is a space engineer with years of experience in the aerospace & satellite business. TYLER BULUT - SOFTWARE ENGINEER AT FACEBOOK Tyler Bulut is an engineering manager at Facebook. The teams he supports build large-scale distributed systems that are part of Facebook’s data infrastructure platform, with the goals of supporting scheduling, resource management, and massive parallel computation for batch applications, such as data warehouse, graph analytics, and data science. The key systems his teams develop and contribute to are Apache Spark, Apache Giraph, and a Facebook in-house scheduler. JERRY CHEN - SOFTWARE ENGINEER AT FACEBOOK Jerry Chen is a software engineer at Facebook, working on realtime stream processing. He initiated and leads the effort to build Stylus, a high performance, scalable and fault tolerant stream processing system at Facebook. This system has been the enabler for many mission critical stream processing applications, such as Mobile Analytics, Instagram Trending, Page Insights, Chorus, etc. He also helped start Turbine, a stream processing management platform. Before that, he managed the HBase and HDFS team at Facebook. Under his lead, HBase grew from an experimental project into a critical storage system powering Messages, as well as search index, operational datastore, etc.
And the HDFS team powers one of the world's largest Hadoop clusters. Jerry Chen got his MSEE degree from the University of Minnesota. Given capacity and expected interest in this event, entry will be on a first-registered basis; please use the link: https://hadoopmeetupinlondon.splashthat.com/ Look forward to seeing you next week! HUGUK Team.
- November HUGUK Meetup
Hi Folks, We are pleased to announce our November MeetUp at the London Olympia Conference Centre on the evening of November 3rd, in Seminar Room 2. We can confirm our first speaker and our sponsor, Cloudera, for the event - please see details below. In the meantime you can also register for the Big Data London Conference taking place on the 3rd and 4th of November at the Olympia Conference Centre - this is required for security and entry to the MeetUp. Registration is free - please see the following link for full details of the event: http://bigdataldn.com/ Presentation 1 Kudu: New Apache Hadoop Storage for Fast Analytics on Fast Data Apache Kudu is a fast new columnar data store for the Hadoop ecosystem designed to enable high-performance, flexible analytic pipelines. Being optimized for lightning-fast scans, Kudu is particularly well suited to hosting time-series data such as metrics, machine learning model-building workloads, and data warehousing applications. Despite its impressive scan speed, Kudu also supports operations offered by many traditional data stores, including real-time insert, update, and delete operations. Kudu supports a "bring your own SQL" model and can be queried by multiple SQL engines, including Apache Spark SQL, Apache Impala (incubating), and Apache Drill. This talk will discuss what Kudu is, why we decided to build it, what makes it fast, and an example of how it can be used for a time-series use case. Presenter: Mike Percy is a Software Engineer at Cloudera and a committer on Apache Kudu (incubating). Prior to joining Cloudera, Mike worked on big data infrastructure for machine learning at Yahoo! Mike holds a BSCS from UC Santa Cruz and an MSCS from Stanford.
Presentation 2 Skool: a new open-source data integration tool for Hadoop “Skool is a data integration tool which handles the following: a) data transfer from Hadoop into a relational database (Oracle / SQL Server / MySQL / Netezza or any JDBC compliant database) b) data transfer from a relational database into Hadoop (includes automated creation of Oozie workflows and Hive tables) c) file transfer and Hive table creation for file-based transfers into Hadoop d) automatic generation and deployment of file creation scripts and jobs from Hadoop or Hive tables It is suitable for use by data scientists for ad hoc data loads, and also for productionised regular data loads. The main benefit of Skool is that it simplifies the process for the end user and provides default configuration which avoids the need for detailed knowledge of the underlying technologies. But it is customisable for advanced users.” Presenter: Gareth Watkins, Big Data Architect in BT’s Data Analytics team We look forward to seeing you all there. HUGUK Team.
- Spark Structured Streaming in Practice and Blockchain Explained
Tonight we partner with the Big Data Week (http://london.bigdataweek.com/) (BDW) Conference and the Big Data Analytics London Meetup (https://www.meetup.com/Big-Data-Developers-in-London/) to bring you two of the hottest topics in Data Science. Please RSVP here AND register at SkillsMatter (https://skillsmatter.com/meetups/8261-datapalooza-nights-meetup). The BDW organisers are also giving our members a 30% discount on tickets to the "Big Data In Use" conference on October 27th, with discount code BDW30. They also have some good group offers: get 2 tickets for £190 each, or 3 tickets for £150 each. The details and registration: http://london.bigdataweek.com/ . Spark Structured Streaming in Practice Spark Structured Streaming provides the means to express streaming computations the same way they would be expressed on static data. The built-in engine incrementally and continuously updates the final results as streaming data continues to arrive. We’ll cover how a real life implementation of Spark Structured Streaming on top of a Hadoop Cluster is helping a big online retailer to analyse clickstream data and aggregate it with customer history information. Andrei Muraru, Solution Architect at Bigstep, has designed and implemented complex big data projects for more than 4 years. Currently, he is focused on large-scale real-time implementations. He is helping customers begin their journey with big data workloads by providing meaningful insights on the products and services that are appropriate for their use case. Blockchain Explained Blockchain is a shared, replicated ledger. Its reach is wider than just crypto-currencies, as it provides the foundation for a new generation of transactional applications that establish trust and transparency, while streamlining business processes. Come along to this session to find out more about Blockchain and see it in action, and discover the future plans that IBM and the Linux Foundation are building for Blockchain.
Dave Gorman has worked in the IT industry for 26 years. He is currently part of the IBM Europe technology team, specialising in blockchain, and is based at the Hursley Laboratory in Hampshire. The team is the centre of blockchain client enablement worldwide for IBM. In this role, Dave works with customers across a broad spectrum of industries and use-cases, including the Financial Services Sector, enabling them in their understanding of blockchain and how they can best make use of the technology within their respective industries. The HUGUK team
- Machine Intelligence Showcase
!!! Please sign up on Eventbrite !!! (https://www.eventbrite.co.uk/e/machine-intelligence-showcase-tickets-28253650429) Join fellow developers, startups, academics and investors at our event to highlight the breadth and depth of machine intelligence activity and talent in Europe. Schedule: • 18:30 Arrival and networking • 19:00 Start • European Machine Intelligence Landscape (http://bit.ly/euaiscape2016medium) - Project Juno • Short Company presentations • Networking - There will be pizza! • 21:30 Close Speakers: • Mostafa Elsayad, Founder, Automata Technologies (http://www.getautomata.com/) (Affordable, Accessible Robotics) • Michael Tusch, Founder, ArtOfUs (http://www.artofus.com/) (Human Operating Systems) • Peter Ondruska, CEO, Blue Vision Labs (http://bluevisionlabs.com/) (The Future of Mobile Perception) • Sylvain Cornillon, CTO, Bossa Studios (http://www.bossastudios.com/) (AI in Video Games) • Conan McMurtrie, Machine Learning Engineer, Digital Genius (https://digitalgenius.com/) (Human + AI customer service) • David Peto, Founder, Excession (http://excession.co/) (Counter Terrorism and Security) • Simon Knowles, CTO, Graphcore (https://www.graphcore.ai/) (Accelerating Machine Intelligence) • Ondrej Urban, Data Scientist, HAL24K (http://hal24k.com/) (Making Cities and Organisations Smarter) • Bogdan Coman, CEO, Woogie (http://www.hiwoogie.com/) (Smart Companion for Kids) Looking forward to seeing you there, HUGUK & Project Juno
- September HUGUK Meetup
Dear HUGUK members, We have two exciting talks lined up by Google & Trifacta, and an intro to ether.camp, one of the original, leading blockchain companies in Ethereum. Details below. We would like to thank our sponsor Google. Presentation 1 - Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta As Hadoop became mainstream, the need to simplify and speed up analytics processes grew rapidly. Data wrangling emerged as a necessary step in any analytical pipeline, and is often considered to be its crux, taking as much as 80% of an analyst's time. In this presentation we will discuss how data wrangling solutions can be leveraged to streamline, strengthen and improve data analytics initiatives on Hadoop, including use cases from Trifacta customers. Bio: Olivier is EMEA Solutions Lead at Trifacta. He has 7 years' experience in analytics with prior roles as technical lead for business analytics at Splunk and quantitative analyst at Accenture and Aon. Intro - ether.camp, Stephen Taylor Stephen Taylor is the community manager for Ether Camp. They provide an analysis tool for the Ethereum blockchain, ‘Block Explorer’, and also an ‘Integrated Development Environment’ (I.D.E) that empowers developers to build, test and deploy applications in a sandbox environment. This November they are launching their second annual hackathon, hack.ether.camp, which aims to deliver a more sustained approach to the hackathon ideology by utilising blockchain technology. Presentation 2 - Easier, faster, more cost-effective Spark and Hadoop - James Malone, Google At Google Cloud Platform, we're combining the Apache Spark and Hadoop ecosystem with our software and hardware innovations. We want to make these awesome tools easier, faster, and more cost-effective, from 3 to 30,000 cores. This presentation will showcase how Google Cloud Platform is innovating with the goal of bringing the Hadoop ecosystem to everyone. Bio: "I love data because it surrounds us - everything is data.
I also love open source software, because it shows what is possible when people come together to solve common problems with technology. While they are awesome on their own, I am passionate about combining the power of open source software with the potential unlimited uses of data. That's why I joined Google. I am a product manager for Google Cloud Platform and manage Cloud Dataproc and Apache Beam (incubating). I've previously spent time hanging out at Disney and Amazon. Beyond Google, love data, amateur radio, Disneyland, photography, running and Legos." We look forward to seeing you at the Rainmaking Loft. The HUGUK team
- July HUGUK Meetup
Dear HUGUK members, We have three exciting talks lined up by Cognizant, Waterline Data and Bigstep. Details below. We would like to thank our sponsors Cognizant and Bigstep. Bigstep is presenting Big Data Week and offers a special 30% discount for our members - just use the discount code HUG_BDW. For more information about the BDW check out the newsletter (http://us7.campaign-archive2.com/?u=9d9f50e985ed344d2a906f5b1&id=2d2636a511&e=5a45304b4a) and the blog post (http://blog.bigdataweek.com/2016/05/26/leading-edge-talks-top-companies-big-data-week-2016-london/) presenting the line up. Presentation 1 - Building a Data Ingestion Framework for Hadoop Tackling one of the most critical components for any enterprise-grade Data Lake, this session explores the best practice approaches and lessons learnt from real world implementations in creating a robust and re-usable data ingestion framework for Hadoop. Speaker: Chris Soza, Big Data Architect Presentation 2 - Bigstep DataLab sneak peek This is a behind-the-scenes view of a new Spark-as-a-Service offering, currently in the works at Bigstep. Speaker: Cristina Grosu, Product Manager Presentation 3 - Building the Successful Enterprise Data Lake As enterprises create large Hadoop clusters and ingest vast amounts of data, they can find that instead of a Data Lake they end up with a Data Swamp: a large repository of unusable data sets that are impossible to navigate and dangerous to rely on for business-critical decisions. This presentation explores some of the critical issues in setting up and managing a Data Lake and looks at the challenges and categories of tools that are crucial to success. Speaker: Rob Anderson, Waterline Data EMEA We look forward to seeing you there! The HUGUK team
- HUG UK MeetUP @ Strata+Hadoop World 2016
Hi Folks, We are pleased to confirm three great talks covering the latest and greatest from the world of Big Data and Hadoop for our upcoming HUG UK MeetUp on 1st June 2016. NOTE: The MeetUp will take place in the Capital Suite 2/3 @ExCel London PRESENTATION 1 : Introduction to Apache Kudu The Hadoop ecosystem has made great strides in its real-time access capabilities, narrowing the gap compared to traditional database technologies. With systems such as Impala and Spark, analysts can now run complex queries or jobs over large datasets within a matter of seconds. Despite these advances, users are often caught between a rock and a hard place: columnar formats such as Apache Parquet offer extremely fast scan rates for analytics, but little to no ability for real-time modification or row-by-row indexed access. This talk will investigate the trade-offs between real-time transactional access and fast analytic performance from the perspective of storage engine internals. It will also briefly describe the demo of Apache Kudu (http://blog.cloudera.com/blog/2015/09/kudu-new-apache-hadoop-storage-for-fast-analytics-on-fast-data/), the new addition to the open source Hadoop ecosystem that fills the gap described above, complementing HDFS and HBase to provide a new option to achieve fast scans and fast random access from a single API. Presenter: Todd Lipcon, Cloudera Engineer. Todd is a committer and a PMC member on the Apache Hadoop, HBase, and Thrift projects. Prior to Cloudera, Todd worked on web infrastructure at several startups and researched novel machine-learning methods for collaborative filtering. Todd received his bachelor’s degree with honours from Brown University. PRESENTATION 2: Introduction to AtScale – the OLAP engine for Hadoop AtScale is an advanced OLAP engine built from the ground up for and on Hadoop. Data analysts can create virtual cubes from Hive metadata in minutes using the AtScale Design Center’s visual drag and drop tool running in a browser. 
No special clusters are required and data is never moved out of Hadoop. With AtScale, business users get interactive and multi-dimensional analysis capabilities, directly on Hadoop, at maximum speed, using the tools they already know, own and love – from Microsoft Excel to Tableau Software to QlikView. Built by Big Data Veterans from Yahoo!, Google and Oracle, AtScale has enabled BI on Hadoop at major enterprises such as Amex, Comcast and Macy’s. Presenter: Dr Nigel Geary, Product Evangelist with AtScale. Nigel has spent his entire career working with cubes and in 1991 was one of the original designers of Essbase, the first OLAP server that led Dr Ted Codd to first coin the term OLAP. Prior to joining AtScale Nigel was the founder of PrecisionPoint Software, creator of hyper-cubes for ERP systems, and was also a BI architect for the Data Lake Project at Centrica. Nigel has a PhD in engineering for polymer modelling from the University of Liverpool. PRESENTATION 3: Building an IOT platform using NiFi and Storm for Logistic Optimization The Internet of Things (IOT) has been gaining huge traction these days; it presents new opportunities and challenges for larger and smaller enterprises in various sectors ranging from insurance, transport, banking, healthcare, telecom, manufacturing, oil and gas to education and agriculture. Real time data with contextual local information enriched with historical knowledge is changing business models in these sectors. The success or failure of an organization depends on how it reacts to events that it experiences while serving its customers. It is not their action but reaction that decides customer experience. Organizations use various methods to capture this customer experience to provide a measured response to an event that drives the customer to dissatisfaction. For example, Uber has been successful in responding to customers’ needs with real-time analysis of the data.
IOT provides the opportunity to sense the ecosystem around us, to measure various factors and to gauge our feelings about products or services that we may be using. It provides the opportunity to take actions, possibly predictive, in response to such events. In this talk we will present one way of building such a platform, which will enable us to acquire data from various devices, ingest it and analyse it with enriched datasets. It will demonstrate the use of technologies such as NiFi and Storm to build such a platform, with Logistics Optimization as a use case. Presenter: Prashant Bhalesain is a Big Data Architect at WHISHWORKS. Prashant has been working in Big Data technologies for the past 4 years and has built Hadoop-based platforms and solutions for organizations in Telecom, Retail, Medical Devices and Banking. He holds a Master's Degree in Technology from the Indian Institute of Technology, Mumbai. Looking forward to a really great night. See you there! HUGUK Team.
- Big Data Bootcamp
Hi everyone, Bigstep and HUGUK are staging a one-day big data boot camp in the heart of London. Join us on Saturday March 12 for a hands-on experience with the hottest big data technologies to date. Details below. Please also register through Eventbrite (http://www.eventbrite.com/e/big-data-boot-camp-tickets-21297576625?aff=HUG). Best, The Organisers Save the date! Saturday, March 12th, 2016 Where? etc.venues St. Paul’s (http://bit.ly/1QL3QUY) Attend one of the three big data workshops and fuel your journey from guesswork to smart work. Bigstep - Learn how to Build a Data Lake in the Cloud, grasping topics like: • Intro and general architecture • Connecting to on-premises services • Hybrid and Real-time infrastructures MapR - Get a firm grip on The Essentials of the MapR Converged Data Platform, covering: • Data Ingestion and Storage • Twitter Sentiment Analysis (Spark Streaming) • Data Exploration through Apache Drill Couchbase - Discover real-time big data and analytical insights with: • Couchbase Operations – Learn about how Couchbase operates and scales • Couchbase & BigData – Learn how Couchbase integrates into the big data landscape with Storm & Kafka • Couchbase & Real Time Analytics – Demo of a Couchbase powered travel application with analytics Agenda 10:00 - 10:30 Registration & breakfast 10:30 - 11:30 Briefing: notions & guidelines 11:30 - 12:00 Coffee break 12:00 - 13:00 Workshop sessions - part one 13:00 - 14:00 Lunch 14:00 - 15:00 Workshop sessions - part two 15:00 - 15:30 Coffee break 15:30 - 16:30 Workshop sessions - part three Are you an architect, developer, data scientist, analyst or just a newbie to the big data scene? You will definitely find a session that suits you best. Take a closer look!