Agenda for HUG Bucharest Meetup on 29.01 at 19h EET
Talk 1 - 'Big Data Tools & Mobile Apps'
Learn more about how to track, collect and analyze mobile usage data with the help of Flume & Hadoop.
The presentation will be given by Cornel Balaban, Avira's Mobile Development Manager.
Talk 2 - 'Windows Secure Hadoop Clusters'
With the newly released Hadoop 2.6, it is possible to deploy secure Hadoop clusters on the Windows platform. This allows integration with Windows domains; configuration of authentication, authorization and permissions for HDFS and cluster queues using the existing enterprise Active Directory; single sign-on for job submission, cluster monitoring and HDFS access; Hadoop infrastructure task isolation; and all the other goodies expected from a well-integrated enterprise security solution.
Remus Rusanu has worked with the Microsoft SQL Server development team since 2001, and since 2012 has made Microsoft-sponsored contributions to Hive and Hadoop. He authored YARN-1972 and YARN-2190, which enable deployment of secure Hadoop clusters on Windows.
Talk 3 - 'Couchbase 3.0: a new Java Client Library'
New client libraries were also released with Couchbase 3.0. The 2.x version of the Java client library is not just an improved 1.x; it has a completely new 'look and feel'.
David Maier (Senior Solutions Engineer at Couchbase) will explain why it represents the next generation of database access, what the key concepts behind it are, and how to use it in your Java applications.
Agenda for HUG Bucharest Meetup on[masked] at 19h EET [revised[masked]]
Talk 1 - 'Elastic Search'
Cristi Toader will give a presentation about Couchbase and Elasticsearch, showing strengths and weaknesses for both of them. Focus points of this talk will be:
• Couchbase easily handles massive volumes of data requests; it is nevertheless hard to use Couchbase for data management (the process is too slow)
• For data interpretation Hadoop is recommended, but for real-time results Elasticsearch has proven to be the best choice.
• Couchbase can automatically sync with Elasticsearch, which it treats as an external cluster.
• Practical exercise: what happens when you lose your data. Other use-case scenarios for data storage and research
• Live demo: run Couchbase and Elasticsearch clusters to obtain real-time statistics.
Talk 2 - 'Data for Big Data Applications - Special Scaling Issues'
Calin Burloiu & Marius Ionescu's presentation covers two essential topics related to scaling Big Data applications:
• Special scaling issues that arise when running analyses on petabyte-sized clusters
• Data pipeline: the round-trip path from front-end consumer web applications to HDFS and HBase and back
Talk 3 - 'Couchbase 3.0'
David Maier (Senior Solutions Engineer at Couchbase) will take you through top improvements and detail how Couchbase has simplified your life with great availability, scalability, security and developer features in 3.0.
Couchbase Server 3.0 and Couchbase Server SDK 2.0 have hundreds of improvements and many hidden gems. Example benefits include faster indexing, optimizations for massive data sets, better resource utilization, incremental backups, improved Ultra High Availability and security improvements.
Couchbase has engineered Couchbase Server 3.0 to meet the requirements of enterprises that are replatforming to support multiple use cases, increase efficiency, and improve business agility in a world of big data, mobile, social interaction, cloud infrastructure, and the Internet of Things.
We're happy to announce our inaugural August meet up. Let's get together, learn about each other's interests and fire up some talks!
At Avira Romania we've developed several applications on the Hadoop stack that are powering millions of customer interactions per hour. For our first meet-up I'd propose we present a couple of our current consumer use-cases.
My employer is also sponsoring the event with location, drinks & food.
• Calin Burloiu, Couchdoop
• Corneliu Balaban, Soft Authentication
1.) Couchdoop by Calin Burloiu
Couchdoop is a Hadoop connector for Couchbase which is able to import, export and update data. The connector can be used both as a command line tool, which works with CSV files from HDFS, and as a library for MapReduce jobs, for more flexibility. The library provides a Hadoop InputFormat that reads data from Couchbase by querying a view, and an OutputFormat that stores key-value pairs read from any Hadoop source in Couchbase. The OutputFormat also allows other useful operations, such as deleting documents, counting, or changing the expiry of some documents. Couchdoop can be used to update existing Couchbase documents with data from other Hadoop sources. Imagine a recommendation system which stores item scores in Couchbase documents. After rerunning a machine learning algorithm over user event data from Hadoop, the scores in Couchbase can be updated directly. Couchdoop aims to be a better alternative to the official Couchbase Sqoop connector, which is only able to import a full bucket or to stream documents for a configurable amount of time.
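The recommendation-score scenario above can be illustrated with a small, self-contained sketch. It does not use Couchdoop, Couchbase or Hadoop: a plain Python dictionary stands in for the document store, and the function name `apply_score_updates`, the `score` field and the `item::N` keys are all hypothetical, chosen only for illustration. The point is the update pattern, where only documents that already exist receive the recomputed score and their other fields are left intact.

```python
from typing import Any, Dict

Document = Dict[str, Any]


def apply_score_updates(
    documents: Dict[str, Document], new_scores: Dict[str, float]
) -> Dict[str, Document]:
    """Return updated copies of the documents whose keys appear in new_scores.

    Keys that are missing from `documents` are skipped, so this only
    updates existing documents and never creates new ones.
    """
    updated: Dict[str, Document] = {}
    for key, score in new_scores.items():
        doc = documents.get(key)
        if doc is None:
            continue  # no existing document for this key: nothing to update
        # Copy the document and overwrite only the (hypothetical) score field.
        updated[key] = {**doc, "score": score}
    return updated


# In-memory stand-in for the document store (an assumption for this sketch).
store: Dict[str, Document] = {
    "item::1": {"title": "A", "score": 0.2},
    "item::2": {"title": "B", "score": 0.5},
}

# Scores as they might come out of a rerun of the machine learning job.
recomputed = {"item::1": 0.9, "item::3": 0.4}

changes = apply_score_updates(store, recomputed)
store.update(changes)
```

After this runs, `item::1` carries the new score and keeps its title, `item::2` is unchanged, and `item::3` is not created because it had no existing document.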
2.) Soft Authentication by Corneliu Balaban
Soft Authentication (SAUTH) is a large-scale backend application that authenticates and manages a company’s users and the products and devices they are using, while offering complete anonymity and privacy through anonymous tokens. Using Java, Couchbase, Flume and Hadoop to persist user data, we are able not only to authenticate users or create user profiles in real time, but simultaneously to identify the devices on which they use our products, in order to deliver maximum security to them. SAUTH supports tens of thousands of operations per second and provides maximum flexibility for enriching and serving customer profiles (thanks to a schemaless database) directly from its in-memory database to our company’s consumer-facing applications. SAUTH also has various fuzzy matching algorithms that enable it to map users to multiple devices (of any type) at runtime, or to determine whether an unregistered user is actually a known registered one.
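The abstract does not say which fuzzy matching algorithms SAUTH uses, so the following is only a generic sketch of the idea of matching an unregistered device against known ones. It uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure; the fingerprint string format, the user IDs, the function names and the 0.85 threshold are all assumptions made for this example.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two strings, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_known_user(
    fingerprint: str, known: Dict[str, str], threshold: float = 0.85
) -> Optional[str]:
    """Return the user ID whose stored fingerprint is closest to `fingerprint`.

    Returns None when even the best candidate scores below `threshold`,
    i.e. when the device is treated as belonging to an unknown user.
    """
    best_user: Optional[str] = None
    best_score = 0.0
    for user_id, known_fingerprint in known.items():
        score = similarity(fingerprint, known_fingerprint)
        if score > best_score:
            best_user, best_score = user_id, score
    return best_user if best_score >= threshold else None


# Hypothetical fingerprints: "platform|device model|locale".
known_devices = {
    "user-1": "android|samsung sm-g900f|en_US",
    "user-2": "ios|iphone6,2|de_DE",
}

# Same device as user-1, but with a changed locale.
near_match = match_known_user("android|samsung sm-g900f|en_GB", known_devices)

# A device that resembles none of the known ones.
no_match = match_known_user("windows|thinkpad x1|ro_RO", known_devices)
```

Here `near_match` resolves to `user-1` because only the locale suffix differs, while `no_match` is `None`. A production system would likely compare several independent signals rather than one concatenated string, and would tune the threshold against real data.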