- Hadoop Data Security with Apache Ranger
Enterprises are adopting Hadoop as the center of the modern data-driven architecture. Customers expect advanced data security and governance to be embedded within Hadoop and related projects. Apache Ranger provides centralized security administration for Hadoop ecosystem projects. With the recent release, Apache Ranger now provides centralized access control and auditing for many Hadoop-related projects, including Hadoop, Hive, HBase, Storm, Kafka, Solr, YARN and Knox. Ranger has also introduced a services-based stack that partners can easily extend to bring the centralized security framework to many more applications. About the Speaker: Biren Saini is a Senior Solutions Engineer and Governance SME Lead at Hortonworks. He’s been with Hortonworks for over a year, helping organizations install, configure, tune and secure their Hadoop clusters. Biren has 15 years of technology experience, during which he has worked on a range of platforms and technologies: big data apps, mobile apps, web apps, cloud computing, systems, networking and more.
- Designing Scalable Data Pipelines with Apache NiFi
An increasing number of companies are embarking on the journey to becoming truly data-driven, and this profound change presents its own unique challenges. The velocity and diversity of new datasets are compelling organizations to search for new approaches to reliable data ingestion, whether the target system is Hadoop, a data warehouse, or some newfangled NoSQL database. For years, enterprises have struggled to create dataflows between diverse systems dealing with both internal and external business data within their infrastructure. This talk will cover a new project in the Apache ecosystem, NiFi. Throughout the talk you will learn how NiFi has greatly improved the overall efficiency of data ingestion here on the data platform team at Cloudera. Apache NiFi aims to make architecting mission-critical dataflows as simple as designing a flow chart. NiFi’s core concepts are borrowed from flow-based programming, a paradigm that has been around since the 1970s. Although NiFi is a brand-new open source project, it has been used in production by the National Security Agency for several years. One of the primary benefits of using NiFi is a significant boost to data agility, that is, the ability to ingest new datasets within minutes rather than hours or days. With NiFi, security, monitoring, and fault tolerance are first-class citizens, giving you confidence that production pipelines will continue working even if failures occur. In this talk we will cover how NiFi implements the core concepts behind “flow-based programming.” Using a real-world example, we will also cover how you can port custom internal code to run within NiFi. Finally, you will learn how even non-programmers can create dataflows using a robust web interface designed from the ground up.
You should walk away from this talk with a good understanding of how you can use NiFi to automate dataflows between the various systems within an enterprise. About the Speaker: Ricky Saltzer is a local data engineer on Cloudera's internal data platform team. His team is responsible for architecting scalable ingestion pipelines for a multitude of datasets, such as support data, business data, and internal machine data. Ricky has been using Hadoop for over three years and is a contributor to multiple open source big data projects. His team makes extensive use of many big data technologies (e.g. HBase, Impala, Kafka, NiFi).
- RTP Geriatric Caregiver Solutions Hackathon: $4,500 in prizes
Saturday & Sunday, April 11-12, 2015, at Quintiles, 4820 Emperor Boulevard, Durham, NC. Join Northwest AHEC, NCHICA and Quintiles for a healthcare hackathon! We will be teaming up and designing tech solutions aimed at helping people who care for elderly relatives with dementia at home. We are inviting students and professionals with expertise in computer science, healthcare technology and caregiving to team up and solve problems using six future methods of coordination. The winning team will walk away with $3,000. Student registration is only $10, including all meals and snacks. It will be a great weekend with food trucks, photo booths, interviews and fun. Please ask anyone interested to register at http://northwestahec.org/etcreg and plan to join us!
- Hive on Spark
Hive on Spark is a project to bring together the Hive and Spark communities and provide a better-performing, more robust version of Hive. Cloudera, Intel, MapR, Databricks, and IBM are working together to bring the project to users. In this talk we'll cover the motivations, a technical overview, and the implementation challenges, and finish with a demo. About the Speaker: Szehon Ho is a software engineer at Cloudera and a Hive committer, based in Palo Alto, CA. Prior to this, he was a principal software engineer at Informatica, gaining experience in enterprise software development and the field of data integration. He holds a BS in EECS from UC Berkeley.
- TriHUG Social + Lightning Talks
We'll have good food, drinks, prizes and excellent talks from the community. This was one of our best events last year! If you're interested in presenting, please send us a topic. You will need to keep presentations under 10 minutes (including questions!). If you are interested in sponsoring refreshments or prizes, please let us know. Hope to see you all there.
- Azure Dev Camp
This is not necessarily JUST for devs. I would recommend this session for IT pros, architects, developers and IT executives. I'll be teaching along with a few colleagues and MVPs. It's a free, all-day training with food (register at http://aka.ms/clouddevcamps). We are covering a TON of material, from automation to analytics to the app platform to security and compliance. There will be coffee, food, snacks, etc. Nov 18, 9am-4:30pm, 8055 Microsoft Way, Charlotte, NC
- ORM for HBase
The Kite Software Development Kit (Apache License, Version 2.0), or Kite for short, is a set of libraries, tools, examples, and documentation focused on making it easier to build systems on top of the Hadoop ecosystem. In this talk, we'll get an introduction to the Kite project and its generic data API. The second half will go into detail about using Kite as an ORM layer for HBase. About the Speaker: Ryan Blue has spent the past 3 years working to make the Hadoop stack easier to use. He currently works on Kite, a data API for Hadoop.
- Special Guest: Doug Cutting
TriHUG October will feature special guest speaker Doug Cutting from Cloudera. Please make sure to RSVP for this event, since we will most likely be at capacity. About the Speaker: Doug Cutting is the founder of numerous successful open source projects, including Lucene, Nutch, Avro, and Hadoop. Doug joined Cloudera in 2009 from Yahoo!, where he was a key member of the team that built and deployed a production Hadoop storage and analysis cluster for mission-critical business analytics. Doug holds a Bachelor’s degree from Stanford University and sits on the Board of the Apache Software Foundation.
- Nikhil Kumar (SyncSort) on Converting SQL to MapReduce
According to Gartner, 70% of all data warehouses are performance and capacity constrained, and ELT workloads can often drive up to 80% of database capacity. What happened? The original vision was that the Enterprise Data Warehouse (EDW) would be the single, consistent version of the truth, but the modern-day reality is an EDW architecture that consists of thousands of lines of SQL code, with ELT “pushdown” data transformations happening primarily in the EDW and staging layers. Organizations have created constrained data warehouses with spaghetti-like architectures, leaving them unable to get value from one of their biggest investments. With constant upgrades to the EDW just to keep the lights on, companies are now under immense pressure to alleviate the resulting budget and performance problems. The good news is that, for the first time, there’s a completely new approach that is not only scalable but also economically feasible. When comparing the costs of managing data, Hadoop is orders of magnitude cheaper than the EDW, and that’s why it has become a very disruptive force in the data landscape. But where do you start? Nikhil Kumar will present SyncSort’s EDW offload framework and SILQ, a first-of-its-kind utility that makes the transition from legacy SQL to Hadoop as smooth as possible. Speaker: Nikhil Kumar, Technical Product Manager, SyncSort
- Rethinking SQL for Big data – Don’t compromise on flexibility or performance
Can I reduce the time to value for my business users on Hadoop data? How can I do SQL on semi-structured types? How do I create and manage schemas for my data when the applications are changing fast? What types of distributed systems problems do I have to solve when I move beyond traditional MPP scale to Hadoop scale? Overall, a new way of thinking is needed to bring end-to-end agility to BI/analytics environments operating on Hadoop/NoSQL data. Along with the table-stakes requirement to support a broad ecosystem of SQL tools, close attention must be paid to new requirements such as working with flexible and fast-changing data models, handling semi-structured data, and achieving low latencies at the scale of ‘big’ data. This session will cover how Apache Drill is driving the audacious goal of bringing instant, self-service SQL natively to Hadoop/NoSQL data without compromising either the flexibility of Hadoop/NoSQL systems or the low latency required for the BI/analytics experience. It covers the exciting architectural challenges the Apache Drill community is working on, the progress made so far, and the roadmap. Speaker: Keys Botzum is Senior Principal Technologist with MapR Technologies, where he wears many hats. His primary responsibility is interacting with customers in the field, but he also teaches classes, contributes to documentation, and works with engineering teams. He has over 15 years of experience in large-scale distributed system design. Previously, he was a Senior Technical Staff Member with IBM and a respected author of many articles on the WebSphere Application Server, as well as a book. When not wearing one of his MapR hats, Keys enjoys time with friends and family, and getting outside to play tennis and hike.