Streaming-OODT was originally conceived to overcome the limitations of traditional big-data processing and management systems. It is based on an open source data processing framework called OODT (Object Oriented Data Technology) and was funded by NASA Jet Propulsion Laboratory’s Big Data Research & Technology Development initiative, “Archiving, Processing and Dissemination for the Big Data Era”. The vision behind the project is to combine state-of-the-art technologies into an easy-to-use, prepackaged big-data processing system, so that users can quickly process big data without having to patch together individual technologies themselves.
Streaming-OODT provides both traditional batch processing and in-memory MapReduce processing on general-purpose computing clusters. Cluster management and multi-tenancy are provided by Apache Mesos, which manages both batch processing and Streaming-OODT’s underlying technologies. This ensures that multi-tenancy applies to both the system itself and the user’s processing.
Apache Spark provides in-memory MapReduce processing, which for some in-memory workloads can run up to 100 times faster than Hadoop MapReduce. Apache Kafka augments this by managing streaming data. Together they let the user process streaming data alongside traditional data in Apache Spark, and thus tackle data sets too large to persist en masse to disk, without losing the ability to process data sets that already exist on disk.
Tachyon, an in-memory distributed file system, provides fast distributed access to data files and streams on all nodes of the cluster. Persistence is provided by the Hadoop Distributed File System (HDFS), giving the user both fast data access and durable persistence to disk.
This talk demonstrates Streaming-OODT so that the audience can use it and its supporting technologies to quickly tackle their own big-data problems. The talk will introduce Streaming-OODT, show how to quickly install and configure the system, explain the value added by each underlying technology, and walk through a working example of big-data processing. Finally, benchmarks will be presented so that the audience can see the benefit of these technologies and their combination.
Speaker: Michael Starch
SGVLUG is one of the oldest and most active Linux User Groups in the Greater Los Angeles area. In addition to Linux, the group shares interests in other free and open source software, all forms of technology, and the issues that arise with these new tools, such as privacy rights. SGVLUG attracts members from throughout LA County, including Pasadena, Glendale, Burbank, and eastward throughout the San Gabriel Valley. Our members include software developers, system administrators, hardware engineers, and software users of all levels of experience. Many work in the technology field as employees, contractors, or consultants, and enjoy the learning and networking opportunities the group offers. Many members also volunteer their time and skills at local events, including the annual Southern California Linux Expo (SCaLE).
Join us for dinner and presentations. Dinner begins at 7 pm, and presentations will start once most people have received their food or at 8 pm, whichever comes first. Parking for Burger Continental is off California Blvd, just west of Lake Ave, in the Pavilions parking lot.