
Big Data Analytics with Apache AsterixDB

Hosted By
Michael F.

Details

Please join us for our November meeting, which will be held as a Zoom webinar. You will need to register for the webinar at:
https://acm-org.zoom.us/webinar/register/5515997844539/WN_BHuLS-saSb6xJPyMqAj8Cg

Abstract:

Apache AsterixDB is a Big Data Management System (BDMS) with a feature set chosen to target use cases such as web data warehousing and social media data analysis. Its notable features include:

  • A NoSQL-style data model based on extending JSON with object database concepts;
  • A declarative query language, SQL++, that supports a broad range of queries against multiple semi-structured datasets;
  • A query optimizer for parallel queries and an efficient dataflow execution engine for partitioned-parallel query execution;
  • Partitioned and LSM-based native storage and indexing for large datasets;
  • Support for querying external data (e.g., data on AWS S3) as well as natively stored data;
  • Rich data type support, including numeric, textual, temporal, and simple spatial data;
  • Basic NoSQL-like transactional capabilities.

This talk will provide a brief user-level overview of the system, which in current industrial terminology might be classified as a "parallel NoSQL document database system". It will then dive into how Apache AsterixDB's SQL++ language can be used to query and analyze large volumes of semi-structured (JSON) data. "NoSQL" does NOT mean NoQueries!

Speaker Biography:

Michael Carey received his B.S. and M.S. degrees from Carnegie-Mellon University and his Ph.D. from the University of California, Berkeley. He is currently a Bren Professor of Information and Computer Sciences and Distinguished Professor of Computer Science at UC Irvine, where he leads the AsterixDB project, as well as a Consulting Architect at Couchbase, Inc. Before joining UCI in 2008, he worked at BEA Systems for seven years and led the development of their AquaLogic Data Services Platform product for virtual data integration. He also spent a dozen years at the University of Wisconsin-Madison, five years at the IBM Almaden Research Center working on object-relational databases, and a year and a half at e-commerce platform startup Propel Software during the infamous 2000-2001 Internet bubble. He is an ACM Fellow, an IEEE Fellow, a member of the National Academy of Engineering, and a recipient of the ACM SIGMOD E.F. Codd Innovations Award. His current interests center on data-intensive computing and scalable data management (a.k.a. Big Data).

Co-sponsors

This event is co-sponsored by the IEEE Orange County Computer Society and the Los Angeles Chapter of the ACM.

Orange County ACM Chapter