This is a group for everyone interested in building applications using Apache Hadoop and other open-source, big data technologies.

If you are interested in speaking at a meetup, please contact us (bigdataappmeetup@gmail.com).

Come and learn how to apply big data technologies to solve real-world problems!

Meetup topics are focused on use cases, building end-to-end solutions, and making different technologies work together. Talks are technical presentations from open-source projects, open-source vendors, and open-source users building big data applications. Topics include:

• Describing the technology behind a specific use case (e.g. HBase at Flipboard)

• Making the best use of a project/technology (e.g. Spark performance tuning)

• Integrating different technologies (e.g. Using Apache Kafka as a reliable distributed message queue)

• Introducing new projects/technologies in the space (e.g. Introducing Apache Flink; CDAP is now open-source!)

• Evolution of existing projects/technologies (e.g. What's new in Cassandra 2.0?)

All meetups are recorded, and videos and presentations of the meetups are available here: bdam.io

Contact us (bigdataappmeetup@gmail.com) to sponsor or host a future meetup.

Apache Airflow Community Meetup

Online event

Drift Bio: The Future of Microbial Genomics with Apache Airflow

In recent years, the bioinformatics world has seen an explosion in genomic analysis as gene sequencing technologies have become exponentially cheaper. Tests that previously would have cost tens of thousands of dollars will soon cost pennies per sequence. This glut of data has exposed a notable bottleneck in the current suite of technologies available to bioinformaticians. At Drift Biotechnologies, we use Apache Airflow to move traditionally on-premises, large-scale data and deep learning workflows for bioinformatics to the cloud, with an emphasis on workflows and data from next-generation sequencing technologies.

Data Lake Management Community Meetup

Online event

gRPC integration and its applications in Hive Metastore

gRPC is a modern, open-source, high-performance RPC framework providing advanced features such as authentication, service mesh support, and streaming. Dataproc Metastore (DPMS) on GCP is integrating Hive Metastore with gRPC as an access path in addition to Thrift. This talk will introduce the design of the gRPC integration and how it enables new DPMS features such as Cloud Run endpoint, fine-grained IAM, and metastore federation.

Add this event to your calendar: https://bit.ly/39cspge

Beam Summit 2021

Online event

