Distributed Data Storage: Comparing Cassandra, HBase, ElasticSearch and GridGain


Details
Abstract:
The growing popularity of Big Data has led to the emergence of a variety of distributed data storage technologies. These include Hadoop-based platforms such as HBase and Hive, sharded search platforms such as ElasticSearch, and distributed databases such as MongoDB and Cassandra. This plethora of choice is both a blessing and a curse for engineers. On one hand, a technology stack may be selected to best fit a particular organization in terms of infrastructure, processing requirements, etc. On the other hand, it is not always clear which Big Data technology is best suited to a particular application or environment. In this talk, I will describe a case study comparing Cassandra, HBase, ElasticSearch and the GridGain in-memory database in terms of their applicability to a customer problem. I will outline the techniques that were used to conduct experiments and compare these databases in terms of performance, scalability, throughput and latency. While I will show the results of our experiments, the main goal of this talk is to communicate ideas and help develop an intuition for how to conduct similar problem-focused comparative studies and select Big Data solutions that best fit a given technology environment and business problem.
Bio:
Anton Slutsky is an information technology professional with over a decade and a half of experience in the field. He has a Master's degree in Computer Science from Villanova University and is a PhD candidate at Drexel University, with published research in the areas of Artificial Intelligence, Machine Learning and Data Mining. Currently, Anton leads the Data Science practice at EPAM Systems, a publicly traded, multinational consulting firm. Prior to his current position, Anton led engineering efforts at Oracle and BEA Systems.

Canceled