Facebook uses Presto for interactive queries against several internal data stores, including their 300PB data warehouse. Over 1,000 Facebook employees use Presto daily to run more than 30,000 queries that in total scan over a petabyte each day.
In this session, I will go through Hive: we will insert some real-time tweets into one table, then join it with another table imported from a structured database.
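As a preview, the tweet table and the join might look something like the following HiveQL sketch. The table and column names (tweets, users, and so on) are illustrative assumptions, not the exact schema we will use:

```sql
-- Hypothetical tweet table in Hive (names are illustrative)
CREATE TABLE tweets (
  user_id    BIGINT,
  tweet_text STRING,
  created_at TIMESTAMP
);

-- Join the tweets with a users table imported from a
-- structured (relational) database into Hive.
SELECT u.name, t.tweet_text, t.created_at
FROM tweets t
JOIN users u ON t.user_id = u.user_id;
```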
I will walk through Presto installation and configuration from scratch, describing its architecture along the way.
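To give a flavour of the configuration step, a minimal single-node setup (coordinator and worker on the same machine) uses an etc/config.properties roughly like this; the port and discovery URI are example values:

```properties
# Single-node Presto: this node acts as coordinator and
# also schedules work on itself (example values)
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery-server.enabled=true
discovery.uri=http://localhost:8080
```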
Then we will use Presto to query the joined data we created earlier, and I will show you that, unlike Hive queries, there is no MapReduce involved!
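Since Presto reads the Hive metastore directly, the same join can be issued from the Presto CLI with ordinary SQL. A sketch, assuming a catalog named hive and the illustrative table names from before:

```sql
-- Run from the Presto CLI: this executes on Presto's
-- in-memory pipelined engine, with no MapReduce jobs
SELECT u.name, t.tweet_text
FROM hive.default.tweets t
JOIN hive.default.users u ON t.user_id = u.user_id;
```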
Finally, we will use a business intelligence reporting tool to access that data through the Presto JDBC driver.
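For reference, pointing a BI tool at Presto typically just means supplying the JDBC driver class and a connection URL of the form jdbc:presto://host:port/catalog/schema. The host, port, and user below are example values:

```properties
# Illustrative JDBC settings for a BI tool (host/port/user are examples)
driver=com.facebook.presto.jdbc.PrestoDriver
url=jdbc:presto://localhost:8080/hive/default
user=analyst
```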
In summary, you will learn how to get results from petabyte-scale data faster.
As usual, this will probably be the very first discussion of Presto in our area, so please be among the first to master it too.
All are welcome and thanks for your support.