Advanced Data Acquisition with Amazon Redshift


Details
*** Please register at http://sfdata-redshift.eventbrite.com ***
SF DATA presents an invite-only tech talk on next-generation data acquisition techniques using Amazon Redshift.
Learn from...
• Yelp (http://yelp.com/)
• Amplitude (https://amplitude.com/)
• Tenjin (https://www.tenjin.io/)
• IronSource (http://www.ironsrc.com/atom)
... in highly technical deep dives on how they’re building hyper-scale data platforms.
------------------------------------------------------------------
THE SPEAKERS
• Shahid Chohan, Software Engineer - Yelp
• Jeffrey Wang, Chief Architect - Amplitude
• Nirmal Utwani, Senior Software Engineer - Amplitude
• Amir Manji, CTO - Tenjin
• Adam Ben David, VP Developer Success - IronSource
------------------------------------------------------------------
THE PLAN
5:30pm - Doors open
6:00pm - 2 min demo by IronSource
6:05pm - 3 x 15 min tech talks with Yelp, Amplitude and Tenjin
6:50pm - Food, drinks and networking (Chipotle!)
9:00pm - Lights out
------------------------------------------------------------------
TOPICS
Yelp: Streaming Messages from Kafka into Redshift in Near Real-Time
Shahid will cover Yelp's real-time streaming data infrastructure: how it streams MySQL updates in real time with an exactly-once guarantee, how it automatically tracks and migrates schemas, how it processes and transforms streams, and finally how all of this data gets pushed into datastores such as Redshift and Salesforce.
Amplitude: Building a Redshift Pipeline with Dynamic Schemas
Amplitude is a product analytics service that helps product and growth teams gain deep insights into user behavior and drive engagement and retention. Jeffrey and Nirmal will cover one key part of Amplitude: the pipeline that loads event data into a dedicated Redshift cluster with a fully dynamic schema optimized for query speed and convenience.
Tenjin: How to Ingest Different Types of Data at Scale Into Redshift
YC-backed Tenjin provides mobile attribution, aggregation and analytics that let marketers analyze source, cost and LTV at the user level. Amir will cover how Tenjin processes different types of data (events, clicks, structured relational data) into Redshift so marketers can perform custom analyses without having to build their own infrastructure.
IronSource: Streaming 200B Monthly Events into a Data Warehouse in under a Minute
Atom Data Flow Management is a self-service data infrastructure solution by IronSource that allows you to stream data in near real-time into your data warehouse. Adam will demo how Atom’s processing layer transforms and enriches your data, helping you turn raw logs into queryable fields and rich insights.