Unified Data Applications with Shiny on Delta Lakes


Details
Talk
Shiny is arguably the most popular framework among data scientists for building advanced data apps. Once you build a Shiny application, two questions loom: how to keep the input data up to date, and where to host the app. The answers are often related. For example, if your data lives in an on-premise database, you cannot host your app in the cloud. As data grows and changes, keeping the application's input current becomes more challenging. I have seen many enterprise users implement a two-step architecture: they run regular batch jobs to fetch and summarize data from their data lake or data warehouse into a staging environment, and the Shiny app then loads the staged data and presents advanced analytics to end users.
In this talk, I will show how you can use the Apache Spark and Delta Lake open source projects in your Shiny applications to load data directly from the Lake House. I will discuss how such a unified approach can remove several "moving parts" and simplify your work. I will explain what the Lake House architecture is and how R programs can interact with it using SparkR or sparklyr. I will also demo examples of interactively developing and hosting simple Shiny apps that access large data on Databricks.
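As a rough illustration of the unified approach described above, the following is a minimal sketch of a Shiny app that reads a Delta table directly through sparklyr. It assumes a reachable Spark cluster; the connection method, table path (`/delta/sales`), and column names (`region`, `amount`) are hypothetical placeholders, not details from the talk.

```r
# Minimal sketch: a Shiny app loading data straight from a Delta table
# via sparklyr, skipping the usual batch-job staging step.
library(shiny)
library(sparklyr)
library(dplyr)

# Connect to Spark; on Databricks this would be
# spark_connect(method = "databricks").
sc <- spark_connect(master = "local")

ui <- fluidPage(
  titlePanel("Sales summary"),          # illustrative app
  tableOutput("summary")
)

server <- function(input, output, session) {
  output$summary <- renderTable({
    # Read the Delta table and push the aggregation down to Spark;
    # only the small summarized result is collected into R.
    spark_read_delta(sc, path = "/delta/sales") %>%  # hypothetical path
      group_by(region) %>%
      summarise(total = sum(amount, na.rm = TRUE)) %>%
      collect()
  })
}

shinyApp(ui, server)
```

Because the aggregation runs in Spark, the app only pulls the summarized result into R, which is what lets it work against large data without a separate staging environment.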
Speaker: Hossein Falaki
Currently, I am a staff software engineer at Databricks. I joined Databricks in December 2013 as one of the first software engineers. As an early employee, I had the opportunity to wear different hats including development, product management, data science, and field engineering. I have been presenting my work at leading industry conferences.
As a software engineer, in addition to regular software development responsibilities, I championed and implemented several key features in the Databricks product and contributed to Apache Spark. These include integration with third-party visualization libraries, end-to-end implementation of R Notebooks, integration with SparkR, integration with sparklyr, programmable input widgets, and a data ingest UI. I also made several contributions to the Apache Spark open-source project, including the CSV data source.
As a founding member of the data science team, I built our first usage monitoring dashboards using our product and performed several deep dives and advanced analyses on topics of interest to the executive team.
Schedule
6:20 Open zoom
6:30-7:30 Presentation
~7:30 Virtual Social
Code of Conduct: https://github.com/laRusers/codeofconduct
LA R Users Group
Invite yourself to our Slack group: https://socalrug.herokuapp.com/
Ask us any questions by email: larusers@gmail.com
Find our previous talks on GitHub: https://github.com/laRusers/presentations
Follow us on Twitter: @la_Rusers
Check out more events: https://socalr.org/
***** DON'T BE SHY *******
Reach out if you want to be our speaker. First-time speakers are welcome!

Sponsors