
Talk

Shiny is arguably the most popular framework among data scientists for building advanced data apps. Once you build a Shiny application, two questions loom: how to keep the input data up-to-date, and where to host the app. The answers are often related. For example, if your data is in an on-premise database, you cannot host your app in the cloud. As data grows and changes, keeping the application's input current becomes more challenging. I have seen many enterprise users implement a two-step architecture: they run regular batch jobs to fetch and summarize data from their data lake or data warehouse into a staging environment, and the Shiny app then loads the staged data and presents advanced analytics to end users.

In this talk, I will show how you can use the Apache Spark and Delta Lake open-source projects in your Shiny applications to load data directly from the Lake House. I will discuss how such a unified approach can remove several "moving parts" and simplify your work. I will present what the Lake House architecture is and how R programs can interact with the Lake House using SparkR or sparklyr. I will also demo examples of interactively developing and hosting simple Shiny apps that access large data on Databricks.
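To give a flavor of what this looks like in code, below is a minimal sketch (not from the talk) of a Shiny app that reads a Delta table through sparklyr. The connection method, the table path "/delta/events", and the column names are assumptions for illustration only.

# Minimal sketch: Shiny + sparklyr reading a Delta table.
# Assumes a Spark connection to a Databricks cluster is available;
# the table path and column names below are hypothetical.
library(shiny)
library(sparklyr)
library(dplyr)

sc <- spark_connect(method = "databricks")               # connect to the cluster
events <- spark_read_delta(sc, path = "/delta/events")   # lazy reference to the Delta table

ui <- fluidPage(
  titlePanel("Events by date"),
  dateRangeInput("dates", "Date range"),
  tableOutput("summary")
)

server <- function(input, output, session) {
  output$summary <- renderTable({
    events %>%
      filter(date >= !!input$dates[1], date <= !!input$dates[2]) %>%
      count(event_type) %>%   # aggregation is pushed down to Spark
      collect()               # only the small summary is pulled into R
  })
}

shinyApp(ui, server)

Because dplyr verbs are translated to Spark SQL, the heavy lifting stays on the cluster and only the aggregated result is brought back into the R session that serves the app.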

Speaker: Hossein Falaki

Currently, I am a staff software engineer at Databricks. I joined Databricks in December 2013 as one of the first software engineers. As an early employee, I had the opportunity to wear different hats, including development, product management, data science, and field engineering. I have presented my work at leading industry conferences.

As a software engineer, in addition to regular software development responsibilities, I championed and implemented several key features in the Databricks product and contributed to Apache Spark. These include integration with third-party visualization libraries, end-to-end implementation of R Notebooks, integration with SparkR, integration with sparklyr, programmable input widgets, and a data ingest UI. I also made several contributions to the Apache Spark open-source project, including the CSV data source.

As a founding member of the data science team, I built our first usage monitoring dashboards using our product and performed several deep dives and advanced analyses on topics of interest to the executive team.

http://www.falaki.net/

Schedule
6:20 Open zoom
6:30-7:30 Presentation
~7:30 Virtual Social

Code of Conduct: https://github.com/laRusers/codeofconduct

LA R Users Group
Invite yourself to our Slack group: https://socalrug.herokuapp.com/
Ask us any questions by email: larusers@gmail.com
Find our previous talks on GitHub: https://github.com/laRusers/presentations
Follow us on Twitter: @la_Rusers
Check out more events: https://socalr.org/

***** DON'T BE SHY *******
Reach out if you want to be our speaker. First-time speakers are welcome!
