The focus of this Meetup group is to provide free monthly community data events.
The Data Science Festival is a global data community. We aim to connect the data world and foster the sharing of knowledge, inspiration and ideas.
The global network is dedicated to free education through grassroots technical events. We will cover the latest topics that matter most to data scientists and data engineers. There will be no demos that you can learn from a book or video. Instead, our speakers will discuss real-world problems, what works, what doesn’t work, and why they’ve implemented the solutions they have. The talks are intended to generate lively discussion and debate, offering real-world takeaways to help you in your job.
Who is the Data Science Festival for?
• Data engineers, analysts, scientists, and other practitioners
• Academics, founders, researchers, authors
• R, Python and other software engineers who work with data or want to learn
• Data visualisation developers and designers
• Non-technical team leads, executives, and other decision makers from data-centric startups and large companies looking to utilise open source tools
Spark optimisation: building an efficient Lakehouse with Databricks.
Join DSF in June for our Women in Data Talks. Speakers from this sector share their stories, projects, joys, trials, and tribulations over the course of June 2021. Come and listen to these amazing companies and ask questions to learn more about this growing industry.
Ticket Allocation Process: Registering here guarantees you a ticket for the Data Science Festival Virtual Event on June 24th, 2021. Please make sure to add this session to your schedule in order to receive the joining links.
Summary: Spark optimisation: building an efficient Lakehouse. Apache Spark is a unified analytics engine for large-scale data processing. A lakehouse is a new, open architecture that combines the best elements of data lakes and data warehouses. Lakehouses are enabled by a new open and standardized system design: implementing data structures and data management features similar to those in a data warehouse, directly on the kind of low-cost storage used for data lakes. In this talk we’ll cover how to use Spark in the most efficient way: how writing optimised Spark jobs can reduce run time and costs, building a strong and future-proof foundation for the lakehouse. We’ll discuss topics like partitioning of data, choosing the optimal Spark configuration, and the main pitfalls to avoid.
Speaker: Oleksandra Bovkun, Solutions Architect at Databricks
Bio: After obtaining her master’s degree in applied mathematics, Oleksandra started as a researcher in the R&D department of an energy company. Once she finished her research, she continued her career as a software developer, database architect, and data engineer. Spark was always one of the core technologies she worked with, including a large-scale Spark optimisation project and implementing a data platform with Spark on Kubernetes. Since joining Databricks, she has helped customers implement Data and AI projects and run their Spark applications more efficiently.