Webinar "From Docs to Tables: Generating Structured Data with LLMs"

Hosted by Iryna Pidkovych

Details

To access this webinar, please register here: https://hubs.ly/Q01WTR1t0

Topic: “From Docs to Tables: Generating Structured Data with LLMs”

Speaker #1: Geoffrey Angus, ML Engineer at Predibase
At Predibase, Geoff is working on a number of LLM-related solutions, including RLHF tools and information extraction. Previously, Geoff worked at Google Research on the Perception team. While there, he implemented, trained, and deployed large multi-modal models for Image Search and Google Lens. Geoffrey holds a Bachelor's and Master's in Computer Science from Stanford University, where he conducted machine learning research on weak supervision and computer vision for medical imaging applications.

Speaker #2: Wael Abid, ML Engineer at Predibase
At Predibase, Wael developed systems for leveraging LLMs for information extraction, automated testing and observability for model training robustness, and recommendation systems to assist users during model building. Previously, he was on the Siri Search team at Apple, working on multilingual search relevance. Wael holds a Bachelor's and Master's in Computer Science from Stanford University, where he conducted research on low-resource machine translation and retrieval-augmented Q&A.

Speaker #3: Jeffery Kinnison, ML Engineer at Predibase
Jeffery is part of the machine learning engineering team at Predibase. Previously, he was a researcher at the University of Notre Dame, where he focused on machine learning and computer vision with applications in neuroscience and the digital humanities. He also published research on distributed hyperparameter optimization.

Abstract:
From customer emails to PDFs, every organization has mountains of unstructured text, and buried within it are insights that can inform decision-making. Many teams have started using LLM-powered search and Q&A systems to retrieve insights from their unstructured data. While these systems are good for ad-hoc Q&A, they are not optimized for large-scale, production-grade analytics use cases.

Better results can be obtained by using an LLM to convert documents into tables via large batch jobs for downstream analytics and ML use cases, directly on top of a warehouse like Snowflake. Join this webinar to learn how to easily generate insights from unstructured data by customizing an LLM with Ludwig, the open-source declarative ML framework, and Predibase.

In this live session and tutorial, you will learn:
- Why you can achieve better results for analytical queries by using LLMs to generate structured data
- How to define a schema of data to extract from a large corpus of PDFs using an LLM
- What it takes to customize and use an open-source LLM to construct new tables with source citations
- How to visualize and run predictive analytics on your extracted data with Ludwig and Predibase

Join us to ask your questions live during our Q&A.

ODSC Links:
• Get free access to more talks/trainings like this at Ai+ Training platform:
https://hubs.li/H0Zycsf0
• ODSC blog: https://opendatascience.com/
• Facebook: https://www.facebook.com/OPENDATASCI
• Twitter: https://twitter.com/_ODSC & @odsc
• LinkedIn: https://www.linkedin.com/company/open-data-science
• Slack Channel: https://hubs.li/Q01VgLrB0
• Code of conduct: https://odsc.com/code-of-conduct/

ODSC Edinburgh Data Science