
Talk Series: Customizable RAG workflows & Natural Language to Query Language

Hosted By
Hao X. and Lisa N. C.

Details

This event will be held at: 2 Marina Blvd., Bldg. C Room C205, San Francisco, CA 94123

Whether you're a seasoned data engineer or just starting out, this event will provide valuable insights and practical tips on the work we do. Come network and hear from like-minded professionals in the Bay Area who are passionate about all things data engineering!

Title: Customizable RAG workflows with your own Data
Speaker: Christy Bergman
Abstract: You’ve heard that good data matters in machine learning, but does it matter for generative AI applications? Corporate data often differs significantly from the general internet data used to train most foundation models. Join me for best-practice tips and a demo of RAG (Retrieval Augmented Generation) pipelines built with the Milvus vector database, LangChain, Ollama with Llama 3, and Ragas for RAG evaluation, with optional Zilliz Cloud, Anyscale, Groq, and OpenAI.

Title: NL2QL: natural language to query language
Speaker: Mahan Das
Abstract: Data is growing rapidly in volume and complexity, and proficiency in database query languages is pivotal for crafting effective queries. As coding assistants become more prevalent, there is a significant opportunity to enhance database query languages. The Kusto Query Language (KQL) is widely used to query large semi-structured data, such as logs, telemetry, and time series, on big data analytics platforms. This paper introduces NL2KQL, an innovative framework that uses large language models (LLMs) to convert natural language queries (NLQs) into KQL queries. The NL2KQL framework includes several key components: the Schema Refiner, which narrows the schema down to its most pertinent elements; the Few-shot Selector, which dynamically selects relevant examples from a few-shot dataset; and the Query Refiner, which repairs syntactic and semantic errors in KQL queries. Additionally, this study outlines a method for generating large datasets of synthetic NLQ-KQL pairs that are valid within a specific database context. To validate NL2KQL's performance, we use an array of online (based on query execution) and offline (based on query parsing) metrics.
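The abstract names NL2KQL's three components but not their internals. As a rough illustration only, here is a hypothetical Python sketch of how a Schema Refiner, Few-shot Selector, and Query Refiner could be chained around an LLM call. It assumes simple token-overlap scoring (a stand-in for whatever retrieval the paper actually uses), a stubbed LLM, and an invented `AppLogs` table and schema; it is not the authors' implementation.

```python
import re
from typing import Callable

Schema = dict[str, str]    # hypothetical: column name -> short description
Example = tuple[str, str]  # hypothetical: (natural-language question, KQL query)


def tokenize(text: str) -> set[str]:
    """Lower-cased alphanumeric tokens; a crude stand-in for embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def refine_schema(nlq: str, schema: Schema, k: int = 2) -> Schema:
    """Schema Refiner (sketch): keep the k columns whose name/description overlap the question most."""
    q = tokenize(nlq)
    ranked = sorted(schema.items(),
                    key=lambda kv: len(q & tokenize(f"{kv[0]} {kv[1]}")),
                    reverse=True)
    return dict(ranked[:k])


def select_few_shot(nlq: str, examples: list[Example], k: int = 1) -> list[Example]:
    """Few-shot Selector (sketch): rank stored examples by Jaccard similarity to the question."""
    q = tokenize(nlq)

    def similarity(example: Example) -> float:
        t = tokenize(example[0])
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    return sorted(examples, key=similarity, reverse=True)[:k]


def build_prompt(nlq: str, table: str, schema: Schema, shots: list[Example]) -> str:
    """Assemble the refined schema and selected examples into one LLM prompt."""
    columns = "\n".join(f"- {name}: {desc}" for name, desc in schema.items())
    demos = "\n".join(f"Q: {q}\nKQL: {kql}" for q, kql in shots)
    return f"Table {table} columns:\n{columns}\n\n{demos}\n\nQ: {nlq}\nKQL:"


def refine_query(raw: str, table: str) -> str:
    """Query Refiner (sketch): normalize pipe spacing and ensure the query starts from the table.
    Only a toy syntactic repair; it ignores pipes inside string literals."""
    query = re.sub(r"\s*\|\s*", " | ", raw.strip())
    if query.split("|")[0].strip() != table:
        query = f"{table} | {query}"
    return query


def nl2kql(nlq: str, table: str, schema: Schema, examples: list[Example],
           llm: Callable[[str], str]) -> str:
    """Refine schema, pick few-shot examples, call the LLM, then repair its output."""
    prompt = build_prompt(nlq, table, refine_schema(nlq, schema),
                          select_few_shot(nlq, examples))
    return refine_query(llm(prompt), table)


# Hypothetical table schema and few-shot dataset, for illustration only.
schema: Schema = {
    "Timestamp": "time the log line was written",
    "Level": "severity level such as Error or Warning",
    "Message": "free-text log message",
    "Host": "machine name",
}
examples: list[Example] = [
    ("how many warning logs", "AppLogs | where Level == 'Warning' | count"),
    ("list hosts", "AppLogs | distinct Host"),
]


def stub_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call; returns a slightly malformed query.
    return "where Level == 'Error' |count"


if __name__ == "__main__":
    print(nl2kql("how many error logs by severity level", "AppLogs", schema, examples, stub_llm))
    # → AppLogs | where Level == 'Error' | count
```

In a real system each stage would likely be far richer (for example, embedding-based retrieval and parser-driven repair); the point here is only the order in which the three named components act on a query.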

---
Data Engineer Things, founded by Xinran Waibel, is a global online data engineering community for data professionals to connect and learn. Join us to grow together!

Slack: http://join.det.life
Medium: https://blog.det.life
YouTube channel: https://www.youtube.com/@data-engineer-things/streams
Newsletter: https://dataengineerthings.substack.com/
---

By attending this meetup, you agree to abide by our Code of Conduct (https://docs.google.com/document/d/1vcvPwERVPx_RqFHNWgBqEZBufnp-7crSwAy1TvukAE0/edit?usp=sharing). Failure to comply with our CoC may result in removal from current and future DET events.

Special thanks to Datastrato (https://datastrato.ai/) for sponsoring this meetup.

Data Engineer Things Bay Area Meetup