
What we’re about
A group for experienced and aspiring data professionals.
Join our Slack: https://datatalks.club/slack.html
Upcoming events
Data Engineering Zoomcamp 2026 Course Launch (Online)
Alexey Grigorev, the course creator, will officially start the new cohort of the Data Engineering Zoomcamp in this live session. He’ll walk you through the course structure, key topics, and what you’ll build.
What You’ll Learn During the Session
Alexey will walk you through:
- What’s included in the course: topics, tools, and hands-on projects
- How assignments, feedback, and scoring work
- How to approach the material, even if you’re new to data engineering
- What it’s like to learn together with thousands of learners in the DataTalks.Club community
You’ll also have a chance to ask Alexey your questions live.
Thinking About Data Engineering Zoomcamp?
Data Engineering Zoomcamp is a free 9-week course covering infrastructure setup, workflow orchestration, data warehousing, analytics, batch processing, and streaming. The last three weeks focus on a capstone project in which you'll build an end-to-end data pipeline using a dataset of your choice, demonstrating data lake and warehouse solutions with documentation. Projects are peer-reviewed by fellow participants.
The new cohort of the Data Engineering Zoomcamp starts on January 12, 2026. You can join by registering here.
About the Speaker
Alexey Grigorev is the Founder of DataTalks.Club and creator of the Zoomcamp series.
Alexey is a seasoned software and ML engineer with over 10 years of engineering experience and 6+ years in machine learning. He has deployed large-scale ML systems at companies like OLX Group and Simplaex, authored several technical books, including Machine Learning Bookcamp, and is a Kaggle Master with a 1st place finish in the NIPS'17 Criteo Challenge.
Join our Slack: https://datatalks.club/slack.html
47 attendees
How to Reduce LLM Hallucinations with Wikidata: Hands-On Fact-Checking Using MCP (Online)
LLMs are powerful, but they still hallucinate facts, especially when asked about entities, relationships, or claims that require up-to-date or structured knowledge.
In this hands-on workshop, we'll explore how to use Wikidata as a grounding and fact-checking layer for LLMs to reduce hallucinations and make AI systems more reliable.
We'll start with a short introduction to Wikidata and then set up the Wikidata MCP so an LLM can retrieve and verify facts rather than relying solely on its internal memory. This already provides a practical way to ground LLM outputs in verifiable data.
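To make "retrieve and verify facts" concrete, here is a minimal sketch that checks a single claim directly against the public Wikidata Query Service SPARQL endpoint. Note that it swaps in plain SPARQL for the Wikidata MCP used in the workshop. The helper names are hypothetical, the User-Agent string is a placeholder, and the example claim (Douglas Adams, Q42, instance of (P31) human (Q5)) is only illustrative.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service (SPARQL) endpoint; this is NOT the Wikidata MCP server.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_ask_query(subject_qid: str, property_pid: str, object_qid: str) -> str:
    """Hypothetical helper: SPARQL ASK query that is true if the direct statement exists."""
    # The wd: and wdt: prefixes are predefined on the Wikidata Query Service.
    # wdt: matches only "truthy" (best-rank) statements.
    return f"ASK {{ wd:{subject_qid} wdt:{property_pid} wd:{object_qid} . }}"


def build_request_url(query: str) -> str:
    """Hypothetical helper: URL-encode the query and request JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{WDQS_ENDPOINT}?{params}"


def statement_exists(subject_qid: str, property_pid: str, object_qid: str) -> bool:
    """Send the ASK query (requires network access) and return its boolean result."""
    url = build_request_url(build_ask_query(subject_qid, property_pid, object_qid))
    # Wikimedia asks clients to send a descriptive User-Agent; this one is a placeholder.
    req = urllib.request.Request(url, headers={"User-Agent": "fact-check-sketch/0.1 (example)"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        # SPARQL JSON results for ASK queries carry a top-level "boolean" field.
        return bool(json.load(resp)["boolean"])


if __name__ == "__main__":
    # Example claim: "Douglas Adams (Q42) is an instance of (P31) human (Q5)."
    print(build_ask_query("Q42", "P31", "Q5"))
    print(build_request_url(build_ask_query("Q42", "P31", "Q5")))
```

An ASK query only answers yes or no for an exact statement; an MCP-based setup lets the LLM choose which entities and properties to look up, which is what the workshop explores.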
From there, we’ll go beyond LLM-only approaches and build a small experimental fact-checking pipeline. The system combines semantic retrieval, LLM-based reranking, and natural language inference (NLI) to validate claims against evidence in a more controlled and interpretable way.
This workshop focuses on evidence-driven verification pipelines that make an LLM's reasoning steps explicit and easier to inspect, debug, and improve.
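The three-stage pipeline described above (semantic retrieval, reranking, NLI) can be sketched as follows. This is a hypothetical illustration, not the workshop's code: the bag-of-words "embedding", the order-preserving reranker, and `toy_nli` are placeholders for a real embedding model, an LLM reranker, and an NLI model, and the statements are assumed to be Wikidata statements already rendered as plain text.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple

# Pluggable stages. In a real pipeline these would wrap an embedding model,
# an LLM reranker and an NLI model; here they are hypothetical placeholders.
Embed = Callable[[str], Counter]
Rerank = Callable[[str, List[str]], List[str]]  # (claim, candidates) -> reordered candidates
NLI = Callable[[str, str], str]  # (premise, hypothesis) -> "entailment" | "contradiction" | "neutral"


def bow_embed(text: str) -> Counter:
    """Toy bag-of-words vector, standing in for a sentence-embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_nli(premise: str, hypothesis: str) -> str:
    """Placeholder NLI: 'entailment' if every hypothesis token appears in the premise."""
    return "entailment" if set(bow_embed(hypothesis)) <= set(bow_embed(premise)) else "neutral"


def retrieve(claim: str, statements: List[str], embed: Embed, k: int) -> List[str]:
    """Stage 1: keep the k statements most similar to the claim."""
    query = embed(claim)
    return sorted(statements, key=lambda s: cosine(query, embed(s)), reverse=True)[:k]


def verify(
    claim: str,
    statements: List[str],
    embed: Embed,
    rerank: Rerank,
    nli: NLI,
    k: int = 5,
    top_n: int = 2,
) -> Tuple[str, List[Tuple[str, str]]]:
    """Run retrieval -> reranking -> NLI; return a verdict plus the evidence trail."""
    candidates = retrieve(claim, statements, embed, k)
    evidence = rerank(claim, candidates)[:top_n]  # Stage 2: reranking
    # Stage 3: NLI with each piece of evidence as premise and the claim as hypothesis.
    trail = [(premise, nli(premise, claim)) for premise in evidence]
    labels = {label for _, label in trail}
    if "entailment" in labels:
        verdict = "SUPPORTED"
    elif "contradiction" in labels:
        verdict = "REFUTED"
    else:
        verdict = "NOT ENOUGH INFO"
    return verdict, trail


if __name__ == "__main__":
    statements = [
        "Douglas Adams instance of human",
        "Douglas Adams place of birth Cambridge",
        "Paris capital of France",
    ]
    keep_order = lambda claim, candidates: candidates  # placeholder for an LLM reranker
    print(verify("Douglas Adams place of birth Cambridge", statements, bow_embed, keep_order, toy_nli, k=2))
```

Keeping each stage a plain function means the retrieved candidates, the reranked evidence, and the per-evidence NLI labels can each be inspected on their own. The aggregation rule (any entailment wins, then any contradiction) is just one simple choice.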
What we'll cover:
- Wikidata as a structured source for factual verification
- Setting up and querying Wikidata using MCP
- Verifying claims with MCP + an LLM
- Moving beyond pure GenAI to evidence-based fact-checking
- Finding relevant Wikidata statements with semantic search
- Ranking candidate evidence with an LLM
- Verifying claims using an NLI model
What you'll leave with
By the end of the workshop, you'll be able to:
- Ground LLM outputs in structured data to reduce hallucinations
- Understand when LLM-only fact-checking is not enough
- Build a small, transparent fact-checking pipeline you can adapt to real projects
About the speaker:
Philippe Saadé is the AI/ML project manager at Wikimedia Deutschland. His current work focuses on making Wikidata accessible to AI applications, with projects like the Wikidata vector database and the Wikidata Model Context Protocol.
Join our Slack: https://datatalks.club/slack.html
This event is sponsored by Wikimedia.
40 attendees



