

What we’re about
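Welcome to the IBM-sponsored Big Data community. We aim to give Big Data developers, data scientists, and all Big Data enthusiasts a chance to get hands-on experience with our Big Data solutions and tools.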
To see all meetups in this group: https://www.meetup.com/pro/ibm-community/
This is an IBM-sponsored Meetup group geared towards developers, data scientists, data engineers, and all Big Data, Cloud and AI enthusiasts. Our meetups provide an opportunity to work hands-on with the solutions and tools in our Big Data portfolio and to interact and share knowledge with experts at IBM and in our extended community.
Our Meetups typically include a 45-60 min (max) presentation that serves as an introduction and overview for a specific Big Data technology. It is followed by ~3 hours to collaborate with fellow developers and apply your Big Data skills.
We provide a cloud environment that you can run through the browser on your laptop at no cost to you.
Our meetups are free.
Meetup topics include, but are not limited to:
- Hadoop-based analytics
- Stream Computing
- Text Analytics
- Visualization and Discovery tools for Big Data
- Big Data App Development
- Deep dives into the technologies that make Big Data processing possible
- Big Data industry solution case studies
- Anything and everything about Big Data
Join us today and enjoy a hands-on software development and application experience.
Upcoming events (2)
[AI Alliance] Chat with your website using an LLM
Network event: 161 attendees from 110 groups hosting. Link visible for attendees.
Abstract
Imagine being able to ask questions about a website in natural language and receiving meaningful answers instead of simple keyword matches. In this talk, I’ll introduce Allycat, an open-source, end-to-end stack that enables conversational interaction with website content using Large Language Models (LLMs).
We’ll walk through the complete pipeline:
- Crawling and indexing website content
- Cleaning and extracting meaningful information from HTML
- Creating embeddings and storing them in a vector database
- Querying the data using an LLM for contextual, accurate responses
We’ll also demonstrate Allycat’s lightweight UI that allows users to interactively test their queries. The entire stack is built with Python and open-source components, making it easy to adopt, adapt, and extend.
You can check out Allycat here: https://github.com/The-AI-Alliance/allycat
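For readers who want a feel for the pipeline in the abstract, here is a minimal sketch in plain Python. It is not Allycat's code and does not use its API. Everything in it is an assumption made for illustration: the `PAGES` dictionary stands in for crawled content, a word-count vector stands in for a real embedding model, an in-memory list stands in for a vector database, and the final step only builds the prompt that would be sent to an LLM rather than calling one.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Cleaning step: reduce an HTML page to its visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def embed(text):
    """Toy embedding: sparse word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (url, text, vector)

    def add(self, url, text):
        self.items.append((url, text, embed(text)))

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[2]), reverse=True)
        return ranked[:k]


def build_prompt(question, hits):
    """Query step: assemble retrieved context into a prompt for an LLM."""
    context = "\n".join(f"[{url}] {text}" for url, text, _ in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical "crawled" pages; a real crawler would fetch these over HTTP.
PAGES = {
    "https://example.com/pricing": (
        "<html><head><style>p{color:red}</style></head><body>"
        "<h1>Pricing</h1><p>The starter plan costs 10 dollars per month.</p>"
        "</body></html>"
    ),
    "https://example.com/about": (
        "<html><body><h1>About</h1><p>We build tools for data engineers.</p>"
        "<script>track()</script></body></html>"
    ),
}

store = VectorStore()
for url, html in PAGES.items():
    store.add(url, html_to_text(html))

question = "How much does the starter plan cost?"
hits = store.search(question, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

In a real deployment each stand-in would be swapped for a proper component (a crawler, an embedding model, a vector database, and an LLM call), but the flow of data between the four steps stays the same.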
Audience
AI/ML Engineers, Data Engineers, Data Scientists interested in building intelligent, LLM-powered search and chatbot interfaces.
Level
Beginner to Intermediate
Format
45-minute presentation with demonstration
About the speaker
Sujee Maniyam (AI Engineer, Developer Advocate @ Node51) is an expert in Generative AI, Machine Learning, Deep Learning, Big Data, Distributed Systems, and Cloud technologies. He is passionate about developer education and fostering community engagement. Sujee has led numerous training sessions, hackathons, and workshops. He is also an author, open source contributor, and frequent speaker at conferences and meetups.
About the AI Alliance
The AI Alliance is an international community of researchers, developers and organizational leaders committed to supporting and enhancing open innovation across the AI technology landscape to accelerate progress, improve safety, security and trust in AI, and maximize benefits to people and society everywhere. Members of the AI Alliance believe that open innovation is essential to develop and achieve safe and responsible AI that benefits society rather than a select few big players.
[AI Alliance] GneissWeb: Preparing High Quality Data for LLMs at Scale
Network event: 160 attendees from 111 groups hosting. Link visible for attendees.
Details
IBM recently released GneissWeb, a large dataset yielding around 10 trillion tokens that caters to the data quality and quantity requirements of training Large Language Models. In this talk I will do a deep dive on the philosophy behind this dataset, where it stands relative to other datasets, how to recreate it using the tools IBM has open sourced, and some performance figures obtained with it. This talk is a follow-up to the talk given by Shahrokh Daijavad of IBM in March.
Prerequisites
This is a follow-up to our March 6, 2025 session “Introducing GneissWeb - a state-of-the-art LLM pre-training dataset”:
- Check the GitHub show notes
- Re-watch on YouTube
About the presenter
Bishwaranjan Bhattacharjee (LinkedIn), Senior Technical Staff Member and Master Inventor, IBM Research
About the AI Alliance
The AI Alliance is an international community of researchers, developers and organizational leaders committed to supporting and enhancing open innovation across the AI technology landscape to accelerate progress, improve safety, security and trust in AI, and maximize benefits to people and society everywhere. Members of the AI Alliance believe that open innovation is essential to develop and achieve safe and responsible AI that benefits society rather than a select few big players.
Past events (194)
[AI Alliance] Introducing gofannon
Network event: 251 attendees from 109 groups hosting. This event has passed.