Details

Stan Srednyak will present "Why and How to Build a Distributed Search Engine" online on February 23.

Abstract: We describe the architecture of a distributed, decentralized search engine. Such a construction must overcome multiple challenges. They include building at web scale and processing petabytes of data in a coherent, controlled, and distributed way, and combating attacks led by adversarial participants who may try to disrupt the network protocols, overwhelm the nodes with spurious requests, manipulate the ranking system, create fraudulent advertisement transactions, etc. We explain in some detail how to build a system that overcomes these challenges.

In this system, the usual operations associated with web search (data ingestion, index construction, ranking, and query service) are performed by a set of nodes that interact according to a specified protocol and according to the roles (tied to data segments) assigned to them by a network of manager nodes. We perform a size, latency, and cost analysis of deploying such systems. The operations of the network are backed by blockchains. We discuss mechanisms to prevent node collusion and rank manipulation. These mechanisms rely on randomness servers that introduce randomness into the cluster construction; the randomness servers are the only trusted element of the system. The incentive for node maintainers to participate in this computation comes from an advertisement system, which operates similarly to conventional ones (including the price bidding mechanism) but allows for blockchain-backed distribution of funds to the owners of the nodes that participated in servicing the transaction.
We hope that this development will contribute to the creation of a decentralized internet of the web3 epoch. We will also discuss the creation of an ecosystem of personal and enterprise search bots for managing the data that individuals and enterprises generate in the course of their activity.

Time permitting, I will discuss architectures for training modern language models (Transformers, attention-based models, BERT, ...) in a distributed fashion.

Bio: Stan Srednyak is a mathematical physicist by education and by profession, working mostly in particle and nuclear physics and related areas of pure mathematics. In addition, he has worked for more than 10 years on various problems in Natural Language Processing, Computational Linguistics, Knowledge Graph Architecture, Information Retrieval, etc. He is a strong proponent of open source code, asynchronous decentralized development, and transparency. The blockchain revolution has generated new ideas showing that it is possible to decentralize businesses and other vertical human structures, including the web search industry and information handling at web scale, the topics he is currently working on.

--
By responding here, you acknowledge and consent to our Code of Conduct: We seek to provide a respectful, friendly, professional experience for everyone regardless of gender, sexual orientation, physical appearance, disability, age, race, and religion. We do not tolerate behavior that is harassing or degrading to any individual, in any form. Participants are responsible for knowing and abiding by these standards. We encourage all attendees to assist in creating a welcoming, safe, and respectful experience.

We are grateful for meetup support provided by Arria NLG, providing AI that transforms structured data into natural language; Basis Technology, building AI solutions for analyzing text, connecting data silos, and discovering digital evidence; Kensho, AI & machine learning driving essential intelligence; and John Snow Labs, publisher of Spark NLP, an open source text processing library for Python, Java, and Scala.
