Come for Table discussions, Member Self-Intro, What's New, Application Showcase, and Advanced Application Development Techniques! Exchange ideas, meet experts, share code... all HPC & GPU, all practical, all cutting-edge.
6:15-6:50pm: General discussion (what's new and first-time attendee intros)
7:00-7:50pm: Streaming similarity search over one billion tweets (Dr. Narayanan Sundaram, Intel Research)
In recent years, identifying similar objects and finding nearest neighbors have become important database operations, with applications to text search, multimedia indexing, and many other areas. One popular algorithm for similarity search, especially for high-dimensional data (where spatial indexes like kd-trees do not perform well), is Locality Sensitive Hashing (LSH), an approximation algorithm for finding similar objects.
We show that on a similarity search workload over a dataset of more than 1 billion tweets, with hundreds of millions of new tweets per day, we can achieve query times of 1–2.5 ms. We show that this is an order of magnitude faster than existing indexing schemes, such as inverted indexes. To the best of our knowledge, this is the fastest implementation of LSH, with table construction up to 3.7× faster and queries 8.3× faster than a basic implementation.
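For attendees new to LSH, here is a minimal sketch of the general idea, not the speaker's implementation. It assumes the random-hyperplane variant for cosine similarity, which the abstract does not specify. The class name HyperplaneLSH and its parameters (num_tables, bits_per_table) are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


class HyperplaneLSH:
    """Toy random-hyperplane LSH index (illustrative, not the talk's code)."""

    def __init__(self, dim, num_tables=8, bits_per_table=12, seed=0):
        rng = random.Random(seed)
        # One independent set of random hyperplanes per hash table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)]
             for _ in range(bits_per_table)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]

    def _key(self, planes, vec):
        # Each bit records which side of one hyperplane the vector falls on.
        return tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0.0 for plane in planes
        )

    def insert(self, item_id, vec):
        # Store the item in one bucket per table.
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, vec)].append(item_id)

    def candidates(self, vec):
        # Union of the buckets the query lands in across all tables.
        found = set()
        for planes, table in zip(self.planes, self.tables):
            found.update(table.get(self._key(planes, vec), ()))
        return found
```

Vectors pointing in similar directions tend to share bucket keys, so a query only compares against the returned candidates rather than the whole dataset. A real system would then rank those candidates by exact similarity. More tables raise recall, and more bits per table shrink the buckets.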
Carnegie Mellon Silicon Valley
NASA Research Park Bldg 23
Mountain View, CA 94043
Directions to Carnegie Mellon Silicon Valley
Google Map showing parking, checkpoint, and building entrance
NOTE: You will need a government-issued ID (e.g. driver's license) to enter NASA Research Park.