Past Meetup

High-Level Languages: GPU Join Processing and GPU Scripting and Code Generation

95 people went

Details

Come for Table discussions, Member Self-Intro, What's New, Application Showcase, and Advanced Application Development Techniques! Exchange ideas, meet experts, share code... all HPC & GPU, all practical, all cutting-edge.

Agenda:

General Discussion:

6:15-6:50pm: What’s new and first-time attendee intros

Main Program:

7:00-7:50pm: Let your GPU do the heavy lifting in your Data Warehouse (Dr. Rene Mueller, IBM Almaden)

8:00-8:30pm: GPU Scripting and Code Generation with PyCUDA (Dr. Bryan Catanzaro, NVIDIA)

Let your GPU do the heavy lifting in your Data Warehouse

While the abundance of compute resources has made graphics processors (GPUs) an attractive platform for supercomputing workloads, an analogous argument can be made from their superior memory bandwidth for data-intensive workloads such as data warehousing. However, making the GPU a first-class citizen in complex query processing environments, and in particular handling data sets orders of magnitude larger than GPU memory efficiently, requires careful consideration of data placement, layout, and movement.

In this talk I will present a GPU-based query processing prototype developed at IBM Research Almaden. At the heart of this GPU solution is an efficient implementation of the join and group-by operators. The bottleneck of hash joins, by far the most common join implementation, is their inherently random memory access pattern during hash table creation as well as during probing. Comparing CPU and GPU memory access performance, we find that GPUs provide an order of magnitude higher bandwidth to their local device memory.

This suggests that the GPU is a good candidate for speeding up hash operations when the hash tables can be placed in device memory. The largest hash tables observed for the Star Schema Benchmark, a derivative of TPC-H, at scale factor 1,000 with 750 GB of raw data were less than 500 MB and easily fit into device memory. I will elaborate on data placement between host and GPU, and on how the implementation is gradually improved until joins are processed at greater than 90% of the hardware efficiency of a PCI Express link. The result is a single-node solution able to execute queries that touch more than 100 GB of data in less than 30 seconds.
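To make the placement idea concrete, below is a minimal PyCUDA sketch, not the IBM prototype: a small hash table resides in GPU device memory, where its random reads are served at high bandwidth, while a much larger probe-side key column is streamed through the GPU in chunks over PCI Express. The hash function, table sizes, kernel name, and chunking scheme are illustrative assumptions only.

```python
# Minimal sketch (assumes PyCUDA and a CUDA-capable GPU): hash table resident
# in device memory, probe-side keys streamed through in chunks.
import numpy as np
import pycuda.autoinit                       # creates a CUDA context
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

mod = SourceModule(r"""
#define EMPTY 0xFFFFFFFFu

__device__ unsigned int hash_key(unsigned int k, unsigned int cap) {
    return (k * 2654435761u) % cap;          // Knuth multiplicative hash
}

// Probe an open-addressing hash table: every lookup is an essentially
// random read, which is why keeping the table in device memory pays off.
__global__ void probe(const unsigned int *keys, int n,
                      const unsigned int *tkeys, const unsigned int *tvals,
                      unsigned int cap, unsigned int *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    unsigned int k = keys[i], slot = hash_key(k, cap);
    while (true) {
        unsigned int tk = tkeys[slot];
        if (tk == k)     { out[i] = tvals[slot]; return; }
        if (tk == EMPTY) { out[i] = EMPTY;       return; }   // no match
        slot = (slot + 1) % cap;
    }
}
""")
probe = mod.get_function("probe")

# Build a toy hash table on the host and place it in device memory once.
cap = 1 << 20
tkeys = np.full(cap, 0xFFFFFFFF, dtype=np.uint32)
tvals = np.zeros(cap, dtype=np.uint32)
for k in range(1, 1000):
    slot = (k * 2654435761) % cap
    while tkeys[slot] != 0xFFFFFFFF:
        slot = (slot + 1) % cap
    tkeys[slot], tvals[slot] = k, k * 10
d_tkeys, d_tvals = cuda.to_device(tkeys), cuda.to_device(tvals)

# Stream the probe-side keys chunk by chunk (a stand-in for a fact table
# far larger than device memory in the real workload).
CHUNK = 1 << 18
d_keys = cuda.mem_alloc(CHUNK * 4)
d_out  = cuda.mem_alloc(CHUNK * 4)
out = np.empty(CHUNK, dtype=np.uint32)
for c in range(4):
    keys = np.random.randint(1, 2000, size=CHUNK).astype(np.uint32)
    cuda.memcpy_htod(d_keys, keys)                              # PCIe transfer in
    probe(d_keys, np.int32(CHUNK), d_tkeys, d_tvals, np.uint32(cap), d_out,
          block=(256, 1, 1), grid=(CHUNK // 256, 1))
    cuda.memcpy_dtoh(out, d_out)                                # results back
print(out[:8])
```

In a real system the transfers and kernels would be overlapped with streams and pinned buffers so the join runs at close to the PCI Express link's efficiency; this sketch only shows the basic placement of the two relations.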

Speaker Bio:

Rene Mueller is a Research Staff Member at IBM Research - Almaden in San Jose. His research interests are hardware/software co-design, hardware acceleration for high-performance data management using field-programmable gate arrays (FPGAs) and graphics processors, as well as NUMA systems. He received his PhD and MSc in Computer Science from ETH Zurich, where he worked on data stream processing in wireless sensor networks and data stream acceleration on FPGAs.

GPU Scripting and Code Generation with PyCUDA

High-level scripting languages are in many ways polar opposites of GPUs. GPUs are highly parallel, subject to hardware subtleties, and designed for maximum throughput, and they offer a tremendous advance in the performance achievable for a significant number of computational problems. Scripting languages such as Python, on the other hand, favor ease of use over computational speed and do not generally emphasize parallelism. PyCUDA is a package that attempts to join the two. This talk argues that in doing so, a programming environment is created that is greater than just the sum of its two parts.

Nearly all of this material applies in unmodified form to PyOpenCL, a sister project of PyCUDA whose goal is to realize the same concepts for OpenCL.
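As a taste of the workflow the talk describes, here is a minimal PyCUDA sketch (assuming PyCUDA, NumPy, and a CUDA-capable GPU are installed): CUDA C source embedded in a Python string is compiled at run time and launched on NumPy arrays. Because the kernel source is just a string, simple code generation, here baking a constant into the kernel, comes for free; the kernel name and scale factor are illustrative choices, not from the talk.

```python
import numpy as np
import pycuda.autoinit                      # initializes CUDA and makes a context
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

# Generate the kernel source at "script time": the scale factor is substituted
# into the CUDA C string before it is handed to the compiler.
scale = 2.5
mod = SourceModule("""
__global__ void scale_vec(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= %ff;
}
""" % scale)
scale_vec = mod.get_function("scale_vec")

x = np.random.randn(1 << 16).astype(np.float32)
expected = x * scale

# cuda.InOut copies the array to the GPU, runs the kernel, and copies it back.
scale_vec(cuda.InOut(x), np.int32(x.size),
          block=(256, 1, 1), grid=((x.size + 255) // 256, 1))
assert np.allclose(x, expected)
```

The equivalent PyOpenCL program, per the abstract above, would look nearly identical.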

Speaker Bio:

Dr. Bryan Catanzaro is a research scientist at NVIDIA. His work focuses on tools and programming methodologies for parallel processors, especially GPUs. He recently earned his PhD from UC Berkeley, where he built the Copperhead language and compiler under the direction of Kurt Keutzer. He created the GPUSVM and Damascene libraries for Support Vector Machine training and high-quality image contour detection, and has written several articles on OpenCL.

Location:

Open Space
Carnegie Mellon Silicon Valley
NASA Research Park Bldg 23
Mountain View, CA 94043

Directions (http://www.cmu.edu/silicon-valley/about-us/directions.html) to Carnegie Mellon Silicon Valley

Google Map (http://maps.google.com/maps/ms?gl=us&hl=en&ie=UTF8&msa=0&ll=37.410941,-122.063169&spn=0.019191,0.048923&t=h&z=15&msid=215438781255871976989.00049cacf6f0e5596e5cc) showing parking, check point, and building entrance

NOTE: You will need a government-issued ID (e.g., a driver's license) to enter NASA Research Park.