Most modern processors are in denial about two critical aspects of machine organization: parallel execution and hierarchical memory. These processors present an illusion of sequential execution and uniform, flat memory. The evolution of these sequential, latency-optimized processors is at an end, and their performance is now increasing only slowly over time. In contrast, the performance of throughput-optimized processors, such as GPUs, continues to scale at historical rates. Throughput processors embrace, rather than deny, parallelism and the memory hierarchy to realize their performance and efficiency advantage over conventional processors. Throughput processors have hundreds of cores today and will have thousands of cores by 2015. They will deliver most of the performance, and most of the user value, in future computer systems.
This talk will discuss some of the challenges and opportunities in the architecture and programming of future throughput processors. In these processors, performance derives from parallelism, and efficiency derives from locality. Parallelism can take advantage of the plentiful and inexpensive arithmetic units in a throughput processor. Without locality, however, bandwidth quickly becomes a bottleneck. Communication bandwidth, not arithmetic, is the critical resource in a modern computing system; it dominates cost, performance, and power. This talk will discuss the exploitation of parallelism and locality with examples drawn from the Imagine and Merrimac projects, from NVIDIA GPUs, and from three generations of stream programming systems.
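The claim that bandwidth, not arithmetic, is the limiting resource can be illustrated with a back-of-the-envelope roofline estimate. The sketch below is not drawn from the talk; the hardware figures (peak throughput, memory bandwidth) and the arithmetic intensities are illustrative assumptions chosen to show how locality shifts a kernel from bandwidth-bound to arithmetic-bound.

```python
# Minimal roofline sketch, assuming illustrative hardware numbers:
# attainable performance is capped by either peak arithmetic throughput
# or by memory bandwidth times arithmetic intensity (flops per byte moved).

def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline model: min of the compute ceiling and the bandwidth ceiling."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)

# Hypothetical throughput processor: 1000 GFLOP/s peak, 100 GB/s DRAM bandwidth.
peak, bw = 1000.0, 100.0

# A streaming vector sum moves ~12 bytes per flop (two loads, one store of
# 4-byte floats), so without on-chip reuse it is bandwidth-bound:
low = attainable_gflops(peak, bw, 1 / 12)   # well under 10 GFLOP/s

# A blocked matrix multiply that reuses operands on chip raises the
# arithmetic intensity, and only then does arithmetic become the limit:
high = attainable_gflops(peak, bw, 32.0)    # hits the 1000 GFLOP/s ceiling
```

Under these assumed numbers, the plentiful arithmetic units sit mostly idle on the low-intensity kernel; exploiting locality is what lets a program approach the machine's peak.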