Compiler benchmark: gcc clang pcc tcc
This is a collection of compiler benchmarks.
- Timed Apache Compilation 2.2.11
- Timed PHP Compilation 5.2.9
- Timed ImageMagick Compilation 6.6.3-4
- Apache Benchmark 2.2.11 — Static Web Page Serving
- 7-Zip Compression 9.04 — Compress Speed Test
- LAME MP3 Encoding 3.98.2 — WAV To MP3
- Gcrypt Library 1.4.4 — CAMELLIA256-ECB Cipher
- OpenSSL 1.0.0a — RSA 4096-bit Performance
- C-Ray 1.1 — Total Time
- Bullet Physics Engine 2.75 — Test: 3000 Fall
- Bullet Physics Engine 2.75 — Test: 1000 Stack
- Bullet Physics Engine 2.75 — Test: 1000 Convex
- Bullet Physics Engine 2.75 — Test: Convex Trimesh
- John The Ripper 188.8.131.52 — Test: Blowfish
- Timed HMMer Search 2.3.2 — Pfam Database Search
NVIDIA GeForce GTX 460 – The Hunter Introduces Itself – Tests …
Both models deliver high performance in the 200 US dollar range. The board partners evidently have a great deal of leeway with these graphics cards right from launch: NVIDIA is not sampling reference models, and the board partners pride themselves on showing their own designs and overclocked versions from the start. Today we can present two top models, from Gainward and MSI: the Gainward GeForce GTX 460 Golden Sample with 1024 MByte and the MSI N460GTX Cyclone with 768 MByte of memory. Both graphics cards are overclocked out of the box.
Andrew Corrigan: Porting Large Fortran Codebases to CUDA
A converter was developed to automatically port the code FEFLO to GPUs.
Full GPU performance
• Port ~1 million lines of code (~11,000 parallel loops)
• Continue development in Fortran using established coding practices.
• A single, unified codebase.
Using a Python script
• O(1000) line Python script based on FParser
• Developed in a few months.
• Generates an optimized, running code.
• Does much more than translate loops in isolation.
• Generates CUDA kernels from existing OpenMP and vector loops.
• Tracks array usage across the entire code.
• By far the most difficult task.
• Many other tasks.
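The idea of generating a CUDA kernel from an existing OpenMP loop can be sketched in a few lines of Python. This is a toy illustration only, not the FEFLO converter and not built on FParser: it uses plain regular expressions, the function name `loop_to_kernel` is hypothetical, and it assumes a single `do i = 1, n` loop whose body uses `i` only as an array index and whose arrays are all double precision.

```python
import re

def loop_to_kernel(fortran_loop, kernel_name="loop_kernel"):
    """Turn one simple 'do i = 1, n ... end do' loop into CUDA C source text."""
    lines = [ln.strip() for ln in fortran_loop.strip().splitlines()]
    # OpenMP directives only mark the loop as parallel; drop them.
    lines = [ln for ln in lines if ln and not ln.lower().startswith("!$omp")]
    if len(lines) < 2:
        raise ValueError("expected a loop header and a closing 'end do'")
    header = re.fullmatch(r"do\s+(\w+)\s*=\s*1\s*,\s*(\w+)", lines[0], re.IGNORECASE)
    closes = lines[-1].lower().replace(" ", "") == "enddo"
    if header is None or not closes:
        raise ValueError("only 'do i = 1, n ... end do' loops are handled")
    index, bound = header.groups()
    body = lines[1:-1]

    # Every name used as name(i) is assumed to be an array (toy assumption).
    ref = re.compile(rf"(\w+)\(\s*{index}\s*\)")
    arrays = sorted({name for stmt in body for name in ref.findall(stmt)})
    params = ", ".join([f"int {bound}"] + [f"double *{a}" for a in arrays])

    # Fortran a(i) with i = 1..n becomes C a[i] with i = 0..n-1; this shift is
    # only valid because i is used purely as an array index.
    replacement = r"\1[" + index + "]"
    statements = "\n".join("        " + ref.sub(replacement, s) + ";" for s in body)

    return (
        f"__global__ void {kernel_name}({params})\n"
        "{\n"
        f"    int {index} = blockIdx.x * blockDim.x + threadIdx.x;\n"
        f"    if ({index} < {bound}) {{\n"
        f"{statements}\n"
        "    }\n"
        "}\n"
    )

loop = """
!$omp parallel do
do i = 1, n
   c(i) = a(i) + 2.0 * b(i)
end do
"""
print(loop_to_kernel(loop, "scaled_add"))
```

The set of array names collected here is a very small stand-in for the array-usage tracking the talk describes as the hardest part. A real converter has to follow arrays across the whole code to decide what lives on the GPU, which a per-loop regex cannot do.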
CULA is a GPU-accelerated linear algebra library that utilizes the NVIDIA CUDA parallel computing architecture to dramatically improve the computation speed of sophisticated mathematics.
CULAtools™ is EM Photonics’ product family, comprising CULA™ Basic, Premium, and Commercial. CULA is our GPU-accelerated implementation of the LAPACK numerical linear algebra library, containing several of the most popular LAPACK functions. After developing accelerated linear algebra solvers for our clients since 2004, EM Photonics partnered with NASA Ames Research Center in 2007 to extend and unify these libraries into a single, GPU-accelerated package. Through a partnership with NVIDIA®, we focused on developing a commercially available implementation of accelerated LAPACK routines. Our primary goal is to help a wide range of users experience computational performance previously available only on supercomputers. By leveraging NVIDIA’s CUDA™ architecture, CULA provides users with linear algebra functions with unsurpassed performance.
swan – Multiscale Laboratory
What is it?
Swan is a small tool that aids the reversible conversion of existing CUDA codebases to OpenCL. It does several useful things:
Translates CUDA kernel source-code to OpenCL.
Provides a common API that abstracts both CUDA and OpenCL runtimes.
Preserves the convenience of the CUDA <<<...>>> kernel launch syntax by generating C source-code for kernel entry-point functions.
It is also useful for compiling and managing kernels written directly for OpenCL.
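To make the kernel translation step concrete, here is a minimal Python sketch of a textual CUDA-to-OpenCL rewrite. It is not Swan's implementation. The function name `cuda_to_opencl` and the substitution table are illustrative assumptions, and only one-dimensional built-ins, `__shared__`, and pointer arguments in global memory are covered.

```python
import re

# CUDA built-ins and their OpenCL C counterparts (x dimension only).
BUILTINS = {
    "threadIdx.x": "get_local_id(0)",
    "blockIdx.x": "get_group_id(0)",
    "blockDim.x": "get_local_size(0)",
    "gridDim.x": "get_num_groups(0)",
    "__shared__": "__local",
}

def cuda_to_opencl(src):
    """Rewrite a simple CUDA kernel's source text as OpenCL C."""
    def fix_signature(match):
        name, raw_params = match.group(1), match.group(2)
        params = [p.strip() for p in raw_params.split(",") if p.strip()]
        # OpenCL requires an address space on pointer arguments; this sketch
        # assumes every pointer refers to global memory.
        params = ["__global " + p if "*" in p else p for p in params]
        return "__kernel void {}({})".format(name, ", ".join(params))

    out = re.sub(r"__global__\s+void\s+(\w+)\s*\(([^)]*)\)", fix_signature, src)
    for cuda_name, opencl_name in BUILTINS.items():
        out = out.replace(cuda_name, opencl_name)
    return out

cuda_source = """__global__ void scale(float *x, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = a * x[i];
}"""
print(cuda_to_opencl(cuda_source))
```

Plain string substitution is enough for a kernel this small, but it would misfire on identifiers inside comments or strings, which is one reason a real tool works from parsed source rather than raw text.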
Why might you want it?
Possible uses include:
Evaluating OpenCL performance of an existing CUDA code.
Maintaining a dual-target OpenCL and CUDA code.
Reducing dependence on NVCC when compiling host code.
Supporting multiple CUDA compute capabilities in a single binary.
Serving as a runtime library for managing OpenCL kernels in new development.
David J. Hardy at Beckman Institute
Ph.D., Computer Science, University of Illinois at Urbana-Champaign, 2006
M.S., Computer Science, University of Missouri-Rolla, 1997
B.S., Mathematics and Computer Science, Truman State University, 1994
Numerical methods for molecular dynamics
GPU acceleration of molecular modeling applications
Software development for molecular dynamics
NAMD – Scalable Molecular Dynamics
NAMD, recipient of a 2002 Gordon Bell Award, is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of processors on high-end parallel platforms and tens of processors on commodity clusters using Gigabit Ethernet. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR. NAMD is distributed free of charge with source code. You can build NAMD yourself or download binaries for a wide variety of platforms. Our tutorials show you how to use NAMD and VMD for biomolecular modeling. The 2005 reference paper "Scalable molecular dynamics with NAMD" has over 1000 citations as of March 2010.
Klaus Schulten speaks on GPUs in molecular simulation
How does the H1N1 “Swine Flu” virus avoid drugs while attacking our cells? What can we learn about solar energy by studying biological photosynthesis? How do our cells read the genetic code? Computational biology is approaching a new and exciting frontier: the ability to simulate structures and processes in living cells. Come learn about the “computational microscope,” a new research instrument that scientists can use to simulate biomolecules at nearly infinite resolution. The computational microscope complements the most advanced physical microscopes to guide today’s biomedical research. This talk will introduce the computational microscope, showcase the widely used software underlying it, and highlight major discoveries made with its aid, ranging from viewing protein folding to translating the genetic code in cells and harvesting solar energy in photosynthesis.
The Landscape of Parallel Computing Research: A View from …
Old conventional wisdom:
Power is free, but transistors are expensive.
Multiply is slow, but load and store is fast.
We can reveal more instruction-level parallelism (ILP) via compilers and architecture innovation.
Don’t bother parallelizing your application, as you can just wait a little while and run it on a much faster sequential computer.
Uniprocessor performance doubles every 18 months.
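The arithmetic behind the last two points is worth spelling out: if performance really doubled every 18 months, simply waiting would compound quickly, to roughly 100x over a decade. A small Python sketch, where the function name `speedup` is just an illustrative choice:

```python
def speedup(years, doubling_months=18.0):
    """Growth factor after `years` if performance doubles every `doubling_months`."""
    return 2.0 ** (12.0 * years / doubling_months)

for years in (1.5, 3, 5, 10):
    print(f"{years:>4} years: {speedup(years):6.1f}x")
```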
High Performance Data Mining
I am an associate professor in the School of Information Sciences and Engineering, Graduate University of Chinese Academy of Sciences. I hold a joint position at the Fictitious Economy and Data Science Research Center, Chinese Academy of Sciences. I graduated from the Electrical and Computer Engineering Department of Northwestern University with my Ph.D. degree in 2005. My Ph.D. advisor was Prof. Alok Choudhary. My research interests are data mining, parallel computing, performance evaluation, stream mining, data modeling, databases and data warehousing.