Openwall HPC Village

HPC Village from Openwall is an opportunity for HPC (High Performance Computing) hobbyists to program for a heterogeneous (hybrid) HPC platform. Participants are provided with remote access (via the SSH protocol) to a server with multi-core CPUs and HPC accelerator cards of different kinds: Intel MIC (Xeon Phi), AMD GPUs and NVIDIA GPUs. Pre-installed and configured drivers and development tools (SDKs) are provided as well.

Within one machine, we provide access to all four of the mentioned types of computing devices (multi-core CPU, Intel MIC, AMD GPU, NVIDIA GPU), including OpenCL support for all of them, as well as support for development tools and usage models specific to some of them (OpenMP on CPU, OpenMP offload from CPU to MIC, CUDA on NVIDIA GPU). Although it is uncommon in real-world HPC setups to use more than two types of computing devices within one node, such a configuration is convenient for getting acquainted with the different technologies, for trying them out and comparing them on specific tasks, and for developing portable software (including debugging and optimization).

Hardware

The current hardware configuration is as follows:

Supermicro GPU SuperWorkstation 7047GR-TPRF workstation/server platform with MCP-290-00059-0B rackmount rail set
4U chassis
Two 1620W PSUs 1)
Dual-socket LGA 2011 motherboard with IPMI, 16 memory slots, four PCIe 3.0 x16 slots for full-length dual-width PCIe cards, and a fifth slot for a shorter card
A full set of cooling fans, including those pulling hot air out of passively cooled accelerator cards
Two 8-core Intel Xeon E5-2670 CPUs
Sandy Bridge-EP microarchitecture, with support for AVX and AES-NI
A total of 16 CPU cores seen as 32 logical CPUs (two hardware threads per core), at a clock rate of at least 2.6 GHz
Turbo Boost up to 3.0 GHz with all cores in use, or up to 3.3 GHz with few cores in use
128 GB DDR3-1600 ECC RAM
8x 16 GB DDR3-1600 ECC Registered modules on 8 channels (4 channels per CPU)
Theoretical peak bandwidth of 102.4 GB/s; measured bandwidth of about 85 GB/s (aggregate across 32 threads)
Intel Xeon Phi 5110P coprocessor module
Intel Many Integrated Core (MIC) architecture, Knights Corner
60 cores (x86-ish with 512-bit SIMD units) seen as 240 logical CPUs (four hardware threads per core), 1053 MHz, 8 GB GDDR5 ECC RAM on a 512-bit bus, 320 GB/s
Peak performance of about 2 TFLOPS single-precision, 1 TFLOPS double-precision
AMD Radeon HD 7990 gaming graphics card
AMD GCN architecture
Two “Tahiti” GPUs, which provide 2×2048 SPs, 6 GB GDDR5 RAM on two 384-bit buses, 576 GB/s
Custom core clock rates: 501 MHz for GPU0 (heavily underclocked), 997.5 MHz to 1050 MHz for GPU1 (almost the same as the HD 7970 GHz Edition) 2)
Peak performance of over 6 TFLOPS single-precision, about 1.5 TFLOPS double-precision
This is a budget replacement for the FirePro S10000 GPU card intended for servers (which would cost at least 3 times more, but would offer ECC RAM)
NVIDIA GTX TITAN gaming graphics card (Zotac GeForce GTX TITAN AMP! Edition)
NVIDIA Kepler architecture
One GK110 GPU with 2688 SPs at 902 MHz to 954 MHz in single-precision mode, 6 GB GDDR5 RAM on a 384-bit bus, 317.2 GB/s
Peak performance of over 5 TFLOPS single-precision, and from 1.3 to 1.5 TFLOPS double-precision in double-precision mode
This is a budget replacement for the Tesla K20X GPU card intended for workstations and servers (which would cost at least 3 times more and would run considerably slower on single-precision and integer code, but would offer ECC RAM)
NVIDIA GTX Titan X gaming graphics card (reference design, manufactured by Gigabyte)
NVIDIA Maxwell architecture
One GM200 GPU with 3072 SPs at 1000 MHz to 1076 MHz, 12 GB GDDR5 RAM on a 384-bit bus, 336 GB/s
Peak performance of over 6 TFLOPS single-precision, 0.2 TFLOPS double-precision
AMD Radeon HD 5750/6750 gaming graphics card marketed as “PowerColor Radeon HD 6770 Green Edition (AX6770 1GBD5-HV4)”, roughly one half of an HD 5850
AMD TeraScale 2 (VLIW5) architecture
One Juniper PRO GPU with 720 SPs at 700 MHz, 1 GB GDDR5 RAM on a 128-bit bus, 73.6 GB/s
A short card that fits into this motherboard’s fifth dual-width PCIe slot
Not a high-performance card, but usable for testing and benchmarking on the old VLIW5 architecture, for example to avoid performance regressions for users with older cards like this one (the HD 5000 series and the HD 6000 series up to and including the HD 6870)
Peak performance of over 1 TFLOPS single-precision
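Every memory bandwidth figure in the list follows from one formula: total bus width in bytes times the effective transfer rate. The Python sketch below reproduces them. The transfer rates are our assumptions (commonly published rates for these parts, or back-calculated from the bandwidths listed above), not values queried from the hardware.

```python
# Peak bandwidth (GB/s) = total bus width in bytes x effective transfer rate (GT/s).
# ASSUMPTION: transfer rates are published/back-calculated, not read from hardware.

MEMORIES = {
    # name: (total bus width in bits, effective transfer rate in GT/s)
    "Host DDR3-1600 (8 channels x 64 bits)": (8 * 64, 1.6),
    "Xeon Phi 5110P GDDR5": (512, 5.0),
    "Radeon HD 7990 GDDR5 (2 x 384 bits)": (2 * 384, 6.0),
    "GTX TITAN AMP! Edition GDDR5": (384, 6.608),
    "GTX Titan X GDDR5": (384, 7.0),
    "Juniper PRO GDDR5": (128, 4.6),
}


def peak_bandwidth_gbs(bus_width_bits: int, rate_gtps: float) -> float:
    """Theoretical peak bandwidth in GB/s (10**9 bytes per second)."""
    return bus_width_bits / 8 * rate_gtps


if __name__ == "__main__":
    for name, (bits, rate) in MEMORIES.items():
        print(f"{name}: {peak_bandwidth_gbs(bits, rate):.1f} GB/s")
    host_peak = peak_bandwidth_gbs(8 * 64, 1.6)
    # Measured figure from the list above: ~85 GB/s with 32 threads
    print(f"Measured host bandwidth: {85 / host_peak:.0%} of peak")
```

Running it also shows that the measured ~85 GB/s host figure is roughly 83% of the theoretical DDR3 peak.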
Total peak performance is over 20 TFLOPS single-precision, about 4 TFLOPS double-precision.
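The per-device peaks and the total can be estimated as execution units × clock × FLOPs per unit per cycle. The Python sketch below is a rough estimate under stated assumptions, not a measurement: 2 SP FLOPs per cycle per GPU SP (one multiply-add), 16 per Sandy Bridge core (one 8-wide AVX add plus one 8-wide multiply) at the 2.6 GHz base clock, and 32 per Xeon Phi core (16-wide FMA). The DP ratios are commonly cited architectural rates, and the GTX TITAN DP figure is simply the lower bound quoted above, since its DP mode lowers clocks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    name: str
    units: int          # CPU/MIC cores, or GPU stream processors (SPs)
    clock_ghz: float    # clock assumed for the estimate
    sp_per_cycle: int   # ASSUMED single-precision FLOPs per unit per cycle
    dp_ratio: float     # ASSUMED DP throughput as a fraction of SP
    dp_override_tflops: Optional[float] = None  # use a quoted DP figure instead

    def sp_tflops(self) -> float:
        # units x GHz x FLOPs/cycle gives GFLOPS; divide by 1000 for TFLOPS
        return self.units * self.clock_ghz * self.sp_per_cycle / 1000

    def dp_tflops(self) -> float:
        if self.dp_override_tflops is not None:
            return self.dp_override_tflops
        return self.sp_tflops() * self.dp_ratio


DEVICES = [
    # 2 x 8 Sandy Bridge cores at base clock; AVX add + mul = 16 SP FLOPs/cycle
    Device("2x Xeon E5-2670", 16, 2.6, 16, 1 / 2),
    # 16-wide SP FMA per core = 32 SP FLOPs/cycle
    Device("Xeon Phi 5110P", 60, 1.053, 32, 1 / 2),
    # HD 7990: two Tahiti GPUs with different custom clocks, DP at 1/4 rate
    Device("HD 7990 GPU0", 2048, 0.501, 2, 1 / 4),
    Device("HD 7990 GPU1", 2048, 0.9975, 2, 1 / 4),
    # Boost clock; DP mode lowers clocks, so take the lower bound quoted above
    Device("GTX TITAN", 2688, 0.954, 2, 1 / 3, dp_override_tflops=1.3),
    # Maxwell GM200 runs DP at 1/32 rate
    Device("GTX Titan X", 3072, 1.0, 2, 1 / 32),
    # Juniper has no double-precision support
    Device("HD 5750/6750 (Juniper PRO)", 720, 0.7, 2, 0.0),
]


def total_sp() -> float:
    return sum(d.sp_tflops() for d in DEVICES)


def total_dp() -> float:
    return sum(d.dp_tflops() for d in DEVICES)


if __name__ == "__main__":
    for d in DEVICES:
        print(f"{d.name:28s} SP {d.sp_tflops():6.3f}  DP {d.dp_tflops():6.3f} TFLOPS")
    print(f"{'Total':28s} SP {total_sp():6.3f}  DP {total_dp():6.3f} TFLOPS")
```

With these assumptions the total comes to about 21.1 TFLOPS single-precision and 4.4 TFLOPS double-precision; the accelerators alone account for about 20.4 TFLOPS single-precision.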
