Intel® Xeon Phi™ Coprocessors
What is the Intel® Xeon Phi™ coprocessor?
Intel® Xeon Phi™ coprocessors are PCI Express* form factor add-in cards that work synergistically with Intel® Xeon® processors to enable dramatic performance gains for highly parallel code—up to 1.2 double-precision teraFLOPS (floating point operations per second) per coprocessor.
Manufactured using Intel’s industry-leading 22nm technology with 3-D Tri-Gate transistors, each coprocessor features more cores, more threads, and wider vector execution units than an Intel Xeon processor. The high degree of parallelism compensates for the lower speed of each core to deliver higher aggregate performance for highly parallel workloads.
What applications can benefit from the Intel Xeon Phi coprocessor?
While a majority of applications (80 to 90 percent) will continue to achieve maximum performance on Intel Xeon processors, certain highly parallel applications will benefit dramatically by using Intel Xeon Phi coprocessors. To take full advantage of Intel Xeon Phi coprocessors, an application must scale well to over 100 software threads and either make extensive use of vectors or efficiently use more local memory bandwidth than is available on an Intel Xeon processor. Examples of segments with highly parallel applications include: animation, energy, finance, life sciences, manufacturing, medical, public sector, weather, and more. Learn more about Intel® Many Integrated Core Architecture (Intel® MIC Architecture) development.
Cilk Home Page | CilkPlus
Why use it? Intel® Cilk™ Plus is the easiest, quickest way to harness the power of both multicore and vector processing.
What is it? Intel Cilk Plus is an extension to the C and C++ languages to support data and task parallelism.
lotsofcores.com: High Performance Programming for Intel Xeon Phi Coprocessors | Intel Xeon Phi Coprocessor High Performance Programming
For the third time in a row, the world’s fastest computer on the biannual Top500 list uses Intel Xeon Phi coprocessors. Intel Xeon Phi coprocessors are used in the #1, #7, #15, #39, #50, #51, #65, #92, #101, #102, #103, #134, #157, #186, #235, #251 and #451 systems.
Explicit vectorization: a talk about the need for this, given at SGIUG on April 30, 2014
Homepage for James Demmel
Professor of Mathematics and Computer Science
I am a doctoral candidate in Computer Science. My advisor is Prof. James Demmel. I have a BS and an MS in Applied Mathematics and Physics from Moscow Institute of Physics and Technology.
Publications
2011. Alcantara et al. Building an Efficient Hash Table on the GPU, in GPU Computing Gems Jade Edition, 39–54.
2009. Datta et al. Auto-tuning the 27-point stencil for multicore, 4th International Workshop on Automatic Performance Tuning (iWAPT).
2008. Volkov and Demmel. Benchmarking GPUs to tune dense linear algebra, SC08.
2008. Datta et al. Stencil computation optimization and autotuning on state-of-the-art multicore architectures, SC08.
2008. Garland et al. Parallel computing experiences with CUDA, IEEE Micro 28, 4, 13–27.
2008. Volkov and Kazian. Fitting FFT onto the G80 Architecture, CS 258 final project report, University of California, Berkeley.
2008. Volkov and Demmel. LU, QR and Cholesky factorizations using vector capabilities of GPUs, Technical Report No. UCB/EECS-2008-49, EECS Dep…
Linux Kernel Modules Installation HOWTO
Compiler Speed-up
If your machine has 16 or more Megabytes of RAM, there is a useful speed-up available: permit the kernel to compile two or more modules in parallel. This increases the load on the machine while the kernel is being recompiled, but reduces the total compilation time. Before you use this method, check the amount of RAM present in your machine, because setting the value too high will actually slow the compilation down. Experience has shown that the optimum value depends on the amount of RAM in your system according to the following formula. It holds at least for systems with up to 32 Megabytes of RAM, although it may be a little conservative for systems with larger amounts of RAM:
N = [RAM in Megabytes] / 8 + 1
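The formula above can be sketched as a small shell helper. This is an illustration rather than part of the HOWTO: the function name `jobs_for_ram_mb` is hypothetical, reading `MemTotal` from `/proc/meminfo` is an assumption about how the RAM figure is obtained on a Linux system, and the use of integer division is also an assumption, since the HOWTO does not say how to round.

```shell
#!/bin/sh
# Hypothetical helper: apply N = [RAM in Megabytes] / 8 + 1.
# Assumption: integer division (the HOWTO does not specify rounding).
jobs_for_ram_mb() {
    ram_mb=$1
    echo $(( ram_mb / 8 + 1 ))
}

# Assumption: on Linux, the MemTotal line of /proc/meminfo reports RAM in kB.
if [ -r /proc/meminfo ]; then
    ram_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    if [ -n "$ram_kb" ]; then
        ram_mb=$(( ram_kb / 1024 ))
        echo "Suggested parallel jobs for ${ram_mb} MB of RAM: $(jobs_for_ram_mb "$ram_mb")"
    fi
fi
```

For a 16 Megabyte machine the helper gives 3, and for 32 Megabytes it gives 5. One way to use such a value is make's `-j` option, though on machines with far more RAM the formula is outside the range the HOWTO says it was derived for.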
How to compile & recompile Linux kernel in Ubuntu (generic way)
You need some applications to compile the Linux source code. You can install them with the following commands:

    sudo apt-get update
    sudo apt-get install build-essential initramfs-tools

If you have downloaded a compressed source archive such as .tar.bz2, extract it first. You can extract it anywhere; the source code does not need to be in any specific folder. To compile the Linux kernel for the first time, open a terminal and go to the source directory using "cd". The source directory should contain folders such as "arch", "block", "crypto", etc. Now type the following commands:

    mkdir ../linux-build
    yes "" | make --jobs=`getconf _NPROCESSORS_ONLN` O=../linux-build config
    make --jobs=`getconf _NPROCESSORS_ONLN` O=../linux-build
    sudo make --jobs=`getconf _NPROCESSORS_ONLN` O=../linux-build modules_install install
    cd /boot
    sudo mkinitramfs -o initrd.img-18.104.22.168 22.214.171.124
    sudo update-grub

This will take a lot of time.
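The repeated `getconf _NPROCESSORS_ONLN` call in the commands above can be factored into one variable. What follows is a minimal sketch, assuming a POSIX shell; the function name `detect_jobs`, the `JOBS` variable, and the fallback to a single job are illustrative choices, not part of the original tutorial.

```shell
#!/bin/sh
# Hypothetical helper: number of online processors, or 1 if it cannot be determined.
detect_jobs() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    # Fall back to 1 if the result is empty or not a plain number.
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    echo "$n"
}

JOBS=$(detect_jobs)
# Print the build command instead of running it, so the sketch is safe to try anywhere.
echo "make --jobs=$JOBS O=../linux-build"
```

Printing the command first makes it possible to check the job count before starting a long build.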