imperial.ac.uk: Jeff Cash

Research Interests:

General research interests: numerical analysis
Particular research interests: numerical solution of ODEs (boundary value problems and initial value problems) and geometric integration.
Click here for the book Solving Differential Equations in R by Karline Soetaert, Jeff Cash and Francesca Mazzia.
Click here to see a list, containing our book, of notable computing books and articles of 2012.
Click here for the BVP software page.
Click here for the IVP software page.
Click here for IVP software – BSD licenced.
Click here for the geometric integration software page. Contains MATLAB and Fortran 90 codes.
Click here for MATLAB Software for Initial Value Problems.
Click here for the Non-Stiff Equations software page. This gives the Cash-Karp Runge-Kutta code in Fortran, MATLAB and C.
Click here for the OdePkg software.
Click here for the Fortran 95 version of the IVP software MEBDFI.f.

lotsofcores.com: High Performance Programming for Intel Xeon Phi Coprocessors | Intel Xeon Phi Coprocessor High Performance Programming

For the third time in a row on the biannual Top500 list, the world’s fastest computer uses Intel Xeon Phi coprocessors. Intel Xeon Phi coprocessors are used in the #1, #7, #15, #39, #50, #51, #65, #92, #101, #102, #103, #134, #157, #186, #235, #251 and #451 systems.

Explicit Vectorization: a talk about the need for this, given at SGIUG on April 30, 2014

cuda – Converting Octave to Use CuBLAS – Stack Overflow

I was able to produce a compiled executable using the information supplied. It’s a horrible hack, but it works. The process looks like this:

First, produce an object file for fortran_thunking.c:

    sudo /usr/local/cuda-5.0/bin/nvcc -O3 -c -DCUBLAS_GFORTRAN fortran_thunking.c

Then copy that object file to the src subdirectory in octave:

    cp /usr/local/cuda-5.0/src/fortran_thunking.o ./octave/src

Run make. The compile will fail on the last step. Change to the src directory:

    cd src

Then execute the failing final line with the addition of ./fortran_thunking.o -lcudart -lcublas just after octave-main.o. This produces the following command:

    g++ -I/usr/include/freetype2 -Wall -W -Wshadow -Wold-style-cast -Wformat -Wpointer-arith -Wwrite-strings -Wcast-align -Wcast-qual -I/usr/local/cuda/include -o .libs/octave octave-main.o ./fortran_thunking.o -lcudart -lcublas -L/usr/local/cuda/lib64 ../libgui/.libs/liboctgui.so ../libinterp/.libs/liboctinterp.so ../liboctave/.libs/

High-Precision Software Directory

This web site (see software package links below) contains the LBNL double-double precision, quad-double precision and arbitrary precision (also termed "multiprecision" or "multiple precision") software, which was written over a period of several years by David H. Bailey (LBNL), Yozo Hida (U.C. Berkeley), Xiaoye S. Li (LBNL) and Brandon Thompson (formerly of U.C. Berkeley, now at Cadence). Some additional application programs were provided by Karthik Jeyabalan (formerly at LBNL, now at Cornell), and some revised versions have been provided by Alex Kaiser (U.C. Berkeley).

Dongarra+Hinds: Unrolling Loops in FORTRAN

The technique of ‘unrolling’ to improve the performance of short program loops without resorting to assembly language coding is discussed. A comparison of the benefits of loop ‘unrolling’ on a variety of computers using an assortment of FORTRAN compilers is presented.

KEY WORDS: Unrolled loops, FORTRAN, Loop efficiency, Loop doubling

INTRODUCTION

It is frequently observed that the bulk of the central processor time for a program is localized in 3 per cent of the source code [6]. Often the critical code from the timing perspective consists of one (or a few) short inner loops typified, for instance, by the scalar product of two vectors. A simple technique for the optimization of such loops, with consequent improvement in overall execution time, should then be most welcome. ‘Loop unrolling’ (a generalization of ‘loop doubling’), applied selectively to time-consuming loops, is just such a technique.
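
As a rough illustration of the idea, the scalar product mentioned above can be written once as a plain loop and once unrolled. This is a minimal sketch in C rather than the FORTRAN of the paper; the function names `dot_simple` and `dot_unrolled4` and the unrolling depth of 4 are assumptions chosen for the example, not taken from the source.

```c
#include <assert.h>
#include <stddef.h>

/* Reference version: one multiply-add per loop iteration. */
double dot_simple(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Unrolled to depth 4 (an arbitrary depth picked for illustration).
   A clean-up loop first handles the n % 4 leftover elements; the main
   loop then performs four multiply-adds per pass, so the loop test and
   index update execute roughly a quarter as often. */
double dot_unrolled4(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    double sum = 0.0;
    size_t m = n % 4;   /* number of leftover elements */
    size_t i;

    for (i = 0; i < m; i++)
        sum += x[i] * y[i];

    /* n - m is a multiple of 4, so i + 3 never runs past the end. */
    for (i = m; i < n; i += 4)
        sum += x[i] * y[i] + x[i + 1] * y[i + 1]
             + x[i + 2] * y[i + 2] + x[i + 3] * y[i + 3];

    return sum;
}
```

Both functions compute the same mathematical result, though the unrolled version adds terms in a different order, so floating-point results may differ in the last bits for general inputs. Many current compilers can apply this transformation themselves when optimization is enabled, so any benefit from unrolling by hand should be measured rather than assumed.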