lrz.de: SuperMUC Tuning

Tuning is the last step of software development, and a fundamental one. Improving the performance and efficiency of parallel and serial programs is not an easy task, however. Specialized and powerful tools are available to help you analyze and resolve performance bottlenecks. You are encouraged to read about, select, and apply the tools and mechanisms best suited to your own needs.
Table of contents
Optimizing Compilers
Loading the tools
Information
Timing & Profiling
Using Hardware Performance Counters
MPI, OpenMP, Vectorization, SIMD
Memory Leaks
Optimizing Compilers
Optimization with Intel Compilers
Most important compiler options
Loading the tools
Information about available tools and versions: module avail
Loading the appropriate tools: module load tool or module load tool/version
Information
HWLoc: The hardware locality toolset provides command line tools as well as a programming interface for identifying and controlling resources and resource mappings for threaded execution.
likwid-topology: Modern computers are becoming more and more complicated. They consist of multiple cores, and each core can support multiple hardware threads. Because cores share caches and main memory access, it is important to pin threads to dedicated cores. Deciding how to pin requires knowledge of the machine's topology. likwid-topology extracts this information from the cpuid instruction.
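To make the idea of pinning concrete, here is a minimal sketch that restricts the caller to a single core while a function runs and then restores the previous setting. It uses neither HWLoc nor LIKWID; it relies on the Linux affinity interface exposed by Python's standard library, and run_pinned is a hypothetical helper name chosen for illustration. On platforms without that interface the function is simply run unpinned.

```python
import os

def run_pinned(func, core=None):
    """Run func() while the caller is restricted to a single core.

    run_pinned is a hypothetical helper, not part of any tool listed here.
    If the platform has no os.sched_setaffinity (e.g. macOS, Windows),
    func() runs without pinning.
    """
    if not hasattr(os, "sched_setaffinity"):
        return func()
    original = os.sched_getaffinity(0)        # 0 refers to the caller
    chosen = min(original) if core is None else core
    os.sched_setaffinity(0, {chosen})         # restrict to one core
    try:
        return func()
    finally:
        os.sched_setaffinity(0, original)     # always restore the old mask

def busy_sum(n=200_000):
    # Stand-in workload for the sketch.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    print("result:", run_pinned(busy_sum))
```

Which core to choose is exactly the question a topology tool answers: cores that share a cache or a memory controller are not interchangeable, so the default of "lowest allowed core" here is only a placeholder.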
Timing & Profiling
Timing commands and timing functions: Timers can be used to measure the total run time of an application. Different implementations are available on UNIX and Linux systems. Some subroutines can also be called from within your code to measure specific sections.
gprof: gprof calculates the amount of time spent in each routine. The time spent in called routines is incorporated in the profile of each caller. The profile data is taken from the call graph profile file, which is written when an executable that was compiled and linked with -pg is run.
Profile Guided Optimization: The main purpose of profile guided optimization is to re-order instructions in an optimal way. The instrumented executable is run one or more times with different typical data sets. The dynamic profiling information is merged, and the combined information is used to generate a profile-optimized executable.
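As an illustration of measuring specific sections from within a program, the following sketch wraps regions of code in a small timer and accumulates the elapsed wall-clock time per label. It is written in Python with the standard library only; section_timer and work are hypothetical names, and in a compiled code the same pattern would be built on whatever timing routine the language or MPI library offers.

```python
import time
from contextlib import contextmanager

@contextmanager
def section_timer(label, results):
    """Accumulate the wall-clock time of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        results[label] = results.get(label, 0.0) + elapsed

def work(n):
    # Stand-in for the computational kernel being measured.
    return sum(i * i for i in range(n))

timings = {}
with section_timer("setup", timings):
    data = list(range(1000))
with section_timer("compute", timings):
    total = work(100_000)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.6f} s")
```

Manual timers like this tell you how long a chosen region takes; a profiler such as gprof is the better fit when you do not yet know which regions matter.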
Using Hardware Performance Counters
Intel Amplifier XE: The Intel Amplifier XE (formerly VTune) analyzer collects, analyzes, and displays hardware performance data from the system-wide view down to a specific function, module, or instruction.
LIKWID: Likwid (Like I knew what I am doing) provides easy-to-use command line tools for Linux to support programmers in developing high performance multi-threaded programs.
IBM High Performance Computing Toolkit (hpccount): Reports summary hardware performance counter and resource usage statistics.
PAPI: PAPI (Performance Application Programming Interface) aims to provide the tool designer and application engineer with a consistent interface and methodology for use of the performance counter hardware found in most major microprocessors. PAPI enables software engineers to see, in near real time, the relation between software performance and processor events.
PerSyst Report: Performance properties are collected by the PerSyst Monitoring system at LRZ. User-friendly interfaces are available through the web API and/or the console command line tool.
MPI, OpenMP, Vectorization, SIMD
Vampir NG: Vampir is a state-of-the-art tool for tracing parallel programs based on MPI, OpenMP or CUDA, as well as serial programs. It is designed to provide accurate trace information on MPI and user function calls. Its user interface and parallel processing of trace data make Vampir NG one of the most powerful tracing tools. It includes the capability for performance-counter analysis based on PAPI.
Intel Tracing Tools: The Intel Tracing Tools support the development and tuning of programs parallelized using MPI. With these tools you can investigate the communication structure of your parallel program, and hence isolate incorrect and/or inefficient MPI programming. The Trace Collector is a set of MPI tracing libraries, and the Trace Analyzer provides a GUI for analysis of the tracing data.
Intel Inspector: Inspector allows you to perform correctness checking on multi-threaded applications (running in shared memory).
Intel Amplifier: Intel Amplifier XE (formerly VTune) allows you to perform performance analysis on multi-threaded applications (running in shared memory). The analyzer also collects, analyzes, and displays hardware performance data from the system-wide view down to a specific function, module, or instruction.
Intel Advisor: Advisor allows you to identify optimization potential in your code (both multi-threading and SIMD vectorization).
Scalasca: Scalasca is an open-source project developed by the Jülich Supercomputing Centre which focuses on analyzing OpenMP, MPI and hybrid OpenMP/MPI parallel applications. Scalasca helps identify bottlenecks through a number of important features: profiling and tracing of highly parallel programs; automated trace analysis that localizes and quantifies communication and synchronization inefficiencies; and flexible integration with PAPI hardware counters for performance analysis.
IPM: IPM is a portable profiling infrastructure for parallel C and Fortran programs. It provides a low-overhead profile of performance aspects and resource utilization. Communication, computation, and I/O are its primary focus. Hardware counter data is based on PAPI.
mpiP: mpiP is a lightweight and scalable profiling library exclusively for MPI applications. It collects statistical information with minimal overhead. The trace data is small and stored in a human-readable ASCII format.
Marmot: Marmot is an MPI correctness checker. It automatically checks the correct usage of MPI functions and their arguments. It can identify deadlocks, wrong ordering of messages, wrong MPI types, etc.
GuideView: GuideView is a tool that displays the performance details of an OpenMP program's parallel execution.
Memory Leaks
MemoryScape: This tool provides a subset of TotalView functionality to detect memory leaks.
Valgrind: A tool for finding memory leaks, measuring memory consumption, and identifying performance bottlenecks.
