Blog by Dan Luu

I started out working on flash memory and optics, and then moved up one level to CPUs. I was lucky enough to land at Centaur, a small company that gave me a lot of freedom, and I ended up doing RTL, ucode, verification, bringup, test, and pretty much everything else you can do on a CPU. After that, I worked on hardware/software co-design to speed up a problem domain for Google. I’m off to start a new job soon, but I’m not big on announcing things in advance, so that’s all I’m going to say about that. If you’re so inclined, you can check out my GitHub, LinkedIn, and resume, but those just have a bunch of details.

11th International Conference on Parallel Processing and Applied Mathematics in Krakow

September 6-9, 2015, Krakow, Poland

WLPP 2015 is a full-day workshop to be held at PPAM 2015, focusing on high-level programming for large-scale parallel systems and multicore processors, with special emphasis on component architectures and models. Its goal is to bring together researchers working in the areas of applications, computational models, language design, compilers, system architecture, and programming tools to discuss new developments in programming clouds and parallel systems. The workshop focuses on any language-based programming model such as OpenMP, Intel TBB and Ct, Microsoft .NET 4.0 parallel extensions (TPL and PPL), Java parallel extensions, HPCS languages (Chapel, X10 and Fortress), Unified Parallel C (UPC), Co-Array FORTRAN (CAF), and GPGPU language-based programming models such as CUDA. Contributions on other high-level programming models and supportive environments for parallel and distributed systems are equally welcome.

devblogs.nvidia.com: Drop-in Acceleration of GNU Octave

A well-known trick to skip the time-consuming rebuilding step is to dynamically intercept and substitute relevant library symbols with high-performing analogs. On Linux systems, the LD_PRELOAD environment variable allows us to do exactly that. Now we will try OpenBLAS, built with OpenMP and Advanced Vector Extensions (AVX) support. Assuming the library is in LD_LIBRARY_PATH, ‘OMP_NUM_THREADS=20 LD_PRELOAD=libopenblas.so octave ./sgemm.m’ yields 765 GFLOPs. We observe a 2.5X speedup in SGEMM versus dual-socket Ivy Bridge. But the overall speedup is 1.56X. The CPU fraction does not scale, since it is executed in a single thread, and NVBLAS always uses one CPU thread per GPU. The acceleration is still significant, but we start to be limited by Amdahl’s law.
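The Amdahl's law remark in the excerpt above can be made concrete with a small calculation. This is only a rough sketch: it assumes the 2.5X SGEMM speedup and the 1.56X overall speedup are measured against the same baseline, and that the run splits cleanly into an accelerated part and an unaccelerated part. The function names are made up for illustration and do not come from the NVIDIA post.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the runtime is sped up by a factor s."""
    return 1.0 / ((1.0 - p) + p / s)


def accelerated_fraction(overall, s):
    """Invert Amdahl's law: the fraction p that gives `overall` for a part speedup s."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / s)


# Figures quoted in the excerpt (assumed to share one baseline).
sgemm_speedup = 2.5
overall_speedup = 1.56

p = accelerated_fraction(overall_speedup, sgemm_speedup)
# Upper bound on the overall speedup if the accelerated part took zero time.
ceiling = 1.0 / (1.0 - p)

print(f"implied accelerated fraction: {p:.3f}")
print(f"speedup ceiling for that fraction: {ceiling:.2f}X")
```

Under these assumptions, roughly 60% of the original runtime would be spent in SGEMM, so even an infinitely fast SGEMM would leave the overall speedup at about 2.5X. That is one way to read the statement that the single-threaded CPU fraction becomes the limit.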

uni-frankfurt.de: CSC Home / Center for Scientific Computing

The Center for Scientific Computing (CSC) of the Goethe University Frankfurt currently operates three Linux-based computer clusters within the framework of the HHLR-GU (Hessisches Hochleistungsrechenzentrum der Goethe-Universität) to support numerically intensive studies in a variety of research fields, ranging from neuroscience to high-energy physics. The CPU cluster “Fuchs” is available for HPC (High Performance Computing) applications for users from all universities in Hessen. As the system is designed to support different types of applications, the cluster provides an ideal HPC infrastructure for the scientific community. The GPGPU cluster “Scout” is a testbed for users who want to develop or port code to run on modern graphics processor architectures. Recently, the massively parallel cluster “LOEWE-CSC”, a combined CPU-GPU cluster, was installed.

www.Visual6502.org: Visual Transistor-level Simulation of the 6502 CPU

The first of our projects is aimed at the classic MOS 6502 processor. It’s similar to work carried out for the Intel 4004 35th anniversary project, though we’ve taken a different approach to modeling and studying the chip. In the summer of 2009, working from a single 6502, we exposed the silicon die, photographed its surface at high resolution and also photographed its substrate. Using these two highly detailed aligned photographs, we created vector polygon models of each of the chip’s physical components – about 20,000 of them in total for the 6502. These components form circuits in a few simple ways according to how they contact each other, so by intersecting our polygons, we were able to create a complete digital model and transistor-level simulation of the chip. This model is very accurate and can run classic 6502 programs, including Atari games. By rendering our polygons with colors corresponding to their ‘high’ or ‘low’ logic state, we can show, visually, exactly how the ch

alasir.com: CPU Cache Memory

New additions:
    Beyerdynamic DT 301 and DT 302 Headphones in Reviews
    World of Warcraft Theme for Sony Ericsson Phones in Software
    Solder Alloys: Physical and Mechanical Properties in Reference
    Reference on KEMET SMD Tantalum Capacitors in Reference
    A Quick Analysis of the NVIDIA Fermi Architecture in Articles
    RAMspeed v2.6.0 and RAMspeed/SMP v3.5.0 have been released in Software
    CPUinfo, a processor information retrieving tool in Software

Most popular:
    Functional Principles of Cache Memory in Articles
    Alpha: The History in Facts and Comments in Articles
    RAMspeed, a cache and memory benchmarking tool in Software

Prozessortaktung › Wiki › ubuntuusers.de

Example:

    cat /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state
    1400000 53513
    1200000 2616
    1000000 2400
    800000 2840
    600000 1208171

    cat /sys/devices/system/cpu/cpu0/cpufreq/stats/total_trans
    5477

    cat /sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table
       From  :    To
             :   1400000   1200000   1000000    800000    600000
      1400000:         0       282       258       291      1549
      1200000:        52         0        12        12       206
      1000000:        48         0         0        14       208
       800000:        52         0         0         0       265
       600000:      2228         0         0         0         0
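To make the numbers above easier to read, here is a minimal Python sketch that turns time_in_state into a per-frequency share of time and checks that the trans_table entries add up to total_trans. The sample data is copied from the example output. The helper names are hypothetical, and the parsing assumes the whitespace-separated layout shown above.

```python
TIME_IN_STATE = """\
1400000 53513
1200000 2616
1000000 2400
800000 2840
600000 1208171
"""

TRANS_TABLE = """\
   From  :    To
         :   1400000   1200000   1000000    800000    600000
  1400000:         0       282       258       291      1549
  1200000:        52         0        12        12       206
  1000000:        48         0         0        14       208
   800000:        52         0         0         0       265
   600000:      2228         0         0         0         0
"""

TOTAL_TRANS = 5477


def parse_time_in_state(text):
    """Map frequency (kHz) to the time counter reported for that frequency."""
    times = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            times[int(parts[0])] = int(parts[1])
    return times


def parse_trans_table(text):
    """Map each 'from' frequency to its list of transition counts."""
    rows = {}
    for line in text.splitlines():
        head, sep, tail = line.partition(":")
        # Header lines have a non-numeric or empty label and are skipped.
        if sep and head.strip().isdigit():
            rows[int(head)] = [int(x) for x in tail.split()]
    return rows


times = parse_time_in_state(TIME_IN_STATE)
total_time = sum(times.values())
residency = {freq: t / total_time for freq, t in times.items()}

rows = parse_trans_table(TRANS_TABLE)
transitions = sum(sum(counts) for counts in rows.values())

for freq, share in sorted(residency.items(), reverse=True):
    print(f"{freq:>8} kHz: {share:6.2%}")
print("table sum matches total_trans:", transitions == TOTAL_TRANS)
```

On a live system, the same functions could be fed the contents of the sysfs files named in the example instead of the embedded strings, provided the cpufreq stats interface is available on that machine.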