AWS Partner Network (APN) Blog: High-Performance Mainframe Workloads on AWS with Cloud-Native Heirloom

By Gary Crook, CEO at Heirloom Computing

It is common to meet enterprises still using mainframes because that is historically where their core business applications have been. With Heirloom on AWS, we can decouple the application from the physical mainframe hardware, allowing us to run applications in the cloud and take advantage of the benefits of Amazon Web Services (AWS).

Heirloom automatically refactors mainframe applications’ code, data, job control definitions, user interfaces, and security rules to a cloud-native platform on AWS. Using an industry-standard TPC-C benchmark, we demonstrated the elasticity of Heirloom on AWS, delivering 1,018 transactions per second—equivalent to the processing capacity of a large 15,200 MIPS Mainframe.

Heirloom is deployed on AWS Elastic Beanstalk, which facilitates elasticity (the ability to automatically scale-out and scale-back), high availability (always accessible from anywhere at any time), and pay-as-you-go (right-sized capacity for efficient resource utilization). With Heirloom on AWS, mainframe applications that were rigid and scaled vertically can be quickly and easily transformed (recompiled) so they are now agile and scaled horizontally.

Heirloom Computing is an AWS Partner Network (APN) Standard Technology Partner. In this post, we use a real-world example to outline how monolithic mainframe applications can automatically be refactored to agile cloud-native applications on AWS.

Heirloom Overview
At the core of Heirloom is a unique compiler that quickly and accurately recompiles online and Batch mainframe applications (COBOL, PL/I, JCL, etc.) into Java so they can be deployed to any industry standard Java application server, such as Apache Tomcat, Eclipse Jetty, or IBM WebSphere Liberty.

With Heirloom, once the application is deployed it retains both the original application source code and resulting refactored Java source code. Heirloom includes Eclipse IDE plugins for COBOL, PL/I, and Java, as well as a fully functional integrated JES console and subsystem for running JCL jobs. This enables a blended model for ongoing development and management of the application so you can bridge the skills gap at a pace that is optimal for you, and switch code maintenance from COBOL to Java at your convenience.

Figure 1 – Heirloom refactoring reference architecture for mainframes.

Elastic Architecture
Heirloom deploys applications to industry standard Java application servers, which means your application can instantly leverage the full capabilities of AWS. Applications can dynamically scale-out and scale-back for optimal use of compute resources, and seamlessly integrate with additional AWS managed services like AWS Lambda and application frameworks like Angular 2.

Here’s an example that uses Amazon Alexa to interact with unmodified CICS transactions deployed as microservices, and here’s another example that utilizes Docker containers.

Figure 2 – Heirloom elastic architecture for high performance.

The Heirloom elastic architecture relies on stateless share-nothing application servers that scale horizontally across Availability Zones (AZs). Any shared or persistent data structure is stored in an elastic managed data store. On AWS, this horizontal architecture across several AZs and many instances is key for elasticity, scalability, availability, and cost optimization. AWS Elastic Beanstalk automatically handles the application deployment, from capacity provisioning, load balancing, and auto-scaling to application health monitoring.

Application artifacts that are not inherently scalable are refactored to a target that automatically removes that constraint. For example, file-based data stores such as VSAM files are migrated to a relational data store using Heirloom built-in data abstraction layers. This is achieved without requiring any changes to the application code.
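
As a rough illustration of what such a data abstraction layer does (this is not Heirloom's implementation or API), the sketch below exposes keyed, VSAM-style record access on top of a relational table. The `KeyedFile` class, its method names, and the table layout are all hypothetical, and SQLite stands in for the managed relational store.

```python
import sqlite3


class KeyedFile:
    """Hypothetical keyed-record file backed by a relational table."""

    def __init__(self, conn: sqlite3.Connection, name: str):
        self.conn = conn
        self.table = name
        # The table name is interpolated directly; acceptable only because
        # this sketch uses trusted, hard-coded names.
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {name} "
            "(rec_key TEXT PRIMARY KEY, rec BLOB NOT NULL)"
        )

    def write(self, key: str, record: bytes) -> None:
        """Insert a record, or replace the record already stored under key."""
        self.conn.execute(
            f"INSERT OR REPLACE INTO {self.table} (rec_key, rec) VALUES (?, ?)",
            (key, record),
        )

    def read(self, key: str):
        """Random read by key; returns None when the key is absent."""
        row = self.conn.execute(
            f"SELECT rec FROM {self.table} WHERE rec_key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def browse(self, start_key: str):
        """Sequential read in key order, starting at start_key."""
        cur = self.conn.execute(
            f"SELECT rec_key, rec FROM {self.table} "
            "WHERE rec_key >= ? ORDER BY rec_key",
            (start_key,),
        )
        return cur.fetchall()


conn = sqlite3.connect(":memory:")
customers = KeyedFile(conn, "customer")
customers.write("0002", b"BETA CORP")
customers.write("0001", b"ALPHA LTD")
```

Because the application only sees `read`, `write`, and `browse`, the storage behind them can change from a file to a shared database without touching the calling code, which is the property the paragraph above relies on.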

Performance Results
Using a COBOL/CICS implementation of the industry standard TPC-C benchmark, we measured transaction throughput per MIPS by running the application on a mainframe with a known MIPS specification. We then ran the same application on AWS infrastructure to measure transaction throughput and derive a MIPS rating using the ratio from running the application on the mainframe. Consequently, we determined the AWS environment was able to consistently deliver an equivalent MIPS rating of 15,200 at a sustained transaction throughput of 1,018 transactions per second.
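
The derivation described above is a simple ratio, and a minimal sketch follows. The function name is illustrative, and the mainframe calibration figures used to produce the published result are not stated in this post, so the only ratio computed here is the one implied by the two published numbers.

```python
def equivalent_mips(mainframe_mips, mainframe_tps, cloud_tps):
    """Scale a measured cloud throughput by the mainframe's MIPS-per-TPS ratio."""
    mips_per_tps = mainframe_mips / mainframe_tps
    return cloud_tps * mips_per_tps


# Ratio implied by the published result (15,200 MIPS at 1,018 TPS).
implied_mips_per_tps = 15_200 / 1_018
print(f"{implied_mips_per_tps:.2f} MIPS per transaction per second")
```

In other words, each sustained transaction per second in this benchmark corresponds to roughly 15 mainframe MIPS.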

For the performance test on AWS, we took more than 50,000 lines of the TPC-C application code and screens and compiled them (without any modifications) into 100 percent Java using the Heirloom Software Developer Kit (SDK) Eclipse plugin. The Java code was then packaged as a standard .war file, ready for deployment to any industry standard Java application server (such as Apache Tomcat).

The TPC-C environment on AWS is composed of:

22,500 simulated end-user terminals hosted by 10 m3.2xlarge Amazon Elastic Compute Cloud (Amazon EC2) instances.
All concurrent transactions are distributed to application instances by a single AWS Application Load Balancer which automatically scales on-demand.
The Heirloom TPC-C application layer is hosted in an AWS Elastic Beanstalk environment consisting of a minimum of 16 m4.xlarge Amazon EC2 instances (a Linux environment running Apache Tomcat). With the AWS Auto Scaling Group, this environment automatically scales-out (by increments of 8), up to a maximum of 64 instances (depending on a metric of the average CPU utilization of the currently active instances). It also automatically scales-back when the load on the system decreases.
For enhanced reliability and availability, the instances are seamlessly distributed across three different AZs (i.e. at least three different physical data centers).
The database (consisting of around five million rows of data in tables for districts, warehouses, items, stock-levels, new-orders, etc.) is hosted in an Amazon Aurora database (which is either MySQL or PostgreSQL compatible).
The application monitoring layer is provided by Amazon CloudWatch, which provides a centralized constant examination of the application instances and the AWS resources being utilized.
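
The scaling behavior of the application layer described in the list above can be sketched as a small decision function. The minimum, maximum, and scale-out increment come from the benchmark environment; the CPU thresholds and the scale-back increment are assumptions chosen for illustration, not the values used in the test.

```python
MIN_INSTANCES = 16   # from the benchmark environment
MAX_INSTANCES = 64   # from the benchmark environment
STEP = 8             # scale-out increment; reused for scale-back (assumption)


def desired_instances(current, avg_cpu_percent,
                      scale_out_at=70.0, scale_in_at=30.0):
    """Return the next instance count for a given average CPU utilization.

    The two thresholds are hypothetical defaults.
    """
    if avg_cpu_percent >= scale_out_at:
        target = current + STEP
    elif avg_cpu_percent <= scale_in_at:
        target = current - STEP
    else:
        target = current
    # Never leave the configured bounds of the group.
    return max(MIN_INSTANCES, min(MAX_INSTANCES, target))
```

For example, `desired_instances(16, 85.0)` asks for 24 instances, while a lightly loaded group at the minimum size stays at 16.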
The application workload at peak was distributed over a total of 144 CPU cores (each CPU is equivalent to 2 vCPUs), consisting of 128 CPU cores for the application layer and 16 CPU cores for the Aurora database layer. For a 15,200 MIPS capacity, this yields approximately 105 MIPS per CPU core (or 52 MIPS per vCPU). This is consistent with our client engagements, and a useful rule of thumb when looking at initial capacity planning.
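
The arithmetic behind this rule of thumb can be reproduced in a few lines. The sketch assumes the peak of 64 application instances and uses the fact that an m4.xlarge instance has 4 vCPUs; `cores_for` is a hypothetical helper for first-pass sizing, not a substitute for a proper capacity study.

```python
import math

BENCHMARK_MIPS = 15_200
PEAK_APP_INSTANCES = 64      # maximum size of the application layer
VCPUS_PER_M4_XLARGE = 4
VCPUS_PER_CORE = 2           # as stated in the article
DB_CORES = 16                # Aurora database layer

app_cores = PEAK_APP_INSTANCES * VCPUS_PER_M4_XLARGE // VCPUS_PER_CORE
total_cores = app_cores + DB_CORES
mips_per_core = BENCHMARK_MIPS / total_cores
mips_per_vcpu = mips_per_core / VCPUS_PER_CORE


def cores_for(mips):
    """First-pass estimate of CPU cores needed for a workload of `mips`."""
    return math.ceil(mips * total_cores / BENCHMARK_MIPS)


print(app_cores, total_cores, round(mips_per_core, 1), round(mips_per_vcpu, 1))
```

Under this ratio, a hypothetical 5,000 MIPS workload would start from an estimate of `cores_for(5000)` CPU cores across the application and database layers.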

Cost Analysis
For a large mainframe of more than 11,000 MIPS, the average annual cost per installed MIPS is about $1,600. Hardware and software account for 65 percent of this, or approximately $1,040 per MIPS. At that rate, the annual infrastructure cost for a 15,200 MIPS mainframe is approximately $16 million.

On the AWS side, using the AWS Simple Monthly Calculator to configure a similar infrastructure to the performance test, we estimated the annual cost to be around $350,000 ($29,000 monthly). This AWS cost could be further optimized with Amazon EC2 Reserved Instances.
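
The infrastructure comparison can be checked with the figures quoted above. This sketch covers infrastructure only; it leaves out the Heirloom subscription and any migration costs, so it is not the all-in reduction clients see.

```python
COST_PER_MIPS = 1_600          # annual USD per installed MIPS
INFRA_SHARE_PERCENT = 65       # hardware and software share
MAINFRAME_MIPS = 15_200
AWS_MONTHLY = 29_000           # USD, from the calculator estimate

infra_cost_per_mips = COST_PER_MIPS * INFRA_SHARE_PERCENT // 100
mainframe_annual = infra_cost_per_mips * MAINFRAME_MIPS
aws_annual = AWS_MONTHLY * 12
saving_percent = (1 - aws_annual / mainframe_annual) * 100

print(f"Mainframe: ${mainframe_annual:,}  AWS: ${aws_annual:,}  "
      f"saving: {saving_percent:.1f}%")
```

The monthly estimate multiplies out to $348,000, which the text rounds to $350,000.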

The annual cost for Heirloom varies depending on the size of the application being refactored (it consists of a cost per CPU core with larger discounts for larger application workloads). With all costs accounted for, our clients typically see a cost reduction in excess of 90 percent, and positive ROI within a year.

Code Quality
With any solution that takes you from one ecosystem to another, not only is it essential the application behaves and functions as it did before, it’s vital that any application code produced is of the highest quality.

Using SonarQube with the default configuration, we can examine the quality of the Java application produced by Heirloom when it compiled the 50,000+ LOC in the TPC-C application (which was originally a COBOL/CICS application written for the mainframe). SonarQube rated all the major aspects (reliability, security, and maintainability) of the Java application source-code with the highest rating of “A.”

Figure 3 – SonarQube analysis for the TPC-C benchmark application refactored with Heirloom.

Refactoring Tools and Application Development
The Heirloom SDK is a plugin for the Eclipse IDE framework and provides tooling that covers all aspects of a refactoring project, as outlined in Figure 1.

The same tooling is then used for ongoing application development and maintenance. This can be done in COBOL, PL/I, Java or any mix. Each language is fully supported with a feature-rich project workspace, editor, compiler, and debugger.

Regardless of which language you choose, Heirloom applications always execute on the Java platform.

Figure 4 – Eclipse IDE with Heirloom SDK showing a COBOL debugging and editing session.

Not Just Cloud, But Cloud-Native
You can move an application to the cloud by re-hosting it (also called “lift and shift”) on an Amazon EC2 instance, retaining the existing constraints of the legacy workload such as stateful applications, monolithic application structure, file-based data stores, and vertical scaling. It works with limitations, but it is not cloud-native.

In simple terms, cloud-native is an approach to the development and deployment of applications that fully exploits the benefits of the cloud-computing platform. There are best practices for cloud-native applications, such as:

Adherence to the Twelve-Factor App methodology, including stateless share-nothing processes and persistent or shared data stored in backend databases.
Dynamic, horizontal scale-out and scale-back.
Available to be consumed as a service.
Enables portability between execution environments in order to select fit-for-purpose compute or data store services.
Leverage cloud-provided system management such as central monitoring, central logging, central alerting, central automation, central billing.
Cloud-native applications are elastic, highly-scalable, and embrace the elasticity of underlying AWS services. With horizontal scalability, cloud-native applications are also more cost optimized because you don’t need to size a fixed number of instances for peak traffic. Instead, you can use smaller instances which are instantiated or terminated automatically based on the exact workload demand.

The major benefit of Heirloom on AWS is that it can automatically refactor mainframe applications to Java application servers so they are cloud-native, while preserving critical business logic, user-interfaces, data integrity, and systems security.

Learn More About Heirloom on AWS
See Heirloom in action with our 60-second demo. You can also try it by downloading Heirloom SDK for free—available on Windows, Linux, and macOS.
You can read an in-depth look at how the performance benchmark on AWS was performed in my LinkedIn article.

The content and opinions in this blog are those of the third party author and AWS is not responsible for the content or accuracy of this post.

Heirloom Computing – APN Partner Spotlight
Heirloom Computing is an APN Standard Technology Partner. Heirloom automatically transforms mainframe applications so they execute on Java Application Servers, while preserving critical business logic, user-interfaces, data integrity, and systems security. With Heirloom, mainframe applications can quickly and easily be transformed to be agile, open, and scaled horizontally.


BluAge legacy modernization


Refactoring, replatforming, rewriting, repurchasing…
Compare the main legacy modernization techniques

Application Modernization Service Quadrant
Blu Age rated as innovator by 360Quadrants under Application Modernisation space – September 2019
Blu Age evolves from innovator to visionary leaders in Apps Transformation
Blu Age ahead of the pack on product maturity
Blu Age in the top Apps Transformation players
“Blu Age’s solutions also allow users to exercise complete control over project duration and to slice and decompose monolithic applications into independent components…”
“Blu Age has developed a portal that showcases its product offerings and allows partners to access technical and marketing resources…”

Our mission: Help private and public organizations enter the digital era by modernizing their legacy systems while substantially reducing modernization costs, shortening project duration and mitigating the risk of failure.

Blu Age modernization in numbers
Faster modernization projects
Up to 20 Times Faster
Shorten your Cloud migration from months to weeks or days.

Cheaper modernization projects
4 to 10 Times Cheaper
Automation minimizes the workload for experienced developers.

Successful modernization projects
100% Success
All technical risks are identified, measured and addressed at an early stage.

Why use Blu Age?
Proven Solutions
Blu Age products have been successfully used in more than 80 projects.

Faster Projects
The high degree of automation substantially reduces project duration.

Minimized Risks
100% of the potential risks are identified, mitigated and controlled at an early stage.

100% Automated
Blu Age fully automates the modernization of legacy code and data into modern technologies.

High Quality Code
Blu Age generates modern source code ready for future evolutions and maintenance.

Reduced Budgets
Blu Age guarantees highly competitive financial offers in the modernization landscape.

Blu Age accelerates your modernization into the world’s top Cloud providers
Digital technologies such as Cloud, mobile, and smart assistants are creating a world of new opportunities for businesses today, helping companies save millions, creating efficiencies, improving quality, and increasing product visibility. Unfortunately, legacy systems seldom support digital technologies, and with growing numbers of skilled workers retiring, legacy systems are becoming increasingly risky and costly to operate, making legacy modernization mission-critical.

If you are looking for Mainframe COBOL, PL/I, Natural Adabas, CA Ideal Datacom, PowerBuilder, RPG 400, Delphi, CoolGen, Visual Basic or any other legacy language modernization to the Cloud, Blu Age gets you there with speed, scale and safety.

Blu Age News
Stay up to date with the latest tips in modernization to Cloud technology provided by our engineering team at Blu Age.

Blog post #4: Know the modernization market
1 min read – words: 515

Blu Insights

Blog post #3: KPIs to be reached
5 min read – words: 1536

Blu Insights

Blog post #2: Talk to your stakeholders
4 min read – words: 1186

Blu Insights
More News…
Blu Age

Powerful legacy applications modernization solutions allowing faster projects with lower budgets.

Blu Age

Everything your teams and customers need to pave the way to success.


Gender equality index (professional equality between women and men): 84/100.

© 2005-2020 BLU AGE


Heirloom Computing: Replatform Mainframe Applications as Cloud-Native Java Applications

Heirloom Computing
The Cloud-Native Mainframe™
Heirloom® is the only proven cloud-native mainframe replatforming solution in the market.
Read this Amazon AWS blog that illustrates why leading financial services companies and government agencies are embracing Heirloom to replatform their mainframe workloads as agile cloud-native applications.

Digital Transformation of Mainframe Applications
Heirloom® automatically replatforms mainframe applications so they execute on any Cloud, while preserving critical business logic, user-interfaces, data integrity, and systems security.
Replatforming with Heirloom is 10X faster than re-engineering, cutting operational costs up to 90%, with a typical ROI well inside 12 months.

Open, Powerful, Flexible
Heirloom® applications execute on any industry-standard Java Application Server, removing any dependency on proprietary containers, and therefore eliminating vendor lock-in.
With state-of-the-art Eclipse plugins, application development can continue in the original language (e.g. COBOL, PL/I, JCL) or Java, or any mix. This unprecedented flexibility means you can move fast with a blended model that makes best use of your technical resources.

Agile Cloud-Native Applications
Replatforming with Heirloom® delivers agile cloud-native applications that can be deployed on-premise or to any cloud. Applications can dynamically scale-out & scale-back, with high-availability, and cost-effective utilization of resources.
Seamlessly integrate & extend Heirloom applications with powerful open source application frameworks to quickly add new functions, re-factor code, and construct microservices.

Heirloom® Overview
Heirloom is a state-of-the-art software solution for replatforming online & batch mainframe workloads as 100% cloud-native Java applications.

It is a fast & accurate compiler-based approach that delivers strategic value through creation of modern agile applications using an open industry-standard deployment model that is cloud-native.

How It Works In 60 Seconds
Think you’ve heard it before? Got 60 seconds? Let us show you the Heirloom difference; watch the video below.

Try It For Yourself
You can download the Heirloom SDK via our courseware for free today. It is available on Windows, Linux, and macOS.

COBOL Compiler

Fast & Accurate
The core technology of Heirloom is a patented compiler that can recompile & refactor very large complex mainframe applications built from millions of lines of code into Java in minutes. The resulting application is guaranteed to exactly match the function & behavior of the original application.

Complete Solution
Mainframe applications are dependent upon key subsystems such as transaction processors, job control, file handlers, and resource-level security & authentication. Heirloom faithfully replicates all of these major subsystems by providing a Java equivalent (for example, JES/JCL) or a layer that provides a seamless mapping to an open systems equivalent (for example, Open LDAP for security).

Built for Cloud
Heirloom was designed and built for the cloud from the start. Cloud-native deployment delivers application elasticity (the ability to dynamically scale-out and scale-back), high availability (always accessible from anywhere at any time), and pay-for-use (dynamic right-sizing of capacity for efficient resource utilization).
Heirloom Computing works with industry-leading systems integrators to offer complete application modernization & PaaS enablement solutions to enterprises and ISVs.

Interested in partnering with us?

© 2010-2020
Heirloom Computing Inc
All Rights Reserved

BigBlueButton Online Learning

Build Upon Us
BigBlueButton is completely open source and made by a community of dedicated developers passionate about helping improve online learning.

How we started
BigBlueButton is an open source web conferencing system for online learning. The goal of the project is to provide remote students a high-quality online learning experience.

Given this goal, it shouldn’t surprise you that BigBlueButton started at a university (though it may surprise you which one). BigBlueButton was created by a group of very determined software developers who believe strongly in the project’s social benefits and entrepreneurial opportunities. Starting an open source project is easy: it takes about five minutes to create a GitHub account. Building a successful open source project that solves the complex challenges of synchronous learning, however, takes a bit longer.

There has been a core group of committers working on BigBlueButton since 2007. To date there have been over a dozen releases of the core product. For each release the committers have been involved in developing new features, refactoring code, supporting the community, testing and documentation.

We believe that focus is important to the success of any open source project. We are focused on one market: on-line learning. If you are an institution (educational or commercial) that wishes to teach your students in a synchronous environment, we are building BigBlueButton for you.

Interested in trying BigBlueButton?
Check out the tutorial videos and then try out BigBlueButton on our demo server.


© 2020 BigBlueButton.

White Paper
Data Processing and Storage for
Black Hole Event Horizon Imaging
Tom Coughlin, President, Coughlin Associates, Inc.
Executive Summary
The first imaging of the event horizon for a black hole involved an international
partnership of 8 radio telescopes with major data processing at MIT and the Max Planck
Institute (MPI) in Germany. The contribution of the brilliant scientists was aided by the
expanded computing power of today’s IT infrastructure. Processing of the 4 petabytes
(PB) of data generated in the project in 2017 for the original imaging utilized servers and
storage systems, with many of these servers coming from Supermicro. Figure 1 shows the
MIT Correlator Cluster as it looked in 2016.
Super Micro Computer, Inc.
980 Rock Avenue
San Jose, CA 95131 USA
Table of Contents
Executive Summary
Black Holes and their Event Horizons
The Black Hole Event Horizon Imaging Project
Capturing and Storing EHT Data
The EHT Correlators
Supermicro in the EHT Correlators
Processing the EHT Data
The Future of Black Hole
Infrastructure Possibilities of Future Black Hole Observations
About the Author
About Supermicro
June 2019
1 Titus et al., Haystack Observatory VLBI Correlator 2015–2016 Biennial Report, in International VLBI Service for Geodesy and Astrometry 2015+2016 Biennial Report, edited by K. D. Baver, D. Behrend, and K. L. Armstrong, NASA/TP-2017-219021, 2017
Black Holes and their Event Horizons
With the publication of Albert Einstein’s general theory of relativity in 1915, our views of
the nature of space, time and gravity were changed forever. Einstein’s theory showed that
gravity is created by the curvature of space around massive objects.
When the nuclear fusion processes that create the radiation of stars begin to run out of
fuel and no longer exert sufficient outward pressure to overcome the gravity of the star,
the star’s core collapses. For a low mass star like our Sun, this collapse results in a white
dwarf star that eventually cools and becomes a nearly invisible black dwarf. Neutron stars
are the result of the gravitational collapse of the cores of larger stars, which just before
their collapse, blow much of their mass away in a type of supernova. Even larger stars,
more than double the mass of our sun, are so massive that when their nuclear fuel is
expended, they continue to collapse, with no other force able to resist their gravity, until
they form a singularity (or point-like region) in spacetime. For these objects, the escape
velocity exceeds the speed of light and hence conventional radiation cannot occur and a
black hole is formed.
Figure 1. Detail of DiFX Correlator cluster racks showing Supermicro servers [1]
The black hole singularity is surrounded by a region of space that is called the event
horizon. The event horizon has a strong but finite curvature. Radiation can be emitted
from the event horizon by material falling into the black hole or by quantum mechanical
processes that allow a particle/antiparticle pair to effectively tunnel out of the black hole,
releasing radiation (Hawking radiation).
Black holes are believed to be fairly common in our universe and can occur in many
sizes, depending upon their initial mass. Hawking radiation eventually results in the
“evaporation” of a black hole, and smaller, less massive, black holes evaporate faster than
more massive black holes. Super massive, long-lived black holes are believed to be at the
center of most galaxies, including our Milky Way. Until 2019, no one had imaged the space
around a black hole.
The Black Hole Event Horizon Imaging Project
Radio telescopes allow observations even in the presence of a cloud cover and microwave
radiation is not absorbed by interstellar clouds of dust. For these reasons, radio telescopes
provide more reliable imaging of many celestial objects. A wavelength of 1.3 mm (about
230 GHz) is a popular observing band.
Using an array of 8 international radio telescopes in 2017, astrophysicists used
sophisticated signal processing algorithms and global very long baseline interferometry
(VLBI) to turn 4 petabytes of data obtained from observations of a black hole in a
neighboring galaxy into the first image of a black hole event horizon. This particular
black hole is located 55 million light-years away (in galaxy Messier 87, M87). It is 3 million
times the size of the Earth, with a mass 6.5 billion times that of the Earth’s sun. A version
of the images shown in Figure 2, showing a glowing orange ring created by gases
and dust falling into the black hole, appeared on the front pages of newspapers and
magazines worldwide in mid-April 2019.
The eight different observatories, located in six distinct geographical locations, shown in
Figure 3, formed the Event Horizon Telescope (EHT) array. This collection of telescopes
provided an effective imaging aperture close to the diameter of the Earth, allowing the
resolution of very small objects in the sky. Observations were made simultaneously at
1.3 mm wavelength, with hydrogen maser atomic clocks used to precisely time stamp the
raw image data.
Figure 2. M87* observed on April 5, 6, 10, and 11, 2017 (scale bar: 50 μas; color bar: brightness temperature). Source: First M87 Event Horizon Telescope Results. III. Data Processing and Calibration, The Event Horizon Telescope Collaboration, The Astrophysical Journal Letters, 875:L3 (32pp), 2019, April 10
Figure 3. Akiyama et al., First M87 Event Horizon Telescope Results. III. Data Processing and Calibration, The Event Horizon Telescope Collaboration, The Astrophysical Journal Letters, 875:L3 (32pp), 2019, April 10
Capturing and Storing EHT Data
VLBI allowed the EHT to achieve an angular resolution of 20 micro-arcseconds,
said to be good enough to locate an orange on the surface of the Moon, from the
Earth. Observation data was collected over five nights, from April 5–11, 2017. These
observations were made at each site as the weather conditions were favorable. Each
telescope generated about 350 TB of data a day and the EHT sites recorded their data
at 64 Gb/s.
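The quoted resolution is consistent with a back-of-the-envelope diffraction estimate, sketched below. It assumes the simple relation of wavelength divided by baseline (omitting the constant factor used for circular apertures), takes the Earth's mean diameter as the baseline, and, for the orange comparison, assumes an 8 cm orange at the mean Earth-Moon distance; these values are illustrative and not taken from the paper.

```python
import math

# Conversion factor from radians to micro-arcseconds.
RAD_TO_MICROARCSEC = math.degrees(1) * 3600 * 1e6


def diffraction_limit_uas(wavelength_m, baseline_m):
    """Approximate angular resolution (wavelength / baseline) in micro-arcseconds."""
    return wavelength_m / baseline_m * RAD_TO_MICROARCSEC


WAVELENGTH_M = 1.3e-3            # observing wavelength from the text
EARTH_DIAMETER_M = 12_742e3      # mean Earth diameter (assumed baseline)

resolution_uas = diffraction_limit_uas(WAVELENGTH_M, EARTH_DIAMETER_M)

ORANGE_DIAMETER_M = 0.08         # assumed size of an orange
MOON_DISTANCE_M = 384_400e3      # mean Earth-Moon distance
orange_uas = ORANGE_DIAMETER_M / MOON_DISTANCE_M * RAD_TO_MICROARCSEC

print(f"resolution ~{resolution_uas:.0f} uas, orange on the Moon ~{orange_uas:.0f} uas")
```

Both numbers land in the tens of micro-arcseconds, which is why the orange-on-the-Moon comparison is a fair picture of the array's resolving power.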
The data was recorded in parallel by four digital backend (DBE) systems, each writing to 32
helium-filled hard disk drives (HDDs), for 128 HDDs per telescope. So, for the 8
telescopes, 1,024 HDDs were used. The Western Digital helium filled HDDs used were
first obtained in 2015, when these were the only He-filled sealed HDDs available.
Sealed He-filled HDDs were found to operate most reliably at the high altitudes of
the radio telescopes. Processing of the collected data was simultaneously done using
correlators at the MIT Haystack Observatory (MIT) and the Max Planck Institute in
Germany (MPI).
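The drive counts and data rates above can be cross-checked with a short calculation. It assumes the 64 Gb/s figure is the per-site recording rate and uses decimal units (1 TB = 10^12 bytes); the recording-hours figure is derived here and is not stated in the paper.

```python
DBES_PER_TELESCOPE = 4
HDDS_PER_DBE = 32
TELESCOPES = 8

hdds_per_telescope = DBES_PER_TELESCOPE * HDDS_PER_DBE
total_hdds = hdds_per_telescope * TELESCOPES

RECORD_RATE_GBPS = 64            # gigabits per second, assumed per site
DAILY_VOLUME_TB = 350            # terabytes per telescope per day

bytes_per_second = RECORD_RATE_GBPS * 1e9 / 8
tb_per_hour = bytes_per_second * 3600 / 1e12
recording_hours = DAILY_VOLUME_TB / tb_per_hour

print(total_hdds, round(tb_per_hour, 1), round(recording_hours, 1))
```

At that rate a site fills about 28.8 TB per hour, so 350 TB corresponds to roughly 12 hours of recording in a day.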
According to Helge Rottmann from MPI [2], “For the 2017 observations the total
data volume collected was about 4 PB.” He also said that starting in 2018 the EHT
collected data doubled to 8 PB. The DBEs acquired data from the upstream detection
equipment using two 10 Gb/s Ethernet network interface cards at 128 Gb/s. Data
was written using a time sliced round-robin algorithm across the 32 HDDs. The drives
were mounted in groups of eight in four removable modules. After the data was
collected the HDD modules were flown to the Max Planck Institute (MPI) for Radio
Astronomy in Bonn, Germany for high frequency band data analysis and to the
MIT Haystack Observatory in Westford, Massachusetts for low frequency band data analysis.
Vincent Fish from the MIT Haystack Observatory said [3], “It has traditionally been
too expensive to keep the raw data, so the disks get erased and sent out again for
recording. This could change as disk prices continue to come down. We still have
the 2017 data on disk in case we find a compelling reason to re-correlate it, but in
general, once you’ve correlated the data correctly, there isn’t much need to keep
petabytes of raw data around anymore.”
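The time-sliced round-robin recording across 32 drives, mounted as four modules of eight, can be pictured with the small sketch below. The mapping of slice number to module and slot is an assumption made for illustration; the paper does not describe the actual recorder layout or slice length.

```python
from collections import Counter

DRIVES = 32
DRIVES_PER_MODULE = 8            # four removable modules of eight drives


def drive_for_slice(slice_index):
    """Return (module, slot) for a time slice under simple round-robin.

    Consecutive drive numbers are assumed to fill one module before
    moving to the next.
    """
    drive = slice_index % DRIVES
    return divmod(drive, DRIVES_PER_MODULE)


# Distribute 64 consecutive time slices and count the writes per drive.
writes = Counter(drive_for_slice(i) for i in range(64))
```

Round-robin spreads consecutive slices evenly, so every drive sees the same write load and the aggregate rate is the sum of the individual drive rates.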
The EHT Correlators
The real key to extracting the amazing images of the event horizon of a black hole
was the use of advanced signal processing algorithms to process the data. Through
the receiver and backend electronics at each telescope, the sky signal is mixed to the
baseband, digitized, and recorded directly to hard disk, resulting in petabytes of raw
VLBI voltage signal data. The correlator uses an a priori Earth geometry and a clock/
delay model to align the signals from each telescope to a common time reference.
Also, the sensitivity of the antennas had to be calculated to create a correlation
coefficient between the different antennas.
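To make the alignment step concrete, here is a toy sketch that recovers the relative delay between two noisy recordings of the same signal by searching for the lag with the highest cross-correlation. It is a brute-force, time-domain illustration on synthetic data; it is not how DiFX is implemented, and the signal model, noise levels, and function names are all assumptions.

```python
import random


def cross_correlate(a, b, lag):
    """Mean of a[i] * b[i + lag] over the samples where both exist."""
    pairs = [(a[i], b[i + lag]) for i in range(len(a)) if 0 <= i + lag < len(b)]
    return sum(x * y for x, y in pairs) / len(pairs)


def estimate_delay(a, b, max_lag):
    """Return the lag (in samples) at which b best matches a."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: cross_correlate(a, b, lag))


rng = random.Random(0)
sky = [rng.gauss(0, 1) for _ in range(2000)]      # common sky signal
TRUE_DELAY = 7                                     # samples, synthetic

# Station A sees the signal directly; station B sees it 7 samples later.
station_a = [s + rng.gauss(0, 0.5) for s in sky]
station_b = [s + rng.gauss(0, 0.5) for s in [0.0] * TRUE_DELAY + sky[:-TRUE_DELAY]]

recovered = estimate_delay(station_a, station_b, max_lag=20)
print(recovered)
```

A production correlator starts from the a priori geometry and clock model, so it only has to search a narrow window around the predicted delay rather than a wide range of lags.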
2 Email from Helge Rottmann, Max Planck Institute, May 7, 2019
3 Email from Vincent Fish, MIT Haystack Observatory, April 30, 2019
The actual processing was performed with DiFX software [4] running on high performance
computing clusters at MPI and MIT. The clusters are composed of hundreds of servers,
thousands of cores, high performance networking (25 GbE and FDR InfiniBand) and
RAID storage servers. The MIT Haystack correlator is shown in Figure 4.
The MPI cluster, located in Bonn, Germany, comprises 3 Supermicro head node servers,
68 compute nodes (20 cores each = 1,360 cores), 11 Supermicro storage RAID servers
running the BeeGFS parallel file system with a capacity of 1.6 petabytes, FDR InfiniBand
networking, 15 Mark 5 playback units (shown in Figure 5), and 9 Mark 6 playback units.
The MIT Cluster is housed within 10 racks (9 visible in Figure 4). Three generations of
Supermicro servers were used, with the newest having two 10-core Intel® CPUs. The network
consists of Mellanox® 100/50/40/25 GbE switches, with the majority of nodes on the high
speed network at 25 GbE or higher via Mellanox PCIe add-on NICs. In addition to the Mark 6
recorders there is half a petabyte of storage scattered across the various Supermicro
storage servers for staging raw data and archiving correlated data products [1].
Supermicro in the EHT Correlators
The Correlators make extensive use of Supermicro’s large portfolio of Intel Xeon
Processor based systems and Building Block Solutions® to deploy fully optimized
compute and storage solutions for various workloads. For example, the Haystack DiFX
Correlator depicted in Figure 6 leverages Supermicro solutions for compute, storage,
administration, and maintenance tasks.
Figure 4. Photo by Nancy Wolfe Kotary, MIT Haystack Observatory
Figure 5. Mark 5 Playback Unit
4 Deller, A. T., Brisken, W. F., Phillips, C. J., et al. 2011, PASP, 123, 275
Haystack DiFX Correlator
Due to the high resource demand of DiFX, 2×2 clustered Supermicro headend nodes
(4 total), shown in Figure 7, are required to launch the correlations, farm out the
pieces of the correlations to the compute nodes, collect and combine the processed
correlation pieces, and write out the correlated data products.
These 4U dual-processor headend nodes, each with 24 3.5″ drives, use the onboard
hardware SAS RAID controller to achieve high output data rates and data protection.
Figure 6. Haystack DiFX Correlator (diagram labels: Storage Head Nodes, 2×2 Clustered; Compute Cluster; Mark 6 Unit; RAID Storage; Super Storage; 4U 2P 24× Drive; 3U 2P 8 Drive, 7 PCI-E; 2U 2 Node 2P)
Figure 7. Supermicro 4U Headend Node Systems
A total of 60 compute nodes make up the MIT cluster. Of these, 38 are nodes in
Supermicro TwinPro multi-node systems (19 systems in total), shown in Figure 8, with
Intel Xeon E5-2640 v4 processors. Each Twin multi-node system contains two independent
dual-processor compute nodes in a single system, doubling the density of traditional
rackmount systems, with shared power and cooling for improved power efficiency
and serviceability.
Another 16 compute nodes are previous-generation 3U dual-socket Supermicro servers
with Intel Xeon E5-2680 v2 processors (Figure 9).
Figure 8. Supermicro TwinPro Multi Node Systems
Figure 9. Supermicro SuperServer
The clustered storage nodes are configured with redundant high efficiency power
supplies and optimized redundant cooling to save energy, SAS3 expander options for
ease of interconnection, plus a variety of drive bay options depending on task.
At the core of the MIT DiFX Correlator is a high-performance data storage cluster based
on four Supermicro storage systems, to deliver high I/O throughput and data availability
through 10 Gigabit Ethernet networking fabrics and RAID controllers.
These systems are built with a selection of different Supermicro serverboards and
chassis with support for dual or single Intel® processors, SAS3 drives with onboard
hardware RAID controllers, onboard dual 10GbE for efficient networking, up to 2 TB DDR4
memory and 7 PCI-E 3.0 expansion slots for external drive capabilities.
Processing the EHT Data
According to Vincent Fish2, “The time for computation is in general a complicated function
of a lot of different parameters (not just the number of stations or baselines being correlated).
In general, we can correlate the data from one 2 GHz chunk of EHT data a bit slower than real
time. However, the telescopes don’t record continuously—there are gaps between scans on
different sources, and an observing night doesn’t last 24 hours—so we could correlate a
day’s worth of data in about a day per 2 GHz band if we ran the correlator continuously.
The 2017 data consisted of two 2 GHz bands, one of which was correlated at Haystack
and the other in parallel at MPIfR (MPI). The 2018 data consists of four 2 GHz bands; each
correlator is responsible for two of them.”
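As a rough illustration of the scaling described in this quote, the sketch below estimates correlation wall-clock time from the rule of thumb of about one day of correlation per observing day per 2 GHz band. The function name `correlation_days` and its parameters are hypothetical, and the model assumes each correlator works through its bands one after another with no re-correlation passes.

```python
import math

def correlation_days(observing_days, bands, correlators, days_per_band_day=1.0):
    """Estimate wall-clock days needed to correlate a campaign.

    Assumes bands are split evenly across correlators and that each
    correlator processes its share of bands sequentially.
    """
    bands_per_correlator = math.ceil(bands / correlators)
    return observing_days * bands_per_correlator * days_per_band_day

# 2017 setup: two 2 GHz bands, one per correlator (Haystack and MPIfR).
print(correlation_days(1, bands=2, correlators=2))
# 2018 setup: four 2 GHz bands, two per correlator.
print(correlation_days(1, bands=4, correlators=2))
```

Under these assumptions, doubling the number of recorded bands without adding correlator capacity doubles the processing time per observing day.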
The Mark 6 playback units (Figure 10) at the MIT correlator are connected via 40 Gbps
data links. A 100 Gbps network switch then delivers data to the processing nodes using
25 Gbps links. At MPI the internode communication, which includes the Mark 6 playback
units, is realized via 56 Gbps connections, exceeding the maximum playback rate of the
Mark 6 units of 16 Gbps3.
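The link speeds quoted above can be compared against the maximum Mark 6 playback rate with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the dictionary keys are descriptive labels chosen here, and the calculation ignores protocol overhead and the fact that several playback units may share a link.

```python
MARK6_MAX_PLAYBACK_GBPS = 16  # maximum Mark 6 playback rate quoted in the text

# Link speeds quoted in the text; the labels are illustrative.
LINKS_GBPS = {
    "MIT Mark 6 data link": 40,
    "MIT processing-node link": 25,
    "MPI internode link": 56,
}

def headroom(link_gbps, source_gbps=MARK6_MAX_PLAYBACK_GBPS):
    """Ratio of link capacity to the rate of a single playback unit."""
    return link_gbps / source_gbps

for name, rate in LINKS_GBPS.items():
    print(f"{name}: {rate} Gbps = {headroom(rate):.2f}x one Mark 6 stream")

# 16 Gbps expressed in gigabytes per second (8 bits per byte).
print(f"One Mark 6 stream: {MARK6_MAX_PLAYBACK_GBPS / 8:.1f} GB/s")
```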
The averaging time and bandwidth in the correlators are set to ensure that any coherence
losses due to delay or rate variations are negligible, or equivalently that such variations
can be tracked both in time and frequency.
The processing was divided between the two sites and included crosschecking of results.
The supercomputers at MPI and MIT correlated and processed the raw radio telescope
data from the various observing sites.
After the initial correlation, the data are further processed through a pipeline that results
in final data products for use in imaging, time-domain analyses, and modeling.
Data were correlated with an accumulation period (AP) of 0.4 s and a frequency resolution
of 0.5 MHz.
Note that ALMA refers to the Atacama Large Millimeter/submillimeter Array (in Chile).
ALMA was configured as a phased array of radio telescopes and was a recent addition to
the EHT effort with significant resolution capability. ALMA was treated as a highly accurate
anchor station and thus was used to improve the sensitivity limits of the global EHT array.
Figure 10. Mark 6 Playback Unit
Although operating as a single instrument spanning the globe, the EHT remains a mixture
of new and well-exercised stations, single-dish telescopes, and phased arrays with varying
designs and operations. Each observing cycle over the last several years was accompanied
by the introduction of new telescopes to the array, and/or significant changes and
upgrades to existing stations, data acquisition hardware, and recorded bandwidth.
EHT observations result in data spanning a wide range of signal-to-noise ratio (S/N) due to
the heterogeneous nature of the array, and the high observing frequency produced data
that were particularly sensitive to systematics in the signal chain. These factors, along with
the typical challenges associated with VLBI, motivated the development of specialized
processing and calibration techniques.
The end result of all this work, involving an intense international collaboration, was the
first image of a black hole event horizon.
The Future of Black Hole Observations
The EHT team’s initial goal was to image the event horizon of the massive black hole at
the center of our own Milky Way galaxy, Sgr A*. However, this proved more difficult than
originally anticipated, since its structure changes on the timescale of minutes.
According to Max Planck Director Anton Zensus5, “the heart of our Milky Way is hidden in
a dense fog of charged particles. This leads to a flickering of the radio radiation and thus
to blurred images of the center of the Milky Way, which makes the measurements more
difficult. But I am confident that we will ultimately overcome this difficulty. On the other
hand, M87 is about 2,000 times further away. However, the black hole in its center is also
about 1,000 times more massive than the one in our Milky Way. The greater mass makes
up for the greater distance. The shadow of the black hole in M87 therefore appears to us
to be about half the size of the one from the gravity trap in our Milky Way.”
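The "half the size" figure in this quote follows from the fact that the apparent angular size of a black hole shadow scales with mass divided by distance. A minimal sketch of that arithmetic, using the round numbers from the quote (the function name is illustrative):

```python
def relative_shadow_size(mass_ratio, distance_ratio):
    """Apparent shadow size of object A relative to object B.

    The angular size of a black hole shadow is proportional to
    mass / distance, so the ratio of sizes is the mass ratio
    divided by the distance ratio.
    """
    return mass_ratio / distance_ratio

# M87 relative to the Milky Way's central black hole, per the quote:
# about 1,000 times more massive and about 2,000 times further away.
ratio = relative_shadow_size(mass_ratio=1000, distance_ratio=2000)
print(ratio)  # 0.5
```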
To date, the EHT has observed the black holes in just one wavelength—light with
a wavelength of 1.3 millimeters. But the project soon plans to look at the 0.87-mm
wavelength as well, which should improve the angular resolution of the array. It should
also be possible to sharpen the existing images using additional algorithmic processing.
As a consequence, we should expect better images of M87 and other black holes in the
not too distant future. The addition of more participating radio telescope sites will also
help improve the observational imaging.
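The benefit of the shorter wavelength can be estimated from the diffraction limit, where angular resolution is roughly wavelength divided by baseline length. The sketch below assumes a maximum baseline equal to Earth's mean diameter (about 12,742 km), which is an illustrative upper bound rather than the actual EHT station geometry.

```python
import math

EARTH_DIAMETER_M = 1.2742e7  # assumed maximum baseline (Earth's mean diameter)
RAD_TO_MICROARCSEC = 180 / math.pi * 3600 * 1e6

def resolution_microarcsec(wavelength_m, baseline_m=EARTH_DIAMETER_M):
    """Diffraction-limited angular resolution, theta ~ wavelength / baseline."""
    return wavelength_m / baseline_m * RAD_TO_MICROARCSEC

res_1p3mm = resolution_microarcsec(1.3e-3)
res_0p87mm = resolution_microarcsec(0.87e-3)

print(f"1.3 mm:  {res_1p3mm:.1f} microarcseconds")
print(f"0.87 mm: {res_0p87mm:.1f} microarcseconds")
print(f"improvement factor: {res_1p3mm / res_0p87mm:.2f}")
```

For a fixed baseline the resolution improves in direct proportion to the wavelength, so moving from 1.3 mm to 0.87 mm sharpens the resolution by a factor of about 1.5. Longer baselines, such as those to a space-based telescope, improve it further in the same way.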
The EHT team also wants to move from only ground-based VLBI to space-based imaging
using a space-based radio telescope. Going into space would allow the EHT to have
radio telescopes that are even further apart and thus able to capture some even more
astounding and higher resolution images of the black holes around us. “We could make
movies instead of pictures,” EHT Director Sheperd Doeleman said in an EHT talk at the
South by Southwest (SXSW) festival in Austin, Texas5. “We want to make a movie in real time
of things orbiting around the black hole. That’s what we want to do over the next decade.”
5 “An astounding coincidence with theory”, Anton Zensus, Max Planck Director interview,
One of the big obstacles to using a space-based EHT dish is data transmission. For the
ground-based experiments, HDDs were physically transported from the telescope sites
to central processing facilities at MPI and MIT. It is not yet clear how data would be sent
from the space telescopes to Earth, but laser communication links are one possibility.
Transferring large amounts of data to the ground may require substantial onboard data
storage and a geosynchronous satellite acting as a relay (Figure 6).
Deepening our understanding of the universe around us requires sophisticated IT
infrastructure. This includes ever more digital processing with more advanced algorithms,
using faster and more sophisticated servers and fast as well as vast digital storage for
capturing and processing the sea of data generated by big international science projects,
such as the Event Horizon Telescope project.
Infrastructure Possibilities of Future Black Hole Observations
Future work to gain higher resolution and even time sequenced data (e.g. videos) of
black hole event horizons (including the black hole at the center of our Milky Way galaxy)
will involve new data and more sophisticated analysis algorithms as well as the use of
radio telescopes in space. These efforts can leverage the massive improvements already
available in today’s state-of-the-art IT Infrastructure.
The core count of the 10 Twin systems used in the current correlator could be achieved
with a single Supermicro BigTwin™ multi-node system with 2U 4 Nodes, Dual Socket 205W
Intel Xeon Scalable Processors, 24 DIMMS DDR4 memory with 6 All-Flash NVMe drives per
node. The system delivers better density and improved power efficiency.
5 SXSW 2019 Panel, Event Horizon Telescope: A Planetary Effort to Photograph a Black Hole, Sheperd Doeleman,
Dimitrios Psaltis, Sera Markoff, Peter Galison
Figure 11. Supermicro BigTwin™
The rendering of images could be accelerated from hours to seconds with advanced
GPU systems such as a 1U 4-GPU server (Figure 12).
The 960 terabytes of data could be stored on a single 1U Petascale server with
all-flash NVMe solid state drives, offering an order of magnitude better performance
and reduced latency while eliminating the environmental issues introduced by high
altitude (Figure 13).
These are just a few examples of the new state-of-the-art IT Infrastructure available to
researchers across the globe to support and enhance future research and discovery.
Figure 12. Supermicro GPU-optimized server systems
Figure 13. Supermicro All-Flash NVMe storage systems
Super Micro Computer, Inc.
980 Rock Avenue
San Jose, CA 95131 USA
© Copyright 2019 Super Micro Computer, Inc. All rights reserved.
About the Author
Tom Coughlin, President, Coughlin Associates is a digital storage analyst as well as a business and technology
consultant. He has over 37 years in the data storage industry with engineering and management positions at
several companies.
Dr. Coughlin has many publications and six patents to his credit. Tom is also the author of Digital Storage
in Consumer Electronics: The Essential Guide, which is now in its second edition with Springer. Coughlin
Associates provides market and technology analysis as well as Data Storage Technical and Business
Consulting services. Tom publishes the Digital Storage Technology Newsletter, the Media and Entertainment
Storage Report, the Emerging Non-Volatile Memory Report and other industry reports. Tom is also a regular contributor on digital
storage for and other blogs.
Tom is active with SMPTE (Journal article writer and Conference Program Committee), SNIA (including a founder of the SNIA SSSI),
the IEEE, (he is past Chair of the IEEE Public Visibility Committee, Past Director for IEEE Region 6, President of IEEE USA and active in
the Consumer Electronics Society) and other professional organizations. Tom is the founder and organizer of the Storage Visions
Conference as well as the Creative Storage Conference. He was the general
chairman of the annual Flash Memory Summit for 10 years. He is a Fellow of the IEEE and a member of the Consultants Network of
Silicon Valley (CNSV). For more information on Tom Coughlin and his publications and activities go to
About Super Micro Computer, Inc.
Supermicro (NASDAQ: SMCI), the leading innovator in high-performance, high-efficiency server technology, is a premier provider of
advanced server Building Block Solutions® for Data Center, Cloud Computing, Enterprise IT, Hadoop/Big Data, HPC and Embedded
Systems worldwide. Supermicro is committed to protecting the environment through its “We Keep IT Green®” initiative and provides
customers with the most energy-efficient, environmentally-friendly solutions available on the market.