Using six 2.6TB Fusion ioMemory™ ioDrive®2 Duo drives, the Universities of Michigan and Victoria are sharing massive volumes of data with the CERN Laboratory and 100 other computing centers. The solution may expedite the potential discovery of new particles and forces that will help explain the nature of the universe.
Through the use of Fusion ioMemory™ PCIe solutions, the University of Michigan and University of Victoria are implementing a multi-site supercomputing project to help them share massive volumes of data among the CERN Laboratory in Geneva, Switzerland and 100 computing centers around the world. The solution has the potential to accelerate access to physics data—allowing teams of physicists to expedite the potential discovery of new particles and forces that will help explain the nature of the Universe.
François Englert and Peter W. Higgs received the 2013 Nobel Prize in Physics for the theoretical discovery of a mechanism that contributes to our understanding of the origin of mass of subatomic particles. This mechanism was confirmed through the discovery of the predicted fundamental particle by the ATLAS and CMS experiments at CERN’s Large Hadron Collider (LHC).
The Large Hadron Collider (LHC) is the world’s largest and most powerful particle accelerator. First started up on September 10, 2008, the LHC consists of a 27-kilometer ring of superconducting magnets with a number of accelerating structures that boost the energy of the particles. Inside the accelerator, two high-energy particle beams travel at close to the speed of light before they are made to collide. The beams are guided around the accelerator ring by a strong magnetic field maintained by thousands of superconducting electromagnets. These are built from coils of special electric cable that operates in a superconducting state, efficiently conducting electricity without resistance or loss of energy.
All of the controls for the accelerator, including its services and technical infrastructure, are housed under one roof at the CERN Control Centre in Geneva, Switzerland. From here, the beams inside the LHC are made to collide at four locations around the accelerator ring, corresponding to the positions of four particle detectors: ATLAS, CMS, ALICE and LHCb. The LHC was built to study the “Big Bang,” when the Universe was formed.
ATLAS is one of the particle physics experiments at the LHC. It aims to uncover the fundamental laws of physics by searching for new discoveries in the head-on collisions of protons of extraordinarily high energy. At approximately half the size of Paris’ Notre Dame Cathedral, ATLAS is about 45 meters long, more than 25 meters high, and weighs about 7,000 tons, the equivalent of one hundred 747 jets.
ATLAS is learning about the basic forces that have shaped our Universe since the beginning of time and that will determine its fate. As the particles crash together in the center of ATLAS, they produce tiny fireballs of primordial energy, thereby recreating the conditions that existed at the birth of the Universe. Among the possible unknowns that ATLAS is exploring are extra dimensions of space, unification of fundamental forces, and evidence for dark matter candidates in the Universe. With the Higgs boson now discovered, further data will allow in-depth investigation of the boson’s properties, and thereby, of the origin of mass.
The research at the LHC involves a very large and distributed collaboration among hundreds of organizations and thousands of physicists. The result is a huge amount of data: ten petabytes are transmitted each year.
The current challenge is to deploy an infrastructure that can support the data being scanned for new discoveries. The key to enabling these scientific discoveries is getting data from the source to the remote sites as rapidly as possible. The ATLAS data sample is approximately 170 petabytes and must be transferred from CERN to other centers around the world at 100 gigabits per second (Gbps) to enable scientists to rapidly access and analyze data, and speed up discoveries.
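To put these volumes in perspective, the arithmetic can be sketched in a few lines of Python. This is only a rough illustration: it assumes decimal units (1 PB = 10^15 bytes), a perfectly sustained line rate, and no protocol overhead, and the function names are hypothetical.

```python
# Back-of-the-envelope transfer times at a sustained line rate.
# Assumptions (not from the article): decimal units, 1 PB = 10**15 bytes,
# no protocol overhead, and a link that stays fully utilized.

SECONDS_PER_DAY = 86_400


def transfer_days(size_petabytes: float, rate_gbps: float) -> float:
    """Days needed to move size_petabytes at a sustained rate_gbps."""
    bits = size_petabytes * 1e15 * 8
    seconds = bits / (rate_gbps * 1e9)
    return seconds / SECONDS_PER_DAY


def average_rate_gbps(petabytes_per_year: float) -> float:
    """Average rate in Gbps implied by a yearly volume (365-day year)."""
    bits = petabytes_per_year * 1e15 * 8
    return bits / (365 * SECONDS_PER_DAY) / 1e9


if __name__ == "__main__":
    for rate in (10, 40, 100):
        days = transfer_days(170, rate)
        print(f"170 PB at {rate:>3} Gbps: {days:7.1f} days")
    print(f"10 PB/year averages {average_rate_gbps(10):.2f} Gbps")
```

Under these assumptions, a single full copy of the 170-petabyte sample takes roughly 157 days at 100 Gbps and ten times as long at 10 Gbps, which is why sites copy selected data sets and why sustained throughput matters so much.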
In spring 2015, the LHC will start colliding particles at the highest energies ever achieved in a particle accelerator. With this in mind, the physicists at the Universities of Michigan and Victoria needed to create a data transfer architecture based upon a single server that is able to transfer data among 100 computing centers around the world at 100 Gbps speeds.
Research sites typically make local copies of data when an interesting data set or anomaly has been identified. As sites begin to upgrade to 100 Gbps network connections, the challenge is to utilize the bandwidth efficiently and deploy a subsystem that can deliver at that level. To date, there has not been a vendor with a 100 Gbps interface card.
Physicists from the University of Michigan and the University of Victoria are working together to develop a simple grid framework infrastructure that can utilize 100 Gbps wide area network connections to enable data to be scanned by local sites for new physics. Large sites must be able to move data across many servers, so they need a network cache that sources and syncs data from many sites for local distribution, whereas smaller sites deploy a single server to use the data.
“The ATLAS and CMS supercomputing projects are very large international projects, each involving approximately 3,000 researchers and most of the world’s countries. These are long term projects—they started 20 years ago and will continue for another 20+ years,” said Randall Sobie, Institute of Particle Physics Research Scientist and Professor at University of Victoria.
The Caltech, University of Michigan, and University of Victoria teams at Supercomputing 2014 (SC14) had been aware of SanDisk’s Fusion ioDrive® solutions for some time. NAND-based and PCIe-based storage solutions were intriguing due to the promise of obtaining better performance than RAID storage systems. The team expected that using the SanDisk drives was the only way to get sufficient performance from a single box.
Working with Fusion ioMemory solutions, the University of Michigan and University of Victoria research teams deployed a single server storage environment to fuel their multi-site supercomputing project. It was necessary that the solution deliver massive data transmissions, while eliminating the need for a multi-server configuration and reducing complexity, cost, and points of failure. The chosen flash memory solution also dramatically reduced the server footprint needed to transmit these enormous datasets. By utilizing Fusion ioMemory PCIe solutions, the universities have accelerated access to data, allowing physicists to expedite the potential discovery of new particles and forces that will help explain the nature of the Universe.
The design tested at SC14 focused on building a single server capable of meeting the data transfer needs of a site connected at 100GE. One high-performance server resided in the University of Michigan booth at SC14 and a second high-performance server resided at the University of Victoria. High-performance networks were provided by CANARIE, BCNET, CenturyLink, and SCinet in collaboration with the Caltech Networking Team.
At the University of Victoria, a dedicated 100G circuit was established using a Brocade MLXe-4 and CFP2 module to connect to the SC14 show floor. High-performance data transfer nodes were architected using a Dell R920 server equipped with six 2.6TB Fusion ioMemory ioDrive2 Duo solid-state drives (SSDs). The server was connected to the MLXe-4 with a 100G CFP2 module and three 40GE NICs. An OpenDaylight controller employing custom multipath extensions was implemented on the SC14 show floor to control the MLXe-4 at the University of Victoria.
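The headline figures for this node can be tallied with a short Python sketch. The inputs are the counts and ratings quoted above; the decimal units (1 TB = 10^12 bytes) and the helper names are assumptions made for illustration only.

```python
# Aggregate figures for the data transfer node described in the text.
# Assumption: decimal units (1 TB = 10**12 bytes); names are illustrative.

DRIVE_COUNT = 6
DRIVE_CAPACITY_TB = 2.6
NIC_COUNT = 3
NIC_RATE_GBPS = 40
CIRCUIT_RATE_GBPS = 100


def total_capacity_tb() -> float:
    """Combined flash capacity of all drives, in TB."""
    return DRIVE_COUNT * DRIVE_CAPACITY_TB


def nic_aggregate_gbps() -> int:
    """Combined nominal rate of all NICs, in Gbps."""
    return NIC_COUNT * NIC_RATE_GBPS


def drain_minutes(capacity_tb: float, rate_gbps: float) -> float:
    """Minutes to send capacity_tb once at a sustained rate_gbps."""
    seconds = capacity_tb * 1e12 * 8 / (rate_gbps * 1e9)
    return seconds / 60


if __name__ == "__main__":
    capacity = total_capacity_tb()
    print(f"Flash capacity: {capacity:.1f} TB")
    print(f"NIC aggregate: {nic_aggregate_gbps()} Gbps "
          f"for a {CIRCUIT_RATE_GBPS} Gbps circuit")
    print(f"Full drain at line rate: "
          f"{drain_minutes(capacity, CIRCUIT_RATE_GBPS):.1f} minutes")
```

On these numbers the three NICs offer 120 Gbps of nominal capacity, leaving some headroom over the 100G circuit, and the 15.6 TB of flash would be sent once in about 21 minutes at full line rate.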
Regarding the support from SanDisk, Ian Gable, physics computing specialist with University of Victoria, said, “We were extremely impressed with the deep technical expertise of the SanDisk solutions architect. The late nights and hard work provided by SanDisk were contributing factors to our success.”
The researchers were very happy with the initial results and described the performance of the SanDisk Fusion ioDrive2 Duo cards as nothing short of astonishing. The solution achieved 137 Gbps for local reads from disk and 113 Gbps for local writes to disk.
Each Dell server drove the network from memory at a flat 99.7 Gbps from the University of Victoria via the Brocade MLXe-4 to the Caltech booth and then to the University of Michigan booth on the SC14 show floor, illustrating a loss-free network path between the two sites. Reading from disk using SanDisk Fusion ioDrive2 Duo drives ran at 73 Gbps, with 65 milliseconds of latency between the two sites, and writing occurred at 60 Gbps. These results indicate that a single server can handle the needs of a 100G wide area circuit.
“We believe that there is potential to improve on these results with further testing and tuning,” commented Shawn McKee, physics researcher at University of Michigan. “However we are quite satisfied with the progress made during the limited time frame of SC14. Using a single 40GE network card we were able to copy data from disk to disk at 37.5 Gbps with just two SanDisk Fusion ioDrive2 Duo cards. This indicates that we should be able to reach our final goal of 100G read and write using this extremely capable hardware.”
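The reasoning behind that expectation can be sketched as simple arithmetic. The sketch below assumes throughput scales linearly with the number of cards, which the source does not claim; since the two-card test ran over a single 40GE NIC, the per-card rate it yields is best read as a lower bound. The function names are hypothetical.

```python
import math

# Rough scaling estimate from the two-card, single-NIC test in the text.
# Assumption: throughput scales linearly with card count. The test was
# limited by one 40GE NIC, so the per-card rate is a lower bound.


def link_utilization(measured_gbps: float, link_gbps: float) -> float:
    """Fraction of the link's nominal rate that was achieved."""
    return measured_gbps / link_gbps


def cards_needed(target_gbps: float, measured_gbps: float,
                 measured_cards: int) -> int:
    """Cards required to reach target_gbps under linear scaling."""
    per_card_gbps = measured_gbps / measured_cards
    return math.ceil(target_gbps / per_card_gbps)


if __name__ == "__main__":
    print(f"40GE utilization: {link_utilization(37.5, 40):.1%}")
    print(f"Cards for 100 Gbps: {cards_needed(100, 37.5, 2)}")
```

With 37.5 Gbps from two cards, the estimate comes to six cards for 100 Gbps, the same number installed in the server.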
To take advantage of these results in production, it will be necessary to fully explore the interaction among all of the components of the long-latency, high-bandwidth path and the multiple levels of caching and buffering along the end-to-end system: Application, Drive, Filesystem, and Network.
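One way to see why buffering matters on such a path is the bandwidth-delay product, the amount of data that must be in flight to keep the link full. The sketch below treats the 65 milliseconds reported between the two sites as a round-trip time, which is an assumption; the source does not say how the latency was measured. The function name is hypothetical.

```python
# Bandwidth-delay product (BDP): data in flight needed to fill a path.
# Assumption: the 65 ms figure from the text is a round-trip time.


def bdp_bytes(rate_gbps: float, rtt_ms: float) -> float:
    """Bytes in flight required to sustain rate_gbps over rtt_ms."""
    bits_in_flight = rate_gbps * 1e9 * rtt_ms / 1e3
    return bits_in_flight / 8


if __name__ == "__main__":
    for rate in (60, 73, 100):
        megabytes = bdp_bytes(rate, 65) / 1e6
        print(f"{rate:>3} Gbps x 65 ms: {megabytes:7.1f} MB in flight")
```

At 100 Gbps and 65 ms this comes to roughly 812 MB, so the buffers along the path, whether in a single stream or spread across parallel streams, need to hold data on that order to keep the circuit busy.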
“My colleagues and I are proud and excited that we have now found a way to accelerate the discoveries made for each of these projects with the help of flash memory solutions from SanDisk,” said Sobie.
Each university now has a production 100G WAN connection and will continue to boost the efficiency of WAN transfers. They will then deploy this formula at other scientific sites.
“SanDisk is thrilled to be working with the teams at the University of Michigan and University of Victoria to help fuel their success by providing fast, cost-effective, and highly scalable flash solutions to increase data access,” said Sumit Sadana, executive vice president and chief strategy officer, SanDisk. “By utilizing flash technology, the researchers can cost-effectively transfer massive amounts of data over long distances, ultimately enabling them to reach new discoveries faster.”
The LHC has been in shutdown since 2013 to undergo upgrades and infrastructure changes. Previously configured for 8 to 13 teraelectronvolts (TeV), the accelerator will be newly configured to run at 14 TeV for the physics data that will be generated beginning in May 2015. Teams are upgrading equipment, software, and frameworks to implement a proof-of-concept infrastructure that demonstrates which options can meet their data needs. Individual sites will then determine what infrastructure to adopt depending on yearly budgets and available equipment.