What’s New in HPC Research: ROBE, OpenMP Automated Scheduling, SCALSALE, & More

2022-10-09 08:23:16 By : Mr. Roc Yuan

Since 1987 - Covering the Fastest Computers in the World and the People Who Run Them


In this regular feature, HPCwire highlights newly published research in the high-performance computing community and related domains. From parallel programming to exascale to quantum computing, the details are here.

Reproducible cross-border high performance computing for scientific portals

An international team of researchers from the University of Tartu (Estonia), the University of Oslo (Norway), and the University of Iceland (Iceland) developed a software solution that allows “scientists to use cross-border cloud and HPC resources through web portals while achieving reproducibility using containers and workflow automation.” In this paper, the researchers state that the solution supports job submissions from the Galaxy and PlutoF portals to HPC systems. To build it, they created “a robot user at each HPC facility for each group of users, so that user portals submit jobs as this robot user. On the HPC side, each robot user is associated with a user group to which a specific quota is assigned and managed.” The proposed solution would let users in one country draw on HPC resources in another (e.g., users in Finland running jobs in Sweden).

Authors: Kessy Abarenkov, Anne Fouilloux, Helmut Neukirchen, and Abdulrahman Azab
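To make the robot-user idea concrete, here is a minimal sketch of the bookkeeping it implies: each portal user group maps to one robot account at a facility, and that account's group carries a quota that every submission is checked against. The class names, the core-hour quota unit, and the accept/reject rule are all hypothetical illustrations, not the researchers' implementation, which runs on real schedulers and container tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RobotUser:
    """Hypothetical robot account at one HPC facility, shared by a portal user group."""
    name: str
    unix_group: str          # the HPC-side group the quota is attached to
    quota_core_hours: float  # assumed quota unit, for illustration only
    used_core_hours: float = 0.0
    jobs: List[str] = field(default_factory=list)


class Facility:
    """Toy model of one facility: portal user group -> robot user."""

    def __init__(self) -> None:
        self.robots: Dict[str, RobotUser] = {}

    def register_group(self, portal_group: str, quota_core_hours: float) -> None:
        # One robot user per group of portal users, as the paper describes.
        self.robots[portal_group] = RobotUser(
            name=f"robot-{portal_group}",
            unix_group=f"grp-{portal_group}",
            quota_core_hours=quota_core_hours,
        )

    def submit(self, portal_group: str, job_id: str, core_hours: float) -> bool:
        """Portal submits as the group's robot user; reject jobs that would exceed the quota."""
        robot = self.robots[portal_group]
        if robot.used_core_hours + core_hours > robot.quota_core_hours:
            return False
        robot.used_core_hours += core_hours
        robot.jobs.append(job_id)
        return True
```

Because every job from a group runs under one account, quota enforcement and accounting stay entirely on the facility side, and the portal never needs individual local accounts for cross-border users.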

SCALSALE: Scalable SALE Benchmark Framework for Supercomputers

Israeli researchers from Ben-Gurion University of the Negev, Nuclear Research Center, Israel Atomic Energy Commission, Israel Institute of Technology, and Tel Aviv University introduced SCALSALE, a scalable benchmark framework based on the SALE scheme. SCALSALE was developed to “provide a simple, flexible, scalable infrastructure that can be easily expanded to include multi-physical schemes while maintaining scalable and efficient execution times.” With SCALSALE, the researchers hope to bridge the gap between simplified benchmarks and full scientific applications. In this paper, they detail how they implemented SCALSALE “in Modern Fortran with simple object-oriented design patterns and supported by transparent MPI-3 blocking and non-blocking communication that allows such a scalable framework.” They demonstrated SCALSALE on the “multibounded representative Sedov-Taylor blast wave problem and compared it to the well-known LULESH benchmark using strong and weak scaling tests.” The framework is publicly available on GitHub: https://github.com/Scientific-Computing-Lab-NRCN/ScalSALE.

Authors: Re’em Harel, Matan Rusanovsky, Ron Wagner, Harel Levin, and Gal Oren
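Strong and weak scaling tests like those used to compare SCALSALE with LULESH are usually summarized as parallel efficiencies. The sketch below uses the textbook definitions: strong scaling fixes the total problem size, so ideal runtime shrinks in proportion to the process count; weak scaling fixes the work per process, so ideal runtime stays constant. The timings in the usage example are made-up illustrative numbers, not results from the paper.

```python
def strong_scaling_efficiency(t_base: float, p_base: int, t_p: float, p: int) -> float:
    """Fixed total problem size: ideal runtime on p processes is t_base * p_base / p."""
    return (t_base * p_base) / (t_p * p)


def weak_scaling_efficiency(t_base: float, t_p: float) -> float:
    """Fixed work per process: ideal runtime stays equal to the baseline runtime."""
    return t_base / t_p


# Illustrative timings only (seconds).
print(strong_scaling_efficiency(100.0, 1, 15.0, 8))  # 100 / (15 * 8), about 0.83
print(weak_scaling_efficiency(100.0, 125.0))         # 0.8
```

An efficiency of 1.0 means perfect scaling; communication overhead, such as the MPI halo exchanges a hydrodynamics code performs every step, is what typically pulls it below 1.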

Automated scheduling algorithm selection and chunk parameter calculation in OpenMP

Researchers in Switzerland from HPE’s HPC/AI EMEA Research Lab and the Department of Mathematics and Computer Science at the University of Basel developed Auto4OMP, a new “approach for automated load balancing of OpenMP applications,” described in this open access paper published in IEEE Transactions on Parallel and Distributed Systems. The approach is meant to “address the scheduling algorithm selection problem in OpenMP applications.” To tackle the problem, the researchers equipped Auto4OMP with “a new expert chunk parameter and three new scheduling algorithm selection methods: RandomSel, ExhaustiveSel, and ExpertSel for OpenMP.” They detail their analysis of Auto4OMP’s performance and the results of an evaluation performed “on five applications, executed on three multi-core architectures to test six research hypotheses.” The experiments demonstrated “that Auto4OMP improves applications performance by up to 11% compared to LLVM’s schedule(auto) implementation and outperforms manual selection. Auto4OMP improves MPI+OpenMP applications performance by explicitly minimizing thread- and implicitly reducing process-load imbalance.”

Authors: Ali Mohammed, Jonas H. Müller Korndörfer, Ahmed Eleliemy, and Florina M. Ciorba
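The core idea behind an exhaustive selection method can be sketched in a few lines: for a loop executed repeatedly (for example, once per time step), try each candidate schedule on one loop instance, record its execution time, and then keep using the fastest one. This is a simplified toy model written in plain Python, not Auto4OMP's code or an OpenMP runtime; `run_loop` stands in for executing the loop under a given schedule, and details such as re-triggering selection when loop behavior changes are deliberately omitted.

```python
from typing import Callable, Dict, List


def exhaustive_sel(candidates: List[str],
                   run_loop: Callable[[str], float],
                   instances: int) -> List[str]:
    """Toy ExhaustiveSel-style policy: try every schedule once, then exploit the fastest.

    run_loop(schedule) executes one loop instance and returns its measured time.
    Returns the schedule used for each loop instance.
    """
    timings: Dict[str, float] = {}
    chosen: List[str] = []
    for step in range(instances):
        if step < len(candidates):
            # Exploration phase: measure each candidate on one loop instance.
            schedule = candidates[step]
            timings[schedule] = run_loop(schedule)
        else:
            # Exploitation phase: reuse the schedule with the lowest measured time.
            schedule = min(timings, key=timings.__getitem__)
            run_loop(schedule)
        chosen.append(schedule)
    return chosen


# Hypothetical per-instance costs for three standard OpenMP schedule kinds.
costs = {"static": 3.0, "dynamic": 1.5, "guided": 2.0}
print(exhaustive_sel(list(costs), costs.__getitem__, 5))
# ['static', 'dynamic', 'guided', 'dynamic', 'dynamic']
```

A random selection method would replace the exploration phase with random picks, while an expert-based method would use rules about load imbalance and scheduling overhead instead of trying every option; the exhaustive approach pays for its thoroughness with one instance of each slow candidate.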

The future of quantum computing with superconducting qubits

In this paper, a team of researchers from IBM Quantum and the IBM T.J. Watson Research Center in New York offers a “perspective of the future of quantum computing focusing on an examination of what it takes to build and program near-term superconducting quantum computers and demonstrate their utility.” The researchers argue that achieving computational advantage in the near future is “possible by combining multiple QPUs through circuit knitting techniques, improving the quality of solutions through error suppression and mitigation, and focusing on heuristic versions of quantum algorithms with asymptotic speedups.” However, they add, for this “to happen, the performance of quantum computing hardware needs to improve and software needs to seamlessly integrate quantum and classical processors together to form a new architecture that we are calling quantum-centric supercomputing.”

Authors: Sergey Bravyi, Oliver Dial, Jay M. Gambetta, Dario Gil, and Zaira Nazario

HFBN: an energy efficient high performance hierarchical interconnection network for exascale supercomputer

In this paper, researchers from Green University of Bangladesh (Bangladesh), King Faisal University (Saudi Arabia), and the Japan Advanced Institute of Science and Technology (Japan) propose a new interconnection network for supercomputers. Reaching exascale requires roughly a thousandfold performance improvement over petascale computers, and energy efficiency has become a key factor in getting there. For example, “Fugaku supercomputer used Tofu interconnect with 7,299,072 cores and can achieve about 415PFLOPS requiring about 28,335kW power usage.” In this open access paper published in IEEE Access, the researchers present their “redesigned new energy efficient interconnection network that mitigates the problems of high power consumption, longer wiring length, and low bandwidth issues.” Compared to the Tofu network, the researchers report that HFBN achieves about 87.26% better energy efficiency at zero-load latency with uniform traffic, about 86.32% with perfect shuffle traffic, and about 92.98% with bit-complement traffic.

Authors: Faiz Al Faisal, M. M. Hafizur Rahman, and Yasushi Inoguchi

Random offset block embedding for compressed embedding tables in deep learning recommendation systems

A team of Texas researchers from Rice University, West Texas A&M University, and ThirdAI Corp., an artificial intelligence startup company, introduced Random Offset Block Embedding Array (ROBE). Researchers created ROBE as “a low memory alternative to embedding tables, which provide orders of magnitude reduction in memory usage while maintaining accuracy and boosting execution speed.” According to the researchers, ROBE is an approach to “improving both cache performance and the variance of randomized hashing…” In this paper, researchers demonstrated they can “train DLRM models with the same accuracy while using 1000× less memory.” In addition, a “1000× compressed model directly results in faster inference without any engineering effort.” Notably, researchers trained a “DLRM model using ROBE Array of size 100MB on a single GPU to achieve AUC of 0.8025 or higher as required by official MLPerf CriteoTB benchmark DLRM model of 100GB while achieving about 3.1× (209%) improvement in inference throughput.”

Authors: Aditya Desai, Li Chou, and Anshumali Shrivastava 
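A rough sketch of the ROBE idea, under the assumption (based on the name and the cache-performance claim) that all embedding tables share a single circular parameter array, each embedding is assembled from fixed-size blocks, and each block is read contiguously starting at a hashed random offset. The class name, the hash function, and the block size are hypothetical choices for illustration; this is not the authors' implementation. The memory figures above are consistent with the claimed ratio: in decimal units, 100 GB / 100 MB = 1000.

```python
import random
from typing import List


class RobeArray:
    """Toy ROBE-style store: every embedding table shares one circular array."""

    PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1 for the hash (assumption)

    def __init__(self, size: int, dim: int, block: int, seed: int = 0) -> None:
        if dim % block != 0:
            raise ValueError("dim must be a multiple of block")
        rng = random.Random(seed)
        self.size, self.dim, self.block = size, dim, block
        # The only trainable memory: `size` floats, independent of vocabulary size.
        self.weights: List[float] = [rng.gauss(0.0, 0.01) for _ in range(size)]
        # Random coefficients for a simple ((a * key + b) mod p) mod size hash.
        self.a = rng.randrange(1, self.PRIME)
        self.b = rng.randrange(0, self.PRIME)

    def _offset(self, table_id: int, token: int, block_id: int) -> int:
        """Hash (table, token, block) to a pseudo-random starting offset in the array."""
        key = (table_id * 1_000_003 + token) * 131 + block_id
        return ((self.a * key + self.b) % self.PRIME) % self.size

    def lookup(self, table_id: int, token: int) -> List[float]:
        """Assemble a dim-length embedding from dim // block contiguous chunks."""
        vec: List[float] = []
        for block_id in range(self.dim // self.block):
            start = self._offset(table_id, token, block_id)
            # Read `block` consecutive floats, wrapping around the end of the array.
            vec.extend(self.weights[(start + j) % self.size] for j in range(self.block))
        return vec


robe = RobeArray(size=100_000, dim=64, block=32)
emb = robe.lookup(table_id=3, token=123_456_789)
print(len(emb))  # 64
```

Reading whole blocks instead of hashing every element separately means one lookup touches only a few memory regions, which is plausibly where the cache benefit comes from; the trade-off is that different tokens can share (collide on) overlapping parameters, which training has to tolerate.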

Generalization in quantum machine learning from few training data

In this paper, an international, multidisciplinary team of researchers from the Technical University of Munich (Germany), the Munich Center for Quantum Science and Technology (MCQST) (Germany), Caltech (California), Los Alamos National Laboratory (New Mexico), the University of Maryland, College Park (Maryland), and the Quantum Science Center (Tennessee) provides a detailed analysis of “generalization performance in quantum machine learning (QML) after training on a limited number N of training data points.” The researchers demonstrated “highly general theoretical bounds on the generalization error in variational QML: The generalization error is approximately upper bounded by √(T/N),” where T is the number of trainable gates in the model. In this open access article published in Nature Communications, they also demonstrated that “classification of quantum states across a phase transition with a quantum convolutional neural network requires only a very small training data set.” The researchers believe their results also apply to learning quantum error correcting codes and to quantum dynamical simulation.

Authors: Matthias C. Caro, Hsin-Yuan Huang, M. Cerezo, Kunal Sharma, Andrew Sornborger, Lukasz Cincio, and Patrick J. Coles 
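The √(T/N) scaling has a practical reading: to keep the generalization error below some target ε, the number of training points only needs to grow linearly with the number of trainable gates, N ≥ T/ε². The sketch below applies just that scaling relation and deliberately ignores the constants and logarithmic factors hidden in the word "approximately," so its outputs are order-of-magnitude guides, not guarantees.

```python
import math


def generalization_bound(trainable_gates: int, n_train: int) -> float:
    """Scaling of the bound, sqrt(T / N), with constants and log factors omitted."""
    return math.sqrt(trainable_gates / n_train)


def min_training_points(trainable_gates: int, target_error: float) -> int:
    """Smallest N with sqrt(T / N) <= target_error, i.e. N >= T / target_error**2."""
    return math.ceil(trainable_gates / target_error ** 2)


print(generalization_bound(100, 400))  # 0.5
print(min_training_points(100, 0.5))   # 400
```

Doubling the number of trainable gates only doubles the required training set under this scaling, which is why the authors can report good generalization from "a very small training data set."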

Do you know about research that should be included in next month’s list? If so, send us an email at [email protected]. We look forward to hearing from you.


© 2022 HPCwire. All Rights Reserved. A Tabor Communications Publication

HPCwire is a registered trademark of Tabor Communications, Inc. Use of this site is governed by our Terms of Use and Privacy Policy.

Reproduction in whole or in part in any form or medium without express written permission of Tabor Communications, Inc. is prohibited.