*Result*: Acceleration of Eulerian multi-material methods on highly parallel compute architectures

Title:
Acceleration of Eulerian multi-material methods on highly parallel compute architectures
Authors:
Contributors:
Nikiforakis, Nikolaos, Blakely, Philip
Publisher Information:
University of Cambridge, 2022.
Publication Year:
2022
Collection:
University of Cambridge
Document Type:
*Dissertation/Thesis* Electronic Thesis or Dissertation
Language:
English
DOI:
10.17863/CAM.87165
Accession Number:
edsble.862017
Database:
British Library EThOS

*Further Information*

*The aims of this thesis are to develop a framework, which we have named Ripple, for the efficient execution of general-purpose computations on modern heterogeneous compute architectures, with a focus on multiple graphics processing units (GPUs), as well as to develop algorithms for the numerical simulation of multiple interacting materials on modern, massively parallel, computer hardware. The Ripple framework is applicable to a wide range of HPC problems, allowing programmers to concentrate on algorithm design, while the framework takes the high-level domain application logic and executes it with near-optimal performance across all devices, handling data layout transformations, inter-device communication, and optimal scheduling of the computation sub-stages. To demonstrate the effectiveness of the framework, we develop efficient parallel algorithms for numerical schemes commonly used in finite-volume methods, the solution of the Eikonal equation in a narrow band, the Ghost Fluid Method (GFM), and signed-distance function generation from multiple external geometry file types. Additionally, we present the main problems involved in scaling multi-material interaction simulations across multiple GPUs, and provide solutions to these problems which allow multi-material simulations to scale almost linearly with the number of compute devices, enabling the simulation of problems on massive domains. We also demonstrate how the Ripple framework improves on existing work in terms of performance and simplification of software development, and enables solutions to execute across multiple GPUs. The algorithms which we present are built around well-developed methods for multi-material interaction. These methods currently require vast computational resources for simulation on domains of modest size, even when well-developed adaptive mesh refinement (AMR) techniques are used, since they do not make use of modern GPUs.
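As a minimal illustration of one building block named above: a signed-distance function reports, for any query point, the distance to the nearest interface, with sign distinguishing inside from outside. The thesis generates these from general external geometry files; the sketch below assumes only an analytic sphere (the `sphere_sdf` helper is illustrative, not the thesis's implementation):

```python
import math

# Signed distance to a sphere surface: negative inside the sphere,
# zero on the surface, positive outside.
def sphere_sdf(point, centre, radius):
    return math.dist(point, centre) - radius

print(sphere_sdf((0.0, 0.0, 0.0), (0.0, 0.0, 0.0), 1.0))  # -1.0 (inside)
print(sphere_sdf((2.0, 0.0, 0.0), (0.0, 0.0, 0.0), 1.0))  # 1.0 (outside)
```

Evaluating such a field at every cell of a large 3D grid is embarrassingly parallel per point, which is why GPU generation can be orders of magnitude faster than a CPU implementation.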
The work presented in this thesis demonstrates that utilising modern hardware, particularly GPUs, reduces computational time. To ensure that our algorithms are correct, we validate the developed code on standard test cases used for finite-volume methods, as well as for multi-material interaction problems in two and three dimensions involving gas-gas, gas-liquid, and gas-liquid-solid interfaces. We then apply the developed techniques to novel, unvalidated real-world use cases to demonstrate their utility. We performed comparison simulations between multi-GPU unigrid using the developed framework and multi-core CPU adaptive mesh refinement using an existing implementation which has demonstrated good performance. While these are different algorithms executed on different hardware, the cost of execution per hour for a single NVIDIA A100 GPU and a 48-core Intel Xeon Cascade Lake is effectively equivalent on current cloud providers, such as AWS. This comparison demonstrates the improvements in both performance and cost which can be achieved by adapting current state-of-the-art scientific computing algorithms, designed for CPU execution, to execute on the GPU. For two-dimensional multi-material simulations, our framework achieves up to a 24x improvement in performance and a 3x reduction in cost using 8 GPUs without adaptive mesh refinement, compared with a 48-core CPU implementation using adaptive mesh refinement. The algorithms developed in this thesis allow strong scaling of 6.95x across 8 GPUs, and the Ripple framework makes performance gains of this magnitude accessible to other computationally demanding scientific domains with minimal effort.
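The strong-scaling figure quoted above translates directly into a parallel-efficiency estimate (speedup divided by device count). A minimal sketch, using only the 6.95x and 8-GPU numbers from the abstract (the helper function is illustrative):

```python
# Strong-scaling parallel efficiency: how close the measured speedup
# comes to the ideal of one-times-per-device.
def strong_scaling_efficiency(speedup: float, devices: int) -> float:
    return speedup / devices

# Figures from the abstract: 6.95x speedup across 8 GPUs.
eff = strong_scaling_efficiency(6.95, 8)
print(f"parallel efficiency: {eff:.1%}")  # parallel efficiency: 86.9%
```

An efficiency near 87% on a fixed-size problem indicates that inter-device communication and scheduling overheads consume only a small fraction of the added compute.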
For novel three-dimensional blast problems involving complex geometries, we show that our signed-distance function generation for such geometries using the Ripple framework can be performed multiple orders of magnitude faster on the GPU than on the CPU, and that the overall simulation can be performed up to 35x faster on a single GPU without AMR than when using a 32-core CPU implementation with AMR, at a reduction in cost of 22x.*
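The relation between the 35x speedup and the 22x cost reduction quoted above follows from a simple cost model (run cost = hourly price x wall-clock hours): the two figures together imply a GPU hourly price of roughly 35/22, about 1.6x, the CPU's. The sketch below uses hypothetical normalised rates to make that arithmetic explicit; it is not pricing data from the thesis:

```python
# Cost of a run = cloud hourly price x wall-clock hours.
def run_cost(rate_per_hour: float, hours: float) -> float:
    return rate_per_hour * hours

# Figures from the abstract: 35x speedup, 22x cost reduction.
# Hypothetical normalised rates consistent with both figures:
cpu_rate, cpu_hours = 1.0, 35.0          # CPU baseline run
gpu_rate, gpu_hours = 35.0 / 22.0, 1.0   # GPU run is 35x faster
cost_reduction = run_cost(cpu_rate, cpu_hours) / run_cost(gpu_rate, gpu_hours)
print(f"cost reduction: {cost_reduction:.0f}x")  # cost reduction: 22x
```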