Direct numerical simulation of turbulent flows requires highly resolved grids and substantial computing resources. Modern GPUs offer high floating-point throughput and large memory bandwidth, which make them well suited to computational fluid dynamics. Modeling complex flows demands extremely large grids; however, the limited device memory of a single GPU makes it difficult to process such models, so the computation must be distributed across a multi-GPU system. An efficient GPU-based parallel ADI (alternating direction implicit) algorithm and load-balancing strategies for a multi-node system are proposed. A comprehensive performance analysis of the method is also presented for different input geometries.