Many scientific and engineering fields employ computer simulations of specific phenomena to help solve complex problems. Supercomputers and other high-performance computing machines are regularly used to run these scientific simulations. The resulting data then needs to be analyzed and visualized, which is difficult when the data is large. One approach to producing visualizations faster is to generate them in parallel. Many challenges remain, however, when attempting to analyze and visualize large data in parallel while maintaining good performance and scalability.
The size of the data itself is one challenge. When the data becomes very large, the I/O overhead of loading it becomes a bottleneck that limits performance. In addition, some visualization algorithms have communication and computational loads that cannot be predicted in advance, which leads to poor workload distribution and load balancing. This load imbalance hinders overall scalability. Another possible cause of poor parallel performance is that a method does not take advantage of the specific hardware architecture of the host machine.
To meet these challenges, we present methods to parallelize several visualization techniques. First, we developed a scalable shared-memory rendering technique by adapting established parallel rendering methods to a shared-memory architecture; three rasterization approaches, sort-first, sort-last, and a hybrid of the two, were tested on a large shared-memory machine. Next, parallel streamline generation in static flow fields suffers, by the nature of the problem, from high load imbalance. To balance the computation, we analyzed the flow field and estimated the workload of each data block; a load-balanced partitioning of the blocks was then computed from these estimates (a sketch of this idea is given below). In our tests, we were able to scale up to thousands of processes while tracing hundreds of thousands of seeds. For time-varying flow fields, the Finite-Time Lyapunov Exponent (FTLE) has proven to be a powerful analysis tool. To achieve scalable parallel FTLE computation, we divided all available processes into several groups and pipelined particles through these process groups. This pipelined structure reduced both I/O and computation times. Using this technique, we were able to advect millions of seeds and scale up to tens of thousands of processes.
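To illustrate the workload-driven partitioning idea, the following minimal sketch assigns data blocks to processes with a greedy longest-processing-time heuristic: blocks are visited in order of decreasing estimated workload and each block goes to the currently least-loaded process. The per-block workload values, the function name, and the heuristic itself are illustrative assumptions, standing in for whatever cost model the flow-field analysis produces; the dissertation's actual partitioning strategy may differ.

```cpp
#include <algorithm>
#include <cstdio>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Greedy longest-processing-time assignment: each data block, taken in order
// of decreasing estimated workload, is assigned to the least-loaded process.
// (Hypothetical sketch; estimates would come from analyzing the flow field.)
std::vector<int> partition_blocks(const std::vector<double>& estimated_work,
                                  int num_processes) {
    // Sort block indices by decreasing estimated workload.
    std::vector<int> order(estimated_work.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = static_cast<int>(i);
    std::sort(order.begin(), order.end(), [&](int a, int b) {
        return estimated_work[a] > estimated_work[b];
    });

    // Min-heap of (accumulated load, process id) to find the least-loaded process.
    using Entry = std::pair<double, int>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (int p = 0; p < num_processes; ++p) heap.push({0.0, p});

    std::vector<int> owner(estimated_work.size());
    for (int block : order) {
        auto [load, proc] = heap.top();
        heap.pop();
        owner[block] = proc;
        heap.push({load + estimated_work[block], proc});
    }
    return owner;
}

int main() {
    // Hypothetical per-block workload estimates.
    std::vector<double> work = {9.0, 1.5, 4.0, 7.5, 2.0, 6.0, 3.0, 5.5};
    std::vector<int> owner = partition_blocks(work, 3);
    for (std::size_t b = 0; b < owner.size(); ++b)
        std::printf("block %zu -> process %d\n", b, owner[b]);
    return 0;
}
```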
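Similarly, the process-group structure used for pipelined FTLE computation can be sketched with MPI_Comm_split, which divides the available ranks into a fixed number of groups, each with its own communicator. The group count, the mapping of ranks to groups, and the pipeline behavior described in the comments are assumptions for illustration only, not the exact scheme used in this work.

```cpp
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Hypothetical choice: split all ranks into 4 pipeline groups.
    const int num_groups = 4;
    int group_id = world_rank % num_groups;  // which pipeline stage this rank joins
    int key = world_rank / num_groups;       // rank ordering within the group

    MPI_Comm group_comm;
    MPI_Comm_split(MPI_COMM_WORLD, group_id, key, &group_comm);

    int group_rank, group_size;
    MPI_Comm_rank(group_comm, &group_rank);
    MPI_Comm_size(group_comm, &group_size);

    std::printf("world rank %d of %d -> group %d (rank %d of %d)\n",
                world_rank, world_size, group_id, group_rank, group_size);

    // In a pipelined FTLE computation, group g would load and advect particles
    // through its current time interval, then hand the particles to the next
    // group while the following interval is read, overlapping I/O with computation.

    MPI_Comm_free(&group_comm);
    MPI_Finalize();
    return 0;
}
```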