% Acceleration of Computing and Visualization Processes with OpenCL for Standing Sea Wave Simulation Model

\section{Introduction}

In most cases, visualisation of scientific data obtained during a simulation or computation is carried out separately, after all stages of the calculation are completed. There are several reasons for this: the computation may be executed on CPU-only nodes (although this is not a real obstacle, since the Mesa 3D graphics library exists); completing the task as fast as possible may have higher priority, so all available resources are devoted to the computation; or the data may simply be difficult to process and synchronise at runtime, if such a scenario was not anticipated by the software.

On the other hand, during a computation, especially a long one, there is a need to monitor the operations being performed and the results being obtained. When such control is possible, the calculation can be suspended and the necessary adjustments made in a timely manner. This kind of intervention may also be needed while debugging the program or testing a new mathematical model. Thus, in this article we consider how interactive control over a calculation can be organised by means of visualisation, which is performed using the OpenGL API.

Another possible scenario for this approach is real-time interaction with the simulated objects and processes: one can change the initial conditions, add new environmental parameters and observe the system's response immediately, from the moment the effect begins until it ends. In the framework of ocean wave simulation this has educational value, as the effect of every change in an input parameter is immediately visible. In addition, instantaneous visualisation of the ocean wavy surface brings the simulation to a new level, where dynamically changing parameters to arbitrary values within predefined ranges allows the model and its numerical code to be verified visually.

For the experiment we have chosen an autoregressive model of standing waves, within which we accelerate the computation of the velocity potential field using GPGPU technology through the OpenCL framework. Since the data structures needed for visualisation are already stored in GPU memory, we take advantage of this fact and remove unnecessary copying between host and device using the OpenGL/OpenCL interoperability API.

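To make the idea concrete, the following listing is a minimal sketch of buffer sharing through the \texttt{cl\_khr\_gl\_sharing} extension rather than the actual code of our solver; the context, command queue, kernel and vertex buffer handles are placeholders assumed to be created elsewhere.

\begin{verbatim}
// Minimal sketch of OpenCL/OpenGL buffer sharing (cl_khr_gl_sharing).
// The context, queue, kernel and the GL vertex buffer `vbo' are assumed
// to be created elsewhere; the kernel writes the wavy surface vertices.
#include <CL/cl.h>
#include <CL/cl_gl.h>
#include <GL/gl.h>

void update_surface(cl_context ctx, cl_command_queue queue,
                    cl_kernel kernel, GLuint vbo, size_t nvertices) {
    cl_int err = CL_SUCCESS;
    // Wrap the GL buffer in a CL memory object: same device memory, no copy.
    cl_mem surface = clCreateFromGLBuffer(ctx, CL_MEM_WRITE_ONLY, vbo, &err);
    // Make sure OpenGL is done with the buffer before the kernel touches it.
    glFinish();
    clEnqueueAcquireGLObjects(queue, 1, &surface, 0, nullptr, nullptr);
    // Compute the new surface elevation directly into the shared buffer.
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &surface);
    clEnqueueNDRangeKernel(queue, kernel, 1, nullptr, &nvertices, nullptr,
                           0, nullptr, nullptr);
    // Hand the buffer back to OpenGL; rendering proceeds without readback.
    clEnqueueReleaseGLObjects(queue, 1, &surface, 0, nullptr, nullptr);
    clFinish(queue);
    clReleaseMemObject(surface);
}
\end{verbatim}

In a real application the shared buffer is created once at start-up and only acquired and released each frame, and the coarse \texttt{glFinish}/\texttt{clFinish} synchronisation is replaced with event-based synchronisation.
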
\section{Related work}

The idea of mixing various computing APIs is not a new one, and this includes OpenGL/OpenCL interoperability. Nvidia announced support for this technology in 2011, and since then related solutions have been appearing on the market.

Compute APIs have been widely adopted by the entertainment industry, especially in game development. Ever since the popularisation of the PhysX engine, which uses the capabilities of graphics cards to simulate a certain set of physical phenomena, it was clear that such a technology would find further applications \cite{geer:vut}. Today, almost every heavy dynamic particle system one encounters uses one of the general compute APIs \cite{unity:compute}.

Another widely used area where this technology was introduced is computer vision. The industry-standard library OpenCV has a special OCL module, originally provided by AMD, which accelerates various algorithms, including matrix transformations, image filtering and processing, object detection and many more~\cite{opencv:opencl}.

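For illustration only, the following generic sketch (not code from the cited work) shows how this acceleration is exposed today through OpenCV's transparent API, the successor of the original OCL module; the file names are placeholders.

\begin{verbatim}
// Minimal sketch of OpenCL-accelerated image filtering in OpenCV.
// cv::UMat operations dispatch to OpenCL when a usable device exists.
#include <opencv2/core.hpp>
#include <opencv2/core/ocl.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>

int main() {
    cv::ocl::setUseOpenCL(true);              // request OpenCL dispatch
    cv::UMat image;
    cv::imread("input.png").copyTo(image);    // upload to a device-backed matrix
    cv::UMat blurred;
    cv::GaussianBlur(image, blurred, cv::Size(9, 9), 2.0); // GPU path if available
    cv::imwrite("output.png", blurred);
    return 0;
}
\end{verbatim}

Whether the OpenCL path is actually taken depends on the platform; \texttt{cv::ocl::haveOpenCL()} can be used to check for a usable device.
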
However, the situation with scientific computing is quite different. Unlike entertainment software, where most of the scene content is generated in advance and the number of processes is strictly limited to those affecting the environment at the current moment, in scientific simulations almost everything has to be calculated from scratch based only on the given initial conditions, including the geometry (if the process implies it), the particle system, visual effects, etc. In addition, the simulations themselves are much more complicated, and optimising dynamically formed geometric structures is quite difficult.

There is a way to use resources more efficiently when visualising the results of a computational experiment that employs GPGPU to speed up the computation, and it is directly related to how the accelerator is exploited. When graphics cards are used for computing, the data is obviously already allocated in their memory. So, if there is a way to transform the data into a format that can be used by the graphics API, and also a way to transfer it between the compute and graphics contexts, then we do not have to copy it from GPU memory to RAM and back. Thus, the usage of OpenCL, CUDA or any other compute API can boost the performance not only of the calculations themselves, but also of the rendering of the results. We have reviewed several papers referring to the stated problem to see whether this approach could be used for optimisation in our case.

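As a minimal sketch of this idea, an OpenCL context that shares memory objects with the OpenGL context current on the calling thread can be created as follows (shown for Linux/GLX; the platform handle is assumed to be obtained beforehand).

\begin{verbatim}
// Minimal sketch (Linux/GLX): create an OpenCL context that shares
// memory objects with the OpenGL context current on this thread.
#include <CL/cl.h>
#include <CL/cl_gl.h>
#include <GL/glx.h>

cl_context create_shared_context(cl_platform_id platform) {
    cl_context_properties props[] = {
        CL_GL_CONTEXT_KHR,   (cl_context_properties) glXGetCurrentContext(),
        CL_GLX_DISPLAY_KHR,  (cl_context_properties) glXGetCurrentDisplay(),
        CL_CONTEXT_PLATFORM, (cl_context_properties) platform,
        0
    };
    cl_device_id device = nullptr;
    cl_int err = CL_SUCCESS;
    // Pick a GPU device; a robust implementation queries the device that
    // actually drives the GL context via clGetGLContextInfoKHR.
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);
    return clCreateContext(props, 1, &device, nullptr, nullptr, &err);
}
\end{verbatim}

On Windows the display pair is replaced with \texttt{CL\_WGL\_HDC\_KHR} and the current device context.
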
One of the most interesting cases was presented by a research group from Northeastern University in Boston \cite{ukidave:2014:PEO}. OpenCL/OpenGL interoperability was applied to five completely different applications from different study areas. One of them, for example, is a material fault detection program that detects faults using wave propagation in anisotropic materials and produces material layer surfaces. The authors used a slot-based rendering technique, which means that data is precomputed several frames in advance before being passed to the rendering context. As a result, they obtained on average 2.2 times more frames with the discrete AMD Radeon 7770 and Radeon 7970 GPUs and 1.9 times more with the AMD Fusion A8 APU.

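The following listing is only our interpretation of the slot-based idea, not the authors' code: a small ring of buffer slots lets the compute queue run several frames ahead of the renderer. All names are hypothetical, and synchronisation between the compute and rendering sides is omitted.

\begin{verbatim}
// Hedged illustration of slot-based rendering: a ring of buffer slots
// decouples the compute side from the rendering side by a few frames.
#include <array>
#include <cstddef>

constexpr std::size_t kSlots = 4;  // frames precomputed in advance

struct Slot {
    unsigned gl_buffer = 0;   // GL buffer holding one frame of geometry
    bool     ready = false;   // set by compute, cleared after rendering
};

struct SlotRing {
    std::array<Slot, kSlots> slots;
    std::size_t compute_frame = 0;  // index of the frame being computed
    std::size_t render_frame  = 0;  // index of the frame being drawn

    // Compute side: slot to fill next, or nullptr if the ring is full.
    Slot* acquire_for_compute() {
        Slot& s = slots[compute_frame % kSlots];
        return s.ready ? nullptr : &s;
    }
    void finish_compute() { slots[compute_frame++ % kSlots].ready = true; }

    // Render side: oldest precomputed slot, or nullptr if none is ready.
    Slot* acquire_for_render() {
        Slot& s = slots[render_frame % kSlots];
        return s.ready ? &s : nullptr;
    }
    void finish_render() { slots[render_frame++ % kSlots].ready = false; }
};
\end{verbatim}

The ring depth trades memory for how far the compute side may run ahead of the renderer.
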
As for image processing, we can refer to~\cite{liao:2012:GPC}, where the interoperability technique is used for panoramic video image stitching. According to the reported metrics, the best result is achieved when two buffers are shared between OpenCL computations and OpenGL rendering, and it is 12 times faster than the original CPU-based solution. The paper also shows that the proposed solution scales really well when additional image capturing devices are attached to the system. We should also mention another closely related case presented by Samsung at SIGGRAPH'13, which discusses real-time processing of video streams captured by a camera module on mobile platforms using OpenCL and OpenGL ES interoperability~\cite{bucur:2013:OOE}.

All the examples of GPGPU API interoperability mentioned above prove that the proposed approach can be applied to a variety of problems to achieve significant results in optimising both calculations and visualisation. However, it should be noted that these solutions were developed for particular cases and do not offer a general recipe, which can be regarded as a peculiarity of the interoperability method. During this study we will try to take into account the major improvements made by these research groups in order to build our solution in the most optimal way.

%\cite{michal:2011:OOA}
%\cite{moulik:2011:RGC}