iccsa-20-waves

git clone https://git.igankevich.com/iccsa-20-waves.git

commit 0e60291d685dbf32840d9422c4a53f771ac7d1d4
parent 43f818cb56412678f4ac4f2a4b7000e8368d99d6
Author: Ivan Gankevich <i.gankevich@spbu.ru>
Date:   Mon, 16 Mar 2020 15:32:21 +0300

Spell check.

Diffstat:
main.tex | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/main.tex b/main.tex
@@ -408,7 +408,7 @@ The algorithm for velocity potential solver is the following.
 \item First of all, we generate wavy surface, according to our solution and
 using wetted ship panels from the previous time step (if any).
 \item Second, we compute wetted panels for the current time step, which are
- located under the surface calculated on the previos step.
+ located under the surface calculated on the previous step.
 \item Finally, we calculate Froude---Krylov forces, acting on a ship hull.
 \end{itemize}
 These steps are repeated in infinite loop. Consequently, wavy surface is
@@ -421,7 +421,7 @@ surface grid is irregular (i.e.~we store a matrix of fluid particle
 positions that describe the surface), we compute the same formula for each
 point of the surface. It is easy to do with C++ for CPU computation, but it
 takes some effort to efficiently run this algorithm with GPU acceleration. Our first
-naive implementation was ineffcient, but the second implementation that used
+naive implementation was inefficient, but the second implementation that used
 local memory to optimise memory loads and stores works proved to be much more
 performant.
@@ -430,7 +430,7 @@ Sequential storage order leads to sequential loads and stores from the global
 memory and greatly improves performance of the graphical accelerator. Second,
 we use as many built-in vector functions as we can in our computations, since
 they are much more efficient than manually written ones and compiler knows how
-to optimise them. This also descreases code size and prevents possible mistakes
+to optimise them. This also decreases code size and prevents possible mistakes
 in the manual implementation. Finally, we optimised how ship hull panels are
 read from the global memory. One way to think about panels is that they are
 coefficients in our model, as array of coefficients is typically read-only and
@@ -445,7 +445,7 @@ and then proceeded to the next block.
 This approach allowed to achieve almost
 A distinctive feature of the local memory is that it has the smallest
 latency, at the same time sharing its contents between all computing units of
 the multiprocessor. Using local memory we reduce number of access to global
-memory, which has a much bigger latency. As far as global memory bandwith
+memory, which has a much bigger latency. As far as global memory bandwidth
 remains a bottleneck, this kind of optimisation would improve performance.
 To summarise, our approach to write code for graphical accelerators is the
 following:
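
The patched paragraphs describe staging read-only panel coefficients through GPU local memory: a work-group loads one block of coefficients from global memory, every work item reuses the cached block, and only then is the next block fetched. The following is a minimal CPU-side sketch of that access pattern, not the paper's actual code — the real implementation is a GPU kernel, and the function name, block size, and the plain sum standing in for the Froude--Krylov computation are all illustrative assumptions:

```c
#include <assert.h>
#include <stddef.h>

/* Block size plays the role of the work-group / local-memory capacity. */
#define BLOCK_SIZE 8

static double sum_panel_contributions(const double *panels, size_t n) {
    double local[BLOCK_SIZE]; /* stand-in for GPU local memory */
    double total = 0.0;
    for (size_t base = 0; base < n; base += BLOCK_SIZE) {
        size_t m = (n - base < BLOCK_SIZE) ? (n - base) : BLOCK_SIZE;
        /* one sequential (on a GPU: coalesced) load of the whole block ... */
        for (size_t i = 0; i < m; ++i)
            local[i] = panels[base + i];
        /* ... then all further reads hit the low-latency local copy */
        for (size_t i = 0; i < m; ++i)
            total += local[i];
    }
    return total;
}
```

On a CPU this is just blocked summation, but it mirrors the point made in the diff: global memory is touched once per coefficient, and repeated accesses are served from the small low-latency buffer, which helps as long as global-memory bandwidth is the bottleneck.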