first page - iccsa-20-waves

commit 41a10483db6ab7ae5514d28b1354a1db7764bc55
parent 3983a887d6b3503214d3e8b24f01375856a5a9ea
Author: Петряков Иван <franceskoizump@gmail.com>
Date:   Sun,  8 Mar 2020 23:08:30 +0300

first page

Diffstat:
main.tex  | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++

1 file changed, 54 insertions(+), 0 deletions(-)
diff --git a/main.tex b/main.tex
@@ -49,6 +49,60 @@ TODO
 
 \section{Introduction}
 \section{Methods}
+ 
+Virtual testbed is a program for personal computers. 
+Its main feature is that all calculations are performed in real time,
+ while maintaining high accuracy.
+This is achieved by using a graphics accelerator (GPU).
+The generation of Gerstner waves is no exception:
+we implement the algorithm both for the GPU, using the OpenCL framework,
+ and for a regular multi-core CPU, parallelized with the OpenMP framework.
+
+The algorithm consists of three steps.
+First, we compute the wavy surface according to our approach.
+Then, we determine the wetted panels, i.e.\ the panels that lie below the computed surface.
+Finally, we compute the buoyancy force acting on the ship.
+These steps are repeated in an infinite loop, which drives the whole simulation.
+
+Let us consider the computation of the wavy surface in more detail.
+Since the surface has an irregular structure
+ (i.e.\ we store it as a set of points),
+ we simply evaluate the same formulas at each point of the surface.
+This is straightforward in C++ on the CPU, but it takes some effort
+ to run the algorithm efficiently on the GPU.
+Our first implementation was quite slow, achieving only about five iterations of the global loop,
+ whereas the current one achieves many more.
+
+Storage order is very important on GPU architectures.
+Algorithms with sequential memory access are the efficient ones,
+ so we store the points contiguously, one after another.
+This statement is obvious, but it has to be kept in mind.
+The next feature that we use to increase performance is the set of built-in vector functions.
+With them, we do not need to implement custom vector functions for our large set of vectors,
+ which reduces both the code size and the number of potential mistakes.
+Besides, these functions are fast, which gives additional acceleration.
+The third feature is cache management.
+Unlike the CPU, the GPU allows programmers to explicitly manage a small on-chip memory,
+ called ``shared memory'' (``local memory'' in OpenCL terms), which on many GPUs shares storage with the L1 cache.
+Moreover, most GPU algorithms need to manage shared memory explicitly to be fast.
+The distinctive property of this memory is its low latency compared to global memory,
+ combined with the ability to share data between the work-items of one work-group.
+Since memory bandwidth remains a bottleneck, this kind of optimization is useful in many situations.
+In our case, the summation runs over the surface of the ship,
+ so we copy it to shared memory in small pieces.
+This reduces the number of accesses to global memory, which has a much higher latency.
+Following these simple rules, one can implement an efficient algorithm with little effort.
+All we have to do is:
+ check the storage order;
+ use built-in vector operations as much as possible;
+ and, finally, manage shared memory.
+
+
+
+
+
+
+
 \section{Results}
 \section{Discussion}
 \section{Conclusion}

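The Methods text of this commit describes evaluating the same formulas at every point of an irregular surface, parallelized with OpenMP on the CPU. Below is a minimal C++ sketch of that step. It assumes the textbook deep-water Gerstner formulation: horizontal displacement x0 - (k/|k|) A sin(theta), elevation A cos(theta), and the dispersion relation omega^2 = g|k|. The commit does not spell out the authors' exact formulas. The names `Point`, `Wave` and `gerstner_surface` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Position of one surface point. Points are stored contiguously, one after
// another, so consecutive loop iterations read consecutive memory.
struct Point { double x, y, z; };

// One harmonic of the wave field (hypothetical parameter set).
struct Wave {
    double amplitude;  // A, metres
    double kx, ky;     // wave vector components, rad/m
    double phase;      // initial phase, rad
};

// Evaluates a sum of Gerstner waves at every point of an irregular surface,
// given the rest (undisturbed) positions of the points. The points are
// independent, so the outer loop is split across threads by OpenMP; without
// -fopenmp the pragma is ignored and the loop runs serially.
std::vector<Point> gerstner_surface(const std::vector<Point>& rest,
                                    const std::vector<Wave>& waves,
                                    double t, double g = 9.81) {
    std::vector<Point> out(rest.size());
    const long n = static_cast<long>(rest.size());
#pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        const Point& p0 = rest[i];
        Point p{p0.x, p0.y, 0.0};
        for (const Wave& w : waves) {
            const double k = std::hypot(w.kx, w.ky);
            assert(k > 0.0);                        // a wave needs a direction
            const double omega = std::sqrt(g * k);  // deep-water dispersion
            const double theta = w.kx * p0.x + w.ky * p0.y - omega * t + w.phase;
            // Horizontal displacement along the wave direction...
            p.x -= w.kx / k * w.amplitude * std::sin(theta);
            p.y -= w.ky / k * w.amplitude * std::sin(theta);
            // ...and vertical elevation.
            p.z += w.amplitude * std::cos(theta);
        }
        out[i] = p;
    }
    return out;
}
```

Consider a single wave with A = 1 m and k = (1, 0) rad/m at t = 0. It lifts the point at the origin to z = 1. It shifts the point with x0 = pi/2 to x = pi/2 - 1 and leaves its elevation at zero. In an OpenCL port, the body of the outer loop would become the kernel, with `i` taken from `get_global_id(0)`.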
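The three-step loop in the Methods text goes from surface, to wetted panels, to buoyancy force. Its last two steps can be sketched as below. This is an illustration under stated assumptions, not the testbed's code. Hull panels are represented by a centre, a unit outward normal and an area. The wavy surface is reduced to an elevation function of horizontal position; in practice it would be interpolated from the computed points. A panel counts as wetted when its centre lies below the surface. The force is the integral of hydrostatic pressure rho*g*(eta - z) over the wetted panels. `Panel`, `Elevation`, `wetted_panels` and `buoyancy_force` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct Vec3 { double x, y, z; };

// A flat hull panel (hypothetical representation): centre, unit outward normal, area.
struct Panel { Vec3 centre; Vec3 normal; double area; };

// Surface elevation eta(x, y); assumed to be interpolated from the wavy surface.
using Elevation = std::function<double(double, double)>;

// Step 2: keep only the panels whose centres lie below the surface.
std::vector<Panel> wetted_panels(const std::vector<Panel>& hull, const Elevation& eta) {
    std::vector<Panel> wetted;
    for (const Panel& p : hull)
        if (p.centre.z < eta(p.centre.x, p.centre.y)) wetted.push_back(p);
    return wetted;
}

// Step 3: sum the hydrostatic pressure force -p * n * area over wetted panels,
// where p = rho * g * depth and n points out of the hull.
Vec3 buoyancy_force(const std::vector<Panel>& wetted, const Elevation& eta,
                    double rho = 1000.0, double g = 9.81) {
    Vec3 f{0.0, 0.0, 0.0};
    for (const Panel& p : wetted) {
        const double depth = eta(p.centre.x, p.centre.y) - p.centre.z;
        assert(depth > 0.0);  // guaranteed by wetted_panels
        const double pa = rho * g * depth * p.area;
        f.x -= pa * p.normal.x;
        f.y -= pa * p.normal.y;
        f.z -= pa * p.normal.z;
    }
    return f;
}
```

In the simulation loop, step 1 would refresh the surface each iteration, and the two functions above would then be called on the new surface. Panels that cross the waterline are classified by their centres only, which is a deliberate simplification in this sketch.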
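The storage-order and built-in-vector points in the Methods text can be illustrated by preparing the host-side buffer for a GPU kernel. The sketch below assumes the kernel works in single precision and reads the points as an OpenCL `float4` (or `float3`) array. In OpenCL C, `float3` has the size and alignment of `float4` (16 bytes), so each point is padded to four floats. The padding is zero, so built-ins such as `dot` and `length` on `float4` give the same result as the 3D versions. The name `pack_points` is hypothetical.

```cpp
#include <cassert>
#include <vector>

struct Point { double x, y, z; };

// Packs the points one after another into a flat float buffer, four floats
// per point (x, y, z, zero padding). Work-item i of a kernel that reads this
// buffer as __global float4* touches bytes [16 i, 16 i + 16), so neighbouring
// work-items read neighbouring memory.
std::vector<float> pack_points(const std::vector<Point>& points) {
    std::vector<float> buffer;
    buffer.reserve(4 * points.size());
    for (const Point& p : points) {
        buffer.push_back(static_cast<float>(p.x));
        buffer.push_back(static_cast<float>(p.y));
        buffer.push_back(static_cast<float>(p.z));
        buffer.push_back(0.0f);  // padding to 16 bytes per point
    }
    assert(buffer.size() == 4 * points.size());
    return buffer;
}
```

An array-of-points layout like this one keeps each point's coordinates together, which matches the "one by one" order described in the text. A structure-of-arrays layout (separate x, y and z arrays) is the usual alternative. It trades the built-in vector types for scalar loads.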
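The shared-memory optimization in the Methods text copies the ship surface into shared memory piece by piece. Its loop structure can be mimicked on the CPU as shown below. This only illustrates the tiling pattern and cannot show the latency gain itself. The text does not say which quantity is summed, so `contribution` is a hypothetical placeholder. The tile size is a template parameter. On a GPU, the work-items of a work-group would copy each tile cooperatively into `__local` memory and call `barrier(CLK_LOCAL_MEM_FENCE)` before reading it. Here a small stack array stands in for local memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };
struct Panel { Vec3 centre; double area; };

// Hypothetical per-pair term summed over the ship surface.
double contribution(const Vec3& p, const Panel& q) {
    const double dx = p.x - q.centre.x, dy = p.y - q.centre.y, dz = p.z - q.centre.z;
    return q.area / (1.0 + dx * dx + dy * dy + dz * dz);
}

// Processes the panels in tiles of TILE elements: each tile is read from the
// large "global" array once, then reused by every point.
template <std::size_t TILE>
std::vector<double> tiled_sum(const std::vector<Vec3>& points,
                              const std::vector<Panel>& panels) {
    static_assert(TILE > 0, "tile must be non-empty");
    std::vector<double> sum(points.size(), 0.0);
    Panel tile[TILE];  // plays the role of GPU shared (local) memory
    for (std::size_t base = 0; base < panels.size(); base += TILE) {
        const std::size_t n = std::min(TILE, panels.size() - base);
        // Load phase: copy one tile from global memory.
        std::copy(panels.data() + base, panels.data() + base + n, tile);
        assert(n <= TILE);
        // Compute phase: every point reads the tile from the fast buffer.
        for (std::size_t i = 0; i < points.size(); ++i)
            for (std::size_t j = 0; j < n; ++j)
                sum[i] += contribution(points[i], tile[j]);
    }
    return sum;
}
```

The tiled result matches a plain double loop over all panels, because each point accumulates the panels in the same order. Only the memory traffic pattern changes.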
	iccsa-20-waves
	git clone https://git.igankevich.com/iccsa-20-waves.git