iccsa-16-factory-extended

git clone https://git.igankevich.com/iccsa-16-factory-extended.git

commit 9acad5ab324056231e2ff923b1548b5d181bbafb
parent 37a2ec994299227ee5dbd3d38fa1efe41fd52e16
Author: Ivan Gankevich <igankevich@ya.ru>
Date:   Mon, 13 Feb 2017 11:18:36 +0300

Run through ':Wordy weasel'.

Diffstat:
src/sections.tex | 44++++++++++++++++++++++----------------------
1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/src/sections.tex b/src/sections.tex
@@ -4,28 +4,28 @@
 To infer fault tolerance model which is suitable for big data applications we
 use bulk-synchronous parallel model~\citep{valiant1990bridging} as the basis.
-This model assumes that a parallel programme is composed of several sequential
-steps that are internally parallel, and global synchronisation of all parallel
-processes occurs after each step. In our model all sequential steps are
-pipelined where it is possible. The evolution of the computational model is
-described as follows.
-
-Given a programme that is sequential and large enough to be decomposed into
-several sequential steps, the simplest way to make it run faster is to exploit
-data parallelism. Usually it means finding multi-dimensional arrays and loops
-that access their elements and trying to make them parallel. After transforming
-several loops the programme will still have the same number of sequential
-steps, but every step will (ideally) be internally parallel.
-
-After that the only possibility to speedup the programme is to overlap execution
-of code blocks that work with different hardware devices. The most common
-pattern is to overlap computation with network or disk I/O. This approach makes
-sense because all devices operate with little synchronisation, and issuing
-commands in parallel makes the whole programme perform better. This behaviour
-can be achieved by allocating a separate task queue for each device and
-submitting tasks to these queues asynchronously with execution of the main
-thread. After this optimisation the programme will be composed of several steps
-chained into the pipeline, each step is implemented as a task queue for a
+This model assumes that a parallel programme is composed of a number of
+sequential steps that are internally parallel, and global synchronisation of
+all parallel processes occurs after each step. In our model all sequential
+steps are pipelined where it is possible. The evolution of the computational
+model is described as follows.
+
+Given a programme that is sequential and large enough to be decomposed into a
+number of sequential steps, the simplest way to make it run faster is to
+exploit data parallelism. Usually it means finding multi-dimensional arrays and
+loops that access their elements and trying to make them parallel. After
+transforming the loops the programme will still have the same number of
+sequential steps, but every step will (ideally) be internally parallel.
+
+After that the only possibility to speedup the programme is to overlap
+execution of code blocks that work with different hardware devices. The most
+common pattern is to overlap computation with network or disk I/O. This
+approach makes sense because all devices operate with little synchronisation,
+and issuing commands in parallel makes the whole programme perform better. This
+behaviour can be achieved by allocating a separate task queue for each device
+and submitting tasks to these queues asynchronously with execution of the main
+thread. After this optimisation the programme will be composed of a number of
+steps chained into the pipeline, each step is implemented as a task queue for a
 particular device.
 
 Pipelining of otherwise sequential steps is beneficial not only for the code
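The per-device task-queue pattern that the diff describes (one queue per device, tasks submitted asynchronously with the main thread, steps chained into a pipeline) can be sketched as follows. This is a minimal illustration, not code from the paper; the two-stage pipeline, the `make_stage` helper, and the stand-in "compute" and "I/O" stages are all hypothetical.

```python
import queue
import threading

def make_stage(work, downstream=None):
    """Start a worker thread that drains its own task queue, so each
    'device' executes asynchronously with the main thread."""
    q = queue.Queue()
    def worker():
        while True:
            item = q.get()
            if item is None:              # poison pill: shut this stage down
                if downstream is not None:
                    downstream.put(None)  # propagate shutdown down the pipeline
                break
            result = work(item)
            if downstream is not None:
                downstream.put(result)    # hand the result to the next stage
    t = threading.Thread(target=worker)
    t.start()
    return q, t

results = []
# Stage 2: stands in for a disk or network I/O device.
io_q, io_t = make_stage(results.append)
# Stage 1: stands in for a compute device, chained into the pipeline.
compute_q, compute_t = make_stage(lambda x: x * x, downstream=io_q)

for x in range(4):
    compute_q.put(x)   # the main thread submits tasks and keeps running
compute_q.put(None)    # request shutdown; the pill flows through both stages
compute_t.join()
io_t.join()
print(results)         # each stage has one worker, so order is preserved
```

Because each stage owns a single worker thread, results leave the pipeline in submission order, while the compute and I/O stages overlap in time, which is exactly the benefit the paragraph attributes to pipelining.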