commit 88a7005cec64e444836bf5703d9c6a374d88167d
Author: Ivan Gankevich <i.gankevich@spbu.ru>
Date: Mon, 31 May 2021 18:09:16 +0300
Initial.
Diffstat:
2 files changed, 57 insertions(+), 0 deletions(-)
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,40 @@
+
+# Created by https://www.toptal.com/developers/gitignore/api/vim,linux
+# Edit at https://www.toptal.com/developers/gitignore?templates=vim,linux
+
+### Linux ###
+*~
+
+# temporary files which can be created if a process still has a handle open of a deleted file
+.fuse_hidden*
+
+# KDE directory preferences
+.directory
+
+# Linux trash folder which might appear on any partition or disk
+.Trash-*
+
+# .nfs files are created when an open file is removed but is still being accessed
+.nfs*
+
+### Vim ###
+# Swap (the !*.svg line keeps vector files; comment it out if you don't need them)
+[._]*.s[a-v][a-z]
+!*.svg
+[._]*.sw[a-p]
+[._]s[a-rt-v][a-z]
+[._]ss[a-gi-z]
+[._]sw[a-p]
+
+# Session
+Session.vim
+Sessionx.vim
+
+# Temporary
+.netrwhist
+# Auto-generated tag files
+tags
+# Persistent undo
+[._]*.un~
+
+# End of https://www.toptal.com/developers/gitignore/api/vim,linux
diff --git a/abstract.txt b/abstract.txt
@@ -0,0 +1,17 @@
+Verifiable application-level checkpoint and restart framework for parallel computing
+
+Ivan Gankevich, Ivan Petriakov, Anton Gavrikov, Dmitry Tereschenko, Gleb Mozhaiskii
+
+Fault tolerance of parallel and distributed applications is an increasingly
+important concern for large computer clusters and large distributed systems.
+For a long time the common solution to this problem was checkpoint and restart
+mechanisms implemented at the operating system level; however, they are inefficient
+for large systems, and application-level checkpoint and restart is now considered
+a more efficient alternative. In this paper we implement application-level
+checkpoint and restart manually for well-known parallel computing benchmarks
+to evaluate this alternative approach. We measure the overhead introduced
+by creating a checkpoint and restarting from it, and the amount of effort
+that is needed to implement it and to verify the correctness of the resulting program.
+Based on the results we propose a generic framework for application-level checkpointing
+that simplifies the process and makes it possible to verify that the application
+produces correct output when restarted from any checkpoint.