hpcs-17-collector

git clone https://git.igankevich.com/hpcs-17-collector.git

commit 0dfb3d4cbfa1be28f4ff6500cd5570c2ddc00f25
parent aa4e8affaa848c20e440ceb5d32da26afe99ec09
Author: Ivan Gankevich <igankevich@ya.ru>
Date:   Sat, 20 May 2017 10:20:46 +0300

Final corrections. Remove page numbers.

Diffstat:
.gitignore | 1+
Makefile | 16++++++++++++++++
main.tex | 13+++++++------
src/main_text.tex | 22++++++++++++++++++----
src/references.bib | 4++++
5 files changed, 46 insertions(+), 10 deletions(-)

diff --git a/.gitignore b/.gitignore
@@ -0,0 +1 @@
+/build/
diff --git a/Makefile b/Makefile
@@ -0,0 +1,16 @@
+build/main.pdf: *.tex *.bib build src/*
+	latexmk \
+		-interaction=nonstopmode \
+		-output-directory=build \
+		-pdf \
+		-bibtex \
+		-shell-escape \
+		-quiet \
+		-f main.tex
+
+
+build:
+	mkdir -p build
+
+clean:
+	rm -rf build
diff --git a/main.tex b/main.tex
@@ -1,4 +1,6 @@
-\documentclass[conference]{IEEEtran}
+\documentclass{IEEEtran}
+\pagestyle{empty} % remove page numbers
+
 \usepackage{cite}
 \usepackage{graphicx}
 \graphicspath{{./graphics/}}
@@ -13,10 +15,10 @@
 
 \begin{document}
 
-\title{Using virtualisation for reproducible research and code portability}
+\title{Using Virtualisation For Reproducible Research And Code Portability}
 
 \author{%
-  \IEEEauthorblockN{Svetlana Sveshnikova \quad Ivan Gankevich}
+  \IEEEauthorblockN{Svetlana Sveshnikova \quad Ivan Gankevich\\}
   \IEEEauthorblockA{%
     Dept. of Computer Modeling and Multiprocessor Systems\\
     Saint Petersburg State University\\
@@ -25,13 +27,12 @@
   }%
 }%
 
-\IEEEspecialpapernotice{(poster extended abstract)}
-
-
+\IEEEspecialpapernotice{\normalfont{\textbf{EXTENDED ABSTRACT}}}
 \maketitle
 
 \IEEEpeerreviewmaketitle
+\thispagestyle{empty} % remove page numbers
 
 \input{src/abstract}
 
diff --git a/src/main_text.tex b/src/main_text.tex
@@ -18,7 +18,7 @@ There are several stages on each of which (ideally) there should be a tool that
 \item publication stage (writing and publishing the paper with all the data, graphs and the source code included).
 \end{itemize}
 
-In this proposal we deal with operating system and software stages~--- automate creation of environment to compile and run the programme in. For this purpose we use lightweight virtualisation technologies (Linux namespaces) on the example of distributed batch processing programme that runs on a cluster of nodes and processes the data in parallel. Our tool, called \emph{Collector}, creates root file system with the specified version of Linux distribution, the compiler and all the dependent packages. Then it compiles and runs the source code inside this virtual environment. The resulting root file system is portable to any platform with the same processor architecture and compatible kernel version.
+In this proposal we deal with operating system and software stages~--- automate creation of environment to compile and run the programme in. For this purpose we use lightweight virtualisation technologies (Linux namespaces) on the example of distributed batch processing programme that runs on a cluster of nodes and processes the data in parallel. Our tool, called \emph{Collector}, creates root file system with the specified version of Linux distribution, the compiler and all the dependent packages. Then it compiles and runs the source code inside this virtual environment. The resulting root file system is portable across any platform with the same processor architecture and compatible kernel version.
 
 The advantages of using raw file system over opaque operating system images are clear:
 \begin{itemize}
@@ -27,6 +27,11 @@ The advantages of using raw file system over opaque operating system images are
 \item It can be directly patched/upgraded by changing the current root directory to the path of the raw file system.
 \end{itemize}
 
+The objective of the study reported here is to develop a tool that automates
+creation of such portable environments and makes building particular source
+code inside it repeatable regardless of underlying operating system. This is
+the first publication on this tool.
+
 %One of the problems in science it is verifying of researches. In many fields reproducibility of research requires time, resources, some materials and many other. Computer science has more advantages for that. We do not need chemical reactive or expensive equipment. There are can define next parts: hardware configuration, software environment and useful data. Hardware structure can very complex and exclusive, but for more easy cases you can use virtualization for simulate need platform. Possible of reproducibility increase interest in you research and also level of confidence.
 
 %Computer science is most easy to reproducibility. You can use virtualisation for simulate need configuration and source code from repository.
@@ -72,7 +77,17 @@ Root file system that was created during the first run is saved, and subsequent
 
 %on startup a virtual namespace is created, in which all the work takes place. when the environment is configured and programmes are installed during the first run, you do not need administrator privileges, because inside the virtual namespace you are your own root. the cgroups mechanism is used to create the virtual environment. drawback - a separate container has to be created for each platform. consider how the virtual namespace is created. subtleties of change_root
 
-In the experiment we compile and run the test programme two times. During the first run Collector downloads and installs all the dependencies before compiling and running, and during the second run it only checks that dependencies are satisfied. After that it compiles the programme and runs tests. The experiment showed that initialising a separate root file system takes considerable amount of time compared to the execution time of tests, whereas subsequent runs are faster as they use already initialised environment (Table~\ref{tab:actions}). Performance-wise it would be more efficient to store read-only base image of the operating system in cache directory and use Union/Overlay file system to mount it under writable directory to reduce initialisation time.
+In the experiment we compile and run the test programme~\cite{spec-factory} two
+times. During the first run Collector downloads and installs all the
+dependencies before compiling and running, and during the second run it only
+checks that dependencies are satisfied. After that it compiles the programme
+and runs tests. The experiment showed that initialising a separate root file
+system takes considerable amount of time compared to the execution time of
+tests, whereas subsequent runs are faster as they use already initialised
+environment (Table~\ref{tab:actions}). Performance-wise it would be more
+efficient to store read-only base image of the operating system in cache
+directory and use Union/Overlay file system to mount it under writable
+directory to reduce initialisation time.
 
 %do view of my experiment
 %Containerization it is good idea, but it also need good implementation. One of those tools -- Docker -- has using in many commercial projects. Docker allow to put any software tool in container that may be running on your system. You just download it and start works!
@@ -115,4 +130,4 @@ There are a number of potential problems that are related to lightweight virtual
 
 One of the problems in research reproducibility is the absence of tools to reproduce specified operating system with specific version of the software installed. Lightweight virtualisation technologies is a solution to this problem, that uses unprivileged Linux namespaces to create such execution environment in a separate root file system directory and package it together with the source code of the programme and its binary form. The solution does not pollute host operating system with programme dependencies and does not require super user privileges to create the environment. The future work is to investigate how network Linux namespace and control groups can improve application execution inside the environment.
 
-%\section{References}
\ No newline at end of file
+%\section{References}
diff --git a/src/references.bib b/src/references.bib
@@ -52,3 +52,7 @@
 }
 
 
+@misc{spec-factory,
+  title={Spectrum processing programme},
+  howpublished={\url{https://bitbucket.org/igankevich/spec-factory}}
+}