csit-15-vsc-exp

git clone https://git.igankevich.com/csit-15-vsc-exp.git

commit 94136db86e092c138eb63ba8391363ab50cf9747
parent 536e578c3ce76981aeb49be792ee41ab5a9f1aa0
Author: Ivan Gankevich <igankevich@ya.ru>
Date:   Wed, 10 Aug 2016 23:09:13 +0300

Generate makefile.

Diffstat:
 csit-15-vsc-exp.tex |  82 +++++++++++++++++++++++++++++++++++++++++
 main.tex            |  82 -----------------------------------------
 makefile            |  28 ++++++++++++++
 vsc.bib             |   1 -
4 files changed, 110 insertions(+), 83 deletions(-)

diff --git a/csit-15-vsc-exp.tex b/csit-15-vsc-exp.tex
@@ -0,0 +1,81 @@
+\documentclass{CSITproc}
+\usepackage{booktabs}
+\usepackage{url}
+\graphicspath{ {graphics/} }
+\input{header}
+
+\begin{document}
+\maketitle
+
+\begin{abstract}
+Efficient distribution of high-performance computing (HPC) resources according to actual application needs has been a major research topic since HPC technologies became widely adopted. At the same time, comfortable and transparent access to these resources has been a key user requirement. In this paper we discuss approaches to building a virtual private supercomputer (VPS), a virtual computing environment tailored specifically to a target user with a particular target application. Virtualization is one of the cornerstone technologies that help shape resources to what actual users need by providing as much as needed when it is needed. However, new issues arise when working with large-scale applications that require large amounts of resources working together. We describe and evaluate possibilities to create the VPS based on light-weight virtualization technologies, and analyze the efficiency of our approach compared to traditional methods of HPC resource management.
+
+\end{abstract}
+\keywords{Virtual cluster, application container, job scheduling, virtual network, high-performance computing.}
+
+\section{Introduction}
+\label{sec:intro}
+
+Virtualization refers to the act of creating a virtual version of an object, including but not limited to a virtual computer hardware platform, operating system (OS), storage device, or computer network resource~\cite{wiki-viz}. It can be divided into several types, each with its own pros and cons. Generally, hardware virtualization refers to the abstraction of functionality from physical devices. Nowadays, on modern multicore systems with powerful hardware, it is possible to run several virtual guest operating systems on a single physical node. In a usual computer system a single operating system uses all available hardware resources (CPU, RAM, etc.), whilst a virtualized system uses a special layer that distributes low-level resources among several systems or applications; this layer looks like a real machine to the applications launched on it.
+
+Virtualization technologies facilitate the creation of a virtual supercomputer or virtual clusters that are adapted to the problem being solved and help to manage processes running on these clusters (see Figure~\ref{fig:fig-vc}). The work described in this paper continues and summarizes our earlier research presented in~\cite{vworkspace,gankevich-ondemand2015,vsc-csit13,vsc-iccsa14,vkorkhov-iccsa15}.
+
+\begin{figure}
+\includegraphics[width=8cm]{fig-vcluster.pdf}
+\caption{A testbed example with a set of virtual clusters over several physical resources.}
+\vspace{60pt}
+\label{fig:fig-vc}
+\end{figure}
+
+In our experience, the main benefit of virtualization for high-performance computing is the structural decomposition of a distributed system into unified entities -- virtual machines or application containers -- which simplifies maintenance of the system. A new entity can be created for each new version of an application with an optimal configuration and set of libraries, so that multiple versions of the same software may co-exist and run on the same physical cluster. Entities can be copied or efficiently shared between different physical machines to create a private cluster for each application run.
+
+In our experience, virtualization sometimes increases application performance; however, such an increase is not easily achievable. Allocating a separate container for each application allows compiling it for hybrid GPGPU systems, which may or may not improve its performance; such optimizations, however, are possible even without application containers. Full virtualization gives an option of choosing the right operating system for an application, but incurs a constant decrease in performance due to overheads, which is not tolerable for large-scale parallel applications.
+
+Thus, for high-performance computing, virtualization is a tool that helps manage parallel and distributed applications running on a physical cluster. It allows different versions of the same libraries and operating systems to co-exist and to be used as environments for running the applications that depend on them.
+
+In this work we evaluate the capabilities offered by different approaches and virtualization technologies to build a computational environment with configurable computation (CPU, memory) and network (latency, bandwidth) characteristics, which we call a Virtual Private Supercomputer (VPS)~\cite{vsc-csit13}. Such configuration enables flexible partitioning of the available physical resources between a number of concurrent applications utilizing a single infrastructure. Depending on application requirements and execution priorities, each application can get a customized virtual environment with as many resources as it needs or is allowed to use.
+
+Section~\ref{sec:relwork} gives an overview of related work in the area of virtualization applied to high-performance computing.
+%Section~\ref{sec:full} describes our experience in using full virtualization for HPC problems.
+Section~\ref{sec:cont} presents the approaches to using light-weight virtualization to build the virtual computing environment, along with some results of its experimental evaluation. Section~\ref{sec:discussion} discusses the experience and the observed experimental results, and Section~\ref{sec:conclusion} concludes the paper.
+
+\section{Related work}
+\label{sec:relwork}
+
+Research on the subject of virtual clusters can be divided into two broad groups: works dealing with provisioning and deploying virtual clusters in high-performance or grid environments, and works dealing with the overheads of virtualization. Works from the first group typically assume that virtualization overheads are low and acceptable in high-performance computing, and works from the second group generally assume that virtualization has some benefits for high-performance computing.
+
+In~\cite{chen2009efficient} the authors evaluate the overheads of a system for on-line virtual cluster provisioning (based on QEMU/KVM) and the different resource mapping strategies used in this system, and show that the main source of deployment overhead is the network transfer of virtual machine images. To reduce it they use various caching techniques to reuse already transferred images, as well as multicast file transfer to increase network throughput. Simultaneous use of caching and multicasting is concluded to be an efficient way to reduce the overhead of virtual machine provisioning.
+
+In~\cite{ye2010analyzing} the authors evaluate the general overheads of Xen para-virtualization compared to fully virtualized and physical machines using the HPCC benchmarking suite. They conclude that an acceptable level of overheads can be achieved only with para-virtualization, due to its efficient inter-domain communication (bypassing the dom0 kernel) and the absence of the high L2 cache miss rate that is common to fully virtualized guest machines running MPI programs.
+
+In contrast to these two works, the main principles of our approach can be summarized as follows. Do not use full or para-virtualization of the whole machine, but virtualize selected components, so that overheads occur only where they are unavoidable (i.e.\ do not virtualize the processor). Do not transfer opaque file system images, but mount standard file systems over the network, so that only a minimal transfer overhead can occur. Finally, amend standard task schedulers to work with virtual clusters, so that no programming is needed to distribute the load efficiently. These principles are paramount to making virtualization light-weight and fast.
+
+%\input{fullvirt}
+\input{containers}
+
+%\section{Experience with distributed schedulers}
+%TODO (bonus section, can be omitted)
+
+\section{Discussion}
+\label{sec:discussion}
+
+Light-weight container-based virtualization is the most promising technology to use as an enabling part of the virtual supercomputer concept~\cite{vsc-csit13,vsc-iccsa14}, ensuring a proper and efficient distribution of resources between several applications. Knowing the application demands in advance, we can create an appropriate infrastructure configuration that gives just as many resources as needed to each particular instance of a virtual supercomputer running a particular application. In this way, free resources can be controlled and granted to other applications with minimal overhead and without a negative effect on current executions.
+
+\section{Conclusion}
+\label{sec:conclusion}
+
+The presented approach to creating virtual clusters from Linux containers was found to be efficient, with performance comparable to that of an ordinary physical cluster.
+%: not only usage of containers does not incur significant virtualization processor overheads but also network virtualization overheads can be totally removed if host's network name space is used and network bandwidth saved by automatically transferring only those files that are needed through network-mounted file system rather than the whole images.
+From the point of view of a system administrator, storing each HPC application in its own container makes versioning and dependency control easily manageable, and the containers' configuration does not interfere with the configuration of the host machines or of other containers.
+Usage of standard virtualization technologies can improve the overall behavior of a distributed system and adapt it to the problems being solved. In that way a virtual supercomputer can help people efficiently run applications and focus on domain-specific problems rather than on the underlying computer architecture and the placement of tasks.
+
+\section{Acknowledgement}
+The research presented in this paper was carried out using
+computational resources of Resource Center ``Computer Centre of
+Saint-Petersburg State University'' with the support of grants from the
+Russian Foundation for Basic Research (project no.~\mbox{13-07-00747}) and Saint Petersburg State University (projects \mbox{9.38.674.2013}, \mbox{0.37.155.2014}).
+
+\bibliography{vsc}{}
+\bibliographystyle{plain}
+
+\end{document}
\ No newline at end of file
diff --git a/main.tex b/main.tex
@@ -1,81 +0,0 @@
diff --git a/makefile b/makefile
@@ -0,0 +1,28 @@
+NAME = csit-15-vsc-exp
+
+$(NAME).pdf: $(NAME).tex makefile
+	pdflatex $(NAME)
+	pdflatex $(NAME)
+	ls *.bib 2>/dev/null && bibtex $(NAME) || true
+	pdflatex $(NAME)
+
+%.eps: %.svg
+	inkscape --without-gui --export-eps=$@ $<
+
+clean:
+	rm -f $(NAME).log $(NAME).aux $(NAME).pdf *-converted-to.pdf
+	rm -f $(NAME).nav $(NAME).snm $(NAME).toc $(NAME).out
+	rm -f $(NAME).bbl $(NAME).blg $(NAME).vrb
+	rm -f ./graphics/ping-1-eps-converted-to.pdf
+	rm -f ./graphics/imb-1-eps-converted-to.pdf
+	rm -f ./graphics/openfoam-1-eps-converted-to.pdf
+	rm -f ./graphics/ping-2-eps-converted-to.pdf
+	rm -f ./graphics/imb-2-eps-converted-to.pdf
+	rm -f ./graphics/openfoam-2-eps-converted-to.pdf
+
+$(NAME).pdf: ./graphics/ping-1.eps
+$(NAME).pdf: ./graphics/imb-1.eps
+$(NAME).pdf: ./graphics/openfoam-1.eps
+$(NAME).pdf: ./graphics/ping-2.eps
+$(NAME).pdf: ./graphics/imb-2.eps
+$(NAME).pdf: ./graphics/openfoam-2.eps
diff --git a/vsc.bib b/vsc.bib
@@ -381,7 +381,6 @@ booktitle = {Proceedings of International Conference on Computational Science an
 volume = 8584,
 pages = {341-354},
 year = 2014
-%pages = {1-6}
 }
 
 @Inproceedings{vkorkhov-iccsa15,
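
With the makefile added by this commit, building the paper should come down to a plain `make' (a hypothetical session, not part of the repository; it assumes pdflatex, bibtex and, for regenerating the EPS figures from their SVG sources, inkscape are installed):

    $ git clone https://git.igankevich.com/csit-15-vsc-exp.git
    $ cd csit-15-vsc-exp
    $ make          # pdflatex/bibtex passes producing csit-15-vsc-exp.pdf
    $ make clean    # remove the generated files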
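The principles stated in the paper's related work section (virtualize selected components only, mount file systems over the network instead of transferring images, reuse the host network stack) can be sketched with standard Linux tools. This is an illustrative sketch, not code from this repository; the NFS export and the mount point are hypothetical names:

    #!/bin/sh
    # Mount the container root over the network instead of copying an
    # image; "fileserver:/export/app-root" is a hypothetical NFS export.
    mount -t nfs fileserver:/export/app-root /mnt/app-root
    # Create new mount/UTS/IPC/PID namespaces but no network namespace,
    # so the container reuses the host's network stack; the processor is
    # not virtualized at all.
    unshare --mount --uts --ipc --pid --fork \
        chroot /mnt/app-root /bin/hostname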