Fix RMSE. - iccsa-21-wind

commit 9b416ed1733da8ca3fa83cfc2b094961982a4414
parent f431571dbc0962e3d5791af065aeda50644bd71a
Author: Ivan Gankevich <i.gankevich@spbu.ru>
Date:   Thu,  8 Apr 2021 17:55:19 +0300

Fix RMSE.

Diffstat:
R/anal.R  | 5 +++--
main.tex  | 166 +++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------

2 files changed, 110 insertions(+), 61 deletions(-)
diff --git a/R/anal.R b/R/anal.R
@@ -29,8 +29,9 @@ power <- function(x,y) {
 
 # normalized rmse
 rmse <- function(estimated, actual) {
-  s <- max(actual)
-  sqrt(mean(((estimated - actual)/s)**2))
+  s_max <- max(actual)
+  s_min <- min(actual)
+  sqrt(mean((estimated - actual)**2)) / (s_max - s_min)
 }
 
 dweibull2 <- function (x, a1=1, b1=1, c1=1, a2=1, b2=1, c2=1, g=0) {
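For comparison, a small R sketch contrasting the two versions of `rmse` in the hunk above. The pre-commit version is kept under the hypothetical name `rmse_old`, and the input vectors are made-up toy values, not anemometer data. The old version scales every residual by `max(actual)`, while the new one divides the root-mean-square residual by the range `max(actual) - min(actual)`, which is the normalisation used in the RMSE equation added to main.tex in this commit.

```r
# Hypothetical name for the pre-commit version: residuals scaled by max(actual).
rmse_old <- function(estimated, actual) {
  s <- max(actual)
  sqrt(mean(((estimated - actual)/s)**2))
}

# Post-commit version (as in R/anal.R): RMSE normalised by the range of actual values.
rmse <- function(estimated, actual) {
  s_max <- max(actual)
  s_min <- min(actual)
  sqrt(mean((estimated - actual)**2)) / (s_max - s_min)
}

# Made-up toy values, not measurements from the paper.
actual    <- c(10, 12, 14, 16, 18)
estimated <- c(11, 12, 13, 16, 19)

rmse_old(estimated, actual)              # about 0.0430, i.e. sqrt(0.6) / 18
rmse(estimated, actual)                  # about 0.0968, i.e. sqrt(0.6) / 8

# Adding the same constant offset to both series:
rmse(estimated + 100, actual + 100)      # unchanged
rmse_old(estimated + 100, actual + 100)  # shrinks as the offset grows
```

One property of range normalisation shown here is that a common offset does not change the result, whereas the max-based version depends on where the data sit on the axis. Note that the new version divides by zero (giving `Inf` or `NaN`) when all actual values are equal.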
diff --git a/main.tex b/main.tex
@@ -7,8 +7,10 @@
 \usepackage{graphicx}
 \usepackage{listings}
 \usepackage{url}
+\usepackage{textcomp}
 
 \DeclareMathOperator{\Mode}{Mo}
+\DeclareMathOperator{\Expectation}{E}
 
 \begin{document}
 
@@ -163,32 +165,31 @@ that control the scale and the shape.
 
 \subsection{Three-dimensional ACF of wind velocity}
 
-Usually, autocorrelation is modeled using exponential functions~\cite{box1976time}.
-In this paper we use one-dimensional autocorrelation function written as
+Usually, autocovariance is modeled using exponential functions~\cite{box1976time}.
+In this paper we use the one-dimensional autocovariance function written as
 \begin{equation}
-    \rho\left(t\right) = a_3 \exp\left(-\left(b_3 t\right)^{c_3} \right).
+    K\left(t\right) = a_3 \exp\left(-\left(b_3 t\right)^{c_3} \right).
     \label{eq-acf-approximation}
 \end{equation}
-Here \(a_3>0\), \(b_3>0\) and \(c_3>0\) are parameters of the autocorrelation
+Here \(a_3>0\), \(b_3>0\) and \(c_3>0\) are parameters of the autocovariance
 function that control the shape of the exponent.
 
-In order to construct three-dimensional autocorrelation function we assume that
-one-dimensional autocorrelation function is the same for each coordinate and
+To construct the three-dimensional autocovariance function we assume that
+the one-dimensional autocovariance function has the same form for each coordinate and
 multiply them.
 \begin{equation}
-    \rho\left(t,x,y,z\right) = a \exp\left(
+    K\left(t,x,y,z\right) = a \exp\left(
         -\left(b_t t\right)^{c_t}
         -\left(b_x x\right)^{c_x}
         -\left(b_y y\right)^{c_y}
         -\left(b_z z\right)^{c_z}
     \right).
+    \label{eq-acf}
 \end{equation}
-Here \(a>0\), \(b_{t,x,y,z}>0\) and \(c_{t,x,y,z}>0\) are parameters of the autocorrelation
-function. Parameter \(a\) is proportional to the square of scalar wind velocity,
-parameter \(b_{t,x,y,z}\) is proprotional to the absolute value of mode of
-projection of wind velocity
-on the corresponding axis (the most common wind speed in the corresponding direction).
-Parameter \(c_{t,x,y,z}\) controls the shape of the autocorrelation function in
+Here \(a>0\), \(b_{t,x,y,z}>0\) and \(c_{t,x,y,z}>0\) are parameters of the autocovariance
+function. Parameters \(a\) and \(\exp\left(b\right)\) are proportional to the wind
+velocity projection on the corresponding axis.
+Parameter \(c\) controls the shape of the autocovariance function in
 the corresponding direction; it does not have simple relationship to the wind
 velocity statistical parameters.
 
@@ -231,21 +232,19 @@ m/s (table~\ref{tab-coefficients}).
 
 We noticed that ambient temperature affects values reported by our load cells:
 when the load cell heats up (cools down), it reports values that increase
-(decrease) linearly in time. We removed this linear trend from the measured
-values using linear regression. The code in R~\cite{r-language} that transforms
-sensor values into wind speed is presented in
-listing~\ref{lst-sample-to-speed}.
+(decrease) linearly in time due to thermal expansion of the material. We
+removed this linear trend from the measured values using linear regression. The
+code in R~\cite{r-language} that transforms sensor values into wind speed is
+presented in listing~\ref{lst-sample-to-speed}.
 
-\begin{lstlisting}[label={lst-sample-to-speed},caption={The code that transforms raw load cell sensor values into wind speed projection to the corresponding axis.}]
+\begin{lstlisting}[label={lst-sample-to-speed},caption={The code that transforms raw load cell sensor values into wind speed projections onto the corresponding axis.}]
 sampleToSpeed <- function(x, c1, c2) {
- # remove linear trend
  t <- c(1:length(x))
  reg <- lm(x~t)
- x - reg$fitted.values
+ x <- x - reg$fitted.values # remove linear trend
  x <- sign(x)*sqrt(abs(x)) # convert from force to velocity
- # scale sensor values to wind speed using calibration coefficients
- x[x<0] = x[x<0] / c1
- x[x>0] = x[x>0] / c2
+ x[x<0] = x[x<0] / c1 # scale sensor values to wind speed
+ x[x>0] = x[x>0] / c2 # using calibration coefficients
  x
 }
 \end{lstlisting}
@@ -314,25 +313,32 @@ statistics that is not distorted by the turbulence.
 
 \subsection{Anemometer verification}
 
-\subsection{Verification of wind velocity distribution}
+In order to verify that our anemometer produces correct measurements, we
+calculated wind speed and direction from the collected samples and fitted them
+to the Weibull and von Mises distributions, respectively. These are
+typical models for wind speed and direction~\cite{carta2008,carta2008joint}.
+Then we found the intervals with the best and the worst fit for these models
+using the normalised root-mean-square error (RMSE) calculated as
+\begin{equation}
+    \text{RMSE} = \frac{\sqrt{\Expectation\left[\left(X_\text{observed}-X_\text{estimated}\right)^2\right]}}{X_\text{max}-X_\text{min}}.
+\end{equation}
+Here \(\Expectation\) is the statistical mean, \(X_\text{observed}\) and \(X_\text{estimated}\) are observed and estimated values respectively,
+and \(X_\text{max}\) and \(X_\text{min}\) are the maximum and minimum observed values.
 
-The wind speed data collected with three-axis anemometer was approimated by
+The wind speed data collected with the three-axis anemometer were approximated by
 Weibull distribution using least-squares fitting. Negative and positive wind
 speed projections to each axis both have this distribution, but with different
-parameters. Most of the data intervals contain only one mean wind direction, which
-means that one of the distributions is for incident wind flow on the arm of the
-anemometer and another one is for the turbulent flow that forms behind the arm.
-For \(z\) axis both left and right distributions have similar shapes, for \(x\)
-and \(y\) axes the distribution for incident flow is taller than the
-distribution for turbulent flow. Distributions for each axis are presented in
-figure~\ref{fig-velocity-distributions}.
-
-RMSE of wind speed distribution approximation has positive correlation with
-wind speed: the larger the wind speed, the larger the error and vice versa.
-Larger error for low wind speeds is caused by larger skewness and kurtosis (see
-the first row of figure~\ref{fig-velocity-distributions}). RMSE is reduced
-when~\eqref{eq-velocity-distribution} is used as the distribution for both low
-and high wind speeds. TODO
+parameters. Most of the data intervals contain only one prevalent mean wind
+direction, which means that one of the distributions is for incident wind flow
+on the arm of the anemometer and another one is for the turbulent flow that
+forms behind the arm. For the \(z\) axis both left and right distributions have
+similar shapes; for the \(x\) and \(y\) axes the distribution for incident flow is
+taller than the distribution for turbulent flow. The best-fit and worst-fit
+distributions for each axis are presented in figure~\ref{fig-velocity-distributions}.
+
+%RMSE is reduced
+%when~\eqref{eq-velocity-distribution} is used as the distribution for both low
+%and high wind speeds. TODO
 
 \begin{figure}
     \centering
@@ -347,20 +353,17 @@ and high wind speeds. TODO
     \label{fig-velocity-distributions}}
 \end{figure}
 
-\subsection{Verification of wind direction distribution}
-
-\cite{carta2008}
-
-TODO joint wind speed and direction distribution \cite{carta2008joint}
-
-TODO wind direction distribution sectors \cite{feng2015sectors}
-
-TODO asymmetry
-
-
-RMSE of wind direction distribution approximation has negative correlation with
-wind speed: the larger the wind speed, the smaller the error and vice versa.
-Larger error for low wind
+Wind direction was approximated by the von Mises distribution using least-squares
+fitting. Following the common
+practice~\cite{feng2015sectors}, we divided the direction axis
+into two sectors, ``positive'' (from 0 to 180\textdegree) and ``negative'' (from 0 to
+\(-180\)\textdegree), and fitted each sector independently. We chose two
+sectors as they give reasonably small error: one sector is not enough here,
+because we always have at least two mean directions (one for incident flow and
+one for turbulent flow behind the arm), and four or more sectors are too many,
+as we have four mean directions only when the wind speed is low and the flow
+is close to turbulent. The best-fit and worst-fit distributions for each axis
+are presented in figure~\ref{fig-direction-distributions}.
 
 \begin{figure}
     \centering
@@ -385,7 +388,21 @@ Larger error for low wind
     \label{fig-direction-distributions}}
 \end{figure}
 
-\section{Verification of ACF}
+%TODO asymmetry
+
+Finally, we computed the autocovariance for each axis as
+\begin{equation}
+    K(\tau) = \Expectation\left[ \left(X_t-\bar{X}\right)\left(X_{t-\tau}-\bar{X}\right) \right]
+\end{equation}
+and fitted it to~\eqref{eq-acf-approximation}.
+Per-axis ACFs have a pronounced peak at zero lag and long tails.
+The largest RMSE is 2.3\%.
+Variances for the \(x\) and \(y\) axes are comparable, but the ACF for the \(z\) axis
+shows much lower variance.
+Parameters \(a\) and \(b\) from~\eqref{eq-acf-approximation} are positively
+correlated with wind speed for the corresponding axis.
+The best-fit and worst-fit ACFs for each axis
+are presented in figure~\ref{fig-acf}.
 
 \begin{figure}
     \centering
@@ -396,20 +413,51 @@ Larger error for low wind
         ACF of positive wind speed projections, blue line shows estimated
         ACF of negative wind speed projections and circles denote observed
         ACF of wind speed projections.
-    \label{fig-velocity-distributions}}
+    \label{fig-acf}}
 \end{figure}
 
+\subsection{Wind simulation using measured ACFs}
+
+TODO Anton
+
 \section{Discussion}
 
+The RMSE of the wind speed distribution approximation is positively correlated with
+wind speed: the larger the wind speed, the larger the error and vice versa.
+Larger error for low wind speeds is caused by larger skewness and kurtosis (see
+the first row of figure~\ref{fig-velocity-distributions}). Similar
+approximation errors can be found in~\cite{carta2008joint}, where the authors
+improve approximation accuracy using joint wind speed and direction
+distributions. Such studies are outside the scope of this paper, because
+here we verify anemometer measurements using well-established mathematical
+models, but future work may include the study of these improvements.
+
+The RMSE of the wind direction distribution approximation is negatively correlated with
+wind speed: the larger the wind speed, the smaller the error and vice versa.
+This agrees with physical laws: the faster the flow, the more
+determinate its mean direction becomes, and the slower the flow, the more
+indeterminate its mean direction is.
+
 One disadvantage of three-axis anemometer is that the arm for the \(z\) axis is
 horizontal, and snow and rain put additional load on this cell distorting the
-measurements. This affects \(z\) mean value and can be compensated in software by
-comparing the mean value within small time frame with the historical mean. With
-compensation \(z\) arm can be used as a snow or rain indicator. From that
-point of view having horizontal arm can be seen as an advantage.
+measurements. Also, thermal expansion and contraction of the material change
+the resistance of the load cells and distort the measurements. Both of these
+deficiencies can be compensated for in software by removing the linear trend from the
+corresponding interval.
 
 \section{Conclusion}
 
+In this paper we proposed a three-axis anemometer that measures wind speed for
+each axis independently. We analysed the data collected by this anemometer and
+verified that per-axis wind speeds fit the Weibull distribution with the
+largest RMSE of 14\% and wind directions fit the von Mises distribution with
+the largest RMSE of 15\%. We estimated autocovariance functions of wind speed
+for each axis of the anemometer and used these approximations to simulate wind
+flow in Virtual Testbed. The parameters of these functions make it possible to control
+both wind speed and mean direction. Future work is to construct a new
+anemometer that is able to measure spatial autocovariances using the proposed
+anemometer as the base.
+
 \subsubsection*{Acknowledgements.}
 Research work is supported by Council for grants of the President of the
 Russian Federation (grant no.~MK-383.2020.9).
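The commit adds the sample autocovariance estimator K(tau) and states that it is fitted to the one-dimensional model K(t) = a_3 exp(-(b_3 t)^c_3), but the fitting code itself is not part of this diff. The R sketch below shows one way to do it under stated assumptions: the series is evenly sampled, the estimator averages over the n - tau available pairs at each lag, the helper name `autocov` is hypothetical, least squares is done with base R's `nls` (port algorithm with positive lower bounds), and the input is a synthetic AR(1) series rather than measured wind speed. It reuses the range-normalised `rmse` from R/anal.R to report the fit error.

```r
# Normalised RMSE, as defined in R/anal.R after this commit.
rmse <- function(estimated, actual) {
  s_max <- max(actual)
  s_min <- min(actual)
  sqrt(mean((estimated - actual)**2)) / (s_max - s_min)
}

# Hypothetical helper: sample autocovariance
# K(tau) = E[(X_t - mean)(X_{t-tau} - mean)] for tau = 0..max_lag,
# averaging over the n - tau available pairs at each lag.
autocov <- function(x, max_lag) {
  n <- length(x)
  xm <- x - mean(x)
  sapply(0:max_lag, function(tau) mean(xm[(1 + tau):n] * xm[1:(n - tau)]))
}

# Synthetic AR(1) series standing in for one axis of wind speed (not real data).
set.seed(1)
x <- as.numeric(arima.sim(model = list(ar = 0.9), n = 20000))

lags <- 0:30
k <- autocov(x, max(lags))

# Least-squares fit of K(t) = a * exp(-(b * t)^c); the positive lower bounds
# keep (b * t)^c well defined while the optimiser iterates.
fit <- nls(k ~ a * exp(-(b * lag)^c),
           data = data.frame(lag = lags, k = k),
           start = list(a = k[1], b = 0.1, c = 1),
           algorithm = "port",
           lower = c(1e-6, 1e-6, 0.1))

coef(fit)              # fitted a, b, c
rmse(fitted(fit), k)   # normalised RMSE of the ACF fit
```

Base R's `stats::acf(x, type = "covariance")` computes a similar estimate but divides by n instead of n - tau; either choice is a modelling decision that the diff does not settle.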

	git clone https://git.igankevich.com/iccsa-21-wind.git