Article citation information:
Tomko, T., Puskar, M., Fabian, M., Boslai, R. Procedure for the evaluation of measured data in terms of vibration diagnostics by application of a multidimensional statistical model. Scientific Journal of Silesian University of Technology. Series Transport. 2016, 91, 125-131. ISSN: 0209-3324. DOI: 10.20858/sjsutst.2016.91.13.
Tomas TOMKO[1], Michal PUSKAR[2], Michal FABIAN[3], Robert BOSLAI[4]
PROCEDURE FOR THE EVALUATION OF MEASURED DATA IN TERMS
OF VIBRATION DIAGNOSTICS BY APPLICATION OF A MULTIDIMENSIONAL STATISTICAL MODEL
Summary. The evaluation of measured data in terms of vibration diagnostics is a problematic task for designers. Its complexity is compounded by the fact that it involves a large amount of disparate measurement data. One of the most effective analytical approaches to large amounts of data is the use of multidimensional statistical methods, which can provide a picture of the current status of the flexibility of the machinery. The more methods are used, the more precise the statistical analysis of the measurement data becomes, and the better the picture of the current condition of the machinery that can be obtained.
Keywords: vibration diagnostics, statistical
methods
1. INTRODUCTION
When the measured data on the amplitude and frequency of the Honda GX25 engine are evaluated statistically, irregularities may arise during the application of the statistical model. This raises a number of important questions. Is the linear regression model suitable? What percentage of the variability in the measured data can be explained by this model? How strong is the model? Are the measured data affected by multicollinearity, autocorrelation or heteroscedasticity? Are the residuals normally distributed? What are the possible errors of the model, and how can they be identified? The methodology described in this work answers these questions. In doing so, it also shows how to apply the linear regression model with two variables correctly using the RStudio software.
2. CALCULATION OF THE REGRESSION COEFFICIENTS β
The first step in building the linear regression model with two variables is to calculate the vector of regression coefficients as follows:
$$\hat{\beta} = \left(X^{T}X\right)^{-1}X^{T}y \qquad (1)$$
Equation (1) shows that RStudio must first multiply the transposed matrix X^T by the matrix X, invert the product X^T X, and then multiply this inverse by X^T and by the vector y. Equation (1) is easy to express in the RStudio program environment with the following command:
(2)
This command estimates the parameters of the linear model, which RStudio returns as a vector of parameter estimates.
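To make Equation (1) concrete, here is a minimal Python sketch using only the standard library; it is not the authors' RStudio command (2). It assumes the two-variable case, i.e. X consists of a column of ones and a single regressor column x, so that X^T X is a 2×2 matrix that can be inverted in closed form. The function name `ols_two_variables` and the toy data are hypothetical and unrelated to the Honda GX25 measurements.

```python
def ols_two_variables(x, y):
    """Sketch of Equation (1): beta_hat = (X^T X)^{-1} X^T y.

    Assumes the design matrix X has a column of ones (intercept) and one
    regressor column x, so X^T X is a 2 x 2 matrix inverted in closed form.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y need the same length, at least 3")
    # X^T X = [[n, sum(x)], [sum(x), sum(x^2)]]
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    # X^T y = [sum(y), sum(x*y)]
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("X^T X is singular (x is constant)")
    # Closed-form inverse of the 2 x 2 matrix X^T X
    inv = [[sxx / det, -sx / det],
           [-sx / det, n / det]]
    # Multiply (X^T X)^{-1} by X^T y
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    return b0, b1


# Hypothetical toy data lying exactly on y = 1 + 2x
b0, b1 = ols_two_variables([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(round(b0, 6), round(b1, 6))  # 1.0 2.0
```

In R, the same estimates are returned by coef(lm(y ~ x)); inverting X^T X explicitly mirrors Equation (1) but is numerically less robust than the QR decomposition that lm() uses internally.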
3. RANDOM DISTURBANCES AND THEIR DISPERSION IN THE FORM OF A VARIANCE–COVARIANCE MATRIX
The next step is to deal with the random disturbances in RStudio. The difficulty of this step lies in the fact that their variance must be expressed in the form of a variance–covariance matrix. It is therefore necessary to evaluate Equations (3) and (4):
$$S^{2} = s^{2}\left(X^{T}X\right)^{-1} \qquad (3)$$
$$s^{2} = \frac{e^{T}e}{n-(k+1)} \qquad (4)$$
For Equation (3) to be evaluated in RStudio, the variance of the random disturbances must first be calculated according to Equation (4). Equation (4) is a ratio: its numerator is the product of the transposed residual vector e^T and the residual vector e, and its denominator is the number of observations n minus the number of explanatory variables k increased by one, i.e., n - (k+1). This can be defined in RStudio in the following form:
(5)
Here I used the generic function resid(), which is part of the standard R environment used by RStudio. It was chosen because it returns the residual component of the fitted regression model directly. In addition to resid(), I used another function, dim(), in this step [1]. It returns the dimensions of the matrix X: the number of rows equals the number of observations n, while the number of columns equals k+1. Using the command defined in (5), RStudio estimates the variance of the random disturbances; in this case, its value is 7.019. After estimating this variance, I then evaluated S² (3) in RStudio. Because RStudio treats the calculated variance as a 1×1 matrix rather than as a scalar, it has to be converted to scalar form in this step. This conversion takes place in RStudio using the function as.vector(), as follows:
(6)
After the variance has been expressed as a scalar, the variance–covariance matrix is estimated using another generic function. This time, I used the function vcov(), which takes this form in RStudio:
(7)
The output from RStudio is the variance–covariance (dispersion) matrix of the parameter estimates computed from the measured values of the time series.
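Equations (4) and (3) can be illustrated with a similar standard-library Python sketch, again for one regressor (k = 1) and hypothetical toy data; `variance_covariance` is an illustrative name, not an R or RStudio function. It computes the residual variance of Equation (4) and then scales (X^T X)^{-1} by it, as in Equation (3).

```python
def variance_covariance(x, y):
    """Sketch of Equations (4) and (3) for y = b0 + b1*x + u.

    Returns s2 = e^T e / (n - (k + 1)) and S2 = s2 * (X^T X)^{-1}.
    """
    n, k = len(x), 1  # k = number of explanatory variables
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det],
           [-sx / det, n / det]]            # (X^T X)^{-1}
    b0 = inv[0][0] * sy + inv[0][1] * sxy   # Equation (1)
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # residuals
    s2 = sum(ei * ei for ei in e) / (n - (k + 1))      # Equation (4)
    vcov = [[s2 * v for v in row] for row in inv]      # Equation (3)
    return s2, vcov


s2, vcov = variance_covariance([1, 2, 3, 4], [1, 3, 2, 4])
print(round(s2, 6))                                  # 0.9
print([[round(v, 6) for v in row] for row in vcov])  # [[1.35, -0.45], [-0.45, 0.18]]
```

Because s2 is already a plain float in Python, no counterpart of the as.vector() conversion is needed here. For a fitted lm object in R, vcov() returns this same matrix.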
4. QUANTIFICATION OF THE COEFFICIENT OF
DETERMINATION
The third, and very important, step in the application of a linear model with two variables is the quantification of the coefficient of determination, which answers the question of what percentage of the variability in the measured data is explained by the model. I defined this problem in RStudio. First, it should be recalled that the coefficient of determination works with three sums of squares: the total sum of squares, the residual sum of squares and the explained sum of squares. RStudio takes these sums into account through r.squared, which quantifies the coefficient of determination and takes this form in RStudio:
(8)
In this case, RStudio returned a value of 98.9%. This means that approximately 98.9% of the total variability of the dependent variable is explained by Equation (14).
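The relationship between the three sums of squares and the coefficient of determination, R² = 1 - RSS/TSS = ESS/TSS for a model with an intercept, can be sketched in Python as follows. The function name and toy data are hypothetical; the slope and intercept use the centred form, which for one regressor gives the same estimates as Equation (1).

```python
def coefficient_of_determination(x, y):
    """Return (TSS, RSS, ESS, R^2) for the OLS fit y = b0 + b1*x.

    The centred slope/intercept formulas give the same estimates as
    Equation (1) when there is a single regressor plus an intercept.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    tss = sum((yi - y_bar) ** 2 for yi in y)               # total
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual
    ess = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained
    return tss, rss, ess, 1 - rss / tss


tss, rss, ess, r2 = coefficient_of_determination([1, 2, 3, 4], [1, 3, 2, 4])
print(round(r2, 6))  # 0.64
```

Since the model contains an intercept, TSS = RSS + ESS, so both forms of R² agree.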
5. DETECTION OF THE
VECTOR OF PARAMETERS IN THE FORM OF A CONFIDENCE INTERVAL
After the coefficient of determination has been evaluated, the next step is to determine interval estimates of the parameters β. In statistics, such an interval is a confidence interval, which contains the parameters β with 95% confidence. This step is implemented in RStudio using another generic function, confint(), which takes the following form in RStudio:
(9)
The outcome of this calculation is a 95% confidence interval for each estimated parameter β, given as a two-sided interval with a lower and an upper bound.
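A minimal Python sketch of the two-sided intervals b_j ± t·se(b_j) for the two-variable model is shown below. The standard errors are the square roots of the diagonal of s²(X^T X)^{-1}. As an assumption of this sketch, the critical value t_crit is passed in, because the Python standard library has no Student t quantile function; it could be taken from a t table, from scipy.stats.t.ppf(0.975, n - 2), or from qt(0.975, n - 2) in R. The function name and data are hypothetical.

```python
import math


def confidence_intervals(x, y, t_crit):
    """Two-sided intervals b_j +/- t_crit * se(b_j) for y = b0 + b1*x.

    t_crit is the (1 - alpha/2) quantile of Student's t distribution with
    n - 2 degrees of freedom (supplied by the caller).
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)  # residual variance with k = 1
    # Square roots of the diagonal of s2 * (X^T X)^{-1}
    se_b0 = math.sqrt(s2 * (1 / n + x_bar ** 2 / sxx))
    se_b1 = math.sqrt(s2 / sxx)
    return [(b - t_crit * se, b + t_crit * se)
            for b, se in ((b0, se_b0), (b1, se_b1))]


# Hypothetical toy data; t(0.975, df = 2) is approximately 4.303
for lo, hi in confidence_intervals([1, 2, 3, 4], [1, 3, 2, 4], 4.303):
    print(round(lo, 3), round(hi, 3))
```

Each row corresponds to one parameter, matching the lower and upper bound columns that confint() reports.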
6. VERIFICATION OF THE NORMALITY HYPOTHESIS OF THE MODEL WITH TWO VARIABLES
The proper application of this model depends on an assumption that the model must meet: the residual component of the time series must follow a normal probability distribution. To verify this assumption, the well-known Jarque-Bera test of normality is used, again within RStudio. The test is performed by the function jbTest(), which is provided by the add-on package fBasics rather than by base R. It enters the calculation in RStudio with the following code:
(10)
This test considers two hypotheses. The null hypothesis H0 states that the residual component follows a normal probability distribution. The alternative hypothesis H1 states that it follows some other distribution. The decisive output of this test is the p-value, the essential indicator for accepting or rejecting the null hypothesis H0. In this case, the test returned a p-value of 0.739. Since this p-value is greater than the significance level α = 0.05, the null hypothesis cannot be rejected, and the hypothesis of normally distributed residuals can therefore be regarded as satisfied.
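The decision rule above can be sketched in Python using the asymptotic form of the Jarque-Bera statistic, JB = n/6 · (S² + (K - 3)²/4), where S is the sample skewness and K the sample kurtosis of the residuals. Under H0, JB is asymptotically chi-squared with 2 degrees of freedom, whose survival function is exp(-JB/2). This is a hypothetical standard-library sketch, not the fBasics implementation; jbTest() may report additional or slightly different p-values.

```python
import math


def jarque_bera(residuals):
    """Asymptotic Jarque-Bera test of normality.

    JB = n/6 * (S^2 + (K - 3)^2 / 4), with sample skewness S and kurtosis K
    computed from central moments. Under H0 (normal residuals) JB follows a
    chi-squared distribution with 2 degrees of freedom, whose survival
    function is exp(-JB/2).
    """
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)
    return jb, p_value


jb, p = jarque_bera([-2, -1, 0, 1, 2])  # hypothetical residuals
alpha = 0.05
print("reject H0" if p < alpha else "cannot reject H0")  # cannot reject H0
```

The chi-squared approximation is only asymptotic, so for short residual series the p-value should be read with caution.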
7. DETECTION OF POSSIBLE DEFECTS IN THE MODEL
(RAMSEY TEST)
The most important part of this work is to verify whether the linear regression model is specified correctly. To verify the model specification, I used a well-known statistical test, the Ramsey regression equation specification error test (RESET), at the chosen significance level α = 0.05. In RStudio, the function resettest() from the lmtest package makes it possible to assess very quickly whether the proposed model is specified correctly. Applied to the model specification, resettest() takes the following form in RStudio:
(11)
At the chosen significance level, the test compares the null hypothesis H0, which states that the functional form of the model is correctly specified, with the alternative hypothesis H1, which states that the model is misspecified. Extending the model by the square of the independent variable takes this form:
(12)
After applying this test, I failed to reject the null hypothesis H0. At the significance level α, this result means that the test found no evidence of a specification error, so the proposed model may be considered to be in the correct form.
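The RESET logic described above can be sketched in Python: fit the restricted model y = b0 + b1·x and the augmented model that also contains x², then form F = ((RSS_r - RSS_u)/q) / (RSS_u/(n - 3)) with q = 1 added regressor. This follows the text's "square expansion of the independent variable", which in lmtest corresponds to resettest() with type = "regressor" and power = 2; the lmtest defaults (power = 2:3, type = "fitted") differ. The helper names and data are hypothetical. The statistic should be compared with the F(q, n - 3) critical value; a p-value could be obtained, for example, with scipy.stats.f.sf(F, q, n - 3), which this standard-library sketch does not compute.

```python
def _solve(a, b):
    """Solve a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(b)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("singular normal equations")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def _rss(columns, y):
    """OLS fit of y on the given regressor columns; returns the residual sum of squares."""
    p = len(columns)
    xtx = [[sum(ci * cj for ci, cj in zip(columns[i], columns[j]))
            for j in range(p)] for i in range(p)]
    xty = [sum(ci * yi for ci, yi in zip(columns[i], y)) for i in range(p)]
    beta = _solve(xtx, xty)
    fitted = [sum(beta[j] * columns[j][t] for j in range(p)) for t in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def reset_f_statistic(x, y):
    """RESET-type F statistic for adding x^2 to y = b0 + b1*x."""
    n = len(y)
    ones = [1.0] * n
    x2 = [xi * xi for xi in x]
    rss_r = _rss([ones, x], y)      # restricted model: y = b0 + b1*x
    rss_u = _rss([ones, x, x2], y)  # augmented model: + g*x^2
    q = 1                           # number of added regressors
    df_u = n - 3                    # residual df of the augmented model
    return ((rss_r - rss_u) / q) / (rss_u / df_u), q, df_u


x = [1, 2, 3, 4, 5, 6]
noise = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
linear = [1 + 2 * xi + e for xi, e in zip(x, noise)]  # hypothetical, linear
curved = [xi * xi + e for xi, e in zip(x, noise)]     # hypothetical, quadratic
print(reset_f_statistic(x, linear)[0] < 1)    # True
print(reset_f_statistic(x, curved)[0] > 100)  # True
```

A small F statistic, as for the linear data, corresponds to failing to reject H0; a large one signals that the linear form is misspecified.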
8. QUANTIFICATION OF ANTICIPATED VALUES OF THE
MODEL
One of the advantages of using a linear regression model in practice is that it makes it possible to calculate anticipated (predicted) values. In RStudio, this is used to estimate the values that the model anticipates for new values of the explanatory variable. For this purpose, an interval of values must be estimated, so this section uses the function predict() with an argument specifying a confidence interval. With this interval included, predict() takes the following form in RStudio:
(13)
Once the anticipated data (13) are defined, RStudio outputs an interval estimate of the predicted values for the given values of the variable X.
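The interval around an anticipated value can be sketched in Python for the two-variable model. For a new value x0, the predicted value is ŷ0 = b0 + b1·x0 and Var(ŷ0) = s²(1/n + (x0 - x̄)²/Sxx); an interval for a single new observation adds s². These two cases correspond to interval = "confidence" and interval = "prediction" in R's predict() for lm objects. The function name, the `kind` parameter and the data are hypothetical, and t_crit is supplied by the caller as in the confidence interval sketch.

```python
import math


def predict_interval(x, y, x0, t_crit, kind="confidence"):
    """Interval around y_hat(x0) for the OLS fit y = b0 + b1*x.

    kind="confidence": interval for the mean response at x0.
    kind="prediction": interval for a single new observation at x0
    (adds the disturbance variance s2).
    t_crit is the (1 - alpha/2) quantile of Student's t with n - 2 df.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    y0 = b0 + b1 * x0
    # Var(y_hat(x0)) = s2 * (1/n + (x0 - x_bar)^2 / Sxx)
    var = s2 * (1 / n + (x0 - x_bar) ** 2 / sxx)
    if kind == "prediction":
        var += s2
    elif kind != "confidence":
        raise ValueError("kind must be 'confidence' or 'prediction'")
    half = t_crit * math.sqrt(var)
    return y0, y0 - half, y0 + half


# Hypothetical toy data; t(0.975, df = 2) is approximately 4.303
fit, lower, upper = predict_interval([1, 2, 3, 4], [1, 3, 2, 4], 2.5, 4.303)
print(round(fit, 6))  # 2.5
```

The interval is narrowest at x0 = x̄ and widens as x0 moves away from the observed data, which is worth keeping in mind when extrapolating.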
Fig. 1. RStudio software environment
Fig. 2. Visualization of Honda GX25 motor
Fig. 3. Honda GX25
9. CONCLUSION
RStudio was applied to evaluate measured data for the diagnosis of internal combustion engines, namely measurement data on the amplitude and speed of the Honda GX25 engine. Their relationship can be expressed as a linear regression model in the following form:
(14)
This equation enables the following interpretation: a change of one unit in the parameter X changes the value of the parameter y by 4.216. Here, the parameter y is the amplitude and the parameter X is the speed of the Honda GX25 engine, which is used in the Shell Eco-marathon.
Equation (14) is not the only outcome of applying the linear model with two variables. The methodology also statistically confirmed, by means of the Ramsey RESET test, that model (14) is specified in the correct form. Moreover, the coefficient of determination shows that this equation explains 98.9% of the total variability of the measured values.
This paper was prepared within the framework of the following projects: VEGA 1/0197/14 – research on new methods and innovative design solutions in order to increase efficiency and to reduce emissions of transport vehicle driving units, together with the evaluation of possible operational risks; VEGA 1/0198/15 – research on innovative methods for emission reduction of driving units used in transport vehicles and the optimization of active logistic elements in material flows in order to increase their technical level and reliability; and KEGA 021TUKE–4/2015 – development of cognitive activities focused on innovations in educational programs in engineering disciplines, as well as building and modernizing specialized laboratories for logistics and intra-operational transport.
References
1. Hatrák M. 2010. Ekonometria. [In Slovak: Econometrics]. Bratislava: IURA. ISBN 978-80-8078-150-7.
2. Piotrowski J. 1995. Shaft Alignment Handbook. New York: CRC Press. ISBN 1-57444-721-1.
3. Wackerly D., W. Mendenhall, R. Scheaffer. 2008. Mathematical Statistics with Applications. Belmont, USA: Thomson Brooks/Cole. ISBN-10: 0-495-38508-5.
Received 13.10.2015; accepted in revised form 02.03.2016
Scientific Journal of Silesian
University of Technology. Series Transport is licensed under a Creative
Commons Attribution 4.0 International License
[1] Faculty of Mechanical
Engineering, Technical University of Košice, 9 Letná Street, 042 00 Košice,
Slovakia. E-mail: tomas.tomko@tuke.sk.
[2] Faculty of Mechanical
Engineering, Technical University of Košice, 9 Letná Street, 042 00 Košice,
Slovakia. E-mail: michal.puskar@tuke.sk.
[3] Faculty of Mechanical
Engineering, Technical University of Košice, 9 Letná Street, 042 00 Košice,
Slovakia. E-mail: michal.fabian@tuke.sk.
[4] Faculty of Mechanical Engineering, Technical University of Košice, 9 Letná Street, 042 00 Košice, Slovakia. E-mail: robert.boslai@gmail.com.