What should you do when you have a large amount of scattered data and need to draw conclusions from it? The data must be interpreted. And how? With methods that come from statistics.
Typical sources of scatter are measurement error and fluctuations in the external environment. Interpretation becomes a real challenge when the data are plentiful and the sources of scatter are numerous. In that case a mathematical model of the phenomenon should be built. The elements of such a model divide into deterministic and stochastic (random) ones.
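To make the deterministic/stochastic split concrete, here is a minimal Python sketch; the linear law and the Gaussian noise level are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Deterministic part: a known law, here y = 2x + 1 (chosen arbitrarily).
x = np.linspace(0.0, 10.0, 50)
signal = 2.0 * x + 1.0

# Stochastic part: measurement error modeled as Gaussian noise.
noise = rng.normal(loc=0.0, scale=1.5, size=x.size)

# Observed data = deterministic component + random component.
y = signal + noise
print(y[:5])
```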
Reality can be too complex to model in full, so one has to choose an optimal set of elements to include in the model, striking a compromise between accuracy and computational cost. The current state of computing technology permits fairly complex models, but there are limits here too.
The probability distribution of a random variable assigns probabilities to its possible values. Often the distribution function is known up to one or two unknown parameters, which must be estimated from the data; hypotheses about the random variable are then formulated and tested. Decision theory, for example, has become widespread: its essence is to weigh the losses and gains that follow from right and wrong decisions.
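As a hedged illustration in Python (the normal distribution, the sample size, and the hypothesis "mean = 5" are all assumptions made up for this sketch), one can estimate the two unknown parameters from data and test a hypothesis about them:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)

# A sample assumed to come from a normal distribution with
# two unknown parameters: the mean and the standard deviation.
sample = rng.normal(loc=5.0, scale=2.0, size=200)

# Point estimates of the unknown parameters from the data.
mu_hat, sigma_hat = stats.norm.fit(sample)
print(f"estimated mean = {mu_hat:.2f}, estimated std = {sigma_hat:.2f}")

# A simple hypothesis test: is the true mean equal to 5?
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)
print(f"p-value for H0: mean = 5 -> {p_value:.3f}")
```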
Will the mathematical model actually work? One systematic set of rules for testing statistical hypotheses is the analysis of variance; closely related estimation tools are the maximum likelihood method and the method of least squares.
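Here is a small Python sketch of a least-squares fit; for Gaussian noise its result coincides with the maximum likelihood estimate. The line y = 2x + 1 and the noise level are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Noisy observations of a straight line y = 2x + 1.
x = np.linspace(0.0, 10.0, 30)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

# Least-squares estimate of slope and intercept.
# (For Gaussian noise this is also the maximum likelihood estimate.)
slope, intercept = np.polyfit(x, y, deg=1)
print(f"slope ~ {slope:.2f}, intercept ~ {intercept:.2f}")
```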
But what if accumulating the data takes too long or costs too much? Then special methods for planning comparative experiments must be applied.
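One cornerstone of such planning is randomization. The Python sketch below, with invented group sizes and an invented effect size, assigns subjects to groups at random so that unknown nuisance factors average out, then compares the groups with a two-sample t-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)

# Randomization: each of 40 subjects is assigned to control or
# treatment at random, balancing unknown nuisance factors.
n = 40
assignment = rng.permutation(n) < n // 2  # True -> treatment group

# Simulated responses: the treatment shifts the mean by 1.0.
response = rng.normal(loc=10.0, scale=2.0, size=n) + 1.0 * assignment

# Compare the two groups with a two-sample t-test.
t_stat, p_value = stats.ttest_ind(response[assignment], response[~assignment])
print(f"p-value = {p_value:.3f}")
```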
And what if the data source itself changes according to some law? Ice cream consumption, for example, depends on the weather, and the weather follows natural, seasonal patterns. In such cases one must turn to the theory of time series.
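As a rough Python sketch of one time-series idea (the monthly "ice cream" series, its trend, and its yearly seasonality are all simulated for illustration), one can remove a fitted trend and average over the years to recover the seasonal profile:

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# Four years of monthly data: trend + yearly seasonality + noise.
months = np.arange(48)
trend = 0.05 * months
season = 2.0 * np.sin(2.0 * np.pi * months / 12.0)
series = 10.0 + trend + season + rng.normal(scale=0.5, size=months.size)

# Remove a fitted linear trend, then estimate the seasonal profile
# by averaging each calendar month over the four observed years.
detrended = series - np.poly1d(np.polyfit(months, series, deg=1))(months)
seasonal_profile = detrended.reshape(4, 12).mean(axis=0)
print(np.round(seasonal_profile, 2))
```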