Average salary... Average life expectancy... Almost every day we hear these phrases used to describe a multitude with a single number. But oddly enough, "average value" is a rather insidious concept, often misleading an ordinary person who is inexperienced in mathematical statistics.

What is the problem?

The average most often means the arithmetic mean, which can shift dramatically under the influence of a few extreme values. On its own, it also gives no real idea of how the values you are studying are distributed.

Let's take a classic example of the average salary.

An abstract company has ten employees. Nine of them receive a salary of about 50,000 rubles, and one 1,500,000 rubles (by a strange coincidence, he is also the general director of this company).

The arithmetic mean in this case is 195,150 rubles, which, you will agree, does not describe what a typical employee earns.

What are the ways to calculate the average?

The first way is to calculate the already mentioned arithmetic mean, which is the sum of all values divided by their number:

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$$

where:

  • $\bar{x}$ is the arithmetic mean;
  • $x_1, \dots, x_n$ are the specific values;
  • $n$ is the number of values.

Advantages:

  • Works well with a normal distribution of values in the sample;
  • Easy to calculate;
  • Intuitive.

Disadvantages:

  • Doesn't give a real idea of the distribution of values;
  • An unstable quantity that is easily thrown off by outliers (as in the case of the CEO).

The second way is to calculate the mode, which is the most frequently occurring value. For data grouped into intervals, the mode is estimated as:

$$M_o = x_0 + h \cdot \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})}$$

where:

  • $M_o$ is the mode;
  • $x_0$ is the lower bound of the interval that contains the mode;
  • $h$ is the width of the interval;
  • $f_m$ is the frequency of the modal interval (how many times values from it occur in the series);
  • $f_{m-1}$ is the frequency of the interval preceding the modal one;
  • $f_{m+1}$ is the frequency of the interval following the modal one.

Advantages:

  • Great for getting a sense of public opinion;
  • Good for non-numeric data (colors of the season, bestsellers, ratings);
  • Easy to understand.

Disadvantages:

  • The mode may simply not exist (no repetitions);
  • There can be several modes (multimodal distribution).

The third way is to calculate the median, that is, the value that divides the ordered sample into two halves. If the sample has an even number of values, so that no single middle value exists, the arithmetic mean of the two values on either side of the middle is taken as the median. For data grouped into intervals, the median is estimated as:

$$M_e = x_0 + h \cdot \frac{\frac{1}{2}\sum_i f_i - S_{m-1}}{f_m}$$

where:

  • $M_e$ is the median;
  • $x_0$ is the lower bound of the interval that contains the median;
  • $h$ is the width of the interval;
  • $f_i$ is the frequency of the i-th interval (how many times values from it occur in the series);
  • $S_{m-1}$ is the sum of the frequencies of the intervals preceding the median interval;
  • $f_m$ is the number of values in the median interval (its frequency).

Advantages:

  • Provides the most realistic and representative estimate;
  • Resistant to outliers.

Disadvantages:

  • It is more difficult to calculate, since the sample must be ordered before calculation.

We have considered the basic methods for finding the average value, called measures of central tendency (there are actually more, but these are the most popular).

Now let's go back to our example and calculate all three variants of the average using special Excel functions:

  • AVERAGE(number1, [number2], ...) is the function for determining the arithmetic mean;
  • MODE.SNGL(number1, [number2], ...) is the mode function (older versions of Excel used MODE(number1, [number2], ...));
  • MEDIAN(number1, [number2], ...) is the function for finding the median.

And here are the values we got:

In this case, the mode and median characterize the average salary in the company much better.
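The same comparison can be sketched in Python with the standard `statistics` module. The exact figures are an assumption made for illustration: the text only says the nine salaries are "about 50,000", so the sketch uses exactly 50,000 for each, which is why its mean (195,000) differs slightly from the figure quoted above.

```python
from statistics import mean, median, mode

# Assumed data: nine salaries of exactly 50,000 rubles and one of 1,500,000.
salaries = [50_000] * 9 + [1_500_000]

average_salary = mean(salaries)    # pulled far upward by the single large value
median_salary = median(salaries)   # middle of the ordered sample
mode_salary = mode(salaries)       # most frequently occurring value

print(average_salary, median_salary, mode_salary)
```

With these assumed numbers the mean is 195,000, while the median and the mode are both 50,000.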

But what if the sample contains not 10 values, as in the example, but millions? Excel is poorly suited to this (a worksheet holds at most 1,048,576 rows), but the database where your data is stored handles it without trouble.

Calculate the arithmetic mean in SQL

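Everything is quite simple here, since SQL provides a special aggregate function, AVG.

To use it, it is enough to write a query along the following lines (using the same employees table and salary column as the other queries in this article):

SELECT AVG(salary) AS "Average salary" FROM employees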

Computing the mode in SQL

SQL does not have a separate function for finding the mode, but you can easily and quickly write it yourself. To do this, we need to find out which of the salaries is most often repeated and choose the most popular one.

Let's write a query:

/* WITH TIES must be added to TOP() if the set is multimodal, that is, if it has several modes */
SELECT TOP(1) WITH TIES
    salary AS "Salary Mode"
FROM employees
GROUP BY salary
ORDER BY COUNT(*) DESC
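For comparison, here is a rough Python counterpart of the "with ties" idea, using `statistics.multimode` from the standard library (Python 3.8+). The sample values are hypothetical and were chosen so that two salaries tie for the highest frequency.

```python
from statistics import multimode

# Hypothetical sample in which two salaries occur equally often.
salaries = [50_000, 50_000, 60_000, 60_000, 75_000]

# multimode returns every value that shares the highest frequency,
# in the order in which the values first appear.
modes = multimode(salaries)
print(modes)
```

Note that when no value repeats, `multimode` returns all the values, since each of them then occurs equally often.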

Calculate the median in SQL

As with the mode, SQL does not have a built-in function for calculating the median, but it does have a general function for calculating percentiles, PERCENTILE_CONT.

It all looks like this:

/* In this case, the 0.5 percentile will be the median */
SELECT TOP(1)
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) OVER() AS "Median salary"
FROM employees

It is best to read more about how the PERCENTILE_CONT function works in the Microsoft and Google BigQuery documentation.

Which method should you use, then?

From the above it follows that the median is the best way to calculate the average value.

But that is not always the case. Whether you work with the mean or the median, beware of multimodal distributions:

The graph shows a bimodal distribution with two peaks. Such a situation may arise, for example, when voting in elections.

In this case, the arithmetic mean and the median both land somewhere in the middle and say nothing about what is really happening. It is better to recognize at once that you are dealing with a bimodal distribution and to report the two modes.

Better yet, divide the sample into two groups and collect statistical data for each.

Conclusion:

When choosing a method for finding the average, you need to take into account the presence of outliers and whether the values in the sample are normally distributed.

The final choice of a measure of central tendency always rests with the analyst.

Let's say you need to find the average number of days different employees take to complete tasks. Or you want to calculate the average temperature on a given day over a 10-year period. The average of a group of numbers can be calculated in several ways.

The AVERAGE function measures central tendency, that is, the location of the center of a group of numbers in a statistical distribution. The three most common measures of central tendency are:

    Average: the arithmetic mean, calculated by adding a group of numbers and dividing the sum by the count of those numbers. For example, the average of the numbers 2, 3, 3, 5, 7, and 10 is 5, which is their sum, 30, divided by their count, 6.

    Median: the middle number of a group of numbers. Half of the numbers are greater than the median, and half of the numbers are less than the median. For example, the median of the numbers 2, 3, 3, 5, 7, and 10 is 4.

    Mode: the most frequently occurring number in a group of numbers. For example, the mode of the numbers 2, 3, 3, 5, 7, and 10 is 3.

With a symmetrical distribution of a group of numbers, all three measures of central tendency coincide. With a skewed distribution, they can be different.
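As a quick illustration of that last point, the sketch below compares a symmetric sample with the skewed sample from the text. The symmetric sample is a made-up example, not taken from the source.

```python
from statistics import mean, median, mode

symmetric = [1, 2, 3, 3, 3, 4, 5]   # hypothetical symmetric sample
skewed = [2, 3, 3, 5, 7, 10]        # the sample used in the text

for name, data in (("symmetric", symmetric), ("skewed", skewed)):
    # For the symmetric sample all three measures agree; for the skewed one they differ.
    print(name, mean(data), median(data), mode(data))
```

For the symmetric sample all three measures equal 3; for the skewed one they are 5, 4 and 3, as stated above.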

Calculate the average value in adjacent rows or columns

Follow the steps below.

Calculating the average of cells that are not in a contiguous row or column

To accomplish this task, use the AVERAGE function. Copy the table below to a blank sheet.

Calculation of the weighted average

To accomplish this task, use the SUMPRODUCT and SUM functions. This example calculates the average price paid per unit across three purchases, where each purchase is for a different number of units at a different price per unit.

Copy the table below to a blank sheet.

Remember!

To find the arithmetic mean, you need to add all the numbers and divide their sum by their number.


Find the arithmetic mean of 2, 3 and 4.

Let's denote the arithmetic mean by the letter "m". Following the definition above, we first find the sum of all the numbers:

2 + 3 + 4 = 9

Then we divide the resulting sum by the number of values taken. We have three numbers:

9 : 3 = 3

As a result, we get the arithmetic mean formula:

$$m = \frac{2 + 3 + 4}{3} = 3$$

What is the arithmetic mean for?

In addition to the fact that it is constantly offered to be found in the classroom, finding the arithmetic mean is very useful in life.

For example, you decide to sell soccer balls. But since you are new to this business, it is completely unclear what price to sell the balls at.

Then you decide to find out at what price your competitors are already selling soccer balls in your area. You find out the prices in the stores and make a table.

Prices for balls in stores turned out to be quite different. What price should we choose to sell the soccer ball?

If we choose the lowest one (290 rubles), then we will sell the goods at a loss. If we choose the highest one (360 rubles), then buyers will not purchase soccer balls from us.

We need an average price. This is where the arithmetic mean comes to the rescue.

Calculate the arithmetic mean of the prices for soccer balls:

$$\text{average price} = \frac{290 + 360 + 310}{3} = \frac{960}{3} = 320 \text{ rub.}$$

Thus, we got the average price (320 rubles), at which we can sell a soccer ball not too cheap and not too expensive.

Average moving speed

Closely related to the arithmetic mean is the concept of average speed.

Observing the movement of traffic in the city, you can see that cars sometimes accelerate and travel at high speed, and sometimes slow down and travel at low speed.

There are many such sections along a vehicle's route. Therefore, for the convenience of calculations, the concept of average speed is used.

Remember!

The average speed of movement is the total distance traveled divided by the total time of movement.

Consider the problem for the average speed.

Task number 1503 from the textbook "Vilenkin Grade 5"

The car traveled 3.2 hours on a highway at a speed of 90 km/h, then 1.5 hours on a dirt road at a speed of 45 km/h, and finally 0.3 hours on a country road at a speed of 30 km/h. Find the average speed of the car for the entire journey.

To calculate the average speed of movement, you need to know the entire distance traveled by the car, and the entire time that the car was moving.

$S_1 = V_1 \cdot t_1$

$S_1 = 90 \cdot 3.2 = 288$ (km) - highway.

$S_2 = V_2 \cdot t_2$

$S_2 = 45 \cdot 1.5 = 67.5$ (km) - dirt road.

$S_3 = V_3 \cdot t_3$

$S_3 = 30 \cdot 0.3 = 9$ (km) - country road.

$S = S_1 + S_2 + S_3$

$S = 288 + 67.5 + 9 = 364.5$ (km) - the entire path traveled by the car.

$T = t_1 + t_2 + t_3$

$T = 3.2 + 1.5 + 0.3 = 5$ (h) - all the time.

$V_{avg} = S : T$

$V_{avg} = 364.5 : 5 = 72.9$ (km/h) - the average speed of the car.

Answer: $V_{avg} = 72.9$ (km/h) - the average speed of the car.
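The same calculation can be sketched in a few lines of Python. The variable names are illustrative only; the speeds and times are those from the problem. The sketch also computes the plain mean of the three speeds, to show that it is not the average speed.

```python
# (speed in km/h, time in hours) for each leg of the trip from the problem above
legs = [(90, 3.2), (45, 1.5), (30, 0.3)]

total_distance = sum(speed * hours for speed, hours in legs)
total_time = sum(hours for _, hours in legs)
average_speed = total_distance / total_time

# The plain mean of the three speeds ignores how long each leg lasted.
naive_mean_speed = sum(speed for speed, _ in legs) / len(legs)

print(f"average speed: {average_speed:.1f} km/h")
print(f"naive mean of speeds: {naive_mean_speed:.1f} km/h")
```

The average speed comes out at about 72.9 km/h, while the plain mean of the speeds is 55 km/h, because most of the travel time was spent on the fast highway section.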

In mathematics, the arithmetic mean of numbers (or simply the average) is the sum of all the numbers in a given set divided by their number. This is the most generalized and widespread concept of the average value. As you already understood, in order to find the average value, you need to sum up all the numbers given to you, and divide the result by the number of terms.

What is the arithmetic mean?

Let's look at an example.

Example 1. The numbers 6, 7 and 11 are given. Find their average value.

Solution.

First, let's find the sum of all the given numbers.

6 + 7 + 11 = 24

Now we divide the resulting sum by the number of terms. Since we have three terms, we divide by three.

24 : 3 = 8

Therefore, the average of the numbers 6, 7 and 11 is 8. Why 8? Because the sum of 6, 7 and 11 is the same as the sum of three eights. This is clearly seen in the illustration.

The average value is somewhat like "leveling out" a series of numbers. As you can see, the piles of pencils have been evened out to one level.

Consider another example to consolidate the knowledge gained.

Example 2. The numbers 3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29 are given. Find their arithmetic mean.

Solution.

We find the sum.

3 + 7 + 5 + 13 + 20 + 23 + 39 + 23 + 40 + 23 + 14 + 12 + 56 + 23 + 29 = 330

Divide by the number of terms (in this case, 15).

330 : 15 = 22

Therefore, the average value of this series of numbers is 22.

Now consider negative numbers. Let's remember how to sum them up. For example, you have two numbers 1 and -4. Let's find their sum.

1 + (-4) = 1 – 4 = -3

Knowing this, consider another example.

Example 3. Find the average value of the series of numbers 3, -7, 5, 13, -2.

Solution.

We find the sum of the numbers.

3 + (-7) + 5 + 13 + (-2) = 12

Since there are 5 terms, we divide the resulting sum by 5.

12 : 5 = 2.4

Therefore, the arithmetic mean of the numbers 3, -7, 5, 13, -2 is 2.4.

In our age of technological progress, it is much more convenient to use computer programs to find the average value. Microsoft Excel, part of the Microsoft Office suite, is one of them. Finding the average in Excel is quick and easy. Here is a brief guide to finding the arithmetic mean with this program.

To calculate the average value of a series of numbers, use the AVERAGE function. The syntax for this function is:
=AVERAGE(argument1, argument2, ... argument255)
where argument1, argument2, ... argument255 are either numbers or cell references (including ranges and arrays).

To make it clearer, let's test the knowledge gained.

  1. Enter the numbers 11, 12, 13, 14, 15, 16 in cells C1 - C6.
  2. Select cell C7 by clicking on it. In this cell, we will display the average value.
  3. Click on the "Formulas" tab.
  4. Select More Functions > Statistical to open the drop down list.
  5. Select AVERAGE. After that, a dialog box should open.
  6. Drag across cells C1-C6 to set the range in the dialog box.
  7. Confirm your actions with the "OK" button.
  8. If you did everything correctly, cell C7 should show the answer: 13.5. When you click on cell C7, the function (=AVERAGE(C1:C6)) will be displayed in the formula bar.

This function is very useful for accounting, invoices, or whenever you need to find the average of a very long range of numbers. That is why it is often used in offices and large companies. It helps keep records in order and makes it possible to calculate things quickly (for example, the average income per month). You can also use Excel to find the mean value of a function.

Arithmetic mean


The arithmetic mean (in mathematics and statistics) of a set of numbers is the sum of all the numbers divided by their count. It is one of the most common measures of central tendency.

It was proposed (along with the geometric mean and harmonic mean) by the Pythagoreans.

Special cases of the arithmetic mean are the population mean (of the general population) and the sample mean (of a sample).

Introduction

Denote the data set by $X = (x_1, x_2, \ldots, x_n)$; the sample mean is then usually written with a horizontal bar over the variable ($\bar{x}$, pronounced "x bar").

The Greek letter μ is used to denote the arithmetic mean of the entire population. For a random variable that has a defined mean, μ is the probabilistic mean, or mathematical expectation, of the random variable. If the set X is a collection of random numbers with probabilistic mean μ, then for any sample element $x_i$ from this collection, $\mu = E(x_i)$ is the expectation of that element.

In practice, the difference between μ and $\bar{x}$ is that μ is typically unobservable, because you see a sample rather than the entire population. Therefore, if the sample is drawn at random (in terms of probability theory), then $\bar{x}$ (but not μ) can be treated as a random variable with its own probability distribution (the sampling distribution of the mean).

Both of these quantities are calculated in the same way:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}(x_1 + \cdots + x_n).$$

If X is a random variable, then the mathematical expectation of X can be thought of as the arithmetic mean of the values obtained in repeated measurements of X. This is a manifestation of the law of large numbers. Therefore, the sample mean is used to estimate the unknown mathematical expectation.

In elementary algebra, it is proved that the mean of n + 1 numbers is greater than the mean of n numbers if and only if the new number is greater than the old mean, less if and only if the new number is less than the old mean, and unchanged if and only if the new number equals the old mean. The larger n is, the smaller the difference between the new and old means.
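This property follows from a simple update rule, sketched below: the new mean equals the old mean plus the gap between the new number and the old mean, divided by n + 1. The function name `updated_mean` is made up for this illustration.

```python
def updated_mean(old_mean: float, n: int, new_value: float) -> float:
    """Mean of n + 1 numbers, given the mean of the first n and the new number."""
    # The mean moves toward the new value by 1/(n + 1) of the gap between them.
    return old_mean + (new_value - old_mean) / (n + 1)


print(updated_mean(5.0, 4, 10.0))   # new number above the old mean: the mean rises
print(updated_mean(5.0, 4, 0.0))    # new number below the old mean: the mean falls
print(updated_mean(5.0, 99, 10.0))  # same new number, larger n: a much smaller shift
```

Because the gap is divided by n + 1, the sign of the change matches the sign of the gap, and the shift shrinks as n grows.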

Note that there are several other "means" available, including the power mean, the Kolmogorov mean, the harmonic mean, the arithmetic-geometric mean, and various weighted means (e.g., the weighted arithmetic mean, weighted geometric mean, and weighted harmonic mean).

Examples

  • For three numbers, you need to add them and divide by 3:
$$\frac{x_1 + x_2 + x_3}{3}.$$
  • For four numbers, you need to add them and divide by 4:
$$\frac{x_1 + x_2 + x_3 + x_4}{4}.$$

Or, more simply: 5 + 5 = 10, and 10 : 2 = 5. We added 2 numbers, so we divide by 2: the divisor always equals the number of values added.

Continuous random variable

For a continuous function $f(x)$, the arithmetic mean on the interval $[a; b]$ is defined via a definite integral:

$$\overline{f(x)}_{[a;b]} = \frac{1}{b-a}\int_a^b f(x)\,dx$$
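As a rough numerical check of this definition, the sketch below approximates the integral with the midpoint rule. The helper `mean_value` and the test function x² on [0; 3] are assumptions chosen for illustration; the exact mean value in that case is 9 / 3 = 3.

```python
def mean_value(f, a: float, b: float, steps: int = 100_000) -> float:
    """Approximate (1 / (b - a)) * integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    # Sum f at the midpoint of each of the `steps` equal sub-intervals.
    total = sum(f(a + (i + 0.5) * width) for i in range(steps))
    return total * width / (b - a)


approx = mean_value(lambda x: x * x, 0.0, 3.0)
print(f"{approx:.6f}")
```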

Some problems of using the average

Lack of robustness


Although the arithmetic mean is often used as a measure of central tendency, it is not a robust statistic, which means that it is heavily influenced by "large deviations". Notably, for distributions with a large skewness, the arithmetic mean may not match the everyday notion of "average", and robust measures (for example, the median) may describe the central tendency better.

The classic example is the calculation of average income. The arithmetic mean can be mistaken for the median, which can lead to the conclusion that more people have high incomes than really do. "Mean" income is then read as though most people's incomes were close to this number. In fact, this "average" (in the sense of the arithmetic mean) income is higher than the income of most people, because a few very high incomes, far from the rest, pull the arithmetic mean upward (the median income, by contrast, "resists" such a skew). This "average" income says nothing about the number of people near the median income, nor about the number of people near the modal income. If the concepts of "average" and "majority" are treated carelessly, one can wrongly conclude that most people have higher incomes than they actually do. For example, a report on the "average" net income in Medina, Washington, calculated as the arithmetic mean of all residents' annual net incomes, will give a surprisingly high number because of Bill Gates. Consider the sample (1, 2, 2, 2, 3, 9). The arithmetic mean is 3.17, but five of the six values are below this mean.

Compound interest


If the numbers are multiplied rather than added, you need to use the geometric mean, not the arithmetic mean. Most often, this situation arises when calculating the return on investment in finance.

For example, if a stock fell 10% in the first year and rose 30% in the second year, it is incorrect to calculate the "average" increase over these two years as the arithmetic mean (−10% + 30%) / 2 = 10%; the correct average in this case is given by the compound annual growth rate, which yields an annual growth of only about 8.16653826392% ≈ 8.2%.

The reason is that percentages have a new starting point each time: the 30% is 30% of a number smaller than the price at the beginning of the first year. If the stock started at $30 and fell 10%, it is worth $27 at the start of the second year. If it then rises 30%, it is worth $35.1 at the end of the second year. The arithmetic mean of this growth is 10%, but since the stock has only grown by $5.1 in 2 years, it is the average increase of 8.2% that gives the final result of $35.1:

[$30 (1 - 0.1) (1 + 0.3) = $30 (1 + 0.082) (1 + 0.082) = $35.1]. If we use the arithmetic mean of 10% in the same way, we do not get the actual value: [$30 (1 + 0.1) (1 + 0.1) = $36.3].

The compound growth at the end of year 2 is 90% * 130% = 117%, i.e. a total increase of 17%, and the average annual growth factor is $\sqrt{117\%} \approx 108.2\%$, that is, an average annual increase of 8.2%.
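The calculation above can be reproduced with a short sketch. The function name `average_growth_rate` is made up for this illustration; it takes yearly returns as fractions (0.3 means +30%) and returns the geometric-mean rate.

```python
from math import prod


def average_growth_rate(yearly_returns):
    """Geometric-mean growth rate for a sequence of yearly returns (0.3 means +30%)."""
    growth_factors = [1 + r for r in yearly_returns]
    # n-th root of the product of the yearly growth factors, minus 1
    return prod(growth_factors) ** (1 / len(growth_factors)) - 1


rate = average_growth_rate([-0.10, 0.30])
final_price = 30 * (1 + rate) ** 2

print(f"average annual growth: {rate:.4%}")
print(f"price after two years: {final_price:.2f}")
```

Applying this rate twice to the $30 starting price reproduces the $35.1 final value, which the 10% arithmetic mean does not.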

Directions


When calculating the arithmetic mean of a variable that changes cyclically (for example, a phase or an angle), special care should be taken. For example, the arithmetic mean of 1° and 359° would be $\frac{1^\circ + 359^\circ}{2} = 180^\circ$. This number is incorrect for two reasons.

  • First, angular measures are only defined up to a multiple of 360° (or of 2π when measured in radians). Thus, the same pair of numbers could be written as (1° and −1°) or as (1° and 719°). The averages of each pair will be different: $\frac{1^\circ + (-1^\circ)}{2} = 0^\circ$, $\frac{1^\circ + 719^\circ}{2} = 360^\circ$.
  • Second, in this case, a value of 0° (equivalent to 360°) would be the geometrically best mean, since the numbers deviate less from 0° than from any other value (the value 0° has the smallest variance). Compare:
    • the number 1° deviates from 0° by only 1°;
    • the number 1° deviates from the calculated average of 180° by 179°.

The average of a cyclic variable calculated by the formula above is artificially shifted from the true average toward the middle of the numerical range. Because of this, the average is calculated differently: the number with the smallest variance (the center point) is chosen as the average. Also, instead of ordinary subtraction, the modular distance (i.e., the distance along the circle) is used. For example, the modular distance between 1° and 359° is 2°, not 358° (on the circle, it is 1° from 359° to 360° = 0° and another 1° from 0° to 1°, 2° in total).
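A minimal sketch of both ideas follows. `circular_distance` implements the modular distance described above. `circular_mean` uses a different but common technique, averaging the unit vectors of the angles, rather than the minimum-variance search the text describes; for the pair 1° and 359° both approaches point to 0°. Both function names are made up for this illustration.

```python
import math


def circular_distance(a_deg: float, b_deg: float) -> float:
    """Shortest distance between two angles along the circle, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def circular_mean(angles_deg):
    """Mean direction found by averaging unit vectors, in degrees (reduced modulo 360)."""
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0


print(circular_distance(1.0, 359.0))
print(circular_distance(circular_mean([1.0, 359.0]), 0.0))
```

The vector average is not meaningful when the unit vectors cancel out (for example, for 0° and 180°), so a real implementation would need to handle that case separately.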

Weighted average - what is it and how to calculate it?

In the process of studying mathematics, students get acquainted with the concept of the arithmetic mean. In the future, in statistics and some other sciences, students are also faced with the calculation of other averages. What can they be and how do they differ from each other?

Averages: Meaning and Differences

Exact figures do not always give an understanding of the situation. To assess a situation, it is sometimes necessary to analyze a huge number of figures. This is where averages come to the rescue: they let you assess the situation as a whole.


Many adults remember the arithmetic mean from their school days. It is very easy to calculate: the sum of a sequence of n terms is divided by n. That is, to calculate the arithmetic mean of the values 27, 22, 34 and 37, you need to evaluate the expression (27 + 22 + 34 + 37) / 4, since 4 values are used in the calculation. In this case, the result is 30.

Often, the geometric mean is also studied as part of the school course. This value is calculated by taking the nth root of the product of n terms. For the same numbers, 27, 22, 34 and 37, the result is 29.4.

The harmonic mean is usually not studied in general education schools. However, it is used quite often. This value is the reciprocal of the arithmetic mean of the reciprocals, and is calculated as the quotient of n, the number of values, and the sum 1/a_1 + 1/a_2 + ... + 1/a_n. For the same series of numbers, the harmonic mean is approximately 28.8.
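The three values for this series can be checked with Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or later). For positive numbers, the harmonic mean never exceeds the geometric mean, which in turn never exceeds the arithmetic mean.

```python
from statistics import geometric_mean, harmonic_mean, mean

values = [27, 22, 34, 37]

arithmetic = mean(values)
geometric = geometric_mean(values)
harmonic = harmonic_mean(values)

# Rounded to one decimal place, as in the text.
print(round(arithmetic, 1), round(geometric, 1), round(harmonic, 1))
```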

Weighted Average: Features

However, the values above cannot be used everywhere. For example, in statistics, when calculating some averages, the "weight" of each number used in the calculation plays an important role. The results are more revealing and accurate because they take more information into account. This group of values is collectively referred to as "weighted averages". They are not covered at school, so it is worth dwelling on them in more detail.

First of all, it is worth explaining what is meant by the "weight" of a particular value. The easiest way to do this is with a specific example. The body temperature of each patient in a hospital is measured twice a day. Of the 100 patients in different departments of the hospital, 44 have a normal temperature of 36.6 degrees. Another 30 have an elevated temperature of 37.2, 14 have 38, 7 have 38.5, 3 have 39, and the remaining two have 40. If we take the simple arithmetic mean of these six temperature values, the figure for the hospital as a whole will be over 38 degrees! But almost half of the patients have a completely normal temperature. Here it is more correct to use the weighted average, with the number of patients as the "weight" of each value. In that case, the result of the calculation is 37.25 degrees. The difference is obvious.

In weighted average calculations, the "weight" can be the number of shipments, the number of people working on a given day, or in general anything that can be measured and that affects the final result.
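The hospital example can be reproduced with a short sketch. The function name `weighted_mean` and the dictionary layout (temperature mapped to number of patients) are choices made for this illustration.

```python
# temperature in degrees -> number of patients with that reading (from the example above)
readings = {36.6: 44, 37.2: 30, 38.0: 14, 38.5: 7, 39.0: 3, 40.0: 2}


def weighted_mean(value_weights):
    """Sum of value * weight divided by the sum of the weights."""
    total_weight = sum(value_weights.values())
    return sum(value * weight for value, weight in value_weights.items()) / total_weight


# Plain mean of the six distinct temperature values, ignoring the patient counts.
unweighted = sum(readings) / len(readings)
weighted = weighted_mean(readings)

print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

The unweighted figure is above 38 degrees, while the weighted one is about 37.25, matching the numbers in the text.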

Varieties

The ordinary weighted average is the weighted counterpart of the arithmetic mean discussed at the beginning of the article: it is calculated in the same way but, as already mentioned, also takes into account the weight of each number used in the calculation. In addition, there are weighted geometric and weighted harmonic means.

There is another interesting variety, used for series of numbers: the weighted moving average. Trends are calculated on its basis. In addition to the values themselves and their weights, it also uses periodicity: when calculating the average at a given point in time, values for previous time periods are taken into account as well.

Calculating all these values is not that difficult, but in practice usually only the ordinary weighted average is used.

Calculation methods

In the age of computerization, there is no need to manually calculate the weighted average. However, it would be useful to know the calculation formula so that you can check and, if necessary, correct the results obtained.

It will be easiest to consider the calculation on a specific example.

We need to find out the average wage at this enterprise, taking into account the number of workers receiving each salary.

So, the weighted average is calculated using the following formula:

$$x = \frac{a_1 w_1 + a_2 w_2 + \dots + a_n w_n}{w_1 + w_2 + \dots + w_n}$$

For this example, the calculation would be:

x = (32*20 + 33*35 + 34*14 + 40*6) / (20 + 35 + 14 + 6) = (640 + 1155 + 476 + 240) / 75 = 33.48

Obviously, there is no particular difficulty in calculating the weighted average by hand. In Excel, one of the most popular applications that work with formulas, the calculation looks like this: SUMPRODUCT(series of numbers, series of weights) / SUM(series of weights).

How to find the average value in Excel?

How to find the arithmetic mean in Excel?

Vladimir09854

As easy as pie. To find the average value in Excel, you only need 3 cells. In the first we write one number, in the second another. In the third cell we enter a formula that gives the average of the two numbers in the first and second cells. If cell No. 1 is A1 and cell No. 2 is B1, then in the formula cell you need to write this:

=(A1+B1)/2

This formula calculates the arithmetic mean of two numbers.

For the beauty of our calculations, we can outline the cells with borders so that they look like a table.

There is also a function in Excel itself for determining the average value, but I use the old-fashioned method and enter the formula I need. That way I am sure that Excel will calculate exactly what I need and will not come up with some kind of rounding of its own.

M3sergey

This is very easy if the data is already entered into the cells. If you are just interested in a number, just select the desired range / ranges, and the value of the sum of these numbers, their arithmetic mean and their number will appear in the status bar at the bottom right.

You can select an empty cell, click on the triangle (drop-down list) next to "AutoSum" and select "Average" there, then either accept the proposed range for the calculation or choose your own.

Finally, you can use the formulas directly - click "Insert Function" next to the formula bar and cell address. The AVERAGE function is in the "Statistical" category, and takes as arguments both numbers and cell references, etc. There you can also choose more complex options, for example, AVERAGEIF - calculation of the average by condition.

Finding the average in Excel is a fairly simple task. You first need to decide whether you want to use this average in other formulas or not.

If you only need the value, it is enough to select the required range of numbers, and Excel will automatically calculate the average; it is displayed in the status bar, labeled "Average".

If you want to use the result in formulas, you can do this:

1) Sum the cells using the SUM function and divide the result by the count of numbers.

2) A more correct option is to use the dedicated AVERAGE function. Its arguments can be numbers listed one after another or a range of cells.

Vladimir Tikhonov

Select the values that will be used in the calculation and click the "Formulas" tab; there you will see "AutoSum" on the left and, next to it, a downward-pointing triangle. Click on this triangle and choose "Average". Voila, done) At the bottom of the column you will see the average value :)

Ekaterina Mutalapova

Let's start at the beginning and in order. What does average mean?

The mean value is the arithmetic mean, i.e. it is calculated by adding a set of numbers and then dividing the total by their count. For example, for the numbers 2, 3, 6, 7, 2 it is 4 (the sum of the numbers, 20, divided by their count, 5).

In an Excel spreadsheet, the easiest way for me personally was to use the =AVERAGE formula. To calculate the average value, enter the data into the table, write the function =AVERAGE() under the data column, and indicate the range of cells in the brackets by selecting the column with the data. After that, press ENTER, or simply left-click on any cell. The result will be displayed in the cell below the column. The description may sound complicated, but in fact it takes only a few minutes.

Adventurer 2000

The Excel program is multi-faceted, so there are several options that will allow you to find the average:

First option. You simply sum all the cells and divide by their number;

Second option. Use a special command, write in the required cell the formula "=AVERAGE (and here specify the range of cells)";

Third option. If you select the required range, note that the average value of these cells is also displayed at the bottom of the window, in the status bar.

Thus, there are a lot of ways to find the average value, you just need to choose the best one for you and use it constantly.

In Excel, you can calculate the simple arithmetic mean using the AVERAGE function. To do this, enter a series of values, type an equals sign, and select the AVERAGE function in the Statistical category.



Statistical formulas also let you calculate the weighted arithmetic mean, which is considered more accurate. To calculate it, you need the values of the indicator and their frequencies.
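The weighted mean is the sum of each value multiplied by its frequency, divided by the sum of the frequencies. Below is a minimal Python sketch of that calculation; the function name `weighted_mean` and the grade data are made up for illustration. In Excel the same result can be obtained with =SUMPRODUCT(values, frequencies)/SUM(frequencies).

```python
def weighted_mean(values, frequencies):
    """Weighted arithmetic mean: sum(x_i * f_i) / sum(f_i)."""
    total_weight = sum(frequencies)
    if total_weight == 0:
        raise ValueError("frequencies must not sum to zero")
    return sum(x * f for x, f in zip(values, frequencies)) / total_weight


# Hypothetical data: grades and the number of students who received each grade.
grades = [2, 3, 4, 5]
counts = [1, 4, 3, 2]

print(weighted_mean(grades, counts))
```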

How to find the average in Excel?

The situation is this. There is the following table:

The columns shaded in red contain the numerical grades for the subjects. In the "Average" column, you need to calculate their average value.
The problem is this: there are 60-70 subjects in total and some of them are on another sheet.
I looked in another document, the average has already been calculated, and in the cell there is a formula like
="sheet name"!|E12
but this was done by some programmer who got fired.
Tell me, please, who understands this.

Hector

In the function bar, insert "AVERAGE" from the suggested functions and select the range over which it should be calculated (B6:N6 for Ivanov, for example). I don't know for sure about neighboring sheets, but this is certainly covered in the standard Windows help.

Tell me how to calculate the average value in Word

Please tell me how to calculate the average value in Word. Namely, the average value of the ratings, and not the number of people who received ratings.


Yulia Pavlova

Word can do a lot with macros. Press ALT+F11 and write a macro program..
In addition, Insert-Object... will allow you to use other programs, even Excel, to create a sheet with a table inside a Word document.
But in this case, you need to write down your numbers in the table column, and put the average in the bottom cell of the same column, right?
To do this, insert a field into the bottom cell.
Insert-Field...-Formula
Field content
[=AVERAGE(ABOVE)]
returns the average of the cells above.
If the field is selected and the right mouse button is pressed, then it can be Updated if the numbers have changed,
view the code or field value, change the code directly in the field.
If something goes wrong, delete the entire field in the cell and re-create it.
AVERAGE means "average"; ABOVE means "above", that is, the cells above in the same column.
I did not know all this myself, but I easily found it in HELP, of course, thinking a little.

In most cases, the data is concentrated around some central point. Thus, to describe a data set, it is often enough to indicate its central value. Let us consider in turn three numerical characteristics used to estimate the center of a distribution: the arithmetic mean, the median and the mode.

Average

The arithmetic mean (often referred to simply as the mean) is the most common estimate of the mean of a distribution. It is the result of dividing the sum of all observed numerical values by their number. For a sample of numbers $X_1, X_2, \ldots, X_n$, the sample mean (denoted by the symbol $\bar{X}$) equals $\bar{X} = (X_1 + X_2 + \ldots + X_n) / n$, or

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

where $\bar{X}$ is the sample mean, $n$ is the sample size, and $X_i$ is the $i$-th element of the sample.


Consider calculating the arithmetic average of the five-year average annual returns of 15 very high-risk mutual funds (Figure 1).

Fig. 1. Average annual return on 15 very high-risk mutual funds

The sample mean is $\bar{X} = 6.08$.

This is a good return, especially when compared to the 3-4% return that bank or credit union depositors received over the same time period. If you sort the return values, it is easy to see that eight funds have a return above the mean and seven below it. The arithmetic mean acts as a balance point, so that low-return funds balance out high-return funds. All elements of the sample are involved in the calculation of the mean. None of the other estimators of the distribution mean have this property.

When to calculate the arithmetic mean. Since the arithmetic mean depends on all elements of the sample, extreme values significantly affect the result. In such situations, the arithmetic mean can distort the meaning of the numerical data. Therefore, when describing a data set that contains extreme values, report either the median or both the arithmetic mean and the median. For example, if the return of the RS Emerging Growth fund is removed from the sample, the sample mean of the returns of the remaining 14 funds decreases by almost one percentage point, to 5.19%.

Median

The median is the middle value of an ordered array of numbers. If the array does not contain repeating numbers, then half of its elements will be less than and half more than the median. If the sample contains extreme values, it is better to use the median rather than the arithmetic mean to estimate the mean. To calculate the median of a sample, it must first be sorted.

The median is the $(n+1)/2$-th element of the ordered array. This formula is ambiguous: its result depends on whether $n$ is even or odd:

  • If the sample contains an odd number of elements, the median is the $(n+1)/2$-th element.
  • If the sample contains an even number of elements, the median lies between the two middle elements of the sample and is equal to the arithmetic mean of these two elements.

To calculate the median for the sample of 15 very high-risk mutual funds, we first need to sort the raw data (Fig. 2). The median is then the middle element of the sample, in our example element number 8. Excel has a special function =MEDIAN() that also works with unordered arrays.

Fig. 2. Median of 15 funds

Thus, the median is 6.5. This means that the returns of half of the very high-risk funds do not exceed 6.5, while those of the other half are at least 6.5. Note that the median of 6.5 is slightly larger than the mean of 6.08.

If we remove the profitability of the RS Emerging Growth fund from the sample, then the median of the remaining 14 funds will decrease to 6.2%, that is, not as significantly as the arithmetic mean (Fig. 3).

Fig. 3. Median of 14 funds
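The different sensitivity of the mean and the median to an extreme value is easy to reproduce. The sketch below uses Python's standard `statistics` module on a small made-up sample (not the fund data from the figures): removing the single extreme value shifts the mean far more than the median.

```python
from statistics import mean, median

# Hypothetical returns, %: six ordinary values and one extreme value.
returns = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 30.0]
without_extreme = [r for r in returns if r != max(returns)]

mean_shift = mean(returns) - mean(without_extreme)
median_shift = median(returns) - median(without_extreme)

print("mean:  ", mean(returns), "->", mean(without_extreme))
print("median:", median(returns), "->", median(without_extreme))
print("shift of mean:", mean_shift, "shift of median:", median_shift)
```

Note that `median` sorts the data itself, just as =MEDIAN() in Excel accepts an unordered array.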

Mode

The term was first introduced by Pearson in 1894. The mode is the number that occurs most often in the sample (the most "fashionable" one). The mode describes well, for example, the typical reaction of drivers to a traffic signal to stop. A classic example of its use is choosing the sizes for a batch of shoes or the color of wallpaper. If a distribution has several modes, it is said to be multimodal (it has two or more "peaks"). Multimodality gives important information about the nature of the variable under study. For example, in sociological surveys, if a variable represents a preference or attitude towards something, multimodality could mean that there are several distinctly different opinions. Multimodality is also an indicator that the sample is not homogeneous and that the observations may be generated by two or more overlapping distributions. Unlike the arithmetic mean, the mode is not affected by outliers. For continuously distributed random variables, such as the average annual returns of mutual funds, the mode sometimes does not exist at all (or does not make sense). Since these indicators can take on a great variety of values, repeated values are extremely rare.
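The three situations described above (one mode, several modes, no meaningful mode) can be illustrated with `statistics.multimode` from the Python standard library (available since Python 3.8). All data below is invented for the example.

```python
from statistics import multimode

# Hypothetical shoe sizes sold in a day: a single clear mode.
shoe_sizes = [38, 39, 39, 40, 40, 40, 41, 42]

# Hypothetical survey answers: two equally frequent opinions (bimodal).
opinions = ["for", "against", "for", "against", "neutral"]

# Hypothetical continuous returns: no value repeats, so every value
# is returned and the mode carries no useful information.
fund_returns = [5.1, 6.3, 7.8, 9.4]

print(multimode(shoe_sizes))
print(multimode(opinions))
print(multimode(fund_returns))
```

`multimode` returns a list of the most frequent values in the order they first appear, so a list with more than one element signals a multimodal (or mode-less) sample.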

Quartiles

Quartiles are measures that are most commonly used to evaluate the distribution of data when describing the properties of large numerical samples. While the median splits the ordered array in half (50% of the array elements are less than the median and 50% are greater), quartiles break the ordered data set into four parts. The values Q1, the median and Q3 are the 25th, 50th and 75th percentiles, respectively. The first quartile Q1 is a number that divides the sample into two parts: 25% of the elements are less than the first quartile, and 75% are greater.

The third quartile Q3 is a number that also divides the sample into two parts: 75% of the elements are less than the third quartile, and 25% are greater.

In Excel 2007 and earlier, quartiles were calculated with the function =QUARTILE(array, quart). Starting with Excel 2010, two functions are available:

  • =QUARTILE.INC(array, quart)
  • =QUARTILE.EXC(array, quart)

These two functions give slightly different values (Fig. 4). For example, when calculating the quartiles of the sample containing data on the average annual return of 15 very high-risk mutual funds, Q1 = 1.8 or -0.7 for QUARTILE.INC and QUARTILE.EXC, respectively. By the way, the QUARTILE function used earlier corresponds to the modern QUARTILE.INC function. To calculate quartiles in Excel using the above formulas, the data array can be left unordered.

Fig. 4. Calculating quartiles in Excel

Let us emphasize again: Excel can calculate quartiles only for a one-dimensional discrete series containing the values of a random variable. Calculating quartiles for a frequency distribution is covered in the section below.
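The difference between the inclusive and exclusive definitions can be seen outside Excel as well. Python's `statistics.quantiles` has `method="inclusive"` and `method="exclusive"` options; to the best of our understanding they interpolate in the same way as QUARTILE.INC and QUARTILE.EXC, but treat that correspondence as an assumption rather than a guarantee. The sample is made up.

```python
from statistics import quantiles

# Hypothetical unordered sample of nine values; quantiles() sorts it itself.
data = [7, 1, 9, 3, 5, 2, 8, 4, 6]

q_inclusive = quantiles(data, n=4, method="inclusive")  # [Q1, median, Q3]
q_exclusive = quantiles(data, n=4, method="exclusive")  # [Q1, median, Q3]

print("inclusive:", q_inclusive)
print("exclusive:", q_exclusive)
```

Both methods agree on the median here; only Q1 and Q3 differ, with the exclusive method placing them farther from the center.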

Geometric mean

Unlike the arithmetic mean, the geometric mean measures how much a variable has changed over time. The geometric mean is the $n$-th root of the product of $n$ values (in Excel, the function =GEOMEAN is used):

$$G = (X_1 \cdot X_2 \cdot \ldots \cdot X_n)^{1/n}$$

A similar parameter, the geometric mean rate of return, is determined by the formula:

$$G = [(1 + R_1)(1 + R_2) \cdots (1 + R_n)]^{1/n} - 1,$$

where $R_i$ is the rate of return in the $i$-th period of time.

For example, suppose the initial investment is $100,000. By the end of the first year it drops to $50,000, and by the end of the second year it recovers to the original $100,000. The rate of return on this investment over the two-year period is 0, since the initial and final amounts are equal. However, the arithmetic mean of the annual rates of return is (-0.5 + 1) / 2 = 0.25, or 25%, since the rate of return in the first year is R1 = (50,000 - 100,000) / 100,000 = -0.5, and in the second R2 = (100,000 - 50,000) / 50,000 = 1. At the same time, the geometric mean rate of return for the two years is G = [(1 - 0.5) * (1 + 1)]^(1/2) - 1 = 1^(1/2) - 1 = 1 - 1 = 0. Thus, the geometric mean reflects the change (more precisely, the absence of change) in the amount invested over the two years more accurately than the arithmetic mean.
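The investment example can be checked with a few lines of Python. This is a minimal sketch; the function name `geometric_mean_return` is our own, and the two rates are the ones from the example above.

```python
from math import prod


def geometric_mean_return(rates):
    """Geometric mean rate of return: [(1+R_1)*...*(1+R_n)]**(1/n) - 1."""
    if not rates:
        raise ValueError("at least one rate of return is required")
    growth = prod(1 + r for r in rates)  # total growth factor over all periods
    return growth ** (1 / len(rates)) - 1


# Rates of return from the example: -50% in year one, +100% in year two.
rates = [-0.5, 1.0]

arithmetic = sum(rates) / len(rates)
geometric = geometric_mean_return(rates)

print("arithmetic mean of rates:", arithmetic)
print("geometric mean of rates: ", geometric)
```

The arithmetic mean suggests a 25% yearly gain, while the geometric mean correctly reports that the investment did not grow at all.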

Interesting facts. First, the geometric mean of positive numbers never exceeds their arithmetic mean, and the two are equal only when all the numbers are equal to each other. Second, the properties of a right triangle explain why this mean is called geometric. The altitude of a right triangle dropped onto the hypotenuse is the mean proportional between the projections of the legs onto the hypotenuse, and each leg is the mean proportional between the hypotenuse and its projection onto the hypotenuse (Fig. 5). This gives a geometric way of constructing the geometric mean of two segments (lengths): draw a circle with the sum of the two segments as its diameter; the perpendicular erected at the point where the segments join, up to its intersection with the circle, gives the desired value.

Fig. 5. The geometric nature of the geometric mean (figure from Wikipedia)

The second important property of numerical data is their variation, which characterizes the degree of dispersion of the data. Two different samples can differ both in their means and in their variation. However, as shown in Fig. 6 and 7, two samples can have the same variation but different means, or the same mean and completely different variation. The data corresponding to polygon B in Fig. 7 vary much less than the data from which polygon A was built.

Fig. 6. Two symmetric bell-shaped distributions with the same spread and different mean values

Fig. 7. Two symmetric bell-shaped distributions with the same mean values and different spread

There are five estimates of data variation:

  • range,
  • interquartile range,
  • variance,
  • standard deviation,
  • coefficient of variation.

Range

The range is the difference between the largest and smallest elements of the sample:

$$\text{Range} = X_{\max} - X_{\min}$$

The range of a sample containing data on the average annual returns of 15 very high-risk mutual funds can be calculated using an ordered array (see Figure 4): range = 18.5 - (-6.1) = 24.6. This means that the difference between the highest and lowest average annual returns for very high risk funds is 24.6%.

The range measures the overall spread of the data. Although the sample range is a very simple estimate of the total spread of the data, its weakness is that it does not take into account exactly how the data is distributed between the minimum and maximum elements. This effect is well seen in Fig. 8 which illustrates samples having the same range. The B scale shows that if the sample contains at least one extreme value, the sample range is a very inaccurate estimate of the spread of the data.

Fig. 8. Comparison of three samples with the same range; the triangle symbolizes the fulcrum of a balance, and its location corresponds to the mean of the sample

Interquartile range

The interquartile range, or midspread, is the difference between the third and first quartiles of the sample:

$$\text{Interquartile range} = Q_3 - Q_1$$

This value makes it possible to estimate the spread of 50% of the elements and not to take into account the influence of extreme elements. The interquartile range for a sample containing data on the average annual returns of 15 very high-risk mutual funds can be calculated using the data in Fig. 4 (for example, for the function QUARTILE.EXC): Interquartile range = 9.8 - (-0.7) = 10.5. The interval between 9.8 and -0.7 is often referred to as the middle half.

It should be noted that the values Q1 and Q3, and hence the interquartile range, do not depend on the presence of outliers, since their calculation does not take into account any value less than Q1 or greater than Q3. Summary measures that are not affected by outliers, such as the median, the first and third quartiles, and the interquartile range, are called robust.

While the range and the interquartile range estimate the total and middle spread of the sample, respectively, neither of them takes into account exactly how the data are distributed. The variance and standard deviation are free from this shortcoming. These indicators let you assess how much the data fluctuate around the mean. The sample variance is approximately the arithmetic mean of the squared differences between each sample element and the sample mean. For a sample $X_1, X_2, \ldots, X_n$, the sample variance (denoted by the symbol $S^2$) is given by the following formula:

$$S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \ldots + (X_n - \bar{X})^2}{n - 1}$$

In general, the sample variance is the sum of the squared differences between the sample elements and the sample mean, divided by the sample size minus one:

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

where $\bar{X}$ is the arithmetic mean, $n$ is the sample size, and $X_i$ is the $i$-th element of the sample $X$. In Excel 2007 and earlier, the function =VAR() was used to calculate the sample variance; starting with Excel 2010, the function =VAR.S() is used.

The most practical and widely accepted estimate of data spread is the standard deviation. This indicator is denoted by the symbol $S$ and is equal to the square root of the sample variance:

$$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

In Excel 2007 and earlier, the =STDEV() function was used to calculate the sample standard deviation; starting with Excel 2010, the =STDEV.S() function is used. The data array passed to these functions does not need to be ordered.

Neither the sample variance nor the sample standard deviation can be negative. The only situation in which $S^2$ and $S$ can be zero is when all elements of the sample are equal. In this highly unlikely case the range and the interquartile range are also zero.
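The definition of the sample variance (division by n - 1) can be written out by hand and compared with the standard library. The sketch below is illustrative: the helper `sample_variance` and the data are our own, while `statistics.variance` and `statistics.stdev` are the standard Python functions for sample statistics.

```python
from math import sqrt
from statistics import mean, stdev, variance


def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")
    x_bar = mean(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


# Hypothetical sample; the order of the elements does not matter.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

s2 = sample_variance(data)
s = sqrt(s2)

print("variance (by hand, library):", s2, variance(data))
print("std dev  (by hand, library):", s, stdev(data))
print("all elements equal:", sample_variance([3.0, 3.0, 3.0]))
```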

Numerical data is inherently variable. Any variable can take on a set of different values. For example, different mutual funds have different rates of return and loss. Because numerical data vary, it is very important to study not only estimates of the mean, which are summary in nature, but also estimates of variation, which characterize the spread of the data.

The variance and standard deviation let us estimate the spread of the data around the mean, in other words, how far the elements lying below and above the mean deviate from it. The variance has some valuable mathematical properties. However, its value is expressed in squared units of measurement: square percent, square dollars, square inches, etc. Therefore, the natural measure of spread is the standard deviation, which is expressed in the usual units of measurement: percent of income, dollars or inches.

The standard deviation allows you to estimate the amount of fluctuation of the sample elements around the mean value. In almost all situations, the majority of observed values ​​lie within plus or minus one standard deviation from the mean. Therefore, knowing the arithmetic mean of the sample elements and the standard sample deviation, it is possible to determine the interval to which the bulk of the data belongs.

The standard deviation of the returns of the 15 very high-risk mutual funds is 6.6 (Fig. 9). This means that the returns of the bulk of the funds differ from the mean by no more than 6.6% (i.e., they lie in the range from $\bar{X} - S = 6.2 - 6.6 = -0.4$ to $\bar{X} + S = 12.8$). In fact, the five-year average annual returns of 53.3% (8 of 15) of the funds fall within this interval.

Fig. 9. Standard deviation

Note that in the process of summing the squared differences, items that are farther from the mean gain more weight than items that are closer. This property is the main reason why the arithmetic mean is most often used to estimate the mean of a distribution.

Coefficient of variation

Unlike the previous estimates of spread, the coefficient of variation is a relative estimate. It is always measured as a percentage, not in the units of the original data. The coefficient of variation, denoted by the symbol $CV$, measures the spread of the data relative to the mean. It is equal to the standard deviation divided by the arithmetic mean and multiplied by 100%:

$$CV = \frac{S}{\bar{X}} \cdot 100\%$$

where $S$ is the sample standard deviation and $\bar{X}$ is the sample mean.

The coefficient of variation lets you compare two samples whose elements are expressed in different units of measurement. For example, the manager of a mail delivery service intends to upgrade the fleet of trucks. When loading packages, there are two types of restrictions to consider: the weight (in pounds) and the volume (in cubic feet) of each package. Assume that in a sample of 200 packages, the average weight is 26.0 pounds, the standard deviation of the weight is 3.9 pounds, the average package volume is 8.8 cubic feet, and the standard deviation of the volume is 2.2 cubic feet. How can the spread of package weight and volume be compared?

Since the units of measurement for weight and volume differ, the manager must compare the relative spread of these values. The coefficient of variation of weight is CV_W = 3.9 / 26.0 * 100% = 15%, and the coefficient of variation of volume is CV_V = 2.2 / 8.8 * 100% = 25%. Thus, the relative spread of package volumes is much larger than the relative spread of their weights.
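The package example reduces to two divisions, which can be sketched as follows. The function name `coefficient_of_variation` is our own; the numbers are the ones given in the example.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Coefficient of variation in percent: S / mean * 100."""
    if mean_value == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return std_dev / mean_value * 100


# Figures from the example: weight in pounds, volume in cubic feet.
cv_weight = coefficient_of_variation(3.9, 26.0)
cv_volume = coefficient_of_variation(2.2, 8.8)

print(f"CV of weight: {cv_weight:.0f}%")
print(f"CV of volume: {cv_volume:.0f}%")
```

Because the result is a percentage, the two values can be compared directly even though pounds and cubic feet cannot.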

Distribution form

The third important property of the sample is the shape of its distribution. The distribution can be symmetric or asymmetric. To describe the shape of a distribution, it is necessary to calculate its mean and median. If these two measures are equal, the variable is said to be symmetrically distributed. If the mean of a variable is greater than the median, its distribution has positive skewness (Fig. 10). If the median is greater than the mean, the distribution of the variable is negatively skewed. Positive skewness occurs when the mean is pulled up by unusually high values. Negative skewness occurs when the mean is pulled down by unusually small values. A variable is symmetrically distributed if it does not take on any extreme values in either direction, so that large and small values of the variable cancel each other out.

Fig. 10. Three types of distributions

The data depicted on scale A have negative skewness. The figure shows a long tail and a skew to the left caused by unusually small values. These extremely small values shift the mean to the left, and it becomes less than the median. The data shown on scale B are distributed symmetrically. The left and right halves of the distribution are mirror images of each other. Large and small values balance each other, and the mean and median are equal. The data shown on scale C have positive skewness. The figure shows a long tail and a skew to the right caused by unusually high values. These extremely large values shift the mean to the right, and it becomes larger than the median.

In Excel, descriptive statistics can be obtained with the Analysis ToolPak add-in. Go to Data → Data Analysis, select Descriptive Statistics in the window that opens and click OK. In the Descriptive Statistics window, be sure to specify the Input Range (Fig. 11). If you want to see the descriptive statistics on the same sheet as the original data, select the Output Range radio button and specify the cell for the upper left corner of the output (in our example, $C$1). If you want to output the data to a new sheet or a new workbook, simply select the corresponding radio button. Check the Summary statistics box. Optionally, you can also select Confidence Level for Mean, Kth Smallest and Kth Largest.

If you do not see the Data Analysis icon on the Data tab in the Analysis group, you must first install the Analysis ToolPak add-in.

Fig. 11. Descriptive statistics of the five-year average annual returns of funds with very high levels of risk, calculated using the Data Analysis add-in of Excel

Excel calculates a number of the statistics discussed above: mean, median, mode, standard deviation, variance, range, minimum, maximum, and sample size (Count). In addition, Excel calculates some statistics that are new to us: standard error, kurtosis, and skewness. The standard error equals the standard deviation divided by the square root of the sample size. Skewness characterizes the deviation of the distribution from symmetry and is a function of the cubed differences between the sample elements and the mean. Kurtosis is a measure of the relative concentration of data around the mean versus the tails of the distribution, and depends on the differences between the sample elements and the mean raised to the fourth power.
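Two of the new statistics are simple enough to sketch by hand. The code below computes the standard error exactly as described above and a bias-adjusted sample skewness using the formula n / ((n - 1)(n - 2)) * Σ((x - mean) / S)³, which is the formula Excel documents for its SKEW function; whether the add-in output matches it digit for digit is an assumption we have not checked here. The data is invented.

```python
from math import sqrt
from statistics import mean, stdev


def standard_error(xs):
    """Sample standard deviation divided by the square root of the sample size."""
    return stdev(xs) / sqrt(len(xs))


def sample_skewness(xs):
    """Adjusted skewness: n / ((n-1)(n-2)) * sum(((x - mean) / S) ** 3).

    Undefined when all elements are equal (S == 0).
    """
    n = len(xs)
    if n < 3:
        raise ValueError("at least three observations are required")
    x_bar, s = mean(xs), stdev(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in xs)


# Hypothetical samples: one symmetric, one with a long right tail.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_tailed = [1.0, 2.0, 3.0, 4.0, 30.0]

print("standard error:", standard_error(symmetric))
print("skewness (symmetric):   ", sample_skewness(symmetric))
print("skewness (right-tailed):", sample_skewness(right_tailed))
```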

Calculation of descriptive statistics for the general population

The mean, scatter, and shape of the distribution discussed above are sample-based characteristics. However, if the dataset contains numerical measurements of the entire population, then its parameters can be calculated. These parameters include the mean, variance, and standard deviation of the population.

The expected value (population mean) is equal to the sum of all values of the general population divided by the size of the general population:

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

where $\mu$ is the expected value, $X_i$ is the $i$-th observation of the variable $X$, and $N$ is the size of the general population. In Excel, the expected value is calculated with the same function as the arithmetic mean: =AVERAGE().

The population variance is equal to the sum of the squared differences between the elements of the general population and the expected value, divided by the size of the general population:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

where $\sigma^2$ is the variance of the general population. In Excel 2007 and earlier, the =VARP() function is used to calculate the population variance; starting with Excel 2010, =VAR.P().

The population standard deviation equals the square root of the population variance:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

In Excel 2007 and earlier, the function =STDEVP() was used to calculate the population standard deviation; starting with Excel 2010, =STDEV.P(). Note that the formulas for the population variance and standard deviation differ from the formulas for the sample variance and standard deviation. When calculating the sample statistics $S^2$ and $S$, the denominator of the fraction is $n - 1$, and when calculating the parameters $\sigma^2$ and $\sigma$, it is the size of the general population $N$.
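The difference between the two denominators is visible when the same numbers are treated first as a whole population and then as a sample. Python's `statistics` module mirrors the split: `pvariance` and `pstdev` divide by N, `variance` and `stdev` by n - 1. The data is hypothetical.

```python
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical data set of eight measurements.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(values)

# Treated as the entire population: divide by N.
sigma2 = pvariance(values)
sigma = pstdev(values)

# Treated as a sample from a larger population: divide by n - 1.
s2 = variance(values)
s = stdev(values)

print("population:", sigma2, sigma)
print("sample:    ", s2, s)
print("ratio s2 / sigma2 equals n / (n - 1):", s2 / sigma2, n / (n - 1))
```

The sample statistics are always slightly larger, and the gap shrinks as the number of observations grows.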

Rule of thumb

In most situations, a large proportion of observations are concentrated around the median, forming a cluster. In data sets with positive skewness, this cluster lies to the left of (i.e., below) the mathematical expectation, and in sets with negative skewness, to the right of (i.e., above) it. Symmetric data have the same mean and median, and the observations cluster around the mean, forming a bell-shaped distribution. If the distribution does not have a pronounced skewness and the data is concentrated around a certain center of gravity, a rule of thumb can be used to estimate variability: if the data has a bell-shaped distribution, then approximately 68% of the observations are within one standard deviation of the mathematical expectation, approximately 95% are within two standard deviations, and 99.7% are within three standard deviations.

Thus, the standard deviation, which is an estimate of the average fluctuation around the mathematical expectation, helps to understand how the observations are distributed and to identify outliers. It follows from the rule of thumb that for bell-shaped distributions only one value in twenty differs from the mathematical expectation by more than two standard deviations. Therefore, values outside the interval µ ± 2σ can be considered outliers. In addition, only three out of 1000 observations differ from the expected value by more than three standard deviations. Thus, values outside the interval µ ± 3σ are almost always outliers. For distributions that are highly skewed or not bell-shaped, the Bienaymé-Chebyshev rule can be applied.

More than a hundred years ago, the mathematicians Bienaymé and Chebyshev independently discovered a useful property of the standard deviation. They found that for any data set, regardless of the shape of the distribution, the percentage of observations that lie within $k$ standard deviations of the mathematical expectation is not less than $(1 - 1/k^2) \cdot 100\%$.

For example, if $k = 2$, the Bienaymé-Chebyshev rule states that at least $(1 - 1/2^2) \cdot 100\% = 75\%$ of the observations must lie in the interval µ ± 2σ. This rule is true for any $k$ exceeding one. The Bienaymé-Chebyshev rule is of a very general nature and is valid for distributions of any kind. It gives the minimum share of observations whose distance from the mathematical expectation does not exceed a given value. However, if the distribution is bell-shaped, the rule of thumb estimates the concentration of data around the mean more accurately.
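The two rules can be put side by side in a short Python sketch. The function name `chebyshev_lower_bound` is our own; the percentages for bell-shaped data are the approximate values quoted in the text.

```python
def chebyshev_lower_bound(k):
    """Minimum share of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return 1 - 1 / k**2


# Approximate shares for bell-shaped data, as quoted in the rule of thumb.
RULE_OF_THUMB = {2: 0.95, 3: 0.997}

for k in (2, 3):
    print(
        f"k = {k}: at least {chebyshev_lower_bound(k):.1%} for any distribution, "
        f"about {RULE_OF_THUMB[k]:.1%} for a bell-shaped one"
    )
```

The Bienaymé-Chebyshev figure is a guaranteed minimum, which is why it is noticeably lower than the bell-shaped estimate for the same k.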

Computing descriptive statistics for a frequency-based distribution

If the original data is not available, the frequency distribution becomes the only source of information. In such situations, it is possible to calculate approximate values ​​of quantitative indicators of the distribution, such as the arithmetic mean, standard deviation, quartiles.

If the sample data is presented as a frequency distribution, an approximate value of the arithmetic mean can be calculated by assuming that all values within each class are concentrated at the midpoint of the class:

$$\bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n}$$

where $\bar{X}$ is the sample mean, $n$ is the number of observations, or sample size, $c$ is the number of classes in the frequency distribution, $m_j$ is the midpoint of the $j$-th class, and $f_j$ is the frequency of the $j$-th class.

To calculate the standard deviation from the frequency distribution, it is also assumed that all values within each class are concentrated at the midpoint of the class:

$$S = \sqrt{\frac{\sum_{j=1}^{c} (m_j - \bar{X})^2 f_j}{n - 1}}$$
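Both grouped-data approximations rest on the same assumption (every value sits at the midpoint of its class), so they are easy to compute together. The following is a sketch with invented classes and frequencies; the function name `grouped_mean_and_stdev` is hypothetical.

```python
from math import sqrt


def grouped_mean_and_stdev(midpoints, frequencies):
    """Approximate sample mean and standard deviation from class midpoints."""
    n = sum(frequencies)
    if n < 2:
        raise ValueError("at least two observations are required")
    x_bar = sum(m * f for m, f in zip(midpoints, frequencies)) / n
    s2 = sum(f * (m - x_bar) ** 2 for m, f in zip(midpoints, frequencies)) / (n - 1)
    return x_bar, sqrt(s2)


# Hypothetical frequency distribution: classes 0-10, 10-20 and 20-30.
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]

x_bar, s = grouped_mean_and_stdev(midpoints, frequencies)
print("approximate mean:", x_bar)
print("approximate standard deviation:", round(s, 2))
```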

To understand how the quartiles of the series are determined based on frequencies, let us consider the calculation of the lower quartile based on data for 2013 on the distribution of the Russian population by average per capita cash income (Fig. 12).

Fig. 12. The share of the population of Russia by average monthly per capita cash income, rubles

To calculate the first quartile of an interval variation series, you can use the formula:

$$Q_1 = x_{Q_1} + i \cdot \frac{\tfrac{1}{4} \sum f - S_{Q_1 - 1}}{f_{Q_1}}$$

where $Q_1$ is the value of the first quartile; $x_{Q_1}$ is the lower limit of the interval containing the first quartile (the interval is determined by the cumulative frequency, the first one exceeding 25%); $i$ is the width of the interval; $\sum f$ is the sum of the frequencies of the entire sample (here it equals 100%, since the frequencies are given as percentages); $S_{Q_1 - 1}$ is the cumulative frequency of the interval preceding the one that contains the lower quartile; $f_{Q_1}$ is the frequency of the interval containing the lower quartile. The formula for the third quartile differs in that $Q_3$ is used instead of $Q_1$ everywhere and ¾ is substituted for ¼.

In our example (Fig. 12), the lower quartile is in the range 7000.1 - 10,000, the cumulative frequency of which is 26.4%. The lower limit of this interval is 7000 rubles, the width of the interval is 3000 rubles, the cumulative frequency of the interval preceding the one that contains the lower quartile is 13.4%, and the frequency of the interval containing the lower quartile is 13.0%. Thus: Q1 = 7000 + 3000 * (¼ * 100 - 13.4) / 13 = 9677 rubles.
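The interpolation formula translates directly into code. In the sketch below the function name `grouped_quantile` and its parameter names are our own; the inputs for the first quartile are the figures quoted in the example. Only the first quartile is computed, because the interval data needed for the third quartile is not given in the text.

```python
def grouped_quantile(lower_limit, width, cum_freq_before, freq, share, total=100.0):
    """Quantile of an interval series by linear interpolation.

    lower_limit      lower bound of the interval containing the quantile
    width            width of that interval
    cum_freq_before  cumulative frequency of the preceding interval
    freq             frequency of the interval containing the quantile
    share            0.25 for Q1, 0.75 for Q3
    total            sum of all frequencies (100 when they are percentages)
    """
    if freq <= 0:
        raise ValueError("the interval frequency must be positive")
    return lower_limit + width * (share * total - cum_freq_before) / freq


# Inputs from the example: interval 7000-10,000 rubles, frequencies in percent.
q1 = grouped_quantile(
    lower_limit=7000, width=3000, cum_freq_before=13.4, freq=13.0, share=0.25
)
print(round(q1), "rubles")
```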

Pitfalls associated with descriptive statistics

In this note, we looked at how to describe a data set using various statistics that estimate its mean, scatter, and distribution. The next step is to analyze and interpret the data. So far, we have studied the objective properties of data, and now we turn to their subjective interpretation. Two mistakes lie in wait for the researcher: an incorrectly chosen subject of analysis and an incorrect interpretation of the results.

The analysis of the returns of 15 very high-risk mutual funds is fairly unbiased. It led to completely objective conclusions: all mutual funds have different returns, the fund returns range from -6.1 to 18.5, and the average return is 6.08. The objectivity of data analysis is ensured by the correct choice of summary measures of the distribution. Several methods for estimating the mean and spread of data were considered, and their advantages and disadvantages were indicated. How do you choose the statistics that provide an objective and unbiased analysis? If the data distribution is slightly skewed, should the median be chosen over the arithmetic mean? Which indicator characterizes the spread of data more accurately: the standard deviation or the range? Should the positive skewness of the distribution be indicated?

On the other hand, data interpretation is a subjective process. Different people come to different conclusions, interpreting the same results. Everyone has their own point of view. Someone considers the total average annual returns of 15 funds with a very high level of risk to be good and is quite satisfied with the income received. Others may think that these funds have too low returns. Thus, subjectivity should be compensated by honesty, neutrality and clarity of conclusions.

Ethical Issues

Data analysis is inextricably linked to ethical issues. One should be critical of the information disseminated by newspapers, radio, television and the Internet. Over time, you will learn to be skeptical not only about the results, but also about the goals, subject and objectivity of research. The famous British politician Benjamin Disraeli said it best: “There are three kinds of lies: lies, damned lies and statistics.”

As already noted, ethical issues arise when choosing which results to present in a report. Both positive and negative results should be published. In addition, in an oral or written report the results must be presented honestly, neutrally and objectively. A bad presentation should be distinguished from a dishonest one. To do this, it is necessary to determine what the intentions of the speaker were. Sometimes the speaker omits important information out of ignorance, and sometimes deliberately (for example, by using the arithmetic mean to estimate the center of clearly skewed data in order to get the desired result). It is also dishonest to suppress results that do not correspond to the point of view of the researcher.

Based on materials from the book: Levine et al. Statistics for Managers. Moscow: Williams, 2004, pp. 178–209.

The QUARTILE function is retained for compatibility with earlier versions of Excel.

