Section A.4 Intuition for Average and SD
A data list is a sequence of numbers, called the entries of the list. For example, the sequence \(X=1,2,2,7\) is a data list with four entries. The sum of a list is the total of all the entries. The average of a list is the sum of the list divided by the number of entries. For a given number \(a\text{,}\) the deviations of a list \(X\) from \(a\) are the list obtained by subtracting \(a\) from each entry of \(X\text{.}\) We write the sum, average, and deviations of \(X=1,2,2,7\) like this.
\begin{align*}
\SUM(X)\amp = 1+2+2+7=12\\
\AVE(X) \amp =12/4 = 3 \\
(\text{deviations of } X \text{ from } 2) \amp = -1,0,0,5 \\
(\text{deviations of } X \text{ from } 3) \amp = -2,-1,-1,4
\end{align*}
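The same calculations can be sketched in a few lines of Python. This is only an illustration: the function names `list_sum`, `average`, and `deviations` are made up for this example and are not notation used elsewhere in the text.

```python
# Hypothetical helper names, chosen only for this illustration.

def list_sum(xs):
    """Total of all the entries in the list."""
    return sum(xs)

def average(xs):
    """Sum of the list divided by the number of entries."""
    return sum(xs) / len(xs)

def deviations(xs, a):
    """The list xs with a subtracted from each entry."""
    return [x - a for x in xs]

X = [1, 2, 2, 7]
print(list_sum(X))       # 12
print(average(X))        # 3.0
print(deviations(X, 2))  # [-1, 0, 0, 5]
print(deviations(X, 3))  # [-2, -1, -1, 4]
```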
Here are two useful ways to visualize a data list \(X\text{.}\)
- A box model for \(X\) is a box that contains tickets with the entries of the list printed on the tickets. We imagine a lottery game, taking random draws with replacement from the box: each drawn ticket is put back before the next draw, so every ticket in the box has the same chance of being selected on every draw.
- A dot plot for \(X\) is a number line with dots placed at locations given by the entries in the list. We imagine that the number line is a see-saw and that the dots are small, identical weights stuck to the see-saw. Physics says that the see-saw is balanced on a fulcrum at the point \(a\) if the sum of the deviations of \(X\) from \(a\) is zero.
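Both pictures are easy to try out on a computer. The sketch below assumes Python's standard `random` module as a stand-in for drawing tickets. The helper names `draw_from_box` and `balances_at`, the seed, and the example list `Y` are all invented for this illustration.

```python
import random

def draw_from_box(box, n, rng):
    """Return n random draws with replacement from the box.

    rng.choice picks one ticket with equal chance and leaves the box
    unchanged, which is what "with replacement" means.
    """
    return [rng.choice(box) for _ in range(n)]

def balances_at(box, a):
    """True when the deviations of the list from a sum to zero."""
    return sum(x - a for x in box) == 0

X = [1, 2, 2, 7]
rng = random.Random(0)  # fixed seed (an arbitrary choice) so runs repeat
print(draw_from_box(X, 10, rng))

# A different, made-up list: the deviations of 0, 4, 5 from 3 are -3, 1, 2.
Y = [0, 4, 5]
print(balances_at(Y, 3))  # True
print(balances_at(Y, 2))  # False
```

The balance check compares with zero exactly, which is fine for whole numbers. For lists with decimal entries, a small tolerance would be a safer choice.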
Problems: Let \(X\) be the data list \(X=1,2,2,7\text{,}\) and suppose we take \(100\) random draws with replacement from the box model for \(X\text{.}\)
- Among the \(100\) random draws, what is the expected number of \(1\)’s that will be drawn? The expected number of \(2\)’s? The expected number of \(7\)’s?
- What is the expected sum total of all \(100\) draws?
- Suppose you win the amount of money, in dollars, that is equal to the sum total of the \(100\) random draws. What is the amount of your per-game winning? (Your per-game winning is the amount \(A\) such that winning \(A\) dollars \(100\) times in a row would give you the same total as the \(100\) draws from the box model for \(X\text{.}\))
- Does the see-saw for the dot plot of \(X\) balance at the point \(2\text{?}\) At the point \(4\text{?}\) At the point \(3\text{?}\)
- Suppose that, for every one of the \(100\) draws, you have to guess a number before the draw (your guess does not have to be one of the values in the list \(X\)). After the number is drawn, you have to pay the square of the amount by which your guess is off, in dollars. For example, if you guess \(4.5\) and then a \(2\) is drawn, you have to pay \(2.5^2= \$6.25\text{.}\) It is not obvious, but it turns out that the best strategy for keeping your expected total cost as low as possible is to guess the average value of \(X\) every time. Try it out! Play the game \(10\) times, say, guessing \(2\) every time, and keep track of the total cost. Play another \(10\) times, guessing however you like. Then play \(10\) times, guessing the average of \(X\) every time. Now compare the total costs. Which one is best?
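If you would rather let a computer play the guessing game, here is a minimal sketch, assuming Python's standard `random` module for the draws. The function name `total_cost`, the seed, and the particular guesses are choices made for this illustration only.

```python
import random

def total_cost(draws, guess):
    """Total squared-error cost of making the same guess on every draw."""
    return sum((d - guess) ** 2 for d in draws)

X = [1, 2, 2, 7]
rng = random.Random(2024)                   # arbitrary seed so runs repeat
draws = [rng.choice(X) for _ in range(10)]  # 10 draws with replacement

# Score several fixed guesses against the same 10 draws.
for guess in (2, 4.5, sum(X) / len(X)):
    print(guess, total_cost(draws, guess))

# The single-draw example from the text: guess 4.5, draw a 2.
print(total_cost([2], 4.5))  # 6.25
```

Scoring every guess against the same draws keeps the comparison fair. With only \(10\) draws, luck still matters, so try changing the seed or raising the number of draws to see how the comparison settles down.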