Tuesday, March 30, 2010

Precision by Averaging

Peter Newman points me to this claim by Chiefio that it is impossible to obtain 0.01°C precision by taking the average of measurements that are rounded to the nearest 1°C. Actually, he talks about °F, but it's the same argument. Chiefio appears to be overlooking two points. First, a thermometer that rounds to the nearest 1°C correctly must be able to distinguish between 1.49999°C and 1.50001°C. The perfectly-rounding thermometer is a perfectly-accurate thermometer. Second, every measurement has noise, and noise has some marvelous properties when it comes to taking averages.

Suppose we have an insulated box whose temperature is exactly constant at 20.23°C. We measure its temperature with a thermometer that is perfectly accurate but rounds to the nearest 1°C. So, we get a measurement of 20°C for the box, and we're wrong by 0.23°C. No matter how many perfectly-accurate thermometers we place in the box, we will always get a measurement of 20°C, and we will always be off by 0.23°C.

But thermometers are not perfect. Let's suppose we have a factory that produces thermometers that are on average perfectly accurate, but they each have an offset that is evenly-distributed in the range ±1°C. So far as we are concerned, the thermometer detects the temperature, adds the offset, and rounds to the nearest °C. To obtain the correct temperature, we subtract the offset from the thermometer reading. Half the thermometers have offset >0°C, a quarter have offset >0.5°C, and none have an offset >1°C.

Now we put 100 of these thermometers in our box and take the average of their measurements. Any thermometer with offset <0.27°C and >−0.73°C will give us a measurement of 20°C. Any thermometer with offset >0.27°C will give us a measurement of 21°C. And those with offset <−0.73°C will say 19°C. Applying our even distribution of offsets, we see that we have 13.5% saying 19°C, 50% saying 20°C, and 36.5% saying 21°C. So the average temperature of a large number of thermometers will be 0.135×19 + 0.5×20 + 0.365×21 = 20.23 °C, which is exactly correct.
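The arithmetic above can be checked with a short simulation. What follows is only a minimal sketch: it assumes offsets drawn uniformly from ±1°C and rounding of exact halves upward, and the function name `average_reading` and the random seeds are illustrative choices of mine, not part of the argument.

```python
import math
import random

TRUE_TEMP = 20.23  # exact box temperature in °C, from the example above


def average_reading(n_thermometers, rng):
    """Average the rounded readings of thermometers with uniform offsets."""
    total = 0
    for _ in range(n_thermometers):
        # Permanent offset, evenly distributed in the range ±1°C (assumed model).
        offset = rng.uniform(-1.0, 1.0)
        # The thermometer adds its offset, then rounds to the nearest 1°C.
        total += math.floor(TRUE_TEMP + offset + 0.5)
    return total / n_thermometers


# Expected average from the three possible readings and their probabilities.
expected = 0.135 * 19 + 0.5 * 20 + 0.365 * 21
print(round(expected, 2))  # 20.23

rng = random.Random(1)  # fixed seed so the run is repeatable
print(average_reading(100, rng))      # scatters around 20.23
print(average_reading(200_000, rng))  # lands much closer to 20.23
```

With only 100 thermometers the average still wanders by several hundredths of a degree from one batch to the next; the agreement with 20.23°C tightens as the number of thermometers grows.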

In addition to permanent offsets, thermometers are subject to random errors from one measurement to the next. By taking many measurements with the same thermometer, we can obtain more precision. The underlying physical quantity tends to vary too, with short-term random fluctuations. These too we can overcome with many measurements, in many places, and at many times. The rounding error of a thermometer ends up being one of many sources of error.
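As a rough illustration of how measurement-to-measurement noise helps, the sketch below takes repeated rounded readings of a constant 20.23°C with a single thermometer that has no permanent offset. The gaussian noise with standard deviation 0.5°C is an assumption chosen for illustration, as is the helper name `mean_of_rounded`.

```python
import math
import random

TRUE_TEMP = 20.23  # constant temperature in °C, as in the insulated box


def mean_of_rounded(n_readings, noise_sd, rng):
    """Average n readings, each with fresh random noise, rounded to 1°C."""
    total = 0
    for _ in range(n_readings):
        # Assumed noise model: gaussian, independent from reading to reading.
        noisy = TRUE_TEMP + rng.gauss(0.0, noise_sd)
        total += math.floor(noisy + 0.5)  # round to the nearest 1°C
    return total / n_readings


rng = random.Random(7)  # fixed seed so the run is repeatable
for n in (1, 100, 10_000, 200_000):
    avg = mean_of_rounded(n, 0.5, rng)
    print(n, avg, abs(avg - TRUE_TEMP))  # the error tends to shrink as n grows

# With no noise at all, every reading is 20°C and averaging gains nothing.
print(mean_of_rounded(1000, 0.0, rng))  # 20.0
```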

In my field, we deal with rounding error all the time. We call it quantization noise. If we round to the nearest 1°C, the quantization noise lies in the range ±0.5°C, is evenly-distributed, and has standard deviation 1/√12 = 0.29 °C.
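For readers who want to see where the 1/√12 comes from, here is the standard calculation for an error e evenly distributed over one rounding step. The symbols q (the step size, 1°C here) and σ (the standard deviation) are notation introduced for this sketch.

```latex
\sigma^2 = \int_{-q/2}^{q/2} e^2 \, \frac{1}{q} \, de
         = \frac{1}{q} \left[ \frac{e^3}{3} \right]_{-q/2}^{q/2}
         = \frac{q^2}{12},
\qquad
\sigma = \frac{q}{\sqrt{12}} \approx 0.29\,q
```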

Our many evenly-distributed errors end up being added together. A consequence of the Central Limit Theorem is that the sum of many evenly-distributed errors will look like a gaussian distribution. So we tend to think of each measurement as being the correct physical value plus a random error with gaussian distribution. The arguments we have presented above will still work when applied to gaussian errors, because the gaussian distribution is symmetric.
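The drift toward a gaussian shape is easy to see numerically. The sketch below adds together twelve evenly-distributed errors, each in the range ±0.5, so the sum has standard deviation 1, and counts how often the sum falls within one and two standard deviations. The choice of twelve errors, the sample count, and the name `sum_of_uniform_errors` are assumptions made for illustration.

```python
import random


def sum_of_uniform_errors(n_errors, rng):
    """Sum n independent errors, each evenly distributed in ±0.5."""
    return sum(rng.uniform(-0.5, 0.5) for _ in range(n_errors))


rng = random.Random(3)  # fixed seed so the run is repeatable

# Each error has variance 1/12, so the sum of 12 has standard deviation 1.
samples = [sum_of_uniform_errors(12, rng) for _ in range(100_000)]

within_1 = sum(abs(s) < 1.0 for s in samples) / len(samples)
within_2 = sum(abs(s) < 2.0 for s in samples) / len(samples)

print(within_1)  # a gaussian would give about 0.683
print(within_2)  # a gaussian would give about 0.954
```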

So, despite Chiefio's skepticism, we can indeed obtain an exact measurement from a large number of thermometers that each round to the nearest °C.


  1. Dear Kevan,
    If by "Now we put 100 of these thermometers in our box and take the average of their measurements" that is what Chiefio refers to as "oversampling" then he agrees that you can get a greater accuracy than your actual accuracy. But to quote him directly "that requires measuring the same thing repeatedly. We don’t measure each day/location repeatedly. We measure it once. Then NOAA averages those data points (min / max) to get a daily average. Exactly two items are averaged. Then they take those daily averages for the month and average them to get a monthly average mean of the daily means. At most 31 items are averaged."
    Or am I on the wrong track entirely with this?
    Cheers, Peter

  2. He says they take the average of min and max, then of the 31 averages of these min and max. That's about 60 readings being averaged. It does not matter whether you take the average for each day then average the days, or if you take the average minimum and the average maximum and then take the average of those two, or any other combination. After a month, you have taken the average of up to 62 measurements.

    Suppose this thermometer has a permanent offset. No problem: you look at the trend in the measurements made by this thermometer and the offset drops out of the trend.

    Suppose your thermometer and the local climate have a random variability on top of the month-to-month trend (should such a trend exist). Call this the stochastic error, which means random from one measurement to the next. Taking the average of 60 measurements gives us a √60 times improvement in precision. So if the individual measurements were rounded to 1°C, the monthly average is good to 1/√12/√60 = 0.03°C (where the square root of 12 is from a characteristic of the rounding error distribution, see the post above).

  3. Dear Kevan,
    “After a month, you have taken the average of up to 62 measurements.” Of 31 discrete states, given that a state is the average temperature for a particular day. Is that the same as saying that after a month we have taken the average of 31 averages? So the monthly average can be more accurate than any of the daily averages just by adding the daily averages together and dividing by 31? That may not be the point Chiefio is disputing but only my interpretation of it.
    E.g. in the 31 day example below, the monthly average that NOAA calculates (based on whole numbers) is not as accurate to 2 decimal places as it would be without the rounding.
    Cheers, Peter

    Actual Max  Actual Min  Read Max  Read Min  Average  Real Average
    31.5        15.1        32        15        24       23.30
    32.1        15.8        32        16        24       23.96
    32.5        15.7        33        16        25       24.11
    32.0        15.4        32        15        24       23.69
    32.5        15.5        33        16        25       23.99
    32.5        15.7        33        16        25       24.09
    31.6        16.1        32        16        24       23.84
    31.5        15.6        32        16        24       23.54
    32.3        15.9        32        16        24       24.10
    31.7        15.1        32        15        24       23.39
    32.4        15.6        32        16        24       24.02
    32.4        15.5        32        16        24       23.94
    31.8        15.5        32        16        24       23.67
    31.8        15.4        32        15        24       23.61
    31.9        15.7        32        16        24       23.81
    32.4        15.4        32        15        24       23.91
    31.6        15.2        32        15        24       23.43
    31.6        15.8        32        16        24       23.70
    32.1        15.2        32        15        24       23.65
    32.5        15.9        33        16        25       24.16
    31.8        15.8        32        16        24       23.80
    31.5        15.7        32        16        24       23.59
    31.7        15.3        32        15        24       23.49
    31.7        15.2        32        15        24       23.47
    31.5        15.6        32        16        24       23.56
    32.4        15.5        32        16        24       23.92
    31.8        15.5        32        16        24       23.65
    31.5        15.6        32        16        24       23.55
    31.5        15.2        32        15        24       23.37
    32.1        15.5        32        16        24       23.78
    31.9        15.8        32        16        24       23.84

    Average NOAA gives GISS 24.12

    Real average 23.74

  4. Nice work, Mr. Newman. Thanks for these enlightening data points. First, let me say that in my previous reply, I rounded the expected error incorrectly: it's 0.04°C, not 0.03°C.

    My calculation assumes that the rounding error is random from one measurement to the next, which requires that the day-to-day fluctuations in max and min temperature are larger than the rounding error. If there is no change in the temperature, we have the situation I described with the insulated box: we repeat the same rounding error over and over again, so we get nothing from taking averages.

    Let's look at the data you provided. The fluctuations from day to day are small: standard deviation is 0.38°C for your max and 0.25°C for your min. The standard deviation of a 1°C rounding error is 1/√12 = 0.29°C. In the case of your data, we do not gain much by taking averages.

    But the standard deviation of daily maximum temperature is higher than 0.38°C in my experience. In Boston last month, the daily max temperature varied from 0°C to 24°C, with standard deviation of around 6°C, which is over ten times higher than in your data set.

    Suppose we add to your data random noise evenly distributed in the range ±6°C. I did this in Excel. With your original data, the average max is 31.9°C without rounding and 32.1°C with rounding. But when we add the random noise, the average max is (in one particular calculation) 31.78°C without rounding and 31.81°C with rounding.

    Without noise, rounding causes an error of 0.2°C (32.1°C against 31.9°C). With noise, rounding causes an error of 0.03°C. Note that the addition of noise does change the average. In this particular calculation, the average max went from 31.9°C to 31.8°C. But overall, we are better off as a result of adding noise.

    In the old days of electrical engineering, we had fast analog-to-digital converters that produced a number between 0 and 255, for a precision of 0.4%. In order to obtain more precision when measuring a fixed voltage, we sometimes added white noise to the fixed voltage, measured the result 1000 times, and took the average in a computer. This average gave us a precision of 0.3%/√1000 = 0.01%.

    Pretty strange, isn't it? You can see why Chiefio can't believe it's true, because it defies one's intuition at first.

  5. Paint me sceptical, April 3, 2010 at 6:56 PM

    The fundamental problem is the assumption of independence. The statement of the central limit theorem usually mentions it as a condition.

    In general, the variance of an average of a set of measurements is the average over all elements of the covariance matrix. So if the measurement errors are independent, the matrix is diagonal, and its sum increases in proportion to n while the number of elements increases with n^2, so the variance of the average falls as 1/n.

    But if the error means are never quite zero, if the off-diagonal elements never vanish, no matter how far you get from the diagonal, the tiny residual correlation between any pair of errors breaks this convergence, and there is a fundamental accuracy limit below which you cannot go. You cannot even in principle, for example, resolve individual atoms by averaging a sufficiently huge number of eyeball observations.

    Adding noise prior to applying the non-linear quantisation breaks the statistical dependence between strongly correlated errors. It reduces the off-diagonal elements. But it can never eliminate them entirely, and taking that 0.3%/√1000 estimate as gospel is risky.

    Chiefio isn't quite right, but you can't extrapolate the magic properties of noise and averaging too far either. Sometimes, the information simply isn't in the data.
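
The covariance argument in the last comment can be put into numbers. The sketch below is a simplified model, not anything taken from the thread: it assumes every error has the same standard deviation `sigma` and every pair of errors shares the same small correlation `rho`, and the function name `variance_of_average` is hypothetical. Under those assumptions the average of the covariance matrix has a floor of rho × sigma², no matter how many measurements are averaged.

```python
def variance_of_average(n, sigma, rho):
    """Variance of the mean of n errors with equal pairwise correlation rho.

    The covariance matrix has sigma**2 on its n diagonal elements and
    rho * sigma**2 on its n*n - n off-diagonal elements; the variance of
    the average is the average over all n*n elements.
    """
    diagonal_sum = n * sigma**2
    off_diagonal_sum = (n * n - n) * rho * sigma**2
    return (diagonal_sum + off_diagonal_sum) / (n * n)


sigma = 0.29  # standard deviation of a 1°C rounding error, in °C
for n in (1, 100, 10_000, 1_000_000):
    independent = variance_of_average(n, sigma, 0.0) ** 0.5
    correlated = variance_of_average(n, sigma, 0.001) ** 0.5  # assumed rho
    # Independent errors keep improving as 1/sqrt(n); correlated ones level off.
    print(n, independent, correlated)
```

With independent errors the standard deviation of the average keeps falling as 1/√n, while with the assumed correlation of 0.001 it levels off near √0.001 × 0.29°C, or about 0.009°C.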