How To Lie With Statistics
2025-02-22
With a small sample size, any kind of result is possible. A biased sample is one that is not representative of the whole population, and it can lead to errors in reporting. Samples become unrepresentative when the selection is biased or when the sample is simply too small.
Bias due to self-selection: a Gallup poll reported that 33 per cent of men and women had never heard of the metric system. A newspaper conducted its own poll and reported that 98 per cent of its readers knew about the metric system. How can the two results differ so much? The Gallup poll surveyed a carefully selected cross-section of the public. The newspaper drew its results from coupons clipped, filled in and mailed back by readers. Readers who did not know about the metric system did not bother to fill in the survey and selected themselves out of the poll, producing a biased, unrepresentative sample.
A survey of magazine readership found that Harper's is more widely read than True Story, yet the publishers' circulation figures say the opposite. Why is that? People lied in the survey: Harper's is a high-brow magazine while True Story is low-brow, and respondents reported what made them look good.
In a fact-minded culture, statistics is employed to sensationalize, inflate, confuse and oversimplify.
The source of bias can be invisible - always allow yourself some degree of skepticism about results.
A report based on a sampling must use a representative sample, which is one from which every source of bias has been removed. A sample is truly random when every name or thing in the whole group has an equal chance of being in the sample. But a purely random sample is difficult and expensive to obtain.
In a survey, the sample of the population can be far from random (first level). The questions in the questionnaire are themselves a sample of all possible questions (second level). The answer a respondent gives is also a sample of his or her attitude and experience on that question (third level).
Next time you read that the average man brushes his teeth 1.02 times a day, ask yourself: how could anyone have found such a thing out? What was the sample for the data? Does the sample have bias in it?
The word average can disguise any of three things - mean, median and mode. All three averages coincide when the data follow a normal distribution.
When reading an average-based statistic, like the claim that the average American family income was $6,940 in some specified year, ask what kind of average it is - mean, median or mode - what "family" means, who says so, how he knows and how accurate the figure is.
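To make the point concrete, here is a minimal sketch in Python using made-up income figures (not the book's data): on skewed data the three averages pull apart, while on roughly symmetric data they coincide.

```python
from statistics import mean, median, mode

# Income-like, skewed data: one large value drags the mean upward.
incomes = [2000, 3000, 3000, 3500, 4000, 5000, 30000]
print(mean(incomes))    # ~7214 - inflated by the single high earner
print(median(incomes))  # 3500  - the middle family
print(mode(incomes))    # 3000  - the most common figure

# Roughly symmetric data: all three averages agree.
symmetric = [4, 5, 5, 6, 5, 5, 4, 6, 5]
print(mean(symmetric), median(symmetric), mode(symmetric))  # 5 5 5
```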
When told the results of a survey, ask how large the sample was and how it was chosen, because a well-biased sample can be used to produce almost any result anyone may wish.
With a large group, any difference produced by chance is likely to be a small one. But with a small group, any kind of result can be produced by chance. Take a coin and toss it ten times - you might get heads eight times, which "proves" that heads come up 80 per cent of the time. But toss it enough times and heads will come up close to 50 per cent of the time - a result that represents the real probability. Only when there is a substantial number of trials does the law of averages give a useful description or prediction.
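A quick simulation sketch of that point, using Python's standard random module: in ten tosses almost anything can happen, while in a hundred thousand the proportion of heads settles near one half.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

def heads_fraction(n_tosses):
    """Fraction of heads in n_tosses simulated fair-coin flips."""
    return sum(random.random() < 0.5 for _ in range(n_tosses)) / n_tosses

for n in (10, 100, 100_000):
    print(n, heads_fraction(n))
# Small runs can easily land on 0.2 or 0.8; the large run stays close to 0.5.
```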
Degree of significance is a figure that represents whether a result is a real result rather than something produced by chance. It is expressed as a probability. When it is said that there are nineteen chances out of twenty that the figures have a specified degree of precision, it means there is less than a five per cent probability that the result was produced by chance. For most purposes, nothing weaker than the five per cent level of significance is good enough.
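Tying this back to the coin example, here is a small sketch of the arithmetic behind such a check: the chance of a fair coin giving at least eight heads in ten tosses.

```python
from math import comb

def p_at_least(k, n, p=0.5):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(p_at_least(8, 10))  # ~0.055: just above the five per cent level,
                          # so "80 per cent heads" is not a significant result.
```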
Along with an average, there should also be a range. The range represents the spread in the results. Think of sampling results in ranges - every sampling result has an associated error representing how accurate the measurement is. You might take 61 degrees as a comfortable annual mean temperature in California, but you can freeze or roast if you ignore the range. For San Nicolas it is 47 to 87 degrees, but for the desert it is 15 to 104.
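The figures below are illustrative, chosen only to match the mean and ranges quoted above rather than taken from the book, but they show how the same mean can hide very different ranges.

```python
# Two sets of temperature readings with the same mean, 61 degrees.
coastal = [47, 52, 55, 58, 60, 62, 64, 66, 59, 87]
desert  = [15, 30, 48, 55, 61, 66, 75, 80, 76, 104]

for name, temps in (("coastal", coastal), ("desert", desert)):
    print(name, sum(temps) / len(temps), (min(temps), max(temps)))
# coastal 61.0 (47, 87)
# desert  61.0 (15, 104)
```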
Take this sentence from a magazine: a new cold temper bath which triples the hardness of steel, from Westinghouse.
It looks like quite a development until you try to put your finger on what it really means. Does the new bath make any kind of steel three times as hard as it was before treatment? Or does it produce a steel three times as hard as any previous steel? Or what does it do?
When words won't do the work and numbers in tabular form cannot convince, draw a picture. Graphs are often drawn out of proportion to exaggerate results. The exaggeration is done by stretching or compressing the scale, or by chopping off part of the range, on the graph's axes.
A chart with a picture of a man representing a million men, or a moneybag representing a million dollars, is known as a pictorial graph. Pictorial graphs have eye appeal, but they are also capable of becoming fluent, devious and successful liars. They dramatize differences to suit the writer's argument and create exaggerated impressions in the reader's mind, sensationalizing the facts to make a better story.
A bar chart is capable of deceit too. Look with suspicion on any version in which the bars change their widths as well as their lengths while representing a single factor. When looking at graphs, look for the baseline.
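A minimal sketch of the baseline trick, with hypothetical sales figures and matplotlib (assumed to be available): the same data looks flat or dramatic depending on where the y-axis starts.

```python
import matplotlib.pyplot as plt

years = ["2021", "2022", "2023"]   # hypothetical data
sales = [20.0, 20.4, 20.8]         # a modest 4 per cent rise overall

fig, (honest, chopped) = plt.subplots(1, 2, figsize=(8, 3))

honest.bar(years, sales)
honest.set_ylim(0, 25)             # baseline at zero: bars look nearly equal
honest.set_title("Baseline at zero")

chopped.bar(years, sales)
chopped.set_ylim(19.8, 21.0)       # chopped baseline: the same rise looks huge
chopped.set_title("Chopped baseline")

plt.tight_layout()
plt.show()
```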
Semi-attached figures are numbers which do not give complete information. A report says 27 per cent of doctors smoke Throaties, more than any other brand. So what? Do the doctors know more about tobacco brands than you do? No. Do the doctors have any inside information that lets them smoke the least harmful cigarette? They don't. The 27 per cent is a semi-attached figure: it creates a false sense of information.
A juice extractor is said to extract 26% more juice. More than what? It turns out the comparison is with an old-fashioned hand reamer. Without that context, the 26% figure is meaningless.
More people were killed in aeroplane accidents last year than in 1910. Does that mean aeroplanes are more dangerous? No - it means far more people are flying today.
There are many ways of expressing any figure. A company reported net earnings of only 1.1 per cent of sales. The number sounds distressingly small. The catch is that annual return on investment is not the same as earnings on total sales. If I buy an article for 99 cents every morning and sell it each afternoon for one dollar, I make only 1 per cent on total sales but about 365 per cent on the invested money over the year. Magazines, corporations and advertisers choose whichever method of reporting best serves the purpose at hand.
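Here is the penny arithmetic spelled out; the exact figure on the 99 cents invested comes out slightly above the quoted 365 per cent.

```python
cost, price, days = 0.99, 1.00, 365

annual_sales  = price * days            # $365 of total sales
annual_profit = (price - cost) * days   # $3.65 of profit

print(annual_profit / annual_sales)     # 0.01  -> 1 per cent of total sales
print(annual_profit / cost)             # ~3.69 -> roughly the quoted 365 per
                                        #          cent on the money invested
```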
An association between two factors is not proof that one has caused the other. Sometimes a hidden third factor is driving both of the factors that correlate.
A statement of relationship should be put through sharp inspection. In a small sample, a substantial correlation can appear purely by chance. Two correlated factors might both be influenced by an unknown third factor.
Correlation can also be produced by a stream of events, the trends of the time. For example, the salaries of Presbyterian ministers in Massachusetts and the price of rum in Havana had a close relationship (both growing). Here the responsible third factor is the historic, world-wide rise in the price level of practically everything.
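A small simulation sketch of that kind of spurious correlation, with made-up numbers: two series that have nothing to do with each other correlate strongly once both are driven by the same rising price level.

```python
import random

random.seed(0)

years = range(30)
price_level = [100 * 1.03**t for t in years]   # a steady general rise in prices

# Two unrelated quantities, each mostly tracking the price level plus noise.
salaries  = [0.5 * p + random.gauss(0, 3) for p in price_level]
rum_price = [0.02 * p + random.gauss(0, 0.2) for p in price_level]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

print(pearson(salaries, rum_price))  # very high, yet neither causes the other
```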
Percentage and confusion of base: a Christmas sale says "save 100 per cent". It sounds like a great offer, but it rests on a confusion of base. The price reduction is only fifty per cent of the original price; the "saving" is computed from the new, reduced price. Similarly, to offset a pay cut of 50 per cent, you must get a raise of 100 per cent.
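A short sketch of the base-confusion arithmetic, using a $100 item and a $100 salary as stand-in figures:

```python
original_price, sale_price = 100.0, 50.0

print((original_price - sale_price) / original_price)  # 0.5 -> an honest 50% off
print((original_price - sale_price) / sale_price)      # 1.0 -> the advertised
                                                       #        "save 100 per cent"

pay = 100.0
after_cut = pay * 0.5                                  # a 50 per cent pay cut
print((pay - after_cut) / after_cut)                   # 1.0 -> a 100 per cent raise
                                                       #        is needed to recover
```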
How to prod statistical data for lies? Use these five questions:
- Who says so? The source might be a newspaper that wants a good story, or labour or management with a stake in wage levels. Look out for conscious bias: favourable results reported and unfavourable ones suppressed, one year compared against a conveniently chosen other year, a mean quoted where a median would be more informative. Look out for unconscious bias too: the survey may come from Cornell University, but the interpretation is that of the author writing the article. Check whether the university or the laboratory actually stands behind the interpretation.
- How does he know? Ask about the sample: is it large enough to support a valid conclusion, and is it free of bias? In the case of a correlation, is it big enough to mean anything?
- What's missing? A correlation without a measure of reliability (probable error, standard error), a mean without a range, or an average that does not say whether it is a mean, median or mode should not be taken seriously. Look out for semi-attached figures. When looking at percentages, look for the base from which they are computed.
- Did somebody change the subject? More reported cases of a disease are not always the same thing as more cases of the disease - it can also mean better diagnosis.
- Does it make sense? Many a statistic is false on its face. It gets by only because the magic of numbers brings about a suspension of common sense.