Association and causation are two terms in statistics that some people treat as synonyms. In statistics, however, they have different meanings. Both terms can be applied to the same subject, but what each one conveys depends on how the statistician uses it. Jerry Dallal of Tufts University (2007) traces the history of both terms in his article and states that an association between two variables does not by itself establish causation.
Two variables may be strongly associated without one causing the other. An association sometimes does reflect causation, but that depends on the case. An example of an association that does not imply causation is the increase in swimsuit sales every summer. Swimsuits are strongly associated with summer because that is when people can go to the beach and enjoy swimming. The association alone, however, does not tell us what causes people to buy a swimsuit.
Although there is a strong relationship between swimsuit sales and summer, this does not mean that the summer season is the reason people buy swimsuits. Another example, taken from Dallal's article, is that during World War II bombers were less accurate in clear weather. As stated in the article, bombing accuracy was associated with the weather. The explanation is that clear weather brought more enemy opposition and made the planes easier to see, and that opposition, not the clear sky itself, is why the bombers were less accurate.
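The bomber example can be illustrated with a small simulation. The sketch below is only an illustration under made-up assumptions: the function names (`pearson_r`, `partial_r`, `simulate_missions`) and every coefficient in the model are hypothetical and are not taken from Dallal's article. Accuracy is generated so that it depends only on enemy opposition, while opposition rises with clear weather.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def partial_r(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


def simulate_missions(n=5000, seed=0):
    """Simulate missions under an assumed (invented) causal model."""
    rng = random.Random(seed)
    clear, opposition, accuracy = [], [], []
    for _ in range(n):
        c = rng.random()                         # how clear the sky is, 0 to 1
        o = c + rng.gauss(0, 0.1)                # opposition rises with clear weather
        a = 1.0 - 0.8 * o + rng.gauss(0, 0.1)    # accuracy depends on opposition only
        clear.append(c)
        opposition.append(o)
        accuracy.append(a)
    return clear, opposition, accuracy


if __name__ == "__main__":
    clear, opposition, accuracy = simulate_missions()
    print("r(clear, accuracy):", pearson_r(clear, accuracy))
    print("partial r given opposition:", partial_r(clear, accuracy, opposition))
```

In this toy model, clear weather and accuracy are strongly negatively correlated even though weather never enters the accuracy equation. Once opposition is held fixed through the partial correlation, the association between weather and accuracy should shrink toward zero.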
Causation, on the other hand, means that there is a direct relationship between two variables: a change in one variable is attributable to a change in the other. Factors that influence how a relationship between two variables is interpreted include the strength of their correlation and the form of the relationship, for example whether it is linear. Some scientists have used regression as a basis for inferring causality. One example of a statistical study that has been misinterpreted for these reasons is by the statistician G. Udny Yule (1899).
Yule used a regression model to determine whether the change in pauperism was positively related to the change in the proportion of the poor treated outside the poor-houses. He reported that providing aid outside the poor-houses creates pauperism. However, David Freedman, another statistician, countered the claim, arguing that if administration were more efficient both outside and inside the poor-houses, there would be no association between pauperism and the way aid was provided to the poor.
In a 1965 article, Sir Austin Bradford Hill (childrens-mercy.org, 2007) proposed nine factors that may help establish causality: strength of association, consistency of results across studies, specificity, temporality, biological gradient, plausibility, coherence, experimental evidence, and analogy. These factors are still in use today. Every correlation coefficient also has an implicit confidence interval whose width depends on the sample size. Averaging correlations is of limited value if the coefficients are inconsistent with one another.
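To make the point about the confidence range concrete, one common way to attach an interval to a correlation is the Fisher z-transformation. The sketch below assumes that approach, which the article itself does not name, and assumes roughly bivariate normal data. The function name `correlation_ci` and the default critical value of 1.96 (an approximate 95% interval) are illustrative choices.

```python
import math


def correlation_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a correlation r from n pairs.

    Uses the Fisher z-transformation: atanh(r) is treated as roughly
    normal with standard error 1 / sqrt(n - 3).
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                 # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper


if __name__ == "__main__":
    for n in (10, 30, 100):
        print(n, correlation_ci(0.5, n))
```

Running the loop for the same r with increasing n shows the interval narrowing, which is the sense in which the confidence range is tied to the sample size.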
A test of homogeneity is needed to establish that the correlation coefficients are consistent. If the r’s prove to be consistent, their average can be taken as an estimate of the correlation in the groups or samples being studied. If the r’s are inconsistent, however, the result can only be described as an average. Statisticians must be aware of these facts so that they do not assume anything unwarranted about their data. Data can be misinterpreted when statisticians assume something that has not yet been proven.
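A test of homogeneity for several correlation coefficients can be sketched as follows. This is one standard version, assumed here because the article does not specify which test it has in mind: each r is converted with the Fisher z-transformation, weighted by n - 3, and the weighted squared deviations from the pooled value form a statistic Q that is compared with a chi-square distribution on k - 1 degrees of freedom. The function name `homogeneity_of_correlations` is hypothetical.

```python
import math


def homogeneity_of_correlations(rs, ns):
    """Return (Q, df, pooled_r) for k correlations rs with sample sizes ns.

    Q is approximately chi-square with df = k - 1 when all samples share
    one underlying correlation; a large Q suggests the r's are inconsistent.
    """
    if len(rs) != len(ns) or len(rs) < 2:
        raise ValueError("need at least two correlations with matching sizes")
    if any(n <= 3 for n in ns):
        raise ValueError("each sample needs more than 3 observations")
    zs = [math.atanh(r) for r in rs]          # Fisher z for each sample
    ws = [n - 3 for n in ns]                  # inverse-variance weights
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))
    df = len(rs) - 1
    pooled_r = math.tanh(z_bar)               # weighted average, back on the r scale
    return q, df, pooled_r


if __name__ == "__main__":
    q, df, pooled = homogeneity_of_correlations([0.1, 0.5, 0.8], [50, 50, 50])
    # 5.991 is the 5% critical value of chi-square with 2 degrees of freedom.
    print("Q =", q, "df =", df, "pooled r =", pooled, "reject:", q > 5.991)
```

If Q falls below the critical value, the pooled r can reasonably be read as an estimate of a common correlation. If Q exceeds it, the pooled value is only a descriptive average. A p-value can be obtained from a chi-square table or, for example, from `scipy.stats.chi2.sf(q, df)` if SciPy is available.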