What is meant by the phrase “Correlation does not imply causality”?

To describe what the phrase “correlation does not imply causality” means, we first need to define causation. Causation is the relationship between a cause and its effect: when one event brings about another, we call that causation. The phrase therefore means that seeing a mutual relationship between two variables does not necessarily mean that one causes the other. When someone reports that A and B are correlated, for example, it might mean that A causes B; that B causes A; that A and B are not causally related to each other but some other factor, E, causes them both; or that A and B are not causally related at all and the apparent correlation is a statistical coincidence. These four possibilities illustrate what the phrase means.
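The case in which a third factor E drives both A and B can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are synthetic, the model (E plus independent noise) and the helper `pearson_r` are invented for illustration, and E's effect is removed by simple subtraction because the simulation knows it exactly.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed toy model: E drives both A and B; A and B never influence each other.
e = [rng.gauss(0, 1) for _ in range(5000)]
a = [value + rng.gauss(0, 0.5) for value in e]
b = [value + rng.gauss(0, 0.5) for value in e]

r_ab = pearson_r(a, b)

# Subtract E's known contribution; only the independent noise remains.
a_without_e = [ai - ei for ai, ei in zip(a, e)]
b_without_e = [bi - ei for bi, ei in zip(b, e)]
r_after_removing_e = pearson_r(a_without_e, b_without_e)

print(f"r(A, B) = {r_ab:.2f}")
print(f"r(A, B) after removing E = {r_after_removing_e:.2f}")
```

In this toy setting A and B are strongly correlated even though neither causes the other, and the correlation largely disappears once E is accounted for.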

Compare correlation and regression to other types of “causal” statistical techniques

Correlation measures the closeness of the relationship between two or more variables without any knowledge of its functional form. Regression, on the other hand, describes the form of the relationship between an independent variable and a dependent variable. Intrinsically linear relationships, such as those modeled by polynomial, exponential, and logarithmic regression, can have low correlation coefficient values even when the coefficient of determination (the square of the multiple correlation coefficient) of the fitted regression function is high. Regression may involve many independent variables, including complex functions such as polynomial terms. Neither correlation nor regression establishes causation on its own. Other statistical techniques, such as causal inference, aim to determine whether a causal relationship underlies the correlation between two variables. They are mainly characterized by formulating a causal hypothesis, for example that A causes B, which is then tested, typically through randomization.
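The contrast between a low correlation coefficient and a high coefficient of determination can be sketched with a quadratic relationship. The example below is illustrative only: the data are synthetic, the coefficients 3 and 2 and the noise level are arbitrary assumptions, and the regression is fitted on the transformed predictor x², for which the R² of a simple linear regression equals the squared Pearson correlation.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(1)  # fixed seed so the run is repeatable

# Assumed toy relationship: y depends on x only through x squared.
x = list(range(-10, 11))
y = [3 + 2 * xi ** 2 + rng.gauss(0, 5) for xi in x]

# Linear correlation between x and y is close to zero by symmetry.
r_xy = pearson_r(x, y)

# Regressing y on the transformed predictor x**2 is a simple linear
# regression, so its coefficient of determination is r(x**2, y) squared.
x_squared = [xi ** 2 for xi in x]
r_squared = pearson_r(x_squared, y) ** 2

print(f"Pearson r between x and y: {r_xy:.2f}")
print(f"R^2 of y regressed on x^2: {r_squared:.2f}")
```

The near-zero r reflects the absence of a straight-line trend, not the absence of a relationship; the regression on x² captures almost all of the variation in y.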

Provide real-world examples where a correlation would be useful and where an association would be inadequate and inappropriate.

Correlation is useful in investigating the relationship between two quantitative variables. It quantifies the strength of the linear relationship between a pair of variables. In patients attending an accident and emergency (A&E) unit, for example, we can use correlation to determine whether there is a relationship between their age and their urea level. Correlation can show whether age and urea level tend to vary together, but not whether one affects the other. The most common error in interpreting the correlation coefficient is failing to consider the possibility that a third variable, related to both variables being investigated, is responsible for the observed association. It is necessary to look for other possible underlying variables and to check whether the relationship holds in different populations. The correlation coefficient may also fail to detect a non-linear relationship between variables, or may describe it inadequately. Finally, a data set may comprise distinct subgroups, such as males and females, which can produce clusters of points and lead to an exaggerated correlation coefficient.
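The subgroup problem can be sketched with two synthetic clusters. Everything here is an assumption made for illustration: the two groups are hypothetical, their centers (0 and 5) are arbitrary, and within each group the two variables are generated independently.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(2)  # fixed seed so the run is repeatable

def make_group(center, size=500):
    """Two independent variables that share only the group's center."""
    xs = [rng.gauss(center, 1) for _ in range(size)]
    ys = [rng.gauss(center, 1) for _ in range(size)]
    return xs, ys

# Hypothetical subgroups with different average levels on both variables.
group1_x, group1_y = make_group(0)
group2_x, group2_y = make_group(5)

r_group1 = pearson_r(group1_x, group1_y)
r_group2 = pearson_r(group2_x, group2_y)
r_pooled = pearson_r(group1_x + group2_x, group1_y + group2_y)

print(f"r within group 1: {r_group1:.2f}")
print(f"r within group 2: {r_group2:.2f}")
print(f"r with groups pooled: {r_pooled:.2f}")
```

Within each group the variables are essentially unrelated, yet pooling the groups yields a large coefficient that reflects only the difference between the clusters. Computing the coefficient separately for each subgroup is one way to check for this.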

A research article from a reputable conference proceeding that uses correlation in its research.


The study examined the correlation between Altmetric Attention Scores and citations for six PLOS journals. It considered all the articles published in six Public Library of Science (PLOS) journals in 2012 and the Web of Science citations for the same articles as of May 2015. A total of 2406 articles were analyzed to examine the relationships. The Altmetric Attention Score (AAS) for an article, provided by Altmetric, aggregates activity surrounding research outputs in social media. Spearman correlation tests were run on all articles and on the subset of articles with an AAS.
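Spearman correlation is the Pearson correlation computed on ranks, which makes it suitable for skewed counts such as attention scores and citations. The sketch below shows the idea on invented numbers: the values in `attention_scores` and `citations` are made up for illustration and are not taken from the study, and the helper functions are hypothetical names rather than any library's API.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Invented toy values (NOT data from the study): heavily skewed scores
# that rise together with citation counts, but not along a straight line.
attention_scores = [1, 3, 7, 20, 55, 150, 400, 1100]
citations = [2, 3, 5, 6, 9, 10, 14, 15]

rho = spearman_rho(attention_scores, citations)
r = pearson_r(attention_scores, citations)

print(f"Spearman rho: {rho:.2f}")
print(f"Pearson r:    {r:.2f}")
```

Because the toy scores and citations rise together in the same order, the rank-based coefficient is at its maximum while Pearson's r is lower, since the relationship is monotonic but not linear.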

Further analysis compared stratified datasets based on percentile ranks of AAS, that is, the top 50%, 25%, 10%, and 1%. Comparisons across the six journals provided additional insights. There was a significant positive correlation between AAS and citations, although its strength varied. Four of the PLOS journals (Genetics, Pathogens, Computational Biology, and Neglected Tropical Diseases) showed significant positive correlations across all datasets. PLOS Biology and PLOS Medicine, which have high impact factors, produced unexpected results. The Medicine articles showed no significant associations, while the Biology articles tested positive for correlation in the whole dataset and in the set with AAS. Correlation was useful in answering the study questions, and the study draws a definite conclusion based on its findings. Although the results indicate that a higher AAS is likely associated with more citations, such correlations cannot establish a causal relationship. If AAS is used as a valid indicator, it must be differentiated from citation counts. Further studies are needed to explore methods for evaluating different weighting schemes, and they should include social media sources that have potential impact but are not included in Altmetric’s scoring system. Once a rigorous AAS tool is established and adopted, the remaining challenge will be integrating it with traditional metrics such as citations to construct a comprehensive measurement tool for evaluating research impact.

