I know getting all excited about statistics is one of the nerdiest things a person can do. Without a solid statistical foundation, however, one can never really understand the results from any major research work. So I hope to give everyone a simple overview of what some of the common statistical methods are, and how to think about them.

As a psychologist, one looks for patterns of human behavior. Even with the most rigid theories, we still see cases where, despite all logic and reason, people act in strange ways. Let’s look at another scientific investigation first. A physicist is sitting around trying to discern the exact amount of force one would need to exert to always flip heads or tails. The scientist finds out that it is not simply a matter of force, but also of air pressure and humidity. After constructing a coin-flipping device and creating a climate-controlled environment, the scientist has controlled all of the factors that could influence the outcome. Thus he finds that his equation for coin flipping works: when air pressure is X and humidity is Y, force needs to be Z for the coin to land tails.

Psychology, on the other hand, rarely gets that level of control over its participants. We cannot, for example, ensure that everyone has the same background. Sometimes we can find people who are genetically identical, but they always have different personal histories. All of these minor, uncontrolled differences result in unexplained variance in our models. This is why people will often critique psychological experiments as being only “quasi-experimental.” They are correct: that is the term used to describe psychological experiments in most cases (outside of a handful of neurology studies). However, we always keep our findings in check. The way we keep our findings from being overstated is through peer review and solid theoretical foundations for our claims.

So a final note to keep in mind: as models for predicting human behavior become more complex, they tend to explain more of the variance that occurs. Imagine that I create a model for predicting which soda someone will buy at a vending machine. One model only looks at male versus female, and perhaps it finds a weak relationship with drink preference. For my next model, I look at age, sex, overall health, education level, political preference, Big 5 personality factors, etc. This model is trying to account for more of the variance between subjects, and it may be more successful, but this is just speculation. Before we get to more complex modeling (regressions), we have to start at the most basic: the correlation!

A correlation simply shows the relationship between two factors. It does not say how or why the relationship exists. It does not show which one causes which. I’m sure you’ve heard this one a few thousand times: “correlation does not imply causation.” This is a true point, but there are times when theory lets you say more than the numbers do. For example, if I run a correlation between shoe size and IQ, we would see that as shoe size increases, one’s IQ does as well. This is obviously because our shoe sizes are larger as we get older, and there is nothing more to it. It is implausible to argue that IQ is causing a growth in shoe size, but the correlation by itself can neither support nor refute that claim. So it is important to have a theoretical reasoning for causation, not just a mathematical one. Correlation may not be causation, but the theory around it may lead you to causation in the end. Just keep it in mind.
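To make the shoe size example a bit more concrete, here is a small simulation sketch. Everything in it is made up: the age range, the two formulas, the noise levels, and the use of a raw test score as a stand-in for IQ are all my own assumptions, chosen only so that age drives both variables while the two never influence each other directly.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical children aged 5 to 15. Age drives BOTH variables;
# shoe size and test score never touch each other in these formulas.
ages = [rng.randint(5, 15) for _ in range(2000)]
shoe_size = [0.5 * age + rng.gauss(0, 1) for age in ages]
test_score = [3.0 * age + rng.gauss(0, 5) for age in ages]

# Across all ages, the two variables look strongly related.
r_overall = pearson_r(shoe_size, test_score)

# Among children of the same age, the relationship should mostly vanish.
same_age = [(s, t) for a, s, t in zip(ages, shoe_size, test_score) if a == 10]
r_same_age = pearson_r([s for s, _ in same_age], [t for _, t in same_age])

print(f"r across all ages:  {r_overall:.2f}")
print(f"r among 10-year-olds: {r_same_age:.2f}")
```

The correlation is real in both cases; it is the theory (kids grow, and kids learn) that tells us where it comes from.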

So how does this method work? There are two main types of correlation that you will hear about: positive and negative. A **positive** correlation means that as variable one increases, variable two tends to increase as well. Here’s a graphical example.

As we can see, there is a **positive** correlation between the number of hours spent reading and the number of books one owns. This is made-up data, of course, just to be clear. However, I wanted to give you an example of an ambiguous situation. We cannot actually discern from my graph whether reading more hours causes one to own more books, or whether having more books around leads one to read more often. I’m not worried about the truth of the matter; I just wanted to give you an example of the dangers of drawing conclusions from a graph. Next is a **negative** correlation.

I’m just joking around with my roleplaying friends, haha! As you can see, the number of friends one has decreases as the number of hours spent playing roleplaying games increases. Any time one factor rises as another factor falls (or vice versa), you have a **negative** correlation. Once again, we cannot discern which causes which with this graph. Does spending hours and hours a week playing a roleplaying game lead you to be socially isolated, or does being socially isolated lead you to spend your time roleplaying? We cannot draw a conclusion from the correlation alone.

What happens when the relationship is not so perfect? As the graphs above have shown, every change in one value came with a matching change in the other. This means they were “perfect correlations,” where every point falls exactly on a straight line. Correlations are judged by their strength in most cases. So if you have a correlation with the strength of +1.0, every change in factor one comes with an exactly proportional change in factor two in the same direction. If your strength is -1.0, every increase in one factor comes with an exactly proportional decrease in the second. A correlation is stronger as its absolute value approaches 1.0. Thus +1.0 and -1.0 are both perfectly strong correlations. Between -0.83 and +0.77, -0.83 is the stronger correlation. Further, no correlation can go above +1 or below -1. So let’s see what a non-perfect correlation looks like.
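If you would like to see where these numbers come from, here is a minimal sketch of the usual Pearson correlation calculation in plain Python. The function name `pearson_r` and the tiny data sets are my own inventions for illustration; they are not the data behind the graphs in this post.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r: co-variation of x and y, scaled by the spread of each."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data: every point sits exactly on a rising straight line.
r_perfect_positive = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# Made-up data: every point sits exactly on a falling straight line.
r_perfect_negative = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])

# Strength is judged by absolute value, so the sign is ignored here.
stronger = max([-0.83, 0.77], key=abs)

print(r_perfect_positive, r_perfect_negative, stronger)
```

Because the top of the fraction can never be larger in size than the bottom, the result always stays between -1 and +1.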

This graph shows us the number of hours students spent studying and the number of assignments they completed. As you can see, there is a general trend for more study hours to go along with more assignments being completed. However, you can also see that some people who turned in 5 assignments studied for fewer hours than those who turned in 4 assignments. This is all fictional data, of course, just for example. This is quite typical of most correlations. There are a lot of factors, outside of the number of study hours spent on a topic, that make one student stronger than another. Sometimes, for example, a student could happen to be very strong in the given subject and thus need less prep time to finish their work. This correlation coefficient (r) is +0.71. That does not mean each extra hour of studying yields 0.71 more assignments, because r has no units. What it does mean is that for every one standard deviation increase in study hours, the best-fit line predicts a 0.71 standard deviation increase in assignments completed. In psychological papers we will see this written as r = 0.71. The slope of the best-fit line you see going through the center of the graph depends on r and on the spread of each variable, and it equals 0.71 only when both variables are standardized (although Excel didn’t draw it quite right, haha). We use the line to help us judge where the model expects your score to lie if you follow the correlation. It can also be thought of as the line of predicted means: the average number of assignments expected at each number of study hours. All things being equal, most people who study for 10 hours will turn in more than students who study for 5, at least according to the model. Once again, reading it this way would be inferring causation. It could just as well be that those who complete more assignments have to spend more time studying to complete them (just something to keep in mind!).
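Since r and the slope of the best-fit line are easy to mix up, here is a short sketch of how they relate. The hours and assignments below are hypothetical numbers I typed in for this example, not the data from my graph, so the r you get will not be 0.71.

```python
from math import sqrt

# Hypothetical data: study hours and assignments completed for 10 students.
hours = [2, 3, 4, 5, 6, 7, 8, 9, 11, 20]
assignments = [3, 5, 2, 6, 4, 5, 7, 4, 6, 10]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(assignments) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, assignments))
sxx = sum((x - mean_x) ** 2 for x in hours)
syy = sum((y - mean_y) ** 2 for y in assignments)

r = sxy / sqrt(sxx * syy)        # unit-free strength of the relationship
slope = sxy / sxx                # assignments per extra hour (least squares)
sd_ratio = sqrt(syy / sxx)       # spread of y divided by spread of x
slope_from_r = r * sd_ratio      # the same slope, rebuilt from r

# Standardize both variables (z-scores); the slope then equals r.
sd_x = sqrt(sxx / (n - 1))
sd_y = sqrt(syy / (n - 1))
zx = [(x - mean_x) / sd_x for x in hours]
zy = [(y - mean_y) / sd_y for y in assignments]
standardized_slope = sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)

print(f"r = {r:.2f}, slope = {slope:.2f} assignments per hour")
print(f"slope on standardized data = {standardized_slope:.2f}")
```

So the raw slope tells you "assignments per hour," while r tells you "standard deviations per standard deviation."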

Finally, we see that the student who completed 10 assignments also spent almost twice as much time studying as any other student (20 hours; the next closest is at 11 hours). This is what we like to refer to as an outlier: a participant who skews the distribution one way or another. If I remove this person from the study, suddenly r=0.61 instead of r=0.71. That’s a pretty strong influence for one person. Technically speaking, small samples like this one (only 10 observations) are prone to large swings caused by single extreme participants like this.
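Here is a rough sketch of how you could check an outlier’s influence yourself. The ten data points are hypothetical (I only kept the idea of one student at 20 hours and 10 assignments), so the two values it prints will differ from the 0.71 and 0.61 reported for my graph.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical sample of 10 students; the last one is the outlier.
hours = [2, 3, 4, 5, 6, 7, 8, 9, 11, 20]
assignments = [3, 5, 2, 6, 4, 5, 7, 4, 6, 10]

r_all = pearson_r(hours, assignments)

# Drop the student who studied 20 hours and recompute.
r_without_outlier = pearson_r(hours[:-1], assignments[:-1])

print(f"r with everyone:       {r_all:.2f}")
print(f"r without the outlier: {r_without_outlier:.2f}")
```

With only ten people, one extreme point carries a lot of weight, which is why it is worth running the numbers both ways before trusting a correlation.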

In conclusion, I hope we now have a bit of an understanding of what a correlation looks like in psychology. Stronger correlations are those whose absolute value is closer to one (and these correlation coefficients never go above one in absolute value). Correlations can be positive or negative, which simply describes the direction of the relationship between the factors. Finally, causation cannot be proven by a correlation alone! It must have a theoretical foundation for you to draw any causal conclusions. Next time we will look at interaction and mediation!
