What is a Good Study? Questions You Can Ask
Written by Kyle Hill   

In becoming a science-based person, I can imagine a process that involves three tiers. First, you decide that you are going to get your information from reputable sources like scientific journals, and that any other claims you encounter should have similar backing. Second, pushing past the veneer of scientific legitimacy, you decide to look into the claims for yourself. This involves not only getting your information from sources based on scientific journal articles, for example, but also going through the study yourself to determine whether it is a “good” study. Lastly, after having navigated scientific sources for some time, you are able to evaluate claims based on the methodologies and procedures that you would expect the offered evidence to have if it were indeed credible. Because most of us are not scientists and find it hard to invest in the education it would take to reside comfortably in the third tier, I will try to offer some help with the second.

If you would ever consider a career as a science writer or science journalist, there are a few basic techniques that you must master, or at least become proficient in. Among them are learning statistics and how to interpret them, interviewing scientists to get the best information, and translating sometimes complex and technical scientific information into something that a lay audience can digest. Another fundamental skill that you must wield effectively is being able to confidently answer the question, “What is a good study?” To this end, what follows are some basic questions that you should ask yourself when trying to determine the validity of a scientific study. You would find these kinds of questions in any introductory science-writing textbook, and they will become a valuable tool in your skeptical arsenal.

Keep in mind that when you are evaluating a study, the more of these questions you can have answered, the better off you are. However, if you find yourself questioning every single procedure, method, and ethical choice in a study, this may be a red flag in itself. As a properly skeptical consumer of scientific information, a good place to start is with what is called the null hypothesis. That is to say, assume that a new medical treatment or physics experiment won’t work. Without being downright cynical, greet every claim with this assumption. Your new motto when faced with a claim in a study or elsewhere should be “show me.”

Is the study large enough to pass statistical muster?

Numbers are very important in this regard. For example, the number of patients that a study includes in a clinical trial says a lot about that trial’s “power,” or relative generalizability (does the study include enough patients to distinguish between treatments?). Taking a more basic approach, if you were to read in a study that “the majority of US citizens now reject the theory of evolution,” you should find out how many people were in the study. The statistics turn out that if you have less than around 1,024 people for a nationwide study, the margin of error exponentially increases beyond three percent. In a study that reports a 49/51 split, this could render the claim worthless.
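To make that concrete, here is a minimal sketch (not from the article) of where the roughly three percent figure comes from, assuming a simple random sample and the standard 95 percent confidence formula for a proportion:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion p from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1024, 2000):
    print(f"n = {n:5d}  ->  margin of error ~ {margin_of_error(n):.1%}")
# n = 1024 gives roughly +/- 3%; smaller samples give noticeably wider margins.
```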

The other side of this question is to determine whether the findings of a study are statistically significant, meaning that there is only an acceptably small chance that the findings were due to random chance alone. The value that is typically used in scientific research is p=0.05. This "p-value" means that, if chance alone were at work, results at least as extreme as the study's would turn up only about 1 time in 20. This is because we must assume the null hypothesis is true, and then assess the probability of some outcome given this assumption. (If this seems too low, it should be noted that many fields of science have much more rigorous standards; physicists often demand p-values of 0.001 or smaller before accepting a finding. Still, even with the less rigorous standard, most scientific findings are expected to be replicated, weeding out chance occurrences even further.) When evaluating a study, pay close attention to this value. As a general rule, any correlation that has a p-value greater than 0.05 (p>0.05) should not be taken as evidence for anything.
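As a minimal illustration (not from the article), suppose a treatment with no real effect is tested on 100 patients and 60 of them improve, where the null hypothesis predicts a 50/50 split; the p-value is the probability of a result at least that lopsided arising by chance alone:

```python
from math import comb

def binomial_p_value(successes, trials, p_null=0.5):
    """Two-sided p-value: probability, under the null, of a result at least as extreme."""
    observed_gap = abs(successes - trials * p_null)
    total = 0.0
    for k in range(trials + 1):
        if abs(k - trials * p_null) >= observed_gap:
            total += comb(trials, k) * p_null**k * (1 - p_null)**(trials - k)
    return total

# 60 "responders" out of 100 patients when chance alone predicts 50:
print(f"p = {binomial_p_value(60, 100):.3f}")  # ~0.057, just above the usual 0.05 cutoff
```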

Is the study designed well? Could unintentional bias have affected the results?

This is hard to determine if you are not familiar with a particular field, but you are still able to ask questions that should help you sort the bad studies from the good. Was there a systematic design to the study that remained the same throughout? What were the specific hypotheses of the study and how did the study test for them? If it was a clinical trial, who were the patients and how were they selected?

More generally, was there a control group? Was the sample population that the study selected representative of the general population? Was the study as “blinded” as possible, meaning that neither the participants nor the people running the study knew who was receiving which condition? Were there any conflicts of interest that should have been disclosed by the researchers? Funding from a corporation does not automatically mean that the results of a study are false, but it is something that absolutely can bias research.

Did the study last long enough?

This question may not apply to some sciences, but it is especially important in medicine. For example, if a study claims that a new treatment put some cancer patients into remission, the study should also follow those patients for some amount of time afterwards to see if they stayed in remission. If all of the participants died two weeks after the study, you may be getting horrendously skewed conclusions.

Are there any other possible explanations for the findings or reasons to doubt the conclusions?

Remembering that correlation does not prove causation, how does the study frame the findings? Is any association statistically strong? If a causal link is suggested, does the cause indeed precede the effect? Are the associations that are found consistent when other methods are used? Did the study look for other possible explanations, called confounding variables, which could explain the results? For example, a study that claims reading science blogs increases the level of scientific literacy may be leaving out the confounding variable of formal education, which could be controlling both.
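As a minimal illustration of confounding (hypothetical data, not from the article), a short simulation in which formal education drives both blog reading and scientific literacy will still show a clear correlation between the two, even though neither causes the other:

```python
import random

random.seed(0)
education    = [random.gauss(0, 1) for _ in range(10_000)]
blog_reading = [e + random.gauss(0, 1) for e in education]  # driven by education
literacy     = [e + random.gauss(0, 1) for e in education]  # also driven by education

def corr(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
    return cov / (sx * sy)

# Blog reading and literacy never influence each other in this simulation,
# yet the shared cause (education) produces a solidly positive correlation.
print(f"blog reading vs literacy: r = {corr(blog_reading, literacy):.2f}")  # roughly 0.5
```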

(For medical claims) Does a treatment really work?

Could the patient’s improvements be changes that are occurring in the normal course of their disease? This is a source of great confusion for alternative treatment claims like the ones offered by homeopathic “medicine.” While a patient may feel better after taking homeopathic medicine, the improvement could indeed have nothing to do with the treatment and be just the normal ebb and flow of illness. Taking this into account, most studies have found that homeopathic “medicine” does not work.

If a treatment is claimed to work, are there any follow-up studies needed to confirm that finding? Are the results applicable to the general population? All of these questions should be answered by the study itself.

Do the conclusions fit other scientific evidence?

Are the results of a study consistent with other findings in that field? If not, why not? Has the study been replicated and confirmed?

Virtually no single study proves anything. Consistency and the preponderance of evidence are what point us in the direction of truth. Of course, the claims of quantum mechanics and other seemingly impossible notions are bizarre at first, but they are then supported and backed up by other research. Contrast this with a pseudoscience like “free energy.” Mountains of evidence, and thermodynamics as a whole, will refute a study claiming to have cracked the free energy code. A study that goes up against such opposition is not necessarily wrong, but it had better offer some extraordinary evidence to show that it is not.

Do I have the full picture?

How does this research play into the field as a whole? Does the study leave out some important aspect of the science that would prove it wrong? Is the study even relevant given other findings? Like the previous question, it is important to understand how a finding fits into other research that has been done. Is it in opposition? Which way is the field moving? Getting the whole picture is critical if you want to understand the importance of a study.

Have the findings been checked by other experts?

This is one of the most important questions that you can ask when looking at a study. Ask yourself: are there experts who disagree with the claims in a study? Why or why not? Are the researchers speaking in an area of their own expertise or have they ventured outside of it? Does the researcher have a good track record when it comes to findings standing up to scrutiny?

Most importantly, as one of the safety nets of science, has the study been through peer review? Is the journal that the study is published in reputable? A study coming out of an obscure journal with no peer review, that is to say, no experts to check over the work of the researchers, is not necessarily wrong but should be highly suspect.

What now?

When looking at scientific studies you need to ask even more basic questions than whether or not the study was systematically designed. Ask common-sense questions, like whether the data really justify the conclusions. If the researchers have extrapolated beyond the evidence, is it warranted? Does the researcher frankly admit any flaws or limitations of the study? Does the researcher acknowledge that the findings may be tentative and offer important caveats?

If you can get your hands on a copy of the original study, rather than just a press release or the abstract, do it. You may not be able to evaluate all of the procedures and methods, but a good study will be written in a way that answers many of these important questions. Getting good at this kind of evaluation takes practice, but no one ever said science was easy.

 

Examples in this post were adapted from the book “News and Numbers” by Victor Cohn and Lewis Cope.

You can find a reproducible list of the guidelines above for your use here.

Kyle Hill is the newly appointed JREF research fellow specializing in communication research and human information processing. He writes daily at the Science-Based Life blog and you can follow him on Twitter here.

Comments (15)
Ask the above questions to the global warming crowd... (low-rated comment, hidden)
Bullshit on Global warming criticism
written by sailor, March 15, 2012
Question 1: "Is the study large enough to pass statistical muster?" No, you can't take a snapshot of a few years against billions and get an accurate answer.

In fact we know much about climate going back thousands of years. Direct measurement of temperature is not the only way, there are ice core samples, tree rings and other material. Climate scientists have done a lot of work on computer models and analyzing all the possible inputs to the current trends.

It is quite clear that the climate is warming, and it is also quite clear that the only obvious reason for this is greenhouse gasses. Lucky that you denialists did not hold sway over the argument about the ozone layer, or it would still be getting worse instead of repairing itself nicely.


@sailor (low-rated comment, hidden)
...
written by lytrigian, March 15, 2012
It's funny how global warming denialists keep bringing up the University of East Anglia thing, as if the entire scientific consensus within climatology that anthropogenic global warming is in fact occurring stands or falls with this single research group. EVEN IF it were true that they falsified data -- it isn't -- it wouldn't affect the observations of other climatologists all over the world.

It's at least as much of a problem that denialists are willing to commit international crimes in order to foster a smear campaign as it is that scientists within the climatology community may or may not have used less-than-optimal analysis methods. (That was about the only real issue found, and correcting it doesn't significantly change the result.)

Anthropogenic global warming isn't happening? Tell it to the residents of Kiribati.
It would take a huge conspiracy for global warming science to be faked, then they would have to tamper with my senses because it has become obvious
written by sailor, March 15, 2012
"So you are saying global warming "scientists" have never been guilty of changing data and reporting false results?"

There is no reason to doubt the current research. For the current consensus to be based on falsified data would take a worldwide conspiracy of staggering proportions, involving hundreds of agencies.
@sailor (low-rated comment, hidden)
...
written by vanadamme, March 15, 2012
Great article, definitely going to use this to teach my kids.
...
written by mdw, March 15, 2012
A nitpick: "The statistics turn out that if you have less than around 1,024 people for a nationwide study, the margin or error exponentially increases beyond three percent."
No, it increases as a *power law*: the margin of error is proportional to one over the square root of the sample size.

Also note that the required sample size varies greatly with the size of the effect. For example, suppose we take a sample of 5 people and have them jump from a plane without a parachute, and all five die within minutes of jumping. We know that if we choose a random person at a random point in their life, the odds of them dying within a few minutes of our chosen time point are extremely low, so 5 out of 5 jumpers dying is highly significant.
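A minimal sketch of that calculation, assuming (purely for illustration) a baseline chance of roughly 1 in 100,000 that a randomly chosen person dies within any given few-minute window:

```python
baseline = 1e-5       # assumed chance of a random person dying in any given few minutes
# Under the null hypothesis (jumping has no effect), the probability that all
# 5 independent jumpers die within minutes is the product of the baselines.
p_value = baseline ** 5
print(f"p = {p_value:.0e}")  # 1e-25: a tiny sample, but an overwhelming result
```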

Something you missed: Does the study report real end points or substitute end points? ("Substitute" isn't the right word, but my mind has blanked on the correct term.) E.g. for a heart medication, the real end point is "did the experiment group suffer fewer heart attacks?" while a substitute end point is "did the experiment group have lower blood cholesterol?" Real end points are to be preferred where possible, but sometimes they are impractical. Be somewhat wary of substitute end points.

Briefly on the global climate change debate: Note the above comment on sample size and effect size - a few decades of 'experiment' is not necessarily insufficient to reach a conclusion. (I don't know the data well enough to say more than this.)
Climate change is not based solely on a few points on the 'hockey stick' temperature graph, any more than evolution is based solely on a handful of fossils. That we have greatly increased the atmospheric CO2 content by burning fossil fuels is beyond question. It is proven by multiple lines of reasoning, including carbon isotope ratios, comparison with air bubbles in ancient ice, and simply by calculating the quantity of CO2 we're releasing and comparing it to the background level and the lifetime of carbon in the atmosphere. (If you insist on statistics, go look at how many atoms they counted in the isotope ratio measurements.) Extremely well established physics says that with more atmospheric CO2, the greenhouse effect will cause warming. It is only when we try to move beyond this point that things get complicated - what climate feedback loops might cause the warming to be more or less than a simple calculation would indicate? Now we get into a mess of albedo, snowfall, cloud cover, ocean currents, vegetation etc.
It is like I've just shot a bullet at somebody, and assert that things are about to go badly for them. The basic physics is very much on my side, but specific circumstances (e.g. bullet proof vest, or I shot at their reflection by mistake) may invalidate my prediction. However, my case is strong enough that it is now up to those who think the bullet will be harmless to justify that view.
For climate change, the scientists have looked into these messy details to see if they change the expectation from basic physics. This leads to a lot of fiddling at the margins of the prediction - this effect makes warming a bit greater, that effect a bit smaller, reanalysis of some third effect shows it is more or less efficient than the original analysis claimed. However, the very strong scientific consensus is that after all of these effects are accounted for, the warming will still be significant (in both the statistical and common use of the word.)
...
written by mdw, March 15, 2012
My mind had de-blanked. "Surrogate endpoint" (vs "clinical endpoint") is the term I was looking for.
http://en.wikipedia.org/wiki/Surrogate_endpoint
...
written by MadScientist, March 15, 2012
@lytrigian: I wouldn't use Kiribati as an example of the effects of global warming; the immediate threat to the islands is due to the action of the waves, not rising sea levels as the news sources love to claim, and the waves haven't been linked to global warming. The dwindling glaciers and retreating ice sheets are a different story - we haven't had so little ice in the summers for a few tens of thousands of years. A century ago the "northwest passages" would only open up occasionally, but there have been navigable channels every summer for the past 6 years.
...
written by Fanitullen, March 16, 2012
@Davis: Humans contribute a hundred times more CO2 than volcanoes. Looking at ourselves is easier than trying to control nature. Human-caused pollution is the main problem, and we should try to fix it.
@Fanitullen (low-rated comment, hidden)
Good summary but misleading about sample size
written by gciriani, March 16, 2012
The article gives the impression that if sample size is not large enough, a study is not valid whatever p is. I think that's misleading for people who do not have a solid understanding of statistics. Sample size is already taken into account when calculating p, so if p passes statistical muster it is incorrect to look at sample size to further prove or disprove the study. Obviously if the effect studied is real, increasing the sample size will improve p (lower p).

I also completely side with the post written by mdw on 3/15/12. Among other things, the article doesn't help statistically less-educated readers understand that the magnitude of the effect is paramount. One may have an extremely small effect, say 1%, with a large population, showing impressive statistics (say p=0.001), which would be completely irrelevant. Conversely, one may have a sizeable effect, say 10%, with a small sample and poor statistics (say p=0.1). If I had to decide where to allocate research money, or what to learn more about, I would probably go for the second study.
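A minimal sketch of that contrast (hypothetical numbers, using a simple two-proportion z-test): a one-point difference in an enormous study earns a tiny p-value, while a ten-point difference in a small study misses significance, yet the second effect is the one worth chasing:

```python
import math

def two_proportion_p(success_a, n_a, success_b, n_b):
    """Two-sided p-value for the difference between two proportions (normal approximation)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = abs(p_a - p_b) / se
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

# 1-point effect (51% vs 50%) with 50,000 people per group: "impressive" p
print(f"tiny effect, huge sample:   p = {two_proportion_p(25_500, 50_000, 25_000, 50_000):.4f}")
# 10-point effect (60% vs 50%) with 150 people per group: p misses the 0.05 cutoff
print(f"large effect, small sample: p = {two_proportion_p(90, 150, 75, 150):.2f}")
```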
...
written by rwpikul, March 16, 2012
@Davis:

You might want to catch up on the past couple of decades of development in wind and solar power:

Both are comparable in costs to other forms of power, (wind is already cheaper than nuclear, solar only slightly more expensive), with costs that are falling.
As a pair they make for a good source of base generation: wind tends to generate more at night, when you lose solar photovoltaic[1].
While production is variable, over large areas it averages out and it will be easy enough to design deployment so as to be sufficient even during rare slumps. Excess power generation is already a solved problem, just build something that works as a power sink[2], (wind generation can also be reduced by feathering the turbines).


Look at places like Germany and Texas to see what they really can do. Texas is currently getting about 20% of its electricity from wind, (with the limitation being the need for more power lines), while Germany has seen solar cut its electricity prices by 10%, (40% during the demand peak in the afternoon), and sometimes is getting so much power from wind that they have to start giving it away, (electricity prices in Germany have occasionally gone negative).


[1] Solar-thermal delivers power 24/7.

[2] Traditionally this had been done with aluminum refineries, although in this case I would suggest Fischer-Tropsch plants to turn biomass into things like gasoline.
...
written by jmarley42, March 16, 2012
@Davis
You are not a skeptic, you are a denier. You are applying Kyle's second step without having done the first one.