The 10 Threats to the External Validity of a Study

1. Verification of the Independent Variable

2. Multiple Treatment Interference

3. Hawthorne Effect

4. Novelty and Disruption Effects

5. Experimenter Effects

6. Pretest Sensitization

7. Posttest Sensitization

8. Interaction of History and Treatment Effects

9. Measurement of the Dependent Variable

10. Interaction of Time of Measurement and Treatment Effects

The following are the 10 primary threats to the external validity of a study (Simkus, 2023; Martella et al., 2013).

Verification of the Independent Variable

If the independent variable is not implemented in a consistent fashion, we cannot determine what was done to produce a change in the dependent variable. A critical aspect of science is replicating the findings of previous research. We cannot replicate a study when the independent variable has not been implemented as intended.

For example, suppose we attempted to find the effects of specific praise on the reading motivation of a student. Specific praise was defined as a verbal indication of desirable behaviors that the student exhibited (e.g., "Good job, Billy. You read the book," versus "Good job"). The praise was provided for every book read. The implementation of the independent variable was not documented. Our results indicated that the student's motivation improved; this improvement seemed to be due to the independent variable. If we were to take this study and implement the same techniques in another setting, we may obtain different results. The person implementing the independent variable may have praised the student for reading every other book, not every book. If this occurred, we would need to know about it to replicate the study and understand the results of the study.
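One way to document implementation is to log each praise opportunity and report the proportion on which praise was actually delivered. The sketch below is only an illustration: the function name `praise_fidelity` and the logs are hypothetical and do not come from any study described here.

```python
def praise_fidelity(praise_log):
    """Return the proportion of books read that were followed by specific praise.

    praise_log is a list of booleans, one per book the student read:
    True if specific praise was delivered, False if it was not.
    """
    if not praise_log:
        raise ValueError("praise_log must contain at least one observation")
    return sum(praise_log) / len(praise_log)


# Hypothetical logs for ten books read.
planned = [True] * 10                 # praise after every book, as intended
every_other_book = [True, False] * 5  # praise after every other book

print(praise_fidelity(planned))           # fidelity when implemented as planned
print(praise_fidelity(every_other_book))  # fidelity when praise was skipped
```

Reporting a figure like this alongside the results tells a reader who wants to replicate the study how closely the praise schedule was actually followed.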

 

Multiple Treatment Interference

Multiple treatment interference occurs when there are multiple independent variables used together. This situation prevents us from claiming that any one independent variable produced the effects. We would have to take into consideration that all independent variables were used together. To make any claims for ecological validity, we would have to describe the combined effects of the independent variables.

For example, say we exposed a group of students to a whole language reading program and measured their decoding and reading comprehension performance. The same participants were then assessed on how they performed when taught with a phonics-based approach. The results indicated that the phonics approach was superior to the whole language approach. However, the two independent variables were not separated. The results may have been different if the approaches had been implemented in isolation or if the phonics approach had been implemented before the whole language approach.
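A common way to examine order effects, which the example above does not use, is counterbalancing: half of the participants receive the treatments in one order and half in the reverse order. The following is a minimal sketch under that assumption; the function name `counterbalance` and the participant labels are hypothetical.

```python
import random


def counterbalance(participants, treatments=("whole language", "phonics"), seed=None):
    """Randomly assign each participant to one of the two treatment orders.

    About half of the participants get the treatments in the given order and
    the rest get them in reverse order.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    forward = tuple(treatments)
    reverse = tuple(reversed(treatments))
    # The first half of the shuffled list gets the forward order.
    return {
        person: (forward if index < half else reverse)
        for index, person in enumerate(shuffled)
    }


# Hypothetical participant labels.
orders = counterbalance(["s1", "s2", "s3", "s4", "s5", "s6"], seed=1)
for person, order in sorted(orders.items()):
    print(person, "->", " then ".join(order))
```

Comparing outcomes across the two orders indicates whether the sequence of treatments mattered, but it does not show how either approach would perform if used alone.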

 

Hawthorne Effect

The Hawthorne effect is a potential problem whenever the participants are aware that they are in an experiment and/or are being assessed.

For example, assume we wanted to assess the effects of one-on-one instruction for individuals who had reading problems. Individuals who had been diagnosed as having a reading difficulty were randomly assigned to one of two groups. One group (experimental) received one-on-one instruction, while the other group (control) received the standard group instruction. A large improvement in the experimental group's level of reading performance was seen; however, there was no change in the control group's reading performance. We concluded that the one-on-one instruction aided in the improvement of reading skill. However, this increase may not be replicated in situations where the researchers are not present or where the individuals do not receive the level of attention the experimental group received.

 

Novelty and Disruption Effects

Novelty and disruption effects may occur when the experimental variables introduced into an investigation change the situation so that participants react to the changed situation in general rather than to the independent variable in particular. In other words, being part of an investigation may itself change the impact of the independent variable. When novelty effects are present, the effect of the independent variable may be enhanced. Thus, it is difficult to generalize the initial results of an investigation if the investigation brings something into the environment besides the independent variable (e.g., more people or recording instruments) that differs from the ordinary routine. The initial effects may not hold once the novelty wears off or the independent variable becomes part of the normal classroom routine.

The opposite is true for a disruption threat. When the experimental variables of an investigation are implemented, they may disrupt the participants' normal routine. If this occurs, the effectiveness of the independent variable may be suppressed because participants fail to respond to it, or respond to it negatively, after having their routine disrupted.

For example, suppose we wanted to study the effects of large group versus small group instruction in a beginning reading program. The classroom currently used large group instruction. The teacher changed to a small group format in which students were grouped by reading level. The students responded positively (i.e., their academic progress accelerated) to the change in grouping. If our presence and/or the novelty of the small group instruction affected the students, we would have a problem with the external validity of the study. When other teachers attempt to use small group instruction, they may not get the same results because novelty effects may be absent (e.g., their students may already have been exposed to small group instruction). The opposite would hold for disruption effects. Our presence in the classroom may have disrupted the class routine, thereby making the independent variable less effective. In this example, the disruption caused by the study may have interfered with the students’ concentration and, thus, slowed the academic progress they were making before we became involved with the class.

 

Experimenter Effects

Just because we have certain findings with one person implementing the independent variable does not mean that others implementing the independent variable will have the same success. Studies are usually well planned, and the individuals conducting the studies are usually well trained and typically have some expertise in the area they are investigating. Thus, it may be difficult to claim the results of a study will generalize to other individuals who implement the independent variable without the same level of training, the same motivation, or the same personological variables that were present in the original experiment.

Consider the following example. Teaching is a profession that is affected by a variety of factors. What makes one teacher effective and another ineffective depends not only on the curriculum they are using but also on other factors, such as the excitement they exhibit toward the subject matter. Suppose we implemented a systematic phonics program in the first grade. The teacher in this classroom was especially eager to use the program. Her enthusiasm was high, and she was able to get the students excited about reading. The results indicated that the new systematic phonics program was effective in teaching reading skills to the students. Another teacher purchased the program to use with her students. She began the program but believed that it was too regimented for her students. She took them through the program but did so in a very subdued and unexcited fashion. After the program, she found that the students did not improve and concluded that the program did not work. We must be concerned about the possibility that the manner in which the independent variable is presented, and by whom, may make a difference in the replicability of the results.

 

Pretest Sensitization

Many times researchers will provide a pretest when attempting to determine the effects of the independent variable on the dependent variable. After this pretest, they will provide the independent variable followed by the posttest. The difference between the pretest and the posttest may then be determined to be the result of the independent variable. However, the pretest may make the independent variable more effective than it would have been without it.

For example, suppose we wanted to see the effects of a class on the decoding skills of students in second grade. Up until then, the students had been given books of their choosing to read and had usually selected books that they could read easily. As part of the research program, we pretested the students on reading passages appropriate for second graders. Many of the students found the passages especially difficult to read. Thus, for the first time, students were given an indication of where their reading skills were. After the pretest, the students were provided a reading program geared to the lowest 25% of the students. These students demonstrated large improvements in their decoding skills after 1 year of the program. The difficulty here is that teachers who use the program based on the research results may not use a pretest (unless the program calls for one as a placement test). If the pretest is not provided, the question remains as to whether the students will perform differently from the students in the research study, given that they were not sensitized to the program beforehand.
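One design sometimes used to check for pretest sensitization, which the study described above did not use, is the Solomon four-group design: treatment and control groups are run both with and without a pretest, and the treatment effect is compared across the two conditions. The sketch below assumes that design and uses made-up posttest scores; the function name and all numbers are hypothetical.

```python
from statistics import mean


def pretest_sensitization_estimate(pre_treat, pre_control, nopre_treat, nopre_control):
    """Estimate how much the pretest changed the apparent treatment effect.

    Each argument is a list of posttest scores for one group:
      pre_treat     - pretested, received the program
      pre_control   - pretested, did not receive the program
      nopre_treat   - not pretested, received the program
      nopre_control - not pretested, did not receive the program
    A value near zero suggests little pretest sensitization.
    """
    effect_with_pretest = mean(pre_treat) - mean(pre_control)
    effect_without_pretest = mean(nopre_treat) - mean(nopre_control)
    return effect_with_pretest - effect_without_pretest


# Hypothetical posttest decoding scores for four small groups.
difference = pretest_sensitization_estimate(
    pre_treat=[80, 90],
    pre_control=[60, 70],
    nopre_treat=[70, 80],
    nopre_control=[60, 70],
)
print(difference)
```

This is a descriptive comparison of group means only. A real analysis would also test whether the difference is larger than chance variation would produce.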

 

Posttest Sensitization

Posttest sensitization is similar to the pretest sensitization threat. The application of a posttest may make the independent variable more effective because the posttest essentially requires participants to synthesize the previously learned material.

For example, suppose we wanted to find out how much information from a reading passage students remembered after 2 weeks. A pretest was given, and the results indicated that the students were able to remember approximately 50% of the material they read. The students were then taught memory aids for remembering information they read. They were also told that they would be provided another reading passage and would be tested in 2 weeks to determine how much they remembered. The students were then given a posttest at the end of the term. The problem here is that the extent to which the posttest improved the effectiveness of the independent variable was not known.

 

Interaction of History and Treatment Effects

All research is conducted in some time frame. This time frame can affect the generalization of the findings if the environment has changed. We live in a different world as compared to when a study was conducted, no matter how slight the difference is between the world of today and the world of yesterday. If these differences are major, we would have an especially difficult time in making external validity claims.

If we see a researcher who cites a study from 1955, for example, we may be very concerned about the generalizability of those results. However, if there is a particular need to set the context for the present investigation (e.g., a historical context) and an investigation conducted long ago is important enough to set that context, an older reference may be appropriate. The key here is to determine the purpose of such a reference and how it is used. If the reference has historical significance, it is probably appropriate; if the reference is not used for such a purpose, it may not be appropriate.

For example, suppose we assessed the effects of a reading approach and found that students did well on standardized tests of reading. We must ask if these results will hold up 20 years from now. In other words, is it possible that the way we teach reading today will be less relevant tomorrow? For instance, computer technology has advanced rapidly over the last 10 years. Could there be some advancement that will change the way we teach poor readers from a teacher-directed approach to a computer-based one? If this occurs, much of the earlier research may not be as relevant in the future.

 

Measurement of the Dependent Variable

When reviewing a study, the method of measuring the dependent variable must be viewed closely. There should be some assurances that the results were not limited to particular measures. A concern arises as to whether the results may differ if other measures were used.

For example, suppose we are working with students who have intellectual disabilities and teach them several sight words to allow them to participate in the community, such as men, women, boys, girls, stop, walk, push, and pull. We teach the students these sight words to 100% mastery and conclude that the training worked. A more serious concern would be whether these students could respond appropriately to these words in the community. In other words, it is important to determine if reading sight words in a teaching situation actually generalizes to a community one. If this situation is not assessed in the investigation, a concern would be whether the same level of performance would occur if the dependent variable were measured differently.

 

Interaction of Time of Measurement and Treatment Effects

The interaction of time of measurement and treatment effects is a problem because the time at which measurements are taken may determine the outcomes of a study. Most skills taught to individuals may be expected to deteriorate over time if these individuals are not given sufficient opportunities to perform or practice the behavior. However, the effect of the independent variable could diminish, stay the same, or even be enhanced over time (Bracht & Glass, 1968). We must be concerned with the maintenance of the independent variable's effects, that is, their generalization across time. If maintenance is not assessed, this threat is one we should consider with regard to external validity.

For example, suppose we wanted to assess the effects of the sight word program described above. Also suppose that the researchers assessed whether the use of these sight words generalized to the community. The question would be whether the use of these sight words in the community is maintained after training and, if so, for how long. If maintenance has not been assessed, we should be concerned with this threat to external validity.
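Maintenance can be summarized by comparing follow-up probes with the score obtained immediately after training. The sketch below assumes hypothetical follow-up probes at 4 and 12 weeks; the function name, the probe schedule, and the scores are all made up for illustration.

```python
def retention_by_probe(posttest_score, followup_scores):
    """Return the proportion of the posttest score retained at each follow-up.

    posttest_score  - number of sight words read correctly right after training
    followup_scores - dict mapping weeks since training to the number of
                      sight words read correctly at that follow-up probe
    """
    if posttest_score <= 0:
        raise ValueError("posttest_score must be positive")
    return {
        weeks: score / posttest_score
        for weeks, score in followup_scores.items()
    }


# Hypothetical data: 20 words correct at posttest, then two follow-up probes.
retention = retention_by_probe(20, {4: 18, 12: 10})
for weeks, proportion in sorted(retention.items()):
    print(f"{weeks} weeks after training: {proportion:.0%} retained")
```

A value above 1 would mean performance improved after training ended, which matches the point that effects can diminish, stay the same, or be enhanced over time.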
