The interpretation and evaluation of quantitative research studies in Second Language Acquisition

This article argues that many quantitative research studies are deficient as far as their method of research is concerned. The 'Method' section should be critically evaluated in order to determine whether the findings that are reported are valid and reliable. Each of the sub-sections usually included in the method section is discussed, and a checklist for the evaluation of quantitative research studies is provided.


Introduction
In carrying out research, the issue or question one wants to investigate should form the point of departure in deciding on the appropriate research method. Many issues in second language acquisition are appropriately researched by means of a quantitative research method. This method has become very popular in recent years, because it is a powerful research tool which allows researchers to go beyond the identification and linear description of language learning phenomena and to draw formal inferences from the data about expected frequencies of occurrence, to assess the likelihood that phenomena are generalizable beyond a given instance, or to compare adequacies of existing theories and models to account for the phenomena in question.
Many quantitative studies, however, are deficient and reveal many weaknesses. Henning (1986), for example, points out that many studies do not provide any estimate of the validity and reliability of the instrumentation and procedures used to elicit data. In this regard Wolfson (1986:690) states that "No matter what else we do, we must remember that if data are inadequate, there is always the danger that the theory and conclusions drawn from them could be unreliable and misleading". Researchers wishing to replicate studies fall into the trap of using inadequate test instruments or tools for analyses. Many post-graduate students are unable to analyze and evaluate quantitative studies critically. Results are often quoted without determining if these results are valid and reliable. For example, Carver (1993:287) states that too many research results are blatantly described as significant, when they are in fact trivially small and unimportant. He says that there is no excuse for saying that a statistically significant result is significant because this language use erroneously suggests to many readers that the result is automatically large, important and substantial.
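Carver's distinction between statistical significance and substantive importance can be illustrated with a minimal sketch. The function name, the use of the Fisher z approximation and the figures (r = 0.05, n = 10 000) are assumptions chosen for illustration; they are not taken from Carver (1993) or from any study discussed here.

```python
import math
from statistics import NormalDist
from typing import Tuple


def correlation_significance(r: float, n: int) -> Tuple[float, float]:
    """Return (two-sided p-value, proportion of variance explained) for a Pearson r.

    The p-value uses the Fisher z-transformation, which is a large-sample
    normal approximation (an assumption of this sketch).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r) * math.sqrt(n - 3)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value, r ** 2


# Hypothetical figures: a very weak correlation in a very large sample.
p_value, variance_explained = correlation_significance(r=0.05, n=10_000)
print(f"p = {p_value:.2g}, variance explained = {variance_explained:.2%}")
```

With these hypothetical figures the p-value falls well below .001, yet the correlation accounts for only a quarter of one per cent of the variance, which is the kind of result that is statistically significant but trivially small.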
The result of this state of affairs is that many readers, when confronted with quantitative studies, especially those using statistics, either avoid reading the article or take a short cut through it. Very often this entails skipping the 'Method of Research' section to get to the 'Discussion', where they try to find out what the study was all about and if they can find something useful to implement in their classrooms (Brown, 1991). By skipping the 'Method' section, however, readers not only miss the heart of the study, but also buy the author's argument without critical evaluation.
The purpose of this article is first to indicate, very briefly, some of the deficiencies in quantitative research studies, and then to provide guidelines for interpreting and evaluating the 'Research Method' section of quantitative studies within the field of second language research.

An analysis of quantitative research studies
In recent years considerable concern has arisen over the misapplication or avoidance of appropriate quantitative methods in language learning research. Brown (quoted in Henning, 1986), for example, expresses concern that established conventions in quantitative research methodology are not consistently adhered to by quantitative researchers in second language research. Table 1 presents an analysis of the method sections of a number of quantitative studies investigating the influence of affective factors (such as personality, motivation, anxiety, competitiveness, etc.) on second language acquisition. This analysis highlights some of the weaknesses in these studies. It is clear that researchers, students and teachers need to be wary of quoting the results of studies at random without critically evaluating the studies.

Table 1: An analysis of the Method sections of quantitative research on affective factors up to 1986
Researcher(s)   Subj.   Instr.   Var.   DCP   Des.   Anal.

The analysis in Table 1 is not intended to be detailed or comprehensive, but merely to illustrate that not all studies are perfect. The areas which are deficient are indicated by X in the table. It is obvious that most of these studies have a number of deficiencies. For example, the article by Gardner and Lambert (1959) reveals the following:

* Subjects
The researchers do not mention whether the subjects were selected randomly, whether they constituted an intact group or whether they were volunteers. Failure to address these issues will impede the generalizability of the results. The internal and external reliability and validity can therefore be influenced.

* Instruments
The researchers do not give any indication of the reliability and validity of the instruments used, nor do they mention whether all the tests were standardized.
The researchers also made use of a few sub-scales of tests constituting a larger battery. However, very often the battery needs to function as a unit and by using only a few sub-scales the researchers may influence the reliability of the instruments.

* Variables
The variables used in the study are not clearly specified and operationalized.

* Data collection procedure
No information is given on how and when the researchers collected the data. No indication is given of the setting, the instructions given to the students or the time period needed for data collection.

* Design
The design is not specified; therefore it is impossible to determine whether the correct one was chosen.

* Analysis
The researchers do not mention whether the assumptions underlying the use of the statistical procedures employed were met. For example, correlations presume the use of normally distributed data.
The brief analysis presented above reveals that very often basic information which is essential for the methodology section is either not reported or buried away in the body of the article. It is interesting to note that a review of a few South African journals which specifically focus on language learning, for example the SAALT (Journal for Language Teaching), reveals the limited number of studies concentrating on quantitative research. Most of the studies that have been conducted suffer from the same deficiencies mentioned in Table 1.
The rest of this article will discuss the components of quantitative research studies in second language acquisition and indicate how these studies can be evaluated and conducted. As stated above, the focus is on the Method of Research section. The purpose of this section is to explain how the study has been conducted. The standard rule is that the description should be thorough enough for a competent researcher to reproduce the study (Hatch & Lazaraton, 1991). The article will focus on the subjects, the instrumentation, the variables, the data collection procedure, the research design and the data analysis.
ISSN 0258-2279 Literator 15 (3) Nov. 1994:121-137

Subjects
This section describes how and why the subjects are selected and what characteristics they have that are pertinent to the study. Since language learning deals primarily with human beings, a large proportion of studies gather data about characteristics of designated human populations. The study itself will generally be directed to a particular population, but the researcher must decide which specific individuals (the sample) will provide the data. The key issue is how this group is selected. Are they randomly chosen from a larger population? Are they volunteers? Are there any special criteria used for choosing them? (Brown, 1988). The answers to these questions are important if one is to decide whether the results can be generalized to the field at large. The major criteria in evaluating these descriptions are precision and replicability (Hatch & Lazaraton, 1991).
Since the purpose of drawing a sample from the population is to obtain information concerning that population, it is extremely important that the individuals included in the sample constitute a representative cross section of the individuals in the population. That is, samples must be representative if one is to be able to generalize with confidence to the population. Various sampling techniques are available to the researcher: random sampling, stratified sampling, cluster sampling and systematic sampling. A problem that must be faced in planning every research study is to determine the size of the sample necessary to attain the objectives of the planned research. Technically, the size of the sample depends on the precision the researcher desires in estimating the population parameter at a particular confidence level.
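The dependence of sample size on precision and confidence level can be made concrete with a minimal sketch. It assumes simple random sampling, the estimation of a population mean, and a known (or previously estimated) standard deviation, in which case the textbook formula n = (z * sigma / E)^2 applies. The function name and the figures are illustrative assumptions and do not come from any of the studies discussed.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma: float, margin: float, confidence: float = 0.95) -> int:
    """Smallest n for which the confidence interval half-width is at most `margin`.

    Assumes simple random sampling and a known standard deviation `sigma`
    (both are assumptions of this sketch).
    """
    if sigma <= 0 or margin <= 0:
        raise ValueError("sigma and margin must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie between 0 and 1")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)


# Hypothetical test with a standard deviation of 15 score points.
print(sample_size_for_mean(sigma=15, margin=3))    # estimate within 3 points
print(sample_size_for_mean(sigma=15, margin=1.5))  # estimate within 1.5 points
```

Under these assumptions, halving the desired margin of error roughly quadruples the required sample, which is why the desired precision should be decided before data collection begins.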

Instruments/Materials
This section should give the reader a description of the instruments, materials, or tests used to collect the data. Teaching materials, questionnaires, rating scales and tests should be described in detail unless they are well known (Brown, 1988). Any other pertinent information, such as range of possible scores, scoring methods used, types of questions, and types of scales, should also be included.
Inasmuch as the instruments used will provide the operational definition of the variables, their use must be justified as being appropriate for that purpose. The researcher should explain why the instrument used was selected as the most appropriate definition of the variable under consideration. If an instrument is one already established, the researcher should include reported evidence of its reliability (consistency) and validity (what the test measures) for the purpose of the study. If the researcher is developing his/her own instruments he/she should outline the procedure to be followed in their development.
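As an illustration of what 'reported evidence of reliability' might look like, the sketch below computes Cronbach's alpha, one widely used estimate of internal consistency. The choice of this coefficient is an assumption made for illustration; the sources cited above do not prescribe a particular estimate, and the function name and data are hypothetical.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(item_scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a subjects-by-items table of scores.

    `item_scores` holds one row per subject and one column per item.
    """
    n_items = len(item_scores[0])
    if n_items < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two subjects")
    columns: List[tuple] = list(zip(*item_scores))
    sum_item_variances = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (n_items / (n_items - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical responses of five subjects to a four-item attitude scale.
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(responses), 3))
```

A reader evaluating a study would look for a coefficient of this kind, computed on the sample actually tested and, where only some sub-scales of a battery were used, on those sub-scales rather than on the full battery.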
In the next section variables are briefly discussed. Variables do not usually form a separate heading in the Method section; however, a brief discussion of variables is included because they are so closely linked to the instruments section.

Variables
A variable is an attribute or set of observations that may vary, or differ in a study (Hatch & Farhady, 1982; Brown, 1992). Most research in the second language field is concerned with identifying the variables that are important to language learning and discovering how these variables affect the learning and teaching of languages. Five different types of variables can be distinguished according to the functions they perform in a study: dependent, independent, moderator, control and intervening variables.
A dependent variable is observed to determine what effect, if any, the other types of variables may have on it. In other words, it is the variable of focus, the central variable, on which other variables will act if there is any relationship (Brown, 1988).
Independent variables are variables selected by the researcher to determine their effect on or relationship with the dependent variable. An independent variable is one that is selected and systematically manipulated by the researcher to determine whether, or the degree to which, it has any effect on the dependent variable.
A moderator variable is a special type of independent variable that the researcher has chosen to include in order to determine if this moderator variable has an effect on the relationship between the independent and dependent variables.
It is virtually impossible to include all the potential variables in each study. As a result, the researcher must attempt to control, or neutralize, all other extraneous variables that are likely to have an effect on the relationship between the independent, dependent and moderator variables. Control variables, then, are those that the researcher has chosen to keep constant, neutralize, or otherwise eliminate so that they will not have an effect on the study.
Intervening variables may be used to describe the theoretical relationship between the independent and dependent variables. They are constructs that may explain the relationship between independent and dependent variables but are not directly observable themselves (Brown, 1992).
A number of problems can arise, both within and outside a study, that may create flaws in terms of the validity and reliability of the study, that is, the degree to which a study and its results correctly lead to, or support, exactly what is claimed. The problems themselves result from extraneous variables that are relevant to a study but are not noticed or controlled. Brown (1988) discusses extraneous variables from four perspectives: environmental issues, grouping issues, people issues, and measurement issues.
All variables must be operationally defined (Brown, 1988). An operational definition ascribes meaning to a construct by specifying the operations that must be performed in order to measure the concept. This type of definition is essential in research, since data must be collected in terms of observable events. An operational definition is very specific in meaning; its purpose is to delimit a term, to ensure that everyone concerned understands the particular way in which a term is being used. It must be a definition that is based on observable, testable or quantifiable characteristics.

Data collection procedure
This section should describe how the data are obtained. All testing procedures for obtaining scores on the variables of interest should be explained. How tests are administered and who does so are important features. The setup of the testing situation and instructions given to the subjects should be noted. What were the environmental conditions like during the experiment? Were they the same for all the subjects involved? The answers to these and many other potential questions should make it possible for the reader to understand exactly how the study was conducted.
The 'procedures' section contains most of the detail that allows another researcher to replicate the study. Tuckman (1988) [...]

Design
The research design refers to the conceptual framework within which the experiment is conducted. It is important to plan the research design because it will help the researcher determine how the data should be analysed. A research design has two very important functions: it provides opportunity for the comparisons required by the hypotheses of the experiment, and it enables the researcher through statistical analysis of data to make a meaningful interpretation of the results (Borg & Gall, 1979).
Design is the key to controlling the outcomes from experimental research. A well-designed study is one in which the only explanation for the change in the dependent variable is how the subjects were treated (independent variable). The design enables the researcher to eliminate all rival or alternate hypotheses. The basic types of research design can be divided into three categories: pre-experimental, true experimental and quasi-experimental (Campbell & Stanley, 1963; Borg & Gall, 1979). The type of design the researcher selects will depend on the hypothesis or research objective he/she has set for him-/herself. Each type of research design answers a different question. If the hypothesis the researcher is testing asks 'Does a change in the independent variable produce a change in the dependent variable?', then a true experimental design is required. However, a true experimental design cannot always be used, as variables are often difficult to control. One of the other research designs must then be used, but one should realize that this is not the 'ideal' design. Conclusions should only be drawn as data and research design permit. It must be borne in mind that research is limited because of the use of a research design other than a true experimental design (the 'ideal'). The limitations section of the article is the place to demonstrate that the researcher is aware of the fact that the research is not completely ideal. Table 3 gives an outline of some of the most commonly used research designs in the second language field.
Pre-experimental designs (cf. Table 3):
• One-group pretest-posttest design: O1 X O2
A group of subjects is given a pretest followed by a treatment period and then a posttest to observe whether any change in performance has occurred.

• Static-group comparison:
    X O1
      O2
This design compares two groups, one of which receives the treatment and one of which does not.
In true experimental designs (cf. Table 3) the groups are randomly formed, allowing the assumption that they were equivalent at the beginning of the research.

• Randomized posttest-only control-group design:
    R X O1
    R   O2
This design is similar to the static-group comparison design except that the groups are randomly formed, therefore allowing the conclusion that significant differences between O1 and O2 are due to X.

• Randomized matched subjects posttest only
This design is similar to the randomized posttest-only control-group design except that instead of using random selection to obtain equivalent groups, it uses a matching technique. Subjects are matched on one or more variables that can be measured, such as IQ or placement test scores.
• Randomized pretest-posttest control group design:
    R O1 X O2
    R O3   O4
In this design the groups are randomly formed, but both groups are given a pretest as well as a posttest. The major purpose of this type of design is to determine the amount of change produced by the treatment; that is, does the experimental group change more than the control group?
• Solomon three-group design:
    R O1 X O2
    R O3   O4
    R    X O5
This design is similar to the randomized pretest-posttest control group design, but it has the advantage that it employs a second control group and thereby overcomes the difficulty inherent in the randomized pretest-posttest control group design, namely the interactive effect of pretesting and the experimental manipulation. This second control group is not pretested but is exposed to the X treatment.
• Solomon four-group design:
    R O1 X O2
    R O3   O4
    R    X O5
    R      O6
This design provides still more rigorous control by extending the three-group design to include one more control group. The purpose is explicitly to determine whether the pretest results in increased sensitivity of the subjects to the treatment. This design allows a replication of the treatment effect (is O2 > O4) and (is O5 > O6), an assessment of the amount of change due to the treatment (is O2-O1 > O4-O3), an evaluation of the testing effect (is O4 > O6) and an assessment of whether the pretest interacts with the treatment (is O2 > O5).

• Factorial designs:
    R X1 O1
    [...]
A factorial design is one in which two or more variables are manipulated simultaneously in order to study the independent effect of each variable on the dependent variable as well as the effects due to interactions among the various variables. In this case, three levels of the independent variable exist, where one is the control and X1 and X2 represent two levels of treatment.
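The comparisons listed above for the design with two experimental and two control groups (observations O1 to O6) can be summarized in a short sketch. It only computes differences between group means; whether a difference is statistically reliable would still have to be tested with an appropriate procedure. The function name, the dictionary keys and the scores are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence


def four_group_comparisons(
    o1: Sequence[float], o2: Sequence[float],
    o3: Sequence[float], o4: Sequence[float],
    o5: Sequence[float], o6: Sequence[float],
) -> Dict[str, float]:
    """Mean differences behind the comparisons named in the text.

    o1/o2: pretest and posttest of the pretested treatment group
    o3/o4: pretest and posttest of the pretested control group
    o5:    posttest of the unpretested treatment group
    o6:    posttest of the unpretested control group
    """
    m1, m2, m3, m4, m5, m6 = (mean(x) for x in (o1, o2, o3, o4, o5, o6))
    return {
        "treatment_effect_pretested": m2 - m4,        # is O2 > O4?
        "treatment_effect_unpretested": m5 - m6,      # is O5 > O6?
        "gain_difference": (m2 - m1) - (m4 - m3),     # is O2-O1 > O4-O3?
        "testing_effect": m4 - m6,                    # is O4 > O6?
        "pretest_treatment_interaction": m2 - m5,     # is O2 > O5?
    }


# Hypothetical scores, two subjects per group for brevity.
summary = four_group_comparisons(
    o1=[50, 52], o2=[60, 62],
    o3=[50, 50], o4=[53, 55],
    o5=[58, 60], o6=[52, 52],
)
for name, difference in summary.items():
    print(f"{name}: {difference:+.1f}")
```

A positive value answers the corresponding question in the affirmative at the descriptive level; the sketch deliberately stops short of any inferential test.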
The purpose of quasi-designs (cf. Table 3) is to fit the design to settings more like the real world while still controlling as many of the threats to internal validity as possible.
• Nonrandomized control-group pretest-posttest design:
    O1 X O2
    [...]
This design is similar to the randomized control-group pretest-posttest design except that in this case intact groups are used (e.g., classes in school).
• Time series: O1 O2 O3 O4 X O5 O6 O7 O8
This design has only one group but attempts to show that the change that occurs when the treatment is interjected differs from the time when it is not.

• Counterbalanced design:
This design can also be used with intact groups, and it rotates the groups at intervals during the experimentation. All subjects receive all experimental treatments at some time during the experiment.
Control is the essence of the quantitative method. Without control it is impossible to evaluate unambiguously the effects of an independent variable. In order to be able to draw a conclusion concerning the relationship of the independent variable and the dependent variable, it is necessary to control the effects of any extraneous variables. An extraneous variable is a variable not related to the purpose of the study, but which may affect the dependent variable (Brown, 1988). Control is the term used to indicate a researcher's procedures for eliminating the differential effects of all variables extraneous to the purpose of the study. The researcher exercises control, for instance, by making the groups comparable on extraneous variables that are related to the dependent variable. Other methods of control include: simple randomization, randomized matching, homogeneous selection and analysis of covariance. It is therefore important to take note of all the potentially influential variables (cf. section 5).
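Two of the control methods named above, simple randomization and randomized matching, can be sketched as follows. The function names, the pairing of adjacent subjects on a single matching score and the data are assumptions made for illustration; in particular, with an odd number of subjects this sketch simply leaves the last-ranked subject unassigned.

```python
import random
from typing import Dict, Hashable, Iterable, List, Optional, Tuple


def randomly_assign(subjects: Iterable[Hashable], n_groups: int = 2,
                    seed: Optional[int] = None) -> List[List[Hashable]]:
    """Simple randomization: shuffle the subjects, then deal them into groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]


def randomized_matching(scores: Dict[Hashable, float],
                        seed: Optional[int] = None) -> Tuple[list, list]:
    """Randomized matching on one measured variable (e.g. a placement score).

    Subjects are ranked on the matching score, adjacent subjects form a pair,
    and chance decides which member of each pair receives the treatment.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get)
    experimental, control = [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)
        experimental.append(pair[0])
        control.append(pair[1])
    return experimental, control


# Hypothetical placement-test scores for six learners.
placement = {"A": 41, "B": 77, "C": 43, "D": 75, "E": 60, "F": 58}
print(randomly_assign(placement, n_groups=2, seed=1))
print(randomized_matching(placement, seed=1))
```

Passing a seed makes the allocation reproducible, which helps when the procedure has to be reported in enough detail for another researcher to replicate it.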
The design of the study determines what statistical techniques should be used, not vice versa. In other words, one decides what design will enable one to observe the hypothesized relationships, then one selects the statistical procedure that fits the questions asked and the nature of the data involved. The appropriate statistic to use is determined partly by the type of measurement scale characterizing the dependent variable.

Analysis
The data analysis procedure must also be reported. In most quantitative studies some type of statistical analysis is used. Typically, the researcher will explain the proposed application of the statistics. In nearly all cases, descriptive statistics are provided, such as means and standard deviations for each of the variables. If correlational techniques (relationships among variables) are used, then the variables to be correlated and the techniques are named. Statistical analyses have many variants, and choosing one variant over another can dramatically affect the results. So it should be clear to the reader exactly which analyses were used and in what order. In other words, the analyses should be explained just as they were planned, step by step (Brown, 1988).
Brown (1992) states that assumptions are preconditions that are necessary for accurate application of a particular statistical test. In some cases, these assumptions are not optional; they must be met for the statistical test to be meaningful. It should be clear to the reader that the assumptions were checked and met. A few of the principal assumptions discussed by Brown (1992) are the following:
The assumption of independence of groups implies that there must be no association between the groups in a study. The most obvious violations of this assumption occur when the same people appear in more than one group.
A second assumption is independence of observations. This is often required for proper application of correlational and other statistics. Here, the assumption is that there is no association between the observations within a group.
Normality of the distributions is often required for proper application of statistical tests in mean comparisons. Violations of this assumption are less troublesome if the sample sizes are large. The distribution can be taken as normal if there is room for two or three standard deviations on either side of the mean and if there are no outliers (extremely large or small values).
Violations of the assumption of equal variances can be detected by examining the standard deviations in a study, because the variances are simply the standard deviations squared. If there are big differences in these squared values, there are probably violations of this assumption.
The assumption of linearity often applies in the correlational and prediction family of statistics. It means that there is a straight-line relationship between the two variables involved. This assumption can be checked by examining a scatterplot of the two variables.
The assumption of nonmulticollinearity is violated if the variables in a study are too highly interrelated. This assumption can be checked by examining a table of the correlation coefficients for each pair of variables in the study.
The final assumption of concern is that of homoscedasticity. This assumption, which is often applied to statistical procedures based on correlation and prediction, is that the variability of scores on one variable is about the same at all values of the other variable. This assumption can also be checked by examining a scatterplot of the variables involved.
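Some of the informal checks described above lend themselves to a short sketch: comparing squared standard deviations, counting how many standard deviations fit between the mean and the ends of the score scale, and tabulating the correlation for each pair of variables. The function names and data are hypothetical, and no cut-off values are built in, since the text leaves it to the reader to judge what counts as a 'big' difference or as 'too highly' interrelated.

```python
import math
from itertools import combinations
from statistics import mean, stdev
from typing import Dict, Sequence, Tuple


def variance_ratio(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Larger squared standard deviation divided by the smaller one."""
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    return max(var_a, var_b) / min(var_a, var_b)


def room_for_sds(scores: Sequence[float], lowest: float,
                 highest: float) -> Tuple[float, float]:
    """Standard deviations that fit below and above the mean on the score scale."""
    m, s = mean(scores), stdev(scores)
    return (m - lowest) / s, (highest - m) / s


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_table(variables: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Correlation coefficient for each pair of variables in the study."""
    return {(a, b): pearson_r(variables[a], variables[b])
            for a, b in combinations(variables, 2)}


# Hypothetical scores on three measures for six learners.
data = {
    "motivation": [12, 15, 11, 18, 14, 16],
    "anxiety": [30, 22, 33, 18, 27, 21],
    "proficiency": [55, 63, 50, 72, 60, 66],
}
print(variance_ratio(data["motivation"], data["anxiety"]))
print(room_for_sds(data["proficiency"], lowest=0, highest=100))
for pair, r in correlation_table(data).items():
    print(pair, round(r, 2))
```

Checks of linearity and homoscedasticity still call for inspecting a scatterplot, as the text notes; the figures produced here only flag where a closer look is needed.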

Conclusion
Reading and interpreting quantitative research in the second language acquisition field is, or should be, a creative and critical exercise. According to Brown (1988), it is creative in the sense that the reader must actively participate with the original researcher. It is, therefore, important that the study be replicable. This is perhaps the single most important yardstick to hold up against any study.
Teachers and students must be able to evaluate studies critically. If, for example, the basic research design or the primary statistical tests are faulty, the results may be meaningless. A sophisticated and critical audience can only help to improve research in the second language field.
Brown (1991) states that there are no guarantees that the articles that appear in print are 100% correct or uncontroversial. It is therefore the responsibility of students and teachers to read any articles that interest them as carefully and critically as they can, so that the interface between teaching and research can be strengthened.

Bibliography
R - Random assignment of subjects to groups.
O - An observation or test (subscripts refer to the order of testing, that is, O1 is the first time a test is given, while O2 is the second test administration).
X - Means a treatment is applied. (Subscripts X1, X2 on different lines refer to different treatments; subscripts on the same line mean the treatment is administered more than once; a blank space means the group is a control.)
- - - - A dotted line between groups means the groups are used intact rather than being randomly formed.
• One-shot case study: X O
A group of subjects receive a treatment followed by a test to evaluate the treatment.

Table 2 gives a brief summary of some of the common problems experienced with extraneous variables.

Table 2: Extraneous variables: Potential problems
Focus          Potential problems
Measurement    Practice effect (taking same test twice)
               Reactivity (different pre- and posttest-standard)
               Instability of measures and results
(Adapted from Brown, 1988)

Table 4: Checklist for the evaluation of the method section of a quantitative research study
Table 4 contains a checklist that researchers, students and teachers may find useful when writing or reading and evaluating a quantitative research study.