QUALITATIVE RESEARCH: AN ESSENTIAL PART OF STATISTICAL COGNITION RESEARCH

PAV KALINOWSKI
Statistical Cognition Laboratory, School of Psychological Science, La Trobe University
p.kalinowski@latrobe.edu.au

JERRY LAI
Statistical Cognition Laboratory, School of Psychological Science, La Trobe University
kj2lai@students.latrobe.edu.au

FIONA FIDLER
Statistical Cognition Laboratory, School of Psychological Science, La Trobe University
f.fidler@latrobe.edu.au

GEOFF CUMMING
Statistical Cognition Laboratory, School of Psychological Science, La Trobe University
g.cumming@latrobe.edu.au

ABSTRACT

Our research in statistical cognition uses both qualitative and quantitative methods. A mixed method approach makes our research more comprehensive, and provides us with new directions, unexpected insights, and alternative explanations for previously established concepts. In this paper, we review four statistical cognition studies that used mixed methods and explain the contributions of both the quantitative and qualitative components. The four studies investigated concern statistical reporting practices in medical journals, an intervention aimed at improving psychologists’ interpretations of statistical tests, the extent to which interpretations improve when results are presented with confidence intervals (CIs) rather than p-values, and graduate students’ misconceptions about CIs. Finally, we discuss the concept of scientific rigour and outline guidelines for maintaining rigour that should apply equally to qualitative and quantitative research.

Keywords: Statistics education research; Mixed methods; Scientific rigour; Qualitative analysis

1. MIXED METHODS IN STATISTICAL COGNITION

Statistical cognition refers to “the cognitive processes, representations, and activities involved in acquiring and using statistical knowledge,” as well as the research program that investigates these processes (Beyth-Marom, Fidler, & Cumming, 2008, p. 22).
In this way statistical cognition is similar to the discipline of cognition, which refers to both mental processes and the body of research investigating these processes. In this paper we describe how both quantitative and qualitative methods are used together in our statistical cognition research program.

Statistics Education Research Journal, 9(2), 22-34, http://www.stat.auckland.ac.nz/serj
International Association for Statistical Education (IASE/ISI), November, 2010

Regardless of whether research is quantitative or qualitative, we believe that researchers should describe the context of their work and their preconceptions and assumptions. For this reason, we begin this paper by stating that we are advocates of statistical reform in psychology; that is, we believe that the dichotomous thinking associated with Null Hypothesis Significance Testing (NHST) has damaged the progress of psychology and that estimation-based techniques, that is, effect sizes and confidence intervals (CIs), are better tools for statistical communication. However, we also believe that statistical reform should be evidence-based. As such, we believe that advocates of reform should provide empirical evidence that the alternatives to NHST that they promote are better communicators of inferential information and less prone to misinterpretation and misuse. Our statistical cognition program has produced some evidence in favour of CIs (e.g., Fidler & Loftus, 2009), but the four studies recounted here show that collecting such evidence is by no means straightforward!

Qualitative research is essential in fulfilling the goals of the statistical cognition program in at least two ways. First, it helps achieve fuller and more complete descriptions of phenomena. We illustrate this in the first two of our four studies: Fidler, Thomason, Cumming, Finch, and Leeman (2004) used a mixed approach to examine the effect of error bars in result interpretation in medical journals.
Faulkner (2005) used interviews to explore students’ preference and efficiency in interpreting CIs and NHST. Secondly, qualitative methods may be very useful in suggesting new directions for research. Our exploratory studies, open-ended questions, and interviews have yielded unexpected and novel insights and have led to new research programs. Again, two studies are offered as examples: Coulson, Healy, Fidler, and Cumming (2010) produced unexpected results when comparing researchers’ interpretations of NHST and CIs, which led to a new research program. Kalinowski (2010) explored student misconceptions of CIs using both qualitative and quantitative methods. Of course, qualitative methods have more to offer than just these two features (more complete description and new directions). In our account of the four studies that follows we will also illustrate how qualitative methods have helped correct our misinterpretations of quantitative results, and in other cases provided triangulation. Statistical reasoning is often fragile, and quantitative methods can fail to capture subtleties and layered misconceptions. For example, a quantitative survey may provide an indication of how many students have a false belief about some statistical concept, but not necessarily how they arrived at that false belief, or which other statistical concepts might be implicated. Qualitative methods can help us access processes and the mental models at work in the formation of misconceptions. Finally, we will address the issue of robustness in qualitative research. Qualitative methods are often mis-associated with terms such as subjective or biased. In reality, research judgment is an integral and important part of both quantitative and qualitative methods. 
In the final section of this paper we will explicate established guidelines (namely those of Elliott, Fisher, & Rennie, 1999) for maintaining rigour in qualitative research and argue that the same standards should also be expected of quantitative research.

2. ACHIEVING MORE COMPLETE DESCRIPTIONS OF PHENOMENA: FIDLER ET AL. (2004)

As mentioned above, one major goal of statistical reform in psychology is the replacement of NHST p-values with CIs. A common way to examine reform progress is via journal surveys on the prevalence of reporting practices (e.g., Cumming et al., 2007; Thompson & Snyder, 1997). Such surveys provide quantitative estimates of the extent or lack of change in statistical practice. In psychology, such journal surveys have consistently demonstrated little change in response to reformers’ calls for downplaying NHST. In medicine, by contrast, changes have been reasonably dramatic, starting in the mid-1980s when several journal editors enforced new reporting policies.

Fidler et al. (2004) investigated changes in medicine by surveying statistical practices in two medical journals, the American Journal of Public Health (AJPH) and Epidemiology. Both journals were subject to strict editorial policies from then-editor Kenneth Rothman that eschewed p-values and encouraged use of CIs.

Quantitative. The quantitative component of this study recorded the proportion of articles reporting p-values versus CIs. Results revealed a dramatic increase in the uptake of CIs under Rothman’s editorship, from 10% pre-Rothman (1983) to 60% at the peak of his influence (1987). There was a corresponding drop in p-value reporting: from 63% in 1982 to just 6% in 1986–1989. In Epidemiology, the influence of Rothman’s policy was even more striking: 94% of articles reported CIs in 2000 and none reported p-values. From the quantitative survey alone it seemed that statistical reform in medicine had been quite successful.
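As a side illustration of the estimation-based reporting discussed above, a survey proportion such as the share of articles reporting CIs can itself be accompanied by an interval. The following is a minimal sketch in Python of a Wilson score interval for a proportion. The function name `wilson_interval` and the article counts are hypothetical; the text above gives percentages only, so the counts are not taken from Fidler et al. (2004), and this is not presented as the method those authors used.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of observations")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")

    p_hat = successes / n
    denom = 1 + z**2 / n
    # The centre is pulled slightly towards 0.5 relative to the raw proportion.
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts for illustration only (NOT from the surveyed journals).
articles_surveyed = 50
articles_reporting_ci = 30

low, high = wilson_interval(articles_reporting_ci, articles_surveyed)
share = articles_reporting_ci / articles_surveyed
print(f"{share:.0%} of articles reported CIs, 95% CI [{low:.2f}, {high:.2f}]")
```

The width of the resulting interval conveys the precision of the survey estimate, which is the kind of information the authors argue a bare percentage or p-value leaves out.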
Qualitative. The qualitative component examined the interpretation of results, in particular, how the increase in CI reports changed the way authors discussed their results. Did they now reflect on the width of the CI and talk about issues of statistical power/precision (we know they didn’t with p-values!)?

Conclusion. Results from the qualitative analysis revealed that, despite the frequent reporting of CIs, incidences of CI interpretation were rare. Of the articles reporting CIs, the vast majority still made their interpretations in NHST terms: They continued to make references to the null hypothesis and to discuss results in terms of significant and/or non-significant. In many ways, the discussion sections of these papers were identical to those in p-value papers. In other words, CIs had been reported (added to tables, text, and occasionally figures) to fulfill editorial hurdles, but they had made little impact on how researchers thought about and interpreted their results.

The discrepancy between the proportion of reporting (the quantitative component of the study) and incidences of interpretation (the qualitative component of the study) revealed that the seemingly successful statistical reform in medicine was in fact relatively superficial. In this study the use of mixed methods revealed a more complete picture: Medical researchers conformed to the new reporting policy and included CIs in their papers, but there had been no substantial cognitive change from dichotomous NHST thinking to CI estimation-based thinking. Fidler et al. (2004) concluded that “editors can lead researchers to confidence intervals, but can’t make them think” (p. 119).

3. ACCESS TO PROCESSES AND REASONING: FAULKNER (2005)

Qualitative methods help describe complex mental processes and reasoning that are difficult to examine with quantitative methods alone. Faulkner (2005) provides an example.
Faulkner aimed to improve probationary psychologists’ interpretation of the outcomes of Randomized Control Trials (RCT). The study was again motivated by the argument that CIs are easier to understand than NHST, and can elicit more comprehensive and adequate interpretations (e.g., Schmidt, 1996; Schmidt & Hunter, 1997).

Thirty-five probationary psychologists took part in a teaching intervention, which consisted of one-to-one tutorials on how to interpret various RCT outcomes. In some RCT scenarios results were presented as NHST p-values and in others exactly the same results were presented as CIs. Immediately after the intervention, the participants completed two tasks. First, the participants rated their preference for each of the two presentation styles on Likert scales (quantitative). Second, they wrote short interpretations of results of some new RCT scenarios in their own words (qualitative).

Quantitative. Students rated their preference for NHST or CI presentation on a 7-point Likert scale (e.g., 1=strongly prefer CI format, 4=indifferent, 7=strongly prefer NHST format). Overall, 75% of participants expressed a preference (i.e., strongly, somewhat, or slightly preferred) towards the CI format. Only a minority of participants (25%) had any level of preference for the NHST format.

Qualitative. Students wrote short interpretations of RCT results presented as either CIs or NHST p-values in their own words. We coded and analysed their texts. In our analysis of qualitative data, we considered the comprehensiveness, structure, and quality of their descriptions. For comprehensiveness, we looked at the number of descriptions containing the following five components: (1) the direction of effect, (2) effect size, (3) clinical significance, (4) difference between groups/statistical significance, and (5) power/precision (interval width). To analyse structure we looked at how similar each of the students’ responses were.
Was there a routine answer, or a lot of variation in their responses? Finally, for quality we examined whether qualifying and linking statements were used to make conceptual connections between the five components in the comprehensiveness list above. For both NHST and CI presentations of results, students’ descriptions were surprisingly comprehensive, with above 90% of students mentioning components (1) to (4). The only substantial difference between the presentation formats was in how often students mentioned (5) power/precision. When results were presented as NHST, only 70% of students made mention of power/precision; when results were presented as CIs, 97% of students did. The analysis of structure revealed that participants generally resorted to a rigid interpretational routine when presented with NHST. CI descriptions in comparison were more varied in both content and order. Table 1 provides some typical examples of interpretations of the two formats. As mentioned, when assessing quality we looked for qualifying and linking statements that reflected conceptual connections between the components listed above. In other words, we searched students’ answers for any extra elements within the NHST and CI descriptions that were not part of the tutorial instructions. Qualifying statements included statements such as “a large effect size is good” or “clinical significance of 50% is encouraging.” Examples of linking statements included “effect size is large leading to a clinically significant effect” and “non-statistically significant results were due to low power.” Examples of overall conclusions included “therapy has a good effect overall” and “I would use Therapy A because it appeared to have a greater effect.” On average, these extra elements were found in 90% of descriptions of CI results, compared to only 15% of descriptions of NHST results. 
In sum, the qualitative analysis in Faulkner’s (2005) study supported the argument that CIs can elicit better, more insightful interpretations.
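The comprehensiveness analysis described above amounts to tallying, for each presentation format, how often each of the five coded components appears. The following is a minimal sketch of that tally in Python, assuming each written response has already been coded by hand into a set of component labels. The label names, the helper `component_rates`, and the coded responses are all hypothetical and are not Faulkner’s (2005) data or coding scheme.

```python
from collections import Counter

# Labels for the five components listed in the text (names are assumptions).
COMPONENTS = (
    "direction",
    "effect_size",
    "clinical_significance",
    "statistical_significance",
    "power_precision",
)

# Hypothetical coded responses: (presentation format, components the coder found).
coded_responses = [
    ("NHST", {"direction", "effect_size", "clinical_significance", "statistical_significance"}),
    ("NHST", set(COMPONENTS)),
    ("NHST", {"direction", "effect_size", "clinical_significance", "statistical_significance"}),
    ("NHST", {"direction", "statistical_significance", "power_precision"}),
    ("CI", set(COMPONENTS)),
    ("CI", set(COMPONENTS)),
    ("CI", {"direction", "effect_size", "power_precision"}),
]


def component_rates(responses):
    """Return, per format, the share of responses mentioning each component."""
    totals = Counter()   # number of responses per format
    mentions = {}        # format -> Counter of component mentions
    for fmt, found in responses:
        unknown = set(found) - set(COMPONENTS)
        if unknown:
            raise ValueError(f"unknown component labels: {sorted(unknown)}")
        totals[fmt] += 1
        mentions.setdefault(fmt, Counter()).update(found)
    return {
        fmt: {c: mentions[fmt][c] / totals[fmt] for c in COMPONENTS}
        for fmt in totals
    }


rates = component_rates(coded_responses)
for fmt, by_component in rates.items():
    summary = ", ".join(f"{c}: {share:.0%}" for c, share in by_component.items())
    print(f"{fmt}: {summary}")
```

A tally like this covers only comprehensiveness. The structure and quality analyses depend on reading the order and wording of each response, which is where the coders’ judgment does the work.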