
The testing effect

by Simon Moss

Introduction

After learning material, students often need to complete tests or exams. Interestingly, tests do not merely assess what students have learned; they have also been shown to facilitate learning—a phenomenon called the testing effect (see Carpenter, Pashler, & Vul, 2006; McDaniel, Roediger, & McDermott, 2007; Roediger & Karpicke, 2006a, 2006b). Somehow, the tests themselves activate retrieval processes that ultimately enhance the learning and memory of study material. If educators understand the features and causes of the testing effect, they can apply this information to enhance student learning (e.g., Shobe, 2022).

 

Variations of the testing effect

Campbell and Mayer (2009) illustrated the testing effect within the format of lectures. In this study, participants attended a lecture in which 25 slides were presented. The material revolved around the effects of feedback on learning. After four of the slides, a multiple-choice question was presented. To answer the question correctly, participants needed to integrate information from multiple slides. Each participant was granted access to an electronic device with which they could submit their response anonymously. After the aggregated responses were displayed, the lecturer justified the correct answer.

 

Relative to participants who had not been exposed to the multiple-choice questions, the students who did answer these questions performed more proficiently on a subsequent exam. This effect was observed when the exam comprised short-answer questions demanding retention rather than transfer.

 

Explanations of the testing effect

Theories that explain how students learn can be adapted to explain the testing effect. To illustrate, according to Mayer (2001, 2005, 2008), three sets of cognitive operations underpin meaningful learning. First, individuals need to orient their attention to relevant information, called selecting. Second, individuals need to combine this information, called organizing. Finally, individuals need to integrate this consolidated information with existing knowledge, called integrating.

 

Tests can facilitate each of these processes. For example, in anticipation of forthcoming questions, individuals need to orient their attention to information that might be germane to future tests—an act that facilitates selecting (Campbell & Mayer, 2009). Similarly, to answer the questions themselves, individuals often need to combine information from multiple topics, organizing and integrating their knowledge (Campbell & Mayer, 2009). Finally, when individuals receive feedback, they often need to adjust their assumptions, further enhancing the organization and integration of information (Campbell & Mayer, 2009).

 

Alternatively, the testing effect may be ascribed to an increase in the number of retrieval routes. For example, if people learn information in a specific room, cues that evoke memories of this room may also activate this information. If people later test themselves on the material in a different context, cues that evoke memories of this setting will also activate this information. A broader range of cues, therefore, will prompt these memories (Bjork, 1975; for a distinct, but related, mechanism, see McDaniel & Masson, 1985).

 

Many other mechanisms may also contribute to the testing effect. To illustrate, as research has confirmed, when students test themselves, their attention is not as likely to wander (Wong & Lim, 2022). In contrast, when students merely attempt to learn the information again, rather than test themselves, their minds are more likely to wander.

 

Practices that amplify or diminish the testing effect: Retrieval versus recognition tests

Some tests, such as multiple-choice examinations, demand recognition rather than recall. That is, participants merely need to recognize the correct answer from a range of alternatives. Other tests, such as examinations in which the answers entail paragraphs or essays, demand recall. In general, tests that demand recall amplify the testing effect (Butler & Roediger, 2007; Glover, 1989; Kang, McDermott, & Roediger, 2007).

 

These findings are consistent with the argument that retrieval that demands effort and concentration might enhance, or even underpin, the testing effect—called the theory of retrieval difficulty (Bjork & Bjork, 1992). Indeed, as this argument would predict, when the test is delayed rather than immediate, the testing effect is more pronounced (Jacoby, 1978; Modigliani, 1976; Pashler, Zarow, & Triplett, 2003).

 

According to the theory of retrieval difficulty (Bjork & Bjork, 1992), the capacity of individuals to retrieve information depends on two considerations: storage strength and retrieval strength. Storage strength refers to the enduring accessibility of the information, partly related to the frequency of the words or concepts. Retrieval strength refers to the momentary accessibility of the information.

 

Interestingly, according to this theory, if retrieval strength is high, the storage strength of this information will not increase appreciably. If retrieval strength is low, the storage strength of this information will increase to a larger extent. For example, as Agarwal, Karpicke, Kang, Roediger III, and McDermott (2008) argue, if the information is available when individuals complete the initial test—that is, an open book test—the retrieval strength is high. That is, the information is very accessible. Consequently, the storage strength of this information will not improve appreciably. Hence, this information might not be as accessible after a delay.
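
To make this relationship concrete, the following minimal Python sketch simulates a toy version of the idea. The inverse relationship between retrieval strength and the gain in storage strength, the function name, and the numbers are all illustrative assumptions for this example, not the formal model that Bjork and Bjork (1992) propose.

```python
# Toy model of the claimed relationship (an illustrative assumption, not the
# formal model of Bjork & Bjork, 1992): the lower the current retrieval
# strength of an item, the larger the gain in storage strength when the item
# is successfully retrieved on a test.

def storage_gain(retrieval_strength: float, max_gain: float = 1.0) -> float:
    """Hypothetical gain in storage strength after a successful retrieval.

    retrieval_strength: momentary accessibility of the item, between 0 and 1.
    """
    return max_gain * (1.0 - retrieval_strength)

# An open book test leaves retrieval strength high, so little is gained;
# a delayed, closed book test forces effortful retrieval, so more is gained.
for label, strength in [("open book (easy retrieval)", 0.9),
                        ("closed book, delayed (effortful retrieval)", 0.2)]:
    print(f"{label}: storage gain = {storage_gain(strength):.2f}")
```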

 

Practices that amplify or diminish the testing effect: Delayed versus immediate feedback

After individuals complete a test, feedback can be immediate or delayed. Delayed feedback seems to magnify the testing effect (Bangert-Drowns, Kulik, Kulik, & Morgan, 1991). For example, as Schmidt, Young, Swinnen, and Shapiro (1989) showed, delayed rather than immediate feedback was more likely to facilitate later retention in a motor learning task. Hence, the testing effect is more pronounced if feedback is delayed until after the individual has completed an entire exam rather than presented after each answer.

 

Practices that amplify or diminish the testing effect: Open versus closed book tests

Several studies have examined whether tests, primarily administered to facilitate learning, should be open book or closed book. In contrast to open book tests, during closed book tests, students are not permitted to consult their notes or textbooks during the examination.

 

Several arguments imply that open book tests might be superior. First, in most circumstances, open book tests are designed to encourage more advanced cognitive operations, such as problem solving and reasoning rather than rote memorization (Feller, 1994; Jacobs & Chase, 1992; see also Eilertsen & Valdermo, 2000). These advanced cognitive operations might facilitate the capacity of individuals to memorize and apply the material in different contexts.

 

Second, when individuals prepare before the exam, open book tests might promote less stress and anxiety (Theophilides & Dionysiou, 1996; Theophilides & Koutselini, 2000). The stress and anxiety provoked by closed book tests might reduce the capacity of individuals to relate the material to broader knowledge structures. That is, stress and anxiety inhibit many knowledge structures (Kuhl, 2000).

 

Third, open book tests might provoke fewer errors of commission than closed book tests. That is, during open book tests, individuals are not as likely to entertain incorrect facts about the topic, because they can access relevant knowledge during the examination. Hence, during open book tests, individuals are less inclined to integrate false information into their knowledge structures (Butler, Marsh, Goode, & Roediger, 2006; Roediger & Marsh, 2005).

 

In contrast, some arguments imply that closed book examinations might enhance the testing effect relative to open book examinations. Closed book tests might demand more retrieval effort. When retrieval demands effort, the testing effect might be amplified (e.g., Bjork, 1999; Karpicke & Roediger, 2007a). Consistent with this proposition, relative to recognition tests, recall tests—an approach that demands more effort—tend to amplify the testing effect (Butler & Roediger, 2007; Glover, 1989; Kang, McDermott, & Roediger, 2007).

 

In addition, when individuals complete open book tests, they often receive more immediate feedback about their performance. While they complete the examination, they can ascertain whether some of their initial assumptions were correct. In contrast, when individuals complete closed book tests, feedback is almost invariably delayed. Delayed feedback has also been shown to amplify the testing effect (Bangert-Drowns, Kulik, Kulik, & Morgan, 1991).

 

Agarwal, Karpicke, Kang, Roediger III, and McDermott (2008) conducted a study to compare the effects of open book tests and closed book tests. In their first study, participants read six passages, each about 1000 words in length, from a textbook. In the first session, six different study conditions were arranged. In the second session, one week later, the final test was presented: a closed book test. During the study session, some participants merely studied the material. Other participants completed a closed book test or an open book test. Some of the participants who completed the closed book tests also evaluated their own performance later, with the passage available. Some of the participants who completed the open book tests completed this exam while studying.

 

Overall, a testing effect was observed. Nevertheless, if closed book tests were administered, the testing effect was more pronounced when individuals evaluated their own performance, although this effect could be ascribed to additional exposure to the material. Closed and open book tests generated comparable levels of performance on the delayed test, administered one week after the study session.

 

Practices that amplify or diminish the testing effect: High stakes during the initial tests

When the stakes to perform well on the initial test are high, evoking pressure, the testing effect actually dissipates. To illustrate, in one study conducted by Hinze and Rapp (2014), participants read various passages of text about biology. To increase the stakes, some participants were told they would receive an extra $5 if, later, they and another person performed well on a subsequent quiz of this material. In addition, they were informed the other person had already performed well—an instruction that raised their pressure to excel. Then, over the next 5 minutes, some but not all participants completed various quizzes to test their learning of this material. Finally, 7 days later, participants returned to complete a final test that assessed understanding of this material. Before completing this test, all participants were informed they would receive the $5 regardless of performance.

 

Performance on the quizzes did not depend on whether the stakes were elevated or not. So, the stakes did not affect initial retrieval. Participants who completed the quizzes, on average, performed better on the final test, consistent with the testing effect. However, the benefit of this quiz was not observed when the stakes had been raised.

 

Arguably, the testing effect can be ascribed to the possibility that, during the initial quizzes, participants tend to reclassify and elaborate the material. These cognitive operations demand effort and concentration. The elevated stakes and performance pressure may compromise the motivation of individuals to engage in these operations and, therefore, could diminish the testing effect.

 

Practices that amplify or diminish the testing effect: Emotional events after the test

The testing effect is often ascribed to cognitive operations that coincide with the retrieval of this material during the test. However, as Finn and Roediger (2011) showed, even cognitive operations after the retrieval of this material could amplify the testing effect. Specifically, if emotional images follow the first test, the testing effect is amplified.

 

To illustrate, in one study, participants who spoke English first learnt various Swahili words. In particular, these participants received a series of Swahili words, each presented alongside the English translation, such as lulu-pearl. Then, participants received the initial test. That is, some of the Swahili words appeared alone. Participants were then asked to retrieve the English translation. Finally, they completed the final assessment, similar to the first test, but comprising more words.

 

During the initial test, after each correct answer, no picture, a blank picture, or a distressing picture was presented. If the picture was distressing, participants were especially likely to remember the English translation of that word in the final assessment as well. That is, these upsetting photographs magnified the testing effect. These distressing pictures were effective even if presented 2 seconds after the words appeared—but only if the participants had retrieved the right answer.

 

According to Finn and Roediger (2011), these emotional images activate limbic regions, such as the amygdala, which in turn activate the hippocampus. The hippocampus facilitates the capacity of individuals to remember these pairs of words later. Nevertheless, emotional pictures during the learning of these words can be disruptive, because attention may be diverted from the material that needs to be memorized.

 

Related techniques: The clicker technique

The clicker technique is a teaching method that has been shown to facilitate learning. During a workshop, the instructors present multiple-choice questions, assessing knowledge or opinions. The students or participants use a hand-held device, called a clicker, to indicate their response. The distribution of responses is then presented. This technique has been shown to expedite the rate of learning (Anderson, Healy, Kole, & Bourne, 2013).

 

The clicker technique offers two key benefits. First, instructors do not need to devote time to material that participants already know. Second, the clicker technique generates the same benefits as other tests. The combination of studying vital material and retesting has been shown to be more beneficial than either facet alone (Anderson, Healy, Kole, & Bourne, 2013). Anderson, Healy, Kole, and Bourne (2013) showed the clicker technique promotes learning that generalizes to other domains and compresses the time needed to teach material.

 

Practical implications

After individuals learn material, they should receive an optional test. The test could include some open-book or multiple-choice questions, to facilitate initial confidence, as well as closed-book questions, involving short answers, to facilitate future retention. Feedback should be presented after the test is completed. 

 

The testing effect can be applied in many circumstances. For example, as Jonsson et al. (2021) revealed, the degree to which the testing effect is useful does not depend on the cognitive ability of students. Specifically, when individuals retrieve information, activation of the inferior frontal gyrus in the brain increases. This increase is more pronounced if individuals had tested themselves earlier. The left inferior frontal gyrus tends to activate semantic representations that are distributed across the brain, vital to retrieval.

 

The testing effect, although robust, might not be useful in all topics.  For example, as Ferreira and Wimber (2021) revealed, the testing effect tends to be more pronounced when students need to learn verbal information rather than pictorial or spatial information. More precisely, if students need to memorize a series of abstract shapes, the testing effect dissipates.  If students need to memorize a set of meaningful images, the testing effect is observed. 

 

References

  • Agarwal, P. K., Karpicke, J. D., Kang, S. H. K., Roediger III, H. L., & McDermott, K. B. (2008). Examining the testing effect with open- and closed-book tests. Applied Cognitive Psychology, 22, 861-876.

  • Anderson, L. S., Healy, A. F., Kole, J. A., & Bourne, L. E. (2013). The clicker technique: Cultivating efficient teaching and successful learning. Applied Cognitive Psychology, 27, 222-234. doi: 10.1002/acp.2899

  • Baillie, C., & Toohey, S. (1997). The power test: Its impact on student learning in a materials science course for engineering. Assessment & Evaluation in Higher Education, 22, 33-49.

  • Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe, & A. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 185-205). Cambridge, MA: MIT Press.

  • Bjork, R. A. (1999). Assessing our own competence: Heuristics and illusions. In D. Gopher, & A. Koriat (Eds.), Attention and performance XVII. Cognitive regulation of performance: Interaction of theory and application (pp. 435-459). Cambridge, MA: MIT Press.

  • Bjork, R. A., & Bjork, E. L. (1992). A new theory of disuse and an old theory of stimulus fluctuation. In A. Healy, S. Kosslyn, & R. Shiffrin (Eds.), From learning processes to cognitive processes: Essays in honor of William K. Estes (Vol. 2, pp. 35-67). Hillsdale, NJ: Erlbaum.

  • Butler, A. C., & Roediger, H. L. (2007). Testing improves long-term retention in a simulated classroom setting. European Journal of Cognitive Psychology,

  • Butler, A. C., Marsh, E. J., Goode, M. K., & Roediger, H. L. (2006). When additional multiple-choice lures aid versus hinder later memory. Applied Cognitive Psychology, 20, 941-956.

  • Campbell, J., & Mayer, R. E. (2009). Questioning as an instructional method: Does it affect learning from lectures? Applied Cognitive Psychology, 23, 747-759.

  • Carpenter, S. K., Pashler, H., & Vul, E. (2006). What types of learning are enhanced by a cued recall test? Psychonomic Bulletin & Review, 13, 826-830.

  • Chan, J. C. K., McDermott, K. B., & Roediger, H. L. (2006). Retrieval-induced facilitation: Initially nontested material can benefit from prior testing of related material. Journal of Experimental Psychology: General, 135, 553-571.

  • Cnop, I., & Grandsard, F. (1994). An open-book exam for non-mathematics majors. International Journal of Mathematical Education in Science and Technology, 25, 125-130.

  • Dunlosky, J., & Nelson, T. O. (1992). Importance of the kind of cue for judgments of learning (JOL) and the delayed-JOL effect. Memory & Cognition, 20, 374-380.

  • Eilertsen, T. V., & Valdermo, O. (2000). Open-book assessment: A contribution to improved learning? Studies in Educational Evaluation, 26, 91-103.

  • Feller, M. (1994). Open-book testing and education for the future. Studies in Educational Evaluation, 20, 235-238.

  • Ferreira, C. S., & Wimber, M. (2021). The testing effect for visual materials depends on pre-existing knowledge.

  • Finn, B., & Roediger, H. L. (2011). Enhancing retention through reconsolidation: Negative emotional arousal following retrieval enhances later recall. Psychological Science, 22, 781-786. doi:10.1177/0956797611407932

  • Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32, 221-233.

  • Glover, J. A. (1989). The testing phenomenon: Not gone, but nearly forgotten. Journal of Educational Psychology, 81, 392-399.

  • Hinze, S. R., & Rapp, D. N. (2014). Retrieval (sometimes) enhances learning: performance pressure reduces the benefits of retrieval practice. Applied Cognitive Psychology, 28, 597-606. doi: 10.1002/acp.3032

  • Hogan, R. M., & Kintsch, W. (1971). Differential effects of study and test trials on long-term recognition and recall. Journal of Verbal Learning & Verbal Behavior, 10, 562-567.

  • Ioannidou, M. K. (1997). Testing and life-long learning: Open-book and closed-book examination in a university course. Studies in Educational Evaluation, 23, 131-139.

  • Jacoby, L. L. (1978). On interpreting the effects of repetition: Solving a problem versus remembering a solution. Journal of Verbal Learning & Verbal Behavior, 17, 649-667.

  • Jonsson, B., Wiklund-Hörnqvist, C., Stenlund, T., Andersson, M., & Nyberg, L. (2021). A learning method for all: The testing effect is independent of cognitive ability. Journal of Educational Psychology, 113(5), 972.

  • Kang, S. H. K., McDermott, K. B., & Roediger, H. L. (2007). Test format and corrective feedback modify the effect of testing on long-term retention. European Journal of Cognitive Psychology, 19, 528-558.

  • Karpicke, J. D., & Roediger, H. L. (2007a). Expanding retrieval promotes short-term retention, but equally spaced retrieval enhances long-term retention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 704-719.

  • Karpicke, J. D., & Roediger, H. L. (2007b). Repeated retrieval during learning is the key to long-term retention. Journal of Memory and Language, 57, 151-162.

  • Koriat, A. (1997). Monitoring one's own knowledge during study: A cue-utilization approach to judgments of learning. Journal of Experimental Psychology: General, 126, 349-370.

  • Koriat, A., Bjork, R. A., Sheffer, L., & Bar, S. K. (2004). Predicting one's own forgetting: The role of experience-based and theory-based processes. Journal of Experimental Psychology: General, 133, 643-656.

  • Koriat, A., Sheffer, L., & Ma'ayan, H. (2002). Comparing objective and subjective learning curves: Judgments of learning exhibit increased underconfidence with practice. Journal of Experimental Psychology: General, 131, 147-162.

  • Kuhl, J. (2000). A functional-design approach to motivation and volition: The dynamics of personality systems interactions. In M. Boekaerts, P. R. Pintrich, & M. Zeidner (Eds.), Self-regulation: Directions and challenges for future research (pp. 111-169). New York: Academic Press.

  • Mayer, R. E. (1975). Forward transfer of different reading strategies due to test-like events in mathematics text. Journal of Educational Psychology, 67, 165-169.

  • Mayer, R. E. (2001). Multimedia learning. New York: Cambridge University Press.

  • Mayer, R. E. (2005). The cognitive theory of multimedia learning. In R. E. Mayer (Ed.), The Cambridge handbook of multimedia learning (pp. 31-48). New York: Cambridge University Press.

  • Mayer, R. E. (2008). Learning and instruction (2nd ed.). Upper Saddle River, NJ: Merrill Prentice Hall Pearson.

  • McDaniel, M. A., & Masson, M. E. J. (1985). Altering memory representations through retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 371-385.

  • McDaniel, M. A., Anderson, J. L., Derbish, M. H., & Morrisette, N. (2007). Testing the testing effect in the classroom. European Journal of Cognitive Psychology, 19, 494-513.

  • McDaniel, M. A., Roediger, H. L., & McDermott, K. B. (2007). Generalizing test-enhanced learning from the laboratory to the classroom. Psychonomic Bulletin & Review, 14, 200-206.

  • Modigliani, V. (1976). Effects on a later recall by delaying initial recall. Journal of Experimental Psychology: Human Learning & Memory, 2, 609-622.

  • Mulligan, N. W., Buchin, Z. L., & West, J. T. (2021). Attention, the testing effect, and retrieval-induced forgetting: Distraction dissociates the positive and negative effects of retrieval on subsequent memory. Journal of Experimental Psychology: Learning, Memory, and Cognition.

  • Pashler, H., Zarow, G., & Triplett, B. (2003). Is temporal spacing of tests helpful even when it inflates error rates? Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 1051-1057.

  • Pauker, J. D. (1974). Effect of open book examinations on test performance in an undergraduate child psychology course. Teaching of Psychology, 1, 71-73.

  • Roediger, H. L., & Karpicke, J. D. (2006a). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1, 181-210.

  • Roediger, H. L., & Karpicke, J. D. (2006b). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17, 249-255.

  • Roediger, H. L., & Marsh, E. J. (2005). The positive and negative consequences of multiple-choice testing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1155-1159.

  • Schmidt, R. A., Young, D. E., Swinnen, S., & Shapiro, D. C. (1989). Summary knowledge of results for skill acquisition: Support for the guidance hypothesis. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 352-359.

  • Shobe, E. (2022). Achieving testing effects in an authentic college classroom. Teaching of Psychology, 49(2), 164-175.

  • Sun, J., Liu, Y., & Guo, C. (2022). The impacts of the processing levels on testing effect. NeuroReport, 33(9), 369-379.

  • Theophilides, C., & Dionysiou, O. (1996). The major functions of the open-book examination at the university level: A factor analytic study. Studies in Educational Evaluation, 22, 157-170.

  • Theophilides, C., & Koutselini, M. (2000). Study behavior in the closed-book and open-book examination: A comparative analysis. Educational Research and Evaluation, 6, 379-393.

  • Wheeler, M. A., Ewers, M., & Buonanno, J. F. (2003). Different rates of forgetting following study versus test trials. Memory, 11, 571-580.

  • Whitten, W. B., & Bjork, R. A. (1977). Learning from tests: Effects of spacing. Journal of Verbal Learning & Verbal Behavior, 16, 465-478.

  • Wong, S. S. H., & Lim, S. W. H. (2022). A mind-wandering account of the testing effect: Does context variation matter? Psychonomic Bulletin & Review, 29(1), 220-229.


How examiners assess the quality of theses

by Simon Moss

Introduction

Graduate researchers, such as doctoral candidates, generally need to submit a thesis.  Usually, two or more specialists in the field will examine this thesis.  Because this examination is the main determinant of whether the degree will be conferred, candidates and their supervisors are obviously interested in the main qualities that examiners judge, or the criteria they utilize, to evaluate this thesis. 

 

One complication is that the qualities examiners judge may vary across nations and disciplines, as well as depend on the experience and inclinations of individuals. To illustrate, as Chetcuti et al. (2022) revealed, the criteria that examiners utilize to evaluate theses vary across fields of research. For example, in all disciplines, examiners judge whether the thesis significantly contributes to the literature. However, definitions of significant contribution differ between the humanities and the sciences. Typically

 

  • in the humanities, significant contribution refers to research that enhances theoretical knowledge in the field

  • in the sciences and health, significant contribution often refers more to studies that benefit society.    

 

This variation across nations and disciplines may concern candidates and their supervisors because

 

  • at least one of the examiners is likely to live in another nation; indeed, in about 25% of Australian universities, one examiner must be from another nation

  • research is increasingly likely to span multiple disciplines

 

Fortunately, as Golding et al. (2014) underscored, the main qualities that examiners prioritize seem relatively consistent across the globe. Indeed, Golding et al. suggested that examiners tend to be consistent with one another. They tend to be, to a significant extent, sensitive to the same qualities—regardless of the instructions they receive, the discipline in which they work, or their experience in examinations.

 

Even their recommendations tend to be consistent. As Holbrook et al. (2008) revealed, in 96% of the theses they examined, the recommendations of examiners were consistent. That is, on 96% of these occasions, both examiners either approved the thesis with no revisions or minor revisions, or both recommended the thesis be revised and resubmitted or failed. If the examiners observe an oral presentation, this level of agreement tends to increase. This level of consistency, however, declines when the research is interdisciplinary (Mitchell & Willetts, 2009).

These qualities may not be as relevant to exegeses of creative works. Therefore, this document will not discuss theses that comprise creative outputs.

 

How examiners judge theses: Overview of the qualities they prioritize

To clarify the expectations of theses, many institutions distribute precise instructions to examiners. For example, at about half the universities in Australia, examiners receive instructions that specify the criteria they should apply to evaluate the thesis (Dally et al., 2019). These criteria often include

 

  • a systematic, comprehensive review of the literature

  • methodology that is rigorous and relevant to the research question

  • a discussion that connects the results to the research question

  • at least some work that could be published

  • writing that is coherent, accurate, concise, and authoritative

  • demonstration of the capacity to apply research skills and knowledge independently

 

Yet, as Chetcuti et al. (2022) show, these criteria do not encompass all the principles that examiners apply to evaluate theses. That is, examiners have typically decided which qualities they will assess before they receive these instructions. Fortunately, many researchers have attempted to characterize the main qualities of theses to which examiners are especially sensitive (e.g., Chetcuti et al., 2022). Golding et al. (2014), after completing a review of 30 publications on this topic, divided these qualities into five key themes.

 

First, examiners are sensitive to the degree to which a thesis is coherent. They want to feel that all of the arguments are arranged in a logical order and cohere around one argument that evolves gradually across the thesis. Yet, because examiners often read the thesis in spurts, they appreciate short reiterations, summaries, and signposts that explicitly connect each section—such as a chapter—to both the previous section and the overarching argument.

 

Second, examiners are sensitive to the degree to which candidates seem engaged in the literature.  Rather than only evaluate whether the candidate seems informed—that is, attuned to the main theories, findings, approaches, and perspectives on the topic—examiners also appraise whether the candidate has reviewed the literature critically (Holbrook et al., 2007).  As evidence of this critical engagement, examiners consider

 

  • whether the candidate has referred to the similarities or differences between diverse strands of literature; some sections should include studies from diverse fields but on an overlapping topic

  • whether the candidate has arranged and discussed various studies or theories in a unique order, uncovering a novel perspective

  • whether the candidate has proposed a distinct interpretation of past research, such as generating a conclusion from multiple studies—a conclusion that no single project would have unearthed

  • whether the candidate has referred to some limitations of past studies, such as alluding to ambiguity about the cause and effect underlying some statistical association

  • whether the candidate has underscored some of the conflicting findings or perspectives in the literature

 

Ultimately, according to examiners, the literature review should achieve several goals simultaneously. In particular, the literature review should present an engaging narrative or story that

 

  • justifies the importance, prevalence, or implications of the problem or issue the research is designed to solve

  • outlines past attempts—such as theories, interventions, or approaches—to solve this issue, and summarizes the evidence that supports or conflicts with these attempts

  • ends with a summary and justification of the approach this research will adopt, and

  • if possible, includes some criteria to judge the contribution of this thesis—especially important in qualitative research

 

Third, in some instances, candidates will adopt an approach that is not especially common or inevitable in their field.  They might, for example, utilize social network analysis or grounded theory in disciplines in which these approaches are not especially common.  In these instances, examiners are sensitive to the extent to which the candidates

 

  • outlined the benefits of this approach in this instance

  • outlined the limitations of this approach as well as measures to limit or manage these limitations

  • followed this approach diligently and rigorously; that is, some candidates justify one approach but then deviate from this methodology

 

These practices are especially important when the approach the candidate adopted diverges from the approach the examiner usually espouses. Although examiners usually accept candidates who apply an approach the examiners themselves do not espouse (Mullins & Kiley, 2002), these examiners may be more inclined to recognize limitations of this approach.

 

Fourth, examiners tend to prefer theses that critically evaluate the findings, often in the discussion section. In particular, they evaluate the extent to which candidates

 

  • are aware of limitations in the design and methods of this research—and consider how these limitations might affect the reliability and validity of the key findings

  • suggest how future research could address or circumvent these limitations

  • derive conclusions from the findings holistically rather than merely list and interpret each result in turn

 

Fifth, to pass a thesis, examiners consider whether the research is publishable. To reach this decision, examiners determine whether the thesis imparts an original contribution. To gauge whether a thesis imparts an original contribution, examiners seek evidence such as whether the candidates

 

  • utilized a method, theory, or concept that is common in some fields of research but has not been applied to address this specific problem or topic

  • integrated literature, data, or arguments that have not been combined before, generating novel conclusions

  • uncovered original findings

  • could publish some of the research—although only about half the examiners interviewed in one study indicated they are swayed by the inclusion of publications (Mullins & Kiley, 2002)

 

How examiners judge theses: Practices that bias these judgments

Several common practices or tendencies of examiners, however, may bias or affect how examiners judge these qualities. First, most examiners assume the thesis will pass and hope the thesis will pass, recognizing that the thesis was supervised and that failure could devastate the wellbeing of candidates. Fewer than 1% of examiners recommend a fail (Lovat et al., 2008).

 

Second, the first set of pages that examiners skim—such as the abstract, introductory chapter, or first few paragraphs of the discussion or conclusion—shape their judgment of quality and may even override their expectation the thesis will pass.  That is, examiners very rapidly determine whether they are likely to appreciate the thesis (Carter, 2008).  This first impression can significantly bias their judgment of this thesis.  If these pages reveal a problem, the examiner might read the remainder of this thesis more assiduously and critically.

 

Third, because examiners often read a thesis during evenings or weekends, they are sensitive to the extent to which the thesis is enjoyable to read. Enjoyment derives not only from sound arguments, organized coherently, but also from the degree to which the thesis is interesting (Johnston, 1997). When the thesis includes some interesting studies or insights, examiners become absorbed in the material rather than judgmental. Conversely, when the thesis includes many presentation errors, such as spelling mistakes, examiners feel more irritated than absorbed (Johnston, 1997). They become more inclined to judge the material harshly.

 

Specifically, according to transportation theory, when individuals read a compelling narrative, they often feel they have been transported into the world this narrative depicts.  They imagine the events vividly, feel the emotions of this world, and become oblivious to their surroundings (Green, 2004; Green & Brock, 2000; Thompson & Haddock, 2012).  In this transported state, individuals feel embedded rather than detached from this world.  Accordingly, they are not as likely to judge the work like an independent critic (Green, 2004; Green & Brock, 2000).  Therefore, if the examiners feel absorbed in the research, they might experience this transported state, and be more inclined to accept, rather than challenge, the key arguments.   

 

Practices that might bias examiners: Acknowledgements

Most PhD candidates and graduate researchers include an acknowledgements section in their thesis, in which they express gratitude to their supervisors, funding bodies, and people who supported their journey. Many supervisors believe this section could bias examiners and thus often guide their candidates on how to write it (Kumar & Sanderson, 2020). Specifically, supervisors occasionally

 

  • remind the candidate this statement is public and should thus be written professionally rather than frivolously, eccentrically, or too informally

  • remind the candidate to refrain from breaching confidentiality, such as identifying people who may prefer to remain anonymous

  • remind the candidate to show gratitude to funders, assistants, and other supervisors

  • help the candidate demonstrate their writing skills—such as the capacity to write concisely, precisely, and coherently.

 

Kumar and Sanderson (2020) explored whether the acknowledgements in theses may indeed affect the judgments and evaluations of examiners.  To explore this question, the researchers distributed an online qualitative survey to academics at a New Zealand university, who then shared this survey with doctoral examiners in their network.  Ultimately, 145 doctoral examiners, from around the globe, completed the survey.  The survey prompted these examiners to specify whether they read the acknowledgements section and, if so, to convey their responses to this section.  

 

The study revealed that 86% of the respondents indicated they do read the acknowledgements section, often before they read the main text.  Their primary motivation to read this section was to

 

  • appreciate the humans and the experiences that underpinned this research

  • ascertain the quality of research, because limited acknowledgement of supervisors often suggests inadequate support, training, and skills

  • check the research was conducted independently and the candidate did not depend unduly on the contributions of other people

 

Some examiners read the acknowledgements section after reading the text, often to integrate their knowledge of the research with the circumstances or setting implied by the acknowledgements.  Regardless of when they read this section, examiners who read the acknowledgements tended to extract three key sources of information

 

  • the overall journey that characterizes the experience of candidates

  • the relationship with supervisors or other challenges, and

  • the level of gratitude to friends, assistance, and the institution

 

The respondents who did not read this section felt the acknowledgements are personal accounts that are unrelated to the quality of this work.  These examiners were more likely to conduct research in the sciences rather than social sciences.

 

About 70% of these examiners maintained the acknowledgements do not affect their judgments and evaluations, even if this section does initially elicit a positive impression. Yet, many examiners also conceded they cannot be certain whether the acknowledgements inadvertently bias these judgments, or recognized that they evaluate the thesis holistically and that the acknowledgements section is part of this holistic evaluation. Some examiners also recognized that acknowledgements sections that demonstrate limited humility and gratitude can diminish their tolerance of errors. Therefore

 

  • the degree to which acknowledgements bias examiners is uncertain and warrants further quantitative research

  • but these sections are likely to affect the judgment of some examiners at least to a modest extent

 

Practices that might impress examiners: Publications

In most nations, doctoral candidates may include one or more of their publications in the thesis, provided this publication is relevant to the overarching research question.  These publications signify that reviewers had perceived the research as rigorous and valuable enough to publish, potentially impressing the examiner. 

 

Sharmini and Kumar (2018) explored the responses of examiners to theses that included publications.  Specifically, the researchers scrutinized 12 examination reports, surveyed 62 examiners of these theses, and interviewed 15 examiners as well.  This analysis revealed some important patterns. 

 

First, when academics examine theses that include publications, they often suggest corrections to published work—and usually expect candidates to introduce these changes, even if they cannot change the actual publications.  That is, examiners believe that reviewers may have overlooked concerns in these publications, because these reviewers had not read the other chapters and are thus not as informed about the work.  If the candidate is unable to change the publication, the examiners prefer the candidate address these concerns in the general discussion. 

 

Second, in the report, the feedback was often directive, in which the examiners prompted the candidate to address some concern. Most often, this directive feedback comprised specific instructions—or, to a lesser extent, questions that prompted the candidate to reconsider and to clarify some of their arguments. Less than 20% of this directive feedback comprised mere suggestions.

 

In addition, the feedback was also referential, in which the examiner referred the candidates to other books, articles, or sources to read.  For example, examiners often referred the candidate to conventions on how to write and to present the thesis better as well as articles about specific theories or methods. 

 

Interestingly, examiners seldom expressed feedback about the structure or organization of this thesis.  Conceivably, when publications are included, examiners assume the thesis might not be arranged as coherently and accept this limitation.  Despite this difference, most examiners felt the detail of feedback they write does not depend on whether the thesis includes publications. 

 

Practices that might impress examiners: Standards to evaluate the research

One complication that candidates often experience is that research practices some researchers embrace, such as the need to read the literature thoroughly before conducting interviews, are denigrated by other researchers. Candidates are often uncertain which practices their examiners will prefer.

 

However, candidates should not attempt to accommodate each reviewer or examiner.  Instead, especially in qualitative research, they should

 

  • identify one or more guidelines that specify the principles or standards they should follow to demonstrate rigor

  • outline these guidelines in their thesis and demonstrate how they fulfilled these guidelines.

 

Sometimes, these guidelines are specific to a particular methodology or method. If candidates want to conduct a reflexive thematic analysis, for example, Braun and Clarke (2006) stipulate a series of criteria they should fulfill to demonstrate rigor. Indeed, most publications that outline an approach will stipulate the standards that researchers should observe. Alternatively, if their research integrates several methodologies in qualitative research, they might instead invoke more generic standards, stipulated for example in the work of

 

  • Akkerman et al. (2006)

  • Koch (2006)

  • Lincoln (1995)

  • Shenton (2004)

 

In addition, when candidates write their thesis, they should comply with the standards that correspond to qualitative research, randomized controlled trials, quasi-experiments, or observational studies, depending on the design of their research. Specifically, they should utilize the standards or conventions that are called

 

  • COREQ if the research is qualitative (Tong et al., 2007; also see Levitt et al., 2018)

  • CONSORT if the research is a randomized controlled trial in which individuals are randomly assigned to conditions (e.g., Moher et al., 2001)

  • TREND if the research is quasi-experimental—in which the conditions are not randomly assigned (e.g., Haynes et al., 2021)

  • STROBE if the research is observational or correlational, in which participants are not separated into distinct conditions (e.g., Von Elm et al., 2014)

 

References

  • Akkerman, S., Admiral, W., Brekelmans, M. and Oost, H. (2006). Auditing quality of research in social sciences. Quality and Quantity, 42(2).

  • Bourke, S. (2007). Ph. D. thesis quality: the views of examiners. South African Journal of Higher Education, 21(8), 1042-1053.

  • Bourke, S., Hattie, J., & Anderson, L. (2004). Predicting examiner recommendations on Ph. D. theses. International journal of educational research, 41(2), 178-194.

  • Bourke, S., & Holbrook, A. P. (2013). Examining PhD and research Masters theses. Assessment & Evaluation in Higher Education, 38(4), 407-416.

  • Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology.  Qualitative Research in Psychology, 3, 77-101.

  • Carter, S. (2008). Examining the doctoral thesis: A discussion. Innovations in Education and Teaching International, 45(4), 365-374.

  • Chetcuti, D., Cacciottolo, J., & Vella, N. (2022). What do examiners look for in a PhD thesis? Explicit and implicit criteria used by examiners across disciplines. Assessment & Evaluation in Higher Education.

  • Dally, K., Holbrook, A., Graham, A., & Lawry, M. (2004). The processes and parameters of Fine Art PhD examination. International Journal of Educational Research, 41(2), 136-162.

  • Dally, K., Holbrook, A., Lovat, T., & Budd, J. (2019). Examiner feedback and Australian doctoral examination processes. Australian Universities' Review, The, 61(2), 31-41.

  • Golding, C., Sharmini, S., & Lazarovitch, A. (2014). What examiners do: What thesis students should know. Assessment & Evaluation in Higher Education, 39(5), 563-576.

  • Grabbe, L. L. (2003). The trials of being a PhD external examiner. Quality Assurance in Education.

  • Green, M. C. (2004). Transportation into narrative worlds: The role of prior knowledge and perceived realism. Discourse Processes, 38, 247-266.

  • Green, M. C., & Brock, T. C. (2000). The role of transportation in the persuasiveness of public narratives. Journal of Personality and Social Psychology, 79, 701-721.

  • Haynes, A. B., Haukoos, J. S., & Dimick, J. B. (2021). TREND reporting guidelines for nonrandomized/quasi-experimental study designs. JAMA surgery, 156(9), 879-880.

  • Holbrook, A., Bourke, S., Fairbairn, H., & Lovat, T. (2007). Examiner comment on the literature review in Ph. D. theses. Studies in Higher Education, 32(3), 337-356.

  • Holbrook, A., Bourke, S., Fairbairn, H., & Lovat, T. (2014). The focus and substance of formative comment provided by PhD examiners. Studies in Higher Education, 39(6), 983-1000.

  • Holbrook, A., Bourke, S., Lovat, T., & Dally, K. (2004). Qualities and characteristics in the written reports of doctoral thesis examiners. Australian Journal of Educational & Developmental Psychology, 4, 126-145.

  • Holbrook, A., Bourke, S., Lovat, T., & Fairbairn, H. (2008). Consistency and inconsistency in PhD thesis examination. Australian journal of education, 52(1), 36-48.

  • Holbrook, A., St George, J., Ashburn, L., Graham, A., & Lawry, M. (2006). Assessment practice in fine art higher degrees. Media International Australia, 118(1), 86-97.

  • Johnston, S. (1997). Examining the examiners: An analysis of examiners' reports on doctoral theses. Studies in higher education, 22(3), 333-347.

  • Kiley, M., & Mullins, G. (2004). Examining the examiners: How inexperienced examiners approach the assessment of research theses. International Journal of Educational Research, 41(2), 121-135.

  • Koch, T. (2006). Establishing rigour in Qualitative Research: the decision trail. Journal of Advanced Nursing. 53, (1), 91-103.

  • Kumar, V., & Sanderson, L. J. (2020). The effects of acknowledgements in doctoral theses on examiners. Innovations in Education and Teaching International, 57(3), 285-295.

  • Levitt, H. M., Bamberg, M., Creswell, J. W., Frost, D. M., Josselson, R., Suárez-Orozco, C. (2018).   Journal article reporting standards for qualitative primary, qualitative meta-analytic, and mixed methods research in psychology: The APA publications and communications board task force report.  American Psychologist, 73(1), 26-46.  doi: 10.1037/amp0000151

  • Lincoln, Y. S. (1995). Emerging criteria for quality in qualitative and interpretive research. Qualitative Inquiry, 1, 275–289.

  • Lovat, T., Holbrook, A., & Bourke, S. (2008). Ways of knowing in doctoral examination: How well is the doctoral regime? Educational Research Review, 3(1), 66-76.

  • Mason, S. (2018). Publications in the doctoral thesis: challenges for doctoral candidates, supervisors, examiners and administrators. Higher Education Research & Development, 37(6), 1231-1244.

  • Mitchell, C., & Willetts, J. (2009). Quality criteria for inter-and trans-disciplinary doctoral research outcomes. Prepared for ALTC Fellowship: Zen and the Art of Transdisciplinary Postgraduate Studies. Sydney, Australia: Institute for Sustainable Futures, University of Technology, Sydney.

  • Moher, D., Schulz, K. F., Altman, D. G., & Consort Group. (2001). The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials.

  • Mullins, G., & Kiley, M. (2002). 'It's a PhD, not a Nobel Prize': how experienced examiners assess research theses. Studies in Higher Education, 27(4), 369-386.

  • Ramlall, S., Singaram, V. S., & Sommerville, T. E. (2019). Doctorates by thesis and publication in clinical medicine: an analysis of examiners' reports. Perspectives in Education, 37(1), 130-147.

  • Sharmini, S., & Kumar, V. (2018). Examiners’ commentary on thesis with publications. Innovations in Education and Teaching International, 55(6), 672-682.

  • Shenton, A. K. (2004). Strategies for ensuring trustworthiness in qualitative research projects.  Education for Information, 22, 63-75 63.

  • Starfield, S., Paltridge, B., McMurtrie, R., Holbrook, A., Bourke, S., Fairbairn, H., ... & Lovat, T. (2015). Understanding the language of evaluation in examiners’ reports on doctoral theses. Linguistics and Education, 31, 130-144.

  • Tan, W. C. (2022). Doctoral examiners’ narratives of learning to examine in the PhD viva: a call for support. Higher Education.

  • Thompson, R., & Haddock, G. (2012). Sometimes stories sell: When are narrative appeals most likely to work? European Journal of Social Psychology, 42, 92-102.

  • Tong, A., Sainsbury, P., & Craig, J. (2007). Consolidated criteria for reporting qualitative research (COREQ): a 32-item checklist for interviews and focus groups. International journal for quality in health care, 19(6), 349-357.

  • Von Elm, E., Altman, D. G., Egger, M., Pocock, S. J., Gøtzsche, P. C., Vandenbroucke, J. P., & Strobe Initiative. (2014). The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: Guidelines for reporting observational studies. International journal of surgery, 12(12), 1495-1499.

  • Wellington, J. (2020). Examining doctoral work: exploring principles, criteria and processes. Routledge.


Banks of comments to deliver feedback more efficiently

by Simon Moss

Introduction: The impediments to exemplary feedback

Most students and educators recognize the importance of feedback on assignments.  As many researchers and scholars argue (Hattie & Clarke, 2019; Hattie et al., 2021), feedback is necessary to

 

  • justify the marks that students receive, ultimately to foster a sense of fairness

  • impart some insights on how students can improve their performance in the future

  • correct misguided assumptions and beliefs

  • motivate and inspire students to study effectively as well as to improve or to protect the self-esteem of these individuals

 

Yet, the quality and impact of feedback differ considerably across institutions, instructors, students, and assignments. To illustrate, many students receive no feedback at all or minimal feedback (Elkins, 2016). Other students receive feedback but either disregard the comments, especially if the feedback is delayed, or feel uncertain about how to apply these comments to improve their work in the future (Elkins, 2016).

 

To override these concerns, researchers have identified the features of exemplary feedback.  To illustrate

 

  • feedback should delineate precise actions or changes that students can apply in the future; otherwise, students tend to bias their attention to positive feedback (Hattie et al., 2016)  

  • feedback should be concise and, if possible, simple rather than overwhelming (Hattie & Clarke, 2019; Shute, 2008)

  • feedback should reinforce the capacity of students to learn and to improve, to foster a learning orientation (Shute, 2008)

  • feedback should clarify how the work of students complies or does not comply with specific marking criteria

  • feedback should revolve around specific paragraphs, sentences, or actions rather than refer to the strengths or limitations of the student more generally (Shute, 2008)

  • feedback should be personalized rather than generic

 

Unfortunately, several impediments compromise the feedback that educators often write.  Specifically

 

  • educators often feel too busy to deliver exemplary, personalized feedback

  • educators are not granted opportunities to learn how to deliver feedback more effectively and consistently

 

To illustrate, Henderson-Brooks (2016) revealed how the feedback that educators deliver is often unhelpful. In particular, after a thorough analysis of comments, Henderson-Brooks showed that assessors often deliver feedback about problems with specific words or punctuation rather than about how to organize and arrange the arguments better. When the comments did revolve around how to organize the arguments, students were more likely to pass after they resubmitted the assignment later.

 

Banks or databases of feedback comments may address these problems, at least partially.  Typically,

 

  • the educators are granted access to a bank or database of comments, relevant to a specific assignment or type of assignment

  • the educators can extend or improve this bank

  • the educators utilize some tool in which they can insert these comments into the assignments they are grading or insert additional comments as well

 

These banks not only expedite feedback but also expose these educators to exemplary feedback practices.  Consequently, many institutions have introduced banks to facilitate marking.

 

Introduction: Examples of banks

To develop and to utilize these banks of feedback comments, educators can utilize a range of tools.  One of the most common tools is called Feedback Studio, previously known as GradeMark.  This product will often accompany Turnitin—software that was originally developed to identify plagiarism in assignments. Specifically, Feedback Studio comprises a series of banks or sets of comments, developed previously, called QuickMark comments. When markers utilize Feedback Studio, they can

 

  • insert additional comments into these banks

  • insert these comments into the assignments they are grading—while also writing other comments or highlighting, deleting, and modifying the text the students had submitted

 

For example, to utilize these banks of comments, the educators would

 

  • click the QuickMark icon to activate this feature

  • identify a relevant set of comments—such as “comments about formatting” or “comments about punctuation”

  • if they want to insert a comment, drag and drop the comment on the assignment

  • if they want to write their own comment, click the Comment Bubble icon

  • if they want to save and store this comment in another bank, click Save and so forth

 

Besides Feedback Studio, institutions may consider a range of other tools or options to utilize and to develop banks of feedback comments.  For example, they might

 

  • utilize and extend the comments that are available in a public bank, accessible at www.thefeedbackbank.com

  • develop their own tools

  • utilize the features of Microsoft Word or other prevalent software

 

To illustrate, rather than purchase additional software, Biggam (2010) described a case study in which the institution merely

 

  • created a bank of potential comments in Microsoft Word

  • created macros that enable markers to retrieve and to insert a relevant comment from this bank (a minimal sketch of this idea appears below)
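
To make this approach concrete, the sketch below illustrates, in Python rather than the Word macros Biggam (2010) actually describes, how a small bank of comments might be stored and retrieved while marking.  The bank contents, category names, and function are invented purely for illustration.

# Hypothetical sketch of a comment bank; the Biggam (2010) case used Microsoft Word
# macros, so this Python version only mirrors the general idea. All names are invented.

FEEDBACK_BANK = {
    "structure": {
        "S1": "The argument would be easier to follow if each paragraph opened with a clear topic sentence.",
        "S2": "Consider signposting how this section relates to your overall research question.",
    },
    "referencing": {
        "R1": "Several in-text citations do not appear in the reference list; please reconcile them.",
    },
}

def retrieve_comment(category: str, code: str, student_name: str = "") -> str:
    """Look up a stored comment and, optionally, personalize it with the student's name."""
    comment = FEEDBACK_BANK[category][code]
    return f"{student_name}: {comment}" if student_name else comment

if __name__ == "__main__":
    # A marker retrieves comment S1 and pastes the personalized text into the assignment.
    print(retrieve_comment("structure", "S1", student_name="Alex"))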

 

Case Study: Application of Feedback Studio

Research has revealed how Feedback Studio, and the corresponding banks or libraries of comments, when embedded in an overarching program, can benefit students.  For example, Graham et al. (2022) reported a study that demonstrated the benefits of Feedback Studio coupled with workshops to engage students in marking criteria. 

 

Specifically, this initiative was designed to inspire students to engage with, rather than to disregard, marking criteria and rubrics.  The goal was to encourage students to apply the criteria or rubric to evaluate, and ultimately to improve, their work.  Nevertheless, to limit the likelihood that students would apply these criteria or rubrics mechanically, rather than learn from this exercise, the university encouraged teachers to deliver more personalized and detailed feedback—feedback that does not merely outline errors but instead prompts students to pose questions and to explore other avenues. 

 

The problem, however, is that academic staff are often inundated with demands and, thus, too busy to deliver such personalized and detailed feedback effectively.  To overcome this problem, the university utilized Feedback Studio, enabling the educators to embed preset comments into the feedback. 

 

During this initiative, undergraduate students in various courses at Newcastle University UK attended three sessions, designed to help them understand the assessment and feedback of subsequent assignments.  During the first session, students were prompted to discuss their perceptions of the goals or purpose of feedback, the features of helpful feedback, and how they would utilize this feedback.  According to students

 

  • feedback should help students improve, justify the mark, and show interest in the work the student had submitted

  • even students who receive high marks should receive feedback on why they achieved a good mark and how to develop further

  • students preferred comments that could be applied to other assignments in the future—rather than comments that were too specific to a particular task

  • students did not like repeated comments about, for example, misused semicolons

  • most students preferred comments that were related to specific marking criteria

 

Second, after they received instructions about a specific assignment, students attended another session.  They discussed how they could apply the marking criteria to improve example assessments, submitted by previous students. Most students perceived these sessions as helpful, clarifying the marking criteria.

After participants submitted their assignments, the educators used Feedback Studio to apply a rubric to grade the work, to insert preset comments, to add other personalized comments, and to summarize their overall impression.  Coordinators had developed a bank of comments that were relevant to the marking criteria and fulfilled the other preferences that students had expressed about feedback.

 

Finally, students attended a session that was designed to evaluate this initiative, including the utility of these marking criteria and the application of Feedback Studio.  According to students

 

  • the comments they received were more encouraging than expected

  • the comments were more relevant to the marking criteria

  • they felt they received helpful comments about grammar and style

  • almost 80% of students would like educators to use this tool in the future

  • nevertheless, about half the students felt the comments were not specific enough to their work.  

 

Similarly, in a previous study, Buckley and Cowap (2013) evaluated the benefits of Turnitin, coupled with GradeMark, now called Feedback Studio, to mark assignments and to deliver feedback online.  In particular, 160 undergraduate students, studying psychology, submitted three assignments online.  Staff used Turnitin and GradeMark to identify plagiarism and to deliver feedback.  Although staff felt the QuickMark Comment feature saved time and was very useful, some participants indicated they could not readily add the annotated comments to the location they wanted—and the operation was slightly cumbersome.

 

Case Study: Development of customized tools

Rather than utilize Feedback Studio, some institutions have developed a tool to store a bank of comments and to insert these comments into assignments.  For example, Barker (2011) developed a tool, over several iterations, that presents feedback to students on each question of their assignments.  The tool was designed to deliver feedback on assignments in which students need to complete practical tasks, such as present a sequence of photos to attract customers. 

 

When designing this tool, Barker (2011) recognized that students may commit a range of errors when they complete multifaceted activities, integrating several tasks.  Therefore, educators could not readily develop feedback that was relevant to all students.  To overcome this problem, multifaceted activities were divided into more specific tasks.  For each task, the final version of the tool enabled educators to

 

  • choose a mark—a choice that would automatically generate some relevant feedback (a minimal sketch of this mapping appears after this list)

  • amend this feedback if necessary

  • insert other feedback, such as overall comments towards the end of this assignment
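
The sketch below, written in Python purely for illustration, shows how such a mark-to-feedback mapping might work for one task; the mark bands and wording are invented and are not taken from Barker (2011).

# Hypothetical mark-band lookup in the spirit of the tool Barker (2011) describes;
# the bands and comments are invented for illustration only.

FEEDBACK_BANDS = (
    (0, 39, "The photo sequence does not yet address the brief; revisit the stated aims before the next task."),
    (40, 59, "The sequence addresses the brief, but the ordering of images weakens the intended message."),
    (60, 79, "A coherent, persuasive sequence; refine the captions to sharpen the appeal to customers."),
    (80, 100, "An excellent sequence; the ordering, captions, and imagery work together convincingly."),
)

def feedback_for_mark(mark: int) -> str:
    """Return the preset comment for the band containing the mark; markers can then amend it."""
    for low, high, comment in FEEDBACK_BANDS:
        if low <= mark <= high:
            return comment
    raise ValueError(f"Mark {mark} falls outside the 0-100 range")

if __name__ == "__main__":
    print(feedback_for_mark(72))  # choosing a mark of 72 automatically generates the 60-79 comment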

 

Students appreciated the automated feedback, as a subsequent evaluation survey revealed.  According to students

 

  • the tool enabled these individuals to receive feedback more promptly than in previous assignments

  • the comments were detailed and helpful, clarifying the shortfalls and identifying how to overcome these problems in the future

 

The assessors also valued the tool.  They felt the automated feedback expedited the marking, appreciably decreasing the time they dedicated to this task.

 

Case Study: Benefits to educators

Some research, however, reveals that banks of feedback comments not only expedite marking but also provide other benefits to staff. To illustrate, Schneider (2021) conducted an action research study to explore how a bank of comments, developed to facilitate the delivery of feedback, influenced the efficacy, attitudes, and experience of 18 instructors, employed at a private university in America.  Data were collected from surveys, administered before and after the bank was developed and utilized, coupled with interviews, conversations, and document analysis. 

 

Initially, the bank comprised over 100 feedback comments, organized into four clusters.  Specifically

 

  • the first category of comments related to the comments that students posted on discussion boards

  • the second category of comments related to the digital presentations that students needed to complete

  • the third category of comments related to written assignments

  • the final category of comments addressed concerns about grammar and style, such as APA format.

 

The comments were developed carefully, with consideration of tone and perspective.  The instructors could insert these comments into assignments while grading online and could also download the comments if they wanted.  They could search for relevant comments as they graded the assignments.  The institution utilized both Google tools and an accessible website, www.thefeedbackbank.com, to host and to extend the bank of feedback comments.  The institution encouraged the academics to share feedback about these comments as well as to submit additional comments.  Instructors who wanted to use this bank attended three 30-minute online webinars to discuss the bank of comments: one before the initiative was launched, one four weeks after commencement, and one eight weeks after commencement.  

 

Surveys revealed the bank of feedback comments was effective.  The extent to which instructors felt confident in their online teaching, called online teaching efficacy, and online grading, called online grading efficacy, improved significantly as a consequence of this initiative.  However, the degree to which instructors collectively felt their role improves student learning, called collective teacher efficacy, did not improve significantly. 

 

The qualitative data revealed many benefits of this bank of feedback comments.  For example

 

  • the staff perceived this bank as useful, eliciting some positive feelings, diminishing the negative feelings of marking, such as exhaustion, and greatly saving time

  • the participants especially valued the idea of including some images in the bank as well

  • they felt the search feature to identify the comments was useful

 

Besides these improvements in attitudes and efficiency, the bank also enhanced the quality of feedback.  That is, participants recognized their exposure to other comments extended their knowledge of when, where, and how to deliver feedback.  Likewise, the staff felt the bank diminished inconsistencies across instructors as well.  Finally, participants felt the evolution of this bank enabled instructors to develop meaningful feedback—as well as to personalize the feedback more efficiently. 

 

References

  • Barker, T. (2011). An Automated Individual Feedback and Marking System: An Empirical Study. Electronic Journal of E-Learning, 9(1)

  • Biggam, J. (2010). Using automated assessment feedback to enhance the quality of student learning in universities: A case study. In International conference on technology enhanced learning (pp. 188-194). Springer, Berlin, Heidelberg.

  • Buckley, E., & Cowap, L. (2013). An evaluation of the use of Turnitin for electronic submission and marking and as a formative feedback tool from an educator's perspective. British Journal of Educational Technology, 44(4), 562-570.

  • Burrows, S., & Shortis, M. (2011). An evaluation of semi-automated, collaborative marking and feedback systems: Academic staff perspectives. Australasian Journal of Educational Technology, 27(7).

  • Elkins, D. M. (2016). Grading to learn: An analysis of the importance and application of specifications grading in a communication course. Kentucky Journal of Communication, 35(2), 26–48

  • Graham, A. I., Harner, C., & Marsham, S. (2022). Can assessment-specific marking criteria and electronic comment libraries increase student engagement with assessment and feedback? Assessment and Evaluation in Higher Education, 47(7), 1071–1086.

  • Hattie, J., & Clarke, S. (2019). Visible learning feedback. New York, NY: Routledge

  • Hattie, J., Crivelli, J., Van Gompel, K., West-Smith, P., & Wike, K. (2021). Feedback that leads to improvement in student essays: Testing the hypothesis that “where to next” feedback is most powerful. In Frontiers in Education (p. 182). Frontiers.

  • Hattie, J., Fisher, D., & Frey, N. (2016). Do they hear you? Educational Leadership, 73(7), 16–21.

  • Henderson, P. (2008). Electronic grading and marking: A note on Turnitin’s GradeMark or Studio Feedback function. History Australia, 5(1).

  • Henderson-Brooks, C. (2016). Grademark: Friend or foe of academic literacy? Journal of Academic Language and Learning, 10(1), A179-A190.

  • Özbek, E. A. (2016). Plagiarism detection services for formative feedback and assessment: Example of turnitin. Journal of Educational and Instructional Studies in the World, 6(3), 64–72.

  • Sawdon, S., & Curtis, F. (2011). Evaluation of GradeMark for electronic assessment & feedback: Staff and students’ perspectives.

  • Schneider, J. (2021). Web-based comment banks as support for the online grading feedback process. International Journal, 15(2), 87-104.

  • Shute, V. J. (2008). Focus on formative feedback. Review of educational research, 78(1), 153-189.

  • Setiawati, E., Perdhani, W. C., & Budiana, N. (2020). Using Turnitin feedback studio through pedagogy approaches. Journal of Innovation and Applied Technology, 2020, 52-58.

  • Turnitin (2022).  The powerful direction of “Where to next?” feedback.  A white paper

  • Watkins, D., Dummer, P., Hawthorne, K., Cousins, J., Emmett, C., & Johnson, M. (2014). Healthcare students’ perceptions of electronic feedback through GradeMark®. Journal of Information Technology Education. Research, 13.


 Co-design of assessments

by Simon Moss

Introduction

Typically, educators, such as coordinators, lecturers, and tutors, set the assessments, design the criteria or rubric to grade the assessments, and then mark the assessments themselves.  More recently, however, educators have granted students opportunities to contribute to these procedures.  For example

 

  • students might design some features of the assessments —such as write some of the multiple-choice questions on exams or the essay topics

  • students might be able to choose which of several assessments they will complete

  • students might refine the marking criteria

  • students might grade the work of peers as well as negotiate with educators to finalize the mark

 

Educators might choose to co-design assessments for many reasons.  For example, they might feel

 

  • students tend to learn more effectively when they actively contribute rather than passively listen to content

  • when students create assessments, such as multiple-choice questions, they actually need to invoke and integrate more information than when they answer questions (Palmer & Devitt, 2006)

  • when students construct multiple-choice questions, and thus contrive some false answers, they need to anticipate possible misconceptions—and, therefore, become more attuned to the nuances that differentiate accurate and inaccurate information.  

 

This co-design of assessments is consistent with many of the pedagogies that educators often espouse.  For example, proponents often invoke constructivism to justify this approach (Casey et al. 2014).  According to exponents of this pedagogy, individuals do not merely passively store the information they see or hear but actively construct knowledge from the gamut of experiences in the learning environment.  Students are more likely to adopt this active role when they co-design assignments.   

 

Likewise, this co-design of assessments is consistent with the notion of assessment for learning.  According to this notion, assessment is not merely a means to gauge learning but also an opportunity to engage students and thus facilitate learning (e.g., Deeley & Bovill, 2017).

 

Indeed, studies do tend to vindicate the benefits of co-design. Yu and Liu (2008) instructed some civil engineering students to construct a series of multiple-choice questions and other civil engineering students to answer a series of multiple-choice questions.  Subsequently, the educators assessed the capacity of students to apply methods and approaches that facilitate learning, called meta-cognitive strategies.  If students had constructed, rather than merely answered, multiple-choice questions, their meta-cognitive strategies were more likely to improve.  That is, they were more likely to understand which strategies will facilitate their learning.     

Case study: Construction of multiple-choice questions

Students can undertake a range of activities to participate in the design of assessments.  Despite this range, one activity is perhaps the most common: the construction of multiple-choice questions.  That is, many instructors prompt students to construct multiple-choice questions about the topics they need to study.  Occasionally, the final exam includes a portion of these questions. 

 

To illustrate, Doyle et al. (2019) conducted a study in which undergraduate students, completing a module on tax, constructed multiple-choice questions.  Their assignment was worth only 5% of the overall grade on this module, to diminish the likelihood of plagiarism or cheating.  To construct a question, students needed to write the elements below (a minimal sketch of such a question record appears after the list)

 

  • the stem or question

  • the correct answer and incorrect answers

  • a justification of the correct answer, usually comprising a series of computations.
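
As a concrete illustration, the Python sketch below represents one such student-authored question as a simple record; the field names and the example computation are invented and do not come from Doyle et al. (2019).

# Hypothetical record for one student-authored multiple-choice question, mirroring
# the three elements students were asked to write; all names here are invented.

from dataclasses import dataclass
from typing import List

@dataclass
class StudentQuestion:
    stem: str                 # the question itself
    correct_answer: str
    distractors: List[str]    # the incorrect alternatives
    justification: str        # usually a series of computations
    topic: str = ""           # one of the 12 module topics

    def as_exam_item(self) -> str:
        """Render the question in a plain format a lecturer could paste into an exam pool."""
        options = [self.correct_answer, *self.distractors]
        lines = [self.stem] + [f"  {chr(97 + i)}) {opt}" for i, opt in enumerate(options)]
        return "\n".join(lines)

if __name__ == "__main__":
    q = StudentQuestion(
        stem="A company earns 50,000 of taxable profit at a 20% tax rate. How much tax is payable?",
        correct_answer="10,000",
        distractors=["5,000", "12,500", "20,000"],
        justification="50,000 x 0.20 = 10,000",
        topic="Corporation tax",
    )
    print(q.as_exam_item())  # in a real exam pool the options would also be shuffled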

 

Students were divided into distinct clusters, each corresponding to one of the 12 topics in this module. They wrote the questions within the week after learning the topic in class.  To facilitate this goal, students received some information on how to write effective questions, such as how to prevent ambiguity.  The participants submitted the questions to the lecturer, and the lecturer refined and then posted the questions onto the virtual learning platform.  In addition, 5 of the 159 questions on the final exam were derived from this pool.    

 

To assess their attitudes towards this exercise, students later completed a survey.  As this survey revealed

 

  • 64% of students felt they understood the topic better after they completed this exercise

  • 60% of students indicated they enjoyed or greatly enjoyed this exercise

  • 89% of students perceived the multiple-choice questions as useful

  • 70% of students agreed the assignment should be worth 5%—and the rest were evenly divided over whether the assignment should be worth more or less than 5%

  • 93% of the students completed this assignment

 

Students also expressed the belief the assignment prompted careful deliberation and seemed creative and interesting. They also perceived the questions as a helpful resource to assist their studies later.

 

Not all the comments were supportive, however. Many students would have liked more guidance on how to construct a question and more feedback on the questions they constructed.  Other students understood the task but could not readily develop useful questions.  Some participants also felt that questions were harder to generate for some topics than for others.   

 

The instructors also expressed a range of opinions about the approach.  Although the exercise was effective, some complications had not been anticipated—such as the observation that students would often submit assignments but forget to indicate the answer or even their name.  Instructors also needed to dedicate significant time to improve the grammar and precision of questions.    

 

Case study: Use of PeerWise

In the study that Doyle et al. (2019) reported, the students merely emailed the questions they developed.  However, many educators now use a website, called PeerWise, available at http://peerwise.cs.auckland.ac.nz at no cost, to manage these questions more effectively.  PeerWise enables students to write questions, to share these questions, and to evaluate, improve, and attempt the questions that peers have constructed.  In 2022, the website stored over 6 million multiple-choice questions, often accompanied by annotations or explanations.

 

To use Peerwise, the instructors merely enter their contact details to open a free account.  Once this account is activated, the instructors enter the main topics.  Most instructors will construct a few multiple-choice questions themselves, primarily to model the format and style of questions they want to encourage.  Then, instructors can encourage students to log in and select the course.  Students next choose one of three options: your questions, answered questions, and unanswered questions.  Specifically

 

  • if students choose “your questions”, a table appears, in which each row displays the questions and alternatives these students have written and enables individuals to add more questions

  • if students choose “answered questions”, another table appears, in which each row presents details about their questions, such as the number of answers or comments by peers and the perceived difficulty of these questions, as rated by peers

  • if students choose “unanswered questions”, a final table appears, presenting questions that peers have written and some information about these questions—such as the topic or difficulty.

 

Whenever students answer questions, they receive instantaneous feedback and information.  This feedback and information will tend to include

 

  • an explanation of the answer, as written by the person who constructed the question

  • the comments or feedback of other students about this question

  • a histogram that displays the frequency with which other students chose the various alternatives

  • questions that prompt students to rate the quality and difficulty of each question

 

Most intriguingly, to encourage participation, PeerWise assigns a score to each student, indicative of their contribution level.  This score gamifies the experience and thus can motivate students (Bottomley & Denny, 2011). Specifically, the score that students are assigned increases automatically, according to an algorithm (a hypothetical sketch of such a rule appears after this list), whenever

 

  • their peers rate their questions highly

  • their peers agree with how they rated other questions—indicating they evaluate other students fairly

  • their peers correctly answer one of their questions
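
The Python sketch below illustrates how a contribution score of this kind could be computed from the three events listed above.  The algorithm PeerWise actually uses is not detailed in this source, so the weights and structure here are invented purely for illustration.

# Hypothetical contribution score in the spirit of PeerWise's gamified score;
# the weights and event names below are invented and are not PeerWise's algorithm.

from dataclasses import dataclass

@dataclass
class ContributionEvents:
    high_ratings_received: int     # peers rated one of the student's questions highly
    agreeing_ratings_given: int    # the student's ratings matched the consensus of peers
    correct_answers_by_peers: int  # peers correctly answered one of the student's questions

def contribution_score(events: ContributionEvents) -> int:
    """Combine the three events into a single score that only ever increases."""
    return (
        5 * events.high_ratings_received
        + 2 * events.agreeing_ratings_given
        + 1 * events.correct_answers_by_peers
    )

if __name__ == "__main__":
    print(contribution_score(ContributionEvents(3, 10, 25)))  # 5*3 + 2*10 + 1*25 = 60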

 

Many studies have explored the benefits of PeerWise (e.g., Kay et al., 2020).  These studies tend to underscore the benefits of this platform (e.g., McClean, 2015; McQueen et al., 2014; Rhodes, 2015).  Bottomley and Denny (2011), for example, explored the responses of students to PeerWise.  In this case study, 107 Australian university students, enrolled in biomedical science, utilized PeerWise to construct, share, discuss, and practice multiple-choice questions.  Specifically, the instructors encouraged students to

 

  • construct four or more of these multiple-choice questions that relate to the various topics

  • evaluate and share feedback on eight questions that peers had written

 

The score that PeerWise assigned to each student affected their overall grade in this unit.  Specifically, up to 10% of their grade was dependent on this score.

 

In this study, the researchers appraised the accuracy and quality of questions as well as attempted to uncover instances of plagiarism.  To uncover instances of plagiarism

 

  • the instructors identified questions that included terms or phrases that are seldom used in this nation or by students at this level—or were similar to questions the instructor had read before

  • the instructors identified questions that were written in a font that deviated from the default, indicative of copying and pasting

  • the instructors then attempted to trace the source of these questions

 

This analysis revealed that most of the questions that students constructed were high in quality.  For example

 

  • 87% of the questions were deemed as clear

  • 88% of the questions were devoid of errors

  • for 81% of the questions, the incorrect alternatives, the distractors, were feasible

  • for 64% of the questions, the correct answer was justified; in another 30% of the questions, the student had explained why the incorrect answers were false.

  • the instructor rated about 80% of the questions as fair or good rather than as very poor, poor, very good, or excellent, suggesting the questions were solid but not exemplary

  • the possibility of plagiarism was detected in only 3.4% of instances

 

The instructor also rated the questions on the Revised Bloom taxonomy.  About 56% of the questions assessed memory, 35% of the questions assessed understanding, and 9% of the questions assessed application.  Few of the questions assessed analysis, evaluation, or creation.

 

The students also completed a survey, designed to appraise their experience with the tool.  As these results showed

 

  • according to 70% of students, attempts to construct a question reinforced their understanding of the materials

  • according to 74% of students, attempts to answer a question that peers had constructed reinforced their understanding of the materials

  • about 62% of students would like to use PeerWise in another course

 

Interestingly, the correlation between the PeerWise score of students and their total grade for other assignments was .48.  This sizeable correlation indicates either that contributions to PeerWise facilitated learning or that the top students also performed well on PeerWise.

 

The survey, however, did uncover a couple of challenges.  For example, according to some participants, students tend to rate questions harshly if their answer was incorrect.  In addition, some of the questions are replete with spelling and grammatical errors.  Galloway and Burns (2015) reported similar concerns: students occasionally rated questions harshly and sometimes wrote questions ungrammatically. 

 

Case study: Contribution to marking criteria

Rather than merely write multiple-choice questions, students may undertake other activities to contribute towards the assessments.  Meer and Chapman (2014) investigated how undergraduate students, enrolled in Business and Management, can participate in the development of marking criteria—to optimize these criteria, to help these students understand these criteria, and to foster a sense of empowerment. In this project, students and academics collaborated to develop and to refine the marking criteria. 

 

Specifically, students completed an assignment in which they needed to present a speech about which skills are deficient in the cohort.  The students were divided into four teams, each comprising six individuals.  To enhance the marking criteria

 

  • the lecturer facilitated a discussion with students about the marking criteria on this assignment

  • over the next week, each team of students attempted to rewrite the marking criteria, using their own words

  • the teams then convened to integrate their work

  • in a subsequent class, the lecturer facilitated another discussion to combine the original marking criteria with the criteria that students had constructed

  • this blended set of marking criteria was used to grade the assignments.

 

To illustrate the impact of this approach, consider the first criterion, which was originally “Introduction and planning of the seminar”.  Specifically

 

  • to receive over 70%, students would need to construct a “Clear and detailed plan of the training seminar, aims and timings given. Self-introductions and confidence shown. Excellent flow of training seminar between trainers and topics. Evidence of a great deal of preparation”.

  • However, according to the students, to receive over 70%, the assignment needed to be “Well planned, considered and thought out.  Practiced and good flow”

  • Ultimately, after these sets were combined, to receive over 70%, the assignment needed to be “Clear and detailed plan, introductions, aims and timings given well. Well prepared, excellent flow between trainers and engaged delegates”

  • Hence, the criteria were simplified, but without sacrificing key details and nuances.

 

Besides co-design of the marking criteria, this approach also comprised collaboration on the assessment.  That is, after each team presented their speech, they received an assessment from the lecturers, an assessment from all peers in the class, and a self-assessment.  Each team then negotiated with the lecturer to determine the final grade on this assignment. 

 

As subsequent focus groups and surveys revealed, some participants felt they understood the marking criteria in more depth than ever before.  The marking criteria, originally complicated, were more concise and unambiguous after the students had participated.

 

Challenges of co-design

Despite the benefits of this approach, researchers have acknowledged that co-design of assessments can raise some challenges.  For example, external professional bodies often regulate the assessments in some disciplines, such as law, accounting, and medicine.  These regulations might constrain the assessments and preclude co-design or at least limit the role of co-design (Deeley & Bovill 2017).

 

Furthermore, not all students appreciate the opportunity to construct multiple-choice questions.  In one study, conducted by Palmer and Devitt (2006), some Australian students, enrolled in medicine, constructed multiple-choice questions during a surgical attachment.  As this study revealed

 

  • the students were, in general, able to develop suitable multiple-choice questions

  • however, compared to peers who did not construct these questions, students who constructed these questions did not receive significantly better grades on a subsequent assessment

 

Interestingly, in general, before they attempted to construct these questions, students felt this method was less stimulating than other modes of learning, such as tutorials.  After they attempted to construct these questions, their attitudes to this method improved, but they still preferred other approaches to learning.  Arguably, this construction of questions felt unfamiliar to students.   

 

References

  • Arthur, N. (2006). Using student-generated assessment items to enhance teamwork, feedback and the learning process. Synergy, 24, 21-23.

  • Bottomley, S., & Denny, P. (2011). A participatory learning approach to biochemistry using student authored and evaluated multiple‐choice questions. Biochemistry and Molecular Biology Education, 39(5), 352-361.

  • Bovill, C., Cook-Sather, A., Felten, P., Millard, L., & Moore-Cherry, N. (2016). Addressing potential challenges in co-creating learning and teaching: Overcoming resistance, navigating institutional norms and ensuring inclusivity in student–staff partnerships. Higher Education, 71(2), 195-208.

  • Chatterjee, S., Rana, N. P., & Dwivedi, Y. K. (2022). Assessing consumers' co-production and future participation on value co-creation and business benefit: An FPCB model perspective. Information Systems Frontiers, 24(3), 945-964.

  • Colson, N., Shuker, M. A., & Maddock, L. (2021). Switching on the creativity gene: a co-creation assessment initiative in a large first year genetics course. Assessment & Evaluation in Higher Education.

  • Deeley, S. J., & Bovill, C. (2017). Staff student partnership in assessment: enhancing assessment literacy through democratic practices. Assessment & Evaluation in Higher Education, 42(3), 463-477.

  • Doyle, E., Buckley, P., & Whelan, J. (2019). Assessment co-creation: An exploratory analysis of opportunities and challenges based on student and instructor perspectives. Teaching in Higher Education, 24(6), 739-754.

  • Fellenz, M. R. (2004). Using assessment to support higher level learning: The multiple-choice item development assignment. Assessment & Evaluation in Higher Education, 29(6), 703-719.

  • Galloway, K. W., & Burns, S. (2015). Doing it for themselves: students creating a high-quality peer-learning environment. Chemistry Education Research and Practice, 16(1), 82-92.

  • Hardy, J., Bates, S. P., Casey, M. M., Galloway, K. W., Galloway, R. K., Kay, A. E., ... & McQueen, H. A. (2014). Student-generated content: Enhancing learning through sharing multiple-choice questions. International Journal of Science Education, 36(13), 2180-2194.

  • Kay, A. E., Hardy, J., & Galloway, R. K. (2020). Student use of PeerWise: A multi‐institutional, multidisciplinary evaluation. British Journal of Educational Technology, 51(1), 23-35.

  • McClean, S. (2015). Implementing PeerWise to engage students in collaborative learning. Perspectives on Practice and Pedagogy, 6, 89-96.

  • McQueen, H. A., Shields, C., Finnegan, D. J., Higham, J., & Simmen, M. W. (2014). PeerWise provides significant academic benefits to biological science students across diverse learning tasks, but with minimal instructor intervention. Biochemistry and Molecular Biology Education, 42(5), 371-381.

  • Meer, N., & Chapman, A. (2014). Co-creation of marking criteria: students as partners in the assessment process. Business and management education in HE.

  • Palmer, E., & Devitt, P. (2006). Constructing multiple choice questions as a method for learning. Annals-academy of medicine Singapore, 35(9).

  • Rhodes, J. (2015). Using peerwise in nursing education-a replicated quantitative descriptive research study. Kai Tiaki Nursing Research, 6(1), 10-15.

  • Yu, F. Y., & Liu, Y. H. (2008). The comparative effects of student question-posing and question-answering strategies on promoting college students’ academic achievement, cognitive and metacognitive strategies use. Journal of Education and Psychology, 31(3), 25-52.


Discussion forums in tertiary education

by Simon Moss

Introduction

To enhance learning during online courses, many lecturers, instructors, and teachers introduce activities to promote discussion and collaboration between students.  Discussion forums, available in most learning management systems such as Blackboard, are one of the most common techniques to achieve this goal. 

 

Although discussion forums are ubiquitous, some researchers have questioned their utility.  For example, some researchers have shown that discussion forums are useful only when the educators actively contribute to the platform and guide students appropriately (Darabi et al., 2013).  Likewise, many students also feel that educators should configure these experiences appropriately.  To illustrate, as Mazzolini and Maddison (2007) revealed, students tend to feel that educators who facilitate online discussion forums should

 

  • answer questions as rapidly as possible—but then prompt further discussion and contemplation by posing additional questions

  • suggest and inspire novel concepts, perspectives, or approaches to contemplate a problem or topic

  • deliver feedback, such as praise, as well as extend or elaborate the answers of students, introducing other dimensions or facets

 

The benefits of discussion boards partly depend on the degree to which students participate.  Nandi et al. (2012) differentiated three levels of participation.  In particular, some participants of discussion boards are lurkers, who read messages but do not post messages.  Some participants, in contrast, post their opinions or perspectives but do not interact with other students.  Finally, some participants do interact with other students, commenting on other posts and thus generating conversation.   

 

As a detailed case study and analysis of a discussion forum on Blackboard revealed (Nandi et al., 2012), students who contribute to discussion boards tend to ask questions and to answer questions.  The answers can be divided into several kinds, such as answers that include recommendations or tips, answers that refer to personal experiences, answers that include an example, answers that include some justification, proof, or evidence, and answers that are personal opinions rather than substantiated justifications.   

 

Impact of discussion boards: Course grades

Discussion boards can generate a range of benefits.  Discussion boards may inspire students to engage critically with the material as well as foster a sense of belonging or community, promoting collaboration. 

 

For example, some research indicates that discussion boards, in which students collaborate asynchronously, are more effective than instant messaging, in which students tend to interact simultaneously.  Hernández-Lara et al. (2021) compared the effects of online discussion forums and instant messaging on the interactions between students enrolled in an online business course.  That is, this study was designed to compare the communication patterns of asynchronous communication, epitomized by discussion boards, and synchronous communication, epitomized by instant messaging. 

 

In this study, the participants were Spanish undergraduate students, enrolled in business and management courses.  These individuals also participated in an online business simulation game, called the Cesim Global Challenge, lasting four months, in which students needed to manage a global technology enterprise in teams. To prevail during this challenge, the teams competed across eight rounds of decisions on HR, R&D, taxation, corporate social responsibility, finance, and logistics, after receiving relevant information.   

 

To measure behavior and communication, the researchers recorded performance on the game (such as shareholder return and profit), grades in the course, the number of words over time, the number of messages over time, and the content of these messages.  Interestingly, students exposed to asynchronous discussion groups outperformed students exposed to synchronous instant messaging, both in the simulation and in course grades.

 

As further analysis showed, when the communication was asynchronous, the interactions were more likely to revolve around the tasks and learning objectives than social or other peripheral topics.  Conversely, when individuals communicated synchronously, they tended to interact at night, perhaps explaining the reduced emphasis on the task.  Likewise, when individuals communicated synchronously, they often felt the need to respond immediately—a feeling of pressure that could also stymie contemplation.

 

Impact of discussion boards: Examples of limited impact on course grades

Despite these apparent benefits, some research has shown that participation in discussion boards does not always significantly improve grades.  For example, in one study, Song and McNary (2011) examined the comments of students towards the start, middle, and end of a semester—on topics such as teaching philosophies and emerging technologies.  The educators posted questions to guide these forums.  The researchers utilized a taxonomy to code the function of each post, such as disagree, conciliate, explain, summarize, appreciate, clarify, elaborate, request information, and coordinate the team.  This analysis revealed some comments, such as confirming or suggesting, were common initially but dissipated over time. 

 

However, the number of entries that students posted was not significantly correlated with course grades.  As the researchers suggested, in undergraduate courses, the number of posts is often positively correlated with course grades (e.g., Kay, 2006).  In postgraduate courses, however, variability in the participation of students in discussion boards or in course grades might be diminished, compromising statistical power.  Alternatively, many of the posts were not relevant to the learning objectives of this course, suggesting that some of the discussions may have been unproductive.

 

Ramos and Yudko (2008) uncovered comparable findings.  Their study revolved around the performance of students who resided in Hawaii but studied online, enrolled in psychology courses at a liberal arts college.  The researchers collated information on

 

  • the number of entries that students posted on the discussion board

  • the number of times the students viewed the entries that peers had posted on the discussion board

  • the number of times students viewed the content pages in the learning management system

  • the average score on the exams, as a measure of learning and performance.

 

As the findings revealed

 

  • all three metrics—number of discussion posts, views of discussion posts, and views of content pages—were positively and moderately correlated with average score on the exams

  • however, only views of content pages remained significant after the three metrics were included in the same regression analysis.

 

These findings suggest that submitting discussion posts and viewing these posts do not appreciably enhance performance, at least not after controlling for views of content pages.  The researchers ascribed these results to some potential limitations of discussion boards.  In particular, performance mainly depends on the degree to which students consider the material carefully and construct their own perspective rather than merely the duration of participation. Posts on discussion boards might not promote this careful analysis or construction of knowledge unless educators provide suitable feedback, encourage collaboration, and challenge the students.  Rather than ignite careful analysis and consideration, submitting and reading posts might distract students from more efficient sources of learning, such as content pages.      
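
To clarify the statistical reasoning, the Python sketch below, which uses synthetic data and is not the authors' code, shows the kind of analysis Ramos and Yudko (2008) report: zero-order correlations of each metric with exam scores, followed by a single regression that enters all three metrics together.

# Illustrative sketch only; the data are synthetic and generated so that exam scores
# depend on content-page views, while the two discussion metrics merely track them.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
content_views = rng.poisson(120, n).astype(float)          # views of content pages
post_views = 0.4 * content_views + rng.normal(0, 15, n)    # views of discussion posts
posts = 0.1 * content_views + rng.normal(0, 4, n)          # discussion posts submitted
exam = 40 + 0.4 * content_views + rng.normal(0, 6, n)      # exam score driven by content views

# Zero-order correlation of each metric with the exam score
for name, x in [("posts", posts), ("post views", post_views), ("content views", content_views)]:
    print(name, round(float(np.corrcoef(x, exam)[0, 1]), 2))

# One regression with all three predictors entered together
X = sm.add_constant(np.column_stack([posts, post_views, content_views]))
print(sm.OLS(exam, X).fit().summary())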

 

Challenges experienced during discussion forums

To inform educators on how to manage and configure these discussion forums, some researchers have explored the challenges that students experience while they navigate these platforms.  To illustrate, Apps et al. (2019) uncovered some of the challenges that students experience when they complete assignments that revolve around either discussion forums or individual journals.  In this study, the participants completed one of three assignments, depending on the course in which they were enrolled.  Specifically

 

  • to complete the first assignment, students wrote entries, each comprising at least 150 words, on a discussion forum in response to weekly prompts about contemporary popular music; students were not, however, obliged to comment on the entries that other students posted

  • to complete the second assignment, students were instructed to post eight commentaries, comprising at least 300 words, about the film and readings presented each week; students were permitted to comment on the posts of peers

  • to complete the third assignment, students wrote three entries, each comprising 500 or more words, on a question that relates to justice and punishment; students, however, did not have to post these entries publicly.

 

After engaging in these tasks, 34 students participated in interviews, designed to characterize their experiences with these assignments.  The students reported a range of challenges.  For example, some of these challenges revolved around confidence.  To illustrate

 

  • 6 students felt uncomfortable posting their work publicly, because they felt their reflections and contemplations should be private; similarly, 8 students felt their knowledge of this topic was inadequate, compromising their confidence in these postings

  • 14 students felt unconfident because the task was unfamiliar and the constraints or expectations seemed vague

 

Other challenges revolved more around their motivation. Specifically

 

  • 8 students did not like the design, such as the instruction to post many short entries or blogs, and preferred longer assignments that maintained their attention and focus

  • 7 students could not readily manage their time or remember to post regularly

  • 4 students perceived assignments as repetitive, because they felt they had to write an essay each week

  • 5 students did not like the notion they would post each week but that only the overall assignment was graded at the end

 

Students, however, uncovered a range of strategies to address these challenges.  For example, students would often scan the posts of peers and the examples of tutors to clarify the expectations of this task and to overcome uncertainty.

 

The role of educators: Benefits and drawbacks

As many commentators argue and as many studies have revealed, the benefits of discussion groups on learning and collaboration depend appreciably on the role and practices of educators and instructors.  Yet, the interjections and interventions of educators can both facilitate and impede discussion forums.  To illustrate, Belcher et al. (2015) conducted a study to explore whether the various attempts of educators to promote critical thinking in discussion forums did indeed achieve this aim.  To fulfill this goal, the researchers analyzed close to 20 000 responses or interactions on discussion boards.  Specifically, they applied the interaction analysis model to evaluate the level of learning or thinking that each response epitomized, such as sharing, exploring dissonance, co-constructing meaning, testing, and applying co-constructed knowledge. In addition, the researchers explored how the level of learning or thinking was related to the previous behavior of educators on the discussion forum.   

 

The researchers uncovered some behaviors of educators that fostered more critical learning or thinking, such as genuine compliments of posts, summaries of posts, frequent responses, and clarification of the subject matter.  Conversely, the researchers also uncovered some behaviors that impaired critical thinking, such as sporadic feedback; however, these associations were not especially pronounced, suggesting only a limited role of these interjections or interventions by educators.

 

After conducting a detailed case study and analysis of a discussion forum on Blackboard, Nandi et al. (2012) recommended that instructors or educators, when facilitating a discussion group, should fulfill five distinct roles.  Specifically, these instructors or educators should

 

  • optimize the design and administration, such as impart specific guidelines about how to complete the tasks as well as provide rubrics or templates to clarify their expectations

  • apply suitable pedagogy to promote learning, primarily by posing challenging questions and problems, intervening to guide and to extend an important discussion, as well as encouraging students to apply their knowledge to practice, connect distinct information, or inspire other kinds of deep learning.

  • provide technical assistance when needed

  • deliver feedback, answers, and examples to optimize the discussion

  • foster a sense of community or belonging

 

The role of educators: Strategic practices

According to Darabi et al. (2013), discussion forums enhance learning only when educators offer enough guidance to encourage students to contemplate and to apply the knowledge they learn, an arrangement called strategic discussion groups.  That is, as this perspective implies, when educators arrange discussion forums, they should

 

  • participate in the discussion and model suitable responses rather than merely post questions

  • introduce a format, template, or rubric to guide these discussions

  • offer some training or orientation on how to participate in these discussions

  • occasionally encourage smaller teams of students to discuss a topic in depth

  • offer frequent and specific feedback, including praise, usually with a supportive tone

 

To validate this perspective, Darabi et al. (2013) conducted a meta-analysis of eight publications, designed to explore whether these more strategic attempts to foster critical thinking and contemplation enhance learning more than unfettered online discussion.  The meta-analysis revealed that strategic discussion groups, compared to discussion groups that did not include this guidance, were more likely to enhance various indices of learning. The benefits of strategic discussion groups were moderate.  Yet, these strategic discussion groups were particularly effective when

 

  • the discussions were asynchronous

  • the participants were enrolled in university—either as undergraduate or postgraduate students—rather than in high school

  • the discussion revolved around how to apply the knowledge in practice rather than to elaborate, justify, or substantiate a topic

  • the educators included many features to guide the discussion, such as rubrics, an orientation, teamwork, and regular feedback

 

Likewise, Balaji and Chakrabarti (2010) explored some of the features of discussion forums that promote interaction.  In particular, 227 MBA students completed a survey.  The survey assessed the degree to which conversations on the discussion forum were perceived as interactive, engaging, and productive rather than merely a series of unrelated messages.  In addition, the survey assessed the degree to which participants felt the discussions on this forum enhanced their curiosity and learning.  Finally, the survey assessed a range of characteristics and conditions that could foster more interactive conversations, such as

 

  • the familiarity of students with discussion forums and similar tools

  • the degree to which the students were extraverted and outgoing rather than reserved

  • the extent to which the instructor motivated students to contribute and guided the conversations effectively

  • the degree to which the students felt they belong to a supportive community of peers

  • the extent to which the discussion forums prompted students to reflect carefully

  • the extent to which the discussion forums enabled students to learn material that is relevant to their needs

  • the extent to which unambiguous and helpful rubrics guide the assessments of these discussions

  • the degree to which the discussion forum was perceived as a suitable medium to communicate

 

In general, as these results showed, many of these conditions enhanced the degree to which the discussions were interactive, engaging, and productive.  Specifically, the interactions were more interactive, engaging, and productive if

 

  • students felt they belong to a supportive community of peers

  • the instructor motivated students to contribute and guided the conversations effectively,

  • unambiguous and helpful rubrics guided the assessments of these discussions, and

  • the discussion forum was perceived as a suitable medium to communicate, consistent with the theory around media richness.

 

As these results indicate, educators should not only inspire students to contemplate the materials carefully—but should also foster a sense of support and belonging among peers as well as encourage individuals to use many artefacts, such as pictures and videos, to facilitate communication.  When the interactions were more interactive, engaging, and productive, students were more likely to feel they learned the material effectively.

 

The role of educators: Clarity of expectations

As many researchers propose (Apps et al., 2019; Vlachopoulos & Makri, 2019), students often feel uncertain about how to contribute to discussion boards, and thus educators must address this uncertainty and clarify their expectations.  To achieve this goal, Nandi et al. (2012) proposed a rubric that educators could utilize to evaluate the quality of interactions between students on discussion boards.  Specifically, Nandi et al. delineated exemplars of poor, satisfactory, good, and excellent instances of asking questions, answering questions, justifications, clarifications, critical discussions of contributions, ideas, opinions, feedback, sharing knowledge, and engagement of participants.  To illustrate excellent responses

 

  • when answering questions, students should post detailed answers, with examples, suggesting alternative solutions if appropriate

  • when justifying a position or opinion, students should refer to relevant cases, studies, concepts, or theories and show the implications of this evidence

  • when clarifying or explaining a concept, students should extend the ideas that other peers had posted, perhaps with reference to examples

  • when critically evaluating a post, students should advance the ideas of peers and acknowledge the comments of these individuals in their own posts

  • when developing ideas, students should attempt to adopt a collaborative approach and integrate many of the solutions posted.

  • when posting opinions, students should express perspectives that demonstrate and utilize some knowledge of the topic, prompt discussion, and acknowledge limitations in their opinions—such as the notion this post is merely an opinion rather than verified

  • when delivering feedback, students should offer detailed, justified comments

  • when sharing knowledge, students should disclose personal experiences or other credible sources together with examples of relevant problems and solutions

  • students should also encourage peers to participate and assist one another by fostering a supportive atmosphere

 

The role of educators: Encourage peer review

Some researchers, such as Darabi et al. (2013), suggest that discussion forums are especially beneficial when students collaborate in smaller teams.  Instead of smaller teams, students might also work in pairs.  For example, to enhance the benefits of discussion forums, educators could encourage pairs of students to review the work of one another.    

 

Research indicates this peer review might be useful.  For example, Madland and Richards (2016) explored the perceptions of students towards a scheme called study buddies.  In this scheme, pairs of students reviewed the work of one another.  To facilitate this scheme, students identified buddies who exhibited similar preferences, such as the tendency to complete work early or to complete work just before deadlines.  Students could earn up to 5% if they reviewed four assignments their buddy completed and submitted a reflection on these reviews, comprising 1 to 2 pages. 

 

Qualitative data, derived from a survey, revealed that students tended to value this activity. This scheme encouraged students to contemplate the material more deeply and carefully.  Almost 90% of buddies felt the activity was worthwhile and should be included in graduate courses.  The only students who did not value the experience felt the buddy was not committed or compatible in their study practices.   

 

This scheme could, in principle, be extended to discussion forums.  That is, in one class, all students might contribute to the same discussion forum—but pairs of students would be assigned the task to deliver feedback on the posts of one another.

 

Recommendations

Several researchers, such as Vlachopoulos and Makri (2019), have outlined a series of recommendations that educators should follow to optimize the benefits of discussion forums. Specifically, as past research implies, educators should introduce a range of practices to motivate participation and engagement.  For example

 

  • to promote motivation, educators could introduce engaging tasks—such as games or simulations—into the discussion groups

  • the discussion board should not be mandatory because otherwise the task is perceived as a chore and thus may deter engagement

  • yet, to encourage participation, educators could dedicate a few marks to this forum, such as 5%, enough to promote contributions without compromising engagement

  • educators should also integrate the discussion group with synchronous workshops; for example, during a tutorial, they might prompt students to contribute towards the discussion group

 

To initiate the discussion board, educators should

 

  • convey specific guidance—such as explain the expectations, model suitable responses, and organize small teams

  • prepare threads about specific topics as well as a common thread that is relevant to all concerns

  • then gradually withdraw and enable students to develop their own communities and explore the topics themselves

  • encourage a few students to assume leadership roles—such as ask these individuals to organize discussion topics or introduce scenarios or problems to discuss

 

Nevertheless, because participation often dissipates over time, educators should maintain regular surveillance of these discussion boards.  For example, they should occasionally

 

  • pose questions that encourage students to apply their knowledge in practice and to integrate knowledge from distinct topics

  • guide students on which topics they should learn and highlight the importance of these topics to their future

  • utilize a supportive, enthusiastic tone to impart feedback and to encourage participation

  • assume the role of a student or peer, striving to collaborate and to resolve a problem or understand a topic like the other students

 

References

  • Apps, T., Beckman, K., Bennett, S., Dalgarno, B., Kennedy, G., & Lockyer, L. (2019). The role of social cues in supporting students to overcome challenges in online multi-stage assignments. The Internet and Higher Education, 42, 25–33.

  • Balaji, M. S., & Chakrabarti, D. (2010). Student interactions in online discussion forum: Empirical research from “media richness theory” perspective. Journal of Interactive Online Learning, 9(1)

  • Belcher, A., Hall, B. M., Kelley, K., & Pressey, K. L. (2015). An analysis of faculty promotion of critical thinking and peer interaction within threaded discussions. Online Learning Journal.

  • Borokhovski, E., Tamim, R., Bernard, R. M., Abrami, P. C., & Sokolovskaya, A. (2012). Are contextual and designed student–student interaction treatments equally effective in distance education? Distance Education, 33(3), 311–329.

  • Darabi, A., Liang, X., Suryavanshi, R., & Yurekli, H. (2013). Effectiveness of online discussion strategies: A meta-analysis. American Journal of Distance Education, 27(4), 228–241.

  • Hernández-Lara, A. B., Perera-Lluna, A., & Serradell-López, E. (2021). Game learning analytics of instant messaging and online discussion forums in higher education. Education + Training.

  • Kay, R. H. (2006). Developing a comprehensive metric for assessing discussion board effectiveness. British Journal of Educational Technology, 37(5), 761-783.

  • Madland, C., & Richards, G. (2016). Enhancing student-student online interaction: Exploring the study buddy peer review activity. The International Review of Research in Open and Distributed Learning.

  • Mazzolini, M., & Maddison, S. (2007). When to jump in: The role of the instructor in online discussion forums. Computers & Education, 49, 193–213

  • Nandi, D., Hamilton, M., & Harland, J. (2012). Evaluating the quality of interaction in asynchronous discussion forums in fully online courses. Distance Education, 3(1), 5-30

  • Ramos, C., & Yudko, E. (2008). “Hits” (not “discussion posts”) predict student success in online courses: A double cross-validation study. Computers and Education, 50(4), 1174-1182

  • Song, L., & McNary, S. W. (2011). Understanding students' online interaction: Analysis of discussion board postings. Journal of Interactive Online Learning, 10(1).

  • Vlachopoulos, D., & Makri, A. (2019). Online communication and interaction in distance higher education: A framework study of good practice. International Review of Education, 65(4), 605-632.

  • Xing, W., Tang, H., & Pei, B. (2019). Beyond positive and negative emotions: Looking into the role of achievement emotions in discussion forums of MOOCs. The Internet and Higher Education, 43


Contract cheating

by Simon Moss

Introduction

Many tertiary education students commission another person or organization to complete their assignments, called contract cheating.  Contract cheating includes online essay mills or other assignment writing services, although the term can also be applied to instances in which individuals complete an assignment to assist a friend or family member, often at no cost.  Thousands of websites offer this service (Rowland et al., 2018) and usually include provisions that guarantee anonymity.  Indeed, a systematic review of studies published after 2014 indicates that about 16% of students admit, in anonymous surveys, that they have engaged in contract cheating—and this rate seems to be rising (Newton, 2018; for estimates in various nations, see Eaton, 2022; Foltýnek & Králíková, 2018; Shala et al., 2020).

 

Determinants of contract cheating

The prevalence of contract cheating depends on many of the characteristics and circumstances of students as well as the practices of institutions.  For example, students are more likely to engage in contract cheating whenever they believe this behavior is common (Bretag et al., 2019).  Consequently, Bretag et al. (2019) explored the circumstances in which students and educators assume that contract cheating is common.  In a survey of over 14 000 students and over 1100 educators at Australian universities, these researchers showed that individuals are more likely to perceive contract cheating as prevalent if

 

  • students speak a language other than English at home—especially if the assignment assesses skills in research and analysis or is heavily weighted

  • students felt dissatisfied with the teaching and support they received from the university

  • assignments were weighted heavily and assigned ambitious due dates—countering the belief that short timelines might prevent contract cheating

  • the assignments are not in-class tasks, personalized tasks, vivas, presentations, or reflections on placements

 

Other studies have explored the reasons that students express to explain why they engage or do not engage in contract cheating.  For example, Rundle et al. (2019) conducted a survey of over 1200 students.  The survey prompted students to indicate why they do not engage in contract cheating.  Specifically, participants specified, on a five-point scale, the degree to which various reasons explain their reluctance to cheat, such as “I am afraid of being punished”, “I don’t trust people to do my assignment well”, “I think it’s wrong and immoral”, “I would feel shame, guilt, or remorse”, “I can’t afford to pay someone to do assignments for me”, and “I want the sense of achievement in doing the work myself”. The survey also measured various personality traits that could affect the likelihood of this behavior, such as self-control, grit, and the personality traits of narcissism, Machiavellianism, and psychopathy.  Finally, the survey assessed the degree to which individuals felt their basic needs—of competence, autonomy, and relationships—were fulfilled.

 

As a factor analysis revealed, the reasons that students are reluctant to pursue contract cheating can be divided into five main categories: fear of detection and punishment, self-efficacy and mistrust, morality and norms, limited opportunity, and motivation for learning.  Subsequent regressions uncovered some key insights:

 

  • students were more inclined to resist the temptation to cheat because of a motivation to learn if they reported high levels of self-control, persistence, and need fulfillment but low levels of Machiavellianism

  • students were more inclined to resist the temptation to cheat because this act felt immoral if they reported low levels of Machiavellianism and psychopathy but high levels of self-control and grit.

 

Characterization of essay mills and similar websites

To characterize the services that essay mills afford students, Medway et al. (2018) undertook a covert investigation.  Specifically, the researchers assumed the role of students, engaged with five essay mills in the UK, and ultimately purchased two assignments.  To identify these mills, the authors googled “essay writing services UK” and “essay writing help UK”.   

 

In some instances, if students entered some basic details, such as the word length, level of study, the desired grade, and due date, they could receive an immediate quote.  Students could engage in live text chats with representatives of these sites.  

 

The researchers ordered three assignments.  For example, they ordered one assignment to be completed within 15 days, corresponding to a specific title, and intended to generate a grade of 2:1 or B. They paid a fee of £250 and received the assignment two days before the deadline.  In another instance, in which the fee was higher, the assignment was delivered 9 days before the deadline and was accompanied by a plagiarism report.  However, in one instance, the assignment was never completed and the money was returned.

 

The researchers arranged 10 markers to grade the purchased assignments.  These assignments received grades that were similar to, but perhaps less than, the desired grade.  One assignment was written proficiently, included many sources, but did not answer the question effectively.  A quality report had also accompanied this assignment, indicating that another employee had checked the essay was written proficiently, fulfilled the instructions, and demonstrated critical thought.  The second assignment demonstrated greater understanding of the question but relied heavily on textbook sources.  Turnitin suggested that plagiarism was unlikely.     

 

Characterization of essay mills and similar websites: Attempts to target doctoral candidates

As Kelly and Stevenson (2021) revealed, many of the websites that supply assignments attempt to attract doctoral candidates and other graduate researchers, because theses are especially long and thus lucrative.  That is, the researchers conducted a textual analysis of websites that target doctoral candidates. Specifically, the researchers first googled phrases like “PhD writing help” to identify 27 of these sites, such as essayassist.com, thesisrush.com, australianhelp.com, and australian-writing.com.  Six of the websites specialize in theses.  One website even claimed that 2839 PhD experts were available to write a thesis.   

 

Next, the researchers attempted to extract common patterns or themes from the key words and content on these sites, uncovering four themes: balancing work and personal life, the complexity of doctoral academic writing, self-efficacy, and academic career progression.  They then conducted textual analysis to explore these themes across the websites.  To illustrate, many of the websites alluded to the conflict between thesis responsibilities and personal life—and suggested that their services could help students resolve this dilemma.  These sites depicted themselves as opportunities that enable candidates to fulfill important responsibilities in life.  Accordingly, these websites imply that students who are struggling to progress, often because of family responsibilities, might be especially tempted to cheat. 

 

Second, many of the websites referred to the complexity of this writing task.  They indicated an awareness that many candidates may have commenced their writing but now feel overwhelmed by the insurmountable and demanding task.  Arguably, exemplary writing services in universities should thus diminish the need of candidates to pursue contract cheating.   

 

Third, some of these websites challenged the confidence or self-efficacy of candidates.  They might include phrases that prompt candidates to question whether they are knowledgeable or skilled enough to complete a thesis.  The websites then indicate their services could allay these doubts. 

 

Finally, some of the websites indicated how their services could enhance the careers of these doctoral candidates.  For example, their services could help students differentiate themselves from peers or grant these individuals time to develop other skills or complete other projects too.

 

Characterization of essay mills and similar websites: Business models

As Ellis et al. (2018) revealed, these websites tend to adopt one of three business models.  First, in some instances, the writers are merely individual freelancers, working independently.  Some writers are graduates, living in a range of nations.  Some writers, such as Dave Tomar, are renowned. 

 

Second, some individuals develop a web presence to promote the services of other independent writers.  These businesses might approach other academics or graduates to become writers. The business then retains a percentage of the revenue that client orders generate.  Dave Tomar (2012) reported that, in his experience, these businesses will usually retain about half the revenue.

 

Third, some individuals then extend this model, promoting web applications that facilitate the transaction.  All the business processes are integrated into the website.  Therefore, the website does not merely promote independent writers but includes additional features, such as automatic quotes.  In some instances, potential writers can access a catalogue of orders.  Or these writers might receive automatic feeds or alerts, in which they are informed about assignments that match their capabilities.  The writer who responds first is usually assigned the order (for similar studies, see Sivasubramaniam et al., 2016).  

  

Interventions: Training to detect contract cheating

As some research indicates, after markers receive training on how to detect contract cheating, their capacity to identify this cheating improves.  For example, in one study, conducted by Dawson and Sutherland-Smith (2019), academics in genetics, psychology, and nutrition received a batch of assignments in their discipline.  Their first task was to detect possible instances of contract cheating.  The researchers then identified the assignments that markers had incorrectly designated as illegitimate—that is, as examples of contract cheating—or incorrectly designated as legitimate.

 

Next, markers attended a workshop, lasting three hours, that was dedicated to their academic discipline.  During these workshops, markers received the assignments that had often been judged incorrectly.  They discussed the reasons why they felt the assignments were legitimate or not, were then informed whether each assignment was legitimate, and finally transcribed the lessons they learned from each of these discussions.  The workshops culminated with a set of indicators they could apply to identify illegitimate assignments.  Finally, these markers received a second batch of assignments to mark.  After the workshop, markers were more likely to identify, rather than overlook, illegitimate assignments.  That is, sensitivity, but not specificity, improved significantly. 

 

Interestingly, in the covert investigation of essay mills described earlier, the sites differed in how they justified the ethics of their endeavors.  Two of the mills did not discuss ethics.  Other mills discussed ethics in their FAQs.  For example, in response to the question of whether this activity can be deemed cheating, one mill indicated they write the papers only for research purposes and, hence, the activity cannot be deemed cheating.  More intriguingly, two sites warned that students should not depict the essay they receive as their own work.  They should merely use this essay as a model; otherwise, they are cheating and compromising the reputation of their profession and their career prospects.  Some of the essay mills reinforced this position during the live chats.

 

Interventions: Common features of illegitimate assignments

The features and patterns of illegitimate assignments—that is, assignments that were not written by the student—vary across disciplines, websites, and many other factors.  Nevertheless, researchers have uncovered some common features of illegitimate assignments.  For example, these assignments sometimes include

 

  • references that comprise the author and date of one article but the title of another article—to circumvent software that is intended to detect plagiarism (Lines, 2016)

  • spelling and words that are more common in another culture or nation, such as US spelling at a British university (Lines, 2016; Rogerson, 2017)

  • suspicious metadata in the document, such as author information in the file properties that diverges from the name of the student (Lines, 2016); a minimal sketch of this metadata check appears after this list.
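
The sketch below illustrates the metadata check described in the final point above. It is only a minimal example, assuming assignments are submitted as .docx files and that the python-docx package is available; the file name and student name in the usage comments are hypothetical, and a mismatch would merely prompt a conversation, not prove misconduct.

```python
# A minimal sketch of a metadata check, assuming .docx submissions (python-docx required).
from docx import Document  # pip install python-docx

def metadata_mismatch(path, student_name):
    """Return the stored author name if it differs from the submitting student."""
    author = (Document(path).core_properties.author or "").strip()
    if author and author.lower() != student_name.lower():
        return author  # a mismatch only prompts further inquiry, not proof of cheating
    return None

# Hypothetical usage (file name and student name are invented for illustration):
# mismatch = metadata_mismatch("assignment_s1234567.docx", "Jane Citizen")
# if mismatch:
#     print(f"File author '{mismatch}' differs from the enrolled student")
```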

 

Interventions: Stylometry

Rather than depend on markers, Ison (2020) suggested and piloted another technique that could be applied to identify contract cheating: stylometry (see also Juola, 2017). In essence, stylometry is a technique that can examine whether several documents were indeed written by the same person.  Hence, stylometry software can determine whether several assignments, purportedly submitted by one student, were actually written by more than one person—indicative of contract cheating.  Briefly, the accuracy with which one software tool, the JStylo Authorship Attribution Framework, could identify contract cheating approached 89% when simple logistic regression was applied. Interestingly, other stylometry tools, such as the Signature Stylometry System and the Java Graphical Authorship Attribution Program, were not as accurate in this study.

 

To identify whether several documents are written by one person, the software quantifies various features of the text.  For example, the software identifies the frequencies of n-grams—sequences of two words, three words, or even more words that are repeated in the text—the number of words in each sentence, the richness of vocabulary, misspellings, words that are common in specific communities or cultures, and use of punctuation.  A variety of machine learning algorithms, including Chain Augmented Naïve Bayes, nearest neighbor calculations, and neural networks, can be applied to distinguish sets of documents written by one person from sets of documents written by more than one person.  Texts of 500 words, with five documents written by each author, are sufficient to generate accurate estimates. 
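
To make this feature-based approach concrete, the sketch below computes a handful of simple stylistic features and trains a logistic regression to judge whether two documents share an author. It is an illustration only, not the JStylo pipeline that Ison (2020) evaluated: the feature set is deliberately tiny and the corpus is invented.

```python
# An illustrative, minimal stylometry sketch: represent each pair of documents by the
# difference of a few stylistic features, then classify the pair as same or different author.
import re
from itertools import combinations

import numpy as np
from sklearn.linear_model import LogisticRegression

def features(text):
    """Return a small vector of stylistic features for one document."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)
    return np.array([
        n_words / max(len(sentences), 1),                # mean sentence length
        len(set(words)) / n_words,                       # vocabulary richness (type-token ratio)
        text.count(",") / n_words,                       # comma rate
        sum(w in {"the", "of", "and", "to"} for w in words) / n_words,  # function-word rate
    ])

def pair_vector(doc_a, doc_b):
    """Represent a pair of documents by the absolute difference of their features."""
    return np.abs(features(doc_a) - features(doc_b))

# Toy corpus, labelled by (hypothetical) author, purely for illustration.
corpus = {
    "author_1": ["The results of the study suggest that feedback improves retention.",
                 "The analysis of the data confirms that testing enhances learning."],
    "author_2": ["Honestly, I think exams are stressful and, frankly, a bit unfair!",
                 "Well, my view is simple: tests just make everyone nervous, don't they?"],
}

docs = [(author, doc) for author, doc_list in corpus.items() for doc in doc_list]
X, y = [], []
for (a1, d1), (a2, d2) in combinations(docs, 2):
    X.append(pair_vector(d1, d2))
    y.append(1 if a1 == a2 else 0)   # 1 = same author, 0 = different authors

model = LogisticRegression().fit(X, y)

# In practice, a marker could compare a suspect assignment with earlier work by the same
# student; a low "same author" probability would only flag the case for human review.
print(model.predict_proba([pair_vector(docs[0][1], docs[1][1])]))
```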

 

Interventions: Workshops and redesign of assignments

Some researchers have introduced workshops in which academics, learning designers, and curriculum designers assemble to redesign assessments, ultimately to deter contract cheating and other illicit behavior.  To illustrate, Slade et al. (2019) piloted a series of workshops to achieve this goal.  In essence, the participants discussed how to modify assignments that are susceptible to ghost writing, particularly assignments that are weighted heavily, to prevent this problem.  Each workshop comprised two facilitators and up to 25 participants.

 

Specifically, before the workshop, the facilitators designed 10 fictional, but prototypical, assessments from a range of fields in which contract cheating is possible—including essays, presentations, reflections, and diaries.  Then, during the workshop:

 

  • teams identified concerns about why they could not readily evaluate whether the student, or someone else, had completed a given assignment

  • teams then presented solutions on how the assignment could be reconfigured to overcome these concerns

  • the facilitators subjected the concerns, raised previously, to a thematic analysis to identify the main impediments to verification of student identity

 

Some key insights emanated from these workshops.  First, the main concern that participants raised was that cheating is perhaps more likely when the assignments are authentic, corresponding to activities that are relevant to actual jobs.  In contrast, cheating is perhaps not as likely when the assignments are not authentic, such as exams. Second, participants raised the concern that academic workloads preclude the analysis that is needed to identify contract cheating.  In addition, participants indicated they felt that cheating is more likely if

 

  • the relevant information to complete the assignment can be readily obtained online

  • the assignments are not closely related to course content, in which case generic answers are more feasible

  • the assignment had been assigned to classes in the past

  • the assignment does not include interim or process-based activities, such as records of the contributions of each group member

  • the assignment does not prompt students to consider their own experience or reflect upon their own lives

  • the assignment is worth many marks and the stakes are high

  • the assignment is not engaging

 

To address these concerns, the participants suggested a range of adjustments to the assignments.  For example, and similar to the recommendations that Taylor (2014) proposed:

 

  • the assignments should comprise several phases or steps, each of which should be submitted and should attract feedback

  • some of these phases should include activities students complete in person or live, such as a presentation or video task

  • assignments in which contract cheating is likely and unverifiable should be formative, helping the students develop, rather than summative or greatly valued in the final grade

  • assignment rubrics should encourage personalized activities, such as original ideas or personal experiences.

 

Interventions: Policies and laws

After conducting a literature review, Morris (2018) suggested how institutions could improve their policies and procedures to prevent and to manage contract cheating better. For example:

 

  • policies and procedures should frame academic integrity as an inspiring and shared commitment to foster a fair, respected, and collegiate environment; that is, academic integrity should be perceived as an opportunity to advance the institution rather than a punitive or patronizing demand

  • guidelines should delineate and illustrate all variants of academic misconduct with examples in various disciplines and measures to identify and to prevent these variants

  • in contrast to other breaches, many of which only attract a warning, reduced mark, or resubmission, contract cheating is deliberate and thus should attract a severe penalty, such as suspension

  • peers who suspect academic misconduct should be informed of avenues on how to report this behavior confidentially, but with safeguards to prevent false accusations.  

 

In contrast, Draper and Newton (2017) discussed legal provisions that jurisdictions could introduce to deter contract cheating.  These authors suggest that a strict liability offence—in which the facts alone support the commission of an offence, without the need to prove intent—could be enacted against contract cheating.  Accordingly, the person or business that supplies an assignment would be liable, unless they had reasonably attempted to prevent the students from submitting this assignment. Draper and Newton (2017) applied this principle to develop a draft law, designed to stem contract cheating (for a discussion of the likely effect of similar laws as well as the existing laws in America, New Zealand, Ireland, the UK, and Australia, see Awdry et al., 2022).    

 

References

  • Awdry, R., Dawson, P., & Sutherland-Smith, W. (2022). Contract cheating: To legislate or not to legislate-is that the question? Assessment & Evaluation in Higher Education, 47(5), 712-726.

  • Bretag, T., Harper, R., Burton, M., Ellis, C., Newton, P., van Haeringen, K., ... & Rozenberg, P. (2019). Contract cheating and assessment design: exploring the relationship. Assessment & Evaluation in Higher Education, 44(5), 676-691.

  • Clare, J., Walker, S., & Hobson, J. (2017). Can we detect contract cheating using existing assessment data? Applying crime prevention theory to an academic integrity issue. International Journal for Educational Integrity, 13(1).

  • Clarke, R., & Lancaster, T. (2007, July). Establishing a systematic six-stage process for detecting contract cheating. In 2007 2nd International conference on pervasive computing and applications (pp. 342-347). IEEE.

  • Curtis, G. J., & Clare, J. (2017). How prevalent is contract cheating and to what extent are students repeat offenders? Journal of Academic Ethics, 15(2), 115-124.

  • Dawson, P., & Sutherland-Smith, W. (2018). Can markers detect contract cheating? Results from a pilot study. Assessment & Evaluation in Higher Education, 43(2), 286-293.

  • Dawson, P., & Sutherland-Smith, W. (2019). Can training improve marker accuracy at detecting contract cheating? A multi-disciplinary pre-post study. Assessment & Evaluation in Higher Education, 44(5), 715-725.

  • Draper, M. J., & Newton, P. M. (2017). A legal approach to tackling contract cheating? International Journal for Educational Integrity, 13(1), 1-16.

  • Eaton, S. E. (2022). Contract cheating in Canada: A comprehensive overview. Academic Integrity in Canada, 165-187.

  • Ellis, C., Zucker, I. M., & Randall, D. (2018). The infernal business of contract cheating: understanding the business processes and models of academic custom writing sites. International Journal for Educational Integrity, 14(1), 1-21.

  • Foltýnek, T., & Králíková, V. (2018). Analysis of the contract cheating market in Czechia. International Journal for Educational Integrity, 14(1), 1-15.

  • Harper, R., Bretag, T., Ellis, C., Newton, P., Rozenberg, P., Saddiqui, S., & van Haeringen, K. (2019). Contract cheating: a survey of Australian university staff. Studies in Higher Education, 44(11), 1857-1873.

  • Ison, D. C. (2020). Detection of online contract cheating through stylometry: A pilot study. Online Learning, 24(2), 142-165.

  • Juola, P. (2017). Detecting contract cheating via stylometry methods. Plagiarism Across Europe and Beyond 2017, 187–198.

  • Kelly, A., & Stevenson, K. J. (2021). Students pay the price: Doctoral candidates are targeted by contract cheating websites. International Journal of Doctoral Studies, 16, 363.

  • Lancaster, T. (2019). Profiling the international academic ghost writers who are providing low cost essays and assignments for the contract cheating industry. Journal of Information, Communication and Ethics in Society, 17(1), 72–86.

  • Lines, L. (2016). Ghostwriters guaranteeing grades? The quality of online ghostwriting services available to tertiary students in Australia. Teaching in Higher Education, 21(8), 889–914.

  • Medway, D., Roper, S., & Gillooly, L. (2018). Contract cheating in UK higher education: A covert investigation of essay mills. British Educational Research Journal, 44(3), 393-418.

  • Morris, E. J. (2018). Academic integrity matters: five considerations for addressing contract cheating. International Journal for Educational Integrity, 14(1), 1-12.

  • Newton, P. M. (2018). How common is commercial contract cheating in higher education and is it increasing? A systematic review. Frontiers in Education, 3.

  • Rigby, D., Burton, M., Balcombe, K., Bateman, I., & Mulatu, A. (2015). Contract cheating & the market in essays. Journal of Economic Behavior & Organization, 111, 23–37.

  • Rogerson, A. M. (2017). Detecting contract cheating in essay and report submissions: Process, patterns, clues and conversations. International Journal for Educational Integrity, 13(1).

  • Rowland, S., Slade, C., Wong, K. S., & Whiting, B. (2018). “Just turn to us”: The persuasive features of contract cheating websites. Assessment & Evaluation in Higher Education, 43(4), 652-665.

  • Rundle, K., Curtis, G. J., & Clare, J. (2019). Why students do not engage in contract cheating. Frontiers in Psychology, 10.

  • Shala, S., Hyseni-Spahiu, M., & Selimaj, A. (2020). Addressing contract cheating in Kosovo and international practices. International Journal for Educational Integrity, 16(1).

  • Sivasubramaniam, S., Kostelidou, K., & Ramachandran, S. (2016). A close encounter with ghost-writers: an initial exploration study on background, strategies and attitudes of independent essay providers. International Journal for Educational Integrity, 12(1).

  • Slade, C., Rowland, S., & McGrath, D. (2019). Talking about contract cheating: facilitating a forum for collaborative development of assessment practices to combat student dishonesty. International Journal for Academic Development, 24(1), 21–34.

  • Taylor, S. M. (2014). Term papers for hire: How to deter academic dishonesty. The Education Digest, 80(2), 52

  • Tomar, D. (2012). The shadow scholar: How I made a living helping college kids cheat. Bloomsbury, New York

  • Wallace, M. J., & Newton, P. M. (2014). Turnaround time and market capacity in contract cheating. Educational Studies, 40(2), 233-236.


Virtual or online proctoring

by Simon Moss

Evolution of online exam proctoring: Alternative procedures

As distance and remote learning proliferated, and students were granted the opportunity to complete exams and tests online, tertiary education institutions soon became concerned about the possibility of cheating.  For example, they were concerned that students could

 

  • collaborate with one another to complete these exams and tests collectively

  • seek the advice of someone else, such as a specialist or past student, on challenging questions

  • utilize resources that are prohibited, such as Wikipedia

  • somehow obtain access to the test banks of publishers or similar resources in advance

 

To counteract this concern, tertiary institutions considered a range of technological solutions.  For example, some institutions explored the possibility that relevant staff could monitor the webcams of students during the exam or test, confirming these individuals were completing the exercise alone.  Other institutions explored the possibility that technologies, rather than humans, could utilize artificial intelligence to monitor students and to identify suspicious behavior as well as to prevent students from searching the web. 

 

Initially, many scholars and practitioners in the field resisted both human online proctors and automated online proctors. For example, according to Cluskey et al. (2011), the costs of online proctoring may outweigh the benefits.  According to these authors, to limit the prevalence of misconduct on exams, tertiary education institutions should

 

  • schedule online exams at a specific time and close the exam once the allotted duration expires

  • randomize both the order of the questions and the order of the options in multiple-choice questions (a simple sketch of this randomization appears after this list)

  • minimize the repetition of questions in successive years

  • activate the option, available in most learning management systems, that limits or prevents internet browsing during the exam

  • permit students to answer only one question at a time and prevent students from returning to previous questions.
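
As a concrete illustration of the randomization recommended above, the sketch below shuffles both the question order and the option order, seeded by a hypothetical student identifier so that each student receives a stable but distinct arrangement. Most learning management systems offer this as a built-in setting; the code merely shows the underlying idea.

```python
# A minimal sketch of per-student randomization of questions and options.
import random

exam = [
    {"question": "Which effect describes learning gains from retrieval practice?",
     "options": ["Testing effect", "Primacy effect", "Halo effect", "Framing effect"]},
    {"question": "Which practice best deters contract cheating?",
     "options": ["Personalized tasks", "Heavily weighted essays",
                 "Reused assignments", "Generic questions"]},
]

def randomized_exam(exam, student_id):
    """Return a copy of the exam with question and option order seeded per student."""
    rng = random.Random(student_id)          # reproducible for the same student
    questions = [dict(item, options=rng.sample(item["options"], k=len(item["options"])))
                 for item in exam]
    rng.shuffle(questions)
    return questions

# Hypothetical student identifier, for illustration only.
for item in randomized_exam(exam, student_id="s1234567"):
    print(item["question"], item["options"])
```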

 

To complement these strategies, tertiary institutions would need to introduce a range of additional measures to address the complications of these practices.  For example, if a subset of students is unavailable at a specific time, an alternative exam could be arranged, in which their webcams are monitored and the questions are updated. 

 

Evolution of online exam proctoring: Biometric monitoring and artificial intelligence

During the early 2010s, online proctoring became increasingly prevalent.  At this time, online proctoring largely, although not exclusively, revolved around human invigilators monitoring the webcams of students during the exam (Selwyn et al., 2021), called live proctoring (Hussein et al., 2020).  These human proctors tended to fulfill two main goals.  First, these proctors confirmed that the individuals who were completing the exam, as revealed in the webcam, were actually the enrolled students.  For example, the proctor might compare these individuals with photo IDs or ask questions to which only the student knows the answer.  Second, these proctors confirmed the students were not perpetrating prohibited behaviors, such as searching the internet to unearth answers.   

 

As 2020 approached, institutions had increasingly explored, and often introduced, measures that confirm the identity of individuals who complete the exam, but without reliance on human invigilators.  That is, institutions gravitated to biometric measures, coupled with artificial intelligence, to confirm the identity of students (Selwyn et al., 2021).  For example, some institutions deployed facial recognition software to match the individual completing the exam with a photo ID.      
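
As a rough illustration of this kind of identity check, the sketch below compares a webcam snapshot with a stored ID photo using the open-source face_recognition package. The file names are hypothetical, and commercial proctoring products rely on their own proprietary matching pipelines rather than this exact approach.

```python
# A minimal sketch of photo-ID matching with the open-source face_recognition package.
import face_recognition

def matches_id(id_photo_path, webcam_photo_path, tolerance=0.6):
    """Return True if the webcam snapshot appears to show the person on the ID photo."""
    id_image = face_recognition.load_image_file(id_photo_path)
    webcam_image = face_recognition.load_image_file(webcam_photo_path)
    id_encodings = face_recognition.face_encodings(id_image)
    webcam_encodings = face_recognition.face_encodings(webcam_image)
    if not id_encodings or not webcam_encodings:
        return False   # no face found; a human proctor would need to intervene
    return face_recognition.compare_faces(
        [id_encodings[0]], webcam_encodings[0], tolerance=tolerance
    )[0]

# Hypothetical usage (file names are invented for illustration):
# print(matches_id("student_id_card.jpg", "webcam_snapshot.jpg"))
```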

 

In addition, around this time, institutions experimented with many other automated technologies that monitor the behavior of students during the exam.  Specifically, these tools are designed to check that students are not breaching guidelines—such as receiving advice from another person or searching the internet.  These tools can track eye movements, ambient sound, facial characteristics, and keystrokes, such as the use of tabs, and then apply artificial intelligence to identify possible instances of unauthorized behavior, sometimes called automated proctoring (Hussein et al., 2020). If the tools uncover possible instances of unauthorized behavior, human operators then need to assess and to judge the legitimacy of these actions in greater detail.    
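
The sketch below illustrates, in a deliberately simplified form, one of the signals such tools might monitor: whether a face remains visible in the webcam feed. Real products combine many signals and proprietary models; this example merely logs the moments at which OpenCV's bundled face detector finds no face, leaving any judgment to a human reviewer.

```python
# An illustrative sketch only: log timestamps at which no face is visible in the webcam feed.
import time
import cv2

# Haar cascade face detector shipped with opencv-python.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

capture = cv2.VideoCapture(0)   # default webcam
flagged = []                    # timestamps when no face was visible

try:
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 0:
            # A human reviewer would later judge whether these moments are legitimate.
            flagged.append(time.time())
        time.sleep(1)           # sample roughly once per second
except KeyboardInterrupt:
    pass
finally:
    capture.release()
    print(f"{len(flagged)} sampled frames without a visible face")
```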

 

Coupled with techniques that monitor students are tools, called computer lockdown or browser lockdown, that prevent students from searching the web or other computer files during the exam.  Rather than monitor the keystrokes of students, these tools disable attempts to leave the test environment.  For example, the Lockdown Browser, a tool in Respondus, disables all browser tabs except one.  Similarly, an option, called the Kiosk Mode, in software called the Safe Exam Browser, also prevents students from browsing other tabs while they complete an online exam (Slusky, 2020).

 

Many companies developed and introduced virtual or online proctoring tools.  Examples include BVirtual, Examity, ExamSoft, Honorlock, Kryterion, Proctorio, ProctorU, Respondus, Software Secure, and Tegrity (for comparisons, see Foster & Layman, 2013; Hussein et al., 2020; Slusky, 2020).  These tools tend to be expensive, although open-source variants have been developed (see Jia & He, 2021). Often, if institutions use these tools, no live human needs to be deployed to monitor students. 

 

According to Wan (2019), even by 2019, before COVID-19, the market for these tools was enormous, approaching US$19 billion.  In a subsequent report, prepared by Insight Partners (2021), analyses indicated this market might increase annually by 16.4% until late in this decade.

 

Evolution of online exam proctoring: Resistance and concerns

Because these tools proliferated in the aftermath of COVID-19, many of the potential complications of online proctoring surfaced.  Many student associations arranged petitions that challenged the legitimacy or suitability of these tools.  Protests were widespread and prominent. The main concerns revolved around breaches of privacy and the sense that surveillance of homes and eye movements was invasive and unsettling (Selwyn et al., 2021).  Other concerns revolved around equity, because these technologies may disadvantage particular communities, such as individuals who cannot afford stable internet connections.   

 

Academics also expressed concerns that proctoring companies had swayed the narratives of institutions and overlooked the multiple problems with this technology, such as issues around equity and privacy.  Some academics even proposed counternarratives and campaigns to stem the proliferation of online proctoring (e.g., Logan, 2021).

 

Evolution of online exam proctoring: Implementation of online proctoring

Selwyn et al. (2021) explored how tertiary education institutions embedded these online proctoring tools into the fabric of their operations.  Specifically, Selwyn et al. explored

 

  • how institutions choose and purchase these tools and technologies

  • how these tools and technologies are integrated with existing physical resources, social practices, cultural practices, and relationships

 

To answer these questions, Selwyn et al. collated data from a range of sources.  Specifically, the researchers conducted interviews with

 

  • eight Australian university students and three staff who had experienced online proctored examinations

  • three students who had initiated campaigns that raise concerns about this technology

  • four staff at Australian universities, including managers and technicians, who had managed the procurement and implementation of these tools

 

The researchers also analyzed institutional statements, policies, and guidelines, derived from five Australian universities, that are related to online proctoring.  Finally, the researchers analyzed published interviews with CEOs in the proctoring industry along with other documents produced by these companies, such as social media posts. 

 

Thematic analysis of these data uncovered some vital insights.  First, the analysis explored how companies promoted these tools to institutions.  That is, to counter the concerns of students and the public around invasive surveillance, companies depicted these technologies as

 

  • necessary to justly reward the vast majority of students who study diligently

  • necessary to protect the brand and reputation of the university

  • exciting innovations that deploy artificial intelligence and other advances—compatible with universities that depict themselves as pioneers of technology and progress

  • sensitive to diversity—to counter the concerns that facial recognition software can be biased

  • opportunities to expedite, rather than supplant, the role of decision makers, because these tools merely identify possible instances of suspicious behavior  

 

Second, the analysis showed how the attitudes of institutions, as exemplified by managers and technicians, towards online proctoring shifted during the pandemic.  In many institutions, online proctoring was touted as a necessary, but transient solution to accommodate the effects of COVID-19.  However, institutions soon recognized that such technologies, once implemented, were unlikely to be withdrawn—partly because of the apparent benefits of online exams, such as the capacity to reduce carbon emissions and include rich media in exam questions.    

 

Nevertheless, universities soon recognized the faults and limitations of these technologies.  First, these tools seemed to overlook many instances of cheating.  Rather than perceive these tools as infallible, many institutions depicted the technology as primarily a deterrent or reminder of the gravity and consequences of cheating.  Second, students often experienced challenges when using these tools.  Institutions thus introduced a range of practices to address these challenges, such as practice exams, familiarization workshops, and even sanitized areas in libraries where students could complete online exams. 

 

During the COVID-19 pandemic, most Australian universities permitted educators to choose whether to utilize online proctoring. Some educators chose to apply these tools, partly to satisfy accrediting bodies but sometimes to experiment with novel possibilities.  Other educators refrained from this option, partly to circumvent resistance and concerns from students.  Although universities arranged helpful workshops, the implementation of this technology consumed significant time—a concern that both educators, who had to set various proctoring parameters, and technicians, who had to check whether each assignment was compatible with the technology, reported. Despite all these efforts, some of the features and capabilities were deactivated because of bugs and flaws.  And most institutions still needed to arrange a live proctor to resolve technical problems that could unfold.   

 

Finally, although a few students raised concerns about the invasion of privacy, many students were not especially concerned about this matter, recognizing they often post photos of themselves on social media. Students tended to become increasingly accustomed to online proctoring and resigned to this approach.  The experience of these students with online proctoring was typically favorable, although they were sometimes concerned about technological obstacles.  However, according to some activists, online proctoring was introduced without reasonable consultation with students and may demonstrate fundamental distrust of students.  In addition, activists did not trust that universities could secure the data, given some previous data breaches.    

 

Concerns about online proctoring: Performance

Despite the ubiquity of online proctoring in many institutions, many concerns persist.  One concern that scholars and practitioners have raised is that, for a variety of reasons, online proctoring could impair the exam performance of some, but not all, students.  To facilitate online proctoring, students need to complete additional tasks, such as checking their hardware, software, and environment.  They need to utilize tools that may seem unfamiliar or invasive.  Consequently, a portion of students might feel distracted or uncomfortable, potentially disrupting their performance.  

 

Some research has revealed no effect of online proctoring on exam performance (e.g., Lee, 2020).  However, as some research indicates, when online proctors monitor students, these students do not perform as well on exams (Alessio et al., 2017; Reisenwitz, 2020)—and they dedicate less time to these exams (Alessio et al., 2017).  Thus, at least in some circumstances, online proctoring seems to impair exam performance.    

 

Conceivably, this finding might indicate that online proctoring diminishes cheating, such as searching the web or contacting friends.  The decline in performance, therefore, might substantiate, rather than challenge, the utility of this tool. 

 

Unfortunately, some research challenges the assumption that decrements in performance, because of online proctoring, can be ascribed entirely to a decrease in cheating.  To illustrate, in a study that Wuthisatian (2020) reported, some economics students completed an online exam, monitored by online proctoring.  Other economics students completed the same exam in a hall, monitored by an invigilator.  Presumably, the level of cheating should be comparable in these conditions.  Yet, students who were exposed to online proctoring did not perform as well as their counterparts.  As this finding suggests, albeit tentatively, the online proctoring environment might have impaired performance, even after the level of cheating is controlled.  Further research is warranted to characterize the diverse psychological effects of online proctoring.

 

Concerns about online proctoring: Emotional responses

Student activists are concerned that online proctoring is a signal the institution does not trust students (Selwyn et al., 2021), potentially eliciting resentment in students towards the institution.  Misinterpretation of the data could amplify these feelings.  That is, if the automated software erroneously identifies a possible instance of cheating, the subsequent response of proctors or staff, such as investigation or questions, may also exacerbate this resentment towards the institution.     

 

Online proctoring could also magnify the anxiety that students typically experience during exams.  Several features of these tools may exacerbate anxiety (Sinha et al., 2020), including

 

  • uncertainty about whether the technology will operate as intended

  • the need to rectify flaws or problems that might unfold, such as loss of internet connection

  • discomfort with the knowledge they will be monitored over an extended period

 

In their study of medical students who had experienced online proctoring, Meulmeester et al. (2021) revealed that 54% of students were concerned that ambient noise, such as noise from neighbors, might raise suspicion and invalidate their tests.  Similarly, 62% of students were concerned that looking away from their webcam could also raise suspicion.  About 40% of students were concerned about the impact of an unstable internet connection, and over 20% of students were concerned about the impact of a dysfunctional webcam.  Nevertheless, as Woldeab and Brothen (2021) revealed, some concerns of students—such as the concern they might be flagged erroneously—were not related to exam performance.  So, at least some of these worries might not compromise grades.

 

However, Woldeab and Brothen (2019) showed that anxiety might compromise the performance of many students.  Specifically, as these researchers demonstrated, the performance of students who are susceptible to anxiety is especially likely to drop when online proctoring is introduced. 

 

Specifically, in this study, 44 undergraduate students, enrolled in a psychology course at an American university, completed the final exam online. ProctorU monitored their behavior during this exam.  Another 587 students, enrolled in the same course, completed this exam in a testing center.  Hence, ProctorU was not utilized to monitor their behavior during this exam.  Before the exam, participants completed a measure of trait anxiety, such as their tendency to worry. 

 

Trait anxiety was inversely associated with exam performance.  Importantly, however, this adverse effect of trait anxiety was especially pronounced in the students who were monitored by ProctorU.  These results suggest that online proctoring amplifies the impact of anxiety on exam performance. 

 

Concerns about online proctoring: Privacy

As many authors and users have indicated, online proctoring may jeopardize the privacy of individuals.  To illustrate, before they commence an online exam, students often need to scan an ID card to verify their identity.  This document may be stored together with other personal data that proctors, or other individuals, may be able to access and exploit (Nigam et al., 2021).  Often, the proctoring companies choose the human proctors.  Institutions are therefore often unable to verify the background and integrity of these proctors, amplifying this problem (Furby, 2020).

  

Likewise, online proctoring software can sometimes control the webcam and microphone as well as access the desktop or screen.  Because of this control, the computer becomes susceptible to attacks from malware, potentially jeopardizing privacy and security (Ilgaz & Afacan Adanır, 2020; Nigam et al., 2021). 

 

Concerns about online proctoring: Student rights

Despite these concerns, students tend to feel satisfied with their experience of online proctoring.  For example, in one study, conducted by Milone et al. (2017), pharmacy students at the University of Minnesota completed an online exam.  The coordinators instituted ProctorU to monitor the students.  After the exam, students completed questions that assessed the delay before ProctorU connected, the delay before their identity was verified and the exam was accessible, as well as their satisfaction with the experience.  Interestingly, 89% of students indicated they were satisfied with the experience.  Nevertheless, a few students experienced delays with the exam or perceived the experience as awkward or creepy.  As Almutawa (2021) revealed, students are especially satisfied with online proctoring whenever a live human contributes to the invigilation.

 

Despite these promising findings, Lee and Fanguy (2022) expressed concern about the tendency of institutions to store the footage of students completing the exam, likening this experience to the Panopticon.  In their case study, in which a South Korean university adopted the Safe Exam Browser, students felt they could be watched not only by proctors but also by fellow students.  Students were not granted opportunities to raise concerns about these practices, but had to consent, by clicking a button in response to a pop-up message, to access the learning management system.   

 

As Lee and Fanguy (2022) underscored, to justify online proctoring, the discourse revolved around the need to identify cheaters to maintain fairness.  Although ostensibly reasonable, this discourse assumes that students can be divided into two clusters: cheaters and non-cheaters.  This discourse, therefore, ascribes cheating to the moral shortcomings of individuals—potentially dismissing many of the social and cultural forces that may be relevant.  Accordingly, the institution may discount the role of inequities in access to technology, family circumstances, mental health, and many other considerations that could affect student behavior. 

 

Similarly, as Lee and Fanguy (2022) reported, online tests, coupled with online proctoring, also limited the capacity of educators to apply best practice.  For example, before the pandemic, in one course, the assessment included one exam and one collaborative project.  During the exam, students were permitted to consult their notes, because the teacher believed that memorization is not representative of work life.  During the pandemic, however, the project was not as viable and so the assessment revolved only around an online exam.  Because the exam was worth 100%, students were more inclined to assist one another and cheat.  Therefore, the teacher introduced online proctoring but then could not permit students to consult their materials.  This online test, therefore, diverged from the original intention of this instructor to shun memorization. 

 

Concerns about online proctoring: Minimum specifications and inequity

Most online proctoring software is effective only if the computers and hardware fulfill minimum specifications.  The webcam and microphone need to be operational, for example. The available RAM and the internet bandwidth must exceed a minimum level.  If these specifications are not fulfilled at any time during the examination, the test may be suspended until these concerns are addressed (Ilgaz & Afacan Adanır, 2020), evoking considerable stress in many students.  Unfortunately, students who experience financial challenges or other impediments are not as likely to fulfill these specifications.  Therefore, online proctoring can exacerbate inequities, disadvantaging some racial and ethnic communities as well as people with disabilities (see Brown 2020; Swauger 2020).  

 

The hardware and resources of the institution must also exceed a minimum level.  The internet connection to the institution must be stable and the servers must be robust to enable many students to access the test simultaneously (Sinha et al., 2020).  Furthermore, the institution must be able to store extensive data.  

 

Concerns about online proctoring: Techniques to circumvent these tools

Online proctoring software utilizes a range of measures to prevent cheating.  Yet, in some instances, students can circumvent these measures.  To illustrate, the software often uses the IP addresses of computers or mobile phones to identify and to prevent misconduct—such as to confirm that students are using only one device.  Yet, students can use VPNs to limit the capacity of this software to track their IP addresses (Nigam et al., 2021).     

 

Guidelines to address concerns

Because many concerns have been raised about online proctoring, Sando et al. (2021) recommended that peak bodies or other associations develop guidelines on how to administer online proctoring, especially automated online proctoring, effectively.  These authors suggested a few principles these guidelines should consider:

 

  • the guidelines should clarify how institutions should manage scratch paper—the paper the students use to perform calculations or to organize their thoughts.  For example, students may be asked to display this scratch paper at the beginning of each exam.  Or students could perhaps be permitted to use a dry erase board

  • the guidelines should specify the number of staff who should monitor each online exam, as a function of class size and other considerations

  • the guidelines should specify the support that should be available to all students in response to challenges, such as technical issues or bathroom breaks

  • guidelines should specify how institutions will support the students who cannot afford the resources that are necessary to enable online proctoring, such as a stable internet connection and private room

  • the guidelines should specify how to review exam recordings as fairly but efficiently as possible

  • the guidelines should specify responses to potential breaches, such as the possibility of additional assessment or penalties in response to continued infractions

 

Coghlan, Miller, and Paterson (2021) present a more detailed ethical account that could help institutions justify and improve the fairness of online proctoring.  For example, to manage the gamut of ethical issues this technology raises, these authors suggest that, at the very least, institutions should

 

  • consider other arrangements to accommodate some, or all, students who would prefer alternatives

  • collate evidence that online proctoring does indeed improve academic integrity to the degree that warrants the financial costs and ethical concerns

  • consider whether they could instead develop assessments that are not as susceptible to cheating, such as open-book tests

 

Features and limitations of various online proctoring tools

The features and capabilities of online proctoring tools vary appreciably (for comparisons, see Foster & Layman, 2013; Hussein et al., 2020; Slusky, 2020).  To illustrate, most of these tools enable a human proctor to observe students and to communicate with these students during the exam.  This feature, however, is not available on all platforms, such as Proctorfree, Proctorio, Proctortrack, and Tegrity.  

 

Only a subset of these tools can operate without the intervention of a live proctor—in which case the microphone and webcam record data automatically, even if no human is available.  This feature is available in BVirtual, Examity, PearsonVUE, Proctorfree, Proctorio, Proctortrack, and Tegrity.

 

These tools also vary on countless other attributes and capabilities (Foster & Layman, 2013).  To illustrate, tools vary on

 

  • whether internet connection must be stable throughout the entire exam

  • whether inappropriate keystrokes are recorded and identified

  • whether potential incidents are logged and timestamped

  • whether the companies have published research on the accuracy of these tools

  • whether the lockdown feature prevents right-click, printing, function keys, the launch of applications, copy and paste, and the minimizing or maximizing of windows

  • whether they utilize passwords, photo comparisons, challenging questions, facial recognition, voice recognition, fingerprint reader, iris readers, or other options to authenticate users

One of the first online proctoring tools that tertiary education institutions embraced was ProctorU.  ProctorU records data from the microphone and webcam of students as well as keystrokes.  Live proctors, however, need to monitor the video footage to check the ID cards of students and confirm the student cannot access unauthorized materials.  The tool uses artificial intelligence to identify possible instances of unauthorized behavior, such as students who leave the room.  However, the artificial intelligence can be inaccurate—and, as the company recommends, live proctors should monitor students as well and intervene if needed (Slusky, 2020).

 

Some of the online proctoring tools that utilize artificial intelligence distill some data from events that precede the exam.  For example, Examus distills some of the behavioral characteristics of students from online lectures.  The software can then, in essence, match this information to the behavioral characteristics of these students during online exams (Slusky, 2020).  Discrepancies may indicate possible instances of suspicious behavior.  
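
The sketch below conveys the general idea of comparing exam behavior against a baseline, using simulated inter-keystroke intervals and a two-sample Kolmogorov–Smirnov test. The data are invented, and commercial tools such as Examus presumably rely on far richer, proprietary behavioral models; a flagged discrepancy would still require human judgment.

```python
# An illustrative sketch only: compare a behavioral baseline (lectures) with exam behavior.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Simulated inter-keystroke intervals (seconds) for one student.
lecture_intervals = rng.gamma(shape=2.0, scale=0.12, size=500)   # baseline behavior
exam_intervals = rng.gamma(shape=2.0, scale=0.25, size=200)      # noticeably slower typing

statistic, p_value = ks_2samp(lecture_intervals, exam_intervals)

# A small p-value only flags a discrepancy; a human would still need to judge whether
# the behavior reflects misconduct or, say, exam anxiety.
if p_value < 0.01:
    print(f"Typing rhythm differs from baseline (KS={statistic:.2f}, p={p_value:.4f})")
else:
    print("No notable discrepancy from baseline")
```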

 

References

  • Alessio, H. M., Malay, N., Maurer, K., Bailer, A. J., & Rubin, B. (2017). Examining the effect of proctoring on online test scores. Online Learning, 21(1), 146-161.          

  • Almutawa, A. M. (2021). Students’ perspective towards online proctoring in exams during COVID-19. Journal of Engineering Research.

  • Arnò, S., Galassi, A., Tommasi, M., Saggino, A., & Vittorini, P. (2021). State-of-the-art of commercial proctoring systems and their use in academic online exams. International Journal of Distance Education Technologies, 19(2), 41–60.

  • Atoum, Y., Chen, L., Liu, A. X., Hsu, S. D., & Liu, X. (2017). Automated online exam proctoring. IEEE Transactions on Multimedia, 19(7), 1609-1624.

  • Balash, D. G., Kim, D., Shaibekova, D., Fainchtein, R. A., Sherr, M., & Aviv, A. J. (2021). Examining the examiners: Students' privacy and security perceptions of online proctoring services. In Seventeenth Symposium on Usable Privacy and Security (SOUPS 2021) (pp. 633-652).

  • Brown, L. X. (2020). How automated test proctoring software discriminates against disabled students. Center for Democracy and Technology.

  • Cluskey Jr, G. R., Ehlen, C. R., & Raiborn, M. H. (2011). Thwarting online exam cheating without proctor supervision. Journal of Academic and Business Ethics, 4(1), 1-7.

  • Coghlan, S., Miller, T., & Paterson, J. (2021). Good proctor or “big brother”? Ethics of online exam supervision technologies. Philosophy & Technology, 34(4), 1581-1606.

  • Foster, D., & Layman, H. (2013). Online proctoring systems compared.

  • Furby, L. (2020). Are you implementing a remote proctor solution this fall? Recommendations from NLN Testing Services. Nursing Education Perspectives, 41(4), 269–270.

  • González-González, C. S., Infante-Moro, A., & Infante-Moro, J. C. (2020). Implementation of e-proctoring in online teaching: A study about motivational factors. Sustainability, 12(8), 3488.

  • Hussein, M. J., Yusuf, J., Deb, A. S., Fong, L., & Naidu, S. (2020). An evaluation of online proctoring tools. Open Praxis, 12(4), 509-525.

  • Ilgaz, H., & Afacan Adanır, G. (2020). Providing online exams for online learners: Does it really matter for them? Education and Information Technologies, 25(2), 1255–1269

  • Insight Partners (2021) Online exam proctoring market forecast to 2027. Insight Partners: report TIPRE00013227. New York: Insight Partners.

  • Jia, J., & He, Y. (2021). The design, implementation and pilot application of an intelligent online proctoring system for online exams. Interactive Technology and Smart Education.

  • Lee, J. W. (2020). Impact of proctoring environments on student performance: Online vs offline proctored exams. Journal of Asian Finance Economics and Business, 7(8), 653-660.

  • Lee, K., & Fanguy, M. (2022). Online exam proctoring technologies: Educational innovation or deterioration? British Journal of Educational Technology, 53(3), 475-490.

  • Logan, C. (2021). Toward abolishing online proctoring: Counter-narratives, deep change, and pedagogies of educational dignity. Journal of Interactive Technology and Pedagogy, 20.

  • Meulmeester, F. L., Dubois, E. A., Krommenhoek-van Es, C. T., de Jong, P. G., & Langers, A. M. (2021). Medical students’ perspectives on online proctoring during remote digital progress test. Medical Science Educator, 31(6), 1773-1777.

  • Milone, A. S., Cortese, A. M., Balestrieri, R. L., & Pittenger, A. L. (2017). The impact of proctored online exams on the educational experience. Currents in Pharmacy Teaching and Learning, 9(1), 108-114.

  • Nigam, A., Pasricha, R., Singh, T., & Churi, P. (2021). A systematic review on AI-based proctoring systems: Past, present and future. Education and Information Technologies, 26, 1–25.

  • Reisenwitz, T. H. (2020). Examining the necessity of proctoring online exams. Journal of Higher Education Theory and Practice, 20(1), 118-124.

  • Sando, K., Medina, M. S., & Whalen, K. (2021). The need for new guidelines and training for remote/online testing and proctoring due to COVID-19. American Journal of Pharmaceutical Education.

  • Selwyn, N., O'Neill, C., Smith, G., Andrejevic, M., & Gu, X. (2021). A necessary evil? The rise of online exam proctoring in Australian universities. Media International Australia

  • Silverman, S., Caines, A., Casey, C., Garcia de Hurtado, B., Riviere, J., Sintjago, A., & Vecchiola, C. (2021). What happens when you close the door on remote proctoring? Moving toward authentic assessments with a people-centered approach. To Improve the Academy: A Journal of Educational Development, 39(3).

  • Sinha, P., Dileshwari, & Yadav, A. (2020). Remote proctored theory and objective online examination. International Journal of Advanced Networking and Applications, 11(06).

  • Slusky, L. (2020). Cybersecurity of online proctoring systems. Journal of International Technology and Information Management, 29(1), 56-83.

  • Swauger, S. (2020). Our bodies encoded: Algorithmic test proctoring in higher education. Critical Digital Pedagogy.

  • Wan, T. (2019). As online learning grows, so will proctors. EdSurge, 30 April

  • Woldeab, D., & Brothen, T. (2019). 21st century assessment: Online proctoring, test anxiety, and student performance. International Journal of E-Learning & Distance Education, 34(1), 1-10.

  • Woldeab, D., & Brothen, T. (2021). Video surveillance of online exam proctoring: Exam anxiety and student performance. International Journal of E-Learning & Distance Education, 36(1)

  • Wuthisatian, R. (2020). Student exam performance in different proctored environments: Evidence from an online economics course. International Review of Economics Education, 35.

The model university 2040: An encyclopedia of research and ideas to improve tertiary education

