Part II: Review of Handbook of Automated Essay Evaluation: Current Applications and New Directions. Eds. Mark D. Shermis and Jill Burstein
Shermis, M., & Burstein, J. (Eds.). (2013). Handbook of Automated Essay Evaluation: Current Applications and New Directions. New York, NY: Routledge.
By Lori Beth De Hertogh, Washington State University
This is the second installment of a two-part review of the Handbook of Automated Essay Evaluation: Current Applications and New Directions edited by Mark D. Shermis, University of Akron, and Jill Burstein, Educational Testing Service. Part I explains the workflow of several scoring systems and provides an overview of platform options. Part II discusses how various chapters deal with automated essay evaluation (AEE) in classroom contexts as well as advances in machine scoring.
Individuals interested in learning how to apply automated essay evaluation to classroom assessment contexts will appreciate Norbert Elliot and Andrew Klobucar’s chapter, “Automated Essay Evaluation and the Teaching of Writing.” Elliot and Klobucar, professors of English at the New Jersey Institute of Technology, argue that AEE can enhance students’ learning experiences when used judiciously. They also highlight how they have “identified evidence to support the use of AEE in first-year writing” so long as “special care” is taken to observe its impact on certain student populations and to investigate its overall influence on first-year writing programs (p. 27).
Another chapter of interest to classroom educators is Changhua Rich (CTB), Christina Schneider (CTB/McGraw-Hill), and Juan D’Brot’s (West Virginia Department of Education) “Applications of Automated Essay Evaluation in West Virginia.” This chapter outlines how West Virginia Writes™, a customizable online scoring engine, was implemented in K-12 classrooms across the state. The authors explain that this program reduced the time teachers spent scoring essays and provided “students with valuable practice to build writing skills and confidence” (p. 102). They argue that improvements in machine scoring (i.e., increased ability to accurately identify and score traits such as organization, sentence structure, and development) make programs like West Virginia Writes better equipped to help students improve their writing abilities. While such a claim is debatable, individuals working in educational measurement and writing assessment might see this research as a starting point for investigating how customizable scoring tools can be used in writing classrooms.
Sara Weigle, professor of applied linguistics at Georgia State University, argues in Chapter Three that AEE is a useful tool for generating error-analysis feedback in second-language learning environments, a process which “holds a promise of reducing teachers’ burdens and helping students become more autonomous” (p. 47). She also suggests that AEE’s ability to provide instant, computer-based feedback on grammatical errors allows “students to save face in a way that submitting their writing to teachers does not” (p. 47). As scholarship on error and English language learning indicates, teachers tend to respond more harshly to errors made by multilingual writers than to those made by native English speakers. Automated systems designed to provide feedback on grammatical errors may prove useful in helping to reduce teacher bias.
A long-standing complaint about AEE, particularly within the writing community, is that machine scoring does not produce a valid measurement of a student’s writing ability. Several chapters in the collection address this issue by suggesting that rather than focusing on the validity of machine scoring, educators should consider alternative ways AEE can assist teachers in classroom settings. In Chapter Fifteen, for example, authors Michael Gamon (Microsoft), Martin Chodorow (Hunter College and the Graduate Center of the City University of New York), Claudia Leacock (CTB/McGraw-Hill), and Joel Tetreault (Educational Testing Service) advocate for the use of automated essay evaluation as a tool for providing formative feedback on grammatical and sentence-level errors in second-language learning environments. Like Weigle, Gamon and his colleagues suggest that error-analysis feedback may “improve the quality of the user’s writing by highlighting errors, describing the types of mistakes the writer has made, and suggesting corrections” (p. 263). Rather than using automated essay evaluation as a means to determine students’ writing abilities, these authors view AEE as a tool (or even as an unbiased tutor) that students can use to improve specific aspects of their writing.
Individuals working as writing program administrators or measurement technologists will be interested in several of the AEE advances highlighted in this collection. Chapter Fourteen, “Using Automated Scoring to Monitor Reader Performance and Detect Reader Drift in Essay Scoring,” focuses on the ability of automated scoring systems to “monitor hand-scoring accuracy” (p. 234). Drift occurs when human raters assign scores that are inconsistent or that fall outside an accepted variable range, thereby compromising “the validity of student scores” (p. 234). A scoring engine detects rater drift by comparing human raters’ scores to a model that emulates human scoring behavior; results can indicate whether a particular cohort of raters demonstrates drift in its scoring samples. Unlike traditional monitoring techniques, which require a read-behind or second read (often by putting a testing sample back into a pool of raters), an automated system can efficiently assess a large number of scores without burdening raters with rereads.
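For readers who want a concrete picture of this comparison, the following is a minimal sketch in Python, not the method described in Chapter Fourteen. It assumes each record pairs a human score with an engine score for the same essay, and it flags a rater whose average gap from the engine exceeds a tolerance. The function name `flag_drifting_raters`, the `max_mean_gap` threshold of 0.5, and the sample data are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def flag_drifting_raters(records, max_mean_gap=0.5):
    """Summarize how far each rater's scores sit from the engine's scores.

    records: iterable of (rater_id, human_score, engine_score) tuples.
    max_mean_gap: hypothetical tolerance; a real program would set its own.
    """
    gaps = defaultdict(list)
    for rater_id, human_score, engine_score in records:
        # Signed difference: positive means the rater scored higher than the engine.
        gaps[rater_id].append(human_score - engine_score)

    report = {}
    for rater_id, diffs in gaps.items():
        mean_gap = mean(diffs)
        report[rater_id] = {
            "mean_gap": mean_gap,
            "exact_agreement": sum(d == 0 for d in diffs) / len(diffs),
            "flagged": abs(mean_gap) > max_mean_gap,
        }
    return report


# Invented sample data: rater_b consistently scores above the engine.
sample = [
    ("rater_a", 4, 4), ("rater_a", 3, 3), ("rater_a", 5, 4), ("rater_a", 2, 2),
    ("rater_b", 5, 3), ("rater_b", 4, 3), ("rater_b", 5, 4), ("rater_b", 4, 3),
]
report = flag_drifting_raters(sample)
for rater_id, summary in report.items():
    print(rater_id, summary)
```

A flag in a sketch like this would only signal that a rater's scores deserve a closer look; whether the divergence reflects rater drift or a limitation of the scoring model is a judgment the numbers alone cannot make.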
Other advances in AEE of interest to those working in educational measurement, cognitive psychology, and computational linguistics include those discussed in Chapter Seventeen, which highlights original research on AEE and sentiment analysis, or the detection of a writer’s use of personal opinion statements (e.g., “I believe that…”). Authors Jill Burstein, Beata Beigman-Klebanov (ETS), Nitin Madnani (ETS), and Adam Faulkner (City University of New York) argue that the ability to recognize sentiment using automated scoring systems can help in identifying “the quality of argumentation in student and test-taker essay writing” (p. 282). A scoring engine, for instance, can use natural language processing to detect whether a student has stated his or her opinion in a writing sample that requires a personal response; the absence of a personal opinion may indicate that the writer is not on task or does not understand the prompt.
In reviewing the Handbook of Automated Essay Evaluation: Current Applications and New Directions, I have come to two conclusions. The first is that by educating ourselves about the capabilities of automated essay evaluation, those of us involved in writing assessment can make more informed choices about the uses and implications of machine scoring. Second, while I am not a supporter of AEE, this collection makes me wonder whether automated scoring systems can be fruitful if used judiciously, particularly in English language learning contexts or as tools for identifying rater drift. The truth is that automated essay evaluation is, in some form or another, here to stay. This means we must continue to critically engage with these tools and their proponents.
 See Peggy Lindsey and Deborah Crusan’s article, “How Faculty Attitudes and Expectations Toward Student Nationality Affect Writing Assessment” and Lyndall Nairn’s work, “Faculty Response to Grammar Errors in the Writing of ESL Students.”