Monday, September 1, 2014

Part II: Review of Handbook of Automated Essay Evaluation: Current Applications and New Directions. Eds. Mark D. Shermis and Jill Burstein

Shermis, M., & Burstein, J. (Eds.). (2013). Handbook of Automated Essay Evaluation: Current Applications and New Directions. New York, NY: Routledge.

By Lori Beth De Hertogh, Washington State University

This is the second installment of a two-part review of the Handbook of Automated Essay Evaluation: Current Applications and New Directions edited by Mark D. Shermis, University of Akron, and Jill Burstein, Educational Testing Service. Part I explains the workflow of several scoring systems and provides an overview of platform options. Part II discusses how various chapters deal with automated essay evaluation (AEE) in classroom contexts as well as advances in machine scoring.

Individuals interested in learning how to apply automated essay evaluation to classroom assessment contexts will appreciate Norbert Elliot and Andrew Klobucar’s chapter, “Automated Essay Evaluation and the Teaching of Writing.” Elliot and Klobucar, professors of English at the New Jersey Institute of Technology, argue that AEE can enhance students’ learning experiences when used judiciously. They also highlight how they have “identified evidence to support the use of AEE in first-year writing” so long as “special care” is taken to observe its impact on certain student populations and to investigate its overall influence on first-year writing programs (p. 27).

Another chapter of interest to classroom educators is “Applications of Automated Essay Evaluation in West Virginia” by Changhua Rich (CTB), Christina Schneider (CTB/McGraw-Hill), and Juan D’Brot (West Virginia Department of Education). This chapter outlines how West Virginia Writes™, a customizable online scoring engine, was implemented in K-12 classrooms across the state. The authors explain that this program reduced the time teachers spent scoring essays and provided “students with valuable practice to build writing skills and confidence” (p. 102). They argue that improvements in machine scoring (e.g., an increased ability to accurately identify and score traits such as organization, sentence structure, and development) make programs like West Virginia Writes better equipped to help students improve their writing abilities. While such a claim is debatable, individuals working in educational measurement and writing assessment might see this research as a starting point for investigating how customizable scoring tools can be used in writing classrooms.

Sara Weigle, professor of applied linguistics at Georgia State University, argues in Chapter Three that AEE is a useful tool for generating error-analysis feedback in second-language learning environments, a process which “holds a promise of reducing teachers’ burdens and helping students become more autonomous” (p. 47). She also suggests that AEE’s ability to provide instant, computer-based feedback on grammatical errors allows “students to save face in a way that submitting their writing to teachers does not” (p. 47). As scholarship on error and English language learning indicates,[1] teachers tend to respond more harshly to errors made by multilingual writers than to those made by native English speakers. Automated systems designed to provide feedback on grammatical errors may prove useful in helping to reduce teacher bias.

A long-standing complaint about AEE, particularly within the writing community, is that machine scoring does not produce a valid measurement of a student’s writing ability. Several chapters in the collection address this issue by suggesting that rather than focusing on the validity of machine scoring, educators should consider alternative ways AEE can assist teachers in classroom settings. In Chapter Fifteen, for example, authors Michael Gamon (Microsoft), Martin Chodorow (Hunter College and the Graduate Center of the City University of New York), Claudia Leacock (CTB/McGraw-Hill), and Joel Tetreault (Educational Testing Service) advocate for the use of automated essay evaluation as a tool for providing formative feedback on grammatical and sentence-level errors in second-language learning environments. Like Weigle, Gamon and his colleagues suggest that error-analysis feedback may “improve the quality of the user’s writing by highlighting errors, describing the types of mistakes the writer has made, and suggesting corrections” (p. 263). Rather than using automated essay evaluation as a means to determine students’ writing abilities, these authors view AEE as a tool, or even as an unbiased tutor, that students can use to improve specific aspects of their writing.

Individuals working as writing program administrators or measurement technologists will be interested in several of the AEE advances highlighted in this collection. Chapter Fourteen, “Using Automated Scoring to Monitor Reader Performance and Detect Reader Drift in Essay Scoring,” focuses on the ability of automated scoring systems to “monitor hand-scoring accuracy” (p. 234). Drift occurs when human raters assign scores that are inconsistent or that fall outside an accepted variable range, thereby compromising “the validity of student scores” (p. 234). A scoring engine detects rater drift by comparing human raters’ scores to a model that emulates human scoring behavior; results can indicate whether a particular cohort of raters demonstrates drift in its scoring samples. Unlike traditional monitoring techniques, which require a read-behind or second read (often by putting a testing sample back into a pool of raters), an automated system can efficiently assess a large number of scores without burdening raters with rereads.
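For readers who want a concrete picture of the comparison described above, here is a minimal sketch in Python. It is not the method used in Chapter Fourteen, which this review does not reproduce. The function name `detect_drift`, the data layout, and the `tolerance` threshold are all hypothetical choices made for illustration.

```python
from statistics import mean

def detect_drift(human_scores, model_scores, tolerance=1.0):
    """Flag raters whose scores stray too far from the model's scores.

    human_scores: {rater_name: {essay_id: score}}  (hypothetical layout)
    model_scores: {essay_id: score} produced by a scoring engine
    tolerance:    assumed cutoff for the mean absolute gap
    """
    flagged = {}
    for rater, scores in human_scores.items():
        # Absolute gap between this rater and the model on each shared essay.
        gaps = [
            abs(score - model_scores[essay_id])
            for essay_id, score in scores.items()
            if essay_id in model_scores
        ]
        if gaps and mean(gaps) > tolerance:
            flagged[rater] = mean(gaps)
    return flagged

# Invented example data: rater_b disagrees with the model on every essay.
model = {"e1": 4, "e2": 3, "e3": 5}
humans = {
    "rater_a": {"e1": 4, "e2": 3, "e3": 4},
    "rater_b": {"e1": 2, "e2": 5, "e3": 3},
}
print(detect_drift(humans, model))
```

Because every human score is checked against the model's score for the same essay, no second human read is needed, which is the efficiency the chapter emphasizes.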

Other advances in AEE of interest to those working in educational measurement, cognitive psychology, and computational linguistics include those discussed in Chapter Seventeen, which highlights original research on AEE and sentiment analysis, or the detection of a writer’s use of personal opinion statements (e.g., “I believe that…”). Authors Jill Burstein, Beata Beigman-Klebanov (ETS), Nitin Madnani (ETS), and Adam Faulkner (City University of New York) argue that the ability of automated scoring systems to recognize sentiment can help in identifying “the quality of argumentation in student and test-taker essay writing” (p. 282). A scoring engine, for instance, can use natural language processing to detect if a student has stated his or her opinion in a writing sample that requires a personal response; the absence of a personal opinion may indicate that the writer is not on task or does not understand the prompt.
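As a rough illustration of the opinion-detection idea, the Python sketch below checks an essay for a few first-person opinion phrases. It is a simple keyword match, not the natural language processing described in Chapter Seventeen. The phrase list and the name `has_opinion_statement` are assumptions made for this example.

```python
import re

# Hypothetical, deliberately short list of opinion markers.
OPINION_PATTERN = re.compile(
    r"\b(?:I (?:believe|think|feel)|in my opinion)\b",
    re.IGNORECASE,
)

def has_opinion_statement(essay):
    """Return True if the essay contains at least one opinion marker."""
    return OPINION_PATTERN.search(essay) is not None

print(has_opinion_statement("I believe that homework helps students."))
print(has_opinion_statement("Homework is assigned every day."))
```

A real system would need to handle opinions stated without such markers, so a result of False here suggests only that the essay may be off task.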

In reviewing the Handbook of Automated Essay Evaluation: Current Applications and New Directions, I have come to two conclusions. The first is that by educating ourselves about the capabilities of automated essay evaluation, those of us involved in writing assessment can make more informed choices about the uses and implications of machine scoring. Second, while I am not a supporter of AEE, this collection makes me wonder whether automated scoring systems can be fruitful if used judiciously, particularly in English language learning contexts or as tools for identifying rater drift. The truth is that automated essay evaluation is, in some form or another, here to stay. This means we must continue to critically engage with these tools and their proponents.

[1] See Peggy Lindsey and Deborah Crusan’s article, “How Faculty Attitudes and Expectations Toward Student Nationality Affect Writing Assessment” and Lyndall Nairn’s work, “Faculty Response to Grammar Errors in the Writing of ESL Students.”

Sunday, August 24, 2014

Part I: Review of Handbook of Automated Essay Evaluation: Current Applications and New Directions. Eds. Mark D. Shermis and Jill Burstein

Shermis, M., & Burstein, J. (Eds.). (2013). Handbook of Automated Essay Evaluation: Current Applications and New Directions. New York, NY: Routledge.

By Lori Beth De Hertogh, Washington State University

The Handbook of Automated Essay Evaluation: Current Applications and New Directions, edited by Mark D. Shermis, University of Akron, and Jill Burstein, Educational Testing Service, features twenty chapters, each of which deals with a different aspect of automated essay evaluation (AEE). The overall purpose of the collection is to help professionals (e.g., educators, program administrators, researchers, testing specialists) working in a range of assessment contexts in K-12 and higher education better understand the capabilities of AEE. It also strives to demystify machine scoring and to highlight advances in several scoring platforms.

The collection is loosely organized into three parts. Authors of the first three chapters discuss automated essay evaluation in classroom contexts. The next section examines the workflow of various scoring engines. In the final section, authors highlight advances in automated essay evaluation. My two-part review generally follows this organizational scheme, except that I begin by examining the workflow of several scoring systems as well as platform options. I then review how several chapters describe potential uses of AEE in classroom contexts and recent developments in machine scoring.

The Handbook of Automated Essay Evaluation devotes considerable energy to explaining how scoring engines work. Matthew Schultz, director of psychometric services for Vantage Learning, describes in Chapter Six how the IntelliMetric™ engine analyzes and scores a text:

The IntelliMetric system must be ‘trained’ with a set of previously scored responses drawn from expert raters or scorers. These papers are used as a basis for the system to ‘learn’ the rubric and infer the pooled judgments of the human scorers. The IntelliMetric system internalizes the characteristics or features of the responses associated with each score point and applies this intelligence to score essays with unknown scores. (p. 89)

Although the methods that platforms like IntelliMetric use to determine a score differ slightly, they all employ a multistage process consisting of four basic steps:
  • receiving the text,
  • using natural language processing to parse text components such as structure, content, and style,
  • analyzing the text against a database of previously human- and machine-scored texts, and
  • producing a score based on how similar or dissimilar the text is to previously rated texts.

In Chapter Eight, Elijah Mayfield and Carolyn Penstein Rosé, language and technology specialists at Carnegie Mellon University, demonstrate how this four-step process works by describing the workflow of LightSIDE, an open-source machine scoring engine and learning tool. In doing so, they illustrate how the program is able to match or exceed “human performance nearly universally” owing to its ability to track and develop large-scale aggregate data from texts. Mayfield and Rosé argue that this feature allows LightSIDE to tackle “the technical challenges of data collection” in diverse assessment contexts (p. 130). They also emphasize that this capability can help users curate large-scale data based on error analysis. Writing specialists can then use this information to identify areas (e.g., grammar, sentence structure, organization) where students need instructional and institutional support.
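The four-step process can be sketched in a few lines of Python. This is a toy illustration only, and not how IntelliMetric, LightSIDE, or any other engine in the Handbook works. It assumes a bag-of-words representation, cosine similarity, and an average over the `k` most similar scored essays, and every name and sample essay in it is invented for the example.

```python
import math
import re
from collections import Counter

def extract_features(text):
    """Step 2: parse the text into word counts (a crude stand-in for NLP)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Similarity between two word-count vectors, from 0.0 to 1.0."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def score_essay(text, scored_essays, k=2):
    """Steps 1-4: receive a text, parse it, compare it, produce a score."""
    features = extract_features(text)
    # Step 3: rank the previously scored essays by similarity to the new one.
    ranked = sorted(
        scored_essays,
        key=lambda pair: cosine_similarity(features, extract_features(pair[0])),
        reverse=True,
    )
    # Step 4: average the scores of the k most similar essays.
    nearest = ranked[:k]
    return round(sum(score for _, score in nearest) / len(nearest))

# Invented training set of (essay, human score) pairs.
scored = [
    ("The author argues clearly and supports each claim with evidence.", 5),
    ("The essay supports its claim with clear evidence and argues well.", 5),
    ("I like dogs. Dogs are good.", 2),
    ("Dogs are fun. I like my dog.", 1),
]
print(score_essay("This essay argues its claim with strong evidence.", scored))
```

The sketch also makes one limitation visible: a score produced this way reflects surface similarity to previously rated texts, which is part of why the validity of machine scoring remains contested.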

Chapter Four, “The e-rater® Automated Essay Scoring System,” provides a “description of e-rater’s features and their relevance to the writing construct” (p. 55). Authors Jill Burstein, Joel Tetreault, and Nitin Madnani, research scientists at Educational Testing Service, stress that the workflow capabilities of scoring systems like e-rater or Criterion (a platform developed by ETS) make them useful both for providing students with immediate, relevant feedback on the grammatical and structural aspects of their writing and for administrative settings where access to aggregate data is critical (pp. 64-65). The authors argue that e-rater’s ability to generate a range of data makes it an asset in responding to both local and national assessment requirements (p. 65).

In Chapter Nineteen, “Contrasting State-of-the-Art Automated Scoring of Essays,” authors Mark D. Shermis and Ben Hamner (Kaggle) offer readers a comparison of nine scoring engines’ responses to a variety of prompts in an effort to assess and compare the workflow and performance levels of each system, including Intelligent Essay Assessor, LightSIDE, e-rater, and Project Essay Grade. This chapter may be particularly useful to individuals tasked with determining which type of automated evaluation system to adopt or replace. In addition, this chapter provides a brief guide to understanding how a variety of systems operate and an overview of “vendor variability in performance” (p. 337).

The Handbook of Automated Essay Evaluation: Current Applications and New Directions provides assessment scholars, practitioners, and writing teachers with relevant information about the workflow of various scoring engines and how these systems’ capabilities can be applied to a range of educational settings. By understanding how these systems work and their potential applications, individuals tasked with writing assessment can make more informed choices about the potential benefits and consequences of adopting automated essay evaluation.

Tuesday, March 18, 2014

JWA at RNF and CCCC in Indianapolis!

JWA will be at the Research Network Forum and CCCC in Indianapolis, March 19-22, 2014.

JWA will be at the Editors' Roundtable discussion on Wednesday, March 19, 2014 from 1:15-2:30 pm.

If you would like to talk to someone from JWA about a potential project, you can reach Peggy O'Neill at poneill1 [at] loyola [dot] edu or you can contact Jessica Nastal-Dema at jlnastal [at] uwm [dot] edu.

See you there!

Wednesday, February 5, 2014

Review of _Building Writing Center Assessments that Matter_ by Ellen Schendel and William Macauley

Review of Building Writing Center Assessments that Matter by Ellen Schendel and William Macauley (2012). Utah State University Press.

ISBN 978-0-87421-816-9, paper $28.95; ISBN 978-0-87421-834-3 e-book $22.95

By Marc Scott, Shawnee State University

     Ellen Schendel and William Macauley’s 2012 book, Building Writing Center Assessments that Matter (Building), is a co-authored text featuring an introduction and coda by both authors, three chapters authored by Macauley, three by Schendel, a brief interchapter by Neal Lerner, and an afterword by Brian Huot and Nicole Caswell. Much of Building explores how important writing assessment scholarship can apply to writing center program assessment, and often uses specific examples from the authors’ experiences directing writing centers. Schendel and Macauley’s goal in writing Building is to provide Writing Center Directors (WCDs) new to program assessment with a text that speaks specifically to the unique needs and opportunities of writing center work. While the text is geared toward helping WCDs navigate program assessment, Building also provides assessment scholars and practitioners with important ideas and concepts for program assessment, including how to frame assessment and how to think through methodological options.
     Those wishing to develop a culture of assessment at their institution can learn much from Schendel and Macauley’s text. Throughout Building, the authors use tutoring and writing processes as metaphors for assessment work. Just as writers gain invaluable insights by sharing their work with other writers, sharing assessment projects and data with peers only benefits writing assessment. Furthermore, in Writing Center scholarship and practice, tutors strive to help a writer establish a healthy writing process rather than just proofread or edit a text. When applied to writing assessment, a similar emphasis on process over product might help instructors and students engage in assessment as a reciprocal and recursive form of inquiry that improves the writer holistically, rather than a linear process with one correct approach for each context (p. xix). In addition, the assessment process—much like the writing process—benefits from careful attention to exigency, context, purpose, and audience. Using the recursion of writing processes and the context-sensitive nature of tutoring as metaphors for assessment may provide an accessible concept for colleagues reluctant to embrace assessment.
     Writing assessment practitioners can also benefit from Building’s discussion of assessment methodologies. Schendel describes how Writing Center Directors should work to connect a program assessment’s methodology with each specific project’s purpose, audience, and available data. In fact, Schendel provides a useful chart that describes different forms of data a WCD might collect and explains how the data might be collected and who might collaborate in such efforts (pp. 127-131). The design of a writing assessment (be it a placement exam, a portfolio program, or a classroom assessment technique) should take the assessment’s context and purpose into account at each stage of the process, not just in analyzing results. In other words, a writing assessment should be sensitive to the context of the student and classroom. Neal Lerner’s brief interchapter helps WCDs understand how qualitative and quantitative assessment methodologies might impact assessment projects in writing centers, and his thoughts can also help persuade those reluctant to assess. He argues against “maintaining the status quo” and operating on only a “felt sense” of the work done in Writing Centers (p. 113). Classroom teachers and WPAs might also feel like they “know” their classrooms, but unless they can provide evidence through assessment for what they know, their claims will fail to persuade important stakeholders.

     Building, while effectively tailored to the needs of WCDs, provides assessment scholars and practitioners with useful metaphors for discussing assessment and a thoughtful discussion of assessment methodologies. The bulk of the text provides important information for those interested in programmatic assessment, and it does so by thoughtfully weaving together assessment scholarship in a way relevant to writing centers.

Friday, January 31, 2014

JWA at WRAB 2014 in Paris, France

The Journal of Writing Assessment will be at the upcoming Writing Research Across Borders conference in Paris, France, February 19-22, 2014.

If you will be there and would like to talk with Diane Kelly-Riley, co-editor of JWA, please email her at dianek [at] uidaho [dot] edu. We invite WRAB presenters to adapt their presentations on writing assessment and submit them to the Journal of Writing Assessment for publication consideration. Presenters can find JWA submission information here.

Monday, September 2, 2013

Review of _Teaching the New Writing: Technology, Change, and Assessment in the 21st-Century Classroom_, by Anne Herrington, Kevin Hodgson, and Charles Moran, Editors.

Herrington, A., Hodgson, K., & Moran, C. (Eds.). (2009). Teaching the New Writing: Technology, Change, and Assessment in the 21st-Century Classroom. New York: Teachers College Press.

By Susan Garza, Texas A&M University-Corpus Christi

Although this text was published in 2009, in 2013 as I write this review for the Journal of Writing Assessment, the use of technology continues to grow in the field of writing, and assessment seems on its way to becoming the overriding focus at all levels of education. The use of machine scoring for writing assessment, along with the debate surrounding it, is just one example of the current issues in assessment. Herrington and Moran begin the book by pointing out that while technology has propelled us forward with teaching and learning possibilities, we are being driven backwards as the drive for machine scoring unfortunately seems to be working up steam. In the last chapter, all three editors reflect on the other chapters in the book, stating that their goal for the collection was to see “teachers working creatively with an expanded sense of what writing is becoming as it has accommodated emerging technologies,” and to develop an “expand[ed] sense of what the word writing might include in this new century” (p. 198). Each chapter does have a section on assessment, although most are short and several read as if they were tacked on without much development. However, the book includes many useful rubrics and checklists that the teachers have developed in their attempts to align their activities with the curriculum standards within which they operate. This book could serve as a brief introduction for teachers beginning to learn about assessing writing, but it provides mostly assignment-specific evaluation examples.

The remaining chapters were written by teachers at all levels from various parts of the country and are divided into three sections. Most of the information is explanatory (here’s how to do an activity) with an inspirational tone (here’s how my students were changed by the experience).

Elementary and Middle School – In this section we learn about an elementary writing workshop for struggling readers that emphasizes the social nature of writing by having the students create web pages about topics of their choice (vampires); a fourth-grade collaborative writing project where students create podcasts in order to gain a better understanding of revision; and sixth graders who create math- and science-based digital picture books to help students actively create information that ordinarily they would just read about.

Secondary Grades – In these examples, all from high school classes, we are shown how students participate in blogging for a New Journalism/Technology course by writing and sharing about their chosen topics; how students interpret poetry in the form of a video as part of the Poetry Fusion project; how seniors display their research paper findings (encompassing information from across the curriculum) in such formats as i-movies and Web pages; and how blogging and podcasting are used in a speech class.

College Years – In this final grouping, we see science writing made stronger using graphics and storyboarding; students using Web 2.0 tools to create multimodal documents; and literacy narratives presented through hybrid essays using text and images.

The book is an easy read for teachers looking for innovative activities. One thing the book does well is illustrate how rich the experiences of using technology to teach writing can be. The teachers mention repeatedly that their role shifts toward that of a coach or mentor and away from that of someone who presents information just for testing purposes. All of the chapter authors point to some way that the writing experience is deepened and made more meaningful through the use of technology, including:
• better understanding of process—making choices/revising/planning/delivering
• increased interest and ownership of writing
• more writing overall
• real world applications
• more social and interactive experiences.

In the final chapter, the editors situate the assessment discussion in the book under two categories: “Classroom Based Assessment,” where the need is for new criteria that relate more to composing with technology, which results in criteria from print culture being adapted and teachers recognizing the role of creativity in the use of a tool of choice; and “State Curriculum Standards and Standardized Assessments,” where the teachers were able to apply curriculum frameworks, but had more difficulty relating to what standardized testing can and seeks to measure. Authors Reed and Hicks, referring to work by Lankshear & Knobel (2006), provide one example of the difficulties faced: “Much of what we see in the enactment of this curriculum (as a result of the tests) does not engender the kind of changed mindset that a new literacies perspective requires: one that is characterized by openness, collaboration, collective intelligence, distributed authority, and social relations” (p. 136). In their chapter, Frost, Myatt, and Smith also talked about having to deal with “departmental outcomes that still refer to the number of pages students should write.” They felt a sense of “discomfort” when thinking about how to assess differently for “projects that couldn’t be word-count quantified” (p. 182). Many of the authors discuss how the use of technology is seen as an add-on in most standards lists, but the new literacies that students engage in, or assessment frameworks for these new literacies, aren’t included.

This collection presents a good starting point for understanding the need for new and different forms of assessment as we encounter the writing experiences afforded by ever-advancing technological possibilities.

Wednesday, April 24, 2013

Part III: Review of Norbert Elliot's and Les Perelman's (Eds.) _Writing Assessment in the 21st Century: Essays in Honor of Edward M. White_

Elliot, N., & Perelman, L. (Eds.). (2012). Writing Assessment in the 21st Century: Essays in Honor of Edward M. White. New York, NY: Hampton Press.

By Jessica Nastal, University of Wisconsin-Milwaukee

This is the third review in a series of five about Writing Assessment in the 21st Century: Essays in Honor of Edward M. White, edited by Norbert Elliot and Les Perelman. The collection is a “testament to White’s ability to work across disciplinary boundaries” as it includes contributions from the writing studies (including the National Writing Project, writing centers, classroom instruction, and writing programs) and educational measurement communities (p. 2). It is also a snapshot – or a series of snapshots, since it is over 500 pages – of contemporary interests in and concerns about writing assessment, and an update on Writing Assessment: Politics, Policies, Practices (1996), edited by White, William Lutz, and Sandra Kamusikiri.

Each chapter in Part III, “Consequence in Contemporary Writing Assessment: Impact as Arbiter,” drives toward the last sentence of the last chapter in the section, written by Liz Hamp-Lyons: “You cannot build a sturdy house with only one brick” (p. 395). Elliot and Perelman highlight the section’s dedication to the question of agency, or, in Edward M. White’s words, the “rediscovery of the functioning human being behind the text” (qtd. p. 371). I also see the authors in Part III as demonstrating their dedication to understanding the variety of methods, interpretations, and social consequences of writing assessment.

Elbow pauses in his “Good Enough Evaluation” and writes, “I seem to be on the brink of saying what any good postmodern theorist would say: there is no such thing as fairness; let’s stop pretending we can have it or even try for it” (p. 305).  He doesn’t cross that brink, of course, and the writers in this section discuss how writing assessment in the twenty-first century might strive for building sturdy houses with many bricks of various shapes and sizes.  

In Chapter 17, Peter Elbow urges teachers and administrators of writing to consider “good enough evaluation,” not as a way to get us off the hook of careful evaluation, but as a way to rediscover the human being both writing and reading the text.  In the spirit of White’s practical and realistic forty-year approach, Elbow reminds us that the “value of writing is necessarily value for readers”; and yes, this even means teachers of writing (p. 310). He concludes by explaining that using such evaluation could result in evaluation sessions with “no pretense at ‘training’ or ‘calibrating’ [readers] to make them ignore their own values” (p. 321). 

Elliot and Perelman have set up another interesting contrast in Part III:  While many readers will agree with Elbow (how can we not?!), we might have some questions about how this good enough evaluation works in practice, which Doug Baldwin helps to highlight.  How is it that the results become “more trustworthy” through this process (p. 319)?  What makes Directed Self Placement the “most elegant and easy” alternative to placement testing (p. 317; Royer and Gilles discuss the public and private implications of DSP in Chapter 20)?  What impact would multidimensional grading grids, instead of GPAs, have on reading student transcripts (pp. 316-317)?  Baldwin helps to ask how we can ensure the “technical quality” of Elbow’s ideal – though non-standardized – evaluations (p. 327). 

For Baldwin, fairness, a concept authors of this section are dedicated to, “refers to assessment procedures that measure the same thing for all test-takers regardless of their membership in an identified subgroup” (p. 328). He uses the chapter to expose instances that might display “face fairness” – allowing students to choose their prompt, use a computer, or use a dictionary – but that might reveal deeper unfairness for students. Baldwin’s conclusion provides guidance for those of us concerned about the state of writing and writing assessment in the twenty-first century, our diverse populations of students, and our “concerns about superimposing one culture’s definition of ‘good writing’ onto another culture” (p. 336).

Asao B. Inoue and Mya Poe (Chapter 19), Gita DasBender (Chapter 21), and Liz Hamp-Lyons (Chapter 22) continue probing questions of agency, fairness, and local contexts.  The “generation 1.5” students DasBender worked with were confident in their literacy skills, identified as being highly motivated, and expressed satisfaction with their writing courses.  On the surface, it seemed like the mainstream writing courses served them well; however, instructors believed students “struggled to succeed” in them (p. 376).  DasBender observed, “generation 1.5 students’ self-perceptions as reflected in their DSP literacy profile…is at odds with” the abilities they demonstrate in mainstream writing courses (p. 383). 

This conflict seems representative of some of the concerns about contemporary writing assessment in action.  What are programs to do when they employ theoretically sound, fair policies designed to enable student participation and responsibility (“asking them where they fit,” in Royer and Gilles’ words) but that seem to fail in the eyes of instructors or administrators?  DasBender, Elbow, Baldwin, Inoue, Poe, Royer, Gilles, and Hamp-Lyons remind us that while Writing Assessment in the 21st Century does much to situate writing assessment and Ed White’s role within it, we have more work to do on behalf of all our students – which Part IV:  “Toward a Valid Future” alludes to.