Tuesday, May 12, 2015

Communities, stakeholders, and quality in assessment: A review of Sally O’Hagan’s Variability in Assessor Responses to Undergraduate Essays: An Issue for Assessment Quality in Higher Education


O’Hagan, S. (2014). Variability in assessor responses to undergraduate essays: An issue for assessment quality in higher education. Bern: Peter Lang AG.

By Joe Cirio, Florida State University

In light of the growing linguistic diversity of Australian universities, Sally O’Hagan’s book, Variability in Assessor Responses to Undergraduate Essays, explores how a more linguistically diverse student population affects assessment practices for disciplinary writing. This growth in diversity, as O’Hagan points out, poses a challenge to the quality and fairness of assessment, particularly to “consistency of assessment criteria and standards” (p. 11). These challenges are only exacerbated by the lack of consensus on the meaning of quality within a discipline and by the fact that assessors often rely upon “tacit knowledge of close communities of academics” (p. 17). This context grounds her empirical study of assessor behaviors when responding to essays from native speakers (NS) and non-native speakers (NNS) of English. 

Chapters One and Two contextualize her study: the former lays out the immediate context of the Australian university; the latter details the scholarly conversations with which her study intersects. O’Hagan starts by describing the ways in which assessors vary both in the marks they assign and in the evaluative judgments that support those marks. As she notes, a multitude of factors can contribute to variability in scores. To rectify this variability, institutions have begun to implement policies that encourage transparency and clear communication about assessments. However, O’Hagan writes, “it has been argued that the explicit specifications of criteria and standards does little to improve the quality of assessment in the absence of supportive measures” (p. 37). She points to the fostering of genuine communities of practice as a way to ground disciplines in a shared understanding of quality, criteria, and standards for assessment. The concluding section of Chapter Two reviews relevant research on verbal reporting methodologies, the central method O’Hagan uses to understand assessor behaviors. While verbal reporting methodologies do not “directly represent cognitive processes” (p. 76), they offer an approach to assessor behavior that may provide a different data set and uncover implications not yet realized. Other research on assessors’ responses to second language writers, most recently in an edited collection by Cox and Zawacki (2014), has used methods such as interviews (Zawacki & Habib, 2014; Ives et al., 2014; Dan, 2014), surveys (Ives et al., 2014), and analysis of student examples (Lancaster, 2014).

The study drew data from an academic department at a major university in Australia and hinged on ten assessors, nine tutors and one instructor, who offered verbal reports as they assessed ten student essays (half NS and half NNS; this information was not made available to assessors). Chapter Four walks through the statistical analyses of the three data sets collected: the marks assigned to the essays, the features of the essays that prompted an evaluation, and the evaluative judgments made by assessors. The conclusions of the study can be distilled into four points: NNS essays (1) received lower marks, (2) drew more negative comments, and (3) drew comments more often directed toward mechanics. But (4) there was also an increased range of comments on NNS essays, indicating a lack of consensus about how assessors within the same discipline respond to the writing of second-language writers (p. 207). 

The remaining chapters of O’Hagan’s book point to improvements in assessment quality that can account for the wide disagreement in marks and the wide variability in evaluative judgments, regardless of student language background. As she describes her solutions, it is clear that she sees little utility in assessors simply exchanging criteria for the sake of quality assurance; instead, she directs us toward the fostering of community to discuss how assessment values should be socialized across assessors. Fostering community does not by itself guarantee a shared understanding of quality, however; for O’Hagan, it is “through the interactions of participants” (p. 230, emphasis hers) that the constituents of quality grow and evolve dialogically.
             
The final chapter briefly describes the limitations of the study and the future research endeavors with which its findings might align. As O’Hagan recognizes, because the scope of the study was limited to assessor behaviors, no attention was given to students themselves. The study, she writes, might have stretched its scope to include how students might benefit from transparency in assessment criteria or the degree to which students are familiarized with disciplinary knowledge. However, the suggestions offered by O’Hagan position students in a particular role: namely, on the outside. While she recognizes that students “are the primary stakeholders of assessment” (p. 248), there does not seem to be much room to involve students in the negotiation of quality, much less in the communities of practice that O’Hagan advocates. She does point to some research that explores student involvement in assessment (Sadler, 2005; Woolf, 2004; Starfield, 2001), but even within the research she cites, students still do not have much of an active role in their assessment: For Sadler (2005) and Woolf (2004), students become empowered by transparency as the veil of “mystique” is lifted from assessment, but their empowerment comes only from having the criteria available to them, not from actively negotiating them. In Starfield (2001), students negotiated only after a mark was given, seeking higher marks, rather than negotiating quality with instructors during the process. If we see assessment as constitutive of the formation of self (see Faigley, 1989; Yancey, 1999), positioning students on the outside when it comes to defining assessment quality poses a problem for how we see students, how students see themselves, and how that affects their agency within writing.
             
O’Hagan offers a pragmatic contribution to questions of quality for assessment. Her study is notable for its detailed description of methods and rigorous attention to critical literature in writing assessment scholarship. As such, her book can be a useful resource for those who are designing research projects of their own or considering improvements to disciplinary writing assessment. However, we must not lose sight of the students’ stake in disciplinary considerations of quality.

References
Cox, M. & Zawacki, T. M. (Eds.) (2014). WAC and second language writers: Research towards linguistically and culturally inclusive programs and practices. Anderson, SC: Parlor Press LLC. 

Dan, W. (2014). Let’s see where your Chinese students come from: A qualitative descriptive study of writing in the disciplines in China. In M. Cox & T. M. Zawacki (Eds.), WAC and second language writers: Research towards linguistically and culturally inclusive programs and practices (pp. 233-255). Anderson, SC: Parlor Press LLC.

Faigley, L. (1989). Judging writing, judging selves. College Composition and Communication, 40(4), 395-412.

Goen-Salter, S., Porter, P., & van Dommelen, D. (2009). Working with generation 1.5 pedagogical principles and practices. In M. Roberge, M. Siegal, & L. Harklau (Eds.), Generation 1.5 in college composition: Teaching academic writing to US-educated learners of ESL (pp. 235-259). New York, NY: Routledge. 

Ives, L., Leahy, E., Leming, A., Pierce, T., & Schwartz, M. (2014). ‘I don’t know if that was the right thing to do’: Cross-disciplinary/cross-institutional faculty responses to L2 writing. In M. Cox & T. M. Zawacki (Eds.), WAC and second language writers: Research towards linguistically and culturally inclusive programs and practices (pp. 211-232). Anderson, SC: Parlor Press LLC. 

Lancaster, Z. (2014). Making stance explicit for second language writers in the disciplines: What faculty need to know about the language of stance taking. In M. Cox & T. M. Zawacki (Eds.), WAC and second language writers: Research towards linguistically and culturally inclusive programs and practices (pp. 269-298). Anderson, SC: Parlor Press LLC.

Nielsen, K. (2014). On class, race, and dynamics of privilege: Supporting generation 1.5 writers across the curriculum. In M. Cox & T. M. Zawacki (Eds.), WAC and second language writers: Research towards linguistically and culturally inclusive programs and practices (pp. 129-150). Anderson, SC: Parlor Press LLC. 

O’Hagan, S. (2014). Variability in assessor responses to undergraduate essays: An issue for assessment quality in higher education. Bern: Peter Lang AG. 

Sadler, D. R. (2005). Interpretations of criteria-based assessment and grading in higher education. Assessment and Evaluation in Higher Education, 30(2), 175-194. 

Starfield, S. (2001). ‘I’ll go with the group’: Rethinking ‘discourse community’ in EAP. In M. Peacock and J. Flowerdew (Eds.), Research perspectives on English for academic purposes (pp. 132-147). Cambridge, England: Cambridge University Press.

Woolf, H. (2004). Assessment criteria: Reflections on current practices. Assessment and Evaluation in Higher Education, 29(4), 479-493. 

Yancey, K.B. (1999). Looking back as we look forward: Historicizing writing assessment. College Composition and Communication, 50(3), 483-503.

Zawacki, T. M. & Habib, A. S. (2014). Negotiating ‘errors’ in L2 writing: Faculty dispositions and language differences. In M. Cox & T. M. Zawacki (Eds.),  WAC and second language writers: Research towards linguistically and culturally inclusive programs and practices (pp. 183-210). Anderson, SC: Parlor Press LLC.





 

Tuesday, April 28, 2015

Who Assesses the Assessors? A Review of Assessing the Teaching of Writing: Twenty-First Century Trends and Technologies



Dayton, A. E. (Ed.). (2015). Assessing the teaching of writing: Twenty-first century trends and technologies. Boulder, CO: University Press of Colorado.

by Way Jeng, Washington State University

Overview
           
Assessing the Teaching of Writing: Twenty-First Century Trends and Technologies examines the performance of teachers. That is, it offers methods for investigating whether curricula are taught so that students learn the material and are well-positioned to perform. The scholarship of assessment in writing studies generally examines the performance of students and the achievement of learning outcomes for entire programs, and in doing so helps to define the curricular values of educational institutions. This volume addresses the missing middle link in the assessment chain: the teaching performance of individual educators.

Over the course of the book's 12 chapters, the authors (Amy E. Dayton, Meredith DeCosta, Duane Roen, Brian Jackson, Gerald Nelms, Kara Mae Brown, Kim Freeman, Chris Gallagher, Chris M. Anson, Nichole Bennett, Cindy Moore, Amy C. Kimme Hea, Charles Paine, Robert M. Gonyea, Paul Anderson, Deborah Minter, and Amy Goodburn) discuss topics ranging from student course evaluations and teacher portfolios to administrative priorities and how to assess the efficacy of writing center consultations for students.
           
The first half of the book looks at frameworks and methods for teacher assessment. These chapters offer clear methods from authors who have already implemented the programs they discuss, along with a discussion of each method's validity. The second half of the book steps outside the classroom to examine the larger institutional and administrative context of teacher assessment. The authors in this section are aware that concerned teachers and administrators may perceive the assessment of teachers as unfair, punitive, or otherwise designed to hurt teachers. Here, the emphasis on formative feedback on teaching, rather than a summative or purely evaluative frame, is very welcome.

Highlights
           
The volume's third chapter, Amy E. Dayton's "Making Sense (and Making Use) of Student Evaluations," presents the use of student course evaluations. For many institutions, this is the most logistically straightforward method of teacher evaluation simply because course evaluations are ubiquitous: administrators likely have years of data already collected. Dayton provides a detailed discussion of validity concerns surrounding course evaluations (e.g., Can our students be relied upon when they are not experts on the subject matter themselves?), as well as a framework for interpreting student comments in course evaluations in the context of programmatic learning outcomes statements.

Cindy Moore's chapter, "Administrative Priorities and the Case for Multiple Methods," encapsulates one of the main argumentative threads that permeates each chapter: No single assessment can adequately describe an activity as rich and varied as teaching. Using multiple methods allows stronger conclusions to be drawn by triangulating data to reinforce observations. Moore extends this core argument by discussing ways administrators can overcome the obvious issue of time, since more assessments and more assessment data often mean more work. She suggests distributing work among stakeholders in the assessment and strategically choosing assessment materials to ensure a complete yet concise assessment procedure.

Last Thoughts
           
As the volume is only around 200 pages, readers can expect that some aspects of teacher assessment will be left unexamined. For example, none of the authors specifically discusses the effects of race and class in assessing teachers, though it seems likely that race and class affect teaching just as they do learning. Because Assessing the Teaching of Writing is concise, it is a practical guide for readers who want to revise or develop their methods for assessing teachers. Rather than providing an exhaustive treatment, the text lays a foundation for implementing teacher assessment and offers clear methods for doing so. Readers are therefore well-positioned to build on that base by turning to other texts that explore specific aspects of assessment and incorporating those ideas into the existing framework of teacher assessment.
           
As such, existing works that discuss assessment complement the book very well. Newcomers to assessment will probably want to read some of the texts referenced by numerous authors, notably Brian Huot's (2002) (Re)articulating Writing Assessment for Teaching and Learning, Linda Adler-Kassner and Peggy O'Neill's (2010) Reframing Writing Assessment to Improve Teaching and Learning, Chris Anson's (1994) "Portfolios for Teachers," and Peter Seldin's (1991) The Teaching Portfolio. Assessing the Teaching of Writing is a solid addition to any scholar's collection of assessment texts. It is not so much a foray into a qualitatively different model of assessment as a transfer of established theory into a new context. The book helps readers imagine and implement the assessment of teachers in nuanced ways, and as such informs an important part of the assessment of writing.



References

Adler-Kassner, L., & O'Neill, P. (2010). Reframing writing assessment to improve teaching and learning. Logan, UT: Utah State University Press.

Anson, C. (1994). Portfolios for teachers: Writing our way to reflective practice. In L. Black, D. Daiker, J. Sommers, & G. Stygall (Eds.), New directions in portfolio assessment. Portsmouth, NH: Heinemann.

Huot, B. (2002). (Re)articulating writing assessment for teaching and learning. Logan, UT: Utah State University Press.

Seldin, P. (1991). The teaching portfolio: A practical guide to improved performance and promotion/tenure decisions. Bolton, MA: Anker.

Saturday, March 14, 2015

JWA at CCCC 15 in Tampa!


Do you have an idea for a manuscript related to writing assessment?  Are you interested in reviewing something for the JWA Reading List?  We would like to talk with you!

The Journal of Writing Assessment's entire editorial team will be at the upcoming Conference on College Composition and Communication in Tampa, Florida, March 17-22, 2015, and we'd love to talk with you about your ideas.

You can email us at journalofwritingassessment@gmail.com to set up an appointment, or we hope to run into you at the conference.

Safe travels and see you there!

--Diane, Carl, Jessica, Ti, David and Bruce


Saturday, February 28, 2015

Technology as Teacher: A Review of Genre-based Automated Writing Evaluation for L2 Research Writing


by Karen R. Tellez-Trujillo, New Mexico State University

In Genre-based automated writing evaluation for L2 research writing, Elena Cotos provides a broad overview of the theoretical and operational frameworks that reinforce research writing pedagogy for second language (L2) writers. Her audience is broad: teachers of research writing, researchers, developers of intelligent writing technologies, and scholars. The book relies on empirical evidence and theoretical discussion as it advocates the development of Automated Writing Evaluation (AWE) programs, defined by Cotos as technology used to “complement instruction with computerized affordances that are otherwise unavailable or extremely time and labor-intensive” (p. 5). Through formative assessment of graduate student writing and discipline-specific feedback produced by a scaffolded computer-assisted learning environment (Harasim, 2012), Cotos presents genre-based approaches to academic writing for research writers, including L2 research writers, along with a model for designing and evaluating a corpus- and genre-based AWE technology prototype. She closely considers the research writing needs of novice scholars, uses a mixed-methods approach for empirical evaluation of the Computer-Assisted Language Learning (CALL) materials, and presents a sound resource for educators interested in exploring learning technologies that address writing challenges faced by L2 graduate student writers.

Genre-based automated writing evaluation for L2 research writing is well organized and comprehensive in its design. Seven chapters presented in two parts include sections on learning and teaching challenges in linguistics and rhetoric, automated writing evaluation, and the conceptualization and prototyping of genre-based AWE. The second half of the book assesses the implementation and evaluation of genre-based AWE for L2 research writing. Cotos explores and evaluates the Intelligent Academic Discourse Evaluator (IADE) prototype she developed, and later discusses the cognitive and socio-disciplinary dimensions of the learning experiences of students who use the IADE. The analysis engine within Cotos’ IADE is trained on John Swales’ move schema (establishing a territory, establishing a niche, and occupying a niche), which allows it to identify rhetorical structures in a text. As a result, the IADE can give students feedback on their rhetorical structure, as well as information about the distribution of moves in their text compared with the moves typical of the student’s discipline. Cotos concludes by introducing the Research Writing Tutor (RWT), an extended “full-fledged corpus-based AWE program for L2 research writing pedagogy” (p. 214) capable of providing discipline-specific feedback attentive to the conventions of the genre in which the student is writing.

This book is ambitious: It provides an extensive list of figures and tables and an overview of technologies designed to assist students with writing challenges as it discusses and evaluates the design of the IADE system. A strength of the book is Cotos’ acknowledgement that research writers ought to write “like an authoritative member of the discourse community” (Boote & Beile, 2005, p. 18). Doing so depends on the writer’s understanding of the genre in which they are writing and the expectations of the discourse communities to which they belong. As Cotos emphasizes in the introduction, “For graduate students as aspiring scholars, research writing is also the foundation of their academic career and of the credibility of the scholarly dossier” (p. 2), underscoring the need for graduate students to gain credibility and develop skills for effective written communication within their disciplinary communities. 

A point of consideration is the controversy surrounding AWE technology. Enthusiasts have looked to it as “a silver bullet,” a simple solution perceived to be immediately successful when applied to language and literacy development (Warschauer & Ware, 2006, p. 175). Cotos explains that AWE can have a place in the writing classroom and in L2 research writing if it is “conceptualized at the earliest design stages” (p. 40) rather than applied as a quick cure for a fundamental problem. Numerous examples of Automated Essay Scoring (AES) technologies are discussed in support of her argument that AWE technology can serve students well when discipline-specific genre and validity concerns are addressed. AWE provides instructional tools such as timely formative and summative feedback, data-analysis and reporting features, and teacher-accessible parameter settings, which, Cotos argues, can be used with less time and effort than the same work done without technology.  

Automated evaluation of writing remains controversial because it lacks the “human factor.” Cotos notes the drawbacks of and apprehensions about AWE and speaks to issues that have come to the surface through research, recognizing that without attention to these issues, the technology will suffer when put into action. And while her attention to the drawbacks of AWE is admirable, leading scholar-teachers in Rhetoric and Composition/Writing Studies remain unconvinced that technology, no matter how sophisticated, can replace a human.

In their article “Automated essay scoring in innovative assessments of writing from sources,” Paul Deane et al. (2013) suggested that human raters are needed to evaluate more complex factors such as critical reasoning, strength of evidence, or accuracy of information. Further, in “Comparison of human and machine scoring of essays: Differences by gender, ethnicity, and country,” Brent Bridgeman et al. (2012) presented scores from two high-stakes timed essay tests that use ETS’s e-rater® software: the Test of English as a Foreign Language (TOEFL) iBT and the Graduate Record Exam (GRE). The study revealed that e-rater scored writing by Chinese and Korean speakers more highly than did human raters but gave lower scores to writing by Arabic, Hindi, and Spanish speakers. The authors hypothesized that these scoring differentials are attributable to stylistic differences that human readers often accommodate but e-rater does not, and that some of these differences may be cultural rather than linguistic (Elliot et al., 2013). In “Uses and limitations of automated writing evaluation software,” Elliot et al. (2013) reminded readers that computational methods of assessing writing rely on cognitive and psychological models of language processing that can be at odds with theoretical understandings of writing as a rhetorically complex and socially embedded process that varies with context and audience.

Implications 
In addition to the application of Swales’ move schema, one of the themes of this book is reflection. Regardless of the specific method used to evaluate student writing, Cotos emphasizes the need for writers to continually reflect upon their writing moves and practices. She not only addresses the practice of graduate student research writing but also works with three recursive processes (planning, translating, and reviewing) in addition to knowledge sub-stages (Flower et al., 1986; Hayes et al., 1987). While the IADE analyzes student writing for moves, the student engages in recursive practices and improves metacognitive awareness of their writing. It is possible that L2 students using the IADE will produce quality writing as a result of its detailed feedback; however, outside the hands of a composition teacher, stylistic and cultural factors cannot be considered, leaving students with incomplete feedback. 

Whether used as an introduction to the various types of intelligent writing technologies or for the sake of research, Genre-based automated writing evaluation for L2 research writing is an ideal resource for teachers and researchers in search of an instrument to aid students in learning to write into their academic discourse communities. All too often in English departments and composition programs, L2 writing issues are left to TESOL specialists. This book helps all teachers of composition recognize that it is within their means to aid all students – including L2 students – with scholarly writing. 


References

Boote, D.N., & Beile, P. (2005). Scholars before researchers: On the centrality of the dissertation literature review in research preparation. Educational Researcher, 34(6).

Bridgeman, B., Trapani, C., & Attali, Y. (2012). Comparison of human and machine scoring of essays: Differences by gender, ethnicity, and country. Applied Measurement in Education, 25(1), 27-40.

Deane, P., Fowles, M., Baldwin, D., & Persky, H. (2011). The CBAL summative writing assessment: A draft eighth-grade design (Research Memorandum 11-01). Princeton, NJ: Educational Testing Service.

Elliot, N., Gere, A. R., Gibson, G., Toth, C., Whithaus, C., & Presswood, A. (2013). Uses and limitations of automated writing evaluation software. WPA-CompPile Research Bibliographies, 23.

Flower, L., Hayes, J.R., Carey, L., Schriver, K., & Stratman, J. (1986). Detection, diagnosis and the strategies of revision. College Composition and Communication, 37, 16-55.

Harasim, L. (2012). Learning theory and online technologies. New York: Routledge.

Hayes, J. R., Flower, L., Schriver, K. A., Stratman, J. F., & Carey, L. (1987). Cognitive processes in revision. In S. Rosenberg (Ed.), Advances in applied psycholinguistics (Vol. 2, pp. 176-241). New York, NY: Cambridge University Press.

Warschauer, M. & Ware, P. (2006). Automated writing evaluation: Defining the classroom research agenda. Language Teaching Research, 10(2), 1-4.