Iterative Design of Assessments and Constructs
McGaw highlighted that the Measurement Working Group (led by Mark Wilson) emphasized the need for iterative refinement in the development of new measures. Various groups spent much of the first decade of the 21st century debating how these proficiencies should be defined and organized. Carried on in the abstract, this definition process could easily consume the second decade as well. Wilson’s group argues that the underlying constructs being assessed must be defined and redefined in the context of the assessment development process. Of this, McGaw said:
You think about it first, you have a theory about what you want those performances to measure. You then begin to develop ways of capturing information about that skills. But the data themselves give you information about the definition, and you refine the definition. This is the important point of pilot work with these assessment devices. And not just giving the tests to students, but giving them to students and seeing what their responses are, and discovering why they gave that response. And not just in the case where it is the wrong response but in the case where it is the correct response, so that you get a better sense of the cognitive processes underlying the solution to the task.
In other words, when dealing with these new proficiencies, you can’t just have one group define standards and definitions and then pitch them to the measurement group. Because of their highly contextualized nature, we can’t just hand standards to testing companies, as has been the case with hard skills for years. This has always nagged at me about previous efforts to define these skills (e.g., the Partnership for 21st Century Skills), which seemed to overlook both the issue and the challenge that it presents. Maybe now we can officially decide to stop trying to define what assessment scholar Lorrie Shepard so aptly labeled “21st Century Bla Bla Bla.”
The Lack of Learning Progression Models
McGaw also reiterated the concerns of the Measurement Working Group over the lack of consensus about the way these new proficiencies develop. There is a strong consensus about the development of many of the hard skills in math, science, and literacy, and these insights are crucial for developing worthwhile assessments. I learned about this firsthand while developing a performance assessment for introductory genetics with Ann Kindfield at ETS. Ann taught me the difference between the easier cause-to-effect reasoning (e.g., completing the Punnett square) and the more challenging effect-to-cause reasoning (e.g., using a pedigree chart to infer mode of inheritance). We used these and other distinctions she uncovered in her doctoral studies to create a tool that supported tons of useful studies on teaching inheritance in biology classes. Other better-known work on “learning progressions” includes Ravit Duncan’s work in molecular genetics and Doug Clements’ work in algebra. In each case it took multiple research teams many years to reach consensus about the way that knowledge typically developed.
Wilson and McGaw are to be commended for reminding us how difficult it is going to be to agree on the development of these much softer 21st century proficiencies. They are by their very definition situated in more elusive social and technological contexts. And those contexts are evolving. Quickly. Take, for example, judging the credibility of information on the Internet. In the 90s this meant websites. In the past decade it came to mean blogs. Now I guess it includes Twitter. (There is a great post about this at MacArthur’s Spotlight Blog, as well as a recent CBC interview about fostering new media literacies, featuring my student Jenna McWilliams.)
Consider that I taught my 11-year-old son to look at the history page on Wikipedia to help distinguish between contested and uncontested information in a given entry. He figured out on his own how to verify the credibility of suggestions for modding his Nerf guns at nerfhaven.com and YouTube. Now imagine you are ETS, where it inevitably takes a long time and buckets of money to produce each new test. They already had to replace their original iSkills test with the iCritical Thinking test. From what I can tell, it is still a straightforward test of information from a website. Lots of firms are starting to market such tests. Some places (like Scholastic’s Expert21) will also sell you curriculum and classroom assessments that will teach students to pass the test, without ever actually going on the Internet. Of course ETS knows that it can’t sell curriculum if it wants to maintain its credibility. But I am confident that as soon as organizations start attaching meaningful consequences to the test, social networks will spring up telling students exactly how to answer the questions.
There is lots of other great stuff in the Measurement white paper. Much of it is quite technical. But I applaud their sobering recognition of the many challenges that these new proficiencies pose for large-scale measurement. And those challenges only get harder when these new tests are used for accountability purposes.
Next up: McGaw’s comments about the Classroom Environments and Formative Evaluation working group.