BY DAVID STEINER | We are at a critical juncture for education reform in America: forty-six states have adopted a brand-new set of English Language Arts Common Core State Standards for college- and career-readiness, and most are on the verge of implementation. I supported the adoption of the Standards as Commissioner of Education in New York State because I believed – and continue to believe – that they represent a major step towards a more effective education for our students, many of whom have hitherto been subject to unpredictable and often under-demanding learning expectations.
Within the next two years, students in these same states will face a slate of new standardized tests aligned to these Common Core Standards. We know from international research that well-designed testing can drive better learning outcomes, so the new tests have the potential to benefit our students immensely – but only if the test makers take bold steps in their designs. This means incentivizing the use of meaningful content in the classroom and eschewing misguided, patronizing notions of fairness that would undermine the promise of the Common Core and harm the educational opportunities of our most disadvantaged students.
The Promise of the Common Core ELA Standards and the Role of Assessments
Formally, the ELA Standards “lay out a vision of what it means to be a literate person in the twenty-first century” by specifying and encouraging the development of “the skills in reading, writing, speaking, and listening that are the foundation for any creative and purposeful expression in language.”
These skills are important, but one cannot learn skills in the abstract: imagine trying to think critically about nothing in particular. In a February 2013 essay on the topic, E.D. Hirsch cites a 2012 study by the National Research Council, which found that “21st-century skills [are] dimensions of expertise that are specific to–and intertwined with–knowledge within a particular domain of content and performance.” Skills must be tied to content if they are to be learned effectively.
And, in fact, a signature value of the Core is its potential to bolster both skills and knowledge by encouraging sequenced, spiraled, content-rich curricula in the classroom. This promise is embodied in what Robert Pondiscio, former Vice President of Hirsch’s Core Knowledge Foundation (on the board of which I have served), quoting the Standards, called “the 57 most important words in education reform”:
By reading texts in history/social studies, science, and other disciplines, students build a foundation of knowledge in these fields that will also give them the background to be better readers in all content areas. Students can only gain this foundation when the curriculum is intentionally and coherently structured to develop rich content knowledge within and across grades.
Unfortunately, realizing this skill-knowledge potential requires more than simply adopting the Common Core Standards. The challenge is that the Standards themselves do not require specific content beyond classical mythology, one (any) play by Shakespeare, and a selection of founding American documents. (The exhortation to demonstrate knowledge of several centuries of American literature is laudable, but hardly specific enough to guide curriculum design.) In short, the Common Core Standards do not provide curricular content – presumably because their authors realized full well that if they had specified content, few if any states would have agreed to adopt them. The fact that the ELA Standards are largely silent on content would matter far less if this country had agreed on a shared curriculum – but we have not.
Let me contrast the typical American way of structuring content and testing with the practices in many other democratic educational systems. Those countries first agree on content, then on standards, and finally on assessments to analyze students’ mastery of that content against those standards. Our social heterogeneity and long tradition of localism in determining curriculum have produced a different approach: we start with standards, next we develop assessments, and then teachers, schools, and districts decide what content to teach our students, with varying support and guidance from their states. That content may be based on whatever textbooks the school uses, the whim of the teacher, the availability of particular on-line materials, the policy of a school or district, or any combination of these factors. Resistance to prescribed content – particularly to rigorous academic content – is compounded by a long, complex history of anti-intellectualism in this country which, for more than a hundred years, has supported social and psychological, but not necessarily academic, goals for our classrooms.
Given our historical lack of consensus over curricula, it thus falls to assessments to influence the depth and quality of instruction. If the new tests assess knowledge in ways that demand mastery of sequenced domain knowledge, sophisticated vocabulary, rich content, and cross-disciplinary learning, educators across the country would have a much greater incentive to bring challenging content into their classrooms and thus realize the implicit promise of the new standards.
Assessments Today: A Patronizing Vision of Fairness
At this moment, two federally funded consortia of states, PARCC (Partnership for Assessment of Readiness for College and Careers) and Smarter Balanced (Smarter Balanced Assessment Consortium), are producing the guidelines for Common Core Standards-aligned tests. Those consortia issue RFPs (requests for proposals) to companies or institutions to submit test designs that will, in turn, become the assessments many of our students will take in the years ahead. States that have opted out of these consortia have gone or will go through a similar process to produce their own Core-aligned tests.
Unfortunately, there is reason for concern about the quality of these exams, and in particular whether they will push the rest of our education system to teach high-quality content.
One concern stems from the way test designers have come to interpret the industry-guiding principles of building tests – principles often referred to as those of Universal Design (see Table 2 here). Universal Design guidelines are intended to ensure that assessments are fair to all students. Some of these guidelines are eminently reasonable and important – for example, allowing students with special needs (such as visually-impaired students) to take an appropriate version of the test, or avoiding language that is likely to insult a particular group of test takers.
The applications of other design principles, however, are well intentioned but neither reasonable nor academically astute. Although they certainly didn’t invent them, the granular design criteria that PARCC and Smarter Balanced require test designers to adopt will perpetuate a patronizing version of fairness. This is because in the pursuit of absolute equality in every test taker’s “experience” of the test, these criteria exclude potentially upsetting passages and any other material that creates disparity, including content that rewards those with greater background knowledge.
Let me elaborate. Test designers are to avoid background knowledge that might be known to some groups but not others. For example, Smarter Balanced’s “Bias and Sensitivity Guidelines” point to the word foyer as unfair: “assuming a student knows what a ‘foyer’ is would be unfair because the term: 1) is more likely to be known by some groups of students than by other groups of students, 2) is not required by the Common Core State Standards, and 3) is not likely to have been routinely used in the classroom.” Other forbidden content in these Guidelines includes a passage that requires knowledge of opera and how composers use the orchestra or singers; a quotation from the Old Testament (or other religious material); a passage describing the use of sailboats for racing (or any “luxuries”); and a video of a dancer requiring knowledge of ballet. PARCC’s Fairness Guidelines are similar: “avoid depicting situations that are associated with spending money on luxuries, such as eating in exclusive restaurants, joining a country club, taking a cruise…”
The technical explanation, in part, is that test designers try to build questions that avoid Differential Item Functioning (DIF) – items in which students from different groups (commonly gender or ethnicity) with the same underlying achievement levels have a different probability of giving a certain response on that particular item. To take an example, imagine that a particular sub-group of students do more poorly than expected (based on their performance on other questions testing the same math skill) on a math item that uses the word “foyer,” while other groups of students do just as well as expected. The “foyer” item functions differentially and would be deemed unfair. The difficulty is defining the “underlying” achievement level. If it were defined to include more sophisticated vocabulary and wider domain knowledge, individual items testing for these elements would not display the dreaded differential functioning and could be used in our assessments. Unfortunately, achievement is typically conceived in a much narrower sense, excluding much of the vocabulary and knowledge expected of well-educated people in the workplace and in life.
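For readers who want to see the mechanics, here is a minimal sketch of one widely used DIF screen, the Mantel-Haenszel common odds ratio. It is an illustration only: the essay does not say which statistical procedure PARCC or Smarter Balanced use, and the function names, group labels, and student counts below are invented for the example. Students are matched on a score from other items testing the same skill, and the odds of answering the studied item correctly are then compared between groups within each matched level.

```python
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Return the Mantel-Haenszel common odds ratio for one test item.

    records: iterable of (group, matching_score, correct), where group is
    "reference" or "focal", matching_score is the student's score on other
    items testing the same skill, and correct is True or False.
    A value near 1.0 suggests no DIF; a value above 1.0 means matched
    reference-group students were more likely to answer correctly.
    """
    # Per matched score level:
    # [reference correct, reference incorrect, focal correct, focal incorrect]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in records:
        index = (0 if group == "reference" else 2) + (0 if correct else 1)
        strata[score][index] += 1

    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return numerator / denominator


def make_records(counts):
    """Expand {(group, score): (n_correct, n_incorrect)} into records."""
    records = []
    for (group, score), (n_correct, n_incorrect) in counts.items():
        records += [(group, score, True)] * n_correct
        records += [(group, score, False)] * n_incorrect
    return records


# Hypothetical counts: a math item whose wording includes "foyer".
foyer_item = make_records({
    ("reference", 1): (10, 10), ("focal", 1): (5, 15),
    ("reference", 2): (16, 4), ("focal", 2): (12, 8),
})
# Hypothetical counts: the same skill tested with plainer wording.
plain_item = make_records({
    ("reference", 1): (10, 10), ("focal", 1): (10, 10),
    ("reference", 2): (16, 4), ("focal", 2): (16, 4),
})

print(mantel_haenszel_odds_ratio(foyer_item))
print(mantel_haenszel_odds_ratio(plain_item))
```

In these made-up data the "foyer" item yields a ratio well above 1 while the plainly worded item yields exactly 1, so only the former would be flagged. Note that the result depends entirely on what goes into the matching score: if vocabulary and domain knowledge counted toward it, students would be matched differently and the same item might no longer stand out.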
Test designers are also instructed to avoid material that might upset students. “Troublesome topics” include death and dying, religion, and violence, and in the case of Smarter Balanced extend even to medicine and dancing. The goal of excluding these topics is “to avoid material that may cause extreme negative emotions in test takers because such emotions have the potential to interfere with test performance.” The implication is that some students (presumably those who are less “sophisticated” or less able to “manage” a controversial topic) need to be protected. In such an empty landscape, one wonders what topics are left for students to explore and discuss – certainly, all too few that raise interesting and worthwhile questions about human existence. How many portions of the “foundational works of American literature” cited by the Common Core are automatically excluded by this criterion? (In the design of its own Core-aligned tests, New York State wisely pushes the envelope by allowing test designers to use excerpts from books that “include controversial ideas and language that some may find provocative” – but the actual passages used in the assessments cannot themselves exhibit those qualities.)
Almost a decade ago, in her book The Language Police, Diane Ravitch, who recently became a critic of the Common Core, censured textbook and standardized-test publishers for suppressing any reference to entire realms of content that could be construed as “unfair.” Unfortunately, it seems that little has changed since that time.
How Patronizing Fairness Contributes to Unequal Outcomes
The problem with patronizing fairness is not just the sheer absurdity of the self-censorship involved; rather, these broad restrictions underestimate students and, by stripping out content, serve them badly – especially the most underprivileged. How so?
We know that more privileged students are far more likely to have the opportunity to learn advanced vocabulary and a broad range of academic, historical, geographic, and other content from a variety of sources outside the classroom. Our least advantaged students, by contrast, are more dependent on public schools to impart much of this information. If they do not learn from their teachers what a foyer is – or, far less trivially, how to read and make reference to complex, even disturbing texts about fundamental issues – many of them will have no other chance to do so. And if teachers know that the exams that matter will scrupulously avoid covering, even indirectly, knotty issues that provoke strong opinions and advanced concepts that may prove novel for students, it makes perfect sense for them to avoid such content altogether. The absence of those materials on the test licenses this impoverishment in the classroom.
This is not merely a matter of specific vocabulary deficits or lack of attention to important issues. Rather, as E.D. Hirsch has noted, it is the contextual knowledge available to the middle-class student that gives her a sustained advantage throughout her education. Our insistence on tests that assess de-contextualized, carefully controlled, thoroughly “fair” dots of information forces test designers to create artificial assessments. The resulting tests cannot include many serious passages of literature that would be “discriminatory” by virtue of including instances of vocabulary, syntax, and background knowledge that would privilege the more affluent. The damaging truth is that in our drive to make our exams content-neutral, they may end up content-neutered, and the disadvantaged students will suffer the most.
Seen in this light, the patronizing fairness protocols are ultimately pernicious, despite their good intentions. They inspire tests that reinforce our fragmented curricula and provide no support for teachers and students in underprivileged communities as they strive to close the learning gap. Unless the new tests lead us forward by assessing more than we currently teach, they will perpetuate a downward spiral and entrench the economic and social status quo. The educational trap here is that in protecting disadvantaged groups against material that in any sense touches on their disadvantage, we cut them off from the very material they need the most to overcome that disadvantage.
Patronizing fairness not only adversely affects the academic and economic success of individual students, it also undermines the civic value of education. When test designers ensure that potentially emotive material cannot be included in their assessments – such as a discussion about gun control – they reduce the likelihood that teachers will help students develop the deliberative skills required for democratic participation. Substantive classroom discussion is demanding on students and teachers alike, to say the least, and yet research indicates that this practice is as important in citizenship formation as civics classes and, for some students, can compensate for a home life bereft of such debates.
Patronizing fairness therefore unintentionally reinforces the inequities our students have inherited and diminishes our civic culture. Our localized education system only reinforces these tendencies, leaving assessments bearing a disproportionate responsibility to effect long-term change. If we design tests that are blandly “fair” to all students taking them today, we surrender an opportunity to drive real fairness for all students tomorrow.
A Better Path: A Stronger Version of Fairness
If the new assessments are to fulfill the promise of the Common Core, test designers (and indeed all of us) will have to embrace a different, stronger version of fairness, one that requires us to tell the fuller truth about where our students stand, and test the rigorous content that will impart the knowledge and skills to succeed. This is by no means a simple task. It is likely that it would result, at least temporarily, in an even greater disparity in scores between our privileged and disadvantaged students. Thus, such a shift would need to be strongly signaled and carefully phased in over time. Nevertheless, the redesigned tests would ultimately reflect a more honest accounting of where we stand, allowing us to build a system of instruction and assessment that would far better serve generations of disadvantaged students.
Let’s be more specific.
First, in selecting passages and questions, test designers need to include rich textual excerpts that are not entirely anodyne. They should embrace serious topics and test for the understanding of vocabulary and ideas that we expect all educated individuals to know about and be ready to discuss thoughtfully. Rather than scrupulously avoiding the topic of death in Romeo and Juliet or God in the Mayflower Compact, our tests should include these very passages – the ones that make these texts worth reading – so that educators are encouraged to teach, rather than penalized for teaching, what is worth teaching. If we want to teach serious texts for serious reasons, we must test seriously, too.
Second, test designers should use the assessments to send even stronger signals about curriculum. Many countries write exams that specify multiple periods of history to be studied and then give students the choice to answer questions on those they have studied. For literature exams, they provide a rotating list of set texts that teachers and students can study in depth, knowing they will be asked questions on some of them. This model has the advantage of specifying at least a portion of the curriculum explicitly, ensuring that it meets standards of rigor, complexity, and richness. The model would also be fairer, since disadvantaged students who depend on their school to read these works would indeed have worked on them.
It will take sustained argument over the long term to persuade Americans that their children would be better off if they faced assessments that tested for rich domain knowledge. What may be achievable in the medium term is an exam that combines the current skills-based questions with others that draw on several specific domains announced well in advance, and then assesses knowledge in those domains. For example, policymakers might approve and test designers include seven literary domains for an 11th-grade English assessment (including, say, the Harlem Renaissance and Transcendentalism), of which students would select three to be tested on. Test items would then be drawn from the key texts in each domain, encouraging the thorough teaching of several domains at the discretion of the district, school, or teacher. These domains could change over time so long as teachers were given fair advance notice of what those domains would be. As a first step, states could consider offering domain-specific assessments as an alternative to the current skills-dominant tests. There is certainly precedent (such as in New York State) for states allowing more than one approach to assessments.
Implementing a structure like this will not be easy. At a recent gathering of senior education policymakers in New York, Linda Bevilacqua, President of the Core Knowledge Foundation, asked policymakers to support the concept of articulating domains that would be tested. No one took her up on the suggestion. But she was right: transforming tests in this way would be the best way to help lift curricula to a consistently rigorous and effective level. In the decentralized approach to education that is so quintessentially American, the system, alas, does not provide many other alternatives.
Those working with test design must stop patronizing our students in the name of an erroneous conception of fairness and construct rigorous exams that will send clear signals to educators about what they should teach. This is the surest way to improve education outcomes for all students – especially the most disadvantaged. If we get them right, these new assessments can make a vital contribution to the promise of the Common Core; if not, that promise will be seriously jeopardized.
David Steiner is the Founding Director of the CUNY Institute for Education Policy and the Dean of the Hunter School of Education. This essay first appeared on Huffington Post.