BY MAI MIKSIC | One of the most important decisions parents can make is where to send their children to school. The increased availability of education tax credit scholarships and school vouchers has given parents options beyond public schools, and many families opt for religious institutions. Parents choose Catholic schools for a number of personal reasons, but one of them is the belief that their children will receive a stronger academic education than in public schools. Catholic primary school children might score higher on tests, on average, than do their peers in the public sector, but the main question is whether such higher scores are a result of the so-called “school effect” (factors that are intrinsic to the schools themselves) or, rather, of other factors such as parent income, parent education, and the selection process itself. Disentangling the factors proves a difficult task.

#### Comparing Apples to Oranges: Catholic and Public School Students

The hard truth is that most of the research touted by Catholic organizations about their academic advantage is not methodologically rigorous. When a new research study (Elder & Jepsen, 2014) reported results that showed no Catholic school advantage, the National Catholic Educational Association was quick to point to the average National Assessment of Educational Progress (NAEP) and SAT scores showing that Catholic schools had higher scores than public schools. But a mere comparison of average test scores is not sufficient to prove the Catholic school advantage.

When scholars attempt to isolate the “school effect” upon a given outcome (say, test scores or college attendance), they commit themselves to ruling out non-school factors that influence that outcome (such as parental education or income). In the case of parents’ selecting a Catholic school, one of those factors is the selection process itself: the family that selects a Catholic school may be *systematically* different from those that do not. This is known as selection bias, and ignoring it can result in inaccurate estimates of the effects of the Catholic schools. A methodologically rigorous research study does its best to minimize selection bias.

In order to understand the complexity of ruling out selection bias, we have to consider what an idealized experiment would look like. An analysis of Catholic versus public school academic achievement would proceed by randomly assigning a large number of nationally representative children to a Catholic or to a traditional public school and then comparing the ensuing test scores. Such random assignment would ensure that only the difference between school types, not family or individual characteristics, distinguished the children. In such a scenario, we would indeed be able to say that higher scores resulted directly from superior schooling.

In the real world, though, we can’t randomly assign students to schools. We have to create reasonable substitutes for random assignment to reduce the selection bias associated with choosing a school.
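
The contrast between self-selection and random assignment can be illustrated with a toy simulation. The sketch below is purely hypothetical: the `background` index, the enrollment probabilities, and the score formula are invented for illustration and are not drawn from any study discussed here. The true school effect is set to zero, so any gap that appears under self-selection is selection bias.

```python
import random
from statistics import mean

def simulate_gaps(n=20000, school_effect=0.0, seed=0):
    """Return (self_selected_gap, randomized_gap) in test-score points."""
    rng = random.Random(seed)
    chosen = {True: [], False: []}    # scores grouped by the family's own choice
    assigned = {True: [], False: []}  # scores grouped by a coin-flip assignment
    for _ in range(n):
        background = rng.gauss(0, 1)         # hypothetical family-background index
        noise = rng.gauss(0, 5)
        base = 50 + 10 * background + noise  # score before any school effect

        # Self-selection: advantaged families pick Catholic school more often.
        picks_catholic = rng.random() < (0.6 if background > 0 else 0.2)
        chosen[picks_catholic].append(base + school_effect * picks_catholic)

        # Random assignment: enrollment is unrelated to family background.
        coin = rng.random() < 0.5
        assigned[coin].append(base + school_effect * coin)

    def gap(groups):
        return mean(groups[True]) - mean(groups[False])

    return gap(chosen), gap(assigned)

naive_gap, randomized_gap = simulate_gaps()
print(f"self-selected gap: {naive_gap:.2f}, randomized gap: {randomized_gap:.2f}")
```

Under these assumptions the self-selected comparison shows a sizable Catholic “advantage” even though the schools are identical by construction, while the randomized comparison stays close to zero.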

#### Past Research Practices

Researchers try to minimize selection bias with a variety of strategies, the simplest of which is to statistically control for variables in what is called a regression model. Here, an independent variable of interest (Catholic school attendance) would be used to predict an outcome variable (academic achievement). When researchers include control variables (such as income or region) in the model, they factor out the influence of those variables. James Coleman’s famous 1982 study comparing public and Catholic school test scores relied upon 17 control variables that represented family characteristics. His results showed that Catholic school students scored higher than public school students on standardized tests. Coleman’s paper was met with acclaim, but he was also criticized for claiming too much based on regression analysis, which cannot completely eliminate selection bias. Indeed, other researchers (Noell, 1982) used different methods and found no statistically significant differences in test scores between Catholic school students and public school students.
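
In generic notation (the symbols here are illustrative and do not come from Coleman’s study), a regression of this kind can be written as:

$$
\text{score}_i = \beta_0 + \beta_1\,\text{Catholic}_i + \sum_{k=1}^{K} \gamma_k\, x_{ik} + \varepsilon_i
$$

Here $\text{Catholic}_i$ equals 1 if student $i$ attends a Catholic school and 0 otherwise, $x_{i1}, \dots, x_{iK}$ are the control variables ($K = 17$ in a model with 17 controls), and $\varepsilon_i$ is the unexplained remainder. The coefficient $\beta_1$ is read as the Catholic school effect, but only if nothing left in $\varepsilon_i$ is related to both school choice and test scores. That is the condition selection bias violates.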

Using control variables to deal with selection bias bears another difficulty: no two researchers will use the same set of control variables, and thus the findings may conflict with one another. This has certainly been true in studies of Catholic school outcomes. Researchers (Jeynes, 2013; Jeynes, 2007) have tried to reach a consensus on the Catholic school effect by doing meta-analyses (studies that calculate an average effect size from a large number of research studies). However, those results are only as good as the individual studies themselves. If the meta-analysis uses mainly studies that have failed to properly deal with selection bias, then the resulting calculated effect sizes will also be biased.
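
A short sketch shows why pooling cannot repair biased inputs. It assumes one common pooling rule, the inverse-variance weighted (fixed-effect) average, which is not necessarily the method used in the meta-analyses cited above. The effect sizes and standard errors are invented.

```python
def pooled_effect(effects, std_errors):
    """Fixed-effect (inverse-variance weighted) average of study effect sizes."""
    weights = [1.0 / se ** 2 for se in std_errors]
    return sum(w * d for w, d in zip(weights, effects)) / sum(weights)

# Hypothetical studies: suppose the true effect is 0, but each study
# overstates it by about 0.2 because of uncorrected selection bias.
biased_effects = [0.25, 0.15, 0.20]
std_errors = [0.05, 0.10, 0.08]

print(round(pooled_effect(biased_effects, std_errors), 3))
```

The pooled estimate lands near 0.2 rather than 0: averaging narrows the uncertainty around the estimate but leaves the shared bias in place.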

Since Coleman et al.’s (1982) study, researchers have developed new ways to minimize selection bias. A new study (Elder & Jepsen, 2014) uses one of the most sophisticated analytic methods developed to investigate whether Catholic elementary schools inherently produce better academic results.

#### Propensity Score Matching

Elder and Jepsen (2014) use the Early Childhood Longitudinal Study Kindergarten Cohort (ECLS-K) to study whether the Catholic school effect is responsible for the higher test scores. This study included over 8,000 elementary school students (7,110 public school students and 1,150 Catholic school students) and followed them from kindergarten through eighth grade. The authors used reading and math test scores as the primary outcomes of interest, but they investigated several non-cognitive outcomes as well.

The methodology is the most interesting aspect of the Elder and Jepsen study. They used propensity score matching, which minimizes selection bias by using matched pairs to control for background characteristics. Their specific goal was to match students who attend Catholic schools with demographically identical students at public schools. This propensity score matching creates a sort of “counterfactual” by comparing one student to another who is so similar that he or she could represent that student in a different school. The idea here is that this method replicates the random assignment process that occurs in an experimental design.

Matching techniques are not new; researchers have long tried to pair students who come from similar economic backgrounds, family formations, religious beliefs, or neighborhoods. However, manually matching each student to another student based on a number of personal characteristics can be cumbersome, if not impossible. Propensity score matching makes the process manageable by assigning each student a numeric score (a “propensity score”) based on selected sets of personal characteristics. The score represents the probability, or propensity, of the student to select a Catholic school. Researchers then match students based on their propensity scores.
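
In the notation of Rosenbaum and Rubin (1983), the propensity score is the conditional probability of receiving the treatment given the observed characteristics. Applied to this setting, with symbols chosen here for illustration:

$$
e(x_i) = \Pr(\text{Catholic}_i = 1 \mid x_i)
$$

A common way to estimate it, assumed here because this article does not say which model Elder and Jepsen fit, is a logistic regression:

$$
e(x_i) = \frac{1}{1 + \exp\!\left(-\left(\alpha_0 + \sum_{k=1}^{K} \alpha_k\, x_{ik}\right)\right)}
$$

Whatever the model, the result is a single number between 0 and 1 for each student, which is what makes matching on many characteristics at once manageable.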

Creating the optimal match is highly technical, and the authors relied upon three separate techniques that rendered their results as comprehensive as possible. They employed nearest-neighbor, kernel density, and caliper matching. Nearest-neighbor is the simplest matching method, which consists of matching a student with the student or students whose propensity scores are closest. For example, if a Catholic school student has a propensity score of .85, then she might be matched with a public school student who has a propensity score of .89, if that is the closest score. In nearest-neighbor matching, a student can be matched with more than one student in order to create a greater number of pairs. In this study, each Catholic school student was matched with four public school students. Kernel density and caliper matching are even more complicated, and therefore beyond the scope of this article, but it is sufficient to say that they increase the odds of optimal matches.
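
Nearest-neighbor matching as described above can be sketched in a few lines. This is a minimal illustration and not the authors’ implementation: the function name, the matching-with-replacement choice, and the optional `caliper` argument (treated here simply as a maximum allowed distance between scores) are assumptions. Apart from the .85 and .89 taken from the example above, the propensity scores are made up.

```python
def nearest_neighbors(treated_scores, control_scores, k=1, caliper=None):
    """For each treated unit, return the indices of the k control units with
    the closest propensity scores (matching with replacement). If a caliper
    is given, controls farther away than the caliper are dropped."""
    matches = []
    for p in treated_scores:
        ranked = sorted(
            range(len(control_scores)),
            key=lambda j: abs(control_scores[j] - p),
        )
        chosen = ranked[:k]
        if caliper is not None:
            chosen = [j for j in chosen if abs(control_scores[j] - p) <= caliper]
        matches.append(chosen)
    return matches

catholic = [0.85]                          # one Catholic school student
public = [0.40, 0.89, 0.70, 0.95, 0.10]    # hypothetical public school students

print(nearest_neighbors(catholic, public, k=1))  # closest single match
print(nearest_neighbors(catholic, public, k=4))  # four matches per student
```

With `k=1` the student is paired with the public school student scoring .89. With `k=4` the match set also includes much more distant scores, which is the trade-off a caliper is meant to limit.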

Once the matches have been made, the calculations of the effects are relatively straightforward. Similar to what would have been done had the study been a randomized trial, the average test scores for Catholic and public elementary school students are directly compared and an average treatment effect is calculated.
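
As a rough sketch of that last step, the function below averages, over Catholic school students, the difference between each student’s score and the mean score of his or her matched public school students. The function name and all numbers are hypothetical. Strictly speaking, this quantity describes the effect for students like those who attend Catholic schools.

```python
from statistics import mean

def matched_effect(treated_outcomes, control_outcomes, matches):
    """Average difference between each treated unit's outcome and the mean
    outcome of its matched controls; unmatched treated units are skipped."""
    diffs = []
    for y_treated, idxs in zip(treated_outcomes, matches):
        if not idxs:
            continue  # no acceptable match was found for this student
        diffs.append(y_treated - mean(control_outcomes[j] for j in idxs))
    return mean(diffs)

catholic_scores = [60, 55]         # hypothetical percentile scores
public_scores = [62, 58, 50, 66]
matches = [[0, 1], [2, 3]]         # indices into public_scores for each Catholic student

print(matched_effect(catholic_scores, public_scores, matches))
```

A negative value here would mean that Catholic school students scored lower than their matched public school peers.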

What did Elder and Jepsen find?

#### The Results

There are three lenses through which Elder & Jepsen interpreted their results: raw scores, regression, and propensity score matching. Each has advantages and disadvantages and tells us something different about the system under analysis.

*Raw scores* tell us the average test scores and require no statistical manipulation. To determine the relationship between students who go to Catholic schools and those who go to public schools, we can simply subtract the average scores. In this case, they told us that Catholic schools produce higher test scores. However, the raw scores do not take into consideration the many variables that led to a child’s being in Catholic school in the first place.

*Regressions* improve upon raw scores by allowing us to hold certain factors (variables) constant. We know that other individual and family factors affect student achievement; regressions allow us to answer the question, “All else being equal, what is the effect of Catholic schools on achievement?” They thus produce a more realistic assessment of the Catholic school effect than raw scores, because they control for these other variables. A weakness of regressions, however, can be that they assume a fixed relationship between the variables over time, instead of the real-life changes that occur in a system, such as changes in students’ test scores as described in the section below.

*Propensity scoring* addresses this weakness. It eliminates the assumption (inherent in regressions) that the relationship between the two variables remains constant. This is what is known as the “linearity” assumption. In regressions, there is only one estimate that is produced to describe the relationship between Catholic schools and test scores. This estimate remains unchanging regardless of time or who is included in the study. A thorough analysis of the difference between regressions and propensity score matching is beyond the scope of this article, but suffice it to say propensity score matching allows researchers to more flexibly calculate the effects. Propensity score matching therefore paints a more realistic picture of what the relationship between Catholic elementary schools and achievement looks like by increasing the dimensionality of the relationship. It also comes as close as conceivably possible to replicating a randomized trial by creating two groups that are as identical to each other as possible, whereas regressions cannot do this.

*Regressions and propensity scoring* can be, theoretically, of equivalent methodological rigor and even yield identical results, though this situation is highly unlikely. Because we are unable to ascertain the “perfect” combination of control variables to employ in a regression analysis, propensity scoring is generally more accurate than a mere regression.

So, what did the authors conclude when they examined the data through each method? First, the authors looked at the raw test scores over time. These were average percentile scores measured in the fall of kindergarten, the spring of kindergarten, first grade, third grade, fifth grade, and eighth grade. To be clear, these are merely cross-sectional views of how the students performed on tests and not longitudinal analyses (which would have given us the rate of change for the scores as well). Results showed that Catholic school students had an initial sizable advantage in reading and math scores. Catholic school students started school in approximately the 61st percentile for reading and the 63rd percentile for math, while public school students started school in the 46th percentile for both reading and math. This amounts to a gap of roughly 16 percentile points (15 in reading and 17 in math), favoring Catholic schools.

However, this Catholic school advantage diminished over time for math scores, indicating that personal characteristics could possibly be driving the initial differences in scores or that Catholic schools do worse with older children in math. By the time Catholic school students got to third grade, their math scores had fallen by 8 percentile points, from the 63rd percentile to the 55th percentile. Meanwhile, math scores for public school students remained steady between the 46th and 47th percentile, which indicates that the achievement gap between Catholic schools and public schools decreased significantly in math by third grade. The gap remained unchanged between third and eighth grade.

Reading scores, on the other hand, decreased in first grade but increased afterward for Catholic school children, while the reading scores remained steady for public school students (around the 46th percentile) from kindergarten through eighth grade. Reading scores for Catholic school students fell from the 61st to the 56th percentile by third grade. Reading scores then increased from third grade through eighth grade from the 56th percentile to the 64th percentile. This actually widened the gap between Catholic and public school students, suggesting that Catholic schools might do better over time on literacy. Raw scores thus indicate a strong Catholic elementary school advantage.
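
The raw gaps implied by these percentiles are simple subtractions, tabulated in the sketch below. The Catholic school figures are the ones reported above. Holding the public school percentile at a flat 46 is a simplifying assumption, since the text gives 46 to 47 for math and “around” 46 for reading.

```python
# Public school percentile, held constant for simplicity (an assumption).
PUBLIC_PERCENTILE = 46

# Average Catholic school percentiles as reported in the text above.
catholic_percentiles = {
    "math": {"kindergarten": 63, "grade 3": 55},
    "reading": {"kindergarten": 61, "grade 3": 56, "grade 8": 64},
}

def raw_gaps(catholic, public=PUBLIC_PERCENTILE):
    """Catholic minus public percentile, per subject and grade."""
    return {
        subject: {grade: pct - public for grade, pct in by_grade.items()}
        for subject, by_grade in catholic.items()
    }

for subject, gaps in raw_gaps(catholic_percentiles).items():
    print(subject, gaps)
```

The math gap shrinks by roughly half between kindergarten and third grade, while the reading gap dips by third grade and ends slightly wider in eighth grade than where it started.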

Second, the authors performed regressions, using Catholic school status to predict fifth and eighth grade test scores. Because the authors did not conduct longitudinal analyses, conclusions cannot be drawn about how test scores changed over time. This is disappointing, and it is difficult to understand why the authors omitted such analyses, since they would have been relatively simple to do. Also, the authors did not report the specific statistical significance levels (for example, whether the results were significant at p < .05). We assume that all results are statistically significant at least at p < .05, since they did report when scores were not significant.

Remember that, without any control variables, Catholic school students scored better than public school students on reading and math tests. When control variables, such as initial test scores from the beginning of kindergarten, race and ethnicity, family structure, parental marital status, parental education, income, and employment, were included in the regressions, the results differed substantially: they showed a negative effect for attending Catholic schools in math and almost no effect for reading. Catholic school students scored 7.53 percentile points lower than public school students in fifth grade math and 5.96 percentile points lower in eighth grade math. In fifth grade reading, Catholic school students scored 1.98 percentile points lower than public school students. Catholic school students showed a very small advantage over public school students (0.93 percentile points) in eighth grade reading, but this was not statistically significant. Regression analysis illustrated, therefore, that the correlation between Catholic elementary schooling and academic achievement is tenuous, at best.

Finally, what did they find when they used propensity score matching? The authors only looked at cross-sectional results of the fifth and eighth grade reading and math scores. As a result, no conclusions can be drawn about the longitudinal changes in test scores over time. Regardless of the type of matching used, propensity score matching results showed that Catholic primary schooling is associated with lower math scores. Only the results for nearest-neighbor matching will be reported here for brevity’s sake. Catholic school students scored 6.79 and 9.77 percentile points lower in eighth grade and fifth grade math, respectively. Catholic primary schooling had no statistically significant effect on eighth and fifth grade reading scores.

This, then, is Elder and Jepsen’s grand conclusion: as far as elementary school achievement goes, the Catholic advantage, on average, is illusory at best. This differs greatly from the conclusions drawn by looking at merely the average scores for Catholic and public school students, and illustrates why it was not a good idea for the National Catholic Educational Association to point to average test scores as an indicator of the Catholic school advantage.

Finally, the authors used regressions and propensity score matching to examine non-cognitive outcomes such as students’ “locus of control” (a person’s belief that they are in control of events), days absent or tardy from school, repeating a grade, or getting suspended. These non-cognitive outcomes were only available in fifth and eighth grade. Their results indicate that Catholic schools have no consistent effect on these non-cognitive outcomes with the exception of a sizable reduction in the likelihood of suspension in eighth grade.

#### Caveats

While the Elder & Jepsen results are indeed compelling, given the sophisticated methods used, they are not without issues. One of the main problems with propensity score matching is that it is very sensitive to the type and number of personal characteristics used to create the propensity score. Again, the propensity score predicts the probability that students will select into the treatment group; in other words, the propensity score represents the likelihood that students will go to Catholic schools, or more appropriately that the family will choose to send their children to Catholic schools. Matches are made based on this likelihood. The authors used race, family structure, parental marital status, region in which the student lived (Midwest, South, etc.), type of region (suburban or rural), parental education, and family income to predict the likelihood of a child going to Catholic schools.

While these are standard and perfectly acceptable variables, key variables are missing that should have been used. The most obvious variable not used is religion. Obviously, a Catholic family is more likely to send their child to Catholic schools. The ECLS-K did not collect information about the family’s identified religion, but it did ask questions about the religiosity of the family. Specifically, it asked “How often does someone in your family talk to the child about your family’s religious beliefs or traditions?” It is completely reasonable to expect that a more religious family might be more likely to send their child to Catholic schools. Therefore this measure should have been included in the calculations of the propensity scores.

Even though the authors’ inclusion of a few “non-cognitive” outcomes is interesting, a more comprehensive set of non-cognitive measures would have been more compelling since some researchers have argued that Catholic schools offer much better character education than the public sector. It has been argued that Catholic schools do a better job than average public schools in instilling a sense of purpose, self-discipline, and a robust response to adversity, failure, and frustration in students (Jeynes & Beuttler, 2012). While these variables were not available in the ECLS-K, other variables were available that would have been good proxies for them. Eighth graders were administered a set of questionnaires which asked about their perception of themselves, self-esteem, interest in schoolwork, and feelings of sadness, loneliness, or anxiety. The authors’ inclusion of such outcomes would have rendered their findings more comprehensive.

Finally, one of the most important things the authors omitted was a subgroup analysis, to determine how the results might have differed for racial/ethnic and income groups. This would have been helpful, because other research (Figlio & Stone, 1997) has shown that even if results indicate no Catholic school advantage for the *general* population, minority students, particularly from urban areas, still might benefit substantially. This, in fact, is one of the strongest reasons to promote access to Catholic schools: to narrow the racial and socioeconomic achievement gap. Propensity score matching could handily investigate whether Catholic schools really do benefit racial and ethnic minority students from urban areas.

#### Conclusion

Elder and Jepsen’s (2014) research study is not perfect – indeed, no study is. It did, however, provide an important look at outcomes and selection bias by employing a novel, and more rigorous, methodology than other studies have used in the past. Scholars could now apply this process in different settings, such as high schools, since it is possible that Catholic education might make a more significant difference during secondary school.

This study cannot, of course, answer the policy question of whether parents should have access to Catholic schools, or whether any given child might benefit from enrolling in one. The study cannot speak to a possible Catholic high school effect, either. The study does call for more qualifications and for more research about the effects of Catholic schooling upon academic outcomes.

One thing is clear, though: when the National Catholic Educational Association promotes its status as a producer of academically strong institutions, it should stop relying on and reporting average test scores and graduation rates. Instead, the NCEA and other Catholic organizations should emphasize more rigorous research studies and use them to improve Catholic schools. Research methodologies are constantly evolving, providing Catholic schools opportunities to use them to their advantage. There is no lack of rigorous research (using methods other than propensity score matching) indicating a possible Catholic school advantage (Altonji, Elder, & Taber, 2005; Chen & Pong, 2012; Evans & Schwab, 1995). The credibility of Catholic networks would improve if they consistently affirmed strong research and stopped demeaning careful researchers such as Elder and Jepsen.

#### References

Altonji, J., Elder, T., & Taber, C. (2005). Selection on observed and unobserved variables: assessing the effectiveness of Catholic schools. *Journal of Political Economy, 113*(1), 151–184.

Chen, V. W., & Pong, S. (2012). The effects of Catholic schools on mathematics achievement in twelfth grade: School district variations. National Center for the Study of Privatization in Education, Teachers College, Columbia University. Retrieved from http://www.ncspe.org/publications_files/OP210.pdf

Coleman, J. S., Hoffer, T., & Kilgore, S. (1982). High school achievement: Public, Catholic and private schools compared. New York: Basic Books.

Elder, T., & Jepsen, C. (2014). Are Catholic primary schools more effective than public primary schools? *Journal of Urban Economics, 80*, 28-38.

Evans, W. N., & Schwab, R. M. (1995). Finishing high school and starting college: Do Catholic schools make a difference? *The Quarterly Journal of Economics, 110*(4), 941-974.

Figlio, D. N., & Stone, J. A. (1997). School choice and student performance: Are private schools really better? *Institute for Research on Poverty* Discussion Paper no. 1141-97. Retrieved from http://irp.wisc.edu/publications/dps/pdfs/dp114197.pdf

Jeynes, W. H. (2007). Religion, intact families, and the achievement gap. *Interdisciplinary Journal of Research on Religion, 3*(3), 1-22.

Jeynes, W. H., & Beuttler, F. (2012). What private and public schools can learn from each other. *Peabody Journal of Education, 87*, 285-304.

Jeynes, W. H. (2013). The effects of Catholic and protestant schools: A meta-analysis. *Catholic Education: A Journal of Inquiry and Practice, 12*(2), 254-275.

Noell, J. (1982). Public and Catholic schools: A reanalysis of ‘public and private schools.’ *Sociology of Education, 55*, 123-132.

Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. *Biometrika, 70*(1), 41-55.