Success for All: A Quantitative Synthesis of U.S. Evaluations

Journal of Research on Educational Effectiveness

Alan C. K. Cheung, Chen Xie, Tengteng Zhuang, Amanda J. Neitzel & Robert E. Slavin

To cite this article: Alan C. K. Cheung, Chen Xie, Tengteng Zhuang, Amanda J. Neitzel & Robert E. Slavin (2021) Success for All: A Quantitative Synthesis of U.S. Evaluations, Journal of Research on Educational Effectiveness, 14:1, 90-115, DOI: 10.1080/19345747.2020.1868031

To link to this article: https://doi.org/10.1080/19345747.2020.1868031

Published online: 16 Apr 2021.

This article has been awarded the Centre for Open Science 'Open Data' badge.

INTERVENTION, EVALUATION, AND POLICY STUDIES

Alan C. K. Cheung, Chen Xie, Tengteng Zhuang, Amanda J. Neitzel and Robert E. Slavin

Department of Educational Administration and Policy, Faculty of Education, The Chinese University of Hong Kong, Hong Kong, China; Institute of International and Comparative Education, Faculty of Education, East China Normal University, Shanghai, China; Center for Research and Reform in Education, Johns Hopkins University, Baltimore, Maryland, USA

ABSTRACT

Success for All (SFA) is a comprehensive whole-school approach designed to help high-poverty elementary schools increase the reading success of their students. It is designed to ensure success in grades K-2 and then build on this success in later grades. SFA combines instruction emphasizing phonics and cooperative learning, one-to-small group tutoring for students who need it in the primary grades, frequent assessment and regrouping, parent involvement, distributed leadership, and extensive training and coaching. Over a 33-year period, SFA has been extensively evaluated, mostly by researchers unconnected to the program. This quantitative synthesis reviews the findings of these evaluations. Seventeen US studies meeting rigorous inclusion standards had a mean effect size of +0.24 (p < .05) on independent measures. Effects were largest for low achievers (ES = +0.54, p < .01). Although outcomes vary across studies, mean impacts support the effectiveness of Success for All for the reading success of disadvantaged students.

ARTICLE HISTORY

Received 22 January 2020

Revised 25 September 2020

Accepted 19 October 2020

KEYWORDS

Success for all; meta-analysis; literacy instruction; high-poverty schools; struggling readers

The reading performance of students in the United States is a source of deep concern. American students perform at levels below those of many peer nations on the Program for International Student Assessment (PISA; OECD, 2019). Most importantly, there are substantial gaps in reading skills between advantaged and disadvantaged students, between different ethnic groups, and between proficient speakers of English and English learners (National Center for Education Statistics [NCES], 2019). These gaps lead to serious inequalities in the American economy and society. Yet America’s reading problem is far from uniform. On PISA Reading Literacy (OECD, 2019), American 15-year-old students in schools with fewer than 50% of students qualifying for free lunch scored higher than students in any other country. The challenge in the United States is to substantially advance the reading skills of students in high-poverty schools. The students in these schools are capable of learning at high levels, but they need greater opportunities and support to fully realize their potential.

CONTACT Chen Xie, Institute of International and Comparative Education, Faculty of Education, East China Normal University, Shanghai, China.

Institute of Higher Education, Faculty of Education, Beijing Normal University, Beijing, China.

Supplemental material for this article can be accessed online at https://doi.org/10.1080/19345747.2020.1868031.

© 2021 Taylor & Francis Group, LLC

JOURNAL OF RESEARCH ON EDUCATIONAL EFFECTIVENESS, 2021, VOL. 14, NO. 1, 90–115, https://doi.org/10.1080/19345747.2020.1868031

Research is clear that students who start off with poor reading skills are unlikely to recover without significant assistance (Cunningham & Stanovich, 1997; National Reading Panel, 2000). A study by Lesnick et al. (2010) found that students reading below grade level in third grade were four times more likely than other students to drop out before high school graduation.

Evidence about the role of early reading failure in long-term school failure (e.g. National Reading Panel, 2000) has led to a great deal of research and development focused on ensuring that students succeed in reading in the elementary grades. Recent reviews of programs for struggling readers by Neitzel, Lake, et al. (2020) and Wanzek et al. (2016) have identified many effective approaches, especially tutoring and professional development strategies. However, in high-poverty schools, where many students may be at risk of reading failure, a collection of individual approaches may be insufficient or inefficient. In such schools, coordinated whole-school approaches may be needed to ensure that all students succeed in reading.

Success for All

Success for All (SFA) was designed and first implemented in 1987 to serve very disadvantaged schools, in which it is not practically possible to serve all struggling readers one at a time. The program emerged from research at Johns Hopkins University, and since 1996 it has been developed and disseminated by a nonprofit organization, the Success for All Foundation (SFAF). From the outset, SFA was designed to provide research-proven instruction, curriculum, and school organization to schools serving many disadvantaged students.

Theory of Action

Success for All was initially designed in a collaboration between researchers at Johns Hopkins University (JHU) and leaders of the Baltimore City Public Schools (BCPS), whose high-poverty schools had large numbers of students falling behind in reading in the early elementary grades, losing motivation, and developing low expectations for themselves. Ultimately, these students entered middle school lacking basic skills and, in too many cases, no longer believing that success was possible. The JHU-BCPS team was charged with developing a whole-school model capable of ensuring success from the beginning of students’ time in school. The theory of action the team developed focused first on ensuring that students were successful in reading in first grade, by providing a curriculum with a strong emphasis on phonemic awareness and phonics (National Reading Panel, 2000; Shaywitz & Shaywitz, 2020; Snow et al., 1998) and by using proven instructional methods such as cooperative learning (Slavin, 2017) and effective classroom management methods (e.g. Good & Brophy, 2018). Students in grades 1–5 are grouped by reading level across grade lines, so that each reading teacher has a single reading group. For example, a reading group at the 3–1 level (third grade, first semester) might contain some high-performing second graders, many third graders, and some low-performing fourth graders, all reading at the 3–1 level. Students in the primary grades, particularly first graders, may receive daily 30-minute computer-assisted tutoring, usually in groups of four, to enable most struggling readers to keep up (Neitzel, Lake, et al., 2020; Wanzek et al., 2016).

The core focus of the SFA model is to make certain that every student succeeds in basic reading. In addition to the reading instruction and tutoring elements, students who need them can receive services to help them with attendance, social-emotional development, parent involvement, and other needs. After students reach the 2–1 reading level, they continue to receive all program services except tutoring. The upper-elementary program is an adaptation of Cooperative Integrated Reading and Composition (CIRC; Stevens et al., 1987). At reading levels 3–5, the SFA program focuses on cooperative learning, comprehension, metacognitive skills, and writing.
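The reading-level notation used here (grade–semester, so 2–1 means second grade, first semester) orders naturally. The Python sketch below is a hypothetical illustration, not SFAF software: it assumes level codes of the form "G–S", treats 2–1 as the point at which students move from Reading Roots to Reading Wings and stop receiving tutoring (as described above), and limits tutoring to students enrolled in grades 1 and 2. The class and function names are invented for the example, and actual placement decisions rest on SFA's own assessments.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class ReadingLevel:
    """A reading level such as 3-1 (third grade, first semester).

    order=True compares (grade, semester) as a tuple, so 1-2 < 2-1 < 3-1.
    """
    grade: int
    semester: int

    @classmethod
    def parse(cls, code: str) -> "ReadingLevel":
        # Accept either an ASCII hyphen or the en dash used in print ("3–1").
        grade_text, semester_text = code.replace("\u2013", "-").split("-")
        level = cls(int(grade_text), int(semester_text))
        if level.semester not in (1, 2):
            raise ValueError(f"semester must be 1 or 2, got {code!r}")
        return level


# Hypothetical threshold taken from the text: Reading Wings starts at the 2-1 level.
WINGS_ENTRY = ReadingLevel(2, 1)


def reading_program(level: ReadingLevel) -> str:
    """Return the reading program for a student at this level (grades 1 and up)."""
    return "Reading Wings" if level >= WINGS_ENTRY else "Reading Roots"


def tutoring_eligible(level: ReadingLevel, enrolled_grade: int) -> bool:
    """Simplified rule: tutoring only in grades 1-2 and only below the 2-1 level."""
    return enrolled_grade in (1, 2) and level < WINGS_ENTRY


if __name__ == "__main__":
    for code, grade in [("1–1", 1), ("1–2", 2), ("2–1", 2), ("3–1", 3)]:
        lvl = ReadingLevel.parse(code)
        print(code, reading_program(lvl), tutoring_eligible(lvl, grade))
```

Comparing levels as (grade, semester) tuples keeps the threshold logic to a single comparison; a real assignment rule would also weigh assessment results and teacher judgment.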

The theory of action for SFA, therefore, assumes that students must start with success, whatever this takes, in the expectation that early success builds a solid base for later learning, positive expectations for future success, and motivation to achieve. However, success in the early grades is seen as necessary but not sufficient. Evidence on the difficulty of maintaining reading gains from highly successful first grade tutoring programs (e.g. Blachman et al., 2014) demonstrates that early-grade success in reading cannot be assumed to ensure lifelong reading success. The designers of SFA intended to maintain first-grade effects by continuing high-quality instruction and classroom organization after an intensive early primary experience sets students up for success. Beyond reading and tutoring, the design seeks to build on students’ strengths by involving their parents, teaching social-emotional skills, and ensuring high attendance.

Figure 1 summarizes the SFA theory of action. At the center is success in reading in grades K-2, and then in grades 3–5. All other components of the model support these outcomes. Only tutoring is limited to grades 1 and 2; other elements continue through the grades.

Figure 1. Theory of action for Success for All.

The logic of Success for All is much like that of response to intervention (Fuchs & Fuchs, 2006), now often called Multi-Tier Systems of Support (MTSS). That is, teachers receive extensive professional development and in-class coaching to help them use proven approaches to instruction and curriculum. Students who do not succeed despite enhanced teaching may receive one-to-small group or, if necessary, one-to-one tutoring. Ongoing assessment, recordkeeping, and flexible grouping are designed to ensure that students receive instruction and supportive services at their current instructional level as they advance toward higher levels. Program components focus on parent involvement, classroom management, attendance, and social-emotional learning, to solve problems that may interfere with students’ reading and broader school success. Each school has a full-time facilitator to help manage professional development and other program elements, one or more paraprofessional tutors, and coaches from the nonprofit Success for All Foundation, who visit schools approximately once a month to review the quality of implementation, review data, and introduce additional components.

Program Components

Success for All is a whole-school model that addresses instruction, particularly in reading, as well as schoolwide issues related to leadership, attendance, school climate, behavior management, parent involvement, and health (see Slavin et al., 2009, for more detail). The program provides specific teacher and student materials and professional development to facilitate use of proven practices in each program component.

Literacy Instruction

Learning to read and write effectively is essential for success in school. Success for All provides in-depth support for reading acquisition. Instructional practices, teacher’s guides, student materials, assessments, and job-embedded professional development are combined to create a comprehensive reading program.

The Success for All reading program is based on research and effective practices in beginning reading (e.g. National Reading Panel, 2000), and appropriate use of cooperative learning to enhance motivation, engagement, and opportunities for cognitive rehearsal (Slavin, 2017; Stevens et al., 1987).

Regrouping

As noted earlier, students in grades 1 and up are regrouped for reading. Students are assigned to heterogeneous, age-grouped classes for most of the day, but during a regular 90-minute reading period they are regrouped by reading performance level into reading classes of students all at the same level. For example, a reading class taught at the 2–1 level might contain first, second, and third grade students all reading at that level. The reading classes are smaller than homerooms because tutors and other certified staff (such as librarians or art teachers) teach reading during this common reading period. Regrouping allows teachers to teach the whole reading class without having to break the class into reading groups. This greatly reduces the time spent on seatwork and increases direct instruction time. The regrouping is a form of the Joplin Plan, which has been found to increase reading achievement in the elementary grades (Slavin, 1987).
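A minimal Python sketch of the cross-grade regrouping described above (a Joplin Plan arrangement) may make the mechanics concrete. It assumes each student record carries a homeroom grade and an assessed reading level, and it assumes a maximum reading-class size, a parameter the text does not specify; level groups larger than that cap are split into balanced sections. The function name, record layout, and roster are hypothetical.

```python
from collections import defaultdict
from math import ceil
from typing import Dict, List, Tuple

# (student_id, homeroom_grade, reading_level), where reading_level is a
# (grade, semester) tuple such as (2, 1) for the 2-1 level.
Student = Tuple[str, int, Tuple[int, int]]


def regroup(students: List[Student], max_class_size: int) -> List[Tuple[Tuple[int, int], List[str]]]:
    """Group students across homeroom grades into same-level reading classes."""
    if max_class_size < 1:
        raise ValueError("max_class_size must be at least 1")
    by_level: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for student_id, _homeroom_grade, level in students:
        # Homeroom grade is deliberately ignored: only reading level matters.
        by_level[level].append(student_id)

    classes = []
    for level in sorted(by_level):
        ids = by_level[level]
        n_sections = ceil(len(ids) / max_class_size)
        # Round-robin assignment keeps section sizes within one of each other.
        for k in range(n_sections):
            classes.append((level, ids[k::n_sections]))
    return classes


roster = [
    ("ana", 1, (2, 1)),   # high-performing first grader
    ("ben", 2, (2, 1)),
    ("cai", 3, (2, 1)),   # low-performing third grader
    ("dev", 3, (3, 1)),
    ("eli", 2, (2, 1)),
]
for level, members in regroup(roster, max_class_size=3):
    print(f"{level[0]}-{level[1]}", members)
# prints: 2-1 ['ana', 'cai'] / 2-1 ['ben', 'eli'] / 3-1 ['dev']
```

Splitting oversized level groups round-robin is one simple balancing choice; a school would also have to match the number of sections to the teachers, tutors, and other certified staff available during the common reading period.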

Preschool and Kindergarten

Most Success for All schools provide a half-day preschool and/or a full-day kindergarten for eligible students. Research supports a balance among the development of language, school skills, and social skills (Chambers et al., 2016). The SFA preschool and kindergarten programs provide students with specific materials and instruction to give them a balanced and developmentally appropriate learning experience. The curriculum emphasizes the development and use of language. It provides a balance of academic readiness and nonacademic music, art, and movement activities in a series of thematic units. Readiness activities include language development activities and Story Telling and Retelling (STaR), which focuses on the development of concepts about print as well as vocabulary and background knowledge. Structured phonemic awareness activities prepare students for success in early reading. Big books as well as oral and written composing activities allow students to develop concepts of print and story structure. Specific oral language experiences are used to further develop receptive and expressive language.

Curiosity Corner, Success for All’s pre-kindergarten program, offers theme-based units designed to support a language-rich half-day program for 3- and 4-year-olds that supports the development of social-emotional skills and early literacy.

KinderCorner offers a full-day theme-based kindergarten program designed to support the development of oral language and vocabulary, early literacy, and the social and emotional skills needed for long-term success. KinderCorner provides students with materials and instruction designed to get them talking, using cooperative discussion within an integrated set of activities. Opportunities for imaginative play increase both self-regulation and language. Formal reading instruction is phased in during kindergarten. Media-based phonemic awareness and early phonics activities ease students into reading, and simple but engaging phonetically regular texts give students successful practice applying word synthesis skills in connected text.

Beginning Reading

Reading Roots is a beginning reading program for grades K-1. It has a strong focus on phonemic awareness, phonics, and comprehension (Shaywitz & Shaywitz, 2020; Snow et al., 1998). It uses as its base a series of phonetically regular but interesting minibooks and emphasizes repeated oral reading to partners as well as to the teacher. The minibooks begin with a set of “shared stories,” in which part of a story is written in small type (read by the teacher) and part is written in large type (read by the students). The student portion uses a phonetically controlled vocabulary. Taken together, the teacher and student portions create interesting, worthwhile stories. Over time, the teacher portion diminishes and the student portion lengthens, until students are reading the entire book. This scaffolding allows students to read interesting stories when they only know a few letter sounds.

Letters and letter sounds are introduced in an active, engaging set of activities that begins with oral language and moves into written symbols. Individual sounds are integrated into a context of words, sentences, and stories. Instruction is provided in story structure, specific comprehension skills, metacognitive strategies for self-assessment and self-correction, and integration of reading and writing. Brief video segments use animations to reinforce letter sounds, puppet skits to model sound blending, and live action skits to introduce key vocabulary.

Adaptations for Spanish Speakers

Spanish bilingual programs use an adaptation of Reading Roots called Lee Conmigo (“Read With Me”). Lee Conmigo uses the same instructional strategies as Reading Roots but is built around shared stories written in Spanish. SFA also has a Spanish-language kindergarten program, Descubre Conmigo (“Discover with Me”). Students who receive Lee Conmigo typically transition to the English SFA program in grade 2 or 3, using special materials designed to facilitate the transition. Schools teaching English learners only in English are provided with professional development focused on supporting the language and reading development of English learners.

Upper Elementary Reading

When students reach the second grade reading level, they use a program called Reading Wings, an adaptation of Cooperative Integrated Reading and Composition (CIRC; Stevens et al., 1987). Reading Wings uses cooperative learning activities built around story structure, prediction, summarization, vocabulary building, decoding practice, and story-related writing. Students engage in partner reading and structured discussion of stories or novels, and work toward mastery of the vocabulary and content of the story in teams. Story-related writing is also shared within teams. Cooperative learning both increases students’ motivation and engages students in cognitive activities known to contribute to reading comprehension, such as elaboration, summarization, and rephrasing (see Slavin, 2017). Research on CIRC has found it to significantly increase students’ reading comprehension and language skills (Stevens et al., 1987).

Reading Tutors

A critical element of the Success for All model is tutoring, the most effective intervention known for struggling readers (Neitzel, Lake, et al., 2020; Wanzek et al., 2016). In the current version of SFA, computer-assisted tutoring is provided by well-qualified paraprofessionals to groups of four children with reading problems. Students with very serious problems, however, may receive one-to-two or one-to-one tutoring. The tutoring occurs in 30-minute sessions during times other than reading or math periods.

Leading for Success

Schools must have systems that enable them to assess needs, set goals for improvement, make detailed plans to implement effective strategies, and monitor progress on a child-by-child basis. In Success for All, the tool that guides this schoolwide collaboration is called Leading for Success.

Leading for Success is built around a distributed leadership model and engages all school staff in a network of teams that address key areas targeted for continuous improvement. The leadership team manages the Leading for Success process and convenes the staff at the beginning of the school year and at the end of each quarter to assess progress and set goals and agendas for next steps. Staff members participate in different teams to address areas of focus that involve schoolwide supports for students and families as well as support for improving implementation of instructional strategies to increase success.

Schoolwide Solutions Teams

A Parent and Family Involvement Team works to build good relations with parents and to increase their involvement in the school. Team members organize “welcome” visits for new families, opportunities for informal chats among parents and school staff members, workshops for parents on supporting achievement and general parenting issues, and volunteer opportunities. Solutions teams also focus on improving attendance and intervening with students having learning and behavioral problems.

Program Facilitator

A program facilitator works at each school to oversee (with the principal) the operation of the Success for All model. The facilitator helps plan the Success for All program, helps the principal with scheduling, and visits classes and tutoring sessions frequently to help teachers and tutors with individual problems. He or she works directly with the teachers on implementation of the curriculum, classroom management, and other issues, helps teachers and tutors deal with any behavior problems or other special problems, and coordinates the activities of the Family Support Team with those of the instructional staff.

Teachers and Teacher Training

Professional development in Success for All emphasizes on-site coaching after initial training. Teachers and tutors receive detailed teacher’s manuals supplemented by three days of in-service at the beginning of the school year, followed by classroom observations and coaching throughout the year. For classroom teachers of grades 1 and above and for reading tutors, training sessions focus on implementation of the reading program (either Reading Roots or Reading Wings), and their detailed teacher’s manuals cover general teaching strategies as well as specific lessons. Preschool and kindergarten teachers and aides are trained in strategies appropriate to their students’ preschool and kindergarten models. Tutors later receive two additional days of training on tutoring strategies and reading assessment.

Throughout the year, in-class coaching and in-service presentations focus on such topics as classroom management, instructional pace, and cooperative learning. Online coaching is also used after coaches and teachers have built good relationships.

Special Education

Every effort is made to deal with students’ learning problems within the context of the regular classroom, as supplemented by tutors. Tutors evaluate students’ strengths and weaknesses and develop strategies to teach in the most effective way. In some schools, special education teachers work as tutors and reading teachers with students identified as learning disabled, as well as other students experiencing learning problems who are at risk for special education placement. One major goal of Success for All is to keep students with learning problems out of special education if at all possible, and to serve any students who do qualify for special education in a way that does not disrupt their regular classroom experience (see Borman & Hewes, 2002).

Consistency and Variation in Implementation

Success for All is designed to provide a consistent set of elements to each school that selects it. On engaging with schools, school and district staff are asked to agree to implement a set of program elements that the developers have found to be most important. These include the following:

• A full-time facilitator employed by the school. Typically, the facilitator is an experienced teacher, Title I master teacher, or vice principal already on the school staff, whose roles and responsibilities are revised to focus on within-school management of the SFA process.

• At least one full-time tutor, usually a teaching assistant, to work primarily with first graders who are struggling in reading.

• Implementation of the SFA KinderCorner (or Descubre Conmigo) program in kindergarten, Reading Roots (or Lee Conmigo) in grades 1 and 2, and Reading Wings in grades 2 and above (for students who have tested out of Reading Roots). KinderCorner and Reading Roots are complete early reading approaches, but Reading Wings is built around widely used traditional or digital texts and/or trade books selected by schools.

• Professional development by SFA coaches, consisting of 2 days for all teachers, plus monthly on-site visits by SFA coaches.

• Regrouping for reading. During a daily 90-minute reading period, students are regrouped for reading starting in grade 1, as described above.

These elements are considered essential to SFA, and SFAF does not engage with schools that decline to implement and maintain all of them. Once the program is under way, some schools inevitably cannot keep all of their initial commitments, and accommodations have to be made. For example, a school under financial pressure may have to use a half-time rather than a full-time facilitator.

With respect to other elements of SFA, such as leadership, parent involvement, and special education policies, SFAF negotiates variations to accommodate school characteristics and district policies. Because the program strongly emphasizes consistency, the elements believed to be most essential to reading outcomes do not vary significantly from school to school.

Evolution of Program Components over Time

The basic design and operation of Success for All have remained constant throughout its 33-year history, but the specific components have changed continually. Changes are introduced in response to lessons learned in schools, demand from schools and districts, research findings, external grants, and advances in technology (see Peurach, 2011). The Reading Roots (K-1) reading program, for example, developed technology to help teachers present lessons and manage regular assessments. Reading Wings (2–5) has increasingly focused on the teaching of reading comprehension using metacognitive strategies. The tutoring program has evolved substantially. The main driver has been a quest for cost-effectiveness, as tutoring is expensive. Initially, tutoring was done one-to-one by certified teachers. However, this was not economically sustainable for most schools, so in the mid-1990s SFAF developed a new model appropriate for use by teaching assistants. In the 2000s, SFAF began to introduce computer-assisted tutoring, taking advantage of the increasing availability of computers in schools. SFAF then began to develop and evaluate small-group tutoring. In 2016, SFAF developed a computer-assisted small-group model that teaching assistants could use reliably and successfully with groups of four. This model requires one-eighth of the personnel costs per tutored student of the original model and produces equal outcomes, so it allows schools to serve many more students for the same cost (Madden & Slavin, 2017).
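The one-eighth figure can be unpacked with simple arithmetic. The derivation below is an inference, not a calculation reported by the authors: it assumes that sessions in the two models have the same length and frequency, writes c_T for the personnel cost of a certified-teacher tutor and c_TA for that of a teaching assistant per unit of tutoring time, and treats group size (one student in the original model, four in the 2016 model) as the only other factor. Under those assumptions, the stated ratio implies a teaching-assistant cost of about half that of a certified teacher; Madden and Slavin (2017) may account for costs differently.

```latex
\frac{\text{cost per tutored student (2016 model)}}{\text{cost per tutored student (original model)}}
  = \frac{c_{\mathrm{TA}} / 4}{c_{\mathrm{T}} / 1}
  = \frac{1}{4} \cdot \frac{c_{\mathrm{TA}}}{c_{\mathrm{T}}}
  = \frac{1}{8}
  \quad \Longrightarrow \quad
  \frac{c_{\mathrm{TA}}}{c_{\mathrm{T}}} = \frac{1}{2}
```

Read the other way, moving from one-to-one to groups of four accounts for a factor of four, and the change in tutor type must account for the remaining factor of two.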

Some whole programs have been added to enable SFAF to serve additional populations. SFAF added a preschool program in the mid-1990s, and added Spanish bilingual and sheltered English programs around the same time. SFAF added the Leading for Success component in the 2000s to improve schools’ capacity to distribute leadership among their staff.

Any program as comprehensive as Success for All has to evolve to keep up with the times and to constantly improve its outcomes and reduce its cost and complexity. Success for All has always learned from its partners and its own staff, and incorporates these learnings continuously, in ways large and small.

Research on Success for All

Success for All has been in existence for 33 years, and currently (2020) provides services to about 1,000 schools in the United States. About half of these use the full program, and half use major components (most often, the K-2 reading program). The program has placed a strong emphasis on research and evaluation, and has always carried out or encouraged experimental or quasi-experimental evaluations to learn how the program is working and what results it is achieving for which types of students and settings. Studies of Success for All have usually been done by third-party evaluators (i.e. researchers unrelated to the program developers). They have taken place in high-poverty schools and districts throughout the United States.

The present synthesis of research on Success for All includes every study of reading outcomes carried out in US schools that evaluated the program using methods that meet a set of inclusion standards described below. The purpose of the synthesis is to summarize the evidence and to identify moderators of program effects, and then to consider the implications of the findings for theory, practice, and policy.

The Need for a Meta-Analysis on SFA

Over the past 33 years, SFA programs have been widely applied and evaluated throughout the United States to help youngsters with their reading progress. However, most of these reports describe single evaluations of the intervention rather than synthesizing all high-quality studies over time. A meta-analysis of SFA studies was reported as part of a meta-analysis of comprehensive school reforms by Borman et al. (2003), and another meta-analysis was part of a synthesis of research on elementary reading programs by Slavin et al. (2009). SFA outcomes for struggling readers were included in a synthesis on that topic by Neitzel, Lake, et al. (2020). However, the present synthesis is the first to focus in detail on Success for All alone, enabling a much closer examination of its evidence base than was possible in reviews of many programs. Also, the review uses up-to-date methods for quantitative synthesis (e.g. Borenstein et al., 2009; Pigott & Polanin, 2020).
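Syntheses of this kind pool study-level effect sizes into a weighted mean. As a hedged illustration only, the Python sketch below implements the DerSimonian–Laird random-effects estimator, one common textbook approach to such pooling. This section does not say which estimator the authors used, and the sketch ignores complications such as multiple outcomes per study. The function name and inputs are hypothetical, and no SFA data are used.

```python
import math
from typing import Sequence, Tuple


def random_effects_mean(effects: Sequence[float], variances: Sequence[float]) -> Tuple[float, float, float]:
    """Pool effect sizes with the DerSimonian-Laird random-effects estimator.

    effects   -- one standardized effect size per study
    variances -- the sampling variance of each effect size
    Returns (pooled mean, standard error of the mean, tau^2).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect (inverse-variance) weights and mean.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q and the method-of-moments estimate of between-study variance.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau^2 to each study's sampling variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return mean, se, tau2
```

For two hypothetical studies with effect sizes 0.0 and 0.5 and a sampling variance of 0.01 each, this returns a pooled mean of 0.25, a between-study variance of about 0.115, and a standard error of 0.25. The large tau² reflects how far apart the two studies are, and it widens the standard error well beyond what a fixed-effect model would give.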

The main objective of the current meta-analysis is to investigate the average impact of SFA on reading achievement. The three main research questions are as follows:

1. What is the overall effect of SFA on student reading achievement?

2. Are there differential impacts of SFA on the reading achievement of different subgroups of students?

3. What study features moderate the effects of SFA on reading achievement?

Methods

Data Sources

The document retrieval process consisted of several steps (see Figure 2). The research team employed various strategies to identify all possible studies that have evaluated reading outcomes of SFA. First, the team carried out a broad literature search. Electronic searches were made of educational databases (ERIC, PsycINFO, Dissertation Abstracts) using different combinations of key words (for example, “Success for All,” “SFA,” “reading,” “Comprehensive School Reform”) and the years 1989–2020. In addition, previous meta-analyses on reading interventions were searched, and the reference lists of these meta-analyses were examined to identify any SFA studies. The authors contacted the Success for All Foundation, the developer of the program, to identify studies that might have been missed in the search, especially unpublished studies. Articles from any published or unpublished source that met the inclusion standards were independently read and examined by at least two researchers. Any disagreements in coding were resolved by discussion, and additional researchers read any articles on which disagreements remained.

Figure 2. Flow chart of study selection.

Inclusion and Exclusion Criteria

Criteria for inclusion and exclusion of studies were similar to those of the What Works Clearinghouse (WWC, 2020). They are as follows.

1. The studies evaluated SFA programs used in elementary schools. Studies had to appear between 1989 and 2020.
2. Studies had to involve students who started SFA in grades pre-K, K, or 1, as most tutoring (a key element of the theory of action) takes place in first grade.
3. The studies compared children taught in schools using SFA with those in control schools using an alternative program or standard methods.
4. Random assignment or matching with appropriate adjustments for any pretest differences (e.g., analyses of covariance) had to be used. In randomized experiments, a number of schools volunteered to participate, and half were assigned at random to use SFA, while the remaining schools continued using existing methods. In matched studies, schools assigned to use SFA were matched in advance with control schools on factors such as pretests, poverty indicators, ethnicity, and school size. Post hoc studies in which matching was done after experimental and control schools completed implementation were excluded. Studies without control groups, such as pre-post comparisons and comparisons to “expected” scores, were also excluded.
5. Pretest data had to be provided. Studies with pretest differences of more than 25% of a standard deviation were excluded, as required by WWC (2020) standards.
6. The dependent measures had to include quantitative measures of reading performance not created by SFA developers or researchers.
7. A minimum study duration of one school year was required.
8. Studies had to have at least two schools in each treatment group. This criterion prevented treatment and school effects from being completely confounded.
9. Studies had to report results at the end of the intervention period (for the main analyses) or interim results (for exploratory analyses examining impacts over time).
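Criteria 4 and 5 together amount to a baseline-equivalence screen. The short Python sketch below illustrates only the 0.25 SD cutoff; the function names, the use of a pooled pretest standard deviation as the denominator, and the example numbers are assumptions for illustration, not the authors' actual coding procedure.

```python
def pretest_difference(mean_t: float, mean_c: float, sd_pooled: float) -> float:
    """Standardized pretest difference (treatment minus control) in SD units."""
    if sd_pooled <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean_t - mean_c) / sd_pooled


def passes_baseline_screen(mean_t: float, mean_c: float, sd_pooled: float,
                           max_diff: float = 0.25) -> bool:
    """True if the absolute pretest difference is at most max_diff SDs.

    Studies failing the check are excluded (criterion 5); studies passing it
    still need covariate adjustment, e.g. ANCOVA, for any remaining
    difference (criterion 4).
    """
    return abs(pretest_difference(mean_t, mean_c, sd_pooled)) <= max_diff


# Hypothetical pretest summaries for two studies.
print(passes_baseline_screen(48.0, 50.0, 10.0))  # -0.20 SD -> True
print(passes_baseline_screen(45.0, 50.0, 10.0))  # -0.50 SD -> False
```

A difference of exactly 0.25 SD passes, because the criterion excludes only differences of *more than* 25% of a standard deviation.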

Coding

Studies that met the inclusion criteria were coded by one study team member and verified by another. The fully coded data are available on the Johns Hopkins University Data Archive (Neitzel, Cheung, et al., 2020). In addition to outcome measures, sample sizes, and effect sizes, the coded data included substantive, methodological, and extrinsic factors. These are described below.

Substantive Factors

Substantive factors describe the intervention, population, and context of the study. These coded factors included duration of intervention, student grade level, and population description (race, ethnicity, English learner status, and free/reduced-price meal status). Schools were categorized as primarily African-American, Hispanic, or White if more than half of students were of that race or ethnicity (or if there were subgroup analyses by race). They were considered high-poverty if at least 66% of students qualified for free lunch.
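The two demographic coding rules can be written as a small function. This is a hypothetical sketch: the function name, input format, and example compositions are illustrative assumptions, and it does not handle the alternative case in which a study reports separate subgroup analyses by race.

```python
def code_sample(race_shares: dict[str, float], free_lunch_share: float) -> dict:
    """Code majority race/ethnicity and poverty status for a school sample.

    race_shares maps a group label to the proportion (0-1) of students in it.
    """
    majority = "Mix"
    for group, share in race_shares.items():
        if share > 0.5:  # "more than half of students"
            majority = group
            break
    return {
        "race": majority,
        "high_poverty": free_lunch_share >= 0.66,  # "at least 66%"
    }


# Hypothetical sample compositions.
print(code_sample({"African-American": 0.40, "Hispanic": 0.35, "White": 0.25}, 0.70))
# {'race': 'Mix', 'high_poverty': True}
print(code_sample({"Hispanic": 0.60, "White": 0.40}, 0.50))
# {'race': 'Hispanic', 'high_poverty': False}
```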

Methodological Factors

Methodological factors included the research design (randomized or quasi-experimental design) and the type of outcome. Outcomes were categorized into three groups: general reading/comprehension, fluency, and alphabetics (WWC, 2014). Alphabetics includes subskills of reading such as letter identification and phonics; fluency includes reading accuracy and reading with expression; and comprehension outcomes assess the ability to understand connected text. General reading includes all types of reading outcomes. Because comprehension is weighted heavily in general reading measures, we combined general reading and comprehension scores into a single category. The reading posttest scores used as the main outcome measures were those reported from the final year of implementation for a given cohort. For example, in a 3-year study with a K-2 and a 1-3 cohort, the third-year scores in grades 2 and 3 would be the main outcomes, and these would be averaged to get a study mean.
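The final-year rule in the example above can be sketched in Python. The `(cohort, year, effect_size)` record layout, the function name, and the effect sizes are hypothetical; the point is only that earlier-year results are set aside and each cohort's final-year result enters the study mean.

```python
from statistics import mean


def study_mean_final_year(findings):
    """Average each cohort's final-year effect sizes into a single study mean.

    findings: iterable of (cohort, year, effect_size) tuples.
    """
    findings = list(findings)
    last_year = {}
    for cohort, year, _ in findings:
        last_year[cohort] = max(year, last_year.get(cohort, year))
    final = [es for cohort, year, es in findings if year == last_year[cohort]]
    return mean(final)


# Hypothetical 3-year study: a K-2 cohort and a 1-3 cohort.
findings = [
    ("K-2", 1, 0.10), ("K-2", 2, 0.20), ("K-2", 3, 0.30),  # grade 2 in year 3
    ("1-3", 1, 0.05), ("1-3", 2, 0.15), ("1-3", 3, 0.25),  # grade 3 in year 3
]
print(round(study_mean_final_year(findings), 3))  # 0.275
```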

Extrinsic Factors

Extrinsic factors coded included publication status, year of publication, and evaluator independence. Studies were considered independent if the list of authors did not include any of the original developers of SFA.

Statistical Analysis

The effect sizes of interest in this study are standardized mean differences: the difference between the treatment and control groups on outcome measures, adjusted for covariates, divided by a standard deviation. This allows the magnitude of impacts to be compared across interventions and outcome measures.

Effect sizes were calculated as the difference between adjusted posttest scores for treatment and control students, divided by the unadjusted standard deviation of the control group. Alternative procedures were used to estimate effect sizes when unadjusted posttests or unadjusted standard deviations were not reported (Lipsey & Wilson, 2001).
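The primary effect size formula is a single ratio. The sketch below shows only that case; the Lipsey and Wilson (2001) alternatives used when summary statistics are missing are not reproduced, and the example values are hypothetical.

```python
def effect_size(adj_mean_t: float, adj_mean_c: float, sd_control: float) -> float:
    """Adjusted treatment-control posttest difference over the unadjusted control SD."""
    if sd_control <= 0:
        raise ValueError("control-group SD must be positive")
    return (adj_mean_t - adj_mean_c) / sd_control


# Hypothetical covariate-adjusted posttest means and control-group SD.
print(effect_size(105.0, 100.0, 20.0))  # 0.25
```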

Studies with cluster assignment that analyzed data at the student level, rather than with HLM or other multilevel models, were re-analyzed to estimate statistical significance while accounting for clustering (Hedges, 2007a).
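The re-analysis for clustering can be sketched as follows. This Python function applies an adjusted t-statistic and adjusted degrees of freedom for a student-level analysis of cluster-assigned data, in the form associated with Hedges (2007a) as we understand it. It assumes equal cluster sizes and requires an assumed intraclass correlation (ICC); the function name and the example values are hypothetical.

```python
import math


def cluster_adjusted_t(t: float, n_total: int, n_clusters: int, icc: float):
    """Correct a student-level t statistic for cluster assignment.

    Assumes equal cluster sizes (n = n_total / n_clusters) and an
    externally supplied intraclass correlation `icc`.
    Returns (adjusted t, adjusted degrees of freedom).
    """
    N = n_total
    n = N / n_clusters  # students per cluster
    core = (N - 2) - 2 * (n - 1) * icc
    t_adj = t * math.sqrt(core / ((N - 2) * (1 + (n - 1) * icc)))
    df_adj = core ** 2 / (
        (N - 2) * (1 - icc) ** 2
        + n * (N - 2 * n) * icc ** 2
        + 2 * (N - 2 * n) * icc * (1 - icc)
    )
    return t_adj, df_adj


# Hypothetical study: 400 students in 8 schools, t = 3.0, assumed ICC of 0.20.
t_adj, df_adj = cluster_adjusted_t(3.0, 400, 8, 0.20)
print(f"adjusted t = {t_adj:.2f}, df = {df_adj:.1f}")  # adjusted t = 0.89, df = 150.6
```

With an ICC of zero the function returns the original t and N − 2 degrees of freedom; as the ICC approaches one, the degrees of freedom fall to the number of clusters minus two. A two-sided p value could then be taken from a t distribution with `df_adj` degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t_adj), df_adj)` if SciPy is available.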


In meta-analysis models, studies were weighted to give more weight to studies with the greatest precision (Hedges et al., 2010). In practice, this primarily involves weighting for sample size. Weights for each study were calculated according to the following formula:

$$W_{j} = \frac{1}{k_{j}\left(\bar{v}_{j} + \tau^{2}\right)}$$

where $W_j$ is the weight for study j, $k_j$ is the number of findings in study j, $\bar{v}_j$ is the average finding-level variance for study j, and $\tau^2$ is the between-study variance in the study-average effect sizes (Hedges et al., 2010; Tipton, 2015). Variance estimates were adjusted for studies with cluster-level assignment, using the total variance for unequal cluster sample sizes (Hedges, 2007b).
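The weighting rule can be sketched in Python. Here the between-study variance τ² is simply supplied as an assumed value (in practice it is estimated from the data), and the function name and variances are hypothetical.

```python
from statistics import mean


def correlated_effects_weights(variances_by_study: dict, tau2: float) -> dict:
    """Weight applied to every finding in study j: 1 / (k_j * (vbar_j + tau2))."""
    weights = {}
    for study, variances in variances_by_study.items():
        k = len(variances)        # number of findings in the study
        vbar = mean(variances)    # average finding-level variance
        weights[study] = 1.0 / (k * (vbar + tau2))
    return weights


# Hypothetical: study A has four findings, study B has one; tau2 is assumed.
w = correlated_effects_weights({"A": [0.02] * 4, "B": [0.05]}, tau2=0.03)
print({s: round(x, 3) for s, x in w.items()})  # {'A': 5.0, 'B': 12.5}
```

The $1/k_j$ factor divides a study's weight among its findings, so a study does not gain influence merely by reporting many outcomes.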

We used a multivariate meta-regression model with robust variance estimation (RVE) to conduct the meta-analysis (Hedges et al., 2010). This approach has several advantages. First, our data included multiple effect sizes per study, and robust variance estimation accounts for this dependence without requiring knowledge of the covariance structure (Hedges et al., 2010). Second, this approach allows moderators to be added to the meta-regression model and calculates the statistical significance of each moderator in explaining variation in the effect sizes (Hedges et al., 2010). Tipton (2015) extended this approach by adding a small-sample correction that prevents inflated Type I error rates when the number of studies in the meta-analysis is small or when the covariates are imbalanced. We estimated three meta-regression models. First, we estimated a null model to produce the average effect size without adjusting for any covariates. Second, we estimated a meta-regression model with the identified moderators of interest and covariates. Both the first and second models included only the outcomes at the end of the intervention period. Third, we estimated an exploratory meta-regression model including the same moderators of interest and covariates, but adding results from interim reports, to better explore the change in impact over time. Both meta-regression models (the second and third) took the general form:

$$T_{ij} = \beta_{0} + \boldsymbol{\beta}_{1}\mathbf{X}_{1ij} + \boldsymbol{\beta}_{2}\mathbf{X}_{2j} + \eta_{j} + u_{ij} + \varepsilon_{ij}$$

where $T_{ij}$ is effect size estimate i in study j, $\beta_0$ is the grand mean effect size for all studies, $\boldsymbol{\beta}_1$ is a vector of regression coefficients for the covariates at the effect size level, $\mathbf{X}_{1ij}$ is a vector of covariates at the effect size level, $\boldsymbol{\beta}_2$ is a vector of regression coefficients at the study level, $\mathbf{X}_{2j}$ is a vector of covariates at the study level, $\eta_j$ is the study-specific random effect, $u_{ij}$ is the effect-size-specific random effect, and $\varepsilon_{ij}$ is the sampling error. $\mathbf{X}_{1ij}$ and $\mathbf{X}_{2j}$ included the substantive, methodological, and extrinsic factors outlined above. All moderators and covariates were grand-mean centered to facilitate interpretation of the intercept. All reported mean effect sizes come from this meta-regression model, which adjusts for potential moderators and covariates. The packages metafor (Viechtbauer, 2010) and clubSandwich (Pustejovsky, 2020) were used to estimate all random-effects models with RVE in the R statistical software (R Core Team, 2020).
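To show what the null model computes, the Python sketch below estimates an intercept-only model with the weights defined above and a basic sandwich (cluster-robust) variance. It is an illustration under stated assumptions, not the metafor/clubSandwich analysis: τ² is supplied rather than estimated, covariates and grand-mean centering are omitted, and Tipton's (2015) small-sample adjustment is not applied, so its standard errors tend to be somewhat smaller than corrected ones. The function name and data are hypothetical.

```python
import math
from statistics import mean


def rve_intercept(effects_by_study: dict, variances_by_study: dict, tau2: float):
    """Null-model mean effect size with a cluster-robust (sandwich) SE.

    Uses correlated-effects weights w_j = 1 / (k_j * (vbar_j + tau2)) and the
    basic robust variance of Hedges et al. (2010); no small-sample correction.
    """
    weights = {
        s: 1.0 / (len(es) * (mean(variances_by_study[s]) + tau2))
        for s, es in effects_by_study.items()
    }
    sum_w = sum(weights[s] * len(es) for s, es in effects_by_study.items())
    b0 = sum(weights[s] * sum(es) for s, es in effects_by_study.items()) / sum_w
    # "Meat" of the sandwich: squared, weighted residual totals per study.
    meat = sum(
        (weights[s] * sum(t - b0 for t in es)) ** 2
        for s, es in effects_by_study.items()
    )
    return b0, math.sqrt(meat) / sum_w


# Hypothetical effect sizes and variances for three studies.
effects = {"A": [0.10, 0.30], "B": [0.40], "C": [0.00, 0.20, 0.10]}
variances = {"A": [0.02, 0.02], "B": [0.04], "C": [0.03, 0.03, 0.03]}
b0, se = rve_intercept(effects, variances, tau2=0.02)
print(f"mean ES = {b0:.3f}, robust SE = {se:.3f}")  # mean ES = 0.222, robust SE = 0.063
```

Residuals are summed within each study before squaring, which is what lets the variance estimate tolerate correlated effect sizes from the same study without a specified covariance structure.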


Results

Since SFA was first implemented in Baltimore in 1987, over 60 studies have been carried out to examine its effectiveness. However, only 17 studies met the inclusion criteria for this review. Common reasons for exclusion (see Online Supplementary Appendix 1) included failure to have at least two schools in each treatment condition (k = 17), lack of appropriate data or nonequivalent or missing pretests (k = 13), non-US locations (k = 17), program started after first grade (k = 2), comparison to normed performance (k = 2), and comparison of two forms of SFA (k = 4).

Characteristics of Studies

The majority of the included studies were quasi-experiments (k = 15); only two were randomized studies. Three of the included studies were published articles, and 14 were unpublished documents such as dissertations, conference papers, and technical reports. In terms of the relationship of the developer to the evaluator, most of the studies were determined to be independent (k = 13), while the remaining studies included at least one of the developers in the author list (k = 4). All but one of the studies took place in schools with very high levels of economic disadvantage, with at least 66% of students receiving free or reduced-price lunches (k = 16).

Across these 17 studies, a total of 221 separate effect sizes were coded, an average of 13 effect sizes per study. Six studies reported final effect sizes after 1 year (n = 55), 3 studies reported effect sizes after 2 years (n = 20), and 9 studies reported effect sizes after 3 or more years (n = 146). Six studies reported 85 outcomes for African-American students, either by reporting on a predominantly African-American student sample or by reporting on outcomes for African-American students separately within a heterogeneous sample. Outcomes for Hispanic students were reported in 3 studies (n = 34). One study reported outcomes for White students (n = 4). Four studies reported outcomes separately for English Learners (ELs), while eight studies reported outcomes for low achievers separately. Outcomes were mainly general reading or comprehension measures (n = 90) and alphabetics measures (n = 97), with fewer findings reported on fluency measures (n = 34).

Overall Effects

The results for the null model and the full meta-regression model are shown in Table 1. The meta-regression model controlled for research design, independence of evaluator, duration of study, race/ethnicity of students, language status of students, baseline achievement level, and outcome type. There was an overall positive impact of SFA on reading achievement across all qualifying studies (ES = +0.24, p < .05). However, these outcomes vary considerably, with a 95% prediction interval of −0.27 to +0.75. The prediction interval conveys the heterogeneity of the outcomes: 95% of the effect sizes in the population are expected to fall within this range. Study characteristics and findings of the 17 included studies are summarized in Table 2, which lists the two randomized studies first and then all quasi-experiments in order of school sample size; more detailed study-by-study information is shown in Appendix 2 in the Online Appendix.
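A common way to compute such an interval (Borenstein et al., 2009) widens the confidence interval by the between-study variance. The sketch below assumes that formula; the τ² and critical value in the example are assumed, so its output is not the interval reported above, which depends on quantities not given in this section.

```python
import math


def prediction_interval(mean_es: float, se_mean: float, tau2: float, crit: float):
    """Prediction interval mean_es +/- crit * sqrt(tau2 + se_mean**2).

    crit is the critical value, e.g. a t quantile with k - 2 degrees of
    freedom (Borenstein et al., 2009) or about 1.96 as a normal approximation.
    """
    half_width = crit * math.sqrt(tau2 + se_mean ** 2)
    return mean_es - half_width, mean_es + half_width


# Illustrative inputs only: tau2 = 0.05 and crit = 1.96 are assumed values.
low, high = prediction_interval(0.24, 0.08, tau2=0.05, crit=1.96)
print(f"{low:+.2f} to {high:+.2f}")  # -0.23 to +0.71
```

Unlike a confidence interval, the width is dominated by τ² rather than by the standard error of the mean, which is why a clearly significant mean can coexist with an interval that extends below zero.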

Only two of the studies of SFA were large-scale cluster randomized experiments. Borman et al. (2007) carried out the first randomized, longitudinal study. Forty-one schools (21 treatment, 20 control) throughout the United States were randomly assigned to either the treatment or control condition. Children were pretested on the PPVT and then individually tested on the Woodcock Reading Mastery Test each spring for 3 years, from kindergarten to second grade. At the end of this 3-year study, 35 schools and over 2,000 students remained. Using pretests as covariates, the HLM results indicated that the treatment schools significantly outperformed the controls on all three outcome measures, with an overall effect size of +0.25 (p < .05). The effect sizes were +0.22, +0.33, and +0.21 for Word Identification, Word Attack, and Passage Comprehension, respectively.

The second large-scale cluster randomized longitudinal study was carried out by Quint et al. (2015). As in the Borman study, schools were randomly assigned: 37 low-SES schools from five school districts in the United States were assigned to treatment (N = 19) or control (N = 18) conditions. Students were followed from kindergarten to second grade. The treatment schools scored significantly higher than the controls on phonics skills for second-graders who had been in the treatment group for all three years. No statistically significant differences were found on reading fluency and comprehension posttests. However, among the lowest-performing students at pretest, those in the treatment group scored significantly higher than their counterparts in the control group on phonics skills, word recognition, and reading fluency.

All other US studies of SFA used quasi-experimental designs, in which schools were matched in advance on pretests and demographics, and students in both groups were then assessed each year for 1 to 5 years. Most of these quasi-experiments involved small numbers of schools and would not have had enough clusters (schools) for adequate statistical power on their own. However, this meta-analysis combines them with other studies, weighting for sample size and adjusting for covariates, to obtain combined results that are adequately powered.

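Table 1. Meta-regression results.

| Model / term (reference category) | Coefficient | SE | t | df | p |
|---|---|---|---|---|---|
| Null model: intercept | 0.10 | 0.06 | 1.59 | 9.42 | 0.146 |
| Meta-regression: Success for All versus control (intercept) | 0.24 | 0.08 | 3.07 | 7.24 | 0.017 |
| Quasi-experiments (ref.: randomized studies) | 0.05 | 0.18 | 0.27 | 3.14 | 0.804 |
| Not independent evaluations (ref.: independent evaluations) | 0.07 | 0.13 | 0.48 | 4.39 | 0.653 |
| 3+ year studies (ref.: 1-year studies) | 0.06 | 0.14 | 0.44 | 7.92 | 0.670 |
| 2-year studies (ref.: 1-year studies) | 0.27 | 0.11 | 2.43 | 1.91 | 0.141 |
| Mix of students (ref.: Black students) | 0.08 | 0.20 | 0.42 | 5.54 | 0.687 |
| White students (ref.: Black students) | 0.41 | 0.23 | 1.74 | 3.79 | 0.162 |
| Hispanic students (ref.: Black students) | 0.06 | 0.27 | 0.21 | 2.75 | 0.846 |
| Mix of language status students (ref.: no EL students) | 0.10 | 0.08 | 1.13 | 2.39 | 0.358 |
| EL students (ref.: no EL students) | 0.04 | 0.07 | 0.59 | 2.06 | 0.615 |
| Fluency outcomes (ref.: general reading/comprehension outcomes) | 0.05 | 0.06 | 0.83 | 5.05 | 0.443 |
| Alphabetics outcomes (ref.: general reading/comprehension outcomes) | 0.18 | 0.03 | 6.93 | 4.97 | 0.001 |
| Moderate/high achievers (ref.: low achievers) | 0.46 | 0.15 | 3.19 | 4.33 | 0.030 |
| Mix of students (ref.: low achievers) | 0.09 | 0.05 | 2.03 | 2.68 | 0.146 |

Note. SE: standard error; df: degrees of freedom.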


Table 2. Features and summary of outcomes of included studies.

| Study | Design | Evaluator | Sample | Sample description | n | Outcome | Duration | Grade | Study ES | Low achiever ES |
|---|---|---|---|---|---|---|---|---|---|---|
| Quint et al. (2015) | CR | Ind. | 37 schools, 1,635 students | Five school districts, mostly in or on the outskirts of large or midsize cities in the Northeast, South, and West; 12% W, 18% AA, 88% FRL, 24% ELL, 66% H | 28 | GR/C, Al | 3 years | K-2 | +0.08 | +0.18 |
| Borman et al. (2007) | CR |  | 35 schools, 2,108 students | Title I schools throughout the U.S.; 72% FRL, 56% AA, 30% W, 10% H | 3 | GR/C, Al | 3 years | K-2 | +0.25 |  |
| Correnti (2009) | CQE | Ind. | 115 schools, 3,783 students | High-poverty schools in 17 states; 69% FRL, 52% AA, 21% W, 18% H | 1 | GR/C | 3 years | K-2 | +0.43 |  |
| Nunnery et al. (1996) | CQE |  | 67 schools, 2,060 students | High-poverty schools in Houston, TX; 78% FRL, 54% H, 38% AA | 3 | GR/C | 1 year | 1st | +0.19 |  |
| Ross, Smith, et al. (1996) | CQE | Ind. | 12 schools, 781 students | Memphis, TN | 4 | GR/C, Al, Fl | 1 year | 1st | +0.01 |  |
| Slavin et al. (1993) | CQE |  | 10 schools, 1,495 students | African-American students in high-poverty schools in Baltimore, MD | 59 | GR/C, Al, Fl | 3 years | preK-1 | +0.59 | +1.17 |
|  |  |  |  |  |  |  | 4 years | preK-2 | +0.29 | +0.91 |
|  |  |  |  |  |  |  | 5 years | preK-3 | +0.41 | +1.29 |
|  |  |  |  |  |  |  | 6 years | preK-4 | +0.41 | +0.78 |
|  |  |  |  |  |  |  |  | K-5 | +0.46 | +1.01 |
| Chambers et al. (2005) | CQE |  | 8 schools, 577 students | Mostly Hispanic communities in the US | 8 | GR/C, Al | 1 year | K | +0.28 |  |
|  |  |  |  |  |  |  |  | 1st | +0.32 |  |
| Ross and Casey (1998a) | CQE | Ind. | 8 schools, 356 students | High-poverty schools in Ft. Wayne, IN; 75% FRL, 45% minority | 8 | GR/C, Al, Fl | 2 years | K-1 | +0.26 | +0.34 |
| Datnow et al. (2001) | CQE | Ind. | 6 schools, 398 students | Diverse students in Miami, FL | 2 | GR/C | 4 years | 1–4 | +0.11 |  |
| Livingston and Flaherty (1997) | CQE | Ind. | 6 schools, 828 students | High-poverty multilingual schools in Modesto and Riverside, CA | 12 | GR/C | 2 years | K-1 | +0.65 |  |
|  |  |  |  |  |  |  | 3 years | K-2 | +0.40 |  |
|  |  |  |  |  |  |  | 4 years | K-3 | +0.12 |  |
| Muñoz and Dossett (2004) | CQE | Ind. | 6 schools, 349 students | High-poverty schools in Louisville, KY | 1 | GR/C | 3 years | 1–3 | +0.15 |  |
| Ross, Nunnery, et al. (1996) | CQE | Ind. | 5 schools, 428 students | Tucson, AZ | 16 | GR/C, Al, Fl | 1 year | 1st | +0.41 | +0.51 |
| Ross and Casey (1998b) | CQE | Ind. | 4 schools, 581 students | Suburban schools in Portland, OR; 15% minority | 16 | GR/C, Al, Fl | 1 year | K | +0.16 | +0.36 |
|  |  |  |  |  |  |  |  | 1st | −0.02 | −0.18 |
| Ross, Smith, Bond, et al. (1994) | CQE | Ind. | 4 schools, 179 students | African-American students in high-poverty schools in Montgomery, AL | 8 | GR/C, Al, Fl | 2 years | 1–2 | +0.58 | +1.01 |
| Ross et al. (1995) | CQE | Ind. | 4 schools, 257 students | Title I schools in Ft. Wayne, IN | 20 | GR/C, Al, Fl | 3 years | K-2 | +0.10 | +0.56 |
|  |  |  |  |  |  |  | 4 years | K-3 | −0.10 |  |
|  |  |  |  |  |  |  |  | 1–4 | 0.00 | +0.29 |
| Ross et al. (1997) | CQE | Ind. | 4 schools, 291 students | A medium-size midwestern city | 24 | GR/C, Al, Fl | 2 years | K-1 | +0.28 | +0.18 |
|  |  |  |  |  |  |  | 3 years | K-2 | +0.16 |  |
|  |  |  |  |  |  |  |  | 1–3 | +0.02 |  |
| Wang and Ross (1999) | CQE | Ind. | 4 schools, 340 students | High-poverty African-American schools in Little Rock, AR | 8 | GR/C, Al, Fl | 1 year | 1st | +0.24 |  |
|  |  |  |  |  |  |  |  | 2nd | −0.05 |  |

Note. CR: cluster randomized; CQE: cluster quasi-experiment; Ind.: independent; W: White; AA: African American; FRL: free/reduced lunch; ELL: English Language Learner; H: Hispanic; GR/C: general reading/comprehension; Al: alphabetics; Fl: fluency; n: number of effect sizes; ES: effect size.


One of the QEDs was notable for its large size and longitudinal design. Slavin et al. (1993; see also Madden et al., 1993) evaluated the first five schools to use Success for All. The schools, all high-poverty schools in Baltimore, were each matched with control schools with very similar pretests and demographics. All students were African American, and virtually all qualified for free lunches. Within schools, individual students were matched with control students. Students were followed from first grade onward, in a total of five cohorts. The mean effect size across all five cohorts after 3 years was +0.59 (p = .05) for all students and +1.17 (p < .01) for low achievers. The mean effect size for fifth graders who had been in treatment or control schools since first grade was +0.46 (n.s.) overall and +1.01 (p < .01) for low achievers. A follow-up study of these schools was carried out by Borman and Hewes (2002). It obtained data from three cohorts of students followed to the eighth grade, by which time students had been out of the K-5 SFA schools for at least three years. Results indicated lasting positive effects on standardized reading achievement measures (ES = +0.29, n.s.), and SFA students were significantly less likely to have been retained in elementary school (ES = +0.27, n.s.) or assigned to special education (ES = +0.18, n.s.), in comparison to controls.

The second major, large-scale QED was part of the University of Michigan’s Study of Instructional Improvement (Rowan et al., 2009). This study compared more than 100 schools throughout the United States that were implementing one of three comprehensive school reform models: Success for All, America’s Choice, or Accelerated Schools. There was also a control group. Students in the SFA portion of the study were followed from kindergarten to second grade. The detailed findings were reported by Correnti (2009), who found an overall effect size of +0.43 (p < .01).

Table 3. Substantive and methodological moderators.

| Moderator | Level | k | n | ES | SE | t | df | p |
|---|---|---|---|---|---|---|---|---|
| Research design | Randomized | 2 | 31 | +0.20 | 0.14 | 1.44 | 1.51 | 0.322 |
|  | Quasi-experiment | 15 | 190 | +0.25 | 0.09 | 2.71 | 6.36 | 0.033 |
| Duration | 1 year | 6 | 55 | +0.25 | 0.09 | 2.91 | 4.88 | 0.035 |
|  | 2 years | 4 | 32 | +0.46 | 0.15 | 3.11 | 5.19 | 0.025 |
|  | 3+ years | 9 | 134 | +0.19 | 0.10 | 1.77 | 5.12 | 0.135 |
| Race | Black | 6 | 85 | +0.28 | 0.10 | 2.80 | 4.27 | 0.046 |
|  | Hispanic | 3 | 34 | +0.25 | 0.24 | 1.06 | 3.68 | 0.355 |
|  | White | 1 | 4 | +0.60 | 0.21 | 2.92 | 4.14 | 0.042 |
|  | Mix | 10 | 98 | +0.19 | 0.14 | 1.37 | 7.79 | 0.210 |
| English learner status | EL | 4 | 18 | +0.27 | 0.10 | 2.80 | 4.55 | 0.042 |
|  | Not EL | 4 | 12 | +0.33 | 0.10 | 3.41 | 4.70 | 0.021 |
|  | Mix | 15 | 191 | +0.23 | 0.08 | 2.88 | 7.74 | 0.021 |
| Outcome type | General reading/comprehension | 17 | 90 | +0.19 | 0.08 | 2.52 | 7.51 | 0.038 |
|  | Alphabetics | 12 | 97 | +0.32 | 0.09 | 3.50 | 7.44 | 0.009 |
|  | Fluency | 9 | 34 | +0.14 | 0.08 | 1.71 | 7.03 | 0.132 |
| Achievement status | Low achiever | 8 | 60 | +0.54 | 0.15 | 3.69 | 6.16 | 0.010 |
|  | Average/high achiever | 8 | 60 | +0.07 | 0.07 | 1.05 | 5.36 | 0.338 |
|  | Mix | 14 | 101 | +0.16 | 0.08 | 2.00 | 8.23 | 0.080 |
| Evaluator status | Independent evaluator | 13 | 148 | +0.22 | 0.10 | 2.18 | 9.50 | 0.056 |
|  | Not independent evaluator | 4 | 73 | +0.28 | 0.10 | 2.80 | 3.02 | 0.067 |

Note. k: number of studies; n: number of outcomes; ES: effect size; SE: standard error; df: degrees of freedom. Mean effect sizes for each moderator category were calculated by estimating a model including the same covariates as those shown in Table 1 without an intercept, with the moderator included as a categorical variable.
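The no-intercept parameterization described in the note has a simple special case. With no other covariates, a weighted regression on one dummy variable per category returns the weighted mean effect size within each category, as the hypothetical sketch below shows. The published estimates additionally adjust for the Table 1 covariates and use RVE, which this sketch does not; the function name and data are illustrative.

```python
from collections import defaultdict


def weighted_category_means(effects, weights, categories):
    """Weighted mean effect size for each level of a categorical moderator.

    Equivalent to a no-intercept weighted regression on one dummy variable
    per category when no other covariates are included.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for t, w, c in zip(effects, weights, categories):
        num[c] += w * t
        den[c] += w
    return {c: num[c] / den[c] for c in num}


# Hypothetical findings coded by outcome type.
means = weighted_category_means(
    effects=[0.40, 0.20, 0.10],
    weights=[2.0, 2.0, 1.0],
    categories=["Alphabetics", "Alphabetics", "Fluency"],
)
print({c: round(m, 2) for c, m in means.items()})  # {'Alphabetics': 0.3, 'Fluency': 0.1}
```

Dropping the intercept makes each coefficient a category mean rather than a contrast with a reference category, which is why Table 3 reports level-specific effect sizes while Table 1 reports differences.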


Substantive and Methodological Moderators

Several important demographic and methodological moderators of treatment impacts were identified and explored statistically (see Table 3). Not all coded factors and potential moderators could be examined, because of very unequal distributions of studies within moderators or substantial correlations between moderators and study features.

Research Design

Differences in effect sizes between studies that used randomized designs (k = 2, ES = +0.23) and studies that used quasi-experimental designs incorporating matching (k = 15, ES = +0.24) were tested. This difference was not statistically significant.

Evaluator Status

We also compared effect sizes for studies conducted independently of the SFA developers and those conducted in collaboration with them. Effect sizes for studies from independent evaluations (ES = +0.21, p < .10) were similar to those from studies conducted with the program developers (ES = +0.30, p < .10). This difference was not statistically significant.

Duration

Effect sizes were compared for studies at the end of 1, 2, and 3 or more years. Effect sizes averaged +0.25 after 1 year, +0.46 after 2 years, and +0.19 after 3 or more years. Appendix 3 in the Online Appendix shows year-by-year outcome trends for longitudinal studies, with mean outcomes by year similar to the duration findings.

Race & Ethnicity

Outcomes for samples of mostly African-American students averaged +0.30 (p < .05; k = 6). In mostly Hispanic samples (k = 3), effect sizes averaged +0.24 (n.s.). One study included mostly White students, with average effects of +0.63 (p < .05). The remaining 10 studies included outcomes for a mix of races and ethnicities, with mean effect sizes of +0.23 (n.s.).

English Learner Status

Impacts were similar for English Learners (ES = +0.13, p < .05), non-English Learners (ES = +0.36, p < .05), and mixed samples (ES = +0.23, p < .05). These differences were not statistically significant.

Achievement Status

Outcomes including all students had a mean effect size of +0.24 (k = 17). Outcomes for low achievers averaged +0.54 (p < .01), significantly higher than outcomes for average/high achievers (ES = +0.07, n.s.) and those for mixed samples (ES = +0.16, n.s.).


Outcome Type

Differences in effect sizes across outcome types were also examined statistically. The mean effect size across studies with general reading or comprehension outcomes was +0.20 (n = 90). This contrasted with mean effect sizes for alphabetics outcomes (ES = +0.32, n = 97) and fluency outcomes (ES = +0.14, n = 34). Alphabetics outcomes were significantly higher than fluency outcomes (p < .01).

Discussion

Success for All is an unusual educational reform program, unique in many ways. It has operated for 33 years with the same basic philosophy and approach, although it has constantly changed its specific components in response to what it has learned (Peurach, 2011). Its dissemination has waxed and waned with changing educational policies; at its peak, in 2000–2001, SFA served as many as 1,500 schools. Currently, about 500 schools use the full program and another 500 use components of it. In contrast, in two prominent charter networks, KIPP serves 242 schools and New York’s Success Academies serve 45. The program is also relatively long-lasting within schools. Data reported by Slavin et al. (2009) indicate that the median SFA school stays in the program for 11 years, and several schools have used it for more than 20 years. At a cost of $117 per student per year (as reported by Quint et al., 2015), SFA is relatively cost-effective (Borman & Hewes, 2002).

In its long history, Success for All has frequently been evaluated, mostly by third parties. Seventeen studies met rigorous inclusion standards. In contrast, the great majority of programs that have met the inclusion standards of the What Works Clearinghouse or Evidence for ESSA have been evaluated in just one qualifying study, and very few have been evaluated more than twice.

Across the 17 qualifying US studies, the mean effect size was +0.24 for students in general, and among the 8 studies that separately analyzed effects for the lowest achievers, the mean was +0.54. These are important outcomes. As a point of comparison, the mean difference in National Assessment of Educational Progress (NAEP) reading achievement between students who qualify for free lunch and those who do not is approximately 0.50 standard deviations (National Center for Education Statistics [NCES], 2019). The mean outcomes of Success for All amount to almost half of this gap, and the outcomes for the lowest achievers roughly equal the entire gap.
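The comparison is a simple ratio of effect sizes. Treating the NAEP gap as exactly 0.50 (the text gives it as approximate) and using the overall mean and the low-achiever mean from Table 3:

```latex
\frac{0.24}{0.50} = 0.48 \approx \tfrac{1}{2},
\qquad
\frac{0.54}{0.50} = 1.08 \approx 1
```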

An important and interesting question for policy and practice is whether SFA works particularly well with sub-populations. The only important factor with sufficient studies to permit subsample analyses was lowest-achieving students (usually students in the lowest 25% of their classes). As noted earlier, the mean effect size for low achievers was +0.54.

It is possible to speculate about which aspects of SFA made the program more effective for the lowest-achieving students. Low achievers are the students most likely to receive one-to-one or one-to-small group tutoring, which is known to have a substantial impact on reading achievement (Neitzel, Lake, et al., 2020; Wanzek et al., 2016). Also, there is evidence that cooperative learning, used throughout SFA, is particularly beneficial for low achievers (Slavin, 2017).


The findings of the subgroup analyses with low achievers may be especially important for schools serving large numbers of students who are poor readers. Quint and her colleagues argued that the cost of SFA, which they estimated at $117 per pupil per year, was relatively modest compared with that of business-as-usual reading programs. In other words, for schools with a high percentage of poor readers, SFA offers a pragmatic alternative supported by evidence of effectiveness.

The effects of SFA are generally maintained as long as the program remains in operation. In the one study to assess lasting impacts (Borman & Hewes, 2002), outcomes were maintained at follow-up as well. This is an unusual finding, and it contrasts with the declining impacts over time seen for intensive early tutoring (e.g., Blachman et al., 2014; Pinnell et al., 1994). Beyond SFA itself, this set of findings suggests that intensive early tutoring and other services, followed by continued efforts to improve classroom instruction so that early gains are maintained, may hold more promise than intensive early intervention alone.

The importance of tutoring for struggling readers in the early elementary grades is suggested by the substantially greater short- and long-term impacts of SFA for the lowest-achieving students, who are, of course, the students most likely to receive tutoring. Another interesting point of comparison also speaks to the importance of tutoring as part of the impact of SFA. Of the four largest evaluations of SFA, three found strong positive impacts. In these, schools were able to provide adequate numbers of tutors to work with most struggling readers in grades 1–3. However, the fourth study, by Quint et al. (2015), took place from 2011 to 2014, in the aftermath of the Great Recession. School budgets were severely affected, and during this study most schools did not have tutors. This study reported significant positive effects for low achievers, but all outcomes were much smaller than those found in the other studies.

Many phonics-based reading programs emphasizing early intervention show substantial positive effects on measures of alphabetics, but not on comprehension or general reading. The outcomes of SFA are strongest on measures of alphabetics (ES = +0.32), but they are also positive on general reading/comprehension (ES = +0.19), indicating that the program is more than just phonics.

A distressingly common finding in studies of educational programs is that studies carried out by program developers produce much more positive outcomes than do independent evaluations (Borman et al., 2003; Wolf et al., 2020). In the case of Success for All, studies including SFA developers as co-investigators (k = 4) do obtain higher effect sizes than independent studies (k = 13) (ES = +0.30 versus +0.21, respectively), but this difference is not statistically significant. However, with only 17 studies this analysis was underpowered, so these results must be interpreted with caution.

Policy Importance of Research on Success for All

Attempts to improve the outcomes of education for disadvantaged and at-risk students fall into two types. One focuses on systemwide policies, such as targeted funding, governance, assessment/accountability schemes, standards, and regulations. These types of strategies are rarely found to be very effective, but they do operate on a very large scale. In contrast, research and development often creates effective approaches, proven to make a meaningful difference in student achievement. However, these proven approaches rarely achieve substantial scale, and if they do, they often do not maintain their effectiveness at scale (see Cohen & Moffitt, 2009, for a discussion of this dilemma).

Success for All is one of very few interventions capable of operating at a scale meaningful for policy without losing its effectiveness. At its peak, Success for All operated nationally in more than 1,500 schools, and its growth was curtailed only by a shift in federal policies in 2002. Its many evaluations, mostly done by third-party evaluators, have found positive outcomes across many locations and over extended periods of time. In the current policy climate in the United States, in which evidence of effectiveness is taking on an ever-greater role, Success for All offers one of very few approaches that could, in principle, produce substantial positive outcomes at large scale, and this should inform national policies.

The importance of Success for All for policy and practice is best understood by placing the program in the context of other attempts to substantially improve student achievement in elementary schools serving many disadvantaged students. A recent review of research on programs for struggling readers in elementary schools by Neitzel, Lake, et al. (2020) found just three categories of approaches with substantial and robust evidence of positive outcomes for students scoring in the lowest 25% or 33% of their schools in reading. One was one-to-one or one-to-small group tutoring, by teachers or teaching assistants, with a mean effect size of +0.29. Another was multi-tier whole-school approaches, consisting of Success for All and one other program. The third was whole-class Tier 1 programs, mostly using cooperative learning. These findings imply that in schools with relatively few students struggling in reading, tutoring may be the best solution for the individuals who are struggling. Even though tutoring is substantially more expensive per student than Success for All, in a school with few struggling readers it may not be sensible to intervene with all students.

On the other hand, when most students need intervention in reading, it is neither sensible nor cost-effective to solve the problem with tutoring alone. In the United States, only 28% of fourth graders in the average large urban school district score “proficient” or better on the National Assessment of Educational Progress (National Center for Education Statistics [NCES], 2019), and in cities such as Dallas, Milwaukee, Baltimore, Cleveland, and Detroit, fewer than 15% of students in the entire district score at or above “proficient.” In such districts, and in individual low-performing schools even within higher-performing districts, trying to reach high levels of proficiency through tutoring alone would be prohibitively expensive.
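The cost argument in the paragraphs above reduces to simple arithmetic. Tutoring costs scale with the number of students who need it, whereas a whole-school program is paid for every enrolled student. The sketch below computes the break-even share of struggling students under that simplified model. All cost figures are hypothetical relative units, not estimates from this article or from any evaluation. The model also ignores differences in effectiveness. Treating every student below “proficient” as needing tutoring is an assumption made only for illustration, and the function names are hypothetical.

```python
"""Simplified per-school cost comparison: targeted tutoring vs. a whole-school
program. All costs are HYPOTHETICAL relative units, not real estimates."""


def tutoring_cost(enrollment: int, share_struggling: float, cost_per_tutored: float) -> float:
    # Tutoring is paid only for the students who receive it.
    return enrollment * share_struggling * cost_per_tutored


def whole_school_cost(enrollment: int, cost_per_enrolled: float) -> float:
    # A whole-school model is paid for every enrolled student.
    return enrollment * cost_per_enrolled


def break_even_share(cost_per_tutored: float, cost_per_enrolled: float) -> float:
    # Share of struggling students at which both options cost the same;
    # above it, tutoring everyone who needs help is the more expensive option.
    return cost_per_enrolled / cost_per_tutored


# HYPOTHETICAL assumption: tutoring one student costs 5x the per-student
# cost of a whole-school program.
COST_PER_TUTORED = 5.0
COST_PER_ENROLLED = 1.0
ENROLLMENT = 500  # hypothetical school size

threshold = break_even_share(COST_PER_TUTORED, COST_PER_ENROLLED)

# 1 - 0.28 uses the NAEP proficiency figure from the text; 0.10 is a
# hypothetical school with few struggling readers.
scenarios = [("few struggling readers", 0.10), ("large urban district average", 1 - 0.28)]
for label, share in scenarios:
    t = tutoring_cost(ENROLLMENT, share, COST_PER_TUTORED)
    w = whole_school_cost(ENROLLMENT, COST_PER_ENROLLED)
    cheaper = "tutoring" if t < w else "whole-school"
    print(f"{label}: share={share:.2f}, tutoring={t:.0f}, whole-school={w:.0f}, cheaper={cheaper}")
print(f"break-even share: {threshold:.2f}")
```

Under this placeholder cost ratio, tutoring is the cheaper option only while fewer than one student in five needs it. The exact break-even point depends entirely on the assumed ratio, but the qualitative pattern matches the argument in the text.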

The findings of the evaluations of Success for All have particular importance for special education policies. The structure of SFA adheres closely to the concept of Response to Intervention (RTI). SFA emphasizes professional development, coaching, and extensive programming to improve the outcomes of Tier 1 classroom instruction. This instruction is then followed up by closely coordinated Tier 2 (small-group tutoring) or Tier 3 (one-to-one tutoring) support for students who need it. Longitudinal research found substantial and lasting impacts on the achievement of the lowest achievers, as well as reductions in assignment to special education and in retention in grade (Borman & Hewes, 2002).
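The tiered structure described above can be sketched as a small data structure plus an assignment rule. The section does not say how SFA decides which students receive Tier 2 or Tier 3 support. The screening-percentile rule and both cutoffs below (25th and 10th percentile) are therefore hypothetical assumptions, as are the names `Tier` and `assign_supports`. From the text, the sketch reflects only two features: every student receives Tier 1 instruction, and tutoring is layered on top of it as either small-group (Tier 2) or one-to-one (Tier 3) support.

```python
from enum import Enum
from typing import List


class Tier(Enum):
    TIER_1 = "enhanced whole-class instruction"
    TIER_2 = "small-group tutoring"
    TIER_3 = "one-to-one tutoring"


def assign_supports(percentile: float, tier2_cutoff: float = 25.0,
                    tier3_cutoff: float = 10.0) -> List[Tier]:
    """Return the supports for one student.

    HYPOTHETICAL rule: the cutoffs are illustrative, not SFA's actual criteria.
    Every student receives Tier 1; students below a cutoff also get tutoring.
    """
    supports = [Tier.TIER_1]
    if percentile < tier3_cutoff:
        supports.append(Tier.TIER_3)  # most intensive support for the lowest scorers
    elif percentile < tier2_cutoff:
        supports.append(Tier.TIER_2)
    return supports


for p in (60, 18, 4):
    print(p, [t.name for t in assign_supports(p)])
```

A fuller RTI sketch would typically also reassess students periodically and move them between tiers. That feedback loop is omitted here for brevity.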


Beyond the program itself, the research on Success for All, as applied to low-achieving students, illustrates that the educational problems of low-achieving students are fundamentally solvable. Perhaps someday there will be many approaches like Success for All, each capable of improving student achievement on a substantial scale. Research on Success for All suggests that disadvantaged students and struggling readers could be learning to read at significantly higher levels than they do today, and that substantial improvement can be brought about at scale. The knowledge that large-scale improvement is possible should lead to policies that both disseminate existing proven approaches and invest in research and development. This investment should aim to further increase the effectiveness and replicability of programs that can reliably produce important improvements in reading for disadvantaged and low-achieving readers.

Open Scholarship

This article has earned the Center for Open Science badge for Open Data. The data are openly accessible at https://archive.data.jhu.edu/dataset.xhtml?persistentId=doi:10.7281/T1/VDZAZY. To obtain the author's disclosure form, please contact the Editor.

Funding

This work was supported by the National Social Sciences Funding 2019 (National Education Sciences Planning, National Youth Project, Evaluation of Evidence-based Educational Experiments, grant number CGA190249).

ORCID

Alan C. K. Cheung http://orcid.org/0000-0002-9013-1586

Chen Xie http://orcid.org/0000-0001-6124-1420

Tengteng Zhuang http://orcid.org/0000-0003-3940-4216

Amanda J. Neitzel http://orcid.org/0000-0002-4676-9320

Robert E. Slavin http://orcid.org/0000-0003-3117-7477

References



A double asterisk indicates studies included in the main meta-analysis (final reports).



A single asterisk indicates studies included in the exploratory meta-analysis (interim reports).

Blachman, B. A., Schatschneider, C., Fletcher, J. M., Murray, M. S., Munger, K. A., & Vaughn, M. G. (2014). Intensive reading remediation in grade 2 or 3: Are there effects a decade later? Journal of Educational Psychology, 106(1), 46–57. https://doi.org/10.1037/a0033663

Borenstein, M., Hedges, L. V., Higgins, J. P., & Rothstein, H. R. (2009). Introduction to meta-analysis. John Wiley & Sons, Ltd.

Borman, G. D., & Hewes, G. M. (2002). The long-term effects and cost-effectiveness of Success for All. Educational Evaluation and Policy Analysis, 24(4), 243–266. https://doi.org/10.3102/01623737024004243


Borman, G. D., Hewes, G. M., Overman, L. T., & Brown, S. (2003). Comprehensive school reform and achievement: A meta-analysis. Review of Educational Research, 73(2), 125–230. https://doi.org/10.3102/00346543073002125



Borman, G. D., Slavin, R. E., Cheung, A. C. K., Chamberlain, A. M., Madden, N. A., & Chambers, B. (2007). Final reading outcomes of the national randomized field trial of Success for All. American Educational Research Journal, 44(3), 701–731. https://doi.org/10.3102/0002831207306743

Chambers, B., Cheung, A., & Slavin, R. (2016). Literacy and language outcomes of balanced and developmental-constructivist approaches to early childhood education: A systematic review. Educational Research Review, 18, 88–111. https://doi.org/10.1016/j.edurev.2016.03.003



Chambers, B., Slavin, R. E., Madden, N. A., Cheung, A., & Gifford, R. (2005). Enhancing Success for All for Hispanic students: Effects on beginning reading achievement. Success for All Foundation. http://eric.ed.gov/?id=ED485350

Cohen, D. K., & Moffitt, S. L. (2009). The ordeal of equality: Did federal regulation fix the schools? Harvard University Press.



Correnti, R. (2009, March). Examining CSR program effects on student achievement: Causal explanation through examination of implementation rates and student mobility [Paper presentation]. 2nd Annual Conference of the Society for Research on Educational Effectiveness, Washington, DC.

Cunningham, A. E., & Stanovich, K. E. (1997). Early reading acquisition and its relation to reading experience and ability 10 years later. Developmental Psychology, 33(6), 934–945. https://doi.org/10.1037/0012-1649.33.6.934



Datnow, A., Stringfield, S., Borman, G., Rachuba, L., & Castellano, M. (2001). Comprehensive school reform in culturally and linguistically diverse contexts: Implementation and outcomes from a 4-year study. Center for Research on Education, Diversity, and Excellence.

Fuchs, D., & Fuchs, L. (2006). Introduction to response to intervention: What, why, and how valid is it? Reading Research Quarterly, 41(1), 93–99. https://doi.org/10.1598/RRQ.41.1.4

Good, T., & Brophy, J. (2018). Looking in classrooms (10th ed.). Allyn & Bacon.

Hedges, L. V. (2007a). Correcting a significance test for clustering. Journal of Educational and Behavioral Statistics, 32(2), 151–179. https://doi.org/10.3102/1076998606298040

Hedges, L. V. (2007b). Effect sizes in cluster-randomized designs. Journal of Educational and Behavioral Statistics, 32(4), 341–370. https://doi.org/10.3102/1076998606298043

Hedges, L. V., Tipton, E., & Johnson, M. C. (2010). Robust variance estimation in meta-regression with dependent effect size estimates. Research Synthesis Methods, 1(1), 39–65. https://doi.org/10.1002/jrsm.5

Lesnick, J., Goerge, R., Smithgall, C., & Gwynne, J. (2010). Reading on grade level in third grade: How is it related to high school performance and college enrollment? Chapin Hall at the University of Chicago.

Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. SAGE.



Livingston, M., & Flaherty, J. (1997). Effects of Success for All on reading achievement in California schools. WestEd.

Madden, N. A., & Slavin, R. E. (2017). Evaluations of technology-assisted small-group tutoring for struggling readers. Reading & Writing Quarterly, 33(4), 327–334. https://doi.org/10.1080/10573569.2016.1255577



Madden, N. A., Slavin, R. E., Karweit, N. L., Dolan, L. J., & Wasik, B. A. (1993). Success for All: Longitudinal effects of a restructuring program for inner-city elementary schools. American Educational Research Journal, 30(1), 123–148. https://doi.org/10.3102/00028312030001123



Muñoz, M. A., Dossett, D., & Judy-Gullans, K. (2004). Educating students placed at risk: Evaluating the impact of Success for All in urban settings. Journal of Education for Students Placed at Risk, 9(3), 261–277. https://doi.org/10.1207/s15327671espr0903_3

National Center for Education Statistics (NCES). (2019). The condition of education 2019. https://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2019144


National Reading Panel (NRP). (2000). Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction (NIH Pub. No. 00-4754). http://www.nichd.nih.gov/publications/pubs/nrp/pages/report.aspx

Neitzel, A., Cheung, A. C., Xie, C., Zhuang, T., & Slavin, R. (2020). Data associated with the publication: Success for All: A quantitative synthesis of U.S. evaluations (V1 ed.). Johns Hopkins University Data Archive.

Neitzel, A., Lake, C., Pellegrini, M., & Slavin, R. (2020). Effective programs for struggling readers: A best-evidence synthesis [Manuscript submitted for publication]. www.bestevidence.org



Nunnery, J. A., Slavin, R., Ross, S., Smith, L., Hunter, P., & Stubbs, J. (1996). An assessment of Success for All program component configuration effects on the reading achievement of at-risk first grade students [Paper presentation]. Annual Meeting of the American Educational Research Association, New York.

OECD. (2019). PISA 2018 technical report. OECD Publishing.

Peurach, D. J. (2011). Seeing complexity in public education: Problems, possibilities, and Success for All. Oxford University Press.

Pigott, T. D., & Polanin, J. R. (2020). Methodological guidance paper: High-quality meta-analysis in a systematic review. Review of Educational Research, 90(1), 24–46. https://doi.org/10.3102/0034654319877153

Pinnell, G. S., Lyons, C. A., DeFord, D. E., Bryk, A. S., & Seltzer, M. (1994). Comparing instructional models for the literacy education of high risk first graders. Reading Research Quarterly, 29(1), 8–38. https://doi.org/10.2307/747736

Pustejovsky, J. (2020). clubSandwich: Cluster-robust (sandwich) variance estimators with small-sample corrections (R package version 0.4.1) [Computer software]. https://CRAN.R-project.org/package=clubSandwich



Quint, J., Zhu, P., Balu, R., Rappaport, S., & DeLaurentis, M. (2015). Scaling up the Success for All model of school reform: Final report from the Investing in Innovation (i3) evaluation. MDRC.

R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/



Ross, S. M., & Casey, J. (1998a). Longitudinal study of student literacy achievement in different Title I school-wide programs in Ft. Wayne Community Schools, year 2: First grade results. University of Memphis, Center for Research on Educational Policy.



Ross, S. M., & Casey, J. (1998b). Success for All evaluation, 1997–98: Tigard-Tualatin School District. University of Memphis, Center for Research on Educational Policy.



Ross, S. M., Nunnery, J. A., & Smith, L. J. (1996). Evaluation of Title I reading programs: Amphitheater public schools year 1: 1995–1996. University of Memphis, Center for Research in Educational Policy.



Ross, S. M., Smith, L. J., & Bond, C. (1994). An evaluation of the Success for All program in Montgomery, Alabama schools. University of Memphis, Center for Research on Educational Policy.



Ross, S. M., Smith, L. J., & Casey, J. P. (1995). Final report: 1994–1995 Success for All program in Fort Wayne, Indiana. University of Memphis, Center for Research in Educational Policy.



Ross, S. M., Smith, L. J., & Casey, J. P. (1997). Preventing early school failure: Impacts of Success for All on standardized tests outcomes, minority group performance, and school effectiveness. Journal of Education for Students Placed at Risk, 2(1), 29–53. https://doi.org/10.1207/s15327671espr0201_4



Ross, S. M., Smith, L. J., Casey, J. P., Johnson, B., & Bond, C. (1994b). Using Success for All to restructure elementary schools: A tale of four cities [Paper presentation]. Annual Meeting of the American Educational Research Association, New Orleans, LA. (ERIC Document Reproduction Service No. ED 373456)



Ross, S. M., Smith, L. J., Lewis, T., & Nunnery, J. (1996). 1995–96 evaluation of Roots & Wings in Memphis City Schools. University of Memphis, Center for Research in Educational Policy.


Rowan, B., Correnti, R., Miller, R., & Camburn, E. (2009). School improvement by design: Lessons from a study of comprehensive school reform programs. http://www.cpre.org/school-improvement-design-lessons-study-comprehensive-school-reform-programs

Shaywitz, S. E., & Shaywitz, J. (2020). Overcoming dyslexia (2nd ed.). Penguin Random House.

Slavin, R. E. (1987). Ability grouping and student achievement in elementary schools: A best-evidence synthesis. Review of Educational Research, 57(3), 293–336. https://doi.org/10.3102/00346543057003293

Slavin, R. E. (2017). Instruction based on cooperative learning. In R. Mayer & P. Alexander (Eds.), Handbook of research on learning and instruction. Routledge.

Slavin, R. E., Lake, C., Chambers, B., Cheung, A., & Davis, S. (2009). Effective reading programs for the elementary grades: A best-evidence synthesis. Review of Educational Research, 79(4), 1391–1466. https://doi.org/10.3102/0034654309341374

Slavin, R. E., Madden, N. A., Chambers, B., & Haxby, B. (2009). Two million children: Success for All. Corwin.



Slavin, R. E., Madden, N. A., Dolan, L. J., & Wasik, B. A. (1993). Success for All in the Baltimore City Public Schools: Year 6 report. Johns Hopkins University, Center for Research on Effective Schooling for Disadvantaged Students.



Slavin, R. E., Madden, N. A., Dolan, L. J., Wasik, B. A., Ross, S. M., & Smith, L. J. (1994). Success for All: Longitudinal effects of systemic school-by-school reform in seven districts [Paper presentation]. Annual Conference of the American Educational Research Association, LA.

Snow, C. E., Burns, M. S., & Griffin, P. (Eds.). (1998). Preventing reading difficulties in young children. National Academy Press.

Stevens, R. J., Madden, N. A., Slavin, R. E., & Farnish, A. M. (1987). Cooperative integrated reading and composition: Two field experiments. Reading Research Quarterly, 22(4), 433–454. https://doi.org/10.2307/747701

Tipton, E. (2015). Small sample adjustments for robust variance estimation with meta-regression. Psychological Methods, 20(3), 375–393. https://doi.org/10.1037/met0000011

Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48. https://doi.org/10.18637/jss.v036.i03



Wang, W., & Ross, S. M. (1999). Results for Success for All program. University of Memphis, Center for Research on Educational Policy.

Wanzek, J., Vaughn, S., Scammacca, N., Gatlin, B., Walker, M. A., & Capin, P. (2016). Meta-analyses of the effects of tier 2 type reading interventions in grades K-3. Educational Psychology Review, 28(3), 551–576. https://doi.org/10.1007/s10648-015-9321-7

Wolf, R., Morrison, J. M., Inns, A., Slavin, R. E., & Risman, K. (2020). Average effect sizes in developer-commissioned and independent evaluations. Journal of Research on Educational Effectiveness, 13(2), 428–447. https://doi.org/10.1080/19345747.2020.1726537

What Works Clearinghouse. (2014). Review protocol for beginning reading interventions version 3.0. Institute of Education Sciences, US Department of Education.

What Works Clearinghouse. (2020). Standards handbook (version 4.1). Institute of Education Sciences, US Department of Education.
