University of Mississippi
eGrove
Honors Theses
Honors College (Sally McDonnell Barksdale
Honors College)
Spring 5-10-2023
An Application of the PageRank Algorithm to NCAA Football Team Rankings
Morgan Majors
University of Mississippi
Follow this and additional works at: https://egrove.olemiss.edu/hon_thesis
Part of the Algebra Commons, and the Applied Statistics Commons
Recommended Citation
Majors, Morgan, "An Application of the PageRank Algorithm to NCAA Football Team Rankings" (2023). Honors Theses. 2999.
https://egrove.olemiss.edu/hon_thesis/2999
This Undergraduate Thesis is brought to you for free and open access by the Honors College (Sally McDonnell Barksdale Honors College) at eGrove. It has been accepted for inclusion in Honors Theses by an authorized administrator of eGrove.
An Application of the PageRank Algorithm to NCAA Football Team Rankings
Morgan Majors
Abstract
We investigate the use of Google's PageRank algorithm to rank sports teams. The PageRank algorithm is used in web searches to return a list of the websites that are of most interest to the user. The structure of the NCAA FBS football schedule is used to construct a network with a structure similar to that of the World Wide Web. Parallels are drawn between the links joining pages on the World Wide Web and the results of contests between two sports teams. The teams under consideration here are the members of the 2021 Football Bowl Subdivision. We achieve a total ordering of the 2021 FBS teams by applying the PageRank algorithm to the results of the regular and bowl seasons. A statistical method of correlation is used to compare the final AP rankings with PageRank models based on Margin of Victory and Total Points Scored.
Dedication
I dedicate my thesis work to my parents, Penn and Ashley Majors. Their constant support has propelled me to the finish line of this experience. They have stood by my side since day one, and without them, I would not be the person I am today. The immense love I feel from them daily is truly overwhelming.
I feel a special gratitude to my father, whose love for mathematics and sports I inherited. Some of my favorite memories have been alongside my father at a multitude of Ole Miss sporting events. The long nights spent at the dining room table learning basic mathematics in elementary school are still vivid memories. His encouragement throughout all levels of my education has undoubtedly helped me to pursue my ambitions.
Acknowledgements
I wish to thank my committee members for their generous time and expertise spent helping me make my thesis experience the best it could be. A special thanks to James Reid, who was always patient, knowledgeable, and encouraging when I needed assistance and advice. This project could not have been completed without him. Thank you to Dr. Reid, Dr. Sheppardson, and Dr. Wilder for agreeing to serve on my committee.
I would also like to acknowledge my family for the immense love they have always provided
me. My parents, brother, and grandparents have always shown sincere support toward my
academic goals and me. A special thanks to my brother, Penn Earl Majors, for always
pushing me to be a better version of myself. He has been a constant pillar of support in my life. I would also like to thank my grandfather, Penn Majors, who inspired me to pursue mathematics. A former mathematics teacher, he always encouraged me with his wisdom to follow in his footsteps. My grandmothers, Trudy Bomer and Mary Majors, have been my ultimate encouragers throughout this journey.
I would like to thank everyone who has been involved in this process. All of my former teachers and professors have had a hand in where I am today. I am eternally grateful to the Sally McDonnell Barksdale Honors College for the opportunities it has provided.
Contents
1 Introduction
1.1 College Football Rankings
1.2 The PageRank Algorithm
1.3 Applications
1.4 NCAA 2021 Football Season
2 Mathematics of PageRank
2.1 A Graph Model of College Football
2.2 Matrix Algebra and Eigenvectors
2.3 Modified PageRank Algorithm
2.4 Kendall Rank Correlation
3 Results
3.1 Margin of Victory Rankings
3.2 Total Score Ranking
3.3 Conclusions
4 Appendix
Chapter 1
Introduction
College football is a profitable business, with nineteen teams reporting at least 100 million dollars in revenue in 2019 [8]. Hence it is of intense interest to the universities participating in College Football to achieve an accurate ranking of their performance. This thesis considers an approach to ranking these teams that uses the Google PageRank algorithm for ranking webpages. The ranking of nodes in networks and of sports teams by an eigenvalue approach is an active area of research (see, for example, [7, 9, 12, 18, 19, 20, 24, 27, 36, 38]). The use of these ranking methods here is motivated by the 2008 doctoral dissertation of Anjela Yuryevna Govan [19].
We next discuss the organization of the thesis. In Chapter 1 we discuss the history of College Football ranking methods. We then discuss the PageRank algorithm of Sergey Brin and Larry Page that brought organization to the World Wide Web. We next discuss the widespread applications of the algorithm to areas of academic and scientific interest. We conclude Chapter 1 with background on the NCAA 2021 Football Bowl Subdivision teams used in this study.
In Chapter 2 we give the mathematical background of the results presented here. We begin with the notion of the World Wide Web as a network or graph. We then discuss the matrix theory used by our algorithm to produce ranking eigenvectors. Then the mathematical formulation of the PageRank algorithm is given. We conclude Chapter 2 by discussing the modification of the PageRank algorithm that will be used to produce our results.
In Chapter 3, we use the modified PageRank algorithm to rank the teams in the 2021 FBS College Football season using two methods. The first method uses the Margin of Victory in each of a team's games to compute its ranking. The second method uses the Total Score of each team in each of its games to produce its ranking. We then compare the Margin of Victory and Total Score rankings with the final AP rankings for that season. Finally, we make some conclusions and observations about the efficiency of these ranking methods.
1.1 College Football Rankings
There are multiple ranking methods for College Football teams that order teams from best to worst based on results. These ranking methods are important in determining which teams will play in the most prestigious bowls. The most notable ranking polls are the Associated Press, USA Today Coaches, Bowl Championship Series, and College Football Playoff Polls. The ranking of college football teams began in 1936 with the release of the first Associated Press (AP) poll [29]. The first AP Poll included only the top twenty teams, with Minnesota ranked number one [11]. College football was gaining national attention at the time of this first poll. Football was created in the northeastern United States and was initially dominated by Ivy League universities. These schools not only dominated the sport on the field, but the rules committee was also made up of Ivy League alumni, coaches, and players. These schools were considered to have played a superior brand of football. However, the first AP Poll showed that teams from other areas of the country now surpassed the Ivy League teams. Minnesota, a Midwestern team, finished in the top spot, with schools from the Deep South, West, and Great Plains finishing in the top five places in the first poll [11].
The Coaches Poll, which debuted in the 1950 season, ranked the top 20 NCAA football and basketball teams. In 1990 the Coaches Poll conformed to the method of the other polls by ranking the top 25 teams. Until the 1973 season, the poll was released only in December, after the conclusion of the regular season but before the bowl games began. A scandal regarding controversial voting practices within the Coaches Poll arose in 2005, resulting in ESPN dropping its sponsorship of this poll [1]. The Coaches Poll became the USA Today Coaches Poll in 2005 [1].
The Bowl Championship Series (BCS) Poll debuted in 1998 [29]. The BCS Poll was a revolutionary idea that propelled the popularity of college football. Prior to the BCS Poll, a true championship game between the top two teams was uncommon [32], and this was the reason for the creation of the BCS Poll. After Co-National Champions were named in back-to-back years in the 1990 and 1991 seasons, college football fans wanted clarity on a true champion. Some efforts were made through organizations such as the Coalition, the Alliance, and the Super Alliance to construct a number 1 versus number 2 National Championship matchup, but none of these organizations could get every conference on board. In its first year, the BCS did exactly what it was created to do: it created a matchup between the two best teams in the country. However, issues still arose, such as when the BCS named LSU the 2003 National Champion but the AP named USC the best team of the season. These issues led to the birth of the College Football Playoff. The BCS Poll bridged the gap between the poll era and the College Football Playoff Poll [32].
All of the polls differ in their ranking strategies, but the AP, Coaches, and BCS Polls have one thing in common: the method they use to rank the teams. All of these polls use a committee, ranging in size, to construct a ranking. Each committee member casts votes for which team should be placed at number 1, number 2, and so forth. The team that a committee member ranks at number 1 receives 25 points, the team at number 2 receives 24 points, and so on until the team at number 25 receives 1 point; each team's points are then totaled across all ballots. Each poll releases a new ranking of teams 1-25 weekly. Today, the AP Poll committee is made up of 63 members, the Coaches Poll of 62, the BCS of 62, and the CFP of 13. The CFP Poll differs from the rest because it does not construct a ranking until well into the season [31].
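The 25-to-1 point scheme described above is a positional (Borda-style) count. A minimal sketch of how such ballots could be totaled is given below; the function name `poll_totals`, the shortened three-team ballots, and the alphabetical tie-break are all hypothetical illustrations, not the procedure of any particular poll.

```python
from collections import Counter


def poll_totals(ballots, poll_size=25):
    """Sum positional points over all ballots (hypothetical helper).

    A team at position k (1-based) on a ballot earns poll_size + 1 - k
    points: 25 for first place down to 1 for 25th when poll_size is 25.
    """
    totals = Counter()
    for ballot in ballots:
        for position, team in enumerate(ballot[:poll_size], start=1):
            totals[team] += poll_size + 1 - position
    # Highest total first; ties broken alphabetically (an assumption) so the order is deterministic.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Three hypothetical ballots, shortened to a 3-team poll for readability.
ballots = [
    ["Georgia", "Alabama", "Michigan"],
    ["Alabama", "Georgia", "Michigan"],
    ["Georgia", "Michigan", "Alabama"],
]
print(poll_totals(ballots, poll_size=3))
```

With three positions the points run 3, 2, 1, so a team listed first on two ballots and second on the third collects 3 + 3 + 2 points.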
The first College Football Playoff (CFP) Poll was released in 2014 [29]. While other polls simply rank teams for the audience's engagement, the CFP Poll carries real consequences: it single-handedly chooses the top four teams that will play for the National Championship. Another responsibility of the CFP Poll is to place teams that did not receive an invitation to the playoffs into the New Year's Bowls. Six games occur on New Year's Eve and New Year's Day: the Cotton Bowl, Fiesta Bowl, Orange Bowl, Peach Bowl, Rose Bowl Game, and Sugar Bowl. The two Playoff Semifinal games rotate annually through two of these games [2].
The ranking of college football teams is important for the advancement of the sport and its nationwide recognition. The top 4 teams in the College Football Playoff Poll receive the chance to compete for a National Championship. The team ranked number 1 plays the number 4 team, and the team ranked number 2 plays the number 3 team. The winners of the two semifinal games face off against one another in the final Championship game. The 2022 National Championship was played on January 10, 2022, between the Georgia Bulldogs and the Alabama Crimson Tide. The Georgia Bulldogs' first National Title since 1980 was watched by 24.5 million viewers [10]. This was the most-watched non-NFL sporting event in two years, with a 19 percent increase in viewers over the 2021 National Championship [10]. Along with the national attention that a team receives from being selected into the College Football Playoff, teams receive large payouts for their selection. The four teams selected for the playoff receive a 6 million dollar payout from the CFB Revenue Pool [4]. No additional payout occurs for being one of the last two remaining teams, but all expenses are covered [4]. Similarly, popular bowls have large payouts [4]. Given the financial implications of rankings for college football teams, interest in their methodology continues to grow.
1.2 The PageRank Algorithm
The origins of the methods considered here can be found in the desire to find order in the World Wide Web. The need for such an order arose in the late 20th century, when there was an explosion in the number of webpages. For example, in the years 1995, 1996, and 1997 the number of websites increased from 23,500 to 250,601 to 1,117,255 (see https://www.internetlivestats.com/total-number-of-websites for a live count of the number of websites, currently close to 2 billion). Sergey Brin and Larry Page approached the problem of organizing the web by developing a web search engine in 1995, which they called Google. With the Google search engine, they sought to place links to the webpages of greatest interest at the top of the results for a search query. Brin and Page were computer science doctoral students at Stanford University at the time (see [22, Chapter 3]) and used their dorm rooms as offices for their business. A tech report from 1998 [9] announced the development of the search engine. Google's parent company Alphabet is currently valued at over $1.4 trillion [13].
The basis for the Google web search engine is the PageRank algorithm. The World Wide Web is organized by hyperlinks: selecting a hyperlink to webpage Y that appears on webpage X sends the reader to webpage Y. Each webpage typically contains links to several other webpages. A webpage will be considered important if a link to it appears on other important webpages. This reasoning appears to be circular, because two webpages might be considered important merely because each links to the other. However, we show in Chapter 2 that applying mathematical tools to this reasoning leads to a well-defined ranking of webpages and therefore of sports teams.
We may think of the web as a network of webpages with links between some of the pages. A user of the web can be thought of as “surfing” between these pages. The network evolves as the user at a particular page has certain probabilities of traveling to other linked pages. Markov's theory is useful in describing the evolution of such a network. This theory can be traced back to 1906 with the work of the Russian mathematician A.A. Markov [17]. A Markov process describes a sequence of possible events in which the probability of each event depends only on the state reached in the previous event. In order to assign ranks to webpages that are real numbers rather than complex numbers, Brin and Page needed a result from Linear Algebra due to Perron [30] and to Perron and Frobenius (see [22, p. 172]). It is important to assign a real number rank to a webpage because the set of real numbers is ordered; e.g., the number 0.3 is larger than the number 0.2. The set of complex numbers is not ordered. For example, there is no natural way to decide whether a complex number such as 1 + 3i is larger than another complex number, so it is important to obtain ranking numbers that are positive real numbers.
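The random-surfer picture can be illustrated with a tiny Markov chain. The sketch below assumes a hypothetical three-page web and a row-stochastic transition matrix `P` (row i lists the probabilities of clicking from page i to each page); the helper `surfer_step` is likewise hypothetical. Repeatedly applying one step of the chain drives the surfer's location probabilities toward long-run values that are nonnegative real numbers, which is what allows them to be sorted into a ranking.

```python
def surfer_step(x, P):
    """Advance the random surfer one click (hypothetical helper).

    x[i] is the probability of being on page i; P[i][j] is the probability
    of following a link from page i to page j (each row of P sums to 1).
    Returns the distribution after one step: x_next[j] = sum_i x[i] * P[i][j].
    """
    n = len(x)
    return [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]


# Hypothetical three-page web: page 0 links to pages 1 and 2,
# page 1 links to page 2, and page 2 links back to page 0.
P = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]

x = [1.0, 0.0, 0.0]  # the surfer starts on page 0
for _ in range(200):
    x = surfer_step(x, P)

# The long-run probabilities are nonnegative real numbers, so they can be sorted.
ranking = sorted(range(len(x)), key=lambda page: -x[page])
print([round(p, 6) for p in x], ranking)
```

For this particular matrix the long-run distribution can be worked out by hand as (0.4, 0.2, 0.4): page 1 receives only half of page 0's traffic, so it ranks last.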
The Power Method [26], developed by von Mises and Pollaczek-Geiringer, was a key piece of the puzzle needed to carry out the large matrix computations required to rank webpages using PageRank. Brin and Page needed to compute eigenvectors of large matrices (see Section 2.2) to provide their rankings, and adaptations of this method, which updates the ranks iteration by iteration, settle on well-defined ranking eigenvectors in a practical number of steps [22, Chapter 9]. All of these tools were synthesized by Brin and Page in 1998 to create the revolutionary PageRank algorithm. At the heart of this algorithm is the beautiful equation 1.2.1, shown below.
$$\pi^T = \pi^T\left(\alpha S + (1-\alpha)E\right) \qquad (1.2.1)$$
Some consider this equation to be as important as Einstein's $E = mc^2$ from the theory of relativity and the energy equation $E = hf$ suggested by Planck. Graham Farmelo [16] edited a book of beautiful equations entitled “It Must Be Beautiful: Great Equations of Modern Science” that contains Einstein's and Planck's equations as well as Equation 1.2.1 of Brin and Page. We explain the derivation of this equation, which underlies the organization of the World Wide Web, in Section 2.3.
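Equation 1.2.1 can be made concrete with a small power-iteration sketch. We assume here the common textbook conventions (see [22]) that S is the row-stochastic hyperlink matrix in which a page's vote is split evenly among its outlinks and a dangling page (one with no outlinks) receives a uniform row, that E = (1/n)ee^T is the uniform teleportation matrix, and that α = 0.85. The thesis develops its own construction and modification of the algorithm in Section 2.3, which may differ; the function names and the four-page link structure below are hypothetical.

```python
def google_matrix(links, n, alpha=0.85):
    """Dense G = alpha*S + (1 - alpha)*E for pages 0..n-1 (hypothetical helper).

    links maps a page to the list of pages it links to. Row i of S spreads
    page i's vote evenly over its outlinks; a dangling page (no outlinks)
    gets the uniform row 1/n, one common way to make S stochastic.
    E = (1/n) e e^T is the uniform teleportation matrix.
    """
    G = []
    for i in range(n):
        out = links.get(i, [])
        if out:
            s_row = [0.0] * n
            for j in out:
                s_row[j] += 1.0 / len(out)
        else:
            s_row = [1.0 / n] * n
        G.append([alpha * s + (1 - alpha) / n for s in s_row])
    return G


def pagerank(G, tol=1e-12, max_iter=1000):
    """Power method for pi^T = pi^T G, starting from the uniform vector."""
    n = len(G)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * G[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Hypothetical four-page web: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}, 3 -> {2}.
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
pi = pagerank(google_matrix(links, 4))
print([round(p, 4) for p in pi])
```

Page 2, which three pages link to, ends with the largest score, while page 3, which no page links to, keeps only the teleportation share (1 − α)/n. Because every entry of G is positive, the Perron-Frobenius theorem guarantees a unique positive vector π, so the ordering is well defined.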
An important alternative to the PageRank algorithm is the Hypertext Induced Topic Search algorithm, abbreviated HITS. The HITS algorithm was introduced in 1998 by Jon Kleinberg [22], so it is a contemporary of the PageRank algorithm. The HITS algorithm is a system for ranking webpages according to their popularity. It is very similar to the PageRank algorithm, except that it relies on the query to rank webpages and produces two popularity scores for each webpage, whereas PageRank does not depend on the query and produces only one popularity score [28]. Also, HITS operates by classifying webpages as authorities and hubs. An authority is a webpage with many inlinks, and a hub is a webpage with many outlinks [22]. Authorities and hubs have a mutually dependent relationship, as good hubs result in good authorities and good authorities result in good hubs [28].
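The mutual reinforcement between hubs and authorities can be sketched as an iteration. This is a minimal sketch of the usual HITS update (a page's authority score is the sum of the hub scores of the pages linking to it, a page's hub score is the sum of the authority scores of the pages it links to, and both are rescaled each round); it omits the query-dependent step of selecting which pages to include, and the function name `hits` and the link structure are hypothetical.

```python
import math


def hits(links, n, iterations=100):
    """Iterate the HITS hub/authority updates for pages 0..n-1 (hypothetical helper).

    links maps a page to the list of pages it links to.
    Returns (hub, authority) score lists, each scaled to unit length.
    """
    hub = [1.0] * n
    auth = [1.0] * n
    for _ in range(iterations):
        # A good authority is linked to by good hubs.
        auth = [0.0] * n
        for i, outs in links.items():
            for j in outs:
                auth[j] += hub[i]
        # A good hub links to good authorities.
        hub = [sum(auth[j] for j in links.get(i, [])) for i in range(n)]
        # Rescale so the scores stay bounded; only their ratios matter.
        a_norm = math.sqrt(sum(a * a for a in auth)) or 1.0
        h_norm = math.sqrt(sum(h * h for h in hub)) or 1.0
        auth = [a / a_norm for a in auth]
        hub = [h / h_norm for h in hub]
    return hub, auth


# Hypothetical link structure: pages 0 and 1 both link to pages 2 and 3,
# and page 2 links to page 3.
links = {0: [2, 3], 1: [2, 3], 2: [3]}
hub, auth = hits(links, 4)
print("hubs:", [round(h, 3) for h in hub])
print("authorities:", [round(a, 3) for a in auth])
```

Page 3, which every other page links to, becomes the top authority, and pages 0 and 1, which link to both authorities, become the top hubs; unlike PageRank, each page ends with two scores.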
Methods of ranking webpages remain of considerable interest. After the introduction of PageRank and HITS, less popular ranking methods were developed. Shortly after the success of PageRank and HITS, in 2000, a Stochastic Approach to Link Structure Analysis was developed [22]. This stochastic algorithm later became known as SALSA (Stochastic Approach for Link-Structure Analysis) [15]. This approach has similarities to both PageRank and HITS: it creates both hub and authority scores while being derived from Markov chains. While it combines some of the best features of both major algorithms, SALSA has one major drawback [15]. Its heavy dependence on queries is what keeps this algorithm from being well known, because the query dependence means that its rankings are not unique [22].
A new wave of ranking methods was introduced around 2005 through meta-search engines [22]. TrafficRank attempted to rank webpages by merging the results from several different ranking algorithms [12]. It aimed to provide the most accurate and precise rankings through the use of a multi-link algorithm. TrafficRank encouraged the view that the World Wide Web does not consist of webpages connected by a single road; instead, it consists of billions of webpages joined by billions of connections [22].
Although the PageRank algorithm is primarily used in web search engines, it is used in many different ways throughout society. The PageRank algorithm has been adapted and modified to fit a number of applications. A history of the development of the PageRank algorithm and its uses is given in Table 1.1 (see [17]). We discuss the applications of this algorithm in the next section of the thesis.
Table 1.1: PageRank History
Year  Contributor(s)                     Contribution
1906  Markov                             Markov theory
1907  Perron                             Perron theorem
1912  Frobenius                          Perron-Frobenius theorem
1929  von Mises and Pollaczek-Geiringer  Power method
1941  Leontief                           Econometric model
1949  Seeley                             Sociometric model
1952  Wei                                Sport ranking model
1953  Katz                               Sociometric model
1965  Hubbell                            Sociometric model
1976  Pinski and Narin                   Bibliometric model
1998  Kleinberg                          HITS
1998  Brin and Page                      PageRank
1.3 Applications
The PageRank algorithm is primarily associated with the World Wide Web and the way webpages are linked. However, the PageRank algorithm can be applied to many different subjects. For example, it can be applied within the fields of chemistry, biology, literature, and social networks to organize the complex relationships between data.
In chemistry, PageRank is used to study molecules and their valences through the study of changes within water molecules linked by hydrogen bonds. Ultimately, it is used to determine whether the molecules have a potential hydrogen bond to a solute molecule [18]. Teleportation is used to determine structural differences. The development of PageRank within chemistry has evolved into the ability to provide analytical derivatives and illustrations [39]. Through different experiments, it can be determined which solvent rearrangement has the most impact on the reaction pathway [39].
Common biological applications of the PageRank algorithm include GeneRank [27], ProteinRank [36], and IsoRank [24]. All of these applications study the dynamics of network data in bioinformatics, and they share localized information about the graph produced by the PageRank algorithm. Relationships among genes and among proteins are transformed into interactions between nodes in a network, which allows the PageRank algorithm to be applied to rank objects in the network [18]. In a recent experiment, the graph centrality of the network associated with the PageRank algorithm was used to identify cancer genes [14]. Experimental data is used to determine the accuracy of the centrality measure within biology [14].
A fascinating aspect of the PageRank algorithm is that it is used in the world of literature through BookRank [37]. Elliot Yates and Louise Dixon in [37] state that the BookRank algorithm can shed light on “...the central questions of literature: What are the most important books? Which story is most likely to occur within a novel? And what should I read next?” BookRank is used to catalog books on the web, which is useful for organizing the multitude of written works. The idea of the BookRank algorithm is similar to the use of PageRank in Google, but book titles take the place of webpages in this algorithm [18].
PageRank is used in social networking [5] through BuddyRank and TwitterRank. The people interacting on social media platforms act as the nodes, while the relationships formed on social networking sites act as the edges or links between nodes. These methods can predict whom an individual might connect with next and use a person's centrality to estimate their social status. They can also estimate the potential influence of a person's opinion on the social network. Both of these ranking programs typically use a standard α of 0.85 to perform a reverse application of PageRank [18].
PageRank offers nearly endless possibilities for its use. Whether in everyday life or in crucial research, PageRank has a remarkably wide range of applications. Biology, chemistry, literature, and social networks only scratch the surface of what PageRank can do. Although it is typically used in web search engines and mathematics, PageRank can be used in any subject whose interactions can be modeled by a network.
1.4 NCAA 2021 Football Season
The 2021 NCAA Football Season began with a game between Illinois and Nebraska played on August 28, 2021, at 1:20 pm Eastern time and concluded with a game between Georgia and Alabama at 8:00 pm Eastern time on January 10, 2022 [3]. NCAA Division I Football is divided into two subdivisions, the Football Bowl Subdivision (FBS) and the Football Championship Subdivision (FCS) [6]. Our research is centered on the 130 teams of the 2021 NCAA FBS season. An additional 128 NCAA FCS teams participated in the 2021 season. Most games are played either between two FBS teams or between two FCS teams, although a limited number of games are played between teams of different levels.
The 2021 FBS teams are organized into 10 conferences and 7 independent schools as follows. The Atlantic Coast Conference includes the Atlantic division of Wake Forest, Clemson, North Carolina State, Louisville, Florida State, Boston College, and Syracuse, and the Coastal division of Pitt, Miami (FL), Virginia, Virginia Tech, North Carolina, Georgia Tech, and Duke. The American Athletic Conference includes Cincinnati, Houston, UCF, East Carolina, Tulsa, SMU, Memphis, Navy, Temple, South Florida, and Tulane. The Big 12 Conference includes Baylor, Oklahoma State, Oklahoma, Iowa State, Kansas State, West Virginia, Texas Tech, Texas, Texas Christian, and Kansas. The Big Ten Conference includes the East division of Michigan, Ohio State, Michigan State, Penn State, Maryland, Rutgers, and Indiana, and the West division of Iowa, Minnesota, Purdue, Wisconsin, Illinois, Nebraska, and Northwestern. Conference USA includes the East division of Western Kentucky, Marshall, Old Dominion, Middle Tennessee State, Charlotte, Florida Atlantic, and Florida International, and the West division of UTSA, UAB, North Texas, UTEP, Rice, Louisiana Tech, and Southern Mississippi. The Mid-American Conference includes the East division of Kent State, Miami (OH), Ohio, Bowling Green, Buffalo, and Akron, and the West division of Northern Illinois, Central Michigan, Toledo, Western Michigan, Eastern Michigan, and Ball State. The Mountain West Conference includes the Mountain division of Utah State, Air Force, Boise State, Wyoming, Colorado State, and New Mexico, and the West division of San Diego State, Fresno State, Nevada, Hawaii, San Jose State, and Nevada-Las Vegas. The Pac-12 Conference includes the North division of Oregon, Washington State, Oregon State, California, Washington, and Stanford, and the South division of Utah, UCLA, Arizona State, Colorado, USC, and Arizona. The Southeastern Conference includes the East division of Georgia, Kentucky, Tennessee, South Carolina, Missouri, Florida, and Vanderbilt, and the West division of Alabama, Ole Miss, Arkansas, Texas A&M, Mississippi State, Auburn, and LSU. The Sun Belt Conference includes the East division of Appalachian State, Coastal Carolina, Georgia State, Troy, and Georgia Southern, and the West division of Louisiana, Texas State, South Alabama, Louisiana-Monroe, and Arkansas State. The independent schools that are not in conferences are Notre Dame, New Mexico State, Liberty, Massachusetts, Connecticut, Army, and BYU [33]. Each team in a conference typically plays most of its games against other teams in the same conference. Each team in the FBS typically has one or two opponents in the FCS, with its remaining games against other FBS teams.
Teams that are successful during the season with a winning record typically play in a Bowl Game after the conclusion of the regular season. Since there are 130 FBS teams and most teams play between 11 and 13 games, most pairs of teams do not play each other. This is why College Football ranking systems are useful: they compare pairs of teams that do not play each other, as well as pairs of teams whose relative ranking is in dispute.
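The claim that most pairs of FBS teams never meet can be checked with a short calculation. The sketch below computes a generous upper bound under the stated assumptions that every team plays 13 games (the top of the quoted range), that every game is between two FBS teams, and that no pair meets twice; the constant names are our own.

```python
from math import comb

N_TEAMS = 130         # FBS teams in 2021, from the text
GAMES_PER_TEAM = 13   # upper end of the 11-13 range quoted in the text (assumption)

total_pairs = comb(N_TEAMS, 2)                 # distinct pairs of FBS teams
max_fbs_games = N_TEAMS * GAMES_PER_TEAM // 2  # each game uses two team-slots
share = max_fbs_games / total_pairs            # upper bound on pairs that meet

print(total_pairs, max_fbs_games, f"{share:.1%}")
```

Even under these generous assumptions, at most about one pair of FBS teams in ten can meet on the field during a season.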
Chapter 2
Mathematics of PageRank
In this chapter we discuss the mathematical concepts used in this research. The PageRank algorithm uses an interesting mix of tools from a variety of mathematical disciplines. The research uses models and concepts from Graph Theory, Matrix Algebra, and Statistics. An understanding of the basic principles of all of these fields is necessary to develop the simple premise of the PageRank Algorithm:
Axiom 2.1. The ranking of a team is proportional to the sum of the rankings of the teams that it defeats.
In order to examine the implications of Axiom 2.1 for team ranking, we begin with a discussion of Graph Theory.
2.1 A Graph Model of College Football
Graph Theory is an area of mathematical research concerned with modeling the interactions between objects of interest using a graph. The Graph Theory terminology used here can be found in [35]. A graph is a pair G = (V, E), where V is a nonempty set called the vertex set of G and E is a set called the edge set. The elements of E are pairs of vertices (the plural of vertex) of V. A graph is called directed when the two vertices of each edge are given an order, or equivalently a direction.
[Figure 2.1: A four-team example. Directed graph on the vertices Alabama, Auburn, Mississippi, and TexasAM.]
Consider the directed graph G = (V, E) whose diagram is given in Figure 2.1. The vertex set of this graph consists of four elements labeled “Alabama”, “Auburn”, “Mississippi”, and “TexasAM”. There are edges between each distinct pair of vertices in G, such as between the vertices labeled Alabama and Auburn. Further, the edges have directions indicated by arrows, such as the arrow that points from Auburn to Alabama. The graph G represents the results of all the games played among the University of Alabama, Auburn University, the University of Mississippi, and Texas A&M University in the NCAA 2021 football season. The directed edge from Auburn to Alabama in the graph G represents that Alabama defeated Auburn during this season. Thus this small graph represents that Alabama defeated Auburn and Mississippi, Auburn defeated Mississippi, Mississippi defeated Texas A&M, and Texas A&M defeated both Auburn and Alabama. One may think of this as vertex Alabama voting for vertex TexasAM, in the same way that a webpage link “votes for” or “suggests” that the web user visit the linked page.
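To see how Axiom 2.1 turns the four-team graph into numbers, here is a minimal sketch; it is a plain eigenvector ranking, not the modified PageRank algorithm used for the thesis's results. We assume a matrix A with A[i][j] = 1 when team i defeated team j, so that the axiom reads λr = Ar for a positive ranking vector r, and we approximate r by repeated multiplication in the spirit of the Power Method mentioned in Section 1.2. The names `win_matrix` and `axiom_ranking` are hypothetical.

```python
TEAMS = ["Alabama", "Auburn", "Mississippi", "TexasAM"]

# (winner, loser) pairs read off the four-team example.
RESULTS = [
    ("Alabama", "Auburn"),
    ("Alabama", "Mississippi"),
    ("Auburn", "Mississippi"),
    ("Mississippi", "TexasAM"),
    ("TexasAM", "Auburn"),
    ("TexasAM", "Alabama"),
]


def win_matrix(teams, results):
    """A[i][j] = 1 when team i defeated team j (the edge j -> i in the graph)."""
    index = {team: k for k, team in enumerate(teams)}
    A = [[0] * len(teams) for _ in teams]
    for winner, loser in results:
        A[index[winner]][index[loser]] = 1
    return A


def axiom_ranking(A, iterations=500):
    """Approximate lambda and r with lambda * r = A r by repeated multiplication.

    Each new rank is the sum of the ranks of the teams defeated (Axiom 2.1);
    rescaling to sum 1 keeps the numbers bounded.
    """
    n = len(A)
    r = [1.0 / n] * n
    lam = 0.0
    for _ in range(iterations):
        new = [sum(A[i][j] * r[j] for j in range(n)) for i in range(n)]
        lam = sum(new)  # r sums to 1, so this estimates the growth factor lambda
        r = [x / lam for x in new]
    return lam, r


lam, r = axiom_ranking(win_matrix(TEAMS, RESULTS))
for team, score in sorted(zip(TEAMS, r), key=lambda pair: -pair[1]):
    print(f"{team:12s} {score:.4f}")
```

Writing out λr = Ar by hand for this graph gives λ⁴ = 2λ + 1, so λ is about 1.395. Under the bare axiom, Texas A&M's win over Alabama lifts it above Alabama, which illustrates why the thesis goes on to modify the basic algorithm.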
Assign a vertex to each of the 130 NCAA FBS football teams from the 2021 season and to each of the FCS teams that played at least one FBS team during the season to form the vertex set of a graph such as the one in Figure 2.2. We draw an edge between two vertices when the corresponding teams played during either the regular or bowl season. This is the web-type network for which we seek an ordering of the vertex set. This graph was produced using the computer program Mathematica. Note how the FCS teams appear on the periphery of the graph, while teams in the interior of the graph are clustered mostly by conference and geographic location.
[Figure 2.2: The 2021 NCAA Football Season as a Graph. Vertices are labeled by FBS and FCS team names.]
16
2.2 Matrix Algebra and Eigenvectors
We develop the notion of a ranking vector in this section of the thesis. The Linear Algebra notation used here follows the textbook of Leon [23]. The equations in (2.2.1) arise naturally from Axiom 2.1 and the graph given in Figure 2.1. Here the word proportional is modeled by multiplying the sums by the real number $\frac{1}{\lambda}$. For example, in the first row of Equation 2.2.1 the rank of Alabama is the proportionality constant $\frac{1}{\lambda}$ times the sum of the ranks of Auburn and Mississippi, the two teams in this list of four that Alabama defeated. This reasoning for computing rankings appears to be circular, so we examine the implications of such a computation from the perspective of Linear Algebra. Equation 2.2.1 can be written using matrix multiplication as shown in Figure 2.4.
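$$
\begin{aligned}
r(\text{Alabama}) &= \tfrac{1}{\lambda}\bigl(0 \cdot r(\text{Alabama}) + 1 \cdot r(\text{Auburn}) + 1 \cdot r(\text{Mississippi}) + 0 \cdot r(\text{Texas A\&M})\bigr)\\
r(\text{Auburn}) &= \tfrac{1}{\lambda}\bigl(0 \cdot r(\text{Alabama}) + 0 \cdot r(\text{Auburn}) + 1 \cdot r(\text{Mississippi}) + 0 \cdot r(\text{Texas A\&M})\bigr)\\
r(\text{Mississippi}) &= \tfrac{1}{\lambda}\bigl(0 \cdot r(\text{Alabama}) + 0 \cdot r(\text{Auburn}) + 0 \cdot r(\text{Mississippi}) + 1 \cdot r(\text{Texas A\&M})\bigr)\\
r(\text{Texas A\&M}) &= \tfrac{1}{\lambda}\bigl(1 \cdot r(\text{Alabama}) + 1 \cdot r(\text{Auburn}) + 0 \cdot r(\text{Mississippi}) + 0 \cdot r(\text{Texas A\&M})\bigr)
\end{aligned}
\tag{2.2.1}
$$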
We let $R$ be the ranking vector and $A$ be the matrix given in Figure 2.3. Then the matrix equation $R = \frac{1}{\lambda}AR$ is obtained, or equivalently $AR = \lambda R$. So $R$ represents a vector with four rows and one column. The entries obtained in $R$ will be the ranking scores of Alabama, Auburn, Mississippi, and Texas A&M, respectively, read from top to bottom. The vector $R$ is called an eigenvector of the matrix $A$, while the number $\lambda$ is called an eigenvalue. There is a problem with this method of finding the ranking vector $R$: its entries may be complex numbers rather than real numbers, and the field of complex numbers is not ordered as the field of real numbers is. This potential problem is addressed in the next section of the thesis. For the four-team example, the eigenvalues are the roots of the characteristic polynomial $\det(\lambda I - A) = \lambda^4 - 2\lambda - 1$, namely $1.3953$, $-.4604 + 1.1393i$, $-.4604 - 1.1393i$, and $-.4746$. There is a unique positive real eigenvalue, $1.3953$. The eigenvector corresponding to the eigenvalue $1.3953$ is $\langle .5516, .3213, .4484, .6256 \rangle^T$. This corresponds to Alabama having the second highest ranking of .5516, Auburn having the lowest ranking of .3213, Mississippi having the second lowest ranking of .4484, and Texas A&M having the highest ranking of .6256. It is natural that Alabama and Texas A&M are ranked higher in this little four-team league, as both of those teams had two wins. Texas A&M is ranked higher than Alabama as it defeated Alabama. Both Auburn and Mississippi had one win, and Auburn defeated Mississippi, but Mississippi is ranked higher than Auburn. The explanation for this eigenvalue ranking is that Mississippi defeated the highest-ranked team, Texas A&M. This small example ranking will be generalized to rank large complex networks such as that given in Figure 2.2. The eigenvalue ranking presented here is simpler than the PageRank algorithm presented in the next section of the thesis.
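The dominant eigenpair quoted above can be approximated by repeatedly multiplying by $A$ and rescaling, which is the Power Method used later in the thesis. The Python sketch below is illustrative only; the function names are ours. It assumes a positive starting vector and rescales to unit Euclidean length, which matches the scaling of the eigenvector $\langle .5516, .3213, .4484, .6256 \rangle^T$. The complex eigenvalues have modulus about 1.2288, not far below 1.3953, so the error shrinks only by a factor of roughly 0.88 per step and many iterations are used.

```python
import math

# Matrix A of Figure 2.3: A[i][j] = 1 when team i defeated team j.
TEAMS = ["Alabama", "Auburn", "Mississippi", "Texas A&M"]
A = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]


def mat_vec(M, x):
    """Return the matrix-vector product M x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def power_method(M, iterations=500):
    """Power Method: repeatedly apply M and rescale to unit Euclidean length.

    Returns (eigenvalue estimate, unit eigenvector estimate). Once x has
    converged, ||M x|| equals the dominant eigenvalue because ||x|| = 1."""
    x = [1.0] * len(M)            # positive starting vector
    lam = 0.0
    for _ in range(iterations):
        y = mat_vec(M, x)
        lam = math.sqrt(sum(v * v for v in y))
        x = [v / lam for v in y]
    return lam, x


lam, r = power_method(A)
print(round(lam, 4))
for score, team in sorted(zip(r, TEAMS), reverse=True):
    print(f"{team:12s} {score:.4f}")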
$$
R = \begin{bmatrix} r(\text{Alabama}) \\ r(\text{Auburn}) \\ r(\text{Mississippi}) \\ r(\text{Texas A\&M}) \end{bmatrix}
\quad \text{and} \quad
A = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \end{bmatrix}
$$
Figure 2.3: The ranking vector

2.3 Modified PageRank Algorithm
We next develop the Google PageRank algorithm for ranking the relative importance of web pages. Consider the directed graph $G$ given in Figure 2.5. Suppose this graph models the web links of six websites labeled 1 through 6 (see [22, Chapter 4]). So Web Page 1 has links to Web Pages 2 and 3, for example. Following Axiom 2.1, Brin and Page let $r(P_i)$ denote the rank of page $P_i$. Let $B_i$ be the set of pages that point to page $P_i$, for each $i$. So $B_2 = \{P_1, P_3\}$ and $B_5 = \{P_3, P_4\}$, for example. Brin and Page thus obtained Equation 2.3.1, where $|P_j|$ denotes the total number of outlinks from page $P_j$.

$$
r(P_i) = \sum_{P_j \in B_i} \frac{r(P_j)}{|P_j|}
\tag{2.3.1}
$$
So consider the ranking of the page $P_2$ by the method of Equation 2.3.1. We obtain

$$
r(P_2) = \frac{r(P_1)}{|P_1|} + \frac{r(P_3)}{|P_3|} = \frac{r(P_1)}{2} + \frac{r(P_3)}{3}.
$$

In order to start this process we need ranks for $P_1$ and $P_3$. So we let each of the six vertices of $G$ have rank $\frac{1}{6}$. This assignment has the advantage that the six ranks sum to 1. So the rank of $P_2$ becomes $\frac{1}{6}\left(\frac{1}{2} + \frac{1}{3}\right) = \frac{5}{36}$. In terms of vectors we can write a row vector

$$
\pi^T = \left\langle \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6} \right\rangle,
$$

where the ranking of vertex $i$ is found in the $i$th place of the vector.

Let $\pi^T = \pi^{(0)T}$ and $\pi^{(1)T} = \pi^{(0)T}H$, where $H$ is the hyperlink matrix of Figure 2.6. Then we obtain a new ranking vector $\pi^{(1)T} = \langle 1/18, 5/36, 1/12, 1/4, 5/36, 1/6 \rangle$. Repeating this process for $K = 1, 2, \ldots$, we obtain $\pi^{(K+1)T} = \pi^{(K)T}H$. Now, the surprising thing noticed by Brin and Page is that, once the adjustments to $H$ described below are made, the ranking vectors converge to a unique vector of positive real entries. The reason for this convergence is the Perron-Frobenius Theorem from 1908 [22, p. 172]. The iteration described here is the Power Method for computing the ranking eigenvector.
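The first step of this iteration can be checked with exact rational arithmetic. The Python sketch below stores the hyperlink matrix $H$ of Figure 2.6 using the standard `fractions.Fraction` type and computes $\pi^{(1)T} = \pi^{(0)T}H$ from the uniform start; the helper name `row_times_matrix` is ours. It also exposes the difficulty taken up next: the entries of $\pi^{(1)}$ sum to $5/6$ rather than 1, because the $1/6$ of rank held by $P_2$, which has no outlinks, is not passed on.

```python
from fractions import Fraction as F

# Hyperlink matrix H of Figure 2.6: row i has 1/|P_i| in each column j
# such that page P_i links to page P_j.
H = [
    [0, F(1, 2), F(1, 2), 0, 0, 0],
    [0, 0, 0, 0, 0, 0],                  # P_2 has no outlinks
    [F(1, 3), F(1, 3), 0, 0, F(1, 3), 0],
    [0, 0, 0, 0, F(1, 2), F(1, 2)],
    [0, 0, 0, F(1, 2), 0, F(1, 2)],
    [0, 0, 0, 1, 0, 0],
]


def row_times_matrix(pi, M):
    """Return the row vector pi^T M as a list."""
    return [sum(pi[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]


pi0 = [F(1, 6)] * 6                 # every page starts with rank 1/6
pi1 = row_times_matrix(pi0, H)
print([str(x) for x in pi1])        # ['1/18', '5/36', '1/12', '1/4', '5/36', '1/6']
print(sum(pi1))                     # 5/6
```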
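Equation 2.3.1 runs into trouble for a page with no outlinks, such as $P_2$: such a page passes none of its rank on to other pages, and its row of $H$ is all zeros. This problem can be solved by adding nonzero entries to $H$ as follows. Let $e$ be the $6 \times 1$ vector of ones, and let $S$ be the matrix obtained from $H$ by replacing each zero row with $\frac{1}{6}e^T$, as in Figure 2.7, so that every row of $S$ sums to 1. Define

$$
Z = \alpha S + (1 - \alpha)\frac{1}{6}ee^T.
$$

Then $Z$ is called the Google matrix. The number $\alpha$ is chosen between 0 and 1; typically $\alpha = 0.85$ is chosen. This parameter is called the teleportation index. So if $\alpha = 0.85$ and we surf the web of the graph $G$, then with 85% probability we follow the hyperlink structure of $S$, and with 15% probability we teleport to a randomly chosen vertex, each vertex having probability $\frac{1}{6}$. So we iteratively multiply the transpose of the ranking vector on the left of $Z$ until the ranking vector for the pages is obtained. In general, we use $n$ instead of 6 for a graph of $n$ vertices. In the computation of $r(P_2)$ above, $P_1$ can be considered to “vote” for $P_2$ with half credit, as $P_1$ has two outlinks, and $P_3$ to “vote” for $P_2$ with one-third credit, as $P_3$ has three outlinks.

$$
\begin{bmatrix} r(\text{Alabama}) \\ r(\text{Auburn}) \\ r(\text{Mississippi}) \\ r(\text{Texas A\&M}) \end{bmatrix}
= \frac{1}{\lambda}
\begin{bmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \end{bmatrix}
\begin{bmatrix} r(\text{Alabama}) \\ r(\text{Auburn}) \\ r(\text{Mississippi}) \\ r(\text{Texas A\&M}) \end{bmatrix}
$$
Figure 2.4: Matrix form of Axiom 2.1

Figure 2.5: PageRank Example Graph G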
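The construction of $S$ and $Z$ can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the thesis's code: the dangling row of $H$ is replaced by the uniform row $\frac{1}{6}e^T$ as in Figure 2.7, $\alpha = 0.85$, and the row-vector iteration $\pi^{(K+1)T} = \pi^{(K)T}Z$ is run in floating point; the names `N`, `ALPHA`, and `power_iterate` are ours. The checks confirm that every row of $Z$ sums to 1, that $Z$ is strictly positive, and that the iteration settles on a positive vector with $\pi^T Z = \pi^T$, as the Perron-Frobenius Theorem guarantees. The limit of this row-vector iteration differs from the six-page vector reported in the next paragraph; that vector is reproduced instead by the column-oriented recipe of the Matlab code in Figure 3.1.

```python
from fractions import Fraction as F

N = 6
ALPHA = F(85, 100)   # teleportation index alpha = 0.85

# Hyperlink matrix H of Figure 2.6 (row 2 is all zeros: P_2 has no outlinks).
H = [
    [0, F(1, 2), F(1, 2), 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [F(1, 3), F(1, 3), 0, 0, F(1, 3), 0],
    [0, 0, 0, 0, F(1, 2), F(1, 2)],
    [0, 0, 0, F(1, 2), 0, F(1, 2)],
    [0, 0, 0, 1, 0, 0],
]

# S (Figure 2.7): replace every all-zero row of H with the uniform row 1/N.
S = [list(row) if any(row) else [F(1, N)] * N for row in H]

# Google matrix Z = alpha*S + (1 - alpha)*(1/N) e e^T (every entry of e e^T is 1).
Z = [[ALPHA * S[i][j] + (1 - ALPHA) * F(1, N) for j in range(N)] for i in range(N)]

assert all(sum(row) == 1 for row in S)          # S is row stochastic
assert all(sum(row) == 1 for row in Z)          # so is Z
assert all(z > 0 for row in Z for z in row)     # and Z is strictly positive


def power_iterate(Z, iterations=200):
    """Row-vector Power Method pi^(K+1)T = pi^(K)T Z from the uniform start."""
    Zf = [[float(z) for z in row] for row in Z]
    n = len(Zf)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * Zf[i][j] for i in range(n)) for j in range(n)]
    return pi


pi = power_iterate(Z)
print([round(x, 4) for x in pi])
```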
Applying Equation 2.3.1 to each vertex of the graph $G$ of Figure 2.5, we obtain the matrix $H$ in Figure 2.6: the entry in row $j$ and column $i$ is $1/|P_j|$ when $P_j$ links to $P_i$ and 0 otherwise, so the new rank of $P_i$ is formed from the entries in column $i$. The matrix $H$ is called the hyperlink matrix of the graph $G$. We continue to iterate the process of applying Equation 2.3.1, where the new rank of each $P_i$ is computed by multiplying the transpose of the current ranking vector by $H$. The ranking vector for this example converges to

$$
\pi = \langle 0.3355, 0.0250, 0.3502, 0.1094, 0.0930, 0.0870 \rangle.
$$

If $R(i)$ is the ranking of node $i$, then $R(1) = 0.3355$, $R(2) = 0.0250$, $R(3) = 0.3502$, $R(4) = 0.1094$, $R(5) = 0.0930$, and $R(6) = 0.0870$. In this framework, nodes with smaller rankings represent websites that are more likely to be visited. So the most important nodes, in order, would be 2, 6, 5, 4, 1, and 3. One can imagine that Page 2 is the most likely visited page as one surfs this six-page web, as it has inlinks but no outlinks. On average, the nodes on the right side of the graph are higher ranked than the nodes on the left side of the graph due to the link from Page 3 to Page 5.
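As a check, the six-page vector above can be reproduced, to the four decimals shown, by applying to $H$ the same steps as the Matlab code in Figure 3.1: divide each column of $H$ by its column sum, form $G_a = \alpha H_v + (1 - \alpha)\frac{1}{n}ee^T$ with $\alpha = 0.85$, and iterate $q \leftarrow G_a q$. The Python sketch below is our reconstruction, not the thesis's code, and the function name `column_recipe` is hypothetical. No matrix $C$ is added because every column of $H$ already has a nonzero entry. Note that this column-oriented computation is not the same as the row-vector iteration $\pi^{(K+1)T} = \pi^{(K)T}Z$ described above.

```python
# Hyperlink matrix H of Figure 2.6 (row = linking page, column = linked page).
H = [
    [0, 1/2, 1/2, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1/3, 1/3, 0, 0, 1/3, 0],
    [0, 0, 0, 0, 1/2, 1/2],
    [0, 0, 0, 1/2, 0, 1/2],
    [0, 0, 0, 1, 0, 0],
]


def column_recipe(M, alpha=0.85, iterations=200):
    """Mirror the steps of Figure 3.1 on M: column-normalize, add teleportation, iterate."""
    n = len(M)
    col_sums = [sum(M[i][j] for i in range(n)) for j in range(n)]      # cf. Line 1 (no C)
    Hv = [[M[i][j] / col_sums[j] for j in range(n)] for i in range(n)]  # cf. Line 2
    Ga = [[alpha * Hv[i][j] + (1 - alpha) / n for j in range(n)]       # cf. Line 6
          for i in range(n)]
    q = [1.0 / n] * n   # uniform start; the limit does not depend on it
    for _ in range(iterations):                                          # cf. Lines 9-11
        q = [sum(Ga[i][j] * q[j] for j in range(n)) for i in range(n)]
    return q


q = column_recipe(H)
print([round(x, 4) for x in q])   # [0.3355, 0.025, 0.3502, 0.1094, 0.093, 0.087]
order = sorted(range(1, 7), key=lambda page: q[page - 1])
print(order)                      # [2, 6, 5, 4, 1, 3]
```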
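$$
H = \begin{array}{c|cccccc}
 & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline
1 & 0 & 1/2 & 1/2 & 0 & 0 & 0 \\
2 & 0 & 0 & 0 & 0 & 0 & 0 \\
3 & 1/3 & 1/3 & 0 & 0 & 1/3 & 0 \\
4 & 0 & 0 & 0 & 0 & 1/2 & 1/2 \\
5 & 0 & 0 & 0 & 1/2 & 0 & 1/2 \\
6 & 0 & 0 & 0 & 1 & 0 & 0
\end{array}
$$
Figure 2.6: Hyperlink matrix H

$$
S = \begin{pmatrix}
0 & 1/2 & 1/2 & 0 & 0 & 0 \\
1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\
1/3 & 1/3 & 0 & 0 & 1/3 & 0 \\
0 & 0 & 0 & 0 & 1/2 & 1/2 \\
0 & 0 & 0 & 1/2 & 0 & 1/2 \\
0 & 0 & 0 & 1 & 0 & 0
\end{pmatrix}
$$
Figure 2.7: Stochastic matrix S; the entries in each row sum to 1

$$
\frac{1}{n}ee^T = \begin{pmatrix}
1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\
1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\
1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\
1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\
1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\
1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6
\end{pmatrix}
$$
Figure 2.8: The matrix $\frac{1}{n}ee^T$ (here $n = 6$)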
2.4 Kendall Rank Correlation
We discuss the Kendall Rank Correlation of two data sets in this section of the thesis.
Kendall Rank Correlation is a non-parametric measure of relationships between columns of
ranked data [25]. We use the Kendall Rank Correlation to test the similarities in the ordering
of our data [25].
Let $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$ be two ranked lists of observations. Then the Kendall rank correlation $\tau$ of the variables is given in Equation 2.4.1.

$$
\tau = \frac{\text{number of concordant pairs} - \text{number of discordant pairs}}{\text{number of pairs}}
\tag{2.4.1}
$$
Here a pair of observations $(x_i, y_i)$ and $(x_j, y_j)$, $i < j$, is concordant if the sort order of $x_i$ and $x_j$ agrees with the sort order of $y_i$ and $y_j$; otherwise the pair is said to be discordant. The number of pairs is $\binom{n}{2}$. Note that if the rankings are the same, then $\tau = 1$, while if the rankings are reversed, then $\tau = -1$. The closer the rankings are to each other, the closer $\tau$ will be to 1. A small example of computing this parameter for the rankings 1, 2, 3, 4 and 1, 4, 2, 3 is given next. Here $\binom{n}{2} = \binom{4}{2} = 6$. The concordant pairs of items are 12, 13, 14, and 23, while the discordant pairs are 24 and 34. Thus $\tau = \frac{4 - 2}{6} = \frac{1}{3}$. This positive value indicates that the lists agree in order more than they disagree.
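The counting in Equation 2.4.1 can be sketched in a few lines of Python. This is an illustrative implementation that assumes no tied ranks; the function name `kendall_tau` is ours. Each list gives the rank of item $i$ in position $i$. In the example, the second ordering 1, 4, 2, 3 places item 4 second, item 2 third, and item 3 fourth, so its ranks for items 1 through 4 are 1, 3, 4, 2.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall rank correlation (2.4.1) for two tie-free rank lists.

    Returns (tau, concordant pairs, discordant pairs), with pairs given
    as 1-based item numbers."""
    n = len(x)
    concordant, discordant = [], []
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:                    # both lists order items i and j the same way
            concordant.append((i + 1, j + 1))
        elif s < 0:                  # the lists disagree on items i and j
            discordant.append((i + 1, j + 1))
    tau = (len(concordant) - len(discordant)) / (n * (n - 1) // 2)
    return tau, concordant, discordant


tau, conc, disc = kendall_tau([1, 2, 3, 4], [1, 3, 4, 2])
print(tau)    # 0.3333333333333333
print(conc)   # [(1, 2), (1, 3), (1, 4), (2, 3)]
print(disc)   # [(2, 4), (3, 4)]
```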
We can see the results of the Kendall rank correlation in Figures 3.2 and 3.3. The linear regression line can be given by the equation $Y = a + bX$, where $X$ is the explanatory variable and $Y$ is the dependent variable [34]. Points above the linear regression line are more highly Y-ranked, and points below the linear regression line are more highly X-ranked. A graph with more points clustered close to the line of regression indicates a more consistent evaluation. The points in Figure 3.2 show a closer relationship to the final AP rankings than those in Figure 3.3 because of the smaller distance between the linear regression line and the majority of points.
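The thesis does not state how its regression lines were fitted. Assuming ordinary least squares, the line $Y = a + bX$ and the above/below reading described here can be sketched as follows. The function names and the four-point data set are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Ordinary least-squares fit of Y = a + bX; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def side_of_line(a, b, x, y):
    """Report whether the point (x, y) lies above, below, or on Y = a + bX."""
    fitted = a + b * x
    if y > fitted:
        return "above"
    if y < fitted:
        return "below"
    return "on"


# Made-up example: X = one ranking of four teams, Y = another ranking.
xs = [1, 2, 3, 4]
ys = [1, 3, 4, 2]
a, b = least_squares_line(xs, ys)
print(a, b)   # approximately 1.5 and 0.4
print([side_of_line(a, b, x, y) for x, y in zip(xs, ys)])
# ['below', 'above', 'above', 'below']
```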
Chapter 3
Results
We give a total ordering of all teams in the FBS based on the results of the 2021 season by two different methods in this chapter. The methods used are adapted from those used in the 2008 Ph.D. thesis of Anjela Govan at North Carolina State University [19]. Following Govan's work and a subsequent paper of Laurie Zack, Ron Lamb, and Sarah Ball [38] that appeared in the journal Involve, we investigate the rankings produced using both a Margin of Victory approach and a Total Points Scored approach. Our rankings are for the 130 teams that were in the FBS of college football in 2021. The rankings of Govan and of Zack et al. were for the 32 teams of the National Football League. It was a challenge to compile the data from the large pool of teams in the FBS.
3.1 Margin of Victory Rankings
We present a total ordering of the FBS football teams by using the computer program Matlab with the code given in Figure 3.1. Note that the line headings “Line 1” etc. were not part of the code used; they are added for reference's sake. The matrix R is at the heart of the ranking. This matrix has both rows and columns indexed by the 130 FBS teams in alphabetical order. For the matrix entry in the column labeled “Ole Miss” and the row labeled “LSU”, we accounted for the score of the game between LSU and Ole Miss as follows. Ole Miss won the game 31 to 17, so a 14 was entered in this entry, as that was the Margin of Victory for Ole Miss over LSU. So each team, such as Ole Miss, received a positive entry in its column at each row corresponding to an opponent it defeated. If a team such as Ole Miss either lost a game or did not play a game against the team labeling the corresponding row of its column, then a 0 was entered. If two teams played twice, then the scores were added for both games, and a positive Margin of Victory was entered for the team with the most total points in its column. Rematches of this type occurred four times during the season. The most notable example of a rematch was Alabama versus Georgia in the SEC Championship and in the National Championship. In the SEC Championship, Alabama won by a score of 41-24, a Margin of Victory of 17. In the National Championship, Georgia won by a score of 33-18, a Margin of Victory of 15. Both Margins of Victory were recorded.

Returning to Line 1 of Figure 3.1, the matrix R features prominently. This matrix is 130 rows by 130 columns. The matrix C is merely a square matrix of the same dimension as R with the number 0.1 in each entry. We add C to R so that we do not divide by 0 in the computations that follow: most entries of R are 0, and a column of R is entirely 0 for any team without a win over an FBS opponent. Line 2 of the code makes the matrix Hv column stochastic, meaning that the entries in each column sum to 1. Line 3 sets the teleportation index α equal to 0.85. This is the number between 0 and 1 commonly used in applications of the PageRank algorithm. We found that this was a good choice based on experiments with choices ranging from 0.5 to 0.99. We note that in the matrix R we ignored the results of games between FBS and FCS teams, as these did not appreciably affect rankings according to our experiments. In Line 6 we create the Google matrix Ga. If we imagine ourselves randomly traveling around the Margin of Victory network, then with probability α we move from a team to the teams it defeated, in proportion to the Margins of Victory recorded in Hv, while with probability 1 − α we randomly jump around the network based on the numbers in the square matrix ve^T, each of whose entries is 1/130. The random walk therefore tends to collect at teams that were beaten often and by wide margins, which is why a smaller q value indicates a better team. The initial vector q produced in Lines 7 and 8 is a random vector with entries between 0 and 1, normalized so that its entries sum to 1; this is our initial ranking. An amazing aspect of this algorithm is that the final ranking does not depend on the initial ranking. In Line 10 of the algorithm, we iterate q by left multiplying it by the matrix Ga 100 times. Eventually, the numbers in q stabilize to a final ranking vector.

Line 1: S=sum(R+C,1);     % column sums; R = 130x130 margin matrix, C = 0.1*ones(130)
Line 2: Hv=(R+C)./S;      % divide each column by its sum (column stochastic)
Line 3: alpha = 0.85;     % teleportation index
Line 4: e = ones(130,1);
Line 5: v = e/130;
Line 6: Ga = alpha*Hv+(1-alpha)*v*e';   % Google matrix
Line 7: q = rand(130,1); % random initial vector
Line 8: q = q/sum(q); % normalise q to have unit sum
Line 9: for k=1:100
Line 10: q = Ga*q;
Line 11: end
Figure 3.1: Matlab Code for Finding a PageRank Vector
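The construction of R and the steps of Figure 3.1 can be sketched in Python on a toy schedule. This is our illustration, not the thesis's code. The function names `margin_matrix` and `pagerank_q` are hypothetical, and the three-team schedule (teams A, B, C) is made up. The text describes rematches both as summed scores and as two recorded margins; this sketch simply adds each game's margin to the winner's column. Otherwise it follows Figure 3.1: add C (every entry 0.1), normalize each column to sum to 1, mix in teleportation with α = 0.85, and iterate q ← Ga q 100 times. As in Table 3.1, a smaller q value indicates a better team, and here the team with two wins receives the smallest value.

```python
def margin_matrix(teams, games):
    """Build R: row = losing team, column = winning team, entry = margin.

    games is a list of (team_a, points_a, team_b, points_b) tuples.
    Margins from repeated meetings are simply accumulated (an assumption)."""
    index = {team: k for k, team in enumerate(teams)}
    n = len(teams)
    R = [[0.0] * n for _ in range(n)]
    for team_a, points_a, team_b, points_b in games:
        if points_a == points_b:
            continue  # ties are skipped in this sketch
        winner, loser = (team_a, team_b) if points_a > points_b else (team_b, team_a)
        R[index[loser]][index[winner]] += abs(points_a - points_b)
    return R


def pagerank_q(R, alpha=0.85, c=0.1, iterations=100):
    """Steps of Figure 3.1: column-normalize R + C, add teleportation, iterate."""
    n = len(R)
    M = [[R[i][j] + c for j in range(n)] for i in range(n)]             # R + C
    col_sums = [sum(M[i][j] for i in range(n)) for j in range(n)]       # Line 1
    Hv = [[M[i][j] / col_sums[j] for j in range(n)] for i in range(n)]  # Line 2
    Ga = [[alpha * Hv[i][j] + (1 - alpha) / n for j in range(n)]        # Line 6
          for i in range(n)]
    q = [1.0 / n] * n  # uniform start instead of rand (Lines 7-8)
    for _ in range(iterations):                                           # Lines 9-11
        q = [sum(Ga[i][j] * q[j] for j in range(n)) for i in range(n)]
    return q


# Ole Miss beat LSU 31-17: a 14 goes in row "LSU", column "Ole Miss".
R_example = margin_matrix(["LSU", "Ole Miss"], [("Ole Miss", 31, "LSU", 17)])
print(R_example)  # [[0.0, 14.0], [0.0, 0.0]]

# Made-up three-team schedule: A beats B and C, B beats C.
teams = ["A", "B", "C"]
games = [("A", 20, "B", 10), ("B", 14, "C", 7), ("A", 10, "C", 7)]
q = pagerank_q(margin_matrix(teams, games))
print(sorted(teams, key=lambda t: q[teams.index(t)]))  # ['A', 'B', 'C']
```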
In Table 3.1 we list the top 25 teams obtained by the algorithm of Figure 3.1. The q value obtained for each team is listed there. The complete rankings for all 130 teams are given in the Appendix. In this table, a smaller q value indicates a better ranking. Note that the algorithm could be adapted to a transposed matrix so that a larger q value would indicate a better ranking. We also give the Final AP Top 25 in the last column for comparison's sake.
The top 25 in Table 3.1 produced some expected results and some surprising results. Georgia and Alabama are highly ranked in both lists, as expected. Teams such as Notre Dame and Coastal Carolina are more highly ranked than would be expected. Note that the 2021 coach of Notre Dame, Brian Kelly, received a large increase in salary to join LSU after the 2021 season. Similarly, the coach of Coastal Carolina, Jamey Chadwell, received a large increase in salary to join Liberty after the 2022 season. The quality of these two teams' seasons was noticed by the college football community. Now we discuss a statistical method for comparing the PageRank list with the AP list. Note that we compared all 130 teams in both lists.

Rank | Team | q-Value | AP Top 25
1 | Notre Dame | 0.00194679056358042 | Georgia
2 | Georgia | 0.00196000393177937 | Alabama
3 | Alabama | 0.00196797504119794 | Michigan
4 | Coastal Carolina | 0.00196988141542532 | Ohio State
5 | Cincinnati | 0.00198379033469467 | Baylor
6 | Oklahoma State | 0.00200420842507026 | Oklahoma State
7 | Michigan | 0.00202229517821375 | Cincinnati
8 | Oklahoma | 0.00206023741216896 | Notre Dame
9 | Clemson | 0.00207774962269084 | Oklahoma
10 | Baylor | 0.0021016193527964 | Michigan State
11 | Ohio State | 0.00220550673438769 | Ole Miss
12 | Air Force | 0.00229666661999109 | Arkansas
13 | BYU | 0.00229756915826271 | Utah
14 | Pittsburgh | 0.00230684479094631 | Pitt
15 | Michigan State | 0.0023120444199844 | NC State
16 | Penn State | 0.00241098386284439 | Clemson
17 | Ole Miss | 0.00246464202733623 | Minnesota
18 | Arkansas | 0.00248728348591432 | Wisconsin
19 | East Carolina | 0.00250162674755332 | Purdue
20 | NC State | 0.00250988269888323 | Tennessee
21 | Texas A&M | 0.00252239514451533 | Kentucky
22 | Boise State | 0.00253635971904086 | Iowa
23 | Wake Forest | 0.00255176433875926 | Wake Forest
24 | Kentucky | 0.00255453226853235 | Utah State
25 | SMU | 0.00260051664523215 | San Diego State
Table 3.1: Margin of Victory
3.2 Total Score Ranking
We constructed the Total Score Ranking very similarly to the Margin of Victory ranking. This method differs from Margin of Victory because it uses two weighted arrows for each game, one for each team's score, rather than the single arrow used for the Margin of Victory [38]. Using the same example as in Section 3.1, Ole Miss defeated LSU 31 to 17. For the Total Score Ranking, a 31 would be placed in the matrix entry in the column labeled “Ole Miss” and the row labeled “LSU”. Similarly, a 17 would be placed in the matrix entry in the column labeled “LSU” and the row labeled “Ole Miss”. This way Ole Miss was credited with scoring 31 points against LSU, and LSU was credited with scoring 17 points against Ole Miss. If a team such as Ole Miss did not play a game against the team labeling the corresponding row of its column, then a 0 was entered. If two teams played twice, then the points scored by each team over both games were added together and placed in the corresponding matrix entry. The same code used in the Margin of Victory computations was used for the Total Score Rankings; see Figure 3.4. The only change was the input matrix: in place of R we imported a 130 row by 130 column matrix with the Total Score entries, called H in Figure 3.4. Our code for the Matlab computation of the Margin of Victory Ranking used q; we used p to note the difference in the matrix imported. Eventually, the numbers in p stabilize to a final ranking vector.
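The Total Score matrix differs from R only in how its entries are filled. A minimal Python sketch of the description above, with the hypothetical function name `total_score_matrix`: each team's points go in its own column, in the row of its opponent, and points from two meetings are added. The Alabama and Georgia scores are those of the two games quoted in Section 3.1.

```python
def total_score_matrix(teams, games):
    """Row = opponent, column = scoring team, entry = points scored (summed over meetings).

    games is a list of (team_a, points_a, team_b, points_b) tuples."""
    index = {team: k for k, team in enumerate(teams)}
    n = len(teams)
    T = [[0] * n for _ in range(n)]
    for team_a, points_a, team_b, points_b in games:
        T[index[team_b]][index[team_a]] += points_a  # team_a's points, in team_b's row
        T[index[team_a]][index[team_b]] += points_b  # team_b's points, in team_a's row
    return T


teams = ["Alabama", "Georgia", "LSU", "Ole Miss"]
games = [
    ("Ole Miss", 31, "LSU", 17),
    ("Alabama", 41, "Georgia", 24),   # SEC Championship
    ("Georgia", 33, "Alabama", 18),   # National Championship
]
T = total_score_matrix(teams, games)
for team, row in zip(teams, T):
    print(f"{team:9s}", row)
```

Such a matrix would then take the place of R in the steps of Figure 3.1, as the matrix H does in Figure 3.4.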
In Table 3.2 we list the top 25 teams obtained by the algorithm of Figure 3.4. The p value obtained for each team is listed there. The complete rankings for all 130 teams are given in the Appendix. In this table, a smaller p value indicates a better ranking. Note that the algorithm could be applied to a transposed matrix so that a larger p value would indicate a better ranking. We also give the Final AP Top 25 in the last column for comparison's sake.

Figure 3.2: Margin of Victory Kendall's Tau
The top 25 in Table 3.2 produced some expected results and some surprising results. Georgia and Alabama are highly ranked in both lists, as expected. Teams such as Liberty and Texas A&M are more highly ranked than would be expected. Note that the head coach of the Liberty Flames, Hugh Freeze, received a large increase in salary to join the Auburn Tigers after the 2022 season. Jimbo Fisher, the head coach of the Texas A&M Aggies, received a four-year contract extension following the 2020 season that raised his annual salary by 1.5 million dollars [21]. The conclusions for Total Score differ from the results of Margin of Victory. This is because not all of the teams shown in Table 3.2 had winning seasons. While many of the teams listed in Table 3.2 did win the majority of their games, winning was not the only factor, as it was for Margin of Victory. The teams listed in Table 3.2 scored many points, whether or not they won the game. Teams in the Margin of Victory table outscored their opponents by the most, while teams in the Total Score table scored many points, regardless of a win or loss.
Figure 3.3: Total Score Kendall's Tau

Line 1: S=sum(H+C,1);     % column sums; H = 130x130 Total Score matrix, C = 0.1*ones(130)
Line 2: Hv=(H+C)./S;      % divide each column by its sum (column stochastic)
Line 3: alpha = 0.85;     % teleportation index
Line 4: e = ones(130,1);
Line 5: v = e/130;
Line 6: Ga = alpha*Hv+(1-alpha)*v*e';   % Google matrix
Line 7: p = rand(130,1); % random initial vector
Line 8: p = p/sum(p); % normalise p to have unit sum
Line 9: for k=1:100
Line 10: p = Ga*p;
Line 11: end
Figure 3.4: Matlab Code for Finding a PageRank Vector

Rank | Team | p-Value | AP Top 25
1 | Liberty | 0.00304746969912415 | Georgia
2 | Georgia | 0.00381204199025146 | Alabama
3 | Wisconsin | 0.00407542269923828 | Michigan
4 | Penn State | 0.00417366519034118 | Ohio State
5 | Clemson | 0.00419166990380779 | Baylor
6 | Texas A&M | 0.0042417589958973 | Oklahoma State
7 | Nebraska | 0.00473118334230752 | Cincinnati
8 | Oklahoma State | 0.00486888996498965 | Notre Dame
9 | Michigan | 0.00488610520420468 | Oklahoma
10 | NC State | 0.00497641969495136 | Michigan State
11 | Alabama | 0.00521606093022588 | Ole Miss
12 | Purdue | 0.00530534261905736 | Arkansas
13 | Iowa | 0.00533300262674794 | Utah
14 | Minnesota | 0.00534577785603451 | Pitt
15 | Kansas State | 0.00536983025416077 | NC State
16 | Iowa State | 0.0054653187130986 | Clemson
17 | Auburn | 0.00563143171890239 | Minnesota
18 | Syracuse | 0.00566076793675136 | Wisconsin
19 | Illinois | 0.00574882147723986 | Purdue
20 | Arkansas | 0.00575824184598954 | Tennessee
21 | Florida State | 0.00576672770807803 | Kentucky
22 | Ole Miss | 0.00592753339050079 | Iowa
23 | Michigan State | 0.0059897626199456 | Wake Forest
24 | California | 0.00601905226057784 | Utah State
25 | Florida | 0.0061003127503729 | San Diego State
Table 3.2: Total Score
3.3 Conclusions
The Google PageRank algorithm proved to be an accurate and effective tool for providing rankings that correlated well with the Final AP rankings. The Margin of Victory method provided a better correlation with the AP rankings than did the Total Score method. Mathematica was an effective tool for illustrating the network involved in the algorithm. The combination of Matlab and the power method is a better tool for computing the ranking eigenvector of large matrices such as the 130 by 130 matrix of NCAA FBS season results. The FBS versus FCS matchups did not appreciably affect the rankings of the teams and were ignored. This observation was verified computationally and supported by previous research [20]. Future directions for research could include the effect of first downs, turnovers, yards gained, passing yards, running yards, and many other game parameters, to see which factors contribute the most to team success.
The method of applying the PageRank algorithm to rank sports teams opens many future research opportunities. This method can be used to produce rankings for any sport. In a similar approach, this idea could be applied to any statistic in sports. For example, time of possession, turnovers, total yardage, yards gained by passing and running, points scored on field goals, and many other factors could be used in the PageRank method. Another way to further research the ranking of sports teams would be to change the value of $\alpha$. Throughout our research, we used the standard value of 0.85 for the teleportation index $\alpha$. The value of $\alpha$ could range between 0 and 1, and changing the value of $\alpha$ allows a different ranking vector to be obtained.
Chapter 4
Appendix
A complete list of the Margin of Victory PageRank rankings of the 2021 FBS football season is given next.

Air Force Falcons 0.002297
Akron Zips 0.061325
Alabama Crimson Tide 0.001968
Appalachian State Mountaineers 0.002951
Arizona Wildcats 0.015740
Arizona State Sun Devils 0.004497
Arkansas Razorbacks 0.002487
Arkansas State Red Wolves 0.015871
Army Black Knights 0.003912
Auburn Tigers 0.003000
Ball State Cardinals 0.008654
Baylor Bears 0.002102
Boise State Broncos 0.002536
Boston College Eagles 0.004283
Bowling Green Falcons 0.057965
Buffalo Bulls 0.028531
BYU Cougars 0.002298
California Golden Bears 0.014806
Central Michigan Chippewas 0.004611
Charlotte 49ers 0.013100
Cincinnati Bearcats 0.001984
Clemson Tigers 0.002078
Coastal Carolina Chanticleers 0.001970
Colorado Buffaloes 0.012612
Colorado State Rams 0.009721
Duke Blue Devils 0.012657
East Carolina Pirates 0.002502
Eastern Michigan Eagles 0.012743
FIU Panthers 0.030598
Florida Gators 0.003968
Florida Atlantic Owls 0.006102
Florida State Seminoles 0.002742
Fresno State Bulldogs 0.002873
Georgia Bulldogs 0.001960
Georgia Southern Eagles 0.009894
Georgia State Panthers 0.005741
Georgia Tech Yellow Jackets 0.004574
Hawaii Rainbow Warriors 0.009230
Houston Cougars 0.002779
Illinois Fighting Illini 0.004247
Indiana Hoosiers 0.007545
Iowa Hawkeyes 0.002654
Iowa State Cyclones 0.005514
Kansas Jayhawks 0.014448
Kansas State Wildcats 0.002845
Kent State Golden Flashes 0.008974
Kentucky Wildcats 0.002555
Liberty Flames 0.004807
Louisiana Ragin' Cajuns 0.003103
Louisiana-Monroe Warhawks 0.018261
Louisiana Tech Bulldogs 0.013464
Louisville Cardinals 0.002998
LSU Tigers 0.003185
Marshall Thundering Herd 0.003596
Maryland Terrapins 0.004653
Memphis Tigers 0.004637
Miami (FL) Hurricanes 0.002730
Miami (OH) RedHawks 0.004825
Michigan Wolverines 0.002022
Michigan State Spartans 0.002312
Middle Tennessee Blue Raiders 0.006750
Minnesota Golden Gophers 0.009449
Mississippi State Bulldogs 0.004004
Missouri Tigers 0.003702
Navy Midshipmen 0.005613
Nebraska Cornhuskers 0.003237
Nevada Wolf Pack 0.003568
New Mexico Lobos 0.012356
New Mexico State Aggies 0.015929
North Carolina Tar Heels 0.006411
North Carolina State Wolfpack 0.002510
North Texas Mean Green 0.010420
Northern Illinois Huskies 0.004430
Northwestern Wildcats 0.011050
Notre Dame Fighting Irish 0.001947
Ohio Bobcats 0.031091
Ohio State Buckeyes 0.002206
Oklahoma Sooners 0.002060
Oklahoma State Cowboys 0.002004
Old Dominion Monarchs 0.004932
Ole Miss Rebels 0.002465
Oregon Ducks 0.004454
Oregon State Beavers 0.005858
Penn State Nittany Lions 0.002411
Pittsburgh Panthers 0.002307
Purdue Boilermakers 0.002760
Rice Owls 0.011668
Rutgers Scarlet Knights 0.007779
San Diego State Aztecs 0.003019
San Jose State Spartans 0.009861
SMU Mustangs 0.002601
South Alabama Jaguars 0.012184
South Carolina Gamecocks 0.003553
South Florida Bulls 0.006962
Southern Miss Golden Eagles 0.018827
Stanford Cardinal 0.010281
Syracuse Orange 0.004115
TCU Horned Frogs 0.005330
Temple Owls 0.018647
Tennessee Volunteers 0.003038
Texas Longhorns 0.009096
Texas A&M Aggies 0.002522
Texas State Bobcats 0.010770
Texas Tech Red Raiders 0.006355
Toledo Rockets 0.004997
Troy Trojans 0.011609
Tulane Green Wave 0.005181
Tulsa Golden Hurricane 0.002877
UAB Blazers 0.006338
UCF Knights 0.003262
UCLA Bruins 0.002863
UConn Huskies 0.026607
UMass Minutemen 0.022084
UNLV Rebels 0.008076
USC Trojans 0.008906
UTEP Miners 0.003932
UTSA Roadrunners 0.004133
Utah Utes 0.002646
Utah State Aggies 0.005562
Vanderbilt Commodores 0.010507
Virginia Cavaliers 0.004298
Virginia Tech Hokies 0.005800
Wake Forest Demon Deacons 0.002552
Washington Huskies 0.004219
Washington State Cougars 0.002647
West Virginia Mountaineers 0.003823
Western Kentucky Hilltoppers 0.006111
Western Michigan Broncos 0.005934
Wisconsin Badgers 0.002887
Wyoming Cowboys 0.012582
A complete list of th e Points Scored PageRank rankings is given below.
Air Force Falcons 0.006603 Akron Zips 0.013724 Alabama Crimson Tide 0.005241 Ap-
palachian State Mountaineers 0.007499 Arizona Wildcat s 0. 00 789 8 Ari zo n a St a te Su n D ev -
ils 0.006329 Arkansas Razorbacks 0.005779 Arkansas State Red Wolves 0.012095 Army
Black Knights 0.007541 Au b u r n Tigers 0.005650 Ball State Cardinals 0.009814 Baylor Bear s
0.006459 Boise State Broncos 0.006511 Boston College Eagles 0.006776 Bowling Green
Falcons 0.010981 Bualo Bulls 0.010814 BYU Cougars 0.007602 California Golden Bears
0.006032 Central Michigan Chippewas 0.012192 Charlotte 49ers 0.010518 Cincinnati Bearcats
0.006180 Clemson Tigers 0.004215 Coastal Carolina Chanticleers 0.009016 Colorado Buf-
faloes 0.006502
Colorado State Rams 0.008345
Duke Blue Devils 0.009839
East Carolina Pirates 0.006889
Eastern Michigan Eagles 0.009999
FIU Panthers 0.007864
Florida Gators 0.006119
Florida Atlantic Owls 0.006928
Florida State Seminoles 0.005791
Fresno State Bulldogs 0.007081
Georgia Bulldogs 0.003839
Georgia Southern Eagles 0.009574
Georgia State Panthers 0.009059
Georgia Tech Yellow Jackets 0.007608
Hawaii Rainbow Warriors 0.010284
Houston Cougars 0.008187
Illinois Fighting Illini 0.005771
Indiana Hoosiers 0.006566
Iowa Hawkeyes 0.005357
Iowa State Cyclones 0.005490
Kansas Jayhawks 0.009831
Kansas State Wildcats 0.005392
Kent State Golden Flashes 0.013347
Kentucky Wildcats 0.006186
Liberty Flames 0.003070
Louisiana Ragin’ Cajuns 0.007135
Louisiana-Monroe Warhawks 0.009357
Louisiana Tech Bulldogs 0.009366
Louisville Cardinals 0.006815
LSU Tigers 0.006599
Marshall Thundering Herd 0.006855
Maryland Terrapins 0.007592
Memphis Tigers 0.009482
Miami (FL) Hurricanes 0.006310
Miami (OH) RedHawks 0.008865
Michigan Wolverines 0.004910
Michigan State Spartans 0.006014
Middle Tennessee Blue Raiders 0.007790
Minnesota Golden Gophers 0.005361
Mississippi State Bulldogs 0.006659
Missouri Tigers 0.008643
Navy Midshipmen 0.007442
Nebraska Cornhuskers 0.004760
Nevada Wolf Pack 0.008679
New Mexico Lobos 0.008127
New Mexico State Aggies 0.010140
North Carolina Tar Heels 0.007635
North Carolina State Wolfpack 0.005000
North Texas Mean Green 0.007913
Northern Illinois Huskies 0.011972
Northwestern Wildcats 0.006766
Notre Dame Fighting Irish 0.006568
Ohio Bobcats 0.010453
Ohio State Buckeyes 0.006285
Oklahoma Sooners 0.007265
Oklahoma State Cowboys 0.004892
Old Dominion Monarchs 0.008413
Ole Miss Rebels 0.005949
Oregon Ducks 0.008070
Oregon State Beavers 0.007943
Penn State Nittany Lions 0.004200
Pittsburgh Panthers 0.007238
Purdue Boilermakers 0.005332
Rice Owls 0.009068
Rutgers Scarlet Knights 0.006489
San Diego State Aztecs 0.007002
San Jose State Spartans 0.007742
SMU Mustangs 0.008127
South Alabama Jaguars 0.009226
South Carolina Gamecocks 0.006317
South Florida Bulls 0.008463
Southern Miss Golden Eagles 0.007797
Stanford Cardinal 0.008242
Syracuse Orange 0.005685
TCU Horned Frogs 0.007999
Temple Owls 0.011338
Tennessee Volunteers 0.007035
Texas Longhorns 0.008135
Texas A&M Aggies 0.004266
Texas State Bobcats 0.010629
Texas Tech Red Raiders 0.007797
Toledo Rockets 0.008604
Troy Trojans 0.008350
Tulane Green Wave 0.007789
Tulsa Golden Hurricane 0.008222
UAB Blazers 0.007027
UCF Knights 0.007679
UCLA Bruins 0.007636
UConn Huskies 0.009882
UMass Minutemen 0.010104
UNLV Rebels 0.008571
USC Trojans 0.008731
UTEP Miners 0.007895
UTSA Roadrunners 0.008369
Utah Utes 0.006886
Utah State Aggies 0.008797
Vanderbilt Commodores 0.009217
Virginia Cavaliers 0.007191
Virginia Tech Hokies 0.007171
Wake Forest Demon Deacons 0.007518
Washington Huskies 0.006219
Washington State Cougars 0.007747
West Virginia Mountaineers 0.006717
Western Kentucky Hilltoppers 0.009320
Western Michigan Broncos 0.009984
Wisconsin Badgers 0.004102
Wyoming Cowboys 0.009729
Bibliography
[1] Coaches’ poll, https://americanfootballdatabase.fandom.com/wiki/Coaches%27_Poll.
[2] College Football Playoff history, https://collegefootballplayoff.com/sports/2016/9/30/overview, College Football Playoff.
[3] 2021 year summary: College football at Sports-Reference.com, https://www.sports-reference.com/cfb/years/2021.html, 2021.
[4] College Football Playoff payouts 2022-2023, https://businessofcollegesports.com/college-football-playoff-payouts/, January 2022.
[5] Bahman Bahmani, Abdur Chowdhury, and Ashish Goel, Fast incremental and personalized PageRank, Proceedings of the VLDB Endowment 4 (2010), no. 3, 173–184.
[6] Stan Becton, Everything you need to know about the 2021 FCS championship semifinals, https://www.ncaa.com/news/football/article/2021-12-14/2021-fcs-championship-everything-you-need-know-about-2021-fcs-championship-semifinals, December 2021.
[7] Joost Berkhout and Bernd F. Heidergott, Ranking nodes in general networks: a Markov multi-chain approach, Discrete Event Dyn. Syst. 28 (2018), no. 1, 3–33. MR 3772939
[8] Ryan Brewer, College football rankings, https://graphics.wsj.com/table/NCAA_2019.
[9] S. Brin, L. Page, R. Motwani, and T. Winograd, The PageRank citation ranking: Bringing order to the web, Tech. report, Stanford University, 1998.
[10] Amanda Brooks, 2022 College Football Playoff national championship nets 22.6 million viewers, cable’s top telecast in two years, https://espnpressroom.com/us/press-releases/2022/01/2022-college-football-playoff-national-championship-nets-22-6-million-viewers-cables-top-telecast-in-two-years/, January 2022.
[11] Matt Brown, What we can learn from college football’s very first AP poll, where Minnesota was No. 1, http://www.sbnation.com/college-football/2018/10/19/18001148/ap-top-25-poll-history, October 2018.
[12] Stephen K. Callaway, Internet banking and performance, American Journal of Business 26 (2011), no. 1, 12–25.
[13] Raju Choudhary, Google net worth 2022: Top 5 Google (Alphabet) shareholders, https://caknowledge.com/google-net-worth/, September 2022.
[14] Christopher Engström, PageRank in evolving networks and applications of graphs in natural language processing and biology, Ph.D. thesis, Mälardalen University, 2016, p. 1–256.
[15] Ayman Farahat, Thomas LoFaro, Joel C. Miller, Gregory Rae, and Lesley A. Ward, Authority rankings from HITS, PageRank, and SALSA: Existence, uniqueness, and effect of initialization, SIAM Journal on Scientific Computing 27 (2006), no. 4, 1181–1201.
[16] Graham Farmelo (ed.), It must be beautiful, Granta Publications, London, 2003, Great equations of modern science. MR 1990427
[17] Massimo Franceschet, PageRank, Communications of the ACM 54 (2011), no. 6, 92–101.
[18] David F. Gleich, PageRank beyond the web, SIAM Rev. 57 (2015), no. 3, 321–363. MR 3376760
[19] Anjela Yuryevna Govan, Ranking theory with application to popular sports, ProQuest LLC, Ann Arbor, MI, 2008, Thesis (Ph.D.)–North Carolina State University. MR 2712816
[20] Ilse C. F. Ipsen and Teresa M. Selee, PageRank computation, with special attention to dangling nodes, SIAM J. Matrix Anal. Appl. 29 (2007), no. 4, 1281–1296. MR 2369296
[21] Sam Kahn Jr, https://theathletic.com/4201182/2021/09/01/texas-am-jimbo-fisher-agree-to-4-year-extension-through-2031-and-raise/, August 2021.
[22] Amy N. Langville and Carl D. Meyer, Google’s PageRank and beyond: the science of search engine rankings, Princeton University Press, Princeton, NJ, 2006. MR 2262054
[23] Steven J. Leon, Linear algebra with applications, Macmillan, Inc., New York; Collier Macmillan Publishers, London, 1980. MR 566769
[24] C.-S. Liao, K. Lu, M. Baym, R. Singh, and B. Berger, IsoRankN: Spectral methods for global alignment of multiple protein networks, Bioinformatics 25 (2009), no. 12, i253–i258.
[25] Joseph Magiya, Kendall rank correlation explained, https://towardsdatascience.com/kendall-rank-correlation-explained-dee01d99c535, June 2019.
[26] Richard von Mises and Hilda Pollaczek-Geiringer, Praktische Verfahren der Gleichungsauflösung, VDI-Verlag, 1929.
[27] Julie L Morrison, Rainer Breitling, Desmond J Higham, and David R Gilbert, GeneRank: Using search engine technology for the analysis of microarray experiments, BMC Bioinformatics 6 (2005), no. 1.
[28] Saeko Nomura, Satoshi Oyama, Tetsuo Hayamizu, and Toru Ishida, Analysis and improvement of HITS algorithm for detecting web communities, Systems and Computers in Japan 35 (2004), no. 13, 32–42.
[29] Spencer Parlier, College football history: Notable firsts and milestones, https://www.ncaa.com/news/ncaa/article/2020-01-31/college-football-history-notable-firsts-and-milestones, June 2022.
[30] Oskar Perron, Zur Theorie der Matrices, Math. Ann. 64 (1907), no. 2, 248–263. MR 1511438
[31] Wayne Staats, College football rankings: Every poll explained and how they work, https://www.ncaa.com/news/football/article/2019-07-08/college-football-rankings-every-poll-explained-how-they-work, August 2022.
[32] Andy Staples, The chaos and consequences of the BCS, 20 years after its inaugural season, https://www.si.com/college/2018/07/09/bcs-history-20th-anniversary-controversy-tennessee-florida-state, July 2018.
[33] Richard Tovar, NCAA college football 2021: Lists of the FBS conferences for this season, https://bolavip.com/en/sports/ncaa-college-football-2021-lists-of-the-FBS-conferences-for-this-season-20210824-0024.html, 2021.
[34] Yale University, Linear regression, http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm.
[35] Douglas B. West, Introduction to graph theory, Prentice Hall, Inc., Upper Saddle River, NJ, 1996. MR 1367739
[36] Gang Wu, Ying Zhang, and Yimin Wei, Accelerating the Arnoldi-type algorithm for the PageRank problem and the ProteinRank problem, J. Sci. Comput. 57 (2013), no. 1, 74–104. MR 3095291
[37] Elliot J. Yates and Louise C. Dixon, PageRank as a method to rank biomedical literature by importance, Source Code for Biology and Medicine 10 (2015), no. 1.
[38] Laurie Zack, Ron Lamb, and Sarah Ball, An application of Google’s PageRank to NFL rankings, Involve 5 (2012), no. 4, 463–471. MR 3069048
[39] Tiecheng Zhou, Ernesto Martinez-Baez, Gregory Schenter, and Aurora E. Clark, PageRank as a collective variable to study complex chemical transformations and their energy landscapes, The Journal of Chemical Physics 150 (2019), no. 13, 134102.