SQL Practice: Analyzing fantasy football data from the 2020 NFL season using SQL

Learn beginner SQL commands using fantasy football data.

Titans RB Derrick Henry (Source USA Today)

The goal of this case study is practice SQL commands — not to find out the actual answers to the questions. When we ask “who led the NFL in touchdowns?”, we know we could easily Google the answer. The goal is to find answers using SQL.

Open the database

Get started by opening the fantasy football database here. The database is hosted in DB Fiddle, a DBMS simulator, that replicates what an actual database would look like. If you have not taken one of our programs, DB Fiddle may look foreign to you but it is pretty intuitive.

The main thing to know is everything is contained on this one page. The schema is how the data is structured in the database. The Query SQL section is where we write out SQL section, and the results section is where our results from the query are displayed.

Before we analyze any data, we want to have a good understanding of what the data looks like. Here is a screenshot of the data table we are working with:

Think of this data table like an Excel spreadsheet — a collection of data formatted in rows and columns. The full table contains fantasy scoring data for 627 players from the 2020 NFL season. The table contains 20 columns of info like the players name, team, position, stats, and fantasy points scored for each player.

Know that we understand the data we are analyzing, let’s answer some questions using SQL queries.

Question 1: Who were the top three fantasy wide receivers by total fantasy points in 2020?

-
To answer this question using SQL, we need to follow the following steps:
(1) Pull all players from the players data table (2) Filter to only display only wide receivers and (3) Sort the data by total fantasy points.

The first step in SQL is to pull data from the data table. To do this, use the SELECT command which is one of the most used commands in SQL. We want to pull all columns from the players data table so we use an asterisk instead of listing out every column in the table.

SELECT * FROM Players

The second step is to filter for players who play wide receiver. We can do this using the WHERE command on the Pos column. ‘Pos’ is an abbreviation for ‘position’. We set this column equal to ‘WR’ which is the abbreviation for wide receiver.

SELECT * FROM Players
WHERE Pos = 'WR'

Lastly, we sort the data using the ORDER BY command. This command will sort the data from highest to lowest or lowest to highest. In order to sort the data in descending order (from highest to lowest), we add the word DESC at the end of the query as seen below.

SELECT * FROM Players
WHERE Pos = 'WR'
ORDER BY FantasyPoints DESC

This is the full query needed to answer the question. Run this query in DB Fiddle using the ‘Run’ button at the top of the screen. When the query is run we get a list of all wide receivers sorted by how many fantasy points they scored in 2020.

We can easily identify the top three wide receivers by looking at the top three rows of the data as seen in the chart below. The top three wide receivers were: Davante Adams, Tyreek Hill, and Stefon Diggs.

Question 2: Which running back led the NFL in receiving yards in 2020?

-
This query will be very similar to the question above. Instead of filtering the database for wide receivers, we need to filter for running backs using the WHERE command and the position code ‘RB’.

Next, we need to filter for receiving yards from highest to lowest using the ORDER BY command and DESC which sorts numerical data from highest to lowest. Here is the full SQL query for this question and the result of the query:

SELECT * FROM Players
WHERE Pos = 'RB'
ORDER BY ReceivingYds DESC

The answer? Alvin Kamara — who caught 83 passes for 756 yards and 5 touchdowns in addition to his 932 yards on the ground.

Question 3: Of all quarterbacks who didn’t play all 16 games, which player scored the most fantasy points?

-
This question is a tricky one, how do we filter out quarterbacks who didn’t play all 16 games? The key is to use the AND command, which allows us to filter multiple conditions. Change the position to quarterbacks (abbreviated ‘QB’) and then add on the AND command allowing us to add a second condition.

The second condition should be filtering for quarterbacks who did not play 16 games. To do this, write G < 16, meaning the number of games played is less than 16. Lastly, order the results from most to least fantasy points as seen in the code below.

SELECT * FROM Players
WHERE Pos = 'QB' AND G < 16
ORDER BY FantasyPoints DESC

The answer? Patrick Mahomes — who threw for 4,740 yards despite sitting out Week 17 (meaning he only played 15 of the teams 16 games).

Question 4: Write a SQL query that displays the top fantasy scorers on the Minnesota Vikings. Display only the player names, team name, and fantasy point columns.

-
This question is also a multi-part question. The first thing you should notice is our SELECT command will be different. Instead of displaying all columns in the players data table, we just want to view the three columns listed above: the player’s name, the team name, and the number of fantasy points they scored. To do this, we remove the asterisk and write out the exact name of the column header. In this case: Player, Tm, and FantasyPoints.

Next, we need to filter out players who only played for the Minnesota Vikings. Use the WHERE command to filter the Tm column for Vikings players, which can be found using the abbreviation ‘MIN’.

Lastly, we need to display the results in order of fantasy points scored from highest to lowest. We have used this exact statement before using the ORDER BY command on the FantasyPoints column and sorting the columns from highest to lowest using DESC. This leaves us the following three lines of code:

SELECT Player, Tm, FantasyPoints FROM Players
WHERE Tm = 'MIN'
ORDER BY FantasyPoints DESC

Running this query gives us our answer — Kirk Cousins — who led the team in fantasy points. Bonus question for those who don’t follow football: what position does Kirk Cousins play? How could you adjust our SQL to add in the player’s position to our data table?

Question 5: How many players scored more than 10 receiving touchdowns in 2020? Display only the number of players.

-
This query involves a new command we haven’t seen yet, the COUNT command. This command allows us to count how many rows are in a SQL query. This is useful when you filter data and want to know how many rows are in the data instead of what rows are in the data.

To run the query, let’s start with writing a SELECT command that pulls only the player name column from the players data table: SELECT * FROM Players. Next, write the WHERE command to filter the data to only display platers who scored more than 10 receiving touchdowns: WHERE ReceivingTD > 10.

Lastly, let’s edit the first line of the code to include the COUNT command. After the SELECT statement, write COUNT() and then move the column name ‘Player’ inside of the parentheses as seen below. This code will then count the number of players that fit the query guidelines instead of displaying the player names.

SELECT COUNT(Player) FROM Players
WHERE ReceivingTD > 10

When you run the query, you will see only the number 7 appear in the results section. This means that only seven players caught more than 10 touchdowns during the 2020 season!

Recap

The best way to learn SQL is through real world practice. We recommend that you make up your own fantasy football questions and try to write a SQL query to answer your question.

To learn SQL, check out our Computer Science for Business Program — an online, career-focused program that teaches SQL, Python, and VBA for business students and professionals.

--

--