---
license: other
language:
- en
tags:
- nba
- basketball
---
# NBA Predictions

This repo contains AI model code and weights that predict the outcome of NBA games. The model's output is the probability that each of a range of point spreads will occur.

## Intro

The model takes 8 players from each of the home and away teams, plus each player's age, as input. It then outputs a probability for each point spread between -20 and +20 points, from the home team's point of view.
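Because the output is a distribution over spreads, headline numbers like "wins 3 in every 4 games" come from summing buckets. The sketch below assumes the model returns 41 probabilities, one for each home-team spread from -20 to +20 (home points minus away points), in that order. The helper name `summarize_spread_distribution` is hypothetical and not part of this repo, and whether the ±20 buckets also absorb larger blowouts is an assumption to check against the code.

```python
# Hypothetical helper (not part of this repo). Assumes probs[i] is the
# probability of a home-team spread of (i - 20), i.e. spreads -20..+20.
SPREADS = list(range(-20, 21))


def summarize_spread_distribution(probs):
    """Collapse a per-spread distribution into home win / loss / tie totals."""
    if len(probs) != len(SPREADS):
        raise ValueError(f"expected {len(SPREADS)} probabilities, got {len(probs)}")
    total = sum(probs)
    # Normalize in case the values do not sum to exactly 1.
    normalized = [p / total for p in probs]
    home_win = sum(p for s, p in zip(SPREADS, normalized) if s > 0)
    home_loss = sum(p for s, p in zip(SPREADS, normalized) if s < 0)
    # NBA games cannot end tied, but the model may still put mass on 0.
    tie = normalized[SPREADS.index(0)]
    return {"home_win": home_win, "home_loss": home_loss, "tie": tie}
```

Reporting the tie bucket separately, rather than silently splitting it, makes it visible if the model assigns meaningful probability to an impossible outcome.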

For example, the text and charts below represent the model's opinion on the Boston Celtics vs. the Denver Nuggets, a matchup I am personally terrified of as a Celtics fan.

Let's start with both teams at pretty much full strength, with the Celtics at home. In this example, the model predicts the Celtics to win about 3 in every 4 games, with a 14% chance of the Celtics winning by 20 or more.

![Full strength Celtics vs full strength Denver. Celtics at home.](celtics-at-home.png)

Let's flip the location and see what the model thinks would happen if the Celtics had to travel to Denver. Interestingly, the model now favors Denver, giving the Nuggets a 55% chance to win.

![Full strength Celtics vs full strength Denver. Denver at home.](denver-at-home.png)

Now here's the really fun part: mixing and matching players. Most people would say Jokic is the best player in the league at the time of writing, with Tatum a notch below him. Many would also say the Celtics are an incredibly deep team, at least among their starters, while the Nuggets rely more heavily on their top stars.

All of this is to say that taking Jokic off the Nuggets should have a bigger effect than taking Tatum off the Celtics. The chart below shows Denver at home without Jokic, who has been replaced by Peyton Watson. As you can see, Denver's win percentage dropped by 13%.

![Celtics full strength vs Denver without Jokic. Denver at home.](denver-at-home-no-tatum.png)

Let's keep the game in Denver, put the Nuggets back at full strength, and replace Tatum with Pritchard. As you can see, the Nuggets are now projected to win 66% of the time. That sounds about right to me!

![Celtics without Tatum vs full strength Denver. Denver at home.](denver-at-home-no-tatum.png)

## Installation

I recommend installing Python 3.11.8, as that is the version the repo was written and tested with. The code will likely work with most recent versions of Python, though.

Once you have Python installed, run `pip install -r requirements.txt`. It will take a while to install dependencies if you don't already have PyTorch cached.

## Usage

The `example.ipynb` notebook shows how to use the model to predict the final game of the 2023-24 NBA season, a game between the Dallas Mavericks and Boston Celtics. It outputs a spread-probability chart like the ones above.

To change the players and their ages, you must reference the `player_tokens.csv` and `age_tokens.csv` files.

For example, to take Kristaps Porzingis off Boston's roster and swap which team is at home, you would remove Porzingis's token (`4416`) from the `home_team_tokens` list and replace it with another player's token, say Payton Pritchard's (`4999`). You would then look up Pritchard's age (26), find the corresponding age token in `age_tokens.csv` (`11`), and use it to replace Porzingis's age token, which is the second-to-last token.
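The lookup-and-replace steps above can be sketched in a few lines of Python. This is a hypothetical illustration, not code from the repo: the CSV column names (`player`, `age`, `token`) are assumptions, so check the real headers in `player_tokens.csv` and `age_tokens.csv`. It also assumes the age-token list is aligned position by position with the player-token list, which is consistent with Porzingis's age token sitting in the same second-to-last slot.

```python
import csv


def load_token_map(lines, key_column, token_column):
    """Build {key: token} from CSV rows.

    `lines` can be an open file or any iterable of CSV text lines.
    The column names are assumptions; check the real CSV headers.
    """
    reader = csv.DictReader(lines)
    return {row[key_column]: int(row[token_column]) for row in reader}


def replace_player(player_tokens, age_tokens, old_player_token,
                   new_player_token, new_age_token):
    """Swap one player for another, keeping the age list aligned by position.

    Assumes age_tokens[i] is the age token for player_tokens[i].
    Returns new lists and leaves the inputs unchanged.
    """
    idx = player_tokens.index(old_player_token)  # raises ValueError if absent
    new_players = list(player_tokens)
    new_ages = list(age_tokens)
    new_players[idx] = new_player_token
    new_ages[idx] = new_age_token
    return new_players, new_ages


# Usage (hypothetical column names):
# with open("player_tokens.csv", newline="") as f:
#     players = load_token_map(f, "player", "token")
# with open("age_tokens.csv", newline="") as f:
#     ages = load_token_map(f, "age", "token")
# home_team_tokens, home_age_tokens = replace_player(
#     home_team_tokens, home_age_tokens, 4416,
#     players["Payton Pritchard"], ages["26"])
```

Returning new lists rather than mutating in place makes it easy to keep the original lineup around for side-by-side comparisons like the Jokic and Tatum examples.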

To swap the home and away teams, you can either exchange the variables containing the player and age tokens, or simply set the `swap_home_away` variable to `True`.
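Since the output is always from the home team's point of view, a swapped run reports spreads from the other team's perspective. The sketch below shows one way to swap the inputs by hand and to re-express a distribution from the away team's view so that charts can be compared directly. Both function names are hypothetical, and the flip assumes the same 41-bucket layout (index `i` means a home spread of `i - 20`), where negating every spread is equivalent to reversing the list.

```python
def swap_home_away(home_tokens, home_ages, away_tokens, away_ages):
    """Exchange the two teams' inputs so the old away team plays at home."""
    return away_tokens, away_ages, home_tokens, home_ages


def flip_perspective(probs):
    """Re-express a home-team spread distribution from the away team's view.

    Assumes probs[i] is the probability of a home spread of (i - 20). The
    away spread is the negation, so index i maps to index 40 - i, which is
    exactly the reversed list.
    """
    return list(reversed(probs))
```

Note that swapping inputs and rerunning the model is not the same as flipping an existing output: the model accounts for home-court advantage, which is why the Celtics-at-home and Denver-at-home charts differ.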

## Training Process

I downloaded data from stats.nba.com using the [swar/nba_api](https://github.com/swar/nba_api) package to get minutes played, game outcomes, and a few other dimensional elements to make everything fit together. Then I ran a custom PyTorch training loop to train the model(s) on their chosen loss objective (spread, money line, or spread probability).
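One plausible way to frame the "spread probability" objective is as classification: turn each game's final margin into one of 41 classes and minimize the negative log-likelihood of the observed class. The sketch below is an assumption about how that could work, not the repo's actual training code; the helper names are hypothetical, and clipping blowouts into the ±20 end buckets is a guess. In a PyTorch loop, `torch.nn.functional.cross_entropy` applied to logits and class indices would play the role of `spread_nll`.

```python
import math

MAX_SPREAD = 20  # matches the -20..+20 range described above


def margin_to_class(home_points, away_points, max_spread=MAX_SPREAD):
    """Map a final score to a class index in 0..2*max_spread.

    Margins beyond +/-max_spread are clipped into the end buckets (assumption).
    """
    margin = home_points - away_points
    clipped = max(-max_spread, min(max_spread, margin))
    return clipped + max_spread


def spread_nll(probs, target_class, eps=1e-12):
    """Negative log-likelihood of the observed spread class.

    eps guards against log(0) when the model assigns zero probability.
    """
    return -math.log(max(probs[target_class], eps))


# Usage: a 110-100 home win maps to class 30 (spread +10).
# loss = spread_nll(predicted_probs, margin_to_class(110, 100))
```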