ANNA'S ANALYTICS

Using R and machine learning to make playoff predictions

May 09, 2025 | 4 Minute Read


NBA Playoff Predictions

The NBA playoffs can either be the highlight of a fan’s year...or their worst nightmare. After the regular season, the top teams face off to see who’s best. But instead of a single-elimination game, each series requires a team to win four games to advance, which leads to wild upsets and comebacks. Even though the playoffs started a few weeks ago, I thought it would be fun to see if I could predict how the final 8 teams would perform using machine learning.

Set-up and Data Collection

To approach this question, I figured machine learning would be the easiest (and most fun) way to go. I compiled regular season stats from the past 10 years for the final 8 teams in each year’s playoffs. Based on how far each team went, I gave them a score from 1 to 8, with 1 meaning they won it all. For the 2025 season, I collected the same stats for the final 8 teams but left the score blank, since that score is the target variable I wanted to predict.

I used the tidymodels package in R to create a recipe for cleaning the data. Pretty quickly, I realized that any model that predicted the score as a category would likely assign the same score to multiple teams rather than ranking them uniquely. To get around this, I treated the score as a continuous variable and used regression models. That way, I could generate a predicted score for each team and rank the teams from best (lowest predicted score) to worst. This also made it possible to simulate bracket matchups using those rankings.
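To make the "regress, then rank" idea concrete, here is a minimal sketch. It is not my actual code: the data is randomly generated, and the column names (win_pct, net_rating) and team labels are placeholders I made up for illustration. It builds a recipe, fits a plain linear regression on the 1 to 8 score, and ranks the new teams by predicted score.

```r
library(tidymodels)

set.seed(42)

# Made-up stand-in for 10 seasons x 8 teams of regular season stats.
# Column names are hypothetical, not the real features.
train <- tibble(
  win_pct    = runif(80, 0.5, 0.8),
  net_rating = rnorm(80, 4, 2),
  score      = sample(rep(1:8, 10))  # 1 = champion, 8 = earliest exit
)

# Made-up stand-in for the current season's final 8 (no score yet).
new_teams <- tibble(
  team       = paste("Team", LETTERS[1:8]),
  win_pct    = runif(8, 0.5, 0.8),
  net_rating = rnorm(8, 4, 2)
)

# Recipe: treat score as a numeric outcome and normalize the predictors.
rec <- recipe(score ~ win_pct + net_rating, data = train) %>%
  step_normalize(all_numeric_predictors())

wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(linear_reg() %>% set_engine("lm"))

fitted_wf <- fit(wf, data = train)

# Continuous predictions rarely tie, so sorting them gives a unique ranking.
ranked <- new_teams %>%
  bind_cols(predict(fitted_wf, new_data = new_teams)) %>%
  arrange(.pred) %>%
  mutate(predicted_rank = row_number())

print(ranked)
```

Because 1 means "won it all," the lowest predicted score ends up ranked first.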

Note: This was just a fun side project, so I didn’t go too deep with feature engineering or model tuning. But if you’re into that, doing more there would definitely improve the results.

Comparing models

I trained four different regression models, then averaged their predictions to get a final ranking. I used tidymodels, which makes everything super straightforward.

my code

Boosted Forest

Figure

Here are the results for the boosted forest model (with 1 = most likely to win). According to ChatGPT, the Thunder are the current favorites, so it’s interesting that this model had them doing the worst.

Linear Regression

Figure

You can’t do a regression problem without trying a simple lm model, right? These results also have the Thunder not doing as well, but I could get behind the Nuggets winning. Since I’m a Jazz fan, I don’t really care who wins, but I just feel in my soul that the Nuggets could make it happen.

SVM

Figure

I’m not usually an SVM fan, but I figured why not? Surprisingly, it gave the exact same order as the linear regression model. So either I did something very right… or very wrong.

Neural Network

Figure

I know “neural network” sounds intense, but with bag_mlp() from the baguette package (part of the tidymodels ecosystem), I threw together a super simple model: a bagged ensemble of single-hidden-layer networks. For more complex problems, I’d use something deeper, but for this it was a fun experiment. These results varied a bit, but overall the same teams floated to the top across all models.
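For anyone who wants to try something similar, the four model types above could be specified roughly like this. This is a sketch, not my exact setup: the engines (xgboost, kernlab, nnet), the radial-basis SVM kernel, and the default hyperparameters are all assumptions, and fit_and_predict() is a hypothetical helper name. Fitting requires the engine packages to be installed.

```r
library(tidymodels)
library(baguette)  # provides bag_mlp()

# One parsnip specification per model type, all in regression mode.
# Engines and kernel choice are assumptions for illustration.
model_specs <- list(
  boosted = boost_tree(mode = "regression") %>% set_engine("xgboost"),
  linear  = linear_reg() %>% set_engine("lm"),
  svm     = svm_rbf(mode = "regression") %>% set_engine("kernlab"),
  nnet    = bag_mlp(mode = "regression") %>% set_engine("nnet")
)

# Hypothetical helper: fit one spec with a shared recipe and
# return its predictions (a tibble with a .pred column) for new data.
fit_and_predict <- function(spec, rec, train, new_data) {
  workflow() %>%
    add_recipe(rec) %>%
    add_model(spec) %>%
    fit(data = train) %>%
    predict(new_data = new_data)
}
```

Keeping the specs in a named list makes it easy to loop over them with the same recipe, so every model sees identical preprocessing.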

Final Predictions

So I know I could’ve stacked these models or tuned them more to get a super powerful final prediction, but I wanted the results fast, before any more games were played. So I just used dplyr to combine the predictions into one table and then took the average prediction for each team.
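The combine-and-average step can be sketched like this. The teams and prediction values below are made up purely for illustration, and the data frame names are placeholders.

```r
library(dplyr)

# Made-up predictions from each model for three placeholder teams.
teams   <- c("Team A", "Team B", "Team C")
boosted <- data.frame(team = teams, .pred = c(2.0, 4.5, 6.0))
linear  <- data.frame(team = teams, .pred = c(3.0, 3.5, 6.5))
svm     <- data.frame(team = teams, .pred = c(2.5, 4.0, 7.0))
nnet    <- data.frame(team = teams, .pred = c(2.5, 4.0, 6.5))

# Stack the four tables, average per team, and rank (lowest = best).
final <- bind_rows(
  boosted = boosted, linear = linear, svm = svm, nnet = nnet,
  .id = "model"
) %>%
  group_by(team) %>%
  summarise(avg_pred = mean(.pred), .groups = "drop") %>%
  arrange(avg_pred) %>%
  mutate(final_rank = row_number())

print(final)
```

A simple mean weights every model equally, which is the quick option; stacking would instead learn those weights from the data.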

Here are the final results:

Figure

Once I had these rankings I could use them to complete my bracket to look like this:

Figure
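Filling in a bracket from the rankings boils down to "the team with the better (lower) average score advances." Here is a minimal base R sketch of that rule. The teams, scores, and matchup order are made up, and pick_winner() and play_round() are hypothetical helper names.

```r
# Made-up average predicted scores, listed in bracket order so that
# adjacent teams play each other (A vs B, C vs D, ...).
avg_pred <- c(
  "Team A" = 2.0, "Team B" = 5.5,
  "Team C" = 3.0, "Team D" = 4.0,
  "Team E" = 6.0, "Team F" = 1.5,
  "Team G" = 7.0, "Team H" = 3.5
)

# The team with the lower predicted score advances.
pick_winner <- function(a, b, scores) {
  if (scores[[a]] <= scores[[b]]) a else b
}

# Play every adjacent pairing in a round and return the winners.
play_round <- function(teams, scores) {
  first <- seq(1, length(teams), by = 2)
  vapply(
    first,
    function(i) pick_winner(teams[i], teams[i + 1], scores),
    character(1)
  )
}

conference_finalists <- play_round(names(avg_pred), avg_pred)
finalists            <- play_round(conference_finalists, avg_pred)
champion             <- play_round(finalists, avg_pred)

print(champion)
```

This rule is deterministic, so the top-ranked team always wins the bracket; it ignores home court and series length entirely.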

I’ll be honest, I’m not sure how much faith I have in these results, but it’s always fun to mess around with data. It will be fun to see if I get any of my predictions right from this simple machine learning project. I’ve never considered myself a Cavaliers fan, but I have to cheer for them now! Overall, if I did this again, I would collect more data, do more feature engineering, and tune my models. I think there is definitely potential with this setup and the models I selected if I went deeper into this project.

As always, thanks for reading :) I’ll post the results once the playoffs are over!