
September 05, 2009

Correlation Between Prior and Current Year Performance

First off, quick blog note: I'm going to move all of the technical statistical stuff after the "jump". Before the jump, I will try to summarize the statistical finding in layman's terms. After the jump, I'll put all of the stats work. Onward.

Today's Question: What is the correlation, if any, between a current year's performance and a prior year's performance? How good a predictor is the prior year? If the correlation were 100%, the prior year's performance would exactly predict the current year's. If it were 0%, there would be absolutely no correlation (and you should ignore all prior stats when drafting).

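The Answer: Roughly 50-65%, i.e. a correlation coefficient of about 0.50 to 0.65. Strictly speaking, the share of this year's variation that the prior year explains is the square of the correlation, so when drafting a player, only about 25 to 42% of the variation in his performance this year can be predicted from the previous year's stats. This seems pretty low. There are a few relevant factors in the modern NFL that contribute to this: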


What does this mean for your draft strategy?

To get these results, I compiled a list of the top 300 players in 2008 and examined the correlation between their 2008 and 2007 performance, 2007 and 2006, and so on. Because the list is the top 300 players of 2008, the results get less reliable the further back we go. The results of the correlation tests are below, shown as 95% confidence intervals for the correlation coefficient; all are significant at the .001 level or better.


2007->2008 = 0.5029116, 0.6530814 [p-value < 2.2e-16]
2006->2007 = 0.5120332, 0.6600707 [p-value < 2.2e-16]
2005->2006 = 0.5514062, 0.6899482 [p-value < 2.2e-16]

Going back before 2005, the data won't be very accurate.
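
For anyone who wants to try this on their own numbers, here is a minimal Python sketch of a correlation with a 95% confidence interval. It assumes a Pearson correlation with a Fisher z-transform interval, which is what R's cor.test reports by default; the post doesn't say exactly which test was run. The function name and the toy point totals are made up for illustration and are not from the real data set.

```python
import math
from statistics import NormalDist


def pearson_with_ci(x, y, confidence=0.95):
    """Pearson correlation of x and y with a Fisher z confidence interval.

    Assumes len(x) == len(y) > 3 and that the data are not perfectly
    correlated (atanh is undefined at r = +/-1).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)

    # Fisher transform: atanh(r) is roughly normal with SE 1 / sqrt(n - 3).
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + confidence / 2) / math.sqrt(n - 3)
    return r, (math.tanh(z - half_width), math.tanh(z + half_width))


# Hypothetical season totals for five players (prior year, current year).
prior_year = [1, 2, 3, 4, 5]
current_year = [2, 1, 4, 3, 5]

r, (low, high) = pearson_with_ci(prior_year, current_year)
print(f"r = {r:.3f}, 95% CI = ({low:.3f}, {high:.3f}), r^2 = {r * r:.3f}")
```

The last value printed, r squared, is the share of current-year variation explained by the prior year.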

Posted by haydenth at 09:41 AM | Comments (1)

September 01, 2009

Taking Requests

I'm about to post the 2001-2008 data for download on here (I think NU gives me space to share files). Is there a format that you guys (i.e. stats junkies) would prefer to have the files in? Currently, mine are giant .CSV files or MySQL dump files. I can post both, or something new if you request it.

The data set contains all player/game/team data for 2001 to 2009 and includes all pre-season, regular season, and post-season games.

Posted by haydenth at 02:18 PM | Comments (2)

Scoring Rules, Comments, and Maybe a Book?

I've had a couple of questions asking me how I calculate the fantasy points. Here is the scale that I've been using (I hope this is basically the de facto standard for scoring):

Do other rulesets vary from this widely? If so, please feed my comments.

On to other things: I've re-configured the blog to allow comments to be posted without my approval. Approval was required by default. I can't enable anonymous comments, though. It must be a university-wide blog restriction, but I'm not sure.

Finally, I've got so many thoughts in my head about ways we can crunch these numbers. I've been throwing around the idea of putting a book together. Something boring with a title like "The Analytics of Fantasy Football". Maybe in the range of 200-300 pages. It's just a thought right now, but it might be nice to try to put something together over the course of the season.

Posted by haydenth at 01:59 PM | Comments (0)

Home Field Advantage

Here's a fun one. What if we looked at all the scores of NFL games since 2001 and compared the score of the home team versus the away team? Does being the home team give you an advantage? Short answer: Yes! On average, the home team scores about 2.5 points more than the away team.

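In terms of stats, we can use a t-test (t.test in R) to compare the home scores versus the away scores. A z-test would require knowing the population variance, and I'm guessing the scores aren't exactly normally distributed anyway; with this many games, the t-test holds up fine and the two tests would give nearly identical results.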

A basic analysis using R (just google the letter R) reveals:

	Welch Two Sample t-test
t = 9.2216, df = 5357.6, p-value < 2.2e-16
95 percent confidence interval:
 1.978520 3.046853 
sample estimates:
mean of x mean of y 
 21.81978  19.30709 

Our 95% confidence interval for the home advantage is 1.98 to 3.05 points per game. Over the course of a whole season, I can see how having more of your active players in home games could lend a small advantage.
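
If you don't have R handy, the same kind of comparison can be sketched in plain Python. This is a rough illustration of Welch's two-sample t-test, not the exact code behind the numbers above: the function name and the toy scores are made up, and the confidence interval uses a normal quantile in place of the t quantile, which is only a reasonable shortcut when the degrees of freedom are large (as they are with thousands of games).

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_test(x, y, confidence=0.95):
    """Welch's two-sample t statistic, degrees of freedom, and an
    approximate confidence interval for mean(x) - mean(y)."""
    nx, ny = len(x), len(y)
    # Squared standard error of each sample mean (sample variance / n).
    vx, vy = variance(x) / nx, variance(y) / ny
    diff = mean(x) - mean(y)
    se = math.sqrt(vx + vy)
    t = diff / se
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    # Normal quantile as a large-sample stand-in for the t quantile.
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    return t, df, (diff - crit * se, diff + crit * se)


# Hypothetical scores, for illustration only.
home_scores = [24, 17, 31, 20, 27, 13]
away_scores = [21, 14, 24, 17, 20, 10]

t, df, (low, high) = welch_t_test(home_scores, away_scores)
print(f"t = {t:.3f}, df = {df:.1f}, approx. 95% CI = ({low:.2f}, {high:.2f})")
```

With only a handful of games, the interval should use the t quantile instead; the normal shortcut is there to keep the sketch free of third-party libraries.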

This leads to a different question. When drafting your team, it seems like it would be advantageous to obtain players with alternating home games. For example, you could draft two quarterbacks who play their home games on alternating weeks. More on this coming shortly.

Update: Thinking quickly about it, I realized that Super Bowl games don't have much of a home field advantage, since they are usually played in a neutral third-party city. With those games removed, our numbers change slightly but trivially: {1.98, 3.06}

Posted by haydenth at 12:02 PM | Comments (0)

Can't hold me back!

Apparently, the University doesn't turn off your mBlog permissions after you have graduated. This is excellent news (until someone reads this blog and shuts me off). I really don't want to have to move my layout and everything to a new site. Therefore, I'm going to continue to blog from here. If I disappear, though, you can always find me on the Fantasy Football Mailing List. Also, I'm a doctoral student at Northwestern University now, so if you're ever in Chicago - shoot me an email and we can hang out.

Anyway, here's a fun bit of fantasy data. The top 10 single-game performances for any player since 2001:


+-----------+-----------+------+------------+--------+
| firstname | lastname  | tid  | gid        | points |
+-----------+-----------+------+------------+--------+
| Clinton   | Portis    | DEN  | 2003120711 |  51.80 |
| Tom       | Brady     | NE   | 2007102104 |  47.80 |
| Adrian    | Peterson  | MIN  | 2007110404 |  47.60 |
| Peyton    | Manning   | IND  | 2003092806 |  46.47 |
| Donovan   | McNabb    | PHI  | 2004120512 |  45.47 |
| Carson    | Palmer    | CIN  | 2007091601 |  45.37 |
| Shaun     | Alexander | SEA  | 2001111113 |  44.60 |
| Peyton    | Manning   | IND  | 2004112500 |  43.87 |
| LaDainian | Tomlinson | SD   | 2007101410 |  43.80 |
| Peyton    | Manning   | IND  | 2004103103 |  43.73 |
+-----------+-----------+------+------------+--------+

If you need the date, it's contained in the game ID: the first eight digits are the date as YYYYMMDD. Kudos to Clinton Portis for holding the record. Scores may differ under your own scoring system.
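
Pulling the date out of a game ID can be sketched in a few lines of Python. This assumes the first eight digits of the gid are the date as YYYYMMDD, as in the table above; the meaning of the last two digits isn't stated here, so the sketch simply ignores them. The function name is made up for illustration.

```python
from datetime import date, datetime


def game_date(gid):
    """Return the calendar date encoded in the first eight digits of a game ID."""
    return datetime.strptime(str(gid)[:8], "%Y%m%d").date()


# The record-holding game from the table above.
print(game_date(2003120711).isoformat())
```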

Posted by haydenth at 11:27 AM | Comments (2)