August 31, 2008
Update (Finally!) - Preseason Results Posted
First, I'd like to apologize for the lack of updates lately. I've been moving to my new place. I'm all moved in now with functioning Internet, so we're back in business. I'm calculating the preseason data right now so that I can export it for you. While it's doing that, let's look at the top performers (at QB, RB and WR) for the preseason! First, quarterbacks:
+---------------+------+------+----------+------+
| name          | pos  | pts  | zscore   | team |
+---------------+------+------+----------+------+
| M. Schaub     | qb   |   30 | 2.150000 | HOU  |
| D. Brees      | qb   |   30 | 2.145000 | NO   |
| J. Cutler     | qb   |   28 | 1.945000 | DEN  |
| T. Bouman     | qb   |   20 | 1.140000 | JAC  |
| B. Basanez    | qb   |   10 | 1.140000 | CAR  |
| C. Frye       | qb   |   20 | 1.135000 | SEA  |
| J. O'Sullivan | qb   |   19 | 1.035000 | SF   |
| M. Gutierrez  | qb   |   19 | 1.035000 | NE   |
| E. Manning    | qb   |   18 | 0.935000 | NYG  |
| T. Jackson    | qb   |    9 | 0.930000 | MIN  |
| D. Orlovsky   | qb   |   25 | 0.800000 | DET  |
| M. Bulger     | qb   |   16 | 0.735000 | STL  |
| J. Delhomme   | qb   |   15 | 0.630000 | CAR  |
| T. Edwards    | qb   |   15 | 0.630000 | BUF  |
| D. McNabb     | qb   |   15 | 0.630000 | PHI  |
| D. Garrard    | qb   |   22 | 0.600000 | JAC  |
| A. Rodgers    | qb   |   22 | 0.596667 | GB   |
| D. Carr       | qb   |   21 | 0.530000 | NYG  |
| A. Feeley     | qb   |    7 | 0.530000 | PHI  |
| B. Johnson    | qb   |   14 | 0.530000 | DAL  |
+---------------+------+------+----------+------+
Now, let's look at wide receivers:
+---------------+------+------+----------+------+
| name          | pos  | pts  | zscore   | team |
+---------------+------+------+----------+------+
| D. Mason      | wr   |    9 | 3.150000 | BAL  |
| V. Jackson    | wr   |    8 | 2.680000 | SD   |
| N. Burleson   | wr   |    7 | 2.200000 | SEA  |
| B. Marshall   | wr   |   13 | 1.965000 | DEN  |
| T. Wilson     | wr   |    6 | 1.730000 | CLE  |
| R. Davis      | wr   |    6 | 1.730000 | CHI  |
| S. Smith      | wr   |    6 | 1.730000 | CAR  |
| J. Walker     | wr   |    6 | 1.730000 | OAK  |
| D. Hixon      | wr   |   11 | 1.490000 | NYG  |
| G. Jennings   | wr   |   11 | 1.490000 | GB   |
| L. Evans      | wr   |   10 | 1.255000 | BUF  |
| D. Jarrett    | wr   |   10 | 1.255000 | CAR  |
| J. McCareins  | wr   |   10 | 1.255000 | TEN  |
| M. Bradley    | wr   |   10 | 1.255000 | CHI  |
| C. Johnson    | wr   |   10 | 1.255000 | DET  |
| L. Moore      | wr   |   15 | 1.253333 | NO   |
| B. McMullen   | wr   |   14 | 1.096667 | WAS  |
| K. Walter     | wr   |    9 | 1.020000 | HOU  |
| T. Williamson | wr   |    9 | 1.020000 | JAC  |
| T. Ginn       | wr   |    9 | 1.020000 | MIA  |
+---------------+------+------+----------+------+
Finally, running backs:
+---------------+------+------+----------+------+
| name          | pos  | pts  | zscore   | team |
+---------------+------+------+----------+------+
| T. Duckett    | rb   |   33 | 2.323333 | SEA  |
| D. Williams   | rb   |   19 | 1.890000 | CAR  |
| M. Barber     | rb   |   16 | 1.455000 | DAL  |
| D. Sproles    | rb   |   16 | 1.455000 | SD   |
| J. Chatman    | rb   |   23 | 1.356667 | NYJ  |
| J. Arrington  | rb   |   23 | 1.356667 | ARI  |
| K. Smith      | rb   |   21 | 1.163333 | KC   |
| A. Bradshaw   | rb   |   21 | 1.160000 | NYG  |
| C. Taylor     | rb   |   14 | 1.160000 | HOU  |
| A. Hall       | rb   |   13 | 1.015000 | DEN  |
| R. Brown      | rb   |   11 | 0.725000 | MIA  |
| A. Peterson   | rb   |   11 | 0.725000 | MIN  |
| T. Hunt       | rb   |   16 | 0.680000 | PHI  |
| L. Johnson    | rb   |   10 | 0.580000 | KC   |
| J. Harrison   | rb   |   15 | 0.580000 | CLE  |
| R. Cartwright | rb   |   10 | 0.580000 | WAS  |
| R. Williams   | rb   |   15 | 0.580000 | MIA  |
| T. Minor      | rb   |   15 | 0.580000 | STL  |
| D. Foster     | rb   |   10 | 0.580000 | SF   |
| M. Pittman    | rb   |   15 | 0.580000 | DEN  |
+---------------+------+------+----------+------+
I'm unsure how much to rely on this data. I'm tempted to use it almost exclusively because it's more recent, but it's pretty unreliable. First, it's no indicator of team success or of the team game - hell, the Lions went 4-0 in the preseason but only two Lions (D. Orlovsky and Calvin Johnson) appear on the lists above. Second, some players don't get much playing time, particularly great players who want to avoid injury. I suspect this is why you don't see Peyton Manning or Brady on the above lists.
Despite the lack of consistency, we can gain some insight from the above figures:
- Upcoming Stars - Players who perform exceptionally well in the preseason but don't have much of a history may be great draft picks or free agents in your league. They're probably undervalued in the default draft rankings.
- Strategic Signaling - Teams that spend a lot of preseason time working on the passing game may be preparing for a big season of passing-intensive offense (teams like NO, HOU and NYJ).
Posted by haydenth at 10:46 AM | Comments (0)
August 17, 2008
Books I Recommend and More on Z-Scores
First off, I'd like to point out a book that I carry around in my laptop case. I've found it tremendously helpful for working with statistics and refreshing some of my knowledge. It's an older book and I doubt it's even available on Amazon anymore. You may have to try the library.
Aigner, Dennis J. 1968. Principles of statistical decision making. Macmillan decision series. New York: Macmillan.
If you want to find it at the library, here's the WorldCat link. Also, as of today, there are four used copies on Amazon.com, for $2. Really, this book is a gem - it contains a concise summary of the foundations of information economics and statistical decision theory. There's a bit of calculus and linear algebra, but if you can get through this book, you'll be a pro.
Now, let's talk about z-scores. I've received a few emails asking how they work. Here's the z-score equation (stolen from Wikipedia):

$$z = \frac{X - \mu}{\sigma}$$
The z value is X (the total points for a given player in a given game) minus the mean μ (of all players at the same position against the same team), with that difference divided by the standard deviation σ (of all players at the same position against the same team).
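Here's a minimal Python sketch of that calculation, in case it helps to see it spelled out. It is not my actual database routine: the row keys (`pos`, `opponent`, `points`) are made-up names for illustration, and I'm assuming the population standard deviation here.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscores(rows):
    """Attach a z-score to each game row.

    rows: list of dicts with the (hypothetical) keys
    'player', 'pos', 'opponent' and 'points'.
    Each row is compared with all rows that share its position and opponent.
    """
    # Collect the points scored by every player at a position against a team.
    groups = defaultdict(list)
    for r in rows:
        groups[(r["pos"], r["opponent"])].append(r["points"])

    # Mean and (population) standard deviation per (position, opponent) group.
    stats = {key: (mean(pts), pstdev(pts)) for key, pts in groups.items()}

    scored = []
    for r in rows:
        mu, sigma = stats[(r["pos"], r["opponent"])]
        # Guard against a zero spread (e.g. a group with a single game).
        z = (r["points"] - mu) / sigma if sigma > 0 else 0.0
        scored.append({**r, "z": z})
    return scored


# Made-up numbers, purely to show the shape of the input.
example = zscores([
    {"player": "QB A", "pos": "qb", "opponent": "DEN", "points": 10},
    {"player": "QB B", "pos": "qb", "opponent": "DEN", "points": 20},
    {"player": "QB C", "pos": "qb", "opponent": "DEN", "points": 30},
])
```

Averaging a player's z-scores per opponent would then give a grid like the one for Brees.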
If your data is normally distributed, then about 68% of your z-scores will fall in the range of -1 to +1 (and about 95% between -2 and +2), with negative values being below average and positive values being above average. The fantasy football data set, unfortunately, isn't exactly normally distributed. However, it's pretty close and the majority of players fall in the -1 to +1 range. For example, here's the grid of Drew Brees' performance z-scores:
+----------+--------+------------+----------+--------+-------+
| name     | id     | pid        | opponent | score  | games |
+----------+--------+------------+----------+--------+-------+
| D. Brees | 189811 | 00-0020531 | ARI      |  0.202 |     2 |
| D. Brees | 189812 | 00-0020531 | ATL      |  0.631 |     5 |
| D. Brees | 189813 | 00-0020531 | BAL      |  0.494 |     2 |
| D. Brees | 189814 | 00-0020531 | BUF      |  1.099 |     2 |
| D. Brees | 189815 | 00-0020531 | CAR      |  0.134 |     5 |
| D. Brees | 189816 | 00-0020531 | CHI      |  0.338 |     2 |
| D. Brees | 189817 | 00-0020531 | CIN      |  0.823 |     2 |
| D. Brees | 189818 | 00-0020531 | CLE      | -0.338 |     3 |
| D. Brees | 189819 | 00-0020531 | DAL      |  1.768 |     2 |
| D. Brees | 189820 | 00-0020531 | DEN      | -0.600 |     7 |
| D. Brees | 189822 | 00-0020531 | GB       |  1.103 |     2 |
| D. Brees | 189823 | 00-0020531 | HOU      |  0.033 |     3 |
| D. Brees | 189824 | 00-0020531 | IND      |  0.277 |     3 |
| D. Brees | 189825 | 00-0020531 | JAC      |  2.003 |     3 |
| D. Brees | 189826 | 00-0020531 | KC       |  0.687 |     7 |
| D. Brees | 189827 | 00-0020531 | MIA      | -0.359 |     3 |
| D. Brees | 189829 | 00-0020531 | NE       |  0.553 |     2 |
| D. Brees | 189830 | 00-0020531 | NO       |  2.042 |     1 |
| D. Brees | 189831 | 00-0020531 | NYG      |  0.171 |     2 |
| D. Brees | 189832 | 00-0020531 | NYJ      |  0.010 |     3 |
| D. Brees | 189833 | 00-0020531 | OAK      |  0.435 |     8 |
| D. Brees | 189834 | 00-0020531 | PHI      |  0.731 |     3 |
| D. Brees | 189835 | 00-0020531 | PIT      |  0.323 |     3 |
| D. Brees | 189837 | 00-0020531 | SEA      |  1.533 |     2 |
| D. Brees | 189838 | 00-0020531 | SF       |  1.197 |     3 |
| D. Brees | 189839 | 00-0020531 | STL      | -0.210 |     2 |
| D. Brees | 189840 | 00-0020531 | TB       |  0.992 |     5 |
| D. Brees | 189841 | 00-0020531 | TEN      | -0.026 |     2 |
| D. Brees | 189842 | 00-0020531 | WAS      | -0.892 |     2 |
+----------+--------+------------+----------+--------+-------+
You can see that some of the scores are greater than 1, but most still fall in the -1 to +1 range. What do these scores tell us? Historical performance indicates that he does better than other quarterbacks against quite a few teams (especially Jacksonville). I probably wouldn't start him in a game against Denver, though: his performance there is below average (-0.60) and he's played them 7 times.
So, what can we do with these scores? How can we make this useful? What I did was load the 2008 schedule into the database and build a grid of every game that every player is playing. From that, we can denormalize (go back to the X value by multiplying the z-score by the standard deviation and adding the mean). Here's what Drew Brees' predictions look like:
+----------+------+------------+-------+--------+------+
| name     | id   | pid        | gid   | points | team |
+----------+------+------------+-------+--------+------+
| D. Brees | 5409 | 00-0020531 | 29534 |   8.83 | TB   |
| D. Brees | 5614 | 00-0020531 | 29551 |  10.85 | WAS  |
| D. Brees | 5782 | 00-0020531 | 29568 |  10.42 | DEN  |
| D. Brees | 5913 | 00-0020531 | 29580 |  11.72 | SF   |
| D. Brees | 6126 | 00-0020531 | 29602 |  11.74 | MIN  |
| D. Brees | 6182 | 00-0020531 | 29607 |  11.58 | OAK  |
| D. Brees | 6292 | 00-0020531 | 29618 |  10.14 | CAR  |
| D. Brees | 6482 | 00-0020531 | 29637 |  12.21 | SD   |
| D. Brees | 6707 | 00-0020531 | 29659 |  11.55 | ATL  |
| D. Brees | 6873 | 00-0020531 | 29679 |  12.01 | KC   |
| D. Brees | 7140 | 00-0020531 | 29703 |  11.46 | GB   |
| D. Brees | 7208 | 00-0020531 | 29713 |   8.83 | TB   |
| D. Brees | 7382 | 00-0020531 | 29726 |  11.55 | ATL  |
| D. Brees | 7465 | 00-0020531 | 29736 |   9.68 | CHI  |
| D. Brees | 7663 | 00-0020531 | 29755 |  12.75 | DET  |
| D. Brees | 7882 | 00-0020531 | 29776 |  10.14 | CAR  |
+----------+------+------------+-------+--------+------+
A couple of interesting observations from the predictions: first, even though Brees has no history against the Lions, he's still predicted to score 12.75 points because the average number of points scored against the Lions is higher than against other teams. On the other hand, Brees performs comparatively well against Tampa Bay (z-score 0.992) but is only predicted to score 8.83 points. However, a little bit of human intuition is needed in some of these situations - Tampa Bay used to have a great defense. Their defense this year isn't as great as in years past. This kind of interpretation is what makes Fantasy Football fun.
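For the curious, going from a z-score back to points can be sketched in a few lines of Python. Treat this as an illustration under assumptions rather than my exact routine: the function names, the opponent labels and every number below are made up, and falling back to a z-score of 0 (the plain average against that opponent) when a player has no history is just one plausible choice.

```python
def predict_points(z, opp_mean, opp_std):
    """Invert z = (X - mean) / std to recover a points value X."""
    return opp_mean + z * opp_std


def predict_schedule(player_z, opp_stats, schedule, default_z=0.0):
    """Predict points for each opponent on a schedule.

    player_z:  {opponent: the player's historical z-score vs. that opponent}
    opp_stats: {opponent: (mean, std) of points scored against that opponent
                by players at the same position}
    default_z: z-score assumed when there is no history (an assumption here).
    """
    predictions = []
    for opp in schedule:
        mu, sigma = opp_stats[opp]
        z = player_z.get(opp, default_z)
        predictions.append((opp, round(predict_points(z, mu, sigma), 2)))
    return predictions


# Hypothetical opponents and numbers, for illustration only.
opp_stats = {"AAA": (10.0, 4.0), "BBB": (12.0, 5.0)}
player_z = {"AAA": 0.5}  # no history against BBB
predictions = predict_schedule(player_z, opp_stats, ["AAA", "BBB"])
```

With a fallback like this, a player with no history against a team simply inherits the average points scored against that team, which is the kind of behavior the Lions row shows.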
What's the end game here? Build the ultimate roster based on all players and their upcoming schedules. This is coming soon (before my draft day, August 31st, but probably next week!).
Posted by haydenth at 04:03 PM | Comments (0)
August 16, 2008
Bayes' Theorem Braindump
Disclosure: this post is a braindump. Read at your own discretion!
Blog reader Kevin asked me whether I've used Bayes' Theorem in any of my calculations. I did not expect this question (and I should have paid more attention to Bayes' Rule in SI680). So I went back and reviewed my knowledge. If you're not familiar with Bayes', I'd spend about 45 minutes working through this intuitive explanation of Bayes. This question got the cogs in my brain turning. I just spent about an hour walking around Ann Arbor, trying to figure out a way to apply Bayes' to my dataset. I remain stumped. Here's my line of thinking though (not sure whether it's right or not):
1. For Bayes' to work, we need prior probabilities.
2. Then we need some conditional probabilities.
3. We can then generate posterior probabilities - revised priors.
Super. Unfortunately, none of my data is currently normalized as probabilities. Rather, my game-by-game data has been normalized as z-scores, which aren't really probabilities. We do have the predicted player-v-team score matrix - i.e. how Jon Kitna's past performance compares to other quarterbacks, re-normalized to generate a prediction. Here's Jon Kitna's prediction entry:
+----------+------+------------+-------+--------+------+
| name     | id   | pid        | gid   | points | team |
+----------+------+------------+-------+--------+------+
| J. Kitna | 5383 | 00-0009311 | 29529 |  11.45 | ATL  |
| J. Kitna | 5576 | 00-0009311 | 29546 |  11.34 | GB   |
| J. Kitna | 5799 | 00-0009311 | 29569 |  11.62 | SF   |
| J. Kitna | 6023 | 00-0009311 | 29591 |   9.70 | CHI  |
| J. Kitna | 6162 | 00-0009311 | 29606 |  11.86 | MIN  |
| J. Kitna | 6354 | 00-0009311 | 29625 |  13.10 | HOU  |
| J. Kitna | 6497 | 00-0009311 | 29634 |  10.81 | WAS  |
| J. Kitna | 6559 | 00-0009311 | 29645 |   9.70 | CHI  |
| J. Kitna | 6762 | 00-0009311 | 29661 |  10.26 | JAC  |
| J. Kitna | 6837 | 00-0009311 | 29674 |  10.30 | CAR  |
| J. Kitna | 7027 | 00-0009311 | 29693 |   8.69 | TB   |
| J. Kitna | 7146 | 00-0009311 | 29704 |  12.25 | TEN  |
| J. Kitna | 7342 | 00-0009311 | 29723 |  11.86 | MIN  |
| J. Kitna | 7545 | 00-0009311 | 29742 |  10.58 | IND  |
| J. Kitna | 7664 | 00-0009311 | 29755 |  13.16 | NO   |
| J. Kitna | 7829 | 00-0009311 | 29772 |  11.34 | GB   |
+----------+------+------------+-------+--------+------+
The predictions seem to be quite low and don't have enough variance. I'm going to have to look into that. Back to Bayes' - each of these predictions has a probability associated with it (or at least a range within 95% confidence intervals). Can I then use Bayes' to generate a more accurate prediction of how Kitna does against Green Bay or New Orleans? I don't think so, but I'm not sure. More research needed.
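To make the three steps from my list concrete, here's a toy Python sketch of the mechanics of Bayes' rule: priors times conditional probabilities, renormalized into posteriors. It says nothing about how to get real probabilities out of my z-scores, which is the part I'm stuck on. The hypothesis names and every probability below are invented for illustration.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' rule over a set of mutually exclusive hypotheses.

    priors:      {hypothesis: P(H)}, summing to 1
    likelihoods: {hypothesis: P(E | H)} for the observed evidence E
    Returns      {hypothesis: P(H | E)}
    """
    # Joint probability P(H) * P(E | H) for each hypothesis.
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    # Total probability of the evidence, P(E).
    evidence = sum(joint.values())
    return {h: joint[h] / evidence for h in joint}


# Made-up numbers: before the game we think a QB is equally likely to be
# above or below average against a given defense. We then see a strong game,
# which we assume is three times as likely from an above-average QB.
priors = {"above_avg": 0.5, "below_avg": 0.5}
likelihoods = {"above_avg": 0.6, "below_avg": 0.2}
revised = posterior(priors, likelihoods)  # above_avg is now more probable
```

The revised values can then serve as the priors for the next game, which is the "revised priors" idea in step 3.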
If you know how we can apply Bayesian rationality to these statistics, please reply and let me know (or at least point me in the right direction).
Posted by haydenth at 10:15 PM | Comments (0)
August 15, 2008
Preseason Week 1 Stats Available and Other File-Related Errata
It's a good thing we have the preseason because I needed some time to test my export routines for 2008. I think I finally got it hashed out. NFL.com didn't change anything, but my import routines weren't set up to handle preseason data very well. You can download the new file (gzip-compressed, .txt.gz) below. I am going to post a file after every week this season, usually on Tuesday morning.
http://www-personal.umich.edu/~haydenth/2008_PRE1.txt.gz
A couple of notes you stat fans should be aware of. First, I modified the file format a bit (the original historical files were broken out between games, players and stats). This is just one file, and the player names are included in each row of the table. If I'm missing anything from the file, please let me know and I can include it. Hopefully, the file is self-explanatory. If you use the R statistical program, you can use the following command to load the weekly data sets (after decompressing the download). If you've never used R for stats, I highly recommend it.
nfl <- read.csv(file="2008_PRE1.txt", header=TRUE, sep=",")
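If you'd rather not use R, a rough Python equivalent might look like the sketch below. It only assumes what the R command already implies: a comma-separated file with a header row. The function name `load_weekly_stats` is made up, and I'm not assuming any particular column names.

```python
import csv
import gzip


def load_weekly_stats(path):
    """Read a weekly stats file into a list of dicts, one per row.

    Works on the downloaded .gz file directly or on the decompressed .txt.
    Assumes comma-separated values with a header row.
    """
    # Pick a gzip-aware opener when the file is still compressed.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, mode="rt", newline="") as fh:
        return list(csv.DictReader(fh))
```

Calling `load_weekly_stats("2008_PRE1.txt.gz")` would then return every row keyed by whatever column names the header line contains; all values come back as strings, so convert the numeric columns yourself.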
For those of you who asked me for the mysqldump file, I'm still working on it. I had originally mixed my NFL data in with other tables in the same database, so I've been in the process of breaking out my NFL stats into its own database. As soon as I get that done, you can bet that I'll upload a file.
Posted by haydenth at 08:52 PM | Comments (0)
August 08, 2008
2008 Preseason Week 1
I'd like to factor this year's preseason information (which I'll make available on August 12th or 13th) into my equations, but I'll first need to figure out a way to weight it accordingly. There were a couple of interesting results from last night's games. Here's the NYJ passing game:
Passing      CP/AT  YDS  TD  INT
B. Ratliff   14/20  252   2    0
K. Clemens     4/6   31   0    0

Receiving     REC  YDS  TD  LG
D. Clowney      4  163   2  71
C. Stuckey      3   44   0  22
W. Wright       2   20   0  14
J. Cotchery     1   16   0  16
M. Smith        2   12   0   6
J. Caulcrick    2    8   0   5
Ratliff was 14/20 with 252 yards and 2 touchdowns for the Jets. Also, the Jets just picked up Brett Favre from the Packers (+1 win for the Lions, maybe?). It should be interesting to see how this works out; for now, I am going to avoid both Ratliff and Favre in my draft picks. I am liking wide receivers from the Jets. It looks like their offense is really tooling up to generate a lot of receiving yards and touchdowns. David Clowney (NYJ WR) had a pretty big receiving game - I think I'm going to keep an eye on him.
Posted by haydenth at 08:34 AM | Comments (0)
August 06, 2008
Stats Available
It appears that most of the visitors to this blog are looking for fantasy football statistics (from previous years). Well, if that's what you're looking for, then you should be able to download them from my university web hosting account:
http://www-personal.umich.edu/~haydenth/nfl_2001_2007_aggregate_stats.tar.gz
It's all of the player, game, and individual game result data from 2001-07. It's straight from NFL.com, so take it for what it's worth. It's not nearly as complete as some of the serious data sets out there (but I'm giving it away for free and they're not).
I am going to put up a weekly file as well for the 2008 season. It will probably be available on Tuesday mornings, so take note. If there's enough demand, I'll make an RSS feed for just the files.
It's a .tar.gz file and you can open it with almost any compression program these days.
Posted by haydenth at 06:44 PM | Comments (4)