July 30, 2008
Wide Receivers and Running Backs
I ran my prediction algorithm for wide receivers and running backs. Keep in mind that the points are generated from the 2007 scores using the Yahoo! fantasy football points system. Here are the wide receivers:
+---------------+----------+--------------+-------+---------+
| name          | position | current_team | pts   | games07 |
+---------------+----------+--------------+-------+---------+
| R. Wayne      | wr       | IND          | 62.26 |      16 |
| T. Holt       | wr       | STL          | 61.53 |      16 |
| T. Owens      | wr       | DAL          | 61.43 |      15 |
| D. Mason      | wr       | BAL          | 61.20 |      16 |
| P. Burress    | wr       | NYG          | 60.63 |      16 |
| R. Moss       | wr       | NE           | 60.17 |      16 |
| L. Fitzgerald | wr       | ARI          | 60.05 |      15 |
| C. Johnson    | wr       | CIN          | 59.71 |      16 |
| D. Driver     | wr       | GB           | 59.50 |      15 |
| D. Jackson    | wr       | DEN          | 59.33 |      15 |
+---------------+----------+--------------+-------+---------+
And here are the running backs:
+---------------+----------+--------------+-------+---------+
| name          | position | current_team | pts   | games07 |
+---------------+----------+--------------+-------+---------+
| L. Tomlinson  | rb       | SD           | 92.16 |      16 |
| T. Jones      | rb       | NYJ          | 88.99 |      16 |
| K. Faulk      | rb       | NE           | 88.95 |      16 |
| F. Taylor     | rb       | JAC          | 88.71 |      15 |
| S. Young      | rb       | DEN          | 87.64 |      15 |
| L. Washington | rb       | NYJ          | 87.13 |      16 |
| D. Sproles    | rb       | SD           | 87.11 |      15 |
| M. Smith      | rb       | NYJ          | 86.79 |      16 |
| M. Jones-Drew | rb       | JAC          | 86.42 |      15 |
| E. James      | rb       | ARI          | 86.34 |      16 |
+---------------+----------+--------------+-------+---------+
Something I'm noticing in these results: there are players who were on a roster all season but didn't necessarily play or score any points in every game. For example, D. Sproles played 4 or 5 games for SD and had great games (one against Detroit, actually). Those games inflated his overall score a bit. I think I'm going to have to make some adjustments for this effect. I want players who not only dress for every game but also play.
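Here is a minimal sketch of one possible adjustment, written in Python for illustration. It assumes each player has a list of per-game fantasy points with None marking games where he dressed but recorded no stats. The function name, the None convention and the 12-game threshold are all assumptions, not the adjustment actually made to the algorithm.

```python
def played_regularly(game_points, min_games_played=12):
    """Return True if the player recorded stats in enough games.

    game_points: per-game fantasy points for one season, with None for
    games where the player was on the roster but did not play.
    min_games_played: arbitrary cutoff chosen for this example.
    """
    played = [p for p in game_points if p is not None]
    return len(played) >= min_games_played


# Hypothetical seasons: a 16-game starter and a backup with 5 big games.
starter = [10.0] * 16
backup = [None] * 11 + [20.0, 25.0, 18.0, 22.0, 30.0]

print(played_regularly(starter))  # True
print(played_regularly(backup))   # False
```

A hard cutoff like this throws away the backup's good games entirely. Weighting each player's average by games actually played would be a softer alternative.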
Posted by haydenth at 09:03 PM | Comments (2)
Making Statistics Available - Copyright Issues?
I'm considering making the statistics that I scraped from the NFL's website available in .csv files on this blog. This way, anyone who had the problem that I had can download 2001-07 game data. However, I'm concerned that the NFL or Stats.com might send their attorneys after me. Does anyone know about the copyright issues, as they relate to sports statistics?
Posted by haydenth at 09:11 AM | Comments (1)
July 28, 2008
Jon Kitna - a Respectable Fantasy Football Pick?
I uploaded the 2008 schedule into my database and denormalized all the player-vs-team data. The denormalization takes as input the player's normalized value (usually between -1 and 1) and the averages for all other players. For quarterbacks, here are the results of my early testing:
+---------------+-------------+-----------+--------------+---------+
| name          | sum(points) | last_seen | current_team | games07 |
+---------------+-------------+-----------+--------------+---------+
| P. Manning    |      183.30 |      2007 | IND          |      16 |
| B. Favre      |      181.93 |      2007 | GB           |      16 |
| J. Kitna      |      178.02 |      2007 | DET          |      16 |
| D. Anderson   |      175.68 |      2007 | CLE          |      16 |
| D. Brees      |      175.46 |      2007 | NO           |      16 |
| T. Romo       |      174.66 |      2007 | DAL          |      16 |
| J. Cutler     |      173.53 |      2007 | DEN          |      16 |
| E. Manning    |      173.43 |      2007 | NYG          |      16 |
| T. Brady      |      173.22 |      2007 | NE           |      16 |
| M. Hasselbeck |      172.78 |      2007 | SEA          |      16 |
+---------------+-------------+-----------+--------------+---------+
Yes folks, you're reading that right. Jon Kitna is projected to be third in total season points. At first, this sounds crazy - the Lions are terrible. Intuitively, though, it makes some sense. The Lions have an easy schedule, especially compared to the Patriots or the Giants. While Kitna doesn't have a great record against two NFC North teams {CHI,MIN}, he actually has a respectable record against other teams. Plus, the Lions don't have much else at the quarterback position (Drew Stanton, seriously?), so he sees a lot of playing time.
With that said, statistically speaking, Peyton Manning should absolutely be your #1 pick in the draft. The Colts have a moderately hard schedule and he's still #1 in the rankings by a considerable amount.
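The de-normalization step can be sketched roughly as follows, in Python for illustration. This assumes it is simply the inverse of the z-score, points = mean + z * standard deviation, applied to each scheduled opponent and summed over the season. The function names, the four-game schedule and the mean and standard deviation values are placeholders, not the real inputs or the real script.

```python
def denormalize(z, mu, sigma):
    """Invert z = (x - mu) / sigma to get back to fantasy points."""
    return mu + z * sigma


def season_projection(schedule, z_by_opponent, mu, sigma, default_z=0.0):
    """Sum projected points over a schedule of opponents.

    Opponents with no history fall back to default_z, i.e. an
    average game for the position.
    """
    return sum(
        denormalize(z_by_opponent.get(opponent, default_z), mu, sigma)
        for opponent in schedule
    )


# Illustrative inputs only: a made-up four-game schedule and placeholder
# position-wide mean and standard deviation.
schedule = ["CHI", "MIN", "GB", "CHI"]
z_by_opponent = {"CHI": 0.516, "MIN": -0.064, "GB": -0.062}
total = season_projection(schedule, z_by_opponent, mu=11.0, sigma=9.0)
print(round(total, 2))
```

Because the sum runs over the schedule, a player who faces a favorable opponent twice gets that opponent's score counted twice, which is how an easy schedule can lift a projection.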
Posted by haydenth at 11:06 PM | Comments (0)
Z-Scores out of Whack
Sifting through my data this afternoon (and thinking about it yesterday), I noticed that the mean values of some of my z-score tables were not 0. I went through my calculation scripts and found that my primary calculation was not picking up the mean correctly. I adjusted and recalculated, and now the means come out to 0 as they should. I'm going to go through and adjust my past few blog entries to reflect this.
A couple of important things to note when calculating the z-score yourself:
- The z-score is meant to standardize data (i.e. rescale it so it can be compared against the standard normal distribution; it does not by itself make the data normally distributed)
- The mean of all your z-scores should be zero
- The standard deviation of all your z-scores should be 1
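These checks are easy to automate. The sketch below, in Python for illustration, standardizes a small sample and confirms the mean is 0 and the standard deviation is 1. It also shows how a wrong mean, the kind of bug described above, shows up as a nonzero average z-score. The sample values and the size of the error are for illustration only, and the population standard deviation (pstdev) is used throughout. Dividing by the sample standard deviation instead would leave the result slightly off from 1.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return the z-score of every value against the whole list."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Illustrative sample of per-game fantasy points.
points = [27, 21, 21, 20, 18, 0, 0, -1, -1, -1]

z = standardize(points)
print(abs(mean(z)) < 1e-9)        # mean of the z-scores is 0
print(abs(pstdev(z) - 1) < 1e-9)  # standard deviation is 1

# Simulate the bug: a mean that is off by 2 points shifts every z-score.
wrong_mu = mean(points) + 2
sigma = pstdev(points)
shifted = [(v - wrong_mu) / sigma for v in points]
print(round(mean(shifted), 3))    # no longer 0
```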
Why do I think the z-score is particularly valuable in this situation? We can normalize the game-by-game performance of players. Furthermore, we can normalize it to a single player-against-team metric (see my recent post about Kitna). From this, we can load in the 2008 schedules, de-normalize and see who might have a good season.
Posted by haydenth at 07:45 PM | Comments (0)
July 26, 2008
Jon Kitna versus the World
In a previous post, I wrote about the statistical z-score that we can generate on a game-by-game basis for each player. This score compares them to a population, which includes all other players at the same position in all other games (from 2001-07). With this score, we can take it a step further and compute each player's average z-score against each team they play. For example, here is the result for the Lions' QB Jon Kitna:
+----------+----------+--------+-------+
| name     | opponent | score  | games |
+----------+----------+--------+-------+
| J. Kitna | CAR      |  1.470 |     1 |
| J. Kitna | PHI      |  0.978 |     2 |
| J. Kitna | DAL      |  0.891 |     2 |
| J. Kitna | HOU      |  0.872 |     2 |
| J. Kitna | OAK      |  0.811 |     2 |
| J. Kitna | SD       |  0.619 |     3 |
| J. Kitna | DEN      |  0.610 |     2 |
| J. Kitna | CHI      |  0.516 |     5 |
| J. Kitna | PIT      |  0.482 |     6 |
| J. Kitna | SEA      |  0.340 |     2 |
| J. Kitna | SF       |  0.315 |     2 |
| J. Kitna | BAL      |  0.298 |     6 |
| J. Kitna | NYG      |  0.227 |     2 |
| J. Kitna | CLE      |  0.213 |     5 |
| J. Kitna | MIA      |  0.175 |     1 |
| J. Kitna | STL      |  0.152 |     2 |
| J. Kitna | NYJ      |  0.105 |     2 |
| J. Kitna | NE       |  0.043 |     3 |
| J. Kitna | TEN      |  0.009 |     3 |
| J. Kitna | GB       | -0.062 |     4 |
| J. Kitna | MIN      | -0.064 |     4 |
| J. Kitna | ARI      | -0.068 |     3 |
| J. Kitna | TB       | -0.200 |     2 |
| J. Kitna | ATL      | -0.256 |     2 |
| J. Kitna | IND      | -0.302 |     1 |
| J. Kitna | BUF      | -0.444 |     5 |
| J. Kitna | JAC      | -0.531 |     3 |
| J. Kitna | KC       | -0.534 |     3 |
| J. Kitna | NO       | -0.539 |     1 |
| J. Kitna | DET      | -0.549 |     2 |
| J. Kitna | WAS      | -1.229 |     1 |
| J. Kitna | CIN      | -1.401 |     0 |
+----------+----------+--------+-------+
Statistically, it looks like Kitna would be a good quarterback to have if you were an AFC team (the Lions are an NFC team). Even better, Kitna does pretty poorly against two other teams in the NFC North { GB,MIN } and does better than other quarterbacks against one { CHI }.
What about everyone else? If we restrict our search to players who have played at least 2 games against an opponent, we get:
+--------------+----------+-------+-------+
| name         | opponent | score | games |
+--------------+----------+-------+-------+
| D. Brees     | JAC      | 2.003 |     3 |
| P. Manning   | ATL      | 1.978 |     3 |
| T. Green     | CLE      | 1.776 |     3 |
| K. Collins   | TEN      | 1.724 |     3 |
| P. Manning   | NO       | 1.629 |     3 |
| T. Brady     | PIT      | 1.515 |     4 |
| D. Culpepper | NO       | 1.500 |     4 |
| P. Manning   | CIN      | 1.490 |     3 |
+--------------+----------+-------+-------+
Wow, Drew Brees seems to really provide a whooping to Jacksonville. Also, Peyton Manning is represented three times in the top 8. What if we restrict this further to 5+ games?
+------------+----------+-------+-------+
| name       | opponent | score | games |
+------------+----------+-------+-------+
| B. Favre   | CAR      | 1.297 |     5 |
| C. Palmer  | CLE      | 1.167 |     8 |
| P. Manning | BAL      | 1.152 |     5 |
| P. Manning | NE       | 1.121 |     7 |
| P. Manning | HOU      | 1.018 |    12 |
| T. Brady   | BUF      | 1.003 |    14 |
| R. Gannon  | SD       | 0.998 |     5 |
| D. Brees   | TB       | 0.992 |     5 |
+------------+----------+-------+-------+
There's Peyton Manning on the list three times again. It doesn't hurt that he's going to play all three teams {BAL, NE, HOU} this year. In my next post, I'm going to load the 2008 schedule into the system and see which quarterbacks have the most favorable schedule.
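The per-opponent averages and the minimum-games cutoffs used in this post can be sketched as a simple group-and-filter, shown here in Python for illustration. The tuple layout, the function name and the sample z-scores are assumptions made up for the example. The real numbers come from the stats database.

```python
from collections import defaultdict
from statistics import mean


def opponent_averages(game_scores, min_games=1):
    """Average z-score per (player, opponent) pair.

    game_scores: iterable of (player, opponent, z_score) tuples, one per game.
    min_games: drop pairs with fewer games than this.
    Returns (player, opponent, average, games) rows, best average first.
    """
    grouped = defaultdict(list)
    for player, opponent, z in game_scores:
        grouped[(player, opponent)].append(z)
    rows = [
        (player, opponent, round(mean(zs), 3), len(zs))
        for (player, opponent), zs in grouped.items()
        if len(zs) >= min_games
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Made-up per-game z-scores for illustration.
sample = [
    ("J. Kitna", "CHI", 0.9),
    ("J. Kitna", "CHI", 0.1),
    ("J. Kitna", "CAR", 1.5),
    ("J. Kitna", "MIN", -0.3),
    ("J. Kitna", "MIN", 0.1),
]

for row in opponent_averages(sample, min_games=2):
    print(row)
```

With min_games=2 the single CAR game drops out, which is the point of the cutoff: one great game against a team says very little about the matchup.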
Posted by haydenth at 12:14 AM | Comments (0)
July 25, 2008
The Week Score (by position)
Using my game-by-game data (all games since 2001), the first statistic I set out to determine was a week-by-week comparison of how every player compared with every other player. Using total fantasy football points (under the Yahoo! rules) as an index, I was able to generate a z-score (standard score). Mathematically, the stat works like this: z = (x - mean) / (standard deviation), where x is a player's fantasy points in a single game.
For every player in every game, we can loop through and generate the z-value. The population used for the mean and standard deviation is every other game played by other players at the same position.
For example, here are the top five players for week 5 of 2007:
+-------------+----------+---------------+------+--------+--------+
| name        | position | pop_score_pos | week | season | points |
+-------------+----------+---------------+------+--------+--------+
| T. Brady    | qb       |          1.78 |    5 |   2007 |     27 |
| P. Rivers   | qb       |          1.11 |    5 |   2007 |     21 |
| G. Frerotte | qb       |          1.11 |    5 |   2007 |     21 |
| J. Campbell | qb       |          1.00 |    5 |   2007 |     20 |
| P. Manning  | qb       |          0.78 |    5 |   2007 |     18 |
+-------------+----------+---------------+------+--------+--------+
5 rows in set (0.05 sec)
And here's the bottom five:
+---------------+----------+---------------+------+--------+--------+
| name          | position | pop_score_pos | week | season | points |
+---------------+----------+---------------+------+--------+--------+
| V. Young      | qb       |         -1.34 |    5 |   2007 |     -1 |
| B. Gradkowski | qb       |         -1.34 |    5 |   2007 |     -1 |
| B. Leftwich   | qb       |         -1.34 |    5 |   2007 |     -1 |
| B. Volek      | qb       |         -1.23 |    5 |   2007 |      0 |
| J. Kitna      | qb       |         -1.23 |    5 |   2007 |      0 |
+---------------+----------+---------------+------+--------+--------+
This stat (which I refer to as pop_score_pos) tells me on a week-by-week basis how a player performed compared to a large population of all past games. I think this might be most valuable when looking at how a player performs against a given team (which I'll cover in a future post).
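To make the calculation concrete, here is a small sketch in Python, for illustration only. The z_score function follows the standard-score definition. The quarterback mean and standard deviation below are not published figures: they are rough values back-fitted from the week 5 tables above (about 11.03 and 8.97 points), and they happen to reproduce every pop_score_pos value shown.

```python
from statistics import mean, pstdev


def z_score(points, population):
    """Standard score of one game's points against a population of games."""
    mu = mean(population)
    sigma = pstdev(population)
    return (points - mu) / sigma


# Back-fitted from the week 5 tables: an inference, not published figures.
QB_MEAN = 11.03
QB_STD = 8.97


def z_from_params(points, mu=QB_MEAN, sigma=QB_STD):
    """Standard score when the population mean and deviation are known."""
    return (points - mu) / sigma


week5_points = [27, 21, 20, 18, 0, -1]
week5_scores = [round(z_from_params(p), 2) for p in week5_points]
print(week5_scores)  # [1.78, 1.11, 1.0, 0.78, -1.23, -1.34]
```

Read this way, a quarterback needs roughly 20 fantasy points in a game to land one standard deviation above the average quarterback game.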
Posted by haydenth at 11:08 PM | Comments (0)
Scraping from NFL.com
It appears that obtaining game-by-game data from the NFL is harder than originally thought. First, there seems to be some kind of monopoly with stats.com - so the NFL doesn't provide any kind of open format for stats. Second, there are very few third party providers of NFL stats (unlike MLB stats). Third, the providers that do exist all want money or only provide season-by-season aggregate data. I want game-by-game summaries.
Therefore, we have to "screen scrape" the NFL.com box scores. To do this, we first need to get the unique game_id for every game, which the NFL provides on its "gamecenter" page. Below is the script I used to scrape all the game_ids. Basically, it's the PHP curl module plus a regular expression to pull out the game_id, which we use to create a "game" object and an entry in our MySQL database.
<?php
// Loads the season file to pull the game ids
// author: tom hayden
// config.lib.php presumably defines the slurp (curl wrapper) and game classes used below
include_once "config.lib.php";

$seasons = array("2001","2002","2003","2004","2005","2006","2007");
$games = array();

foreach ($seasons as $s) {
    // loop through each week
    // assume 17 weeks
    for ($i = 1; $i <= 17; $i++) {
        echo "loading season: $s \t week: $i \n";

        // generate the url
        $url = "http://www.nfl.com/scores?season=$s&week=Week+$i";

        // use curl to pull down the page
        $slurp = new slurp($url);

        // extract game ids; they are numeric, so match digits only
        preg_match_all("/boxscore\?game_id=(\d+)&disp/", $slurp->output, $matches);

        // feed game id matches into an array
        foreach ($matches[1] as $m) {
            // load into mysql
            $game = new game();
            $game->create($m, $i, $s);
            // keep a running list of every id found
            $games[] = $m;
        }
    }
}

// dump all collected game ids once at the end
print_r($games);
So, we basically end up with a table that looks like the sample below. In total, I ended up with 1,785 rows for all regular season games from 2001-2007.
mysql> select * from game order by rand() desc limit 10;
+-------+------+--------+
| id    | week | season |
+-------+------+--------+
| 17694 |   17 |   2001 |
| 27150 |   17 |   2004 |
| 29343 |   11 |   2007 |
| 26935 |    2 |   2004 |
| 29315 |    9 |   2007 |
| 26500 |    1 |   2003 |
| 27052 |   10 |   2004 |
| 26464 |   13 |   2003 |
| 29325 |   10 |   2007 |
| 18343 |   17 |   2002 |
+-------+------+--------+
10 rows in set (0.07 sec)
Next, we need to loop through all of the individual game box scores and scrape out all of the individual player game stats (touchdowns, yards, receptions, field goals, etc.). This is much more complicated and the code is way too long to post on a blog. In a nutshell, I used a script similar to the one above but with a ton of regular expressions and some help from Troy Wolf's class_http table-to-array script to build a massive stats table (see the sample of a random Jon Kitna stat line below).
mysql> select * from stats where pid='00-0009311' order by rand() limit 1;
+--------+-------+------+------------+-----------+-----------+----------+---------+----------+----------+----------+---------+---------+---------+---------+--------+--------+---------+----------+---------+---------+---------+---------+---------+----------+---------+----------+----------+---------+-------+--------+-------+-------+-------+--------+-------+-------+--------+---------+---------+--------+--------+
| id     | gid   | tid  | pid        | stat_type | pass_cpat | pass_yds | pass_td | pass_int | rush_att | rush_yds | rush_td | rush_lg | rec_rec | rec_yds | rec_td | rec_lg | fum_fum | fum_lost | fum_rec | fum_yds | kick_fg | kick_lg | kick_xp | kick_pts | punt_no | punt_avg | punt_i20 | punt_lg | kr_no | kr_avg | kr_td | kr_lg | pr_no | pr_avg | pr_td | pr_lg | def_ta | def_sck | def_int | def_ff | points |
+--------+-------+------+------------+-----------+-----------+----------+---------+----------+----------+----------+---------+---------+---------+---------+--------+--------+---------+----------+---------+---------+---------+---------+---------+----------+---------+----------+----------+---------+-------+--------+-------+-------+-------+--------+-------+-------+--------+---------+---------+--------+--------+
| 215469 | 28895 | DET  | 00-0009311 | Passing   | 25/40     |      342 |       2 |        1 |        0 |        0 |       0 |       0 |       0 |       0 |      0 |      0 |       0 |        0 |       0 |       0 |       0 |       0 |       0 |        0 |       0 |        0 |        0 |       0 |     0 |      0 |     0 |     0 |     0 |      0 |     0 |     0 |      0 |    0.00 |       0 |      0 |  21.40 |
+--------+-------+------+------------+-----------+-----------+----------+---------+----------+----------+----------+---------+---------+---------+---------+--------+--------+---------+----------+---------+---------+---------+---------+---------+----------+---------+----------+----------+---------+-------+--------+-------+-------+-------+--------+-------+-------+--------+---------+---------+--------+--------+
Posted by haydenth at 09:59 PM | Comments (0)
Introduction
Hello. This blog is an experiment. Here's the (brief) back story. I'm a graduate student in the School of Information @ Michigan (hence the MBlog). I've been playing fantasy football every year since I was in high school (back in 2001). These days, I play against the undergrads in my old fraternity back at Michigan State University. Every year, before the season, I try to run some statistical analysis on NFL data and hope the computer will assist me in making better picks. Last year, I was in last place.
However, since starting grad school this year, I've built up a wider repertoire of statistical analysis tools: grad-level stats, the R software package, and courses in recommender and reputation systems. I should be able to generate better figures (especially comparative figures) than I have in the past, when I had to rely on basic algebra and weightings.
Here's the plan for this year:
1. Scrape all the game-by-game boxscores for 2001-07 from nfl.com
2. Use the box scores to generate an estimated points-per-game for each player
3. Build simple z-scores (I'll explain this more later) for each player in each game.
4. Run comparative player analyses to determine which players are good against which teams.
5. Use this information to figure out who I should draft and who I should play each week.
6. Scrape the league information to determine the status of other players' teams.
Also, I plan to slap all of my findings online in a simple website.
Posted by haydenth at 09:31 PM | Comments (0)