I just discovered possible filtering errors. DictVectorizer converts categorical features (e.g. opponent=DEN) to numbers using one-hot encoding, so the opponent feature alone could become 32 binary features, each indicating whether the opponent is a certain team or not. When I ran it, the following showed up as values of certain features (I've marked the correct ones):
35: u'passlen=deep', <--- correct
36: u'passlen=intended',
37: u'passlen=left',
38: u'passlen=right',
39: u'passlen=short', <--- correct
...
43: u'side=41',
44: u'side=48',
45: u'side=EJ',
46: u'side=QB',
47: u'side=for',
48: u'side=kicks',
49: u'side=kneels,',
50: u'side=left', <--- correct
51: 'side=middle', <--- correct, but for some reason a plain string rather than unicode (no u prefix)?
52: u'side=right', <--- correct
53: u'side=sacked',
54: u'side=snap',
55: u'side=the',
56: u'side=to'
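A quick way to list the suspicious entries might be something like the sketch below. It assumes the feature names are taken from the fitted vectorizer's `feature_names_` attribute. The whitelist `EXPECTED_VALUES` and the helper `find_suspect_features` are hypothetical names made up for illustration, and the allowed values are only the ones marked correct above, so the real sets may well be larger.

```python
# Hypothetical whitelist: only the values marked correct in the listing above.
EXPECTED_VALUES = {
    "passlen": {"deep", "short"},
    "side": {"left", "middle", "right"},
}


def find_suspect_features(feature_names, expected=EXPECTED_VALUES):
    """Return one-hot feature names whose value is not in the expected set."""
    suspects = []
    for name in feature_names:
        # One-hot names look like "key=value"; split on the first "=" only.
        key, sep, value = name.partition("=")
        # Skip numeric features (no "=") and keys without a whitelist.
        if not sep or key not in expected:
            continue
        if value not in expected[key]:
            suspects.append(name)
    return suspects


# A few names copied from the listing above; in practice this would be
# the vectorizer's feature_names_ list.
names = ["passlen=deep", "passlen=intended", "passlen=short",
         "side=41", "side=left", "side=middle", "side=kneels,"]
print(find_suspect_features(names))
```

Values like `side=41` or `side=kneels,` look like words from the play description, so the output of this check could help narrow down where the filtering goes wrong.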
There's no immediate hurry to fix this, but please take a look at it whenever possible. I'll look into it as well.