Tuesday, March 13

Sport Nerd Challenge

The time has come for NCAA basketball tournament brackets.  People across the country are devoting untold hours to researching and discussing basketball this week.  Hundreds of millions of dollars will change hands.  But I have a nerdier question than simply who will win basketball games.  I want to know what the best way is to determine who wins the brackets.
 
Now, "best" is a subjective term here.  So my idea here is that a non-zero number of you will submit a scoring system for dealing with bracket picks, along with an argument as to why that scoring system for brackets is superior to all others.  Some definition of what your overall purpose is may be necessary in your justification.  I present below a few common scoring systems, along with a brief discussion of some pros and cons.
 
Note: For all discussion here, the "first four" games are ignored and the 32 Thursday/Friday games are considered Round 1.  (This is not a requirement for your comments, but simply what I'll be doing.)
 
System #1: 2^(Rd#-1)
This is the most common scoring system.  Each game within a round is worth the same number of points, and each round is worth the same total number of points (32).  This system is simple, easy to implement, and widely accepted.  But if the best argument for something is simply that doing it differently is hard, that's not very convincing.  In general, games in the later rounds do need to be worth more than games in the earlier rounds; at the time of picking, we don't even know who will be in the championship game, let alone who will win, so picking the winner is fairly difficult.  However, this system awards the same number of points for getting the winner correct as for getting all 32 games correct in the first round.  Which is harder?  I suspect that any given year a non-trivial percentage of brackets get the winner right, perhaps varying from 5-20%.  However, how many get the entire first round correct?  Out of the millions of brackets entered on ESPN.com, only a very small handful.  So, what is more deserving of 32 points last year, picking UConn to win it all, or picking Butler, VCU, Gonzaga, Richmond, Florida State, Morehead St and Marquette to win, while at the same time not picking Princeton, Michigan St, Memphis or Missouri to win?  The tougher task should be rewarded accordingly.
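To make the "every round is worth 32" claim concrete, here is a minimal sketch of System #1. The names GAMES_PER_ROUND and system1_points are mine, chosen for illustration; the only inputs taken from the post are the 2^(Rd#-1) formula and the 32/16/8/4/2/1 round sizes.

```python
# Sketch of System #1: a correct pick in round r is worth 2^(r-1) points.
# Round sizes assume the "first four" are ignored, as in the post.
GAMES_PER_ROUND = [32, 16, 8, 4, 2, 1]  # rounds 1 through 6


def system1_points(round_number):
    """Points for one correct pick in the given round (1-6)."""
    return 2 ** (round_number - 1)


# Total points available in each round: games in the round times points per game.
round_totals = [
    games * system1_points(round_number)
    for round_number, games in enumerate(GAMES_PER_ROUND, start=1)
]
total_available = sum(round_totals)

print(round_totals)     # every round sums to the same total
print(total_available)  # 6 rounds x 32 points
```

Each entry in round_totals comes out to 32, for 192 points overall, so the single championship pick carries exactly as much weight as the entire first round.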
 
System #2: Rd#
The next most common and simple system is to increase the value of each game by one point in each round.  Rather than increasing geometrically (1, 2, 4, 8, 16, 32), the values increase arithmetically (1, 2, 3, 4, 5, 6), which serves to over-emphasize the opening rounds.  Again, I think the important question to ask is which is harder: guessing the champion, or getting an additional 6 games right in the opening round?  A further downside to this scoring technique is that it allows some people to build large advantages early on, which become almost insurmountable later; the contest isn't too exciting if the championship game isn't enough to bring you back significantly.  System #1 has the opposite problem, of course, where there is almost no lead that is safe: the final 2 victories by the champion are worth 25% of the total points available (48 of 192), which generally corresponds to well over 1/3 of any individual score.  It doesn't seem right that such a huge percentage of your points hinges on the final 2 games in a 63 game tournament.
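For anyone who wants to check the arithmetic, here is a rough sketch comparing how Systems #1 and #2 spread their points. The function names are hypothetical, picked for this example; the formulas and round sizes are the ones described above.

```python
# Sketch comparing point distribution under System #1 and System #2.
GAMES_PER_ROUND = [32, 16, 8, 4, 2, 1]  # rounds 1 through 6


def geometric(round_number):
    """System #1: 1, 2, 4, 8, 16, 32 points per correct pick."""
    return 2 ** (round_number - 1)


def arithmetic(round_number):
    """System #2: 1, 2, 3, 4, 5, 6 points per correct pick."""
    return round_number


def total_points(per_game):
    """Total points available in the whole tournament."""
    return sum(
        games * per_game(round_number)
        for round_number, games in enumerate(GAMES_PER_ROUND, start=1)
    )


def champion_last_two_share(per_game):
    """Fraction of all points that ride on the champion's final two wins."""
    return (per_game(5) + per_game(6)) / total_points(per_game)


def first_round_share(per_game):
    """Fraction of all points available in round 1."""
    return GAMES_PER_ROUND[0] * per_game(1) / total_points(per_game)


for name, system in [("System #1", geometric), ("System #2", arithmetic)]:
    print(name, total_points(system),
          round(champion_last_two_share(system), 3),
          round(first_round_share(system), 3))
```

Under System #1 the champion's last two wins are 48 of 192 points (25%); under System #2 they are 11 of 120 (about 9%), while the first round alone accounts for 32 of 120 (about 27%).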
 
System #3a: Seeds
System #3b: Difference in Seeds
The idea behind these systems is to reward risk by offering bounties for picking upsets.  The point value of a victory by a given team is equal to that team's seed, i.e. a 12/5 upset is worth 12 points.  (Or, in system 3b, it would be worth 7 points.)  Victories by favored (or evenly seeded) teams are worth a single point.  These systems risk over-valuing upsets to the point that bracket pickers are encouraged to simply pick every upset.  Statistically, a few will hit, and if the bonus is big enough, it doesn't matter that you missed on the majority of them.  A scoring system shouldn't favor mindless picking of lower seeded teams any more than it should favor mindless picking of higher seeded teams.
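A small sketch of Systems #3a and #3b, plus a back-of-the-envelope look at why they can encourage picking every upset. The function names are made up for illustration, and the break-even calculation is my own simplification: it looks at a single game in isolation and ignores what the pick does to later rounds.

```python
# Sketch of Systems #3a and #3b. A larger seed number means a weaker seed.
def seed_points(winner_seed, loser_seed, use_difference=False):
    """Points for a correct pick under System #3a (default) or #3b."""
    if winner_seed <= loser_seed:
        return 1  # favored or evenly seeded team won: a single point
    if use_difference:
        return winner_seed - loser_seed  # System #3b: difference in seeds
    return winner_seed  # System #3a: the winner's seed


def break_even_upset_probability(upset_points, favorite_points=1):
    """Upset probability above which picking the underdog has the higher
    expected value for that one game (single-game simplification)."""
    # Picking the underdog pays p * upset_points on average; picking the
    # favorite pays (1 - p) * favorite_points. Setting them equal gives p.
    return favorite_points / (favorite_points + upset_points)


print(seed_points(12, 5))                       # 12/5 upset under #3a
print(seed_points(12, 5, use_difference=True))  # 12/5 upset under #3b
print(seed_points(5, 12))                       # the favorite winning
print(break_even_upset_probability(seed_points(12, 5)))
```

The 12/5 upset scores 12 under #3a and 7 under #3b, against 1 point for the 5 seed winning. Under #3a, and looking at that game alone, the 12 seed only has to win more than 1 time in 13 for the upset pick to come out ahead on average.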
 
So where does that leave us?  It's time to start combining the best aspects of the various systems.  Pay attention here, because the math gets slightly trickier.
 
System #4: 2^(Rd#-1)+(Seeding Difference)*Rd#
This is just system #1 with an added bonus for getting upsets, which scales through the rounds.  Picking 11-seeded VCU to win in the first round last year would have required guts (and luck), so we want to reward that with more than a single wimpy point.  So you get 1 point, plus the difference between their seed (11) and their opponent's seed (6), for a total of 6 points.  But to pick them to win a second game?  Even less likely.  So, their victory over the 3 seed was worth 2^(2-1)+(11-3)*2 = 2 + 16 = 18 points.  But here is where the quirk of this method kicks in.  In VCU's next game, they played 10th-seeded Florida State (who had just knocked off 2-seed Notre Dame).  VCU was (by seed) basically a coin flip to win the game, so that victory was worth only 2^(3-1)+(11-10)*3 = 4 + 3 = 7 points.  Perhaps the craziest thing about a system like this is that going into the tournament, it is impossible to know how many total points there are going to be, and that victories in different rounds can be worth very different amounts.  The most valuable win last year would have been VCU's next game, where they beat 1-seed Kansas for a 48 point victory.  (That's 2^(4-1)+(11-1)*4 = 8 + 40 = 48 points, if you didn't want to work that out in your head.)  The final game, as it wasn't an upset, was worth 32 points, which tied with Butler's upset of Florida.  The next most valuable games were three 2nd round upsets at 18 each.  This system certainly gets hard to track in your head, because so many different things can happen, and opponents affect point values.  A weakness is that the 8/9 games become fairly mindless.  9-seeds actually have a slight advantage historically, and a victory by a 9 seed is worth 2 points, whereas the 8 seed winning is only worth 1.  You really should just pick all the 9s.  However, this is only going to net a few points, which probably won't be too consequential.  
(A perfect bracket in last year's tournament would have been worth 410 points, though, if Butler had won the final, it would have been worth 30 more points than the UConn win.)
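Here is a minimal sketch of System #4 as a function, checked against the VCU and Butler numbers worked out above. The name system4_points is hypothetical, and I'm assuming the seed bonus is zero whenever the better seed (or an equal seed) wins, which matches the final being worth a flat 32.

```python
# Sketch of System #4: 2^(Rd#-1) + (Seeding Difference) * Rd#.
# Assumption: the bonus applies only to upsets, i.e. when the winner has
# the larger (weaker) seed number; otherwise the bonus is zero.
def system4_points(round_number, winner_seed, loser_seed):
    """Points for a correct pick of this game under System #4."""
    base = 2 ** (round_number - 1)
    seed_difference = max(winner_seed - loser_seed, 0)
    return base + seed_difference * round_number


# VCU's run from last year's tournament, as described in the post.
vcu_run = [
    (1, 11, 6),   # round 1, over the 6 seed
    (2, 11, 3),   # round 2, over the 3 seed
    (3, 11, 10),  # round 3, over 10-seed Florida State
    (4, 11, 1),   # round 4, over 1-seed Kansas
]
for round_number, winner_seed, loser_seed in vcu_run:
    print(round_number, system4_points(round_number, winner_seed, loser_seed))

print(system4_points(6, 3, 8))  # the actual final: 3-seed UConn over 8-seed Butler
print(system4_points(6, 8, 3))  # the hypothetical Butler win in the final
```

The VCU games come out to 6, 18, 7 and 48 points, the actual final to 32, and a Butler title to 62, which is the 30 point gap mentioned above.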
 
So, what have I missed?  What crazy idea do you like?  (And yes, I know the tournament starts in earnest in about 36 hours.)

5 comments:

tysqui said...

I really like your idea for the scoring system. Now the question for me is how hard would that scoring system be to program into Excel...

For the last 20 years or so, our family brackets have been scored 1 point for a win, 2 points for an upset win, consistent through all rounds. This system still tends to benefit those who pick the first and second round best. It's hard to come from behind.

Ben said...

So I find picking the winner to be less interesting than figuring out how the game will go.
For example, if you and I both pick different 12's to beat 5's and mine loses by 35 points and yours loses in quadruple overtime after the winning basket is mistakenly waved off, then we haven't made equivalent errors.
Secondarily, if I anticipate that with their spotty play Gonzaga will win their first round game then flame out in their second game, then I've picked something up even if the wrong team wins.
So, here's how I'd get people to do it my way:
In every game you're picking both the winner and the loser, and whatever bonus is available for that game, you get half for each part you pick right.
I would use some kind of weighting toward rounds though honestly, I'm not sure what the correct values are. I think here it depends on what your goal is. Here's one option, every round (not game in a round but round) has x times the points of the previous round. Perhaps for an x about .7.
Now, the points are distributed based on the spread of the game and the spread in the seeds. The exact smoothing formula is again a little bit of a parameter but imagine something like this:
If the game is won by the higher seed, then the game has a point value equal to the spread times the difference between the seeds[*]. If the game is won by the lower seed then the game has a point value equal to the spread (perhaps scaled by some factor which decreases based on the upset type). For example, suppose 1 beats 16 by 10 points. This is worth .675. If 16 beats 1 by 1 point this is worth 1. Though both of those numbers might need to be scaled. At the end of the round we can figure out how to divide points. If this was the entire first round then the round scores 1.675 points. We need to normalize that to the available number of points, which is at this point just an arbitrary number. Thus 5/26 of the points go for picking the 1 to win and the 16 to lose and 5/13 for picking the 16 to win and the 1 to lose.

[*] This probably needs to be flattened. Perhaps the logarithm of the seed?

Sabrina said...

I have no opinion. I am nerdy, but not in this sense so I don't even want to think too hard about it. What I find entertaining about this post is that it is soooo you and that Tyler was the first to respond is sooo him. I love it!

Sabrina said...

Haha, and as I post this, Ben has posted second with a lengthy explanation which is soooo him :)

Clark said...

I note that Ben and I both got 4 Os in "soooo you", whereas Tyler only got 3. (Sooo you.) I'm sure no offense was intended.

As I've analyzed the scoring system I put forth, I'm a bit worried that it over-values upsets. I've managed to convince one person who is running a bracket to adopt the scoring system, so we'll see how that turns out, and whether anyone calls for my head. My analysis has been somewhat influenced by last year's tournament, which was not typical, with an 11 seed and an 8 seed meeting in the Final Four. Better research through the last 27 years of the tournament could produce win/loss rates for each matchup, which could then be used to weight the upsets "correctly".

To answer Tyler's question: absolutely, this can be put into Excel, and I plan on doing that, perhaps tonight. (Though I'll probably do it in Google docs.) The question is whether I want to put data in for the last 27 tournaments to look over time.

And finally, someone asked me a question that's sure to mess up my day: what's the highest possible score with my proposed scoring system? Does someone want to volunteer to work that out so I don't have to?