Introducing Game Score WAR

I’m really excited to share a major statistical breakthrough with all of you. It’s a lot to cover, so I apologize in advance if something doesn’t add up. This is something I plan on referencing often in the future (via this blog, my Twitter account, etc.), so I need to make sure I provide the background information here first.

Long story short, I found a way to convert pitcher Game Score into WAR totals… and I didn’t really even mean to!

Now, websites run by people who are a whole lot smarter than I am already do a fantastic job assigning wins to players with their own complex WAR formulas. I’m not going to pretend for one second that this “Game Score WAR” even approaches the sophistication of those models… but what I am going to tell you – and the reason why I’m so excited to share this – is that these Game Score WARs actually line up really well with both Fangraphs & Baseball-Reference. Like, it’s kind of blown my mind.

Some Background

I’ve been tracking Game Score (which is only assigned for starting pitchers) for a few years now. At first, I was using the original Bill James formula as a way to compare pitchers throughout baseball history. With Play Index, it was easy to pull lists of GmSc averages for individual seasons. Then I stumbled upon an article that suggested using scores to calculate a “true” win-loss record for today’s starting pitchers. Since Game Scores for individual starts were readily available in Baseball-Ref’s game logs, this was something convenient that I kind of jumped into head-first.

I’ve since switched from the James version of Game Score to Tom Tango’s updated method (Version 2.0), which can be found in pitcher game logs on Fangraphs and in MLB.com’s box scores. The changes are explained well here, but what led me to make the switch was that Tango’s version was designed with replacement level in mind, which I’ll expand on later. Also, version 2.0 adjusts in-season so the average Game Score is always 50. Doesn’t matter the year, the league, or the opponent. 50 is always average. That’s a huge change, because it allows for much easier comparisons between seasons.
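For reference, here is a minimal sketch of the Game Score 2.0 formula as I understand Tango’s published version. The weights are an assumption taken from the FanGraphs glossary, and `game_score_v2` is just a hypothetical helper name. It uses only the box-score inputs (outs, strikeouts, walks, hits, runs, home runs) and does not attempt any in-season or league adjustment.

```python
def game_score_v2(outs, strikeouts, walks, hits, runs, home_runs):
    """Game Score 2.0 for a single start, from its box-score line.

    Assumed weights (Tango's v2.0 as published on FanGraphs): start at 40,
    +2 per out, +1 per strikeout, -2 per walk, -2 per hit, -3 per run,
    -6 per home run. No league or park adjustment is applied here.
    """
    return (40 + 2 * outs + strikeouts
            - 2 * walks - 2 * hits - 3 * runs - 6 * home_runs)


# Hypothetical nine-inning no-hitter: 27 outs, 10 K, 2 BB, no hits or runs.
print(game_score_v2(outs=27, strikeouts=10, walks=2, hits=0, runs=0, home_runs=0))  # → 100
```

Note the built-in 40: a pitcher who records no outs and allows nothing still starts from that replacement-style baseline.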

I’ve tried to fiddle with the numbers and create some of my own goofy Game Score metrics during the past few years, but mostly I’ve just been using game logs to keep my own pitcher records. It’s tedious, but I love it. In an era where pitcher wins & losses are pretty meaningless, it’s fun to say Jacob deGrom’s 28 “Game Score Wins” narrowly edged out Max Scherzer’s 27 last season. What better way to demonstrate a pitcher’s dominance than to say he posted a 28-1-3 record (the 3 stands for no-decisions, which are roughly average starts)? In a time when 20-win seasons are rare, it’s fun to point to Game Score records and show the 11 pitchers who topped the “20-win” mark in 2018.

It’s not anything ground-breaking, but that’s where my interest (obsession?) in Game Score had taken me… until last week.

The Breakthrough

One of those goofy metrics I mentioned above was something called “Advantage” that I made after reading a Game Score-related post on Bill James’ own blog. The idea was to find a way to show how much better one starter was than his counterpart during the current season. It was pretty simple, you just took the pitcher’s total Game Score accumulated and subtracted out 50 (the league average) for each start. So a league average pitcher would have a 0 advantage, right?

Apparently I wasn’t ready to unlock the full potential of this idea, and advantage totals (abbreviated ADV in my spreadsheets) eventually became just another set of numbers I stopped paying much attention to.

Then, on Christmas night, and for some reason which I really can’t explain, it hit me. Change the constant in Advantage from 50 to 40, just as Tom Tango adjusted the starting mark from 50 in the original Game Score to 40 in his updated version, to better reflect replacement-level performances. Now, instead of measuring how much better (or worse) than AVERAGE a pitcher is, advantage measures how much better the pitcher is than REPLACEMENT.

And that little change, my friends, would unlock a world of possibilities. No big deal…
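To make the change concrete, here is a small sketch of Advantage with the baseline as a parameter. `advantage` and `baseline` are hypothetical names: a baseline of 50 gives the original average-based version, and 40 gives the replacement-level version.

```python
def advantage(game_scores, baseline=40):
    """Total Game Score minus `baseline` points for every start.

    baseline=50 -> original Advantage (margin over an average starter)
    baseline=40 -> margin over a replacement-level starter
    """
    return sum(score - baseline for score in game_scores)


average_season = [50] * 30  # 30 perfectly average starts

print(advantage(average_season, baseline=50))  # → 0
print(advantage(average_season, baseline=40))  # → 300
```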

It’s probably time in the story for a real-life example. After my “a-ha” moment, I started calculating 2018 advantages for the big dog starting pitchers: deGrom 896, Verlander 850, Scherzer 825… Then came the idea: give them a win for every 100 points. deGrom 8.96, Verlander 8.5, Scherzer 8.25… Considering both deGrom and Scherzer were worth roughly 9 rWAR (Baseball-Reference) each, I felt like I might really have been onto something.
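That first rule of thumb is simple enough to write down. This sketch just reproduces the arithmetic from the three advantages above.

```python
# 2018 advantages quoted above (replacement baseline of 40).
advantages = {"deGrom": 896, "Verlander": 850, "Scherzer": 825}

# First, naive attempt: one win for every 100 points of advantage.
naive_war = {name: adv / 100 for name, adv in advantages.items()}
print(naive_war)  # → {'deGrom': 8.96, 'Verlander': 8.5, 'Scherzer': 8.25}
```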

Well, it turns out you can’t just divide every starter’s advantage by 100, unless you want the most inflated, unrealistic WAR scores league-wide (Jhoulys Chacin, 4.5?). And being the reasonable fan that you are, you’d have seen right through me and moved on to the next bogus idea.

So I began adjusting, recalculating, comparing. And slowly but surely (and at the expense of the upper echelon pitchers like deGrom, Scherzer, and a few others unfortunately), my league totals started to creep closer and closer to Fangraphs & rWAR. You can imagine my giddiness as this was happening.

The result, I decided, would fittingly be called “Game Score WAR.”

Game Score WAR

To calculate Game Score WAR, which I’ll refer to from now on as gsWAR, you’ll need a pitcher’s games started, Game Score average, and his advantage (ADV). Here’s an example.

Dereck Rodriguez | GS = 19 | Avg GmSc = 55 | Total Game Score = 1,045 | ADV (Total GmSc minus Replacement) = 285

Once the advantage is calculated, you divide by the constant, which I have set at 145. During the entire process, I tried to keep in mind the idea of league average. What exactly is league average? Well, a pitcher who makes 30 starts with an average GmSc of 50 (the MLB average mark) would end up with an advantage of 300. If the constant is 100, as I first tried with deGrom & others, that pitcher with 30 “average” starts is a 3-win player. That’s too high. Make the constant 150, and you’ve got a 2-win player, which is just about where we want him to end up. I settled at 145 ADV/win, because it was just enough of a difference to push pitchers like Corey Kluber, Chris Sale, and Gerrit Cole up to the 5 WAR mark. If I couldn’t find a way to make the elite arms 5-win players, I probably couldn’t justify any of this.
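Here is that calibration check as a sketch. The constant names are mine; the only inputs are the numbers in the paragraph above (30 starts, a 50 average, a 40 replacement baseline, and the candidate constants 100, 145, and 150).

```python
REPLACEMENT_SCORE = 40   # replacement-level Game Score per start
LEAGUE_AVG_SCORE = 50    # league-average Game Score 2.0
ADV_PER_WIN = 145        # the constant settled on above


def gs_war(advantage, adv_per_win=ADV_PER_WIN):
    """Convert an Advantage total into Game Score WAR."""
    return advantage / adv_per_win


# Advantage of a 30-start pitcher who averages exactly 50.
league_average_adv = 30 * (LEAGUE_AVG_SCORE - REPLACEMENT_SCORE)  # 300

for constant in (100, 145, 150):
    print(constant, round(gs_war(league_average_adv, constant), 2))
# → 100 3.0
# → 145 2.07
# → 150 2.0
```

Going from 150 down to 145 nudges the league-average starter only slightly above 2 wins, while adding a bit more at the top of the scale.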

Ok, back to the example using D-Rod. We take the advantage, 285, and divide by the constant, 145. What you get is 1.966, rounded to 2.0, for a league-average, 2-win player. fWAR (which uses FIP instead of ERA) gives him a 1.5, and rWAR says he’s a 2.4. That means good old gsWAR is right in the middle, which is a perfectly fine place to be.
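The whole D-Rod calculation fits in a few lines. This sketch uses only the figures already given (19 starts, a rounded 55 average, the 40 baseline, and the 145 constant).

```python
starts = 19
avg_game_score = 55          # Fangraphs average, rounded

total_game_score = starts * avg_game_score        # 1,045
adv = total_game_score - starts * 40              # 285 over replacement
gs_war = adv / 145

print(total_game_score, adv, round(gs_war, 1))  # → 1045 285 2.0
```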

So it works for one pitcher for my favorite team… but how well does it hold up league wide? I think the answer may surprise you.

One important factor to keep in mind here is there are pretty significant differences between fWAR and rWAR, especially when you compare individual players and even certain teams. The Padres rotation, for example, earned 3.9 combined WAR this year according to Fangraphs, while Baseball-Ref assigned them NEGATIVE 3.4 WAR. That’s a 7-win difference, for one team. So you can understand why after a while I wasn’t concerned about my numbers being a bit different, whether on the high or low end.

There were 343 different starting pitchers this season (including openers). rWAR valued them at 314.8 combined WAR. That’s 0.92 per pitcher. According to fWAR, they were worth 329.7 combined WAR, 0.96 per pitcher. My final calculations put the league value at 334.4 gsWAR, or 0.97 per SP. Yes, a simple formula using IP, H, R, HR, BB, and K gets you within one-tenth of a win per pitcher of Fangraphs WAR calculations. I know, it shocked me too.
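The per-pitcher figures are just league totals divided by the number of starters. This sketch reproduces them from the numbers above and also shows the per-pitcher gap between gsWAR and fWAR.

```python
starters = 343
league_totals = {"rWAR": 314.8, "fWAR": 329.7, "gsWAR": 334.4}

per_pitcher = {model: round(total / starters, 2) for model, total in league_totals.items()}
print(per_pitcher)  # → {'rWAR': 0.92, 'fWAR': 0.96, 'gsWAR': 0.97}

# Per-pitcher gap between gsWAR and fWAR, computed from the unrounded totals.
gap = (league_totals["gsWAR"] - league_totals["fWAR"]) / starters
print(round(gap, 3))  # → 0.014
```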

Here’s the percent breakdown of WAR for all of MLB, with a few notable players from each tier. Note that a “2-win” player includes all scores between 1.6 and 2.5 for this exercise, and so on.

[Image: percent breakdown of 2018 starting-pitcher gsWAR by tier, with notable players from each tier]

It was eye-opening to see the scores broken down like this. When 3/4 of ALL starting pitchers gave their team below-average WAR, you can understand why teams are valuing bullpens more than ever, and exploring different methods of pitcher usage. It all goes hand-in-hand, from where I’m standing.
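The tier rule from the note above (1.6 to 2.5 counts as a 2-win player) can be written as a one-line helper. `war_tier` is a hypothetical name, and rounding to one decimal first is my assumption about how borderline scores are handled.

```python
import math


def war_tier(war):
    """Tier for a one-decimal WAR score: 0.6-1.5 -> 1, 1.6-2.5 -> 2, and so on.

    The score is rounded to one decimal first so each tier's .5 upper bound
    is inclusive, matching the note above.
    """
    return math.ceil(round(war, 1) - 0.5)


# Two gsWAR figures mentioned in this post: Dereck Rodriguez and Kyle Freeland.
print(war_tier(2.0), war_tier(3.6))  # → 2 4
```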

Here’s how gsWAR stacks up against the big dogs. You can see the Game Score model definitely tends to pull more players toward the middle, or average tier. I don’t love that aspect of it, but that’s how it had to be in order to keep league totals in the ballpark… Pitchers are allotted 430 total WAR per season (570 WAR goes to hitters), with a chunk of that number going to relievers. Maybe you knew that, but before doing all of this, I didn’t.

[Image: gsWAR tier breakdown compared with fWAR and rWAR]

So I think you can see this model holds up well enough for me to go forward with it. To me, the possibilities are endless: historical comparisons, finding overlooked value, player profiles, and more. But one thing that really excites me is the fact that I can now calculate a pitcher’s WAR in-season while I track his Game Score record. For someone who’s wanted to make Game Score more meaningful as an evaluation tool, this is some really exciting stuff.

Here’s Dereck Rodriguez’s 2018, by start. The Advantage and WAR look a little different from the numbers I used earlier, because those were calculated using his Fangraphs Game Score average, rounded to 55. Calculating the average by individual starts is obviously much more precise, but it’s also time-consuming, as each start must be entered by hand.

[Image: Dereck Rodriguez’s 2018 Game Scores, Advantage, and gsWAR by start]
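Tracking by start just means adding each start’s margin over replacement as you go. This sketch uses made-up Game Scores for illustration rather than Rodriguez’s actual starts, and `running_gs_war` is a hypothetical helper.

```python
def running_gs_war(game_scores, replacement=40, adv_per_win=145):
    """Running Advantage and gsWAR after each start.

    Each start adds (score - replacement) to the Advantage, so a start below
    the replacement score lowers the running total.
    """
    rows = []
    adv = 0
    for start_number, score in enumerate(game_scores, start=1):
        adv += score - replacement
        rows.append((start_number, score, adv, round(adv / adv_per_win, 2)))
    return rows


# Hypothetical Game Scores for five starts (NOT Rodriguez's actual log).
for row in running_gs_war([62, 48, 71, 55, 39]):
    print(row)
```

The fifth start (39) shows why this is worth doing by hand: a sub-replacement outing actually takes a little value back.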

If you project D-Rod’s numbers over 30 starts, he earns 3.2 gsWAR… the same total earned by both Clayton Kershaw and Kyle Hendricks this year. Remember, fWAR doesn’t like Dereck as much because of his elevated FIP. Whether he can continue his statistical production next year or will regress is probably one of the biggest issues concerning the 2019 Giants pitching staff right now. But breaking his numbers down per start really does show how good he was last summer.
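This sketch assumes the 30-start projection is a straight proportional scaling; `project_gs_war` is a hypothetical name.

```python
def project_gs_war(gs_war, starts, target_starts=30):
    """Scale gsWAR linearly from `starts` to `target_starts` starts."""
    return gs_war * target_starts / starts


# Using the rounded-average figures from earlier: ADV 285 over 19 starts.
print(round(project_gs_war(285 / 145, 19), 1))  # → 3.1
```

With the rounded 55 average this lands at about 3.1. The 3.2 figure comes from the more precise per-start totals, which is the same rounding difference described above.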

Limitations

As I said earlier, I’m not going to pretend that Game Score WAR deserves the high regard that the Fangraphs and Baseball-Reference models have earned. It’s a simple formula that happens to line up pretty well, but it certainly has some limitations.

One major drawback is that Game Score isn’t park-adjusted. rWAR gives the Rockies rotation 9 more wins than gsWAR does. Kyle Freeland & German Marquez are two of the six pitchers most punished by this system (Freeland gets a 3.6 gsWAR and an 8.4 rWAR). Other teams don’t match up well at all either, likely because of their home environments. This is a significant issue, and one I have tried to address. It’s not an easy fix, and I don’t know that I’m intelligent enough to make the necessary corrections. I might need to reach out to Mr. Tango for some help!

Another limitation of Game Score is that it really depresses the scores of the best pitchers. When you subtract each pitcher’s fWAR and rWAR from his gsWAR and average the two differences, exactly 10 guys come out below -1.0, meaning they are the most undervalued by Game Score.

Here are those 10 players…

[Image: the 10 starters most undervalued by gsWAR relative to fWAR and rWAR]

So the most punished starters include nine studs and Seth Lugo, who the Astros are calling the Mets about… Obviously he must be undervalued. Nice job, Game Score! Anyway, deGrom earns the highest gsWAR in all the land, but still loses an average of 3 wins compared to the other models. That’s frustrating, but it does make some sense considering the adjustments I had to make.
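The cutoff used to build that list is easy to sketch: average the two differences and flag anything below -1.0. The function names are hypothetical, and the only input is Rodriguez’s three numbers from earlier in the post.

```python
def average_gap(gs_war, fwar, rwar):
    """Average of (gsWAR - fWAR) and (gsWAR - rWAR); negative means gsWAR is lower."""
    return ((gs_war - fwar) + (gs_war - rwar)) / 2


def undervalued(gs_war, fwar, rwar, cutoff=-1.0):
    """True when the average gap falls below `cutoff` (default -1.0)."""
    return average_gap(gs_war, fwar, rwar) < cutoff


# Dereck Rodriguez, 2018: gsWAR 2.0, fWAR 1.5, rWAR 2.4.
print(round(average_gap(2.0, 1.5, 2.4), 2), undervalued(2.0, 1.5, 2.4))  # → 0.05 False
```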

At 145 ADV/WAR, a single start with a Game Score of 100 earns a pitcher just over 0.4 gsWAR. That’s no-hitter territory, and when you multiply it by 32 starts, you get about 13 wins. Now there’s no way a pitcher will ever throw 32 no-hitters (or match that production) in a single season, but if he did, you’d like to think he’d be worth more than 13 wins… especially considering Aaron Nola’s 2018 was worth 10 wins, according to Baseball-Reference. Is there a way to fix this? The only idea I’ve come up with is to create a special exception that adjusts the Advantage constant for pitchers who reach a certain threshold. I’m not sure about the fairness of something like that, though.
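Here is the arithmetic behind that ceiling as a quick check, using only the 40 baseline and the 145 constant.

```python
per_start = (100 - 40) / 145      # gsWAR from one start with a Game Score of 100
season = per_start * 32           # 32 such starts

print(round(per_start, 3), round(season, 1))  # → 0.414 13.2
```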

Final Thoughts

Ok, I know this is a lot to take in. As mentioned earlier, I hope I was able to communicate this process clearly. Yes, I do have plenty more to share, like WAR scores for the Giants over the past few years, career scores for Bumgarner, Cain, Lincecum, and others. But we’ll save those for another post.

I’ve also found a way to use these calculations with projection systems like Steamer. So, yes, I can now pretty confidently project a pitcher’s WAR… which is so, so exciting to me. I hope you’ll join me on this journey, ask questions, and offer suggestions to help me improve this metric.

Oh, and I’m trying to keep up with the changing times by finding a way to calculate WAR for relievers using this method. So far, the results aren’t great. More work is definitely needed.

My goal is and has been to make Game Score relevant, and I think we just took a major step forward on that path. Thanks for reading!


2 thoughts on “Introducing Game Score WAR”

  1. This is pretty interesting. As you said, there is a lot to take in, so honestly I’m going to have to reread this a few times! 🙂
