Updating Game Score Metrics

A few short months ago, on Christmas night no less, I had a revelation that I might be able to use Game Score to create a WAR total. Here’s the post I wrote then, explaining my methodology and giving player/team/league examples.

As excited as I was to share my shiny, new Game Score WAR – “gsWAR” – metric, I was also pretty clear about my feelings on its limitations. Here’s what I wrote then…

As I said earlier, I’m not going to pretend that Game Score WAR is anything to be held in the high regard that the Fangraphs and Baseball-Reference models are. It’s a simple formula that happens to line up pretty well, but there are certainly some limitations.

One major drawback is that Game Score isn’t park-adjusted…This is a significant issue, and one I have tried to address. It’s not an easy fix, and I don’t know that I’m intelligent enough to make the necessary corrections. I might need to reach out to Mr. Tango for some help!

Another limitation of Game Score is it really depresses the scores for the best pitchers… deGrom earns the highest gsWAR in all the land, but still loses an average of 3 wins compared to the other models. That’s frustrating, but it does make some sense considering the adjustments I had to make.

Wow. Harsh!

Well, I’m learning that like player development, my brain & its thoughts don’t always connect in linear fashion. Rarely do they, actually. But I’ve been hard at work, and I’ve got a few major updates to share with you.

Becoming Analytical

I never liked Keith Law, but on the recommendation of my buddy Steve, I downloaded his book “Smart Baseball” to my Kindle this offseason. I really enjoyed it, and it gave me new perspectives on both the author and the topics of his writing. In fact, it was Law’s book that inspired me to do a little more inspecting under-the-hood of baseball’s advanced metrics… I wanted to know more, and his explanation of how WAR is created, both for pitchers and position players, really gave me an understanding I never had before.

Reading “Smart Baseball” is what got me into doing more of my own calculations. I created WAR calculators for position players, starters, and relief pitchers, and really dove into how Fangraphs & Baseball-Reference estimate these things. Some of this I shared via my Twitter account.

About a month ago, I decided it was time for another book, and my new obsession with the analytical side of baseball led me to a choice I NEVER, EVER, EVER would have made until recently. For a long time, I despised “Moneyball” and all the hoopla surrounding the Billy Beane way. Yet there I was, downloading the book and gaining an even deeper understanding.

Admittedly I’m not yet halfway through Moneyball, but there was a part early in the book where Michael Lewis talked about Bill James, and how in his early days he literally just took the limited box score stats available to him (hits, runs, walks, etc.) and kept adding, multiplying, dividing, crunching, and adjusting until his totals resembled team run totals. If the numbers didn’t match, he changed the calculations.

Yet again, another example of my brain working in different ways. If James, the cult hero of baseball’s stat revolution, could essentially “fiddle around” with basic stats to create sophisticated models, why would I settle for a Game Score WAR model that I didn’t even find reliable?

So, three months later, I opened the files and started adjusting…

Replacement Level

My first major revelation was to adjust what I considered a “replacement level” game score. Originally, when I created the WAR formula, it was based on a change to a metric called Advantage (ADV), which was meant to show how many points a starter was better or worse than average. In game score terms, average is 50, so lowering the constant from 50 to 40 (as Tom Tango did to change Bill James’ original game score formula) would essentially make ADV a way to measure a player’s total game score above replacement, rather than above average.

One of the major holdups, it turned out, was that my replacement-level score was too low. I’ll give you a couple examples that are close to home. Last season, Drew Pomeranz and Jeff Samardzija logged identical average game scores of 40 for the season. Pomeranz did it over 11 starts, Samardzija over 10. According to Game Score WAR, those two had a season value of exactly 0. But according to every other WAR metric out there, and for anyone who watched either veteran pitch last season, both were clearly a negative value to their team. (Pomeranz’ average WAR between Fangraphs & Baseball-Ref was -0.4; Shark’s was -0.45).

So I raised the replacement level threshold a bit, and all of a sudden Pomeranz has an ADV of -37, while Shark’s new total is -50. Remember, both of those totals were 0 under my original calculation. These weren’t the only examples. Take Pittsburgh’s Nick Kingham, who made 15 starts at a 41 game score average last year. That’s half a season of very poor production (among the very worst of any starter who made that many starts), but still a positive ADV if the replacement level is set to 40. Under the new replacement level, Kingham gets a -36 ADV.
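For anyone who wants to see the arithmetic, here is a minimal Python sketch of ADV as described above: the sum of each start's Game Score minus a baseline. The function name and the 43.5 baseline are hypothetical and only for illustration; the exact new replacement level isn't stated here, and real seasons are made of varied scores rather than eleven identical ones.

```python
def advantage(game_scores, baseline):
    """Total Game Score points above (or below) a baseline across all starts."""
    return sum(score - baseline for score in game_scores)


# Eleven starts that all score exactly 40, standing in for a season
# with an average Game Score of 40 over 11 starts.
starts = [40] * 11

# At a replacement level of 40, the season nets out to zero.
print(advantage(starts, 40))    # 0

# With a higher (hypothetical) replacement level, the same season goes negative.
print(advantage(starts, 43.5))  # -38.5
```

The point of the sketch is only the direction of the change: the same set of starts moves from 0 to clearly negative once the bar is raised.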

Raising the replacement level not only helped to level out the pitchers who were overvalued, but it also allowed me to make another adjustment that would have huge implications on my WAR totals… it allowed the elite pitchers to have elite WAR totals.

I’ll explain. Under the original formula, pitchers like Jacob deGrom and Max Scherzer were being hugely undervalued in order to keep the averages in line with MLB WAR totals. For deGrom specifically, a near-900 ADV was essentially diced into tiny pieces, as 1 win required nearly 150 ADV points. So the best of the best, the guys posting real-life 6, 7, 8 WAR seasons, could barely sniff 6 gsWAR. Well, you raise the replacement level threshold, and these guys are still posting very good ADV scores despite the league correction. Now, I was able to lower the points per WAR number from 150, closer to 100, and a top tier group that was clawing its way to 5 and (rarely) 6 win totals morphs into this…

[Screenshot: updated gsWAR totals for the top tier of starters]

Two simple tweaks. Two changes that I just couldn’t see back in December, and that I may never have contemplated had I not downloaded Moneyball, the book I vowed many years ago never to read. Now I’ve got a metric that lands right between the two big dogs (Fangraphs & BB-Ref), and has a higher correlation rate to each than they do to each other.
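The points-per-win step described above is just a division, and a short Python sketch makes the effect easy to see. The function name is hypothetical, and 900 is only a round stand-in for a "near-900" ADV. Keep in mind that raising the replacement level also shrinks a pitcher's ADV, so this isolates just one of the two tweaks.

```python
def gs_war(adv, points_per_win):
    """Convert total ADV points into wins at a given points-per-win rate."""
    return adv / points_per_win


# The same ADV total under the old and new conversion rates.
print(gs_war(900, 150))  # 6.0
print(gs_war(900, 100))  # 9.0
```

In practice the two changes pull in opposite directions for elite pitchers: a higher replacement level trims their ADV, while a lower points-per-win number stretches what remains.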

As Harry so pointedly tells his old pal Marv in the original Home Alone, that’s it. That’s the silver tuna.

Now that I had an updated gsWAR calculation I was proud of, I was ready to conquer even greater challenges.

Park Effects

I want to recognize something before I go any farther. Tom Tango (which can’t be his real name), the man who now works for Statcast and made alterations to Bill James’ Game Score formula, really is a genius in my eyes. I read a lot of his work, much of which is over my head, but I owe so much of my findings this year to his work. While it’s great that I found a way to take a player’s Game Score and spit out a scary-accurate WAR total, it’s no accident. It’s really a credit to Tango’s formula, which weighs each common stat (outs, hits, runs, walks, and strikeouts) just enough to make all of this work out. That’s some excellent work.

The beauty of Tango’s Game Score (which can only be found on Fangraphs’ individual game log pages) is that it self-adjusts during the season, so that the average for both leagues is always 50. There is one thing, however, that it does not do. It does not adjust for park environment. That’s a mystery I had wanted to solve since I first started logging Game Scores 4 years ago, but never knew where to start. Heck, look what I wrote back in January. I didn’t even think I was smart enough to tackle such a challenge.

Well, I take it back. Turns out I was smart enough after all.

A couple years back, I came up with a metric I called “Game Score Plus” (GmSc+). Plus was a really fashionable term in baseball at the time. OPS+, ERA+, wRC+… So I thought I was pretty cool, making my own plus metric. I will say, the idea was a good one. Instead of telling the reader Dereck Rodriguez had a 55 Game Score average, which may mean very little depending on your understanding of Game Score, I could say his Game Score Plus was 110, or 10% better than league average. People understand terms like that, as we generally don’t have to know all the inner-workings of the raw data to make sense of it. It’s quick and simple.

While the idea was good, the follow through was anything but. At some point, I realized that “Plus,” the hot, new term, was supposed to indicate that the numbers were adjusted not only for the season, but for the player’s park environment as well. Game Score Plus didn’t have any kind of park adjustment… it was just a cool formula I figured out how to calculate in a spreadsheet.

I kept using it anyway, knowing that, while on paper D-Rod’s 110 GmSc+ was better than Colorado’s German Marquez (54 average, 108 GmSc+), in reality, the two numbers could not be compared that way. A 54 game score (6 IP, 6 H, 3 R, 3 BB, 5 K as an example) in Denver just isn’t the same as a 55 score (same box line, but add a strikeout) in San Francisco. Any baseball fan can tell you that, but I just couldn’t find a way to make Game Scores represent it.
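The two box lines above can be checked with a short Python sketch. The weights below are Tango's Game Score as it is commonly published (a constant of 40, +2 per out, +1 per strikeout, -2 per walk, -2 per hit, -3 per run, -6 per home run); they aren't spelled out in this post, so treat them as an assumption. The example also assumes no home runs were allowed, since the box line doesn't mention any.

```python
def game_score(outs, hits, runs, walks, strikeouts, home_runs=0):
    """Tango-style Game Score from a starter's line (weights are an assumption)."""
    return (40 + 2 * outs + strikeouts
            - 2 * walks - 2 * hits - 3 * runs - 6 * home_runs)


# 6 IP is 18 outs. The Denver example: 6 IP, 6 H, 3 R, 3 BB, 5 K.
print(game_score(outs=18, hits=6, runs=3, walks=3, strikeouts=5))  # 54

# Same line with one more strikeout, as in the San Francisco example.
print(game_score(outs=18, hits=6, runs=3, walks=3, strikeouts=6))  # 55
```

Both results match the scores quoted in the paragraph, which is a decent sign the weights are the right ones.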

Enter Fangraphs, which is really an incredible resource. Seriously, what can’t you find on that site? Well, I had gotten my gsWAR totals so close to the big dogs, I figured I could probably get them even closer if I used park factors. So off I went to the “GUTS!” pages of Fangraphs, where they house the constants & park factors for each season.

There are so many numbers on these pages, it’s hard to know what is appropriate for your assignment. I first tried to apply the park factors to my WAR totals by multiplying them together: 2.5 WAR * 0.95 (for a park factor of 95) = 2.4. I liked this, and did it for the entire league. But I soon discovered that it was having the opposite effect for players with negative WAR totals. The ones whose scores should have been getting better were getting worse, and vice versa. Back to the drawing board.
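The sign problem is easy to reproduce. Here is a minimal Python sketch of that first attempt, with a hypothetical function name and a made-up negative WAR of -0.4 for the second case. Scaling by a 95 factor pulls a positive total down, but it pulls a negative total up toward zero, which is the wrong direction.

```python
def scale_war(war, park_factor):
    """First attempt: multiply a WAR total directly by the park factor."""
    return war * park_factor / 100


positive = scale_war(2.5, 95)   # about 2.4, lower than 2.5 as intended
negative = scale_war(-0.4, 95)  # about -0.38, HIGHER than -0.4

print(positive < 2.5)    # True: the positive total shrinks
print(negative > -0.4)   # True: the negative total moves the opposite way
```

Any multiplier below 100 moves every number toward zero, so pitchers on either side of zero get pushed in opposite directions.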

Another major revelation. I realized if I wanted this to work, I would have to apply the park factors to the actual game scores themselves… i.e., I would need to calculate an adjusted “total Game Score” for each player, then run that score through my WAR formulas. This was a major breakthrough, but again, it wasn’t working like I wanted it to. First I tried the 5-year park factor for each team, then I tried the 1-year park factor. My WAR scores were sure changing, but not for the better in some cases.

The Mets, for example, had a park factor of 87 for last season. Adjusting to that number, Jacob deGrom’s historic season was essentially gone (down from a 7.6 to a 6.6). Not only that, but his adjusted WAR placed him below players like Kyle Freeland, whose WAR had sky-rocketed thanks to a 112 park factor. Like I said, it just wasn’t working. And for a time, I decided to scrap the idea.

A few days later, I tried again. But this time, I noticed something I hadn’t before. Clear over to the right hand side of the Park Factor chart were a set of numbers under a column titled “FIP,” for Fielding Independent Pitching. I knew they had been there all along, but I never looked close enough to see how well they matched (or didn’t) the 5-year and 1-year factors I’d been working with. I just assumed they were all roughly the same. That, however, was not the case.

So I sorted the page according to the FIP column. The Rockies, as you’d imagine, had the highest score, a 107. The Giants, as you also might imagine, were the lowest at 95. Oddly enough, the Giants do not have the lowest 1-year and 5-year factors. In fact, their 2018 Park Factor was a 101, or above average for offense. As I looked over the FIP factors one-by-one, they seemed to be much more appropriate for what I knew about pitching environments. And more importantly, they were organized much closer together (no 30% swings from 1st to last like the other factors).

I plugged the FIP factors into my spreadsheet, and watched in amazement as the correlation rates grew even stronger. I’m no expert in correlations, but I do know that 1.0 is the very best you can get, and anything over 0.75 is considered a very strong relationship between number sets. Fangraphs WAR and Baseball-Reference WAR have a 0.877 correlation to each other. Now check out the correlation between gsWAR and the big dogs.

[Two screenshots: correlation tables for gsWAR against Fangraphs WAR and Baseball-Reference WAR]

Yes, my little, old, rinky-dink metric has a stronger relationship to each type of WAR than they do to each other. This is a big deal my friends.
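For readers who want to run this kind of comparison themselves, here is a small Python sketch of a correlation calculation. It assumes the Pearson correlation, which is what a spreadsheet's CORREL function computes; the post doesn't say which measure was used. The two lists are placeholder values, not real WAR totals.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Placeholder numbers only, standing in for two WAR columns.
metric_a = [1.0, 2.0, 3.0, 4.0]
metric_b = [1.2, 1.9, 3.4, 3.8]

print(pearson(metric_a, metric_b))
```

A result of 1.0 means the two columns move in perfect lockstep, and -1.0 means they move in exactly opposite directions.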

Game Score Plus

Something else happened when I uncovered that FIP Park Factor column. As I looked over the list, examining the scores for each team… Rockies 107; Yankees 104; Brewers 102; Cubs 100; Cardinals 99; A’s 98; Rays 97; Giants 95, etc…. That’s when it hit me. “Those are the God damn Game Score Plus averages!”

I don’t know if I actually said it out loud, but I sure thought it to myself. And it sure changed everything I thought I knew about Bill James’ self-proclaimed garbage stat, Game Score. This was it. This was the answer I’d been looking for. Again, right in front of my eyes, there this whole time… development isn’t linear. But what I now knew was if a pitcher logged a 50 Game Score, I had the tools and the know-how to adjust it for any park in the big leagues.

Here comes a bit more math. The way park factors work is that the site lists each team’s number already cut in half; in other words, it’s the average of two different scores. This is done because a team only plays half of its games at home. Sites like Fangraphs assume (not always 100% accurately, but probably much better than we can by ourselves) that a team’s 81 road games will work themselves out to a neutral park factor, listed as 100. So, to find the home park factor for a team, you take the listed number, multiply by 2, and subtract the road factor of 100. Here’s an example, with the Giants in mind.

Park Factor = 95 | Road Factor = 100 | Home Factor = (95 * 2) – 100 | 190 – 100 = 90

If you do this for all 30 teams, you get the independent park factor for each home stadium. Now, the Rockies have the high mark of 114, which is 24 points higher than the Giants’ mark of 90. This makes sense with what we know. But how does it work for Game Score?
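The doubling step can be written as a one-line Python function. This is just a sketch of the arithmetic described above; the function name is hypothetical, and the neutral road factor of 100 is the same assumption the listed factors rely on.

```python
def home_factor(listed_factor, road_factor=100):
    """Undo the home/road averaging to recover the home-park-only factor."""
    return 2 * listed_factor - road_factor


print(home_factor(95))   # 90  (Giants)
print(home_factor(107))  # 114 (Rockies)
print(home_factor(100))  # 100 (a neutral park stays neutral)
```

Because the listed number is the average of the home factor and 100, every park ends up twice as far from 100 as its listed factor suggests.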

Well, assume a pitcher logs an MLB-average score of 50. In about half of the league’s stadiums, that 50 really is a legitimate score. For a 100 park factor, there is no change. For a 98 park factor, you subtract 2% (which is really just 1 game score point). So, in Atlanta, for example, a raw score of 50 becomes a 49. In Detroit (102 park factor), it’s now a 51. As you can see, there isn’t a whole lot of variance in many cases.

But it’s on the extreme edges where these adjustments really start to show up. At Coors, the 50 Game Score becomes a 57, or 14% higher than average. In San Francisco, 50 gets a 10% hit, which is really 45. Thinking of this another way, the AVERAGE score in Colorado is a 43 (14% below MLB average), while it’s a 55 in San Francisco (10% higher than average). That’s the biggest gap between stadiums in baseball, right there.
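Here is a short Python sketch of both views of the adjustment, using hypothetical function names. The first scales a raw score by the home park factor. The second mirrors the percentage around 100 to get the park's typical raw score, which is how the 43 and 55 figures above work out.

```python
def adjust_score(raw_score, home_factor):
    """Scale a raw Game Score by the home park factor (100 = neutral)."""
    return raw_score * home_factor / 100


def park_average(home_factor, league_average=50):
    """Typical raw score in a park: the same percentage, in the other direction."""
    return league_average * (200 - home_factor) / 100


print(adjust_score(50, 114))  # 57.0 at Coors
print(adjust_score(50, 90))   # 45.0 in San Francisco
print(adjust_score(50, 98))   # 49.0 in a 98 park
print(adjust_score(50, 102))  # 51.0 in a 102 park

print(park_average(114))      # 43.0 in Colorado
print(park_average(90))       # 55.0 in San Francisco
```

Note that the park average here subtracts the percentage rather than dividing by the factor, to stay consistent with the numbers quoted in the text.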

If Dereck Rodriguez posts a 50 in Colorado, it’s well above average (114 GmSc+). If he does it in San Francisco, it’s below average (90 GmSc+). If he does it in a neutral park, it’s just average (100 GmSc+). Hopefully this is starting to make sense.

What I’ve done for the current season is create a calculator that not only gives you a player’s raw Game Score for an individual start, but it tells you what the adjusted score is for every home park, as well as the decision (W-L-ND) earned for that performance. It looks like this, and I’ll be using it every night. Wins are green, losses are red, and no decisions are yellow.

[Screenshot: Game Score calculator showing the raw score, the adjusted score for each home park, and the color-coded W-L-ND decision]

The final adjustment in all of this is to calculate an adjusted Game Score Plus for each player and team. I will be leaving the raw averages alone, so you can see that Rodriguez’ 55 average in 2018 was only 5% better than league average (105 GmSc+), while Marquez’s 54 average was 16% better than the league (116 GmSc+). Yes, you can finally include GmSc+ in with the park-adjusted, cool kids club. Better late than never!
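The season-level numbers can be reproduced with a small Python sketch. The formula is inferred from the figures above rather than stated in the post: scaling the raw average by the listed factor (95 for the Giants, 107 for the Rockies, since a season is half home and half road) and comparing to a league average of 50 gives back 105 and 116. The function name and the round-half-up choice are assumptions.

```python
import math


def season_gmsc_plus(raw_average, listed_factor=100, league_average=50):
    """Park-adjusted Game Score Plus for a season (inferred formula)."""
    value = raw_average * listed_factor / league_average
    return math.floor(value + 0.5)  # round half up to a whole number


print(season_gmsc_plus(55))        # 110: Rodriguez with no park adjustment
print(season_gmsc_plus(55, 95))    # 105: Rodriguez, park-adjusted
print(season_gmsc_plus(54, 107))   # 116: Marquez, park-adjusted
```

Using the listed factor rather than the doubled home factor matters here: the doubled version would apply a full home-park adjustment to road starts as well.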

Going Forward

As I sit here, trying to explain my thinking throughout the hours and hours of time I’ve dedicated to logging, understanding, sharing, explaining, and now updating Game Score for pitchers, I’m reminded of something that happened recently. It was Opening Day, just a couple weeks ago, and I went over to MLB.com to pull up the day’s box scores, where I would find and log the Game Scores, as I’ve done daily for the past four seasons. Only this time, I couldn’t find them. I figured it was because the games were still ongoing, so I checked back when the score was final. Nada. They weren’t there.

It’s kind of comical, really. While I put countless amounts of time into one statistic, the baseball world removes it from its shelves and locks it up in the closet. Game Score, in MLB’s eyes, isn’t even worthy of a mention in the footnotes of the boxes anymore!

There’s a part of me that believes I might literally be the only dope on this planet who tracks and manipulates these scores, which can literally only be found in one place, on one website, and cannot be sorted for anything more than a single pitcher at one time. And you know what? It actually kind of makes me proud that I do this. It’s unique.

But the more I do it, the more I believe it’s also valuable. Take WAR for example. You aren’t a baseball fan anymore if you haven’t at least heard the term. Even if you aren’t an expert, you still have a basic idea of what it is, and that when analysts talk about it, they’re referencing how valuable a player is said to be. WAR is everywhere in this game, from talk shows, articles, projection magazines, arbitration hearings, free agent negotiations, and most certainly inside front offices. And yet, the two major sites who calculate it and list it for our consumption are measuring COMPLETELY DIFFERENT things.

For pitchers, Baseball Reference calculates WAR based on runs allowed (or runs prevented, as another way to think about it). That’s really it. Yes, they adjust for parks & defenses, but all you really need to calculate a pitcher’s rWAR is his Innings and the number of runs he gave up. Not strikeouts, not walks, not home runs… just runs.

Fangraphs, on the other hand, barely uses runs at all in their WAR calculations. If you want to know an fWAR, you need a pitcher’s IP, HR, K, & BB. Then you can estimate how well the pitcher pitched, leaving out batted ball luck, defensive differences, and other factors that play out during the course of a season.

Fangraphs & Baseball-Reference do an amazing job, and I’m not knocking them at all. But I am saying that Game Score combines what they do, and also does a pretty damn good job of it. The way we analyze baseball has changed dramatically, but a pitcher’s line score is still very valuable, and still has many practical uses in the game. The fact we can take ALL of the components of that line score & use them to create a value total (WAR) is pretty important from my vantage point. Runs are important. Runs make the game go round. It’s good to know how many runs a pitcher is allowing. I guarantee you no matter how much the game changes, the best pitchers will always allow far fewer runs than the worst ones.

But we also know that walks, homers, and strikeouts have a huge impact on pitcher performance… and on future predictability. It’s why FIP, and other defense-independent metrics were created in the first place. It’s good to know how often a pitcher strikes out a hitter, and how well he limits walks. Game Score utilizes all of this. A garbage stat!

I don’t know where the future lies, or if there even is a future for these metrics. I don’t know if people will ever really care enough. I do know that I will continue to track pitching through my own lens. One that is now adjusted for parks, and ready to keep up in the dog-eat-dog 21st century (wait, the Rays used another opener today, didn’t they?! Where are you going?).

I’ll say this. If my work here can convince Fangraphs to put Game Score averages on their player profile home pages someday, I’ll consider it a major achievement. If not, it probably won’t change the work I’m doing. But I do hope that I convinced some of you to start following a little more closely.

And if you made it to the end, thank you so much for reading. It means more than you know. Time to go calculate today’s Game Scores!

~ Kyle

