A Problem with WAR = Defensive Value
February 17, 2019 by Michael Hoban
My primary research interest has always been determining which players had the best seasons—and the best careers. That is why (since the publication of WIN SHARES in 2002) I have used win shares as the basis for my system of evaluating a player’s career (CAWS Career Gauge).
As a mathematician, it is difficult for me to understand how any metric could do a better job than win shares in evaluating a player’s season AFTER the season is over.
But, as a researcher, I try to keep an open mind. So, although I have not studied WAR in any real depth, I decided to see how the 2018 numbers compare between Win Shares and WAR. (I do have the impression that some people think a player's WAR indicates how good a season he had.)
I did not get very far in the comparison before I ran into a real “problem.” I am hopeful that someone who understands WAR reasonably well will be able to help me understand how this result is possible (that is, if there is anyone who understands WAR well).
In 2018, both WS and WAR have Mookie Betts and Mike Trout ranked #1 and #2 among position players: Betts = 38.8 WS and 10.9 WAR, while Trout = 38.2 WS and 10.2 WAR. OK so far. It is in the #3 spot that the question arises. (Numbers are from the BASEBALL GAUGE at seamheads.com and/or baseball-reference.com.)
WS has Matt Chapman ranked at #18 with 25.6 win shares (a very good season). But WAR has him ranked at #3 with 8.2 WAR. That is quite a difference. Is WAR really suggesting that Chapman had the third best season by a position player in 2018? It would appear so – even though this result is clearly not credible.
WS has Alex Bregman as the #3 player at 35.5 win shares (an MVP-type season), while WAR has him at #8 with 6.9 WAR. Why the big difference in ranking? It would appear that inappropriate credit for fielding is the key factor.
Is it possible that Chapman's “fielding value” (at third base) can be that important in terms of his contribution to his team? That is really the essential question. How much “value” is it possible for a third baseman's glove to add to a team's success?
So, who had the “better season”—Bregman or Chapman?
Here are some “old fashioned offensive numbers.”
Bregman = .286 BA, 170 hits, 31 hr, 103 rbi, 105 runs, .926 OPS, 156 OPS+
Chapman = .278 BA, 152 hits, 24 hr, 68 rbi, 100 runs, .864 OPS, 136 OPS+
Actually, WS and WAR do agree that Bregman's offense was superior. WAR put Bregman's offensive WAR at 7.5 and Chapman's at 5.0, while WS gave them 29.1 and 20.4 offensive win shares, respectively.
OK, so what is the problem? WAR’s love of Chapman’s fielding appears to be “way off base.” This appears to be at the crux of the “problem.” WS awarded Bregman 6.4 and Chapman 5.3 for fielding. WAR awarded Bregman -0.6 and Chapman 3.5. WHAT?
It appears that WS suggests that Bregman's value was 82% offense and 18% defense, while Chapman's value was 80% offense and 20% defense. WAR seems to give considerably more value to Chapman's fielding (almost double).
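The percentages above are simple ratios of each win-share component to the sum of the two. A minimal Python sketch using the figures quoted in this post; note that Chapman's components add to 25.7 rather than the 25.6 total quoted earlier (likely rounding), so the shares here are computed against the component sum and come out within a point of the rounded figures in the text.

```python
# Win-share components (offense, fielding) for 2018, as quoted in the post.
WIN_SHARES = {
    "Bregman": (29.1, 6.4),
    "Chapman": (20.4, 5.3),
}

def offense_defense_split(offense: float, fielding: float) -> tuple[float, float]:
    """Return (offense %, fielding %) of the component total, to one decimal."""
    total = offense + fielding
    return round(100 * offense / total, 1), round(100 * fielding / total, 1)

for player, (off, fld) in WIN_SHARES.items():
    off_pct, fld_pct = offense_defense_split(off, fld)
    print(f"{player}: {off_pct}% offense, {fld_pct}% fielding")
```

This prints roughly 82/18 for Bregman and 79/21 for Chapman.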
The WAR numbers are not quite as clear (as usual) but WAR apparently suggests that all of Bregman’s value was in offense and his fielding was terrible (how else to interpret a negative fielding WAR?) while something like 38% of Chapman’s value was in fielding. I took a look at Bregman’s actual fielding numbers and cannot understand how his fielding can ACTUALLY SUBTRACT from his WAR—his 7.5 offensive WAR becomes 6.9 for the season. Forget “replacement” value – this conclusion defies common sense.
I think my biggest problem is with the credibility of that 3.5 WAR for Chapman’s fielding. How realistic is that? Brooks Robinson is generally considered to be perhaps the best defensive third baseman of all time (16 gold gloves). In 23 seasons, Brooks had a fielding WAR greater than 3.0 just twice. Willie Mays won 12 gold gloves in center field and he never had a defensive WAR greater than 3.0. But Chapman had a 3.5 (at third base) in 2018? REALLY?
So, you can see my problem. Is Matt Chapman really that good a fielder? And even if he is, can fielding (at third base) count so much that we are to believe he was the third-best position player of 2018 with an offensive WAR of 5.0 (compared to Bregman's 7.5)? That makes no sense.
This result certainly seems to raise the question as to whether WAR is “overvaluing” fielding (especially for a third baseman). And, therefore, how valuable can it be for comparing the seasonal value for different players?
I find these to be intriguing questions. I have studied win shares and consider it to be very effective in evaluating a player’s season. But I have not studied WAR in any depth and I have to admit that a result like this raises many questions in my mind about its ability to compare players’ seasons.
Even if I were to believe that Matt Chapman had the greatest fielding season at third base ever, I still would have difficulty seeing his 2018 season as the third best in baseball. His #18 ranking with win shares seems more realistic.
Thank you for your time.
Mike Hoban, Ph.D.
Professor Emeritus (mathematics) – City U of NY
Author of DEFINING GREATNESS: A Hall of Fame Handbook
Thanks for these thoughts, Dr. Hoban. A puzzlement that I share with you. Eager to hear a response from a WAR advocate.
Somebody shared your article in a comment at High Heat Stats, and I wrote a response to a follow-up there; I'll share some bits of it here. I don't know the details of the WS formula for fielding, so I'll stick to explaining WAR and why there might be this big a difference in fielding performance.
Bregman played some games at SS while Chapman did not, so I'm going to try to compare just their 3B performance. Bregman played 1173.2 innings at 3B, while Chapman played 1273.2. To make a better comparison, I'm going to normalize their numbers to Chapman's innings played by extrapolating Bregman's performance.
Raw:
Chapman: 484 chances, 20 errors, 464 PO or Ast
Bregman: 342 chances, 13 errors, 329 PO or Ast
Bregman normalized: 371 chances, 14 errors, 357 PO or Ast
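The normalization is a straight ratio of innings played. A minimal Python sketch, assuming the innings figures use the usual box-score convention in which the digit after the point counts outs (thirds of an inning), not tenths; the helper name innings_to_outs is hypothetical.

```python
from fractions import Fraction

def innings_to_outs(ip: str) -> int:
    """Convert innings notation ("1173.2" = 1173 and 2/3 innings) to outs."""
    whole, _, thirds = ip.partition(".")
    return 3 * int(whole) + int(thirds or 0)

chapman_outs = innings_to_outs("1273.2")
bregman_outs = innings_to_outs("1173.2")
scale = Fraction(chapman_outs, bregman_outs)  # exact ratio of innings at 3B

bregman_raw = {"chances": 342, "errors": 13, "po_or_ast": 329}
# Scale each raw count to Chapman's innings and round to a whole play.
bregman_normalized = {k: round(v * scale) for k, v in bregman_raw.items()}
print(bregman_normalized)  # {'chances': 371, 'errors': 14, 'po_or_ast': 357}

# Chapman's successful plays beyond Bregman's normalized total.
extra_outs = 464 - bregman_normalized["po_or_ast"]
print(extra_outs)  # 107
```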
So it looks like Chapman fielded a LOT more balls at 3B per inning than Bregman did. One key question is why. Did more balls get hit right to Chapman, or did he *get to* more balls? It depends on how the scorers judge things. One thing we know from experience is that scorers tend to treat balls not reached as non-errors, even if an average fielder at that position should have gotten them. They are (relatively) good at deciding whether a play should have been made once the fielder is there, but they are generally terrible at deciding whether the fielder should have been there to attempt the play in the first place.
So one way of looking at these stats is to say that every single extra chance is a hit turned into an out! If this assumption were accurate, the difference would be a LOT more than 3.5 WAR!
What would you say about two batters who had exactly the same number of walks, strikeouts, HR, and balls in play, but one got 207 hits and the other got 100? Would you be surprised if there were a 3.5 WAR difference? I would! I would expect the difference to be far greater! The linear value of a hit v. an out is ~.8 runs, and the run context of the 2018 AL is 9.12 R/G. So that would suggest a difference of closer to *9* wins from Chapman's performance in the field.
Obviously the real truth is somewhere in between. What B-R does to get to its rField number is use BIS data on balls in play, with experts reviewing each play and determining what an average fielder would/should have done. It is that data which determined that Chapman's extra chances were worth not 85 runs but only around 29 vs. average, and ~35 vs. extrapolated Bregman. This means the BIS data suggests that most of Chapman's extra chances were due to more balls being hit his way, rather than to his better range.
Note also that dWAR includes a positional adjustment based on the average difficulty of the position played and how many innings are played at each position. Bregman gets an extra run in positional adjustment for playing 217 innings at SS (a harder position than 3B).
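The back-of-the-envelope conversion can be written out step by step. A minimal Python sketch, assuming (as the comment appears to) that the 207-versus-100 hit gap stands in for Chapman's 107 extra plays (464 minus Bregman's normalized 357), and that runs per win is roughly the league's total runs per game for both teams, 9.12 in the 2018 AL. That runs-per-win shortcut is a rule of thumb, not B-R's exact conversion.

```python
def hits_to_wins(extra_outs: int, run_value_per_play: float,
                 runs_per_win: float) -> tuple[float, float]:
    """Treat every extra play as a hit turned into an out; return (runs, wins)."""
    runs = extra_outs * run_value_per_play
    return runs, runs / runs_per_win

# 107 extra plays, ~0.8 runs per hit-vs-out, ~9.12 runs per win (assumed).
runs, wins = hits_to_wins(extra_outs=107, run_value_per_play=0.8, runs_per_win=9.12)
print(f"{runs:.1f} runs, {wins:.1f} wins")  # 85.6 runs, 9.4 wins
```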
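To see how much of the naive estimate the play-by-play data keeps, compare the two run figures. A rough Python sketch, reusing the comment's 0.8 runs per play and treating 9.12 runs as one win (both rough assumptions); B-R's own runs-per-win value and its positional adjustment are not modeled here.

```python
UPPER_BOUND_RUNS = 107 * 0.8   # every extra play counted as a hit saved
BIS_RUNS_ABOVE_AVG = 29        # rField vs. average, as quoted in the comment
RUNS_PER_WIN = 9.12            # assumed rough conversion (2018 AL runs per game)

share_credited = BIS_RUNS_ABOVE_AVG / UPPER_BOUND_RUNS
print(f"{share_credited:.0%} of the upper bound")       # 34% of the upper bound
print(f"{BIS_RUNS_ABOVE_AVG / RUNS_PER_WIN:.1f} wins")   # 3.2 wins
```

Roughly a third of the all-hits-saved estimate survives, and 29 runs lands near 3 wins before any positional adjustment.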
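The positional credit is pro-rated by innings. A minimal Python sketch; the size of the shortstop-over-third-base gap (about 5 runs per full season of roughly 1,350 defensive innings) is an assumed value for illustration, not a figure from the comment, and the function name is hypothetical.

```python
# ASSUMED for illustration: SS adjustment exceeds the 3B adjustment by ~5 runs
# over a full defensive season of ~1,350 innings.
SS_MINUS_3B_RUNS_PER_SEASON = 5.0
INNINGS_PER_SEASON = 1350.0

def extra_positional_runs(innings_at_ss: float) -> float:
    """Pro-rate the assumed SS-vs-3B gap by innings played at shortstop."""
    return SS_MINUS_3B_RUNS_PER_SEASON * innings_at_ss / INNINGS_PER_SEASON

print(f"{extra_positional_runs(217):.1f} runs")  # 0.8 runs
```

Under that assumption, 217 innings at SS is worth a bit under one run, consistent with the "extra run" described above.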
In any case, I’m not going to say that WAR fielding is perfect, but it’s not crazy, and it’s based on some relatively sound principles.
It's also important to understand that fielding numbers in B-R WAR for all but fairly recent seasons are based on Total Zone data, which is much less precise, because we simply don't have the data and video for games back then, only whatever limited notes were made about the zones balls fell in. So Brooks Robinson's numbers are much more subject to big errors than Chapman's (IMO, and I think most sabermetricians who study fielding metrics would agree). That said, Brooks did have 2 years where his rField and dWAR were higher than Chapman's this year: 1967 and 1968, with rField of 33 and 32 runs respectively and dWAR of 4.2 and 4.5. The higher dWAR in 1968 is due to the historically low runs-per-game context: each run saved translated into more wins.
It seems fairly likely to me that if every Brooks Robinson game had been recorded on video and examined by BIS experts, he might have had a few more seasons judged as good as or better than Chapman's, but we'll never know.
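The run-environment effect can be illustrated with the same rough runs-to-wins shortcut. A minimal Python sketch, assuming runs per win is approximately total runs per game for both teams; the 9.12 figure is the 2018 AL value from the comment above, while the 7.0 environment is hypothetical, not Robinson's actual league.

```python
def runs_to_wins(runs: float, runs_per_game_both_teams: float) -> float:
    """Rough conversion, assuming runs per win ~ total runs per game."""
    return runs / runs_per_game_both_teams

RUNS_SAVED = 32  # Robinson's 1968 rField, as quoted in the comment
environments = [
    ("2018 AL, 9.12 R/G", 9.12),
    ("hypothetical low-scoring league, 7.0 R/G", 7.0),
]
for label, rpg in environments:
    print(f"{label}: {runs_to_wins(RUNS_SAVED, rpg):.1f} wins")
```

The same 32 runs saved converts to more wins when fewer runs are scored, which is the point made about 1968.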
One thing that has become very clear when people have studied fielding in more depth is that some of the best fielders make it look so easy that it’s not obvious they are as good as (or even better than) flashier players.
BTW, I heartily recommend spending some time digging into the public information on B-R about exactly how WAR works and is calculated. They break out batting, baserunning, fielding etc. on their site, and there are articles discussing how the numbers are arrived at.
It looks like I'd have to do a lot more digging to be sure I understand what's going on in the WS fielding formula (honestly, WS makes the WAR formulas seem fairly simple), but at first glance it appears to use traditional defensive metrics fairly directly rather than trying to judge, better than traditional scorekeepers do, which plays should have been made. I'm not surprised that this results in little difference between fielders with comparable error rates, but unless I'm missing something, it's not going to capture range differences at all, which is what the various fielding components of WAR are designed to do.
Anyway, I hope I’ve given you an idea of why I think it’s 100% reasonable that a very good fielder could be worth 3-4 wins more than an average fielder at the same position.
I'm still not 100% sold on WAR fielding numbers either, and I don't think anybody really is, but I do think they are *much* better than relying solely on traditional fielding numbers: as much better as linear weights are than batting average and RBIs for offense.
I don't think it defies common sense that Bregman's fielding could subtract from his WAR. Why do you think that? If his fielding value is below average, that would make perfect sense. Remember that WAR defines a replacement player as somebody on a AAA+ team that might get ~40 wins over the course of a season, and there is a separate component (RRep) based on PAs/innings. For every other component, players are compared to league average: if you beat league average, you get positive numbers; if you are worse, you get negative numbers. WS starts from zero, so there are no negatives.
I also don't think it defies common sense that Chapman's fielding could be worth 29 runs above average. Yes, it suggests that he was roughly as valuable on the defensive side as Brooks Robinson in his prime. Is it really reasonable to dismiss that possibility out of hand, when he has gotten so many more fielding chances per inning than typical 3Bs? I don't think it is. On HHS, someone compared him to Kyle Seager; there the difference in the number of chances is smaller, but still *plenty* to be worth as much as 6 WAR if all the extra chances were hits turned into outs.
Turning 30-40 hits into outs is HUGE extra value, and yes, that could easily make up the difference between a 156 and a 136 OPS+ and make Chapman the third most valuable player in the league.
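The baseline difference can be sketched as a toy WAR-style sum: every component is measured in runs above or below league average, a separate replacement component grows with playing time, and the total is divided by runs per win. All inputs here (component values, the 20-runs-per-600-PA replacement rate, 9.12 runs per win) are hypothetical, chosen only for illustration; this is not B-R's actual formula.

```python
# HYPOTHETICAL runs above (+) or below (-) league average, by component.
components = {"batting": 40.0, "baserunning": 2.0, "fielding": -6.0}
PLATE_APPEARANCES = 650
REPLACEMENT_RUNS_PER_600_PA = 20.0  # assumed rate, for illustration only
RUNS_PER_WIN = 9.12                 # assumed rough conversion

def war_like(parts: dict[str, float], pa: int) -> float:
    """Toy WAR: above-average runs (may be negative) plus replacement runs."""
    above_average = sum(parts.values())
    replacement = REPLACEMENT_RUNS_PER_600_PA * pa / 600  # never negative
    return (above_average + replacement) / RUNS_PER_WIN

with_weak_glove = war_like(components, PLATE_APPEARANCES)
with_average_glove = war_like({**components, "fielding": 0.0}, PLATE_APPEARANCES)
print(f"{with_weak_glove:.1f} vs {with_average_glove:.1f}")
```

A below-average glove pulls the total down, yet the player stays well above replacement. A scale that starts at zero and only hands out credit cannot express that subtraction.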
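The 30-40 hit range can be checked against the 29-run rField figure with the comment's own numbers. A minimal Python sketch, assuming 0.8 runs per hit-versus-out and roughly 9.12 runs per win (both rough figures from the comment, not B-R's exact values).

```python
RUN_VALUE_HIT_VS_OUT = 0.8  # the comment's linear-weights figure
RUNS_PER_WIN = 9.12         # assumed rough conversion (2018 AL runs per game)

def hits_saved_value(hits_saved: int) -> tuple[float, float]:
    """Return (runs, wins) for hits converted into outs."""
    runs = hits_saved * RUN_VALUE_HIT_VS_OUT
    return runs, runs / RUNS_PER_WIN

for hits in (30, 40):
    runs, wins = hits_saved_value(hits)
    print(f"{hits} hits -> {runs:.0f} runs -> {wins:.1f} wins")
```

That range, about 24 to 32 runs or roughly 2.6 to 3.5 wins, brackets the 29 runs credited to Chapman.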
Can we 100% trust these fielding metrics? No, but I’ve got no particular reason to believe they are *over*estimating value, rather than *under*estimating it.
Professor Hoban,
I used to hang out in these parts, but haven’t done so as much recently. Anyway, I was, like Michael Sullivan, turned on to this particular post by a comment at High Heat Stats.
I want to discuss something I don’t think I’ve ever mentioned on your posts before, but that I think is fundamental and necessary to understand, and it’s the major flaw in Win Shares’ defensive evaluation: the lack of negative numbers.
When Win Shares divides defensive credit, it takes a whole team's defensive performance and distributes it among the players. Most of the time (maybe 75% of the time) this works fine. But in a non-trivial percentage of cases, great defensive players are harmed and poor defensive players are rewarded; everyone is pushed toward the middle.
As an example of what the equivalent would look like, think about Barry Bonds. In 2002, he had as good an offensive season as anyone in history. What if, instead of evaluating Bonds as an individual, we looked at San Francisco's overall offense (third in the league with 787 runs) and distributed credit for those runs? OK, no biggie, right? Bonds was the best player, and he'll get the most credit. That's true. But Tsuyoshi Shinjo, for example, would also be credited with a positive impact, in spite of his well-below-average OPS of .664; Shinjo was NOT a positive contributor, he was a negative one. So, in some sense, by dividing offense at the team level, Tsuyoshi Shinjo would be receiving positive credit for the things that Barry Bonds did!
That would be a silly way to measure offense, and since we have better tools, that’s not what Win Shares does on offense… but it IS what Win Shares does on defense!
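The difference between splitting a team total and measuring each player against an average can be made concrete with a toy lineup. A minimal Python sketch; the player labels, run totals, and league-average figure are all hypothetical, not Bonds's or Shinjo's actual numbers.

```python
# HYPOTHETICAL runs created for three hitters with similar playing time.
runs_created = {"star": 150.0, "regular": 80.0, "weak_hitter": 45.0}
LEAGUE_AVERAGE_PER_HITTER = 75.0  # assumed league-average runs created

team_total = sum(runs_created.values())

# Team-level split: every hitter gets a positive slice of the team's runs.
share_of_team = {p: rc / team_total for p, rc in runs_created.items()}

# Individual baseline: runs above (or below) an average hitter.
vs_average = {p: rc - LEAGUE_AVERAGE_PER_HITTER for p, rc in runs_created.items()}

for player in runs_created:
    print(f"{player}: {share_of_team[player]:.0%} of team, "
          f"{vs_average[player]:+.0f} runs vs. average")
```

The weak hitter still receives a positive share of the team's runs, while the individual comparison shows him costing runs relative to an average hitter.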
Another way to think about this is a different thought experiment. Imagine a team (Team A) with the best infield defense possible: every ground ball is turned into an out. Their outfield, however, is the WORST you could possibly imagine: every fly ball becomes a hit. Now let's say that, due to the performance of their pitchers and the particular ground-ball/fly-ball tendencies of the staff, they allow a league-average number of runs (around 650, let's say). How do you distribute those defensive Win Shares, which will be based on marginal runs (about 325 runs in this case)? The obvious answer is that you should distribute all defensive Win Shares to the infield... okay. Now imagine a second team, Team B. They have a similar infield; all ground balls are turned into outs. Yet, unlike Team A, all balls in the OUTFIELD are turned into outs as well. They would still allow SOME runs (HR, walks, HBP); let's call it 150 runs. So how do you evaluate THAT defense? You'd have 825 marginal runs to distribute, in other words 500 more than Team A. Now, do you give all 500 of those to the outfield? Probably not, because they weren't necessarily responsible for ALL of those runs. So you're left with two identical infield defenses, but you're forced into a situation in which these two defenses receive different amounts of credit for the same performance.
If, however, you believe, as WAR does, that you can give NEGATIVE numbers to players for their performance, you CAN end up grading these two infield defenses identically.
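The marginal-run figures in the thought experiment are consistent with a zero point at 1.5 times league-average runs allowed (975 here): 650 allowed gives 325, and 150 allowed gives 825. That zero point is inferred from the comment's numbers and appears to match the defensive baseline Bill James describes for Win Shares. A minimal Python sketch; the function name is hypothetical.

```python
LEAGUE_RUNS_ALLOWED = 650  # "around 650, let's say" in the comment

def marginal_defensive_runs(runs_allowed: float,
                            league_runs: float = LEAGUE_RUNS_ALLOWED) -> float:
    """Marginal runs saved, with the zero point at 1.5x league-average runs."""
    return 1.5 * league_runs - runs_allowed

team_a = marginal_defensive_runs(650)  # league-average runs allowed
team_b = marginal_defensive_runs(150)  # only HR, walks, HBP get through
print(team_a, team_b, team_b - team_a)  # 325.0 825.0 500.0
```

Both infields are equally perfect, yet the pool of credit they share in differs by 500 marginal runs, which is the dilemma described above.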
Theoretically, by the way, this would be solved if James ever published Loss Shares and the calculation thereof. However, that remains proprietary, so we’re stuck with what we’ve got; and what we’ve got makes WAR the better system, because it rewards the best players and harms the worst players in a way commensurate with their actual skill.
Your “problem with WAR” is not about how WAR “values” defense. WAR values runs and therefore treats defensive runs, batting runs, base running runs, and pitching runs all the same. Runs are the currency. You object to the defensive metrics that go into WAR calculations. That is a fair point and one worthy of argument. From reading your posts, it appears that you disagree with the fundamental value propositions of WAR vs. Win Shares, which, again, is perfectly reasonable. However, you are confusing your dislike of the currency (runs vs. wins) with the system itself.
You don’t believe in 3.5 wins as a fielder. Maybe that number is accurate or maybe it isn’t. Maybe it overestimates fielding value or maybe it underestimates fielding value. But you very clearly believe in the value of offense. So it is up to you to explain how there must be a limit to defensive runs, despite there being no limit to offensive runs.
This quote from Michael Sullivan:
“The linear value of a hit v. an out is ~.8 runs”
How do we get there? I am operating under the assumption that a single is worth 0.45 runs and an out -0.23. Together that's a difference of about 0.7. How does he get 0.8? Thanks.
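The questioner's 0.7 is the single-versus-out gap. One plausible, unconfirmed route toward a figure nearer 0.8 is to value an average hit on a ball in play, which includes some doubles and triples, rather than a single. A minimal Python sketch; the double and triple run values and the hit mix are assumptions for illustration, not figures from either comment.

```python
# The questioner's linear weights (runs relative to an average plate appearance).
SINGLE, OUT = 0.45, -0.23
single_vs_out = SINGLE - OUT

# ASSUMED for illustration: run values for doubles and triples, and the share
# of hits on balls in play that are singles, doubles, and triples.
DOUBLE, TRIPLE = 0.75, 1.05
HIT_MIX = [(SINGLE, 0.75), (DOUBLE, 0.22), (TRIPLE, 0.03)]

average_hit = sum(value * share for value, share in HIT_MIX)
average_hit_vs_out = average_hit - OUT

print(f"single vs. out: {single_vs_out:.2f}")            # single vs. out: 0.68
print(f"average hit vs. out: {average_hit_vs_out:.2f}")  # average hit vs. out: 0.76
```

Under these assumed weights, the gap moves from about 0.68 toward 0.8; the exact figure depends on the run values and hit mix used.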