
It's been about 14 years since my system for calculating pitcher wins above replacement was first put together. The last season I've updated on this site was 2009, and since then Baseball-Reference has used the system for their pitcher WAR. A quick summary of the system:
For most pitchers, the system seems to work well, or at least reasonably. There are some cases where it breaks down, and in nearly every case the problem stems from the estimation of #3, defensive support. The system takes a simple approach: start with the defensive runs rating for the team, and prorate it to the pitchers on that team based on the percentage of the team's balls in play that came against each pitcher.

It didn't have to be this way. When I started this, I used Total Zone (TZ) runs as the defensive component. It would not have been too hard to calculate Total Zone separately for each pitcher, and then we could have wildly different defensive support numbers for different pitchers on the same team. I chose not to do that because I didn't think TZ was accurate enough to say that one pitcher received 5 runs of defensive support while the fielders were 10 runs worse than average behind his teammate.

The way pitcher WAR is set up is an answer to this question: If Aaron Nola had been hurt for 2018, and replacement level pitchers had replaced his 212 innings, how many runs could we expect them to allow? In this situation, we should expect they would receive typical defensive support given the team's defense. That's not really the question people are thinking of when they look at pitcher WAR, however. They want it to say what actual defensive support Nola received from his teammates.

In 2018, Aaron Nola was one of the top 3 pitchers in the National League, along with Max Scherzer and Jacob deGrom. All 3 pitched in the same division, with similar workloads, between 212 and 220 innings. With 48 runs allowed, deGrom looks the best at first glance, compared to 57 for Nola and 66 for Scherzer. In WAR, however, Nola pulls ahead with 10.2; deGrom has 9.9 and Scherzer 8.4. Scherzer's Nationals rate as an average defensive club, while the Mets (-31) and Phillies (-82) rate below average. Since Nola and deGrom received poor defensive support by this measure, they get a credit in WAR for pitching under bad circumstances.
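The proration step described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code behind the published numbers. The function name and the ball-in-play counts are assumptions made up for the example, chosen so that the pitcher accounts for roughly 13.5% of the balls in play for a team rated at -82 runs.

```python
def prorated_defensive_support(team_def_runs: float,
                               pitcher_bip: int,
                               team_bip: int) -> float:
    """Assign a pitcher his share of the team's defensive runs,
    in proportion to the balls in play (BIP) hit against him."""
    if team_bip <= 0:
        raise ValueError("team_bip must be positive")
    return team_def_runs * pitcher_bip / team_bip


# Team rating of -82 is from the text; the BIP counts are hypothetical.
support = prorated_defensive_support(team_def_runs=-82,
                                     pitcher_bip=540,
                                     team_bip=4000)

# Poor support (a negative number) is credited back to the pitcher.
credit = -support
print(f"support: {support:.1f} runs, credit: {credit:.1f} runs")
```

Under these made-up counts the pitcher is credited with about 11 runs, which is the size of the adjustment discussed for a pitcher on a team like the 2018 Phillies.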
In Nola's case, we're adding 11 runs, or about an extra win, to his credit beyond the actual runs allowed data, because we think his defensive support was bad. But was it really? Nola allowed a .254 batting average on balls in play, well below the league average of .295. Seven batters reached on error when he was pitching, which was close to the league average. Based on these numbers one might argue that Nola received about 18 runs worth of good defensive support, and that this should be subtracted from his WAR.

What I should have done is to take a combination of two estimates. The first estimate is the overall defensive runs for the team; the second is the results on balls in play when that pitcher was on the mound. Some of the tools needed to figure out how best to do this, however, were not available at the time, and the credit for this idea belongs to Phil Birnbaum. Birnbaum posted a series looking at the Nola situation on his blog a little over a year ago. Here are the links:

Phil uses the Statcast Outs Above Average (OAA) in his work. This is the best publicly available defensive system. We don't have to guess where a fielder should have been playing, or need large sample sizes to even out the defensive opportunities. OAA knows exactly where a fielder was standing, and how far away from him the ball was hit. In addition, it can be looked at for each individual pitcher. That gives us a reliable target number to compare against when estimating a pitcher's true defensive support. Phil found that by using the team OAA rating, and the pitcher's record on balls in play, he can estimate the true defensive support for the pitcher with this equation:
I looked at pitchers with 150 or more innings from 2015 to 2021, and was able to confirm those results (the .64 r above is from my sample). However, if you are using Statcast, you don't need a regression model to estimate pitcher defensive support; you already have it.

How about for years before 2015? I used the same sample and tried both DRS and TZ, combined with the pitcher's record on balls in play, to find out which combination best predicts Statcast OAA for the individual pitcher. I get these equations:
TZ .60 * pitcher + .40 * team (r = .50)

That doesn't look good for DRS. The correlation is not that different, but the coefficient for DRS is so small that it's not adding much useful information here. TotalZone is more useful in the equation, and gives a slightly better correlation, so it looks like a better option for the years 2003 to 2014 (before we can use Statcast). For years before 2003, we don't have DRS anyway, so TotalZone will have to be used.

Now let's look at a pitcher whose case is the mirror image of Nola's: Jim Kaat in 1965. Looking at his traditional pitching stats that year, Kaat looks very good. He won 18, lost 11, and had a 2.83 ERA (126 ERA+) in 264 innings. That's what, a 5 WAR season? The Twins thought he was pretty good, enough to give him the ball for 3 starts in the 1965 World Series. The matchup wasn't easy; he was going against Koufax. Jim won the first matchup, but lost the next 2 (including Game 7) as Koufax allowed only 2 runs in 24 innings.

I looked for pitchers with an ERA+ of 126 and between 250 and 280 innings. There have been 14 such seasons besides Kaat's, with WAR between 3.9 and 6.5. Jim Kaat's season, however, checks in at only 0.4 WAR. There are two reasons for this. One is that Kaat gave up a ton of unearned runs, and WAR does not make a distinction between unearned and earned runs. This is by design. Kaat's ERA was 26% better than the league, but he allowed 4.12 runs per nine innings in all, which was a bit worse than the league average of 3.94 (an RA+ of 96). The other reason is that Kaat is penalized for playing on a good defensive team. The 1965 Twins were 70 TZ runs above average. They turned balls in play into outs at a .724 rate, above the league average of .715.

I use all runs, not earned runs, as the basis of WAR. One reason is that this avoids the subjectivity of the error. Runner on third, 2 out, grounder to short. The shortstop gets a glove on it but can't field it cleanly, and the runner comes in to score.
Let's say the exact same play happens in two cities. In one, the play is scored as an error on the shortstop. This run is unearned. In another city, the scorer calls it a hit. Here, it's an earned run.

Even with that subjectivity, I could consider the case for leaving unearned runs out of WAR. However, there's another twist. Nobody on, 2 out, grounder to shortstop is booted for an error. The pitcher then walks the bases loaded and gives up a grand slam. He then walks them loaded again, and gives up another slam before he is finally pulled. That's 8 runs in all. How many are earned? None of them. Baseball has taken the position that the error should have ended the inning, so anything that happens after that point, even if it has nothing at all to do with fielders like walks and homers, would not have happened if the play had been made.

But if the play had been made, wouldn't 7 of the runs have scored next inning? The first batter who walked would have led off the next inning. Baseball tells us that we can't assume the same events would have happened if the inning had ended differently. Sure, I sort of see the point, but they are making a far bigger assumption in wiping those batter events completely off the pitcher's record. If the rules were set up so that in this situation, the guy who reaches base on an error is an unearned run but all the following events were earned, then I might come around to reconsidering how to calculate pitcher WAR.

Kaat in the 1965 season is shown as benefiting from 14 defensive runs saved. The team was +70, and he was on the mound for about 1/5 of their balls in play. Did he really benefit though? His BABIP was .281, compared to a league average of .269. In addition, 28 batters reached base on errors, while an average number for his time on the mound would have been 16. Let's work through a new method here:
Doing this for his whole career, his defensive support changes from +49 to -48. He was at 45.2 pitching WAR, which would improve to 54.9. Add in another 5.6 for being a better than average hitting pitcher, and he's up to 60.5. I'm glad he finally got into the Hall while he's still around to enjoy it. It's been 39 years since he threw his last pitch, which is even longer than his career lasted.
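For readers who want to experiment with the TotalZone-era blend quoted earlier (.60 * pitcher + .40 * team), here is a minimal Python sketch. It rests on assumptions: I am reading "pitcher" as the runs implied by the pitcher's own record on balls in play and "team" as the team rating prorated to his share of balls in play, and the function and variable names are invented for the example. The team share of 14 runs comes from the Kaat discussion (+70 for the team, about 1/5 of the balls in play). The pitcher figure is a placeholder, not a number from this article.

```python
# Weights from the TotalZone equation quoted in the article.
TZ_PITCHER_WEIGHT = 0.60
TZ_TEAM_WEIGHT = 0.40


def blended_defensive_support(pitcher_bip_runs: float,
                              team_share_runs: float,
                              w_pitcher: float = TZ_PITCHER_WEIGHT,
                              w_team: float = TZ_TEAM_WEIGHT) -> float:
    """Weighted blend of two estimates of a pitcher's defensive support:
    his own balls-in-play record and his prorated share of the team rating."""
    return w_pitcher * pitcher_bip_runs + w_team * team_share_runs


# Team share for Kaat in 1965, as given in the text: +70 * (1/5) = 14 runs.
kaat_team_share = 70 * (1 / 5)

# HYPOTHETICAL placeholder for the pitcher's own balls-in-play runs.
# The article does not give this value for the 1965 season.
placeholder_pitcher_runs = -20.0

estimate = blended_defensive_support(placeholder_pitcher_runs, kaat_team_share)
print(f"blended defensive support: {estimate:+.1f} runs")
```

The point of the sketch is the shape of the calculation: a pitcher whose own balls-in-play results were poor can end up with negative estimated support even on a team with a strongly positive defensive rating.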
This page was last modified 04/02/2022