# Predicting pitchers’ walks using xBB%

The other day, I discussed predicting pitchers’ strikeout rates using xK%. Today I will do the same exercise for walks. I want to see how well a pitcher’s walk rate (BB%) actually correlates with what his walk rate should be (expected BB%, henceforth “xBB%”). As with xK%, I used my best intuition to identify reliable indicators of a pitcher’s true walk rate from readily available data.

An xBB% metric, like xK%, would indicate not only whether a pitcher perennially over-performs (or under-performs) his expected walk rate but also whether he happened to do so in a given year. This article will conclude by looking at how the difference between actual and expected walk rates (BB% – xBB%) varied between 2014 and career numbers, lending some insight into the (un)luckiness of each pitcher.

Courtesy of FanGraphs, I constructed another set of pitching data spanning 2010 through 2014. This time, I focused primarily on what I thought would correlate with walk rate: inability to pitch in the zone and inability to induce swings on pitches out of the zone. I also threw in first-pitch strike rate, because I predict that counts that start with a ball are more likely to end in a walk than those that start with a strike. Because FanGraphs’ data measures ability rather than inability (“Zone%” measures how often a pitcher hits the zone; “O-Swing%” measures how often batters swing at pitches out of the zone; “F-Strike%” measures the rate of first-pitch strikes), each variable should have a negative coefficient attached to it.

I specified a handful of variations before settling on a final version. Instead of using split-season data (that is, each pitcher’s individual seasons from 2010 to 2014) for qualified pitchers, I used aggregated statistics because they fit the data better by a sizable margin. This surprised me because there were about half as many observations, but in hindsight it’s not surprising, because each observation is, itself, a larger sample than before.

At one point, I tried creating my own variable: looks (non-swings) at pitches out of the zone. I built it by taking the percentage of pitches out of the zone (1 – Zone%) and multiplying it by how often batters refused to swing at them (1 – O-Swing%). This version of the model produced a nice fit, but it was slightly worse than leaving the variables separate. I also ran separate-but-equal regressions for PITCHf/x data and FanGraphs’ own data. The PITCHf/x data appeared to be slightly more accurate, so I proceeded using them.
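Here's a minimal Python sketch of that combined "looks" variable. It assumes both rates are expressed as decimals (0.45 rather than 45%), and the function name is hypothetical; it only computes the variable, not the regression it was fed into.

```python
def looks_out_of_zone(zone_pct: float, o_swing_pct: float) -> float:
    """Share of all pitches that land outside the zone AND are taken (not swung at).

    zone_pct: fraction of pitches in the zone (Zone%), as a decimal.
    o_swing_pct: fraction of out-of-zone pitches swung at (O-Swing%), as a decimal.
    """
    if not (0.0 <= zone_pct <= 1.0 and 0.0 <= o_swing_pct <= 1.0):
        raise ValueError("rates must be decimals between 0 and 1")
    # (1 - Zone%) = share of pitches out of the zone
    # (1 - O-Swing%) = share of those out-of-zone pitches that batters let go by
    return (1.0 - zone_pct) * (1.0 - o_swing_pct)


# Hypothetical pitcher: 45% of pitches in the zone, 30% chase rate.
print(round(looks_out_of_zone(0.45, 0.30), 3))  # 0.385
```

Multiplying the two rates works because O-Swing% is already conditional on the pitch being out of the zone, so the product is the share of all pitches that are out-of-zone takes.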

The graph plots actual walk rates versus expected walk rates. The regression yielded the following equation:

xBB% = .3766176 – .2103522*O-Swing%(pfx) – .1105723*Zone%(pfx) – .3062822*F-Strike%
R-squared = .6433
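Plugging numbers into the equation is simple arithmetic. The sketch below assumes all three inputs are decimals (0.30 for a 30 percent O-Swing%), which the size of the intercept suggests; the function name is hypothetical and the pitcher in the usage line is made up, not a real stat line.

```python
# Coefficients copied from the regression above (PITCHf/x O-Swing% and Zone%,
# plus F-Strike%), fit on aggregated 2010-2014 data.
INTERCEPT = 0.3766176
B_O_SWING = 0.2103522
B_ZONE = 0.1105723
B_F_STRIKE = 0.3062822


def expected_bb_rate(o_swing: float, zone: float, f_strike: float) -> float:
    """Return xBB% as a decimal (0.08 means 8%).

    All inputs are assumed to be decimals, e.g. 0.30 for a 30% O-Swing%.
    Each coefficient is subtracted, matching the negative signs in the equation.
    """
    return INTERCEPT - B_O_SWING * o_swing - B_ZONE * zone - B_F_STRIKE * f_strike


# Hypothetical inputs, not a real pitcher's line.
xbb = expected_bb_rate(o_swing=0.30, zone=0.45, f_strike=0.60)
print(f"xBB% = {xbb:.1%}")  # xBB% = 8.0%
```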

Again, R-squared indicates how well the model fits the data. An R-squared of .64 is not as exciting as the R-squared I got for xK%; it means the model explains about 64 percent of the variation in walk rates, and the remaining 36 percent is explained by things I haven’t included in the model. Certainly, more variables could help explain BB%. I am already considering combining FanGraphs’ PITCHf/x data with some of Baseball Reference’s data, which does a great job of keeping track of the number of 3-0 counts, four-pitch walks and so on.
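For reference, R-squared is one minus the ratio of the model's squared errors to the total squared variation of actual walk rates around their mean. A self-contained sketch (the function name and the three toy walk rates are invented for illustration):

```python
def r_squared(actual: list[float], predicted: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_actual = sum(actual) / len(actual)
    # Variation the model leaves unexplained
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation of the actual walk rates around their mean
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values have no variation")
    return 1.0 - ss_res / ss_tot


# Toy data: actual BB% vs. xBB% for three made-up pitchers (as decimals).
print(round(r_squared([0.06, 0.08, 0.10], [0.065, 0.08, 0.095]), 4))  # 0.9375
```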

And again, for the reader to use the equation above to his or her benefit, one would plug in the appropriate values for a player in a given season or time frame and determine his xBB%. Then one could compare the xBB% to single-season or career BB% to derive some kind of meaningful results. And (one more) again, I have already taken the liberty of doing this for you.

Instead of including every pitcher from the sample, I narrowed it down to only pitchers with at least three years’ worth of data in order to yield some kind of statistically significant results. (Note: a three-year sample is a small sample, but three individual samples of 160+ innings are large enough to produce some arguably robust results.) “Avg BB% – xBB%” (or “diff%”) takes the average of a pitcher’s difference between actual and expected walk rates from 2010 to 2014. It indicates how well (or poorly) he performs compared to his xBB%: the lower the number, the better. This time, I included “t-score”, which measures how reliable diff% is. The key value here is 1.96; anything greater than that (in absolute value) means his diff% is reliable. (1.00 to 1.96 is somewhat reliable; anything less than 1.00 is very unreliable.) Again, this is slightly problematic because there are five observations (years) at most, but it’s the best and simplest usable indicator of reliability.
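The post doesn't spell out the t-score formula, so this sketch assumes the usual one-sample t-statistic: the mean of the season-by-season differences divided by its standard error (sample standard deviation over the square root of the number of seasons). The function names and the three seasons of walk rates in the usage line are hypothetical.

```python
import math
import statistics


def diff_summary(bb: list[float], xbb: list[float]) -> tuple[float, float, float]:
    """Return (mean diff%, sample std dev, t-score) for BB% - xBB% across seasons.

    Inputs are per-season walk rates in percentage points (e.g. 6.2 for 6.2%).
    """
    if len(bb) != len(xbb) or len(bb) < 2:
        raise ValueError("need at least two paired seasons")
    diffs = [b - x for b, x in zip(bb, xbb)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    t_score = mean_diff / (sd / math.sqrt(len(diffs)))  # one-sample t-statistic
    return mean_diff, sd, t_score


def reliability(t_score: float) -> str:
    # Thresholds from the text, applied to |t| so that consistent
    # out-performers (negative diff%) and under-performers both qualify.
    t = abs(t_score)
    if t > 1.96:
        return "reliable"
    if t >= 1.00:
        return "somewhat reliable"
    return "very unreliable"


# Three made-up seasons for a hypothetical pitcher who beats his xBB% every year.
mean_diff, sd, t = diff_summary(bb=[5.0, 5.5, 4.8], xbb=[6.0, 6.3, 5.9])
print(f"diff% = {mean_diff:+.2f}, StdDev = {sd:.2f}, t = {t:.1f} ({reliability(t)})")
# diff% = -0.97, StdDev = 0.15, t = -11.0 (reliable)
```

One caveat on the design: 1.96 is the large-sample (normal) cutoff. With only three to five seasons, a strict two-sided t-test at the 5 percent level would need |t| above roughly 4.30 (three seasons) or 2.78 (five seasons), which echoes the small-sample caveat above.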

Thus, Mark Buehrle, Mike Leake, Hiroki Kuroda, Doug Fister, Tim Hudson, Zack Greinke, Dan Haren and Bartolo Colon can all reasonably be expected to consistently out-perform their xBB% in any given year. Likewise, Aaron Harang, Colby Lewis, Ervin Santana and Mat Latos can all reasonably be expected to under-perform their xBB%. For everyone else, their diff% values don’t mean a whole lot. For example, R.A. Dickey’s diff% of +0.03% doesn’t mean he’s more likely than anyone else to pitch exactly as well as his xBB% predicts; in fact, his standard deviation (StdDev) of 0.93% indicates he’s less likely than just about anyone to do so. (What it really means is that there is only about a two-thirds chance his diff% will fall between -0.90% and +0.96%.)
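The "two-thirds" in that parenthetical comes from treating diff% as roughly normal, where about 68 percent of outcomes land within one standard deviation of the mean. A quick check with Python's standard-library `statistics.NormalDist`, using Dickey's numbers from the paragraph above (the normality assumption is mine, not something the data confirms):

```python
from statistics import NormalDist

# R.A. Dickey's figures from the text: mean diff% +0.03, StdDev 0.93 (percentage points).
mean_diff, std_dev = 0.03, 0.93

low, high = mean_diff - std_dev, mean_diff + std_dev
print(f"one-SD band: {low:+.2f}% to {high:+.2f}%")  # one-SD band: -0.90% to +0.96%

# Probability mass within one standard deviation, assuming a normal distribution.
dist = NormalDist(mu=mean_diff, sigma=std_dev)
share = dist.cdf(high) - dist.cdf(low)
print(f"{share:.1%}")  # 68.3%
```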

As with xK%, I compiled a list of fantasy-relevant starters with only two years’ worth of data who saw sizable fluctuations between 2013 and 2014. Their data is impossible (or at least ill-advised) to interpret at this point, but it is worth monitoring.

Name: [2013 diff%, 2014 diff%]

Miller is an interesting case: he was atrociously bad about gifting free passes in 2014, but his diff% was only marginally worse than it was in 2013. It’s possible that he was a smart buy-low for the Braves, but it’s also possible that Miller not only perennially under-performs his xBB% but is also trending in the wrong direction.

Here are fantasy-relevant players with a) only 2014 data, and b) outlier diff% values:

I’m not gonna lie, I have no idea why Cobb, Corey Kluber and others show up as having only one year of data when they have two in the xK% dataset. I only just noticed this. Their exclusion doesn’t fundamentally change the model’s fit, because the model did not rely on split-season data; I’m just curious why they didn’t show up in FanGraphs’ leaderboards. Oh well.

Implications: Richards and Roark perhaps over-performed. Meanwhile, it’s possible that Odorizzi, Ross and Ventura will improve (or regress) compared to last year. I’m excited about all of that. Richards will probably be pretty over-valued on draft day.

# Need some streamers? T. Ross, Hutchison, Colon, Wood

I’ve been slacking on my streamer picks, so let’s cut straight to the chase.

Today, 5/15:
Tyson Ross, SD @ CIN
Mr. Ross is the real deal, my friends. He’s 10th among all pitchers in batters’ contact rate on pitches in the zone, sandwiched between the unfamiliar names of Jose Fernandez and Zack Greinke (and the pitchers who precede him include Michael Wacha, Yordano Ventura, Julio Teheran and Max Scherzer). He doesn’t make batters chase pitches at an overwhelming rate, but they make contact on such pitches only half the time, which ranks Ross fourth, behind only Ervin Santana, Garrett Richards and Masahiro Tanaka. At 7.99 K/9, his K-rate should actually improve. You can really only bash him for his walk rate, but it’s no worse than Gio Gonzalez’s or Justin Verlander’s. I don’t care if it’s a road game; Ross should be owned in all leagues at this point.

Friday, 5/16:
Drew Hutchison, TOR @ TEX
I’ll be honest with you: I’m not totally sold on this matchup. Hutchison hasn’t been very impressive, but there simply aren’t many matchups worth exploiting on Friday. I like Hutchison for his strikeouts, and before his last start (during which he walked four), he had walked only five guys across 32-1/3 innings. His control escaped him, but if it comes back, he should be able to handle a miserable Texas offense that ranks 26th of 30 teams in extra-base hits.

Saturday, 5/17:
Bartolo Colon, NYM @ WAS
Again, not crazy about this one, either. But Colon has been incredibly unlucky. The dude is walking fewer than one batter per nine innings (0.9 BB/9), so all the baserunners (and, consequently, earned runs) he has allowed are largely a function of an elevated batting average on balls in play (BABIP). It’s hard to trust a guy who’s mired in a slump, but the luck should eventually turn in his favor. Who’s to say it won’t be this weekend? I’d take a chance. The Nationals don’t score a ton of runs, either. It’s not the best play, but it’s safer than most.

Sunday, 5/18:
Travis Wood, CHC vs. MIL
After a hot start, albeit a brief one, Wood has since collapsed in spectacular fashion, sporting a 4.91 ERA and 1.43 WHIP. So why would I ever vouch for this guy? Check out his home-road splits:

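| Split | W | L | W-L% | ERA | GS | IP | H | ER | HR | BB | SO | WHIP | SO9 | SO/W |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Home | 2 | 1 | .667 | 2.39 | 4 | 26.1 | 22 | 7 | 2 | 4 | 32 | 0.987 | 10.9 | 8.00 |
| Away | 1 | 3 | .250 | 8.02 | 4 | 21.1 | 31 | 19 | 2 | 11 | 12 | 1.969 | 5.1 | 1.09 |

Provided by Baseball-Reference.com. Generated 5/15/2014.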

The splits are ridiculous. They speak for themselves, but the most striking gaps are in ERA (2.39 vs. 8.02), WHIP (0.987 vs. 1.969) and strikeouts per nine (10.9 vs. 5.1). With that in mind, he’s starting at home. Enough said.
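For readers who want to rebuild those rate stats from the raw counts, here's a minimal Python sketch. It assumes box-score innings notation, where "26.1" means 26 1/3 innings (one out past 26), and treats the runs column as earned runs, which is consistent with the listed ERAs. The function names are hypothetical.

```python
def innings_to_outs(ip: str) -> int:
    """Convert box-score innings notation to outs: '26.1' means 26 1/3 innings."""
    whole, _, partial = ip.partition(".")
    thirds = int(partial) if partial else 0
    if thirds not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip!r}")
    return int(whole) * 3 + thirds


def rate_stats(ip: str, h: int, er: int, bb: int, so: int) -> dict[str, float]:
    """ERA, WHIP, strikeouts per nine and strikeout-to-walk ratio from raw counts."""
    innings = innings_to_outs(ip) / 3  # e.g. 79 outs -> 26.333... innings
    return {
        "ERA": 9 * er / innings,
        "WHIP": (h + bb) / innings,
        "SO9": 9 * so / innings,
        "SO/W": so / bb,
    }


# Wood's home and away lines from the splits above.
home = rate_stats("26.1", h=22, er=7, bb=4, so=32)
away = rate_stats("21.1", h=31, er=19, bb=11, so=12)
for label, line in (("Home", home), ("Away", away)):
    print(label, {stat: round(value, 3) for stat, value in line.items()})
```

Converting to outs first is the design choice that matters here: reading "26.1" as the decimal 26.1 would understate the innings and slightly inflate every rate stat.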

Good luck and happy streaming!

# Tigers and Pirates continue to puzzle; Mets gearing up

Nothing looked unusual when the Detroit Tigers traded first baseman Prince Fielder to the Texas Rangers for second baseman Ian Kinsler, despite the trade being very high-profile. It appeared as if the Tigers were clearing salary space to sign starting pitcher and 2013 Cy Young winner Max Scherzer to a long-term deal. Instead, they dealt pitcher Doug Fister and signed outfielder Rajai Davis and former Yankee reliever Joba Chamberlain (great last name, by the way) for depth. So… now what? The salary they freed up has been spent, and all the moves made have been lackluster. And, in the latest turn of events, Scherzer is on the market. What the heck is going on?

(Although, honestly, I think Scherzer’s value peaked in 2013. Dude had control issues his whole career until the 2012 All-Star break, and he’s about to enter the latter half of his career.)

The Bucs have been worse. The Tigers’ moves have been sensible; the Pirates’ moves have been indefensible. Charlie Morton for three years? Edinson Volquez for one year? These guys are rotation fillers for teams that don’t expect to contend, and these are not the moves a contending team makes. Unfortunately, it appears they’re sold on Morton’s illusory 2013, and unless Volquez is merely for depth (beyond a No. 5 starter), this is money wasted.

Meanwhile, the New York Mets may fancy themselves contenders.

NYM sign OF Curtis Granderson
I didn’t realize the Grandy Man was so divisive. I guess Yankees fans are bitter or something. Maybe I’m overexposed to a microcosm of the Yankee-Red Sox rivalry. Regardless, four years, \$50 million for a proven power hitter and decent defensive outfielder ain’t bad. I like it a whole lot more than the Jacoby Ellsbury signing, based mostly on the length. The Mets think they’ll contend, and while I think they won’t realistically do it until 2015 or later, they plan to make a 2013-Kansas-City-Royals-type of splash next season. Either that, or it’ll be a Blue Jays-caliber flop, but without the hype, so it won’t be as bad.

Winner: Mets
Preseason rank: Top-50 OF

NYM sign SP Big Fat Bartolo Colon
BFB revived his career, got caught with steroids, then continued to impress afterward. I have no idea how he does it, because metrics all point to some sort of regression, but his excellent command of his fastball must keep him afloat. (Other than, well, all his fat. OK, that was mean. Sorry!) Two years isn’t bad, especially if the Mets think they’ll contend this year… But I really don’t. But 2015? Maybe. World Series team? Probably not. So I don’t know. And, again, I can’t imagine Colon will repeat his 2012 and 2013. But who knows? He could be even better. Baseball is a funny sport. As far as fantasy baseball implications go, he’s going to arguably a worse team, and his strikeout rate is, well, pretty miserable. He’s a three-category contributor at best, but if he regresses, it could be more like zero categories.

Winner: Colon
Preseason rank: 69th

# Early SP rankings for 2014

I wouldn’t say pitching is deep, but I’m surprised by the pitchers who didn’t make my top 60.

Note: I have deemed players highlighted in pink undervalued and worthy of a re-rank. Do not be alarmed just yet by what you may perceive to be a low ranking.

# A look at how run support affects a pitcher’s value

Some pitchers get better run support than others. It separates the fantasy studs from the fantasy duds, turns nobodies into somebodies and sometimes silences ace pitchers. Remember Cliff Lee’s dismal 6-9 record last year despite his 3.05 ERA?

I won’t call them luckiest, for all these pitchers are plenty talented. So let’s say… run supportiest. Take a look at the run supportiest pitchers this year, followed by their average run support per game:

1. Max Scherzer, 7.64
2. Jeremy Hellickson, 6.70
3. Justin Verlander, 6.64
4. Anibal Sanchez, 6.57
5. Ryan Dempster, 6.38
6. Bartolo Colon, 6.22
7. Chris Tillman, 6.18
8. Matt Moore, 6.16
9. Lance Lynn, 6.00
10. Mike Minor, 6.00

Well, look at that. Mr. 15-game winner Max Scherzer is at the top of the list, and by no small margin. Before digging further, it’s important to make some distinctions. The average team scores approximately 4.20 runs per game, but no team is the average team. Although the Boston Red Sox lead the majors in total runs scored, it’s Scherzer’s own Detroit Tigers who lead in runs scored per game at 5.18. It probably comes as no surprise that the Miami Marlins are last in runs scored at 3.19 per game, almost a full two runs fewer than the Tigers.

Part of the strategy in fantasy baseball is finding not necessarily the best pitchers but the above-average pitchers on good teams who will naturally get a lot of run support. Ryan Dempster isn’t having a great season by measure of his 4.54 ERA, but playing for the Red Sox certainly bolsters his chances of collecting wins without having lights-out stuff. (Unfortunately, it hasn’t worked out that way for Dempster, who has notched only six wins.)

Instead of looking at the top 10 run supportiest pitchers in nominal terms, we ought to normalize the list by taking the difference between the pitchers’ run support and the average runs scored by their teams. The new list looks like this:

1. Max Scherzer, 2.46
2. Jeremy Hellickson, 2.09
3. Bartolo Colon, 1.77
4. Yovani Gallardo, 1.61
5. Hyun-Jin Ryu, 1.54
6. Matt Moore, 1.51
7. Chris Tillman, 1.50
8. Mike Minor, 1.47
9. Yu Darvish, 1.46
10. Justin Verlander, 1.46

The number following each name is the difference between the pitcher’s run support and his team’s average runs scored per game. Scherzer and Tampa Bay Rays pitcher Jeremy Hellickson lead the list again, but some new names popped up: Yovani Gallardo, Hyun-Jin Ryu and Yu Darvish. The 10 pitchers above have combined for 115 wins, or 11.5 wins on average. Even Gallardo has eight wins despite having the eighth-worst ERA of all qualified starters.
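The normalization is a simple subtraction followed by a sort. A minimal Python sketch, with hypothetical function names; only Scherzer's figures (7.64 run support, Tigers at 5.18 runs per game) come from this post, while "Pitcher A" and "Pitcher B" are made-up rows for illustration.

```python
def relative_run_support(run_support_per_game: float, team_runs_per_game: float) -> float:
    """Run support above (+) or below (-) the pitcher's own team's scoring average."""
    return run_support_per_game - team_runs_per_game


def rank_by_relative_support(rows: list[tuple[str, float, float]]) -> list[str]:
    """rows: (name, run support per game, team runs per game).

    Returns names ordered from most to least run support relative to the team.
    """
    ranked = sorted(rows, key=lambda r: relative_run_support(r[1], r[2]), reverse=True)
    return [name for name, _, _ in ranked]


rows = [
    ("Pitcher A", 5.00, 4.20),     # hypothetical
    ("Max Scherzer", 7.64, 5.18),  # figures from this post
    ("Pitcher B", 3.50, 4.20),     # hypothetical
]
print(f"{relative_run_support(7.64, 5.18):+.2f}")  # +2.46
print(rank_by_relative_support(rows))  # ['Max Scherzer', 'Pitcher A', 'Pitcher B']
```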

This list serves two purposes, though not both are immediately valuable: 1) although most of these pitchers are pitching well, don’t be surprised if they win less often as their run support regresses toward the mean; 2) if you’re in a dynasty league, don’t bank on a potential 20-game winner to do it again next year, especially if he’s the beneficiary of randomly elevated run support.

In contrast, here are the 10 least run-supportiest pitchers (relative to average team run support like the previous list):

1. Chris Sale, -1.22
2. Homer Bailey, -1.19
3. Kris Medlen, -1.00
4. Eric Stults, -0.99
5. A.J. Burnett, -0.88
6. Joe Blanton, -0.82
7. Roberto Hernandez, -0.78
8. Julio Teheran, -0.75
9. John Lackey, -0.75
10. Travis Wood, -0.74

The above pitchers have combined for only 61 wins, or 6.1 wins on average, a far cry from 115 wins (11.5 average) posted by the top 10 run supportiest pitchers. These pitchers don’t throw for terrible teams, either — six of them play for contenders, or call it seven if you’re a hopeless Angels fan.

(Interjecting some notes: Red Sox starter John Lackey is having a renaissance season, and it looks like he has nobody but himself to thank for his seven wins; Chicago Cubs starter Travis Wood is having a breakout year despite a lack of run support; I just want a reason to call Roberto Hernandez “the artist formerly known as Fausto Carmona”; if I’m in a dynasty league, I’m gunning for Cincinnati Reds starter Homer Bailey, who would be having a breakout season, piggybacking on his very solid second half of 2012, if it were not for his miserable run support… he ought to have better stats to go with his 1.14 WHIP.)

My takeaway from all of this, again, is as much predictive as it is descriptive. If I had to offer bits of advice based on what I’ve presented, some of it would be the following:

• Buy low on A.J. Burnett, who is 4-7 with a sub-3.00 ERA playing for the NL Central-leading Pittsburgh Pirates…
• Do the same for Lackey, who shows no signs of slowing down…
• Sell high on Tampa Bay Rays pitcher Jeremy Hellickson, who is sporting a career-worst ERA and is being buoyed by his win total…
• I’d even venture to say sell high on Los Angeles Dodgers pitcher Hyun-Jin Ryu and Baltimore Orioles pitcher Chris Tillman, who are both benefiting from high strand rates even amid seasons I would classify as underwhelming…
• And I’d even sell high on Big Fat Bartolo Colon, who simply won’t keep winning every game and has a lackluster strikeout rate…
• Remember these names during your draft next year! Run support can fluctuate randomly and wildly year to year. Just ask Cliff Lee.

# Slider fuels Justin Masterson’s success

I recently wondered aloud, in one of my league’s chat rooms, why Cleveland Indians pitcher Justin Masterson is having a renaissance season, noting his wildly improved strikeout rate. Another owner (who is conveniently a Red Sox fan and therefore semi-invested in knowing how Masterson does for the rest of his career) told me that as long as Masterson is still a sinker-ball pitcher, he will revert to his old ways. I assumed he was right. The strikeouts really came out of nowhere. And I mean, shoot, I didn’t even know Masterson was a sinker-ball pitcher before my friend mentioned it.

Still, I decided to look into the matter to see what’s really going on. FanGraphs’ compilation of PITCHf/x data revealed the following: Masterson has saved the most runs of any pitcher this year using his slider. (His slider has saved the third-most runs relative to his total pitches thrown, behind the very legit Jose Fernandez and very fat Bartolo Colon.)

Which made me wonder further: Has he always been this good with his slider? Has he always thrown it this often? Also, PITCHf/x technology is relatively new (installed in 2006), and FanGraphs admits it may occasionally misclassify the pitches a pitcher throws. For example, Masterson’s sinker could perhaps be mistaken for a slider.

The answer to the questions I posed, however, is a resounding no. He has thrown his slider this year 8.4 percent more often than last year at an average of 8.5 mph slower than his sinker ball, so there should be no mistakes made there. On top of this, he is recording career bests in not only swinging strikes but also swings and misses outside the strike zone.

Ipso facto, his slider has been filthy this year. And the swinging strikes and out-of-the-zone whiffs help account for his 9.18 K/9, good for 14th among all qualified starters.

FanGraphs specifically warns of the descriptive, and not predictive, nature of PITCHf/x data. Deviations from a certain benchmark for PITCHf/x data do not necessarily indicate bad luck, good luck, pending regression or anything like that. However, the metrics on Masterson’s slider, descriptive as they are, indicate he can (probably) attribute his bounce-back year to his slider. And if you’re looking for something predictive, I predict that if Masterson continues to use his slider and hone his sinker, he will continue to be effective.