# Predicting pitchers’ walks using xBB%

The other day, I discussed predicting pitchers’ strikeout rates using xK%. Today I will do the same for walks. I want to see how well a pitcher’s walk rate (BB%) correlates with what his walk rate should be (expected BB%, henceforth “xBB%”). As with xK%, I relied on intuition to pick reliable indicators of a pitcher’s true walk rate from readily available data.

An xBB% metric, like xK%, would indicate not only whether a pitcher perennially over-performs (or under-performs) his walk rate but also whether he happened to do so in a given year. This article concludes by looking at how the difference between actual and expected walk rates (BB% – xBB%) varied between 2014 and career numbers, lending some insight into each pitcher’s (un)luckiness.

Courtesy of FanGraphs, I constructed another set of pitching data spanning 2010 through 2014. This time, I focused on what I thought would correlate with walk rate: inability to pitch in the zone and inability to induce swings on pitches out of the zone. I also included first-pitch strike rate, predicting that counts that start with a ball are more likely to end in a walk than those that start with a strike. Because FanGraphs’ data measure ability rather than inability (“Zone%” measures how often a pitcher hits the zone; “O-Swing%” measures how often batters swing at pitches out of the zone; “F-Strike%” measures the rate of first-pitch strikes), each variable should carry a negative coefficient.

I tried a handful of variations before settling on a final version. Instead of using split-season data (that is, each qualified pitcher’s individual seasons from 2010 to 2014), I used aggregated statistics because they fit the data better by a sizable margin. This surprised me because there were about half as many observations, but in hindsight it makes sense: each observation is, itself, a larger sample than before.

At one point, I tried creating my own variable: looks (non-swings) at pitches out of the zone. I built it by taking the percentage of pitches out of the zone (1 – Zone%) and multiplying it by how often batters declined to swing at them (1 – O-Swing%). This version of the model fit nicely, but slightly worse than keeping the variables separate. I also ran otherwise identical regressions on PITCHf/x data and on FanGraphs’ own data. The PITCHf/x data appeared to be slightly more accurate, so I proceeded with them.
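As a rough sketch of that combined variable, assuming Zone% and O-Swing% are entered as fractions (0.45 rather than 45), the calculation looks like this. The function name `looks_out_of_zone` is hypothetical, not a FanGraphs stat.

```python
def looks_out_of_zone(zone_pct: float, o_swing_pct: float) -> float:
    """Hypothetical helper: share of all pitches that land outside the zone
    AND are taken (not swung at). Inputs are fractions in [0, 1]."""
    out_of_zone = 1.0 - zone_pct   # fraction of pitches outside the zone
    taken = 1.0 - o_swing_pct      # fraction of those pitches batters let go by
    return out_of_zone * taken


# Hypothetical pitcher: 45% Zone%, 30% O-Swing%
print(round(looks_out_of_zone(0.45, 0.30), 3))
```

Because O-Swing% is measured only on pitches outside the zone, multiplying the two gives a share of all pitches, which is what makes the product interpretable as a single "looks" rate.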

The graph plots actual walk rates versus expected walk rates. The regression yielded the following equation:

xBB% = .3766176 – .2103522*O-Swing%(pfx) – .1105723*Zone%(pfx) – .3062822*F-Strike%
R-squared = .6433

Again, R-squared indicates how well the model fits the data. An R-squared of .64 is not as exciting as the one I got for xK%: it means the model explains about 64 percent of the variation in walk rates, and the other 36 percent is explained by things I haven’t included in the model. Certainly, more variables could help explain BB%. I am already considering combining FanGraphs’ PITCHf/x data with some of Baseball Reference’s data, which does a great job of tracking the number of 3-0 counts, four-pitch walks and so on.
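For readers who want to see what that 64 percent refers to, here is a minimal sketch of R-squared computed as one minus the ratio of unexplained (residual) variation to total variation. For an ordinary least-squares fit with an intercept, that equals the share of variance explained. The walk rates below are made-up placeholders, not the article’s data.

```python
def r_squared(actual: list[float], predicted: list[float]) -> float:
    """Share of the variation in `actual` accounted for by `predicted`."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: what the model leaves unexplained
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: variation of walk rates around their mean
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Placeholder walk rates (fractions), not real pitchers
actual_bb = [0.060, 0.075, 0.090, 0.105]
expected_bb = [0.065, 0.072, 0.088, 0.100]
print(round(r_squared(actual_bb, expected_bb), 3))
```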

And again, to use the equation above, plug in the appropriate values for a pitcher in a given season or time frame to determine his xBB%. Then compare that xBB% to his single-season or career BB% to draw some meaningful conclusions. And (one more) again, I have already taken the liberty of doing this for you.
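Here is that plug-in step as a small sketch using the regression coefficients above. It assumes the three inputs are entered as fractions (0.30 for a 30% O-Swing%), since typical values entered that way produce a plausible walk rate of about 8 percent. The sample season is hypothetical.

```python
def xbb_pct(o_swing: float, zone: float, f_strike: float) -> float:
    """Expected walk rate from the article's regression (inputs as fractions).

    o_swing and zone are the PITCHf/x versions; f_strike is F-Strike%.
    """
    return (0.3766176
            - 0.2103522 * o_swing
            - 0.1105723 * zone
            - 0.3062822 * f_strike)


def bb_minus_xbb(actual_bb: float, o_swing: float, zone: float, f_strike: float) -> float:
    """Positive = walked more batters than expected; negative = walked fewer."""
    return actual_bb - xbb_pct(o_swing, zone, f_strike)


# Hypothetical season: 30% O-Swing%, 45% Zone%, 60% F-Strike%, 6.5% actual BB%
expected = xbb_pct(0.30, 0.45, 0.60)
diff = bb_minus_xbb(0.065, 0.30, 0.45, 0.60)
print(f"xBB% = {expected:.2%}, BB% - xBB% = {diff:+.2%}")
```

A negative difference, as in this made-up season, is the "out-performing xBB%" case discussed below.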

Instead of including every pitcher from the sample, I narrowed it down to pitchers with at least three years’ worth of data in order to yield reasonably meaningful results. (Note: three years is a small sample, but three individual samples of 160+ innings each are large enough to produce some arguably robust results.) “Avg BB% – xBB%” (or “diff%”) is the average of a pitcher’s yearly differences between actual and expected walk rates from 2010 to 2014. It indicates how well (or poorly) he performs compared to his xBB%: the lower the number, the better. This time, I included a “t-score,” which measures how reliable diff% is. The key value here is 1.96; anything greater than that (in absolute value) means his diff% is reliable. (1.00 to 1.96 is somewhat reliable; anything less than 1.00 is very unreliable.) Again, this is slightly problematic because there are at most five observations (years), but it’s the best and simplest usable indicator of reliability.
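The article doesn’t spell out how its t-score is computed. A common choice, and the one sketched below as an assumption, is the one-sample t statistic: the mean yearly diff divided by its standard error (the sample standard deviation over the square root of the number of seasons). The yearly diffs are hypothetical. With only three to five seasons, the t distribution’s two-sided 5 percent cutoff is actually larger than 1.96 (about 2.78 with four degrees of freedom), which is one concrete form of the small-sample problem noted above.

```python
import math
import statistics


def diff_summary(yearly_diffs: list[float]) -> tuple[float, float, float]:
    """Return (mean diff%, sample StdDev, t-score) for one pitcher.

    yearly_diffs holds BB% - xBB% for each season, in percentage points.
    Assumes the seasons are not all identical (StdDev > 0).
    """
    n = len(yearly_diffs)
    if n < 2:
        raise ValueError("need at least two seasons to estimate a StdDev")
    mean_diff = statistics.mean(yearly_diffs)
    sd = statistics.stdev(yearly_diffs)        # sample (n - 1) standard deviation
    t_score = mean_diff / (sd / math.sqrt(n))  # mean divided by its standard error
    return mean_diff, sd, t_score


# Hypothetical pitcher with four seasons of data
mean_diff, sd, t = diff_summary([-1.2, -0.8, -1.5, -1.0])
print(f"diff% = {mean_diff:+.2f}, StdDev = {sd:.2f}, t = {t:.2f}")
print("reliable" if abs(t) > 1.96 else "not reliable at the 1.96 cutoff")
```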

Thus, Mark Buehrle, Mike Leake, Hiroki Kuroda, Doug Fister, Tim Hudson, Zack Greinke, Dan Haren and Bartolo Colon can all reasonably be expected to consistently out-perform their xBB% in any given year. Likewise, Aaron Harang, Colby Lewis, Ervin Santana and Mat Latos can all reasonably be expected to under-perform their xBB%. For everyone else, the diff% values don’t mean a whole lot. For example, R.A. Dickey’s diff% of +0.03% doesn’t mean he’s more likely than anyone else to pitch exactly as well as his xBB% predicts; in fact, his standard deviation (StdDev) of 0.93% indicates he’s less likely than just about anyone to do so. (What it really means is that there is only about a two-thirds chance his diff% will fall between -0.90% and +0.96%.)
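The two-thirds figure comes from treating a pitcher’s yearly diff% as roughly normal, where about 68 percent of outcomes land within one standard deviation of the mean. A quick check with Dickey’s numbers from the paragraph above, using the standard library’s `statistics.NormalDist`; the normality is an assumption, not something the data establish.

```python
from statistics import NormalDist

# R.A. Dickey's figures from the text, in percentage points
mean_diff, sd = 0.03, 0.93
low, high = mean_diff - sd, mean_diff + sd   # one-StdDev band around diff%
dist = NormalDist(mu=mean_diff, sigma=sd)    # assumes normally distributed yearly diffs
share = dist.cdf(high) - dist.cdf(low)       # probability mass inside the band

print(f"{low:+.2f}% to {high:+.2f}%: about {share:.0%} of seasons")
```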

As with xK%, I compiled a list of fantasy-relevant starters with only two years’ worth of data who saw sizable fluctuations between 2013 and 2014. Their data are, at this point, impossible (nay, ill-advised) to interpret, but they are worth monitoring.

Name: [2013 diff%, 2014 diff%]

Miller is an interesting case: he was atrociously bad about issuing free passes in 2014, but his diff% was only marginally worse than it was in 2013. It’s possible that he was a smart buy-low for the Braves, but it’s also possible that Miller not only perennially under-performs his xBB% but is also trending in the wrong direction.

Here are fantasy-relevant players with a) only 2014 data, and b) outlier diff% values:

I’m not gonna lie, I have no idea why Cobb, Corey Kluber and others show up as having only one year of data when they have two in the xK% dataset; I only noticed this now. Their exclusion doesn’t fundamentally change the model’s fit, because the model did not rely on split-season data; I’m just curious why their other season didn’t show up in FanGraphs’ leaderboards. Oh well.

Implications: Richards and Roark perhaps over-performed. Meanwhile, it’s possible that Odorizzi, Ross and Ventura will improve (or regress) relative to last year. I’m excited about all of that. Richards will probably be pretty overvalued on draft day.

# Pitchers to sell high, buy low or cut bait

All right. It’s April. It’s horrifying, unless you’re doing well, and then it’s not. But, full disclosure, I’m not. Chicago White Sox staff ace Chris Sale just hit the 15-day disabled list yesterday, joining the Philadelphia Phillies’ Cole Hamels, Seattle Mariners’ James Paxton, Tampa Bay Rays’ Alex Cobb, Cincinnati Reds’ Mat Latos, New York Yankees’ David Robertson and the Detroit Tigers’ Doug Fister on my teams’ DLs. It’s killing me, really. It’s incredibly painful.

What I’m saying is I’ve spent more time than I’d care to admit frolicking in free agency, trying to figure out which early-season studs are legit or not. I’ve been pondering various buy-low situations as well. So I jumped into a pool of peripherals and PITCHf/x data to look for answers.

The list below is not remotely exhaustive. It’s mostly players I am watching or already using as replacements for my teams. Here they are, in no particular order.

Jake Peavy, BOS | 0-0, 3.33 ERA, 1.48 WHIP, 9.25 K/9
Peavy’s prime came and went about five years ago, so, full disclosure, I don’t know as much about him off the top of my head as I should. But I do know one thing: he doesn’t strike out a batter per inning anymore. In his defense, batters’ contact rate against him is the best it has been since 2009, his last truly good year. So maybe he will strike out a few more batters than last year, but I think it’ll be closer to 2012’s 7.97 K/9, not 2009’s 9.74 K/9. The WHIP is atrocious; the walk rate is through the roof. If there’s a guy in your league who will pay for what will end up being an illusion of ERA and strikeouts, by all means, trade Peavy to him. He’s owned in 100 percent of leagues but doesn’t deserve to be.
Verdict: Sell high

John Lackey, BOS | 2-2, 5.25 ERA, 1.46 WHIP, 8.63 K/9
Another Boston pitcher, another bad start to the season. I like Lackey a lot more, though, for a variety of reasons. One, last year’s renaissance was legitimate. Two, he’s not walking many batters right now, so his unspectacular ratios are more a result of an unlucky batting average on balls in play (.333 BABIP) than incompetence. Three, his swinging strike and contact rates are currently career bests. Again, we’re working with small sample sizes here, and this could easily regress. But considering his velocity is also at a career high, I don’t find it improbable that Lackey actually does better than he did last season. If an owner in your league has already dropped him, put in your waiver claim now.

Jesse Chavez, OAK | 1-0, 1.38 ERA, 0.92 WHIP, 9.69 K/9
Talk about unexpected. Chavez, who has been relevant about zero times, is making for an intriguing play in all leagues. It’s a given he will regress, especially considering the .242 BABIP, but his improved walk rate could be here to stay, as he is pounding the zone more than he ever has in his career. The strikeouts are somewhat of a mirage, but it looks like he can be a low-WHIP, moderate-strikeout guy, and that’s still valuable.
Verdict: Sell really high, or just ride the hot hand

Nathan Eovaldi, MIA | 1-1, 3.55 ERA, 1.14 WHIP, 8.17 K/9
I wouldn’t call Eovaldi a trendy sleeper, but he certainly was a sleeper coming into 2014. It was all about whether he could command his pitches better, and, like magic, it appears he has, walking only 1.07 batters per nine innings as opposed to 3.39 per nine last year. The swinging strike and contact rates are concerning, as they are the worst of his career, so it’s hard to see his strikeout rate going anywhere but down. However, he’s throwing 65 percent of his pitches in the strike zone, the highest rate among qualified pitchers. There are two ways to look at this. On one hand, his control has probably legitimately improved. On the other, even the masterful Cliff Lee threw only 53.3 percent of his pitches in the zone last year, and I am hesitant to claim Eovaldi has better control than Lee. This could be a “breakout” year of sorts for Eovaldi, but I’m using that term liberally here. He’s owned in only 20.5 percent of leagues, which makes him more of a ride-the-hot-hand type, like Mr. Chavez above.
Verdict: Eventually drop, ideally before he does damage to your team
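Rate stats like BB/9 and K/9 are just counting stats scaled to nine innings, with one wrinkle: box-score innings such as "30.2" mean 30 and two-thirds innings, not 30.2. A minimal sketch with hypothetical totals, not Eovaldi’s actual line:

```python
def innings_to_float(ip: str) -> float:
    """Convert box-score notation ("30.2" = 30 and 2/3 innings) to a number."""
    whole, _, outs = ip.partition(".")
    if outs not in ("", "0", "1", "2"):
        raise ValueError(f"invalid innings value: {ip!r}")
    return int(whole) + int(outs or 0) / 3


def per_nine(events: int, ip: str) -> float:
    """Scale a counting stat (walks, strikeouts) to a per-nine-innings rate."""
    return 9 * events / innings_to_float(ip)


# Hypothetical line: 4 walks and 28 strikeouts in 30.2 innings
print(f"BB/9 = {per_nine(4, '30.2'):.2f}, K/9 = {per_nine(28, '30.2'):.2f}")
```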

Mark Buehrle, TOR | 4-0, 0.64 ERA, 0.93 WHIP, 6.11 K/9
Look, I have had a long-standing man crush on Buehrle, but this is ridiculous. You know better than I that these happy dreams will soon become nightmares, not because Buehrle is awful or anything, but because regression rears its head in occasionally very brutal ways.
Verdict: Sell high

Alfredo Simon, CIN | 0.86 ERA, 0.81 WHIP, 5.57 K/9
Something isn’t right here. A 0.81 WHIP and… fewer than six strikeouts per nine innings? As you become more familiar with sabermetrics, you quickly realize certain things don’t mesh, and a low WHIP combined with a low strikeout rate is one of them. I can tell you without looking that his BABIP is impossibly low, and, now looking, I see I’m right: it’s .197. Tristan H. Cockcroft of ESPN is all about Simon, and in his defense, Simon’s PITCHf/x data foreshadow some positive regression coming his way in the strikeout department. But his ratios can only get worse from here. Still, I think he has a bit of a Dan Straily look to him, and that’s certainly serviceable.
Verdict: Sell high, or just ride the hot hand
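BABIP looks only at balls put into the field of play, so it removes strikeouts and home runs (walks never count as at-bats) and adds back sacrifice flies. The standard formula is (H - HR) / (AB - K - HR + SF); a sketch with hypothetical totals, not Simon’s actual line:

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Batting average on balls in play allowed by a pitcher."""
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


# Hypothetical totals
print(f"{babip(h=30, hr=3, ab=120, k=20, sf=1):.3f}")
```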

Yovani Gallardo, MIL | 1.46 ERA, 1.09 WHIP, 6.93 K/9
This is a disaster waiting to happen. Like Simon’s, his strikeout rate is low, but for Gallardo, it is deservedly so: his swinging strike and contact rates are, by far, career worsts. Meanwhile, his ratios are buoyed by a .264 BABIP and an 89.8% LOB% (left-on-base percentage), well above his 74.7% career LOB%. The Brewers will fall with him. Sell high, and sell fast.
Verdict: Sell high
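LOB% estimates the share of baserunners a pitcher strands. FanGraphs defines it as (H + BB + HBP - R) / (H + BB + HBP - 1.4 × HR), where the 1.4 × HR term is its adjustment for home runs. A sketch with hypothetical totals, not Gallardo’s actual line:

```python
def lob_pct(h: int, bb: int, hbp: int, r: int, hr: int) -> float:
    """Left-on-base percentage using FanGraphs' published formula."""
    baserunners = h + bb + hbp
    return (baserunners - r) / (baserunners - 1.4 * hr)


# Hypothetical totals
print(f"{lob_pct(h=40, bb=10, hbp=2, r=10, hr=3):.1%}")
```

Comparing a hot-start LOB% against the pitcher’s career mark, as the paragraph above does, is a quick way to spot ratios that are likely to regress.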

Shelby Miller, STL | 3.57 ERA, 1.50 WHIP, 8.34 K/9
Miller is the first pitcher on this list in whom owners actually invested a lot. Be patient. The 98.3 percent of owners who didn’t cut bait before his last start were surely rewarded. I imagine he’s leaving his pitches up in the zone, given his increased percentage of pitches thrown in the zone coupled with his home run rate. Speaking of the zone, he shouldn’t be walking five batters per nine innings when he’s throwing more than 50 percent of his pitches there. He’ll be fine.