# Predicting pitchers’ walks using xBB%

The other day, I discussed predicting pitchers’ strikeout rates using xK%. I will conduct the same exercise today in regard to predicting walks. Using my best intuition, I want to see how well a pitcher’s walk rate (BB%) actually correlates with what his walk rate should be (expected BB%, henceforth “xBB%”). Similarly to xK%, I used my intuition to best identify reliable indicators of a pitcher’s true walk rate using readily available data.

An xBB% metric, like xK%, would show not only whether a pitcher perennially over-performs (or under-performs) his expected walk rate but also whether he happened to do so in a given year. This article will conclude by looking at how the difference between actual and expected walk rates (BB% – xBB%) varied between 2014 and career numbers, lending some insight into the (un)luckiness of each pitcher.

Courtesy of FanGraphs, I constructed another set of pitching data spanning 2010 through 2014. This time, I focused primarily on what I thought would correlate with walk rate: inability to pitch in the zone and inability to incur swings on pitches out of the zone. I also throw in first-pitch strike rate: I predict that counts that start with a ball are more likely to end in a walk than those that start with a strike. Because FanGraphs’ data measures ability rather than inability — “Zone%” measures how often a pitcher hits the zone; “O-Swing%” measures how often batters swing at pitches out of the zone; “F-Strike%” measures the rate of first-pitch strikes — each variable should have a negative coefficient attached to it.

I specify a handful of variations before deciding on a final version. Instead of using split-season data (that is, each pitcher’s individual seasons from 2010 to 2014) for qualified pitchers, I use aggregated statistics because the results better fit the data by a sizable margin. This surprised me because there were about half as many observations, but it’s also not surprising because each observation is, itself, a larger sample size than before.

At one point, I tried creating my own variable: looks (non-swings) at pitches out of the zone. I created it by taking the percentage of pitches out of the zone (1 – Zone%) and multiplying it by how often batters refused to swing at them (1 – O-Swing%). This version of the model produced a nice fit, but it was slightly worse than leaving the variables separated. Also, I ran separate but otherwise identical regressions for PITCHf/x data and FanGraphs’ own data. The PITCHf/x data appeared to be slightly more accurate, so I proceeded using them.
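For anyone who wants to rebuild that combined variable, here is a minimal sketch. The function name and the sample inputs are mine (hypothetical), and I am assuming the rates are entered as fractions rather than whole percentages.

```python
def looks_out_of_zone(zone_pct: float, o_swing_pct: float) -> float:
    """Share of all pitches that miss the zone AND are not swung at.

    Both inputs are assumed to be fractions (0.50 means 50%).
    """
    return (1.0 - zone_pct) * (1.0 - o_swing_pct)


# Hypothetical pitcher: half his pitches are in the zone, and batters
# chase 30% of the pitches that are not.
looks = looks_out_of_zone(zone_pct=0.50, o_swing_pct=0.30)
print(f"{looks:.3f}")  # 0.350
```

Because it multiplies the two rates together, this variable forces Zone% and O-Swing% to share a single coefficient, which may be part of why it fit slightly worse than leaving them separate.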

The graph plots actual walk rates versus expected walk rates. The regression yielded the following equation:

xBB% = .3766176 – .2103522*O-Swing%(pfx) – .1105723*Zone%(pfx) – .3062822*F-Strike%
R-squared = .6433

Again, R-squared indicates how well the model fits the data. An R-squared of .64 is not as exciting as the R-squared I got for xK%; it means the model explains about 64 percent of the variation in walk rates, and the other 36 percent is explained by things I haven’t included in the model. Certainly, more variables could help explain xBB%. I am already considering combining FanGraphs’ PITCHf/x data with some of Baseball Reference’s data, which does a great job of keeping track of the number of 3-0 counts, four-pitch walks and so on.
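If you want to see what R-squared is actually measuring, this sketch computes it by hand as one minus the ratio of unexplained variation to total variation. The walk rates in it are invented purely for illustration; they are not from my dataset.

```python
def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up walk rates (as fractions) for four imaginary pitchers.
actual_bb = [0.05, 0.07, 0.09, 0.11]
expected_bb = [0.06, 0.07, 0.08, 0.10]

print(round(r_squared(actual_bb, expected_bb), 2))
```

A value of 1.0 would mean the expected rates match the actual rates exactly; the closer to zero, the less of the pitcher-to-pitcher variation the model accounts for.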

And again, for the reader to use the equation above to his or her benefit, one would plug in the appropriate values for a player in a given season or time frame and determine his xBB%. Then one could compare the xBB% to single-season or career BB% to derive some kind of meaningful results. And (one more) again, I have already taken the liberty of doing this for you.
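If you would rather do the plugging-in yourself, here is a small sketch of the equation above. The coefficients are copied from the regression; the function name, the sample pitcher and his actual BB% are all hypothetical, and I am assuming the inputs are fractions (0.30 for 30 percent), which is what the size of the intercept implies.

```python
# Coefficients copied from the regression equation above.
INTERCEPT = 0.3766176
COEF_O_SWING = -0.2103522   # O-Swing% (PITCHf/x)
COEF_ZONE = -0.1105723      # Zone% (PITCHf/x)
COEF_F_STRIKE = -0.3062822  # F-Strike%


def xbb_pct(o_swing: float, zone: float, f_strike: float) -> float:
    """Expected walk rate; inputs assumed to be fractions (0.30 means 30%)."""
    return (INTERCEPT
            + COEF_O_SWING * o_swing
            + COEF_ZONE * zone
            + COEF_F_STRIKE * f_strike)


# Hypothetical pitcher, not a real stat line.
expected = xbb_pct(o_swing=0.30, zone=0.50, f_strike=0.60)
actual_bb_pct = 0.065            # also made up
diff = actual_bb_pct - expected  # negative means fewer walks than expected

print(f"xBB% = {expected:.1%}, diff% = {diff:+.1%}")  # xBB% = 7.4%, diff% = -0.9%
```

A negative diff% means the pitcher walked fewer batters than his plate-discipline numbers say he should have.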

Instead of including every pitcher from the sample, I narrowed it down to only pitchers with at least three years’ worth of data in order to yield some kind of statistically significant results. (Note: a three-year sample is a small sample, but three individual samples of 160+ innings are large enough to produce some arguably robust results.) “Avg BB% – xBB%” (or “diff%”) takes the average of a pitcher’s difference between actual and expected walk rates from 2010 to 2014. It indicates how well (or poorly) he performs compared to his xBB%: the lower a number, the better. This time, I included “t-score”, which measures how reliable diff% is. The key value here is 1.96; anything greater than that means his diff% is reliable. (1.00 to 1.96 is somewhat reliable; anything less than 1.00 is very unreliable.) Again, this is slightly problematic because there are five observations (years) at most, but it’s the best and simplest usable indicator of reliability.
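For the curious, here is one way a t-score like this could be computed. It is a sketch under assumptions: I am treating it as a standard one-sample t-statistic (average diff% divided by its standard error), comparing its absolute value to the cutoffs above, and using made-up yearly diff% values.

```python
from math import sqrt
from statistics import mean, stdev


def t_score(diffs):
    """One-sample t-statistic for the mean of yearly diff% values vs. zero."""
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n))


def reliability(t):
    """Buckets from the article; comparing abs(t) is an assumption."""
    size = abs(t)
    if size > 1.96:
        return "reliable"
    if size >= 1.00:
        return "somewhat reliable"
    return "very unreliable"


# Made-up yearly diff% values (in percentage points) for one pitcher.
yearly_diffs = [-1.0, -1.2, -0.8, -1.1, -0.9]
t = t_score(yearly_diffs)
print(round(t, 2), reliability(t))
```

A pitcher whose diff% barely moves from year to year gets a large t-score even if the average itself is small, which is exactly the consistency the cutoffs are meant to reward.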

Thus, Mark Buehrle, Mike Leake, Hiroki Kuroda, Doug Fister, Tim Hudson, Zack Greinke, Dan Haren and Bartolo Colon can all reasonably be expected to consistently out-perform their xBB% in any given year. Likewise, Aaron Harang, Colby Lewis, Ervin Santana and Mat Latos can all reasonably be expected to under-perform their xBB%. For everyone else, their diff% values don’t mean a whole lot. For example, R.A. Dickey’s diff% of +0.03% doesn’t mean he’s more likely than someone else to pitch exactly as well as his xBB% predicts him to; in fact, his standard deviation (StdDev) of 0.93% indicates he’s less likely than just about anyone to do so. (What it really means is there is only a two-thirds chance his diff% will be between -0.90% and +0.96%.)
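The Dickey range is simply his average diff% plus or minus one standard deviation. A quick sketch, assuming diff% is roughly normally distributed (which is where the "two-thirds" comes from); the function name is mine:

```python
def one_sd_band(mean_diff: float, std_dev: float):
    """Range covering roughly two-thirds of outcomes if diff% is about normal."""
    return mean_diff - std_dev, mean_diff + std_dev


# R.A. Dickey's figures from above, in percentage points.
low, high = one_sd_band(mean_diff=0.03, std_dev=0.93)
print(f"{low:+.2f}% to {high:+.2f}%")  # -0.90% to +0.96%
```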

As with xK%, I compiled a list of fantasy-relevant starters with only two years’ worth of data that see sizable fluctuations between 2013 and 2014. Their data is impossible (nay, ill-advised) to interpret at this point, but it is worth monitoring.

Name: [2013 diff%, 2014 diff%]

Miller is an interesting case: he was atrociously bad about gifting free passes in 2014, but his diff% was only marginally worse than it was in 2013. It’s possible that he was a smart buy-low for the Braves, but it’s also possible that Miller not only perennially under-performs his xBB% but is also trending in the wrong direction.

Here are fantasy-relevant players with a) only 2014 data, and b) outlier diff% values:

I’m not gonna lie, I have no idea why Cobb, Corey Kluber and others show up as only having one year of data when they have two in the xK% dataset. This is something I noticed now. Their exclusion doesn’t fundamentally change the model’s fit whatsoever because it did not rely on split-season data; I’m just curious why it didn’t show up in FanGraphs’ leaderboards. Oh well.

Implications: Richards and Roark perhaps over-performed. Meanwhile, it’s possible that Odorizzi, Ross and Ventura will improve (or regress) compared to last year. I’m excited about all of that. Richards will probably be pretty over-valued on draft day.

# Need some streamers? T. Ross, Hutchison, Colon, Wood

I’ve been slacking on my streamer picks, so let’s cut straight to the chase.

Today, 5/15:
Tyson Ross, SD @ CIN
Mr. Ross is the real deal, my friends. He’s 10th of all pitchers in batters’ contact on pitches in the zone, sandwiched between the unfamiliar names of Jose Fernandez and Zack Greinke (and the players who precede him include Michael Wacha, Yordano Ventura, Julio Teheran and Max Scherzer). He doesn’t make batters chase pitches at an overwhelming rate, but they make contact on such pitches only half the time, which ranks Ross fourth, behind only Ervin Santana, Garrett Richards and Masahiro Tanaka. At 7.99 K/9, his K-rate should actually improve. You can really only bash him for his walk rate, but it’s no worse than Gio Gonzalez or Justin Verlander. I don’t care if it’s a road game; Ross should be owned in all leagues at this point.

Friday, 5/16:
Drew Hutchison, TOR @ TEX
I’ll be honest with you: I’m not totally sold on this matchup. Hutchison hasn’t been very impressive, but there are simply not many matchups worth exploiting on Friday. I like Hutchison for his strikeouts, and before his last start (during which he walked four), he had only walked five guys across 32-1/3 innings. His control escaped him, but if it comes back, he should be able to control a miserable Texas offense that ranks 26th of 30 teams in extra-base hits.

Saturday, 5/17:
Bartolo Colon, NYM @ WAS
Again, not crazy about this one, either. But Colon has been incredibly unlucky. The dude is walking fewer than a batter per nine innings (0.9 BB/9), so all the baserunners (and, consequently, earned runs) he has allowed are largely a function of an elevated batting average on balls in play (BABIP). It’s hard to trust a guy who’s mired in a slump, but the luck should eventually turn in his favor. Who’s to say it won’t be this weekend? I’d take a chance. The Nationals don’t score a ton of runs, either. It’s not the best play, but it’s safer than most.

Sunday, 5/18:
Travis Wood, CHC vs. MIL
After a hot start, albeit a brief one, Wood has since collapsed in spectacular fashion, sporting a 4.91 ERA and 1.43 WHIP. So why would I ever vouch for this guy? Check out his home-road splits:

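| Split | W | L | W-L% | ERA | G | IP | H | ER | HR | BB | SO | WHIP | SO9 | SO/W |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Home | 2 | 1 | .667 | 2.39 | 4 | 26.1 | 22 | 7 | 2 | 4 | 32 | 0.987 | 10.9 | 8.00 |
| Away | 1 | 3 | .250 | 8.02 | 4 | 21.1 | 31 | 19 | 2 | 11 | 12 | 1.969 | 5.1 | 1.09 |

Provided by Baseball-Reference.com. Generated 5/15/2014.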
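If you want to check the rate stats in those splits yourself, here is a rough sketch. It assumes the columns read W, L, W-L%, ERA, G, IP, H, ER, HR, BB, SO, WHIP, SO9, SO/W (the arithmetic bears that out) and that innings use the usual box-score notation, where 26.1 means 26 and one-third. The helper names are mine.

```python
def innings(ip_notation: str) -> float:
    """Convert box-score innings ('26.1' = 26 1/3) to a true decimal."""
    whole, _, outs = ip_notation.partition(".")
    return int(whole) + int(outs or 0) / 3


def rate_stats(ip: str, h: int, er: int, bb: int, so: int) -> dict:
    """ERA, WHIP, strikeouts per nine and strikeout-to-walk ratio."""
    inn = innings(ip)
    return {
        "ERA": 9 * er / inn,
        "WHIP": (h + bb) / inn,
        "SO9": 9 * so / inn,
        "SO/W": so / bb,
    }


home = rate_stats("26.1", h=22, er=7, bb=4, so=32)
away = rate_stats("21.1", h=31, er=19, bb=11, so=12)

for label, line in (("Home", home), ("Away", away)):
    print(label, {stat: round(value, 2) for stat, value in line.items()})
```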

The splits are ridiculous. They speak for themselves, although I’ll highlight the ones that are most impressive. With that said, he’s starting at home. Enough said.

Good luck and happy streaming!

# Ten bargain starters outside my top 60

The idea is simple: In a standard 10-team mixed league, an owner is allotted six spots to fill with starting pitchers. That relegates everyone else drafted No. 61 and higher to fantasy benches or free agency.

That doesn’t mean pitchers drafted outside the top 60 are worse than pitchers in the top 60. You can find good pitchers up until the 60th pick (heck, it’s the Brewers’ Marco Estrada, who has excellent control and solid strikeout numbers), but as many as a third of those 60 are risky or overvalued. Value bleeds into the late rounds, and it’s worth figuring out who’s worth reaching for, despite pitchers with better ADPs (average draft positions) still on the board, and who’s worth waiting for.

I’ll discuss a handful of pitchers I like outside my top 60, in order of ESPN ADP.

Lackey had a renaissance 2013, coming back from a lost 2012 and miserable 2011. The strikeout and walk rates were second-best and best of his career, respectively, and there’s little reason to think he’ll crumble overnight. He’s less risky than Dan Haren (whom I’ve been vocal about distrusting), who is being drafted 49th of starting pitchers, or Dan Straily, going 56th, who is honestly mediocre. He’s enough to fill the back of your rotation, let alone a bench spot.

Wood is a control artist, and the Braves simply know how to develop pitchers. Scouts and experts are excited about him; I don’t know why he’s not getting more draft love. He’s guaranteed a rotation spot, due to the rash of injuries to Atlanta starters, and should be more than serviceable.

I love Kluber.

Sources say he’s recovering well from his surgery. If he makes the Dodgers’ rotation and remotely resembles the Beckett of old, he’s a value.

He absolutely dealt for the Padres last year. A reader mentioned he could be on an innings limit, but I would still ride him until he’s shuffled out of the rotation, and then simply find a replacement for him.

If the Royals’ Yordano Ventura is going 62nd on average, there’s no reason Paxton should be going outside the top 100 pitchers. Paxton doesn’t gas a 100-mph heater like Ventura does, but his strikeout and walk rates are very similar to Ventura’s.

Skaggs was a three-time top-100 prospect for Baseball America, peaking at No. 12 in 2013 (and No. 17 for Baseball Prospectus). It would be a mistake to write him off so soon after one bad season, especially with minor-league numbers better than those of Ventura or Paxton. His 2013 and current spring training numbers are an eyesore, though, so the repulsion is understandable. But, as I always say, he’s a name worth remembering.

Other notables: Drew Hutchison (114th), Erik Johnson (133rd), Jake Odorizzi (151st)

# Panning for gold using spring stats, pitcher edition

Here’s the second installment of my breakdown of spring training stats. You can view the first one by scrolling down like four inches to the previous post. Here is a look at a variety of pitchers in no particular order.

James Shields, KC
Important stats: 14.2 IP, 18 K, 0 BB (0.61 ERA, 0.48 WHIP)
Why they’re important: Shields is firmly entrenched as a solid No. 2 fantasy starter, but he is off to as hot a start as anyone right now, striking out 11.05 batters per nine innings and walking nobody. Not saying he’s worth bumping up in your rankings, but perhaps he’ll give you a little more than what you expected this year.

Max Scherzer, DET
Important stats: 14.1 IP, 16 K, 2 BB
Why they’re important: It would be unjust to exclude him. He’s having an excellent start, but he’s an excellent pitcher, so this is nothing extraordinary at this point.

Chris Tillman, BAL
Important stats: 12.2 IP, 14 K, 2 BB
Justin Masterson, CLE
Important stats: 13.0 IP, 14 K, 2 BB
Why they’re important: What’s the difference between them? Tillman has a 4.97 ERA and 1.26 WHIP while Masterson is sporting a 0.00 ERA and 0.62 WHIP. Meanwhile, their underlying stats are almost identical. This is where small sample sizes can really warp perspectives. Each guy is the victim and beneficiary of batting average on balls in play (BAbip), respectively. Only difference is Masterson is giving up fewer fly balls, making him less prone to home runs and hits.

Corey Kluber, CLE
Important stats: 14.1 IP, 15 K, 2 BB
Why they’re important: Maybe you’ve caught on to the trend again: I’m focusing on guys with excellent strikeout rates as well as strikeout-to-walk ratios (K/BB). Ignore the 5.02 ERA and 1.33 WHIP; Kluber’s BAbip is a sky-high .395 over this small sample size. He’s still dealing. Also, he has the fifth-best ground ball rate of qualified spring training pitchers. I’ve read concerns about his home runs allowed last year. Can’t hit a home run on the ground, son. (Well, technically you can, but… shhhhhhh.)

Josh Johnson, SD
Important stats: 13.1 IP, 1.05 WHIP, 13 K, 4 BB
Why they’re important: For people hoping for a comeback, these ratios (8.78 K/9, 2.70 BB/9) are the makings of a solid starter. He’s not on my radar, but I acknowledge reasons why he could be on it (aside from the fact that he used to be one of the most dominant pitchers in all of baseball).

Alex Wood, ATL
Important stats: 14 IP, 0.00 ERA, 0.93 WHIP, 12 K, 2 BB
Why they’re important: He had a 1.73 ERA and 0.99 WHIP in the minors with a 3.78 K/BB. He followed it up with an 8.9 K/9 in the majors, nearly identical to his minor-league rate. The Braves develop great pitchers (and they know when to deal them… looking at you, Tommy Hanson). Wood is the next in line.

There are pitchers having bad springs, too. Guess which statistic I’m primarily using to evaluate them?

Tony Cingrani, CIN
Important stats: 12.2 IP, 6.39 ERA, 1.42 WHIP, 13 K, 6 BB
Why they’re important: I’m not as concerned with the ratios as I am the walks, which he’s handing out at a 5.68 walks-per-nine-innings (BB/9) clip. Strikeouts are still there, which is good, and, of course, it’s worth acknowledging the small sample size. Maybe he’s working off the offseason slumber. But I’m keeping my eye on his control.

Tim Hudson, SF
Important stats: 13.1 IP, 1.58 WHIP, 9 BB
Why they’re important: Nothing matters here except for the lack of control. Cingrani’s walks are a bit disconcerting; Hudson’s walks (6.08 BB/9) are really worrisome, especially for an older pitcher coming back from a gruesome foot/ankle/leg injury. Perhaps it’s a bit early to predict the beginning of the end, but I’ll say it anyway: this could be the beginning of the end of Tim Hudson. It’s a shame, but it ultimately happens to everyone.

Matt Moore, TB
Important stats: 10.1 IP, 2.32 WHIP, 10 K, 11 BB
Why they’re important: He’ll always be loved for his strikeout propensity but his walk rate (9.58 BB/9) is most horrifying of all. I understand if you like him, but I will never draft him because of how he damages my WHIP — and a player with bad command is one bad-luck-BAbip away from having an absolutely miserable year.

Jose Quintana, CHW
Important stats: 6 IP, 30.00 ERA, 4.00 WHIP
Why they’re important: And the Worst/Most Humiliating Spring Training award goes to… Jose Quintana! Just look at it. It’s almost impossible how bad he’s been. But, in his defense, there’s a .586 BAbip at work here. And that, my friends, is why sample sizes this small should not be trusted. Some statistical anomalies are worth noting, but this one is simply outrageous. I am not changing my ranking of him based on this.

Other notable pitchers having bad springs, in terms of control: Zach McAllister, Dan Straily

Rookies/prospects having good springs: Yordano Ventura, KC (1.76 ERA, 0.72 WHIP, 15-to-1 K/BB ratio in 15.1 IP)… Drew Hutchison, TOR (2.79 ERA, 0.83 WHIP, 16-to-1 K/BB ratio in 9.2 IP)…

Rookies/prospects having bad springs: Allen Webster, BOS, continually plagued by command issues (5.25 ERA, 1.42 WHIP, 5.25 BB/9)… Archie Bradley, ARI, baseball’s No. 1 pitching prospect, also plagued by command issues, a problem he has had his entire professional career (4.32 ERA, 1.68 WHIP, 6.48 BB/9)… Trevor Bauer, CLE, allegedly on the comeback trail, but I’m starting to doubt it (10.29 ERA, 2.43 WHIP, 6.43 BB/9)…

I said this verbatim in my last post: “Do your own research, form your own opinions.” It’s important to remember that these are incredibly small sample sizes, meaning there’s a lot of volatility involved here. Still, some metrics can be very telling, and strikeout and walk rates can be much more indicative of future performance than ERA (or even WHIP, which can be jerked around by fluctuations in BAbip). Again, don’t put your eggs into one basket (where spring training stats are the basket in this analogy), but it’s worth remembering a name or two.
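Since I lean on per-nine-inning rates throughout this post, here is a quick sketch of how to compute them from a spring line. It assumes innings are written in box-score style (14.2 means 14 and two-thirds), and the helper names are mine.

```python
def innings(ip_notation: str) -> float:
    """Convert box-score innings ('14.2' = 14 2/3) to a true decimal."""
    whole, _, outs = ip_notation.partition(".")
    return int(whole) + int(outs or 0) / 3


def per_nine(count: int, ip_notation: str) -> float:
    """Events (strikeouts or walks) per nine innings pitched."""
    return 9 * count / innings(ip_notation)


shields_k9 = per_nine(18, "14.2")   # James Shields: 18 K in 14.2 IP
moore_bb9 = per_nine(11, "10.1")    # Matt Moore: 11 BB in 10.1 IP

print(f"Shields K/9: {shields_k9:.2f}")  # Shields K/9: 11.05
print(f"Moore BB/9: {moore_bb9:.2f}")    # Moore BB/9: 9.58
```

With samples this small, a single extra walk moves BB/9 by more than half a point, which is another reason not to read too much into any one line.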

# Early SP rankings for 2014

I wouldn’t say pitching is deep, but I’m surprised by the pitchers who didn’t make my top 60.

Note: I have deemed players highlighted in pink undervalued and worthy of re-rank. Do not be alarmed just yet by what you may perceive to be a low ranking.

# Need a Streamer? Young guns Ventura (tonight), Johnson (next week), others

• MARQUEE STREAM for tonight: The Kansas City Royals have called up starting pitcher Yordano Ventura, who racked up a 3.14 ERA and 1.28 WHIP over 134-2/3 innings in Triple-A. He struck out well more than a batter per inning but also walked too many guys. Control may always be an issue, considering Ventura’s fastball frequently exceeds 100 mph, but the strikeout potential is there as well as a possible win if he’s not held to a strict pitch count.
• For Monday, Sept. 23: Chicago White Sox rookie pitcher Erik Johnson finally had the kind of outing I’ve been waiting for: 6 IP, 0 ER, 4 H, 2 BB, 8 K. Johnson doesn’t have the same strikeout potential as Ventura, but he has been a remarkably better pitcher, notching a 1.57 ERA, 1.08 WHIP and 8.9 K/9 in 57-1/3 innings in Triple-A — in line with his career minor-league line of 2.24 ERA, 1.08 WHIP and 8.4 K/9. I may consider slotting him next Monday against the Blue Jays; if I don’t, I’m certainly watching his start to see how he fares as I consider dynasty options.
• For Saturday, Sept. 21: San Diego Padres pitcher Burch Smith hurled a gem in Atlanta on Sunday, striking out 10 across seven innings and allowing only three hits. He has 17 strikeouts through two starts (12 IP) with a 1.08 WHIP since his recall (although he has allowed six walks). He has a ghastly 6.57 ERA and 1.67 WHIP because of a horrendous string of outings when he debuted, but if you look past it, you have a rookie pitcher with strikeout potential and good control who appears to have turned a corner.
• For Thursday, Sept. 19: Don’t quit on Los Angeles Dodgers pitcher Ricky Nolasco — and if you’re in a league where an owner has abandoned ship like I am, throw down a waiver claim. I acknowledged Nolasco’s quietly-good season on one of my league’s message boards and how a move to L.A. could greatly impact his value (I wish I had this blog at the time to back it up). He plays Arizona and San Francisco his next two starts, although I understand if you’re hesitant to start him against the Giants, who caused owners to jump ship in the first place.