# Predicting pitchers’ walks using xBB%

The other day, I discussed predicting pitchers’ strikeout rates using xK%. I will conduct the same exercise today in regard to predicting walks. Using my best intuition, I want to see how well a pitcher’s walk rate (BB%) actually correlates with what his walk rate should be (expected BB%, henceforth “xBB%”). Similarly to xK%, I used my intuition to best identify reliable indicators of a pitcher’s true walk rate using readily available data.

An xBB% metric, like xK%, would indicate not only whether a pitcher perennially over-performs (or under-performs) his expected walk rate but also whether he happened to do so in a given year. This article will conclude by looking at how the difference between actual and expected walk rates (BB% – xBB%) varied between 2014 and career numbers, lending some insight into the (un)luckiness of each pitcher.

Courtesy of FanGraphs, I constructed another set of pitching data spanning 2010 through 2014. This time, I focused primarily on what I thought would correlate with walk rate: inability to pitch in the zone and inability to incur swings on pitches out of the zone. I also throw in first-pitch strike rate: I predict that counts that start with a ball are more likely to end in a walk than those that start with a strike. Because FanGraphs’ data measures ability rather than inability — “Zone%” measures how often a pitcher hits the zone; “O-Swing%” measures how often batters swing at pitches out of the zone; “F-Strike%” measures the rate of first-pitch strikes — each variable should have a negative coefficient attached to it.

I specify a handful of variations before deciding on a final version. Instead of using split-season data (that is, each pitcher’s individual seasons from 2010 to 2014) for qualified pitchers, I use aggregated statistics because the results better fit the data by a sizable margin. This surprised me because there were about half as many observations, but it’s also not surprising because each observation is, itself, a larger sample size than before.

At one point, I tried creating my own variable: looks (non-swings) at pitches out of the zone. I created it by finding the percentage of pitches out of the zone (1 – Zone%) and multiplying it by how often a batter refused to swing at them (1 – O-Swing%). This version of the model produced a nice fit, but it was slightly worse than leaving the variables separated. Also, I ran separate regressions for PITCHf/x data and FanGraphs’ own data. The PITCHf/x data appeared to be slightly more accurate, so I proceeded using them.
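As a minimal sketch of that combined variable, assuming both rates are entered as decimals (0.45 for 45 percent); the function name and the sample inputs are made up for illustration and are not taken from the dataset:

```python
def looks_out_of_zone(zone_pct: float, o_swing_pct: float) -> float:
    """Share of all pitches that miss the zone AND draw no swing.

    zone_pct    -- Zone%, fraction of pitches in the strike zone
    o_swing_pct -- O-Swing%, fraction of out-of-zone pitches swung at
    """
    out_of_zone = 1.0 - zone_pct      # pitches that miss the zone
    no_chase = 1.0 - o_swing_pct      # batter refuses to swing at them
    return out_of_zone * no_chase


# Hypothetical pitcher: 45% of pitches in the zone, 30% chase rate
print(round(looks_out_of_zone(0.45, 0.30), 3))
```

Collapsing two rates into one product forces them to share a single coefficient, which may be part of why the separated version fit slightly better.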

The graph plots actual walk rates versus expected walk rates. The regression yielded the following equation:

xBB% = .3766176 – .2103522*O-Swing%(pfx) – .1105723*Zone%(pfx) – .3062822*F-Strike%
R-squared = .6433
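For anyone who would rather not do the arithmetic by hand, here is a rough sketch of the equation as a function. The coefficients are copied from the regression above; the function name, the assumption that every rate is entered as a decimal (0.30 rather than 30), and the sample pitcher are mine:

```python
# Coefficients from the regression above (PITCHf/x plate discipline data)
INTERCEPT = 0.3766176
COEF_O_SWING = -0.2103522
COEF_ZONE = -0.1105723
COEF_F_STRIKE = -0.3062822


def expected_walk_rate(o_swing_pct: float, zone_pct: float, f_strike_pct: float) -> float:
    """Return xBB% as a decimal, given the three rates as decimals."""
    return (
        INTERCEPT
        + COEF_O_SWING * o_swing_pct
        + COEF_ZONE * zone_pct
        + COEF_F_STRIKE * f_strike_pct
    )


# Hypothetical pitcher: 30% O-Swing%, 50% Zone%, 60% F-Strike%, 8.0% actual BB%
xbb = expected_walk_rate(0.30, 0.50, 0.60)
diff = 0.080 - xbb  # positive diff: walked more batters than expected
print(f"xBB% = {xbb:.2%}, BB% - xBB% = {diff:+.2%}")
```

With those made-up inputs the function lands at roughly 7.4 percent, so an actual walk rate of 8.0 percent would be a diff% of about +0.6 points.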

Again, R-squared indicates how well the model fits the data. An R-squared of .64 is not as exciting as the R-squared I got for xK%; it means the model explains about 64 percent of the variation in walk rates, and the remaining 36 percent is explained by things I haven’t included in the model. Certainly, more variables could help explain xBB%. I am already considering combining FanGraphs’ PITCHf/x data with some of Baseball Reference’s data, which does a great job of keeping track of the number of 3-0 counts, four-pitch walks and so on.

And again, for the reader to use the equation above to his or her benefit, one would plug in the appropriate values for a player in a given season or time frame and determine his xBB%. Then one could compare the xBB% to single-season or career BB% to derive some kind of meaningful results. And (one more) again, I have already taken the liberty of doing this for you.

Instead of including every pitcher from the sample, I narrowed it down to only pitchers with at least three years’ worth of data in order to yield some kind of statistically significant results. (Note: a three-year sample is a small sample, but three individual samples of 160+ innings are large enough to produce some arguably robust results.) “Avg BB% – xBB%” (or “diff%”) takes the average of a pitcher’s difference between actual and expected walk rates from 2010 to 2014. It indicates how well (or poorly) he performs compared to his xBB%: the lower a number, the better. This time, I included “t-score”, which measures how reliable diff% is. The key value here is 1.96; anything greater than that means his diff% is reliable. (1.00 to 1.96 is somewhat reliable; anything less than 1.00 is very unreliable.) Again, this is slightly problematic because there are five observations (years) at most, but it’s the best and simplest usable indicator of reliability.
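The exact t-score calculation isn't spelled out here, so treat the following as a sketch of one common way to get such a number: a one-sample t-statistic testing whether a pitcher's average yearly diff% differs from zero. The function names, the use of the absolute value and the five yearly diffs are illustrative assumptions, not real data:

```python
import math
import statistics


def diff_t_score(yearly_diffs: list[float]) -> float:
    """One-sample t-statistic: mean diff divided by its standard error."""
    n = len(yearly_diffs)
    mean = statistics.mean(yearly_diffs)
    std_dev = statistics.stdev(yearly_diffs)  # sample standard deviation
    return mean / (std_dev / math.sqrt(n))


def reliability(t_score: float) -> str:
    """Bucket a t-score using the thresholds described above."""
    size = abs(t_score)  # assumption: only the magnitude matters
    if size > 1.96:
        return "reliable"
    if size >= 1.00:
        return "somewhat reliable"
    return "very unreliable"


# Hypothetical pitcher who beats his xBB% by about a point every year
diffs = [-0.010, -0.012, -0.008, -0.011, -0.009]
t = diff_t_score(diffs)
print(round(t, 2), reliability(t))
```

Because the standard error shrinks as the yearly diffs bunch together, a pitcher who misses his xBB% by a similar amount every year earns a large t-score even with only three to five seasons.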

Thus, Mark Buehrle, Mike Leake, Hiroki Kuroda, Doug Fister, Tim Hudson, Zack Greinke, Dan Haren and Bartolo Colon can all reasonably be expected to consistently out-perform their xBB% in any given year. Likewise, Aaron Harang, Colby Lewis, Ervin Santana and Mat Latos can all reasonably be expected to under-perform their xBB%. For everyone else, their diff% values don’t mean a whole lot. For example, R.A. Dickey’s diff% of +0.03% doesn’t mean he’s more likely than someone else to pitch exactly as well as his xBB% predicts him to; in fact, his standard deviation (StdDev) of 0.93% indicates he’s less likely than just about anyone to do so. (What it really means is there is only a two-thirds chance his diff% will be between -0.90% and +0.96%.)
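The two-thirds range for Dickey is simply his average diff% plus or minus one standard deviation. A small sketch, assuming yearly diffs are roughly normally distributed (which is what makes "about two-thirds" a fair reading of one standard deviation); the function name is mine:

```python
def one_sigma_range(mean_diff: float, std_dev: float) -> tuple[float, float]:
    """Range expected to hold about two-thirds of outcomes under a normal assumption."""
    return (mean_diff - std_dev, mean_diff + std_dev)


# Dickey's figures from above, in percentage points
low, high = one_sigma_range(0.03, 0.93)
print(f"{low:+.2f}% to {high:+.2f}%")
```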

As with xK%, I compiled a list of fantasy-relevant starters with only two years’ worth of data that see sizable fluctuations between 2013 and 2014. Their data is impossible (nay, ill-advised) to interpret at this point, but it is worth monitoring.

Name: [2013 diff%, 2014 diff%]

Miller is an interesting case: he was atrociously bad about gifting free passes in 2014, but his diff% was only marginally worse than it was in 2013. It’s possible that he was a smart buy-low for the Braves, but it’s also possible that Miller not only perennially under-performs his xBB% but is also trending in the wrong direction.

Here are fantasy-relevant players with a) only 2014 data, and b) outlier diff% values:

I’m not gonna lie, I have no idea why Cobb, Corey Kluber and others show up as only having one year of data when they have two in the xK% dataset. This is something I noticed now. Their exclusion doesn’t fundamentally change the model’s fit whatsoever because it did not rely on split-season data; I’m just curious why it didn’t show up in FanGraphs’ leaderboards. Oh well.

Implications: Richards and Roark perhaps over-performed. Meanwhile, it’s possible that Odorizzi, Ross and Ventura will improve (or regress) compared to last year. I’m excited about all of that. Richards will probably be pretty over-valued on draft day.

# Matt Shoemaker, pending fantasy star

How does one apologize for not writing for more than a month and a half? It’s hard, man. Maybe one does not apologize to one’s readers. Maybe one’s readers accept that it is what it is.

You know what else is what it simply is? Matthew David Shoemaker, the Los Angeles Angels of Anaheim’s right-handed reliever-turned-starter.

Every baseball season produces more questions than answers. Namely: Who is Matt Shoemaker? Where did he come from? Why am I writing about him if he’s not that good?

Let us rewind to 2012. A mysterious figure emerged from the mist of the Seattle Mariners’ bullpen to dazzle us, or maybe just me, given he never really received the recognition he deserved. Maybe he had a right to be ignored: he posted a 4.75 ERA and 1.42 WHIP in 30-1/3 relief innings. If you don’t know how this fairy tale ends, it goes something like: goes largely unnoticed in 2012, is drafted outside the top 75 pitchers on average in ESPN drafts in 2013, and eventually emerges as a borderline fantasy ace by the name of Hisashi Iwakuma.

There’s a lesson to be learned. Iwakuma’s horrid statistics as a reliever muddied his season numbers. In hindsight, a 3.15 ERA for the year is solid, but a 2.65 ERA is better, and that’s what Iwakuma posted strictly as a starter. Yet fantasy owners who opted only to scratch the surface saw mostly unsightly ratios.

The same fairy tale manifested itself in a different form in 2013 that would make the Brothers Grimm proud. The Cleveland Indians’ Corey Kluber emerged from the bullpen in May, albeit after only half a dozen innings, many more than that in 2012. Kluber’s season, however, began with aplomb — and by aplomb, I mean “a handful of horrible starts.” Starts horrible enough to sully his numbers for the year (3.95 ERA). But the peripherals were there at season’s end: 8.28 strikeouts per nine innings, 2.09 walks per nine, 3.12 xFIP. In case you haven’t kept track, Kluber has more or less assumed the role as Cleveland’s staff ace this year, posting a 2.95 ERA with more strikeouts than innings.

I will now shortsightedly assume, without any kind of research, that this kind of thing happens every year. Every year, there’s at least one player who emerges from the bullpen and becomes an ace. Sure, you have the Chris Sales and Adam Wainwrights of the baseball world, who make a gigantic, whale-sized splash, but you also have the Iwakumas and Klubers, who basically don’t make a splash at all and probably sit on the side of the pool with their feet dangling in and shirts still on.

So I’m calling it: Mr. Shoemaker will be 2015’s reincarnation of this fairy tale.

In keeping the trend alive, a look at Shoemaker’s stats tells you… well, in the way of anything positive, not much. He has somehow notched seven wins despite a 4.54 ERA and 1.30 WHIP, so that rules. Worse, his WHIP was, like, 1.42 before his most recent start. So, bad season stat line? Check.

Meanwhile, he has struck out 9.68, and walked only 1.87, hitters per nine innings. It would behoove me to point out that these numbers dwarf those posted by Kluber in 2013, during which Kluber existed primarily in a gelatinous state of Emerging Star. It would also behoove me to point out that a reader with a discerning eye would notice that Shoemaker has a still-lackluster 4.37 ERA and 1.28 WHIP as a starter, fitting the mold of “maybe his season numbers are ruined.” It would further behoove me to point out that he is suffering the misfortune of a .350 batting average on balls in play (BABIP), which, if normalized to a more reasonable .320, would produce a 1.20 WHIP. A league-average .300 BABIP? A 1.14 WHIP. So, distorted stats as a starter? Check.
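The arithmetic behind those normalized WHIPs isn't shown, so here is one plausible way to run it, not necessarily the method used above: hold walks, home runs, balls in play and innings fixed, then swap the hits on balls in play for what a target BABIP would have produced. The function name and every input below are hypothetical and are not Shoemaker's actual totals:

```python
def whip_at_babip(walks: int, home_runs: int, balls_in_play: int,
                  innings: float, target_babip: float) -> float:
    """Estimate WHIP if balls in play fell for hits at target_babip.

    Home runs are not balls in play, so they are added back to the hit total.
    innings is a decimal (30-1/3 innings would be 30 + 1/3).
    """
    hits_in_play = target_babip * balls_in_play
    total_hits = hits_in_play + home_runs
    return (walks + total_hits) / innings


# Hypothetical pitcher: 20 BB, 10 HR, 300 balls in play, 100 innings
for babip in (0.350, 0.320, 0.300):
    print(babip, round(whip_at_babip(20, 10, 300, 100.0, babip), 2))
```

This ignores the extra batters a pitcher faces when more balls drop for hits, so it is a back-of-the-envelope adjustment at best.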

Perhaps the most important, and valid, question at this point is whether or not Shoemaker can sustain what he’s doing. Small sample size caveats abound here, but I think the results are still substantial, if not due for regression. For all pitchers who have thrown at least 60 innings, Shoemaker ranks 11th in swinging strike percentage (11.9), one spot behind Stephen Strasburg, the MLB strikeout leader, and three spots ahead of his teammate Garrett Richards, who has done all kinds of breaking out this year. Shoemaker also ranks 9th in hitter contact allowed (73.5 percent), sandwiched between Gio Gonzalez and, yes, Richards. Thus, even given small sample size caveats, Shoemaker is among excellent company. The walk rate may suffer; it’s hard to say, and even harder still given that I’m on an airplane over central California with no internet. But, given the browser tabs I still have open, I can tell you that Shoemaker’s percentage of pitches thrown in the strike zone, according to FanGraphs’ data, trails only Clayton Kershaw and Mets reliever Carlos Torres among the 10 names ahead of his on the swinging strike percentage list. That bodes well for projecting his control going forward. (PITCHf/x, however, portends another story, as his zone percentage trails six of the eight names ahead of his. But when the names you trail are Felix Hernandez, Masahiro Tanaka and Kershaw, I’d say you’re not doing so bad for yourself.) So, solid peripherals? Check.

It’s a makeshift and largely personal checklist, but so far, Shoemaker meets all my criteria for the gelatinous Emerging Star. Who knows how Shoemaker will fare during the season’s last two months, but I think he’s worth owning now despite his current stat line. As for 2015 and beyond, I like him, for now. I wouldn’t bother keeping him, as I think his value will be depressed heading into next year’s draft, so you can easily wait around for him in the late rounds, if not add him in free agency in the first couple of weeks of the season, just as many owners did with Iwakuma and Kluber the past two years.

I hesitate to say Shoemaker is a lock for success. If anything, this post is less about finding The Next Big Thing than it is about finding a pitcher whose performance betrays his value. There are the Sonny Grays and Michael Wachas of the world, whose status as top prospects makes them costly prospective adds. Then there are the Matt Shoemakers, whose obscurity and relative misfortune keep them out of the fantasy limelight and, one would hope, on the clearance shelf, from which you can swipe them on the cheap.

# Don’t shy from injured pitchers

Drafting injured players can be tricky. The success of the strategy is largely dependent on your league’s rules. In a single-year format, where all players are thrown back into the pool for next season’s draft, the room for error is much narrower. In a dynasty format, however, where players are kept for X number of years or at an additional premium to the player’s salary of Y dollars, it can be used much more effectively because the chances for success are spread over time.

For example: An owner in my primary 10-team standard rotisserie league with an auction draft purchased an injured Hanley Ramirez last year for \$6. Had he been healthy, he probably would have gone for \$25, but his estimated time of arrival in 2013 was uncertain; he actually played his first game April 1, 2013, but appeared in only three more games between then and June 4. This uncertainty greatly reduced his value.

I should re-phrase: the uncertainty greatly reduced his 2013 value. With four days until draft day, I’m realizing now that Ramirez’s value at \$6, even in 2013, was immense for the format of our league, because now he will be owned for a measly \$9 — all because the owner was willing to plug a hole with a replacement-level shortstop for two months. Now his team is poised to dominate this year with cheap retention prices for Chris Davis and Paul Goldschmidt to boot.

Breaking down the strategy, it makes a lot of sense. Stream someone like Stephen Drew, ESPN’s 18th best shortstop of 2013, for two months while Ramirez heals. Their patchwork stat line would have looked like this:

.302 BA, 80 R, 24 HR, 78 RBI, 12 SB

That is a solid line for a shortstop, regardless of whose name (or names) shows up in the box score.

If you fancy yourself a bargain hunter or someone who can spot the late-round sleepers, this strategy makes even more sense: Draft a superstar for less than face value, stash him on the DL and fill the opening with whomever this year’s Jean Segura may be. Even if you can’t find this year’s breakout star, the replacement-level strategy still has the opportunity to be effective.

Upon further reflection, I may take a chance on players such as Cole Hamels and Hisashi Iwakuma whose draft stocks may take a hit. There’s enough pitching depth for me to make their absences painless, and I have a chance to retain them next year at a discount (relative to their expected salaries).

It’s important, though, that the player has already established a high benchmark for himself. In this case, Jurickson Profar wouldn’t be as smart a play here; he wasn’t going for a lot of money (or too quickly off draft boards) in the first place.

The best opportunities, therefore, are found in the best players who are out for two or three months. It’s important to wring out as much 2015 value as possible, but you don’t want to clog your DL all year and hamper your 2014 value too much, or it defeats the purpose. Clearly, one must strike a fine balance.

But, basically, if you see an injured player heavily discounted on draft day, and you’re in a league that rewards bargain hunting, take a stab at him.

Here are some so-called “eligible” players for this injured-player strategy and what I predict their discounts might be:

Hamels, SP, expected to miss a month | \$10, three to four rounds
Iwakuma, SP, expected to miss a month | \$11, seven to eight rounds
Mike Minor, SP, expected to miss a month | \$8, six to seven rounds
Aroldis Chapman, RP, expected to miss 6 to 8 weeks | \$9, four to five rounds
Manny Machado, 3B, expected to miss a week, but could miss a month | \$3, three rounds
Michael Bourn, OF, expected to miss a couple of weeks, but could be longer | \$4, four rounds
Matt Harvey, SP, expected to miss entire year | \$18, 12 to 15 rounds
Kris Medlen, SP, expected to miss entire year | \$15, 10 to 12 rounds ***DISCLAIMER: may not return to form after second Tommy John surgery

Players for whom the strategy may not work so well:

Mat Latos, SP, will only miss a couple of starts
Homer Bailey, SP, will only miss a couple of starts
Profar, 2B, will miss 10 to 12 weeks but isn’t valuable enough
Jeremy Hellickson, SP, will miss two months but isn’t valuable enough
A.J. Griffin, SP, will miss entire year but isn’t valuable enough
Jarrod Parker, SP, will miss entire year but isn’t valuable enough
Brandon Beachy, SP, will miss entire year but isn’t valuable enough

Players who are wild cards:

Matt Kemp, OF, depends on if you think he’ll return to form

# Updated SP rankings

I have updated the starting pitcher rankings to reflect offseason signings, rotation battles and spring training injuries — and holy cow, have there been a lot of spring training injuries.

I also truncated the list to the top 90 pitchers. I will write about my favorite pitchers outside the top 90 because a lot of them are really good; they simply won’t get enough starts or pitch enough innings for them to crack the top 90 in value. In terms of stuff, though, there are plenty of diamonds to find in the rough.

Stock up: James Paxton, Justin Verlander, Ervin Santana

Stock down: Cole Hamels (shoulder), Hisashi Iwakuma (finger), Kris Medlen (elbow), Patrick Corbin (elbow), Jarrod Parker (elbow), A.J. Griffin (elbow), Brandon Beachy (elbow), Miguel Alfredo Gonzalez

# Bold prediction #3: Corey Kluber is this year’s Hisashi Iwakuma

Bold Prediction #2: Brad Miller will be a top-5 shortstop
Bold Prediction #1: Tyson Ross will be a top-45 starter (until he reaches his innings cap)

The Corey Kluber Society, fronted by Carson Cistulli of FanGraphs, is, frankly, hilarious. The format of the post is great, and if you haven’t read it before, you should.

But there’s a more important reason to read about (and “join”) the Society. Kluber is not only a legitimate fantasy starting pitcher but also a very good one. His breakout last year was muted by a couple of bad starts, but he is a perfect comp to a 2012 Hisashi Iwakuma on the verge.

I will list a variety of statistics in which Kluber excelled. Then I will let you know whom he outperformed in each category for all pitchers with at least 140 innings pitched (107 total).

K/9: 8.31 (26th overall)
Better than: Cole Hamels, Julio Teheran, Adam Wainwright, Mat Latos, Mike Minor

K/BB: 4.12 (11th overall)
Better than: Hamels, Jordan Zimmermann, Teheran, Anibal Sanchez, Homer Bailey

BAbip: .329 (6th worst)

Swinging strike rate: 10.4% (22nd overall)
Better than: Zack Greinke, Latos, Iwakuma, Scott Kazmir, Jose Fernandez

Contact rate: 76.8% (16th overall)
Better than: Kris Medlen, Jeff Samardzija, Bailey, Greinke, Fernandez

xFIP-: 78 (11th overall)
Better than: Max Scherzer, Fernandez, David Price, Iwakuma, Stephen Strasburg

Yowza. Those are some seriously stellar numbers. What’s the deal? Unfortunately for Kluber, he suffered a brutal outing or two, causing his WHIP and ERA to be inflated for most of the year and allowing him to fly under the radar. Chalk it up to bad luck, considering Kluber’s 6th-worst BAbip, better than only Joe Saunders, Dallas Keuchel and other names one wishes not to be associated with.

This sounds vaguely familiar. A high-control guy with a solid strikeout rate out of the bullpen? Does the name Hisashi Iwakuma ring a bell? It should, because he has already been mentioned several times in the last 300 words. Anyway, I rode the Iwakuma (and Bailey) wave through the end of 2012. Instead of going with my gut and drafting Iwakuma in the last round of my shallow draft in 2013, I opted for Marco Estrada, which was not a terrible pick, but clearly not the right gamble to take. It’s actually the moment upon which I reflected and realized that I should really just take my own advice. Because given Dan Haren’s peripherals, why would anyone have trusted him over Bailey last year? Ridiculous. (FYI, I will rip on Haren in a forthcoming bold prediction, just to be clear that I’m not ripping on him because he gave up a million home runs last year.)

But I digress. Iwakuma was good in 2012, but his 7.25 K/9, 2.35 K/BB and 1.28 WHIP were all rather pedestrian. But sometimes you need to rely on your eyes more than the numbers, and anyone who watched Iwakuma saw flashes of brilliance. 2013 may have been more than we anticipated, which brings me to my point:

Kluber already has the makings of a great pitcher, and his peripherals indicate that none of it was a fluke. My official bold prediction: Corey Kluber will be a top-20 starting pitcher.

# Implications of spring training injuries and news

We’re only a couple of days in and teams are already crying “man down!”

The Seattle Mariners’ No. 2 starter Hisashi Iwakuma strained his finger on the first day of spring training, sidelining him for four to six weeks. Considering spring training is a time for conditioning and preparing for the season (duh), Iwakuma fans and owners can only hope he will be participating in workouts, with the exception of throwing, to keep pace. Still, that could be a legitimate four to six weeks of the season that we miss him, even if he is healthy by Opening Day, which he is expected to be.

The New York Yankees’ ace CC Sabathia has allegedly lost more weight but, instead of simply slimming down, has added more muscle. He allegedly felt weak last year after committing to a lifestyle change that saw him lose 35-or-so pounds. It’s an interesting situation; I keep Sabathia ranked around 40th of starting pitchers, but I’ll be tracking his velocity through the spring, if possible. If he’s got some of it back, it could boost his stock. It was only two years ago that Tim Lincecum halted his routine of fast food binges and started eating healthily — right before he had the worst season of his career, and lost mph and life on his fastball. Coincidence?

Also, the Cincinnati Reds’ ace Mat Latos had surgery on his knee to clean up some stuff going on down there. (Pretty scientific, right?) It’s supposed to be minor — he’ll be up and running in 10 days — but there’s always a chance of complications, even if the probability is slim. And, like Iwakuma, one has to hope the lost time doesn’t affect his Opening Day start.

Cy Young winner Justin Verlander says he feels fine and will be ready for Opening Day. It remains to be seen how his offseason core surgery will affect his throwing, though. One more bad year and things could start getting ugly.

And American League MVP Miguel Cabrera says he feels stronger after core surgery this offseason. Is that even possible? It could be hot air, but it’s just scary thinking about what kind of season he could have feeling perfectly healthy — and wondering how much his woes last year plagued him before they became very obvious in the latter third of the season.

Bonus coverage: Is it easy to dismiss Trevor Bauer after his rough-and-tumble stay in the bigs so far? Yes. And is it easy to dismiss the PR machine churning in Cleveland trying to make him look like he’s not a lost cause? Yes. But! Let us consider one thing: Mickey Callaway did something magical last year when he revived the careers of both Ubaldo Jimenez and Scott Kazmir. Is it out of the question that he can do the same for the third overall pick of the 2011 draft?

As I always say, “keep your eye on so-and-so during spring training”… But seriously, if Bauer is walking fewer than 4 BB/9 in his first few starts of the season, he will have my attention.

# A look at international players’ value, or “Might as well give Tanaka his Yankee jersey now” (Updated Jan. 14)

Let’s avoid all talk about who’s right or wrong in the Alex Rodriguez debacle, spectacle, three-ring circus, what-have-you. I liked the White Sox as sleepers to win Japanese phenom Masahiro Tanaka’s services this winter. Now that A-Rod is suspended for 162 games, though, the New York Yankees will have something like \$24 million in payroll freed up for 2014.

Although the Yankees were allegedly among two or three frontrunners in the bidding war for Tanaka, it appeared to me their payroll would pose a huge obstacle if they truly wanted to obey the luxury tax threshold. But Rodriguez’s suspension blows everything wide open, upgrading the Bronx Bombers’ status from Possible to Probable.

Updated Jan. 14, 2014: The Angels are a distant third to the Yankees and Dodgers, and with Los Angeles looking to extend pitcher Clayton Kershaw… well, the deal is as good as done. Although, in defense of the L.A. teams, Tanaka has mentioned he wants to play on the west coast.

As for the White Sox… get ’em next time, boys. Keep looking for those good deals. I tell you what, every high-profile international signing in the past three years has been a winner.

It is commonly accepted that each win a player provides in value (a “win above replacement,” for those just putting two and two together) has a market value of about \$5 million, although Lewie Pollis at SB Nation argues it is closer to \$7 million. Even using the quick-and-easy (and lower) \$5 million as a benchmark, the value (by means of WAR) of the 2013 performance of every notable international player in MLB exceeded the average annual value (AAV) of his contract:

Yu Darvish: 5.0 WAR ~ \$25 million (AAV: \$18.62 million)
Hisashi Iwakuma: 4.2 WAR ~ \$21 million (AAV: \$7 million)
Yasiel Puig: 4.0 WAR ~ \$20 million (AAV: \$6 million)
Hyun-jin Ryu: 3.1 WAR ~ \$15.5 million (AAV: \$6 million)
Leonys Martin: 2.7 WAR ~ \$13.5 million (AAV: \$4.1 million)
Yoenis Cespedes: 2.3 WAR ~ \$11.5 million (AAV: \$9 million)
Norichika Aoki: 1.7 WAR ~ \$8.5 million (AAV: \$1.65 million)
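The conversion in the list above is just WAR multiplied by the price of a win. Here is a short sketch that reproduces it and computes each player's surplus over his AAV; the WAR and AAV figures are copied from the list, while the function names and the idea of calling the gap "surplus" are mine:

```python
DOLLARS_PER_WIN = 5.0  # USD millions per win; Pollis's estimate would be 7.0

# (2013 WAR, contract AAV in USD millions), copied from the list above
PLAYERS = {
    "Yu Darvish": (5.0, 18.62),
    "Hisashi Iwakuma": (4.2, 7.0),
    "Yasiel Puig": (4.0, 6.0),
    "Hyun-jin Ryu": (3.1, 6.0),
    "Leonys Martin": (2.7, 4.1),
    "Yoenis Cespedes": (2.3, 9.0),
    "Norichika Aoki": (1.7, 1.65),
}


def market_value(war: float, dollars_per_win: float = DOLLARS_PER_WIN) -> float:
    """Open-market value of a season, in USD millions."""
    return war * dollars_per_win


def surplus(war: float, aav: float, dollars_per_win: float = DOLLARS_PER_WIN) -> float:
    """Market value minus contract AAV; positive means the team came out ahead."""
    return market_value(war, dollars_per_win) - aav


for name, (war, aav) in PLAYERS.items():
    print(f"{name}: {market_value(war):.1f} value, {surplus(war, aav):+.2f} surplus")
```

Passing `dollars_per_win=7.0` shows how much wider the gaps get under the higher estimate.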

Let’s note here that the AAV for all the players listed above exceeded their actual 2013 salaries. For example, Martin made \$3.25 million last year, and Ryu made \$3.33 million. Thus, even Cespedes, with his disappointing production compared to 2012, still managed to be a boon for his team, and he should only improve from last year.

It’s a small sample size, but hey, the results seem pretty substantial so far in the post-Dice-K era. Don’t be surprised when my fantasy team has Jose Abreu, Alexander Guerrero and Miguel Alfredo Gonzalez on it.