# Predicting pitchers’ strikeouts using xK%

Expected strikeout rate, or what I will henceforth refer to as “xK%,” is exactly what it sounds like. I want to see if a pitcher’s strikeout rate actually reflects how he has pitched in terms of how often he’s in the zone, how often he causes batters to swing and miss, and so on. Ideally, it will help explain random fluctuations in a pitcher’s strikeout rate, because even strikeouts have some luck built into them.

An xK% metric is not a revolutionary idea. Mike Podhorzer over at FanGraphs created one last year, but he tailored it to hitters. Still, it’s nothing too wild and crazy like WAR or SIERA or any other wacky acronym. (A wackronym, if you will.)

Courtesy of Baseball Reference, I constructed a set of pitching data spanning 2010 through 2014. I focused primarily on what I thought would correlate highly with strikeout rates: looking strikes, swinging strikes and foul-ball strikes, all as a percentage of total strikes thrown. I didn’t want the model specification to be too close to a definition, so it’s beneficial that these rates are on a per-strike, rather than per-pitch, basis.

The graph plots actual strikeout rates versus expected strikeout rates with the line of best fit running through it. I ran my regression using the specification above and produced the following equation:

xK% = -.6284293 + 1.195018*lookstr + 1.517088*swingstr + .9505775*foulstr
R-squared = .9026

The R-squared term can, for ease of understanding, be interpreted as how well the model fits the data, from 0 to 1. An R-squared of .9026, then, represents approximately a 90-percent fit. In other words, these three variables explain about 90 percent of the variation in strikeout rates across these pitcher-seasons. (The remaining 10 percent is, for now, a mystery!)
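For anyone who wants to see what goes into that number, here is a minimal sketch of the standard R-squared calculation (one minus the ratio of unexplained to total variation). The function name and the four pitcher-seasons are made up for illustration; they are not from my data set.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    # Variation the model fails to explain.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation around the average strikeout rate.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up K% and xK% for four imaginary pitcher-seasons (not real data).
actual_k = [0.18, 0.22, 0.25, 0.30]
expected_k = [0.19, 0.21, 0.26, 0.29]
print(round(r_squared(actual_k, expected_k), 3))
```

A perfect set of predictions returns 1.0; the worse the misses, the lower the value.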

To use this equation yourself, plug a pitcher’s looking-strike, swinging-strike and foul-ball strike percentages into the corresponding variables. Fortunately, I already took the initiative. I applied the results to the same data I used: all individual qualified seasons by starting pitchers from 2010 through 2014.
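If you would rather not do the arithmetic by hand, the equation can be sketched as a small function. The coefficients are copied from the regression above. The function name, the example pitcher and the assumption that each input is entered as a decimal share of total strikes (0.27 rather than 27) are mine.

```python
# Coefficients copied from the regression equation above.
INTERCEPT = -0.6284293
COEF_LOOKSTR = 1.195018
COEF_SWINGSTR = 1.517088
COEF_FOULSTR = 0.9505775


def xk_pct(lookstr, swingstr, foulstr):
    """Expected strikeout rate from strike-type shares.

    Assumes each input is a fraction of total strikes thrown (e.g. 0.27).
    """
    return (INTERCEPT
            + COEF_LOOKSTR * lookstr
            + COEF_SWINGSTR * swingstr
            + COEF_FOULSTR * foulstr)


# Hypothetical pitcher: 27% looking, 17% swinging, 27% foul-ball strikes.
print(f"{xk_pct(0.27, 0.17, 0.27):.1%}")
```

The output is also a decimal rate, so format it as a percentage before comparing it with a pitcher's listed K%.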

The results have interesting implications. Firstly, one can see how lucky or unlucky a pitcher was in a particular season. Secondly, and perhaps most importantly, one can easily identify which pitchers habitually over- and under-perform relative to their xK%. Lastly, one can see how each pitcher is trending over time. Every pitcher is different; although the formula will fit most ordinary pitchers, it goes without saying that the aces of your fantasy squad are far from ordinary, and they should be treated on an individual basis.

(Keep in mind that a lot of these players only have one or two years’ worth of data (as indicated by “# Years”), so the average difference between their xK% and K% as a representation of a pitcher’s true skill will be largely unreliable.)

It is immediately evident: the game’s best pitchers outperform their xK% by the largest margins. Cliff Lee, Stephen Strasburg, Clayton Kershaw, Felix Hernandez and Adam Wainwright are all top-10 (or at least top-15) fantasy starters. But let’s look at their numbers over the years, along with a few others at the top of the list.

Kershaw and King Felix have not only been consistent but also look like they’re getting better with age. Wainwright’s difference between 2013 and 2014 is a bit of a concern; he’s getting older, and this could be a concrete indicator that perhaps the decline has officially begun. Darvish’s line is interesting, too: you may or may not remember that he had a massive spike in strikeouts in 2013 compared to his already-elite strikeout rate the prior year. As you can see, it was totally legit, at least according to xK%. But for some reason, even xK% can fluctuate wildly from year to year. I see it in the data, anecdotally: Anibal Sanchez’s huge 6.7-percent spike in xK% from 2012 to 2013 was followed by a 5.5-percent drop from 2013 to 2014. Conversely, David Price’s 5-percent decrease in xK% from 2012 to 2013 was followed by an almost perfectly equal 5-percent increase from 2013 to 2014. So the phenomenon seems to work both ways. Thus, perhaps it shouldn’t have come as a surprise when Darvish couldn’t repeat his 2013 success. To the baseball world’s collective dismay, we simply didn’t have enough data yet to determine which Yu was the true Yu. I plan to do some research to see how often these severe spikes in xK% are mere aberrations versus how often they are sustained over time, indicating a legitimate skills improvement.

I have also done my best to compile a list of players with only one or two years’ worth of data who saw sizable spikes and drops in their K% minus xK% (“diff%”). The idea is to find players for whom we can’t really tell how much better (or worse) their actual K% is compared to their xK% because of conflicting data points. For example, will Corey Kluber be a guy who massively outperforms his xK% as he did in 2014, or does he only slightly outperform as he did in 2013? I present the list not to provide an answer but to posit: Which version of each of these players is more truthful? I guess we will know sometime in October.
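For clarity on what “diff%” means in practice, here is a rough sketch: actual K% minus xK% for each season, plus the average across seasons. The function names and the two-season pitcher are hypothetical, and the rates are entered as decimals.

```python
def diff_pct(k_pct, xk_pct_value):
    """diff% = actual K% minus expected K% (positive means outperforming xK%)."""
    return k_pct - xk_pct_value


def average_diff(seasons):
    """Average diff% over a list of (K%, xK%) season pairs."""
    diffs = [diff_pct(k, xk) for k, xk in seasons]
    return sum(diffs) / len(diffs)


# Hypothetical two-season pitcher, NOT a real player: (K%, xK%) per year.
seasons = [(0.225, 0.215), (0.283, 0.240)]
per_year = [round(diff_pct(k, xk), 3) for k, xk in seasons]
print(per_year, round(average_diff(seasons), 4))
```

With only two seasons, the average is pulled hard by either one, which is exactly why these pitchers are tough to read.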

Name: [2013 diff%, 2014 diff%]

And here are some fantasy-relevant guys with data only from 2014:

# Matt Shoemaker, pending fantasy star

How does one apologize for not writing for more than a month and a half? It’s hard, man. Maybe one does not apologize to one’s readers. Maybe one’s readers accept that it is what it is.

You know what else is what it simply is? Matthew David Shoemaker, the Los Angeles Angels of Anaheim’s right-handed reliever-turned-starter.

Every baseball season produces more questions than answers. Namely: Who is Matt Shoemaker? Where did he come from? Why am I writing about him if he’s not that good?

Let us rewind to 2012. A mysterious figure emerged from the mist of the Seattle Mariners’ bullpen to dazzle us, or maybe just me, given he never really received the recognition he deserved. Maybe he had a right to be ignored: he posted a 4.75 ERA and 1.42 WHIP in 30-1/3 relief innings. If you don’t know how this fairy tale ends, it goes something like: goes largely unnoticed in 2012, is drafted outside the top 75 pitchers on average in ESPN drafts in 2013, and eventually emerges as a borderline fantasy ace by the name of Hisashi Iwakuma.

There’s a lesson to be learned. Iwakuma’s horrid statistics as a reliever muddied his season numbers. In hindsight, a 3.15 ERA for the year is solid, but a 2.65 ERA is better, and that’s what Iwakuma posted strictly as a starter. Yet fantasy owners who opted only to scratch the surface saw mostly unsightly ratios.

The same fairy tale manifested itself in a different form in 2013 that would make the Brothers Grimm proud. The Cleveland Indians’ Corey Kluber emerged from the bullpen in May, albeit after only half a dozen innings, many more than that in 2012. Kluber’s season, however, began with aplomb — and by aplomb, I mean “a handful of horrible starts.” Starts horrible enough to sully his numbers for the year (3.95 ERA). But the peripherals were there at season’s end: 8.28 strikeouts per nine innings, 2.09 walks per nine, 3.12 xFIP. In case you haven’t kept track, Kluber has more or less assumed the role as Cleveland’s staff ace this year, posting a 2.95 ERA with more strikeouts than innings.

I will now shortsightedly assume, without any kind of research, that this kind of thing happens every year. Every year, there’s at least one player who emerges from the bullpen and becomes an ace. Sure, you have the Chris Sales and Adam Wainwrights of the baseball world, who make a gigantic, whale-sized splash, but you also have the Iwakumas and Klubers, who basically don’t make a splash at all and probably sit on the side of the pool with their feet dangling in and shirts still on.

So I’m calling it: Mr. Shoemaker will be 2015’s reincarnation of this fairy tale.

In keeping the trend alive, a look at Shoemaker’s stats tells you… well, in the way of anything positive, not much. He has somehow notched seven wins despite a 4.54 ERA and 1.30 WHIP, so that rules. Worse, his WHIP was, like, 1.42 before his most recent start. So, bad season stat line? Check.

Meanwhile, he has struck out 9.68, and walked only 1.87, hitters per nine innings. It would behoove me to point out that these numbers dwarf those posted by Kluber in 2013, during which Kluber existed primarily in a gelatinous state of Emerging Star. It would also behoove me to point out that a reader with a discerning eye would notice that Shoemaker has a still-lackluster 4.37 ERA and 1.28 WHIP as a starter, fitting the mold of “maybe his season numbers are ruined.” It would further behoove me to point out that he is suffering the misfortune of a .350 batting average on balls in play (BABIP), which, if normalized to a more reasonable .320, would produce a 1.20 WHIP. A league-average .300 BABIP? A 1.14 WHIP. So, distorted stats as a starter? Check.
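For anyone wondering how a BABIP adjustment like that can be done, here is one simple way to sketch it: rebuild the hit total as home runs plus BABIP times balls in play, hold walks, home runs and innings fixed, and recompute WHIP. The function name is mine, and the counts below are round, made-up numbers chosen only so they land near the WHIPs quoted above. They are not Shoemaker’s actual line, and holding innings fixed is a simplification.

```python
def whip_at_babip(innings, walks, home_runs, balls_in_play, babip):
    """WHIP if balls in play fell for hits at the given BABIP.

    Hits are rebuilt as home runs plus BABIP times balls in play;
    walks, home runs and innings are held fixed (a simplification).
    """
    hits = home_runs + babip * balls_in_play
    return (walks + hits) / innings


# Made-up round numbers for illustration, NOT Shoemaker's actual line.
line = dict(innings=100, walks=21, home_runs=9, balls_in_play=280)
for babip in (0.350, 0.320, 0.300):
    print(babip, round(whip_at_babip(babip=babip, **line), 2))
```

The takeaway is how sensitive WHIP is: with these inputs, every 10 points of BABIP is worth roughly 0.03 of WHIP.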

Perhaps the most important, and valid, question at this point is whether or not Shoemaker can sustain what he’s doing. Small sample size caveats abound here, but I think the results are still substantial, if not due for regression. For all pitchers who have thrown at least 60 innings, Shoemaker ranks 11th in swinging strike percentage (11.9), one spot behind Stephen Strasburg, the MLB strikeout leader, and three spots ahead of his teammate Garrett Richards, who has done all kinds of breaking out this year. Shoemaker also ranks 9th in hitter contact allowed (73.5 percent), sandwiched between Gio Gonzalez and, yes, Richards. Thus, even given small sample size caveats, Shoemaker is among excellent company. The walk rate may suffer; it’s hard to say, and even harder still given that I’m on an airplane over central California with no internet. But, given the browser tabs I still have open, I can tell you that Shoemaker’s percentage of pitches thrown in the strike zone, according to FanGraphs’ data, trails only Clayton Kershaw and Mets reliever Carlos Torres among the 10 names ahead of his on the swinging strike percentage list. That bodes well for projecting his control going forward. (PITCHf/x, however, portends another story, as his zone percentage trails six of the eight names ahead of his. But when the names you trail are Felix Hernandez, Masahiro Tanaka and Kershaw, I’d say you’re not doing so bad for yourself.) So, solid peripherals? Check.

It’s a makeshift and largely personal checklist, but so far, Shoemaker meets all my criteria for the gelatinous Emerging Star. Who knows how Shoemaker will fare during the season’s last two months, but I think he’s worth owning now despite his current stat line. As for 2015 and beyond, I like him, for now. I wouldn’t bother keeping him, as I think his value will be depressed heading into next year’s draft, so you can easily wait around for him in the late rounds, if not add him as a free agent in the first couple of weeks of the season, just as many owners did with Iwakuma and Kluber the past two years.

I hesitate to say Shoemaker is a lock for success. If anything, this post is less about finding The Next Big Thing than it is about finding a pitcher whose performance betrays his value. There are the Sonny Grays and Michael Wachas of the world, whose status as top prospects makes them costly prospective adds. Then there are the Matt Shoemakers, whose obscurity and relative misfortune keep them out of the fantasy limelight and, one would hope, on the clearance shelf, from which you can swipe them on the cheap.

# Bold prediction #3: Corey Kluber is this year’s Hisashi Iwakuma

Bold Prediction #2: Brad Miller will be a top-5 shortstop
Bold Prediction #1: Tyson Ross will be a top-45 starter (until he reaches his innings cap)

The Corey Kluber Society, fronted by Carson Cistulli of FanGraphs, is, frankly, hilarious. The format of the post is great, and if you haven’t read it before, you should.

But there’s a more important reason to read about (and “join”) the Society. Kluber is not only a legitimate fantasy starting pitcher but also a very good one. His breakout last year was muted by a couple of bad starts, but he is a perfect comp to a 2012 Hisashi Iwakuma on the verge.

I will list a variety of statistics in which Kluber excelled. Then I will let you know whom he outperformed in each category for all pitchers with at least 140 innings pitched (107 total).

K/9: 8.31 (26th overall)
Better than: Cole Hamels, Julio Teheran, Adam Wainwright, Mat Latos, Mike Minor

K/BB: 4.12 (11th overall)
Better than: Hamels, Jordan Zimmermann, Teheran, Anibal Sanchez, Homer Bailey

BAbip: .329 (6th worst)

Swinging strike rate: 10.4% (22nd overall)
Better than: Zack Greinke, Latos, Iwakuma, Scott Kazmir, Jose Fernandez

Contact rate: 76.8% (16th overall)
Better than: Kris Medlen, Jeff Samardzija, Bailey, Greinke, Fernandez

xFIP-: 78 (11th overall)
Better than: Max Scherzer, Fernandez, David Price, Iwakuma, Stephen Strasburg

Yowza. Those are some seriously stellar numbers. What’s the deal? Unfortunately for Kluber, he suffered a brutal outing or two, causing his WHIP and ERA to be inflated for most of the year and allowing him to fly under the radar. Chalk it up to bad luck, considering Kluber’s 6th-worst BAbip, better than only Joe Saunders, Dallas Keuchel and other names one wishes not to be associated with.

This sounds vaguely familiar. A high-control guy with a solid strikeout rate out of the bullpen? Does the name Hisashi Iwakuma ring a bell? It should, because he has already been mentioned several times in the last 300 words. Anyway, I rode the Iwakuma (and Bailey) wave through the end of 2012. Instead of going with my gut and drafting Iwakuma in the last round of my shallow draft in 2013, I opted for Marco Estrada: not a terrible pick, but clearly not the right gamble to take. It’s actually the moment upon which I reflected and realized that I should really just take my own advice. Because given Dan Haren’s peripherals, why would anyone have trusted him over Bailey last year? Ridiculous. (FYI, I will rip on Haren in a forthcoming bold prediction, just to be clear that I’m not ripping on him because he gave up a million home runs last year.)

But I digress. Iwakuma was good in 2012, but his 7.25 K/9, 2.35 K/BB and 1.28 WHIP were all rather pedestrian. But sometimes you need to rely on your eyes more than the numbers, and anyone who watched Iwakuma saw flashes of brilliance. 2013 may have been more than we anticipated, which brings me to my point:

Kluber already has the makings of a great pitcher, and his peripherals indicate that none of it was a fluke. My official bold prediction: Corey Kluber will be a top-20 starting pitcher.

# The role of luck in fantasy baseball

I apologize for being that guy who ruins that ooey gooey feeling you get when you think about the fantasy league you won last year. As much as you want to think you are a fantasy master (perhaps even a fantasy god), you should acknowledge that you probably benefited from a good deal of luck. Sure, for your sake, I will admit you made a great pick with Max Scherzer in the fifth round. But did you, in all your mastery, predict he would win 21 games?

Don’t say yes. You didn’t. And frankly, you would be crazy to say he’ll do it again.

I focus primarily on pitching in this blog, and let it be known that pitchers are not exempt from luck in the realm of fantasy baseball. If you’re playing in a standard rotisserie league, you probably have a wins category. In a points league, you likely award points for wins.

Wins. Arguably the most arbitrary statistic in baseball. Let’s not have that discussion, though, and instead simply accept the win as it is. The win has the most drastic uncontrollable effect on a fantasy pitcher’s value. (ERA and WHIP experience similar statistical fluctuations, but at least they aren’t arbitrary.)

I had an idea, but before I proceed, let me interject: if you’re drafting for wins, you’re doing it wrong. But, as I said, you can’t ignore wins.

But let’s say you did, and drafted strictly on talent, or “stuff” (which, here, factors in a pitcher’s durability). How would the top 30 pitchers change? Here’s my “stuff” list, which you can compare with the base projections:

Here are the five players with the biggest positive change and a breakdown of each:

1. Brandon Beachy, up 23 spots
His injury history has weakened his wins column projection. Consequently, the number of innings Beachy is expected to throw is significantly lower than a full season’s worth. But if he managed to stay healthy for the full year (say, 200 innings)? He’s a top-10 pick based on pure stuff. If you draft with the philosophy that you can always find a viable replacement on waivers, Beachy could be your big sleeper.
2. Marco Estrada, up 22 spots
Estrada’s diminished expected win total is more a function of his terrible team than ability. Estrada has underperformed the past two years, Ricky Nolasco style, but if he can pull it together, he’s a top-30 pitcher based on “stuff.” And hey, maybe he can luck into some extra wins. However, if he can’t pull it together (Ricky Nolasco style), he’ll be relegated to fringe starter.
3. Danny Salazar, up 9 spots
Salazar has immense potential. His injury history led the Indians to cap his per-game pitch count last year, and that has been factored into his projection. But if he’s a full-time, 200-inning starter? He’s a top-25 starter with top-15 upside. Again, this is in terms of “stuff”. But is Ivan Nova better than Felix Hernandez because he can magically win more games? Of course not. Among a slew of young studs, including Jose Fernandez, Shelby Miller, Michael Wacha and so on, Salazar is a diamond in the rough.
4. A.J. Burnett, up 8 spots
His projection is already plenty good. But you saw how many games he won in 2013. Anything can happen.
5. Corey Kluber, up 8 spots
Most people were probably scratching their heads when they saw Kluber’s name listed above. Frankly, I’m in love with him, and it’s because he’s a stud with a great K/BB ratio. I understand why someone may be inclined to dismiss it as an aberration, but his swinging strike and contact rates are truly excellent. Even if they regress, he should be a draft-day target.

Here are the three starting pitchers with the biggest negative change.

1. Anibal Sanchez, down 10 spots
He’s great, but he also plays for a great team. Call it Max Scherzer syndrome. He carries as big a risk as any other player to pitch great but only win five or six games, as do the next two players.
2. Hisashi Iwakuma, down 6 spots
3. Zack Greinke, down 4 spots
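The “up” and “down” figures in these two lists are just the difference between a pitcher’s rank in the base projections and his rank by “stuff.” A minimal sketch of that bookkeeping, using placeholder names rather than my actual projections (the function name is also made up for illustration):

```python
def rank_changes(base_order, stuff_order):
    """Spots gained (positive) or lost (negative) going from base rank to 'stuff' rank."""
    base_rank = {name: i for i, name in enumerate(base_order, start=1)}
    stuff_rank = {name: i for i, name in enumerate(stuff_order, start=1)}
    # Only compare pitchers who appear on both lists.
    return {name: base_rank[name] - stuff_rank[name]
            for name in base_order if name in stuff_rank}


# Placeholder names and orderings, not real projections.
base = ["Pitcher A", "Pitcher B", "Pitcher C", "Pitcher D"]
stuff = ["Pitcher C", "Pitcher A", "Pitcher D", "Pitcher B"]
changes = rank_changes(base, stuff)
for name, delta in sorted(changes.items(), key=lambda kv: kv[1], reverse=True):
    print(name, f"{delta:+d}")
```

Sorting by the change puts the biggest risers at the top and the biggest fallers at the bottom.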

Let me be clear that although I created a hypothetical scenario where wins didn’t exist, I don’t advocate for blindly drafting based on “stuff.” It’s important to acknowledge that certain players have a much better chance to win than others. Chris Sale of the Chicago White Sox could win 17 games just as easily as he could win seven. It’s about playing the odds, and unless a pitcher truly pitches terribly, don’t blame the so-called experts for your bad luck. They probably put their money where their mouths are, too, and are suffering along with you.

Here is a more comprehensive list of pitchers ranked by “stuff,” if that’s the way you sculpt your strategy:

1. Clayton Kershaw
3. Felix Hernandez
4. Max Scherzer
5. Cliff Lee
6. Yu Darvish
7. Chris Sale
8. Cole Hamels
9. Jose Fernandez
11. Stephen Strasburg
12. David Price
13. Justin Verlander
14. Alex Cobb
15. Homer Bailey
16. Mat Latos
17. Gerrit Cole
18. Michael Wacha
19. Anibal Sanchez
20. James Shields
21. Danny Salazar
23. A.J. Burnett
24. Corey Kluber
25. Brandon Beachy
26. Zack Greinke
27. Matt Cain
28. Sonny Gray
29. Hisashi Iwakuma
30. Gio Gonzalez
31. Doug Fister
32. Jordan Zimmermann
33. Alex Wood
34. Kris Medlen
35. Jeff Samardzija
36. Mike Minor
37. Jake Peavy
38. Kevin Gausman
39. Tyson Ross
40. Patrick Corbin
41. Lance Lynn
42. Francisco Liriano
43. Andrew Cashner
44. Ricky Nolasco
45. CC Sabathia
46. Hiroki Kuroda
47. Tim Lincecum
48. Tim Hudson
49. Jered Weaver
50. Shelby Miller
51. Clay Buchholz
52. Tony Cingrani
53. Matt Garza
54. John Lackey
55. Ubaldo Jimenez
56. Justin Masterson
57. Julio Teheran
58. R.A. Dickey
59. A.J. Griffin
60. Hyun-Jin Ryu
61. Dan Haren
62. Johnny Cueto
63. C.J. Wilson
64. Ian Kennedy
65. Chris Archer
66. Kyle Lohse
67. Scott Kazmir
68. Carlos Martinez
69. Jon Lester
70. Ervin Santana
71. Jose Quintana
72. Derek Holland
73. Garrett Richards
74. Dan Straily
75. Tyler Skaggs

# Early SP rankings for 2014

I wouldn’t say pitching is deep, but I’m surprised by the pitchers who didn’t make my top 60.

Note: I have deemed players highlighted in pink undervalued and worthy of re-rank. Do not be alarmed just yet by what you may perceive to be a low ranking.

# Late-season pitching fliers

I mentioned in an earlier post that a pitcher’s overall statistics matter less and less as we enter September. Milwaukee Brewers pitcher Marco Estrada, deterring owners with his 4.49 ERA, could be a legitimate top-45 starter for the rest of the season, fueled by his 7.25 K/BB ratio after the All-Star Break. (He should already be owned in most leagues anyway, given his 1.20 WHIP and 8.2 K/9, but that’s not the point.) There will always be free-agent alternatives, even to supposed aces such as the struggling Adam Wainwright of St. Louis, that can help your team down the home stretch.

(Note: I would never drop Wainwright in any context, nor would I bench him. But if I had someone who struggled all year, such as the New York Yankees’ CC Sabathia, I would consider benching him to stream someone such as Estrada. Again, I digress.)

Here are three sneaky late-season pitcher pick-ups, ranked from my favorite to least favorite.

Tyson Ross, SD (12.3% ESPN ownership)
Ross continues to chug along. His ESPN ownership has been cut in half because of two starts that, honestly, were not even that bad (12-1/3 IP, 5.83 ERA, 1.38 WHIP, 8.8 K/9). His strikeout rate on the season stands at 8.6 K/9 across 96-1/3 innings, and although his sub-3.00 ERA won’t last forever, his 1.19 WHIP is good enough to help anyone down the stretch, and he’ll likely pick up another two or three wins along the way.

Brett Oberholtzer, HOU (4.5% ESPN ownership)
If you need help with WHIP and not strikeouts, take a chance on Oberholtzer. He plays for a terrible team, yes, but he’s already racked up four wins in six starts (nine appearances). Oberholtzer was never overly dominant in the minors, and he never made any top-100 prospect lists, but he refuses to walk batters, a trait I like in any pitcher. Again, his 5.6 K/9 is not the most appealing thing in the world, but limiting baserunners will help win ballgames no matter who you play for.

Danny Duffy, KC (19.3% ESPN ownership)
Duffy is the antithesis of Oberholtzer (which isn’t a bad thing). Once a rated prospect, Duffy dominated the minors until he debuted in 2011, when he promptly started to lose all control, seeing his BB/9 ratio balloon from fewer than three walks per nine to almost four and a half. But he’s back from Tommy John surgery, and he has recovered most of the minor-league strikeout rate that made him so effective (currently 9.5 K/9). His walk rate is still a point of concern, but his strikeouts and wins will be valuable, especially if he can continue to limit the damage.

If you’re still waiting for Sabathia or R.A. Dickey to come around, stop. I’d rather take a gamble on someone who likely won’t do any worse than a flailing name-brand starter but has the upside to be a stalwart addition to your rotation.

Bonus coverage: The Philadelphia Phillies recently signed Cuban defector Miguel Alfredo Gonzalez to a three-year, \$12 million deal. He was originally slated to sign a six-year, \$50-million(ish) contract before concerns arose about the health of his elbow. There isn’t a ton of information about the guy, but consider Major League Baseball’s two most recent Cuban imports.

Oakland Athletics outfielder Yoenis Cespedes finished second in the American League Rookie of the Year voting last year and was signed on a four-year, \$36 million deal (\$9 million per year). Los Angeles Dodgers phenom Yasiel Puig is a National League ROY candidate, and he signed a six-year, \$36 million deal (\$6 million per year).

Although Gonzalez’s contract equates to only \$4 million per year, his original contract would have been at least \$8 million per year on average, in the ranks of Cespedes and Puig. Maybe it’s foolish to hop on the bandwagon based on this nugget of information alone, but considering a player’s salary is predicated upon expected performance value, I’m sold. I don’t know in which round I would draft him next year or how much I would pay for him in an auction draft, but I’d take a low-round flier on him and maybe gamble \$6 or so. And if he rears his head in the MLB come mid-September, I’ll take a look.