Predicting pitchers’ strikeouts using xK%

Expected strikeout rate, or what I will henceforth refer to as “xK%,” is exactly what it sounds like. I want to see if a pitcher’s strikeout rate actually reflects how he has pitched in terms of how often he’s in the zone, how often he gets batters to swing and miss, and so on. Ideally, it will help explain random fluctuations in a pitcher’s strikeout rate, because even strikeouts have some luck built into them.

An xK% metric is not a revolutionary idea. Mike Podhorzer over at FanGraphs created one last year, but he tailored it to hitters. Still, it’s nothing as wild and crazy as WAR or SIERA or any other wacky acronym. (A wackronym, if you will.)

Courtesy of Baseball Reference, I constructed a set of pitching data spanning 2010 through 2014. I focused primarily on what I thought would correlate highly with strikeout rates: looking strikes, swinging strikes and foul-ball strikes, all as a percentage of total strikes thrown. I didn’t want the model specification to be too close to a definition, so it’s beneficial that these rates are on a per-strike, rather than per-pitch, basis.
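If you are starting from raw pitch counts rather than Baseball Reference’s ready-made percentages, the three inputs are simple ratios. Here is a minimal Python sketch; the function name and the sample counts are hypothetical, chosen only for illustration. It assumes all counts come from the same pitcher-season and that total strikes also include balls put in play, as in standard pitch counts, so the three rates need not add up to 1.

```python
def per_strike_rates(looking, swinging, foul, total_strikes):
    """Return (lookstr, swingstr, foulstr) as fractions of total strikes.

    total_strikes also counts balls put in play, so the three rates
    usually sum to less than 1.
    """
    if total_strikes <= 0:
        raise ValueError("total_strikes must be positive")
    if looking + swinging + foul > total_strikes:
        raise ValueError("looking + swinging + foul cannot exceed total_strikes")
    return looking / total_strikes, swinging / total_strikes, foul / total_strikes


# Hypothetical counts for one pitcher-season, not real data.
lookstr, swingstr, foulstr = per_strike_rates(
    looking=540, swinging=300, foul=540, total_strikes=2000
)
print(f"lookstr={lookstr:.3f} swingstr={swingstr:.3f} foulstr={foulstr:.3f}")
```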

The graph plots actual strikeout rates against expected strikeout rates, with the line of best fit running through the points. I ran my regression using the specification above and produced the following equation:

xK% = -.6284293 + 1.195018*lookstr + 1.517088*swingstr + .9505775*foulstr
R-squared = .9026
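Plugging numbers into the equation is simple arithmetic. The sketch below is a minimal Python version, assuming (as the negative intercept implies) that the three inputs are decimals of total strikes, e.g. .27 rather than 27, and that the output is likewise a decimal. The function name and the sample pitcher-season are hypothetical.

```python
# Coefficients copied from the fitted equation above.
INTERCEPT = -0.6284293
COEF_LOOK = 1.195018    # looking strikes / total strikes
COEF_SWING = 1.517088   # swinging strikes / total strikes
COEF_FOUL = 0.9505775   # foul-ball strikes / total strikes


def expected_k_rate(lookstr, swingstr, foulstr):
    """Return xK% as a decimal (0.20 means 20 percent).

    Assumes each input is a decimal share of total strikes, e.g. 0.27, not 27.
    """
    for name, value in (("lookstr", lookstr), ("swingstr", swingstr), ("foulstr", foulstr)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return INTERCEPT + COEF_LOOK * lookstr + COEF_SWING * swingstr + COEF_FOUL * foulstr


# Hypothetical pitcher-season, not taken from the article's data set.
xk = expected_k_rate(lookstr=0.27, swingstr=0.15, foulstr=0.27)
print(f"xK% = {xk:.1%}")  # prints "xK% = 17.8%"
```

The range check is there because entering whole-number percentages would silently produce an absurd xK%.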

The R-squared term can, for ease of understanding, be interpreted as how well the model fits the data, on a scale from 0 to 1. An R-squared of .9026, then, represents roughly a 90-percent fit. In other words, these three variables explain about 90 percent of the variation in strikeout rates from one pitcher-season to the next. (The remaining 10 percent is, for now, a mystery!)
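For readers who want to check a fit themselves, R-squared is one minus the ratio of the squared errors a model leaves behind to the total squared spread of the actual values around their mean. A minimal Python sketch follows; the function name and the toy numbers are made up for illustration. For a least-squares regression with an intercept, scored on the same data it was fit to, this matches the R-squared a stats package reports.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: the share of the variation in `actual`
    (around its mean) that `predicted` accounts for."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences of at least two values")
    mean_actual = sum(actual) / len(actual)
    # Total variation: squared distance of each actual value from the mean.
    ss_total = sum((y - mean_actual) ** 2 for y in actual)
    if ss_total == 0:
        raise ValueError("actual values have no variation to explain")
    # Unexplained variation: squared distance of each actual value from its prediction.
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return 1.0 - ss_residual / ss_total


# Toy K% and xK% values for illustration only.
k_actual = [0.20, 0.25, 0.15, 0.30]
k_expected = [0.21, 0.24, 0.16, 0.28]
print(round(r_squared(k_actual, k_expected), 3))  # prints 0.944
```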

To use this equation yourself, plug a pitcher’s looking-strike, swinging-strike and foul-ball-strike percentages (each as a share of his total strikes) into the appropriate variables. Fortunately, I have already done the legwork: I applied the equation to the same data I used to fit it, namely all individual qualified seasons by starting pitchers from 2010 through 2014.

The results have interesting implications. First, one can see how lucky or unlucky a pitcher was in a particular season. Second, and perhaps most importantly, one can easily identify which pitchers habitually over- and under-perform relative to their xK%. Lastly, one can see how each pitcher is trending over time. Every pitcher is different; although the formula will fit most ordinary pitchers, it goes without saying that the aces of your fantasy squad are far from ordinary, and they should be treated on an individual basis.

(Keep in mind that many of these players have only one or two years’ worth of data, as indicated by the “# Years” column, so their average difference between K% and xK% is an unreliable measure of their true skill.)
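The season-by-season difference and the per-pitcher average, with its “# Years” count, are easy to reproduce once K% and xK% sit side by side. This is a minimal sketch under that assumption; the class, function, and pitcher names are hypothetical, and the values are illustrative rather than taken from the article’s data set. Keeping the season count next to the average makes it easy to discount one- and two-year pitchers.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class PitcherSeason:
    name: str
    year: int
    k_rate: float   # actual K%, as a decimal
    xk_rate: float  # xK% from the regression, as a decimal

    @property
    def diff(self) -> float:
        """K% minus xK%: positive means more strikeouts than expected."""
        return self.k_rate - self.xk_rate


def summarize(seasons):
    """Return {name: (number of seasons, average diff)}, biggest over-performers first."""
    by_pitcher = defaultdict(list)
    for season in seasons:
        by_pitcher[season.name].append(season.diff)
    summary = {name: (len(diffs), mean(diffs)) for name, diffs in by_pitcher.items()}
    # Dicts keep insertion order, so rebuild the dict sorted by average diff.
    return dict(sorted(summary.items(), key=lambda item: item[1][1], reverse=True))


# Hypothetical seasons for illustration only.
seasons = [
    PitcherSeason("Pitcher A", 2013, 0.25, 0.22),
    PitcherSeason("Pitcher A", 2014, 0.27, 0.23),
    PitcherSeason("Pitcher B", 2014, 0.18, 0.20),
]
for name, (years, avg_diff) in summarize(seasons).items():
    print(f"{name}: {years} year(s), average diff {avg_diff:+.1%}")
```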

It is immediately evident: the game’s best pitchers outperform their xK% by the largest margins. Cliff Lee, Stephen Strasburg, Clayton Kershaw, Felix Hernandez and Adam Wainwright are all top-10 (or at least top-15) fantasy starters. But let’s look at their numbers over the years, along with a few others at the top of the list.

Kershaw and King Felix have not only been consistent but also look like they’re getting better with age. Wainwright’s difference between 2013 and 2014 is a bit of a concern; he’s getting older, and this could be a sign that the decline has officially begun.

Darvish’s line is interesting, too: you may or may not remember that he had a massive spike in strikeouts in 2013 compared to his already-elite strikeout rate the prior year. As you can see, it was totally legit, at least according to xK%. But for some reason, even xK% can fluctuate wildly from year to year. I see it anecdotally in the data: Anibal Sanchez’s huge 6.7-percentage-point spike in xK% from 2012 to 2013 was followed by a 5.5-point drop from 2013 to 2014. Conversely, David Price’s 5-point decrease in xK% from 2012 to 2013 was followed by an almost exactly equal 5-point increase from 2013 to 2014. So the phenomenon seems to work both ways. Thus, perhaps it shouldn’t have come as a surprise when Darvish couldn’t repeat his 2013 success. To the baseball world’s collective dismay, we simply didn’t have enough data yet to determine which Yu was the true Yu. I plan to do some research to see how often these severe spikes in xK% are mere aberrations versus how often they are sustained over time, indicating a legitimate skills improvement.
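To hunt for the kind of xK% swings described above, one could compare each qualified season with the one right before it. The sketch below is hypothetical: the function name, the sample pitcher, and the 5-point threshold (an arbitrary stand-in for a “severe” spike, not a cutoff from any research here) are all assumptions. It skips gaps between seasons so that, say, 2012 is never compared directly with 2014.

```python
def xk_swings(xk_by_year, threshold=0.05):
    """Return (year, change) for each season whose xK% moved at least
    `threshold` (0.05 == 5 percentage points) from the season before."""
    years = sorted(xk_by_year)
    swings = []
    for prev, curr in zip(years, years[1:]):
        if curr != prev + 1:
            continue  # skip gaps, e.g. a non-qualifying season in between
        change = xk_by_year[curr] - xk_by_year[prev]
        # Small tolerance so float rounding cannot hide an exact 5-point move.
        if abs(change) >= threshold - 1e-9:
            swings.append((curr, change))
    return swings


# Hypothetical pitcher: a spike followed by an almost equal drop.
history = {2012: 0.18, 2013: 0.25, 2014: 0.19}
for year, change in xk_swings(history):
    print(f"{year}: {change * 100:+.1f} points")
```

Consecutive flagged seasons with opposite signs are the spike-then-drop (or drop-then-spike) pattern; a spike with no reversal behind it is the kind of change that might reflect a real skills improvement.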

I have also done my best to compile a list of players with only one or two years’ worth of data who saw sizable spikes and drops in their K% minus xK% (“diff%”). The idea is to find players for whom we can’t really tell how much better (or worse) their actual K% is compared to their xK% because of conflicting data points. For example, will Corey Kluber be a guy who massively outperforms his xK% as he did in 2014, or does he only slightly outperform as he did in 2013? I present the list not to provide an answer but to posit: Which version of each of these players is more truthful? I guess we will know sometime in October.
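Building a list like this amounts to comparing each player’s two diff% values. A minimal Python sketch follows; the function name, the sample values, and the 3-point cutoff for a “sizable” change are hypothetical assumptions, not the criteria used for this article.

```python
def conflicting_diffs(diff_pairs, min_gap=0.03):
    """diff_pairs maps a name to (2013 diff%, 2014 diff%), each K% minus xK%
    as a decimal. Return the names whose two seasons disagree by at least
    `min_gap`, largest disagreement first."""
    gaps = {name: abs(d2014 - d2013) for name, (d2013, d2014) in diff_pairs.items()}
    flagged = [name for name, gap in gaps.items() if gap >= min_gap]
    return sorted(flagged, key=gaps.get, reverse=True)


# Hypothetical values for illustration only.
pairs = {
    "Pitcher A": (0.01, 0.06),
    "Pitcher B": (0.02, 0.025),
    "Pitcher C": (-0.04, 0.00),
}
print(conflicting_diffs(pairs))  # prints ['Pitcher A', 'Pitcher C']
```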

Name: [2013 diff%, 2014 diff%]

And here are some fantasy-relevant guys with data from only 2014:

A smorgasbord of fantasy baseball advice

Need a Streamer has been slow lately, to say the least. I’ve missed discussing a lot of player news and opportunities to provide good streaming picks. So I’m going to try something new, and maybe it’ll stick. It should be fairly self-explanatory. I hope it holds readers over until the end of this week, which is probably the busiest week for me in a long time.

Player to add that isn’t Gregory Polanco: A.J. Pollock, ARI OF
He’s on the DL, so you’ve got time to pull the trigger. His batting average isn’t for real, but the 6 homers and 8 steals are nice, and he will more than likely join the small number of players who reach double digits in each category in a given year. I would expect a batting average closer to .265, but if you can punt average for counting stats in a deeper league, I would go for it.

Hitter to drop: Jay Bruce, CIN OF
Honorable mention goes to Brandon Phillips, Bruce’s teammate, but it is more fitting to pair the suggested outfield pickup with an outfielder he can actually replace. Bruce is striking out about 5 percentage points more often than last year and almost 8 percentage points more often than his career rate. Meanwhile, he is hitting more ground balls than fly balls, whereas about two-thirds of Bruce’s batted balls over his career have been put in the air. The sample size is quite large now, and I think there may be something wrong with the slugger. His ratio of home runs to fly balls (HR/FB) is a little deflated, but even if it returns to his career average, I still wouldn’t expect him to hit much more than 20 home runs, and that’s a serious problem for a guy whose value lies solely in his power. Bruce is shaping up to be the next Curtis Granderson, and I have legitimate concerns about his current and future value.

Pitcher to add: Marcus Stroman, TOR
Stroman could quickly rise to the top as Toronto’s ace come 2015 if he lives up to his minor league numbers. So far, he has. I liked Stroman a lot as a prospect, as he averaged 10.6 strikeouts and only 2.4 walks per nine innings. He began the year in the bullpen and suffered a couple of brutal appearances in a row, so even after two recent (and excellent) starts, his numbers sit at a still-shaky 5.40 ERA and 1.53 WHIP. But I think he’s a starter by trade, and his 13 strikeouts and two walks over 12 innings as a starter support that claim. Your window to claim Stroman may stay open for a while, especially if other owners simply look at his misleading ERA and WHIP or, on ESPN, his average points, which stand at an underwhelming 3.3 per appearance. However, if he keeps flashing this kind of quality, you’ll start to run out of time.

Wednesday streamer, other than Stroman: Rubby De La Rosa, BOS
I’ll be honest: I’m not thrilled about him, but everyone has caught on to Tyson Ross (although he’s still only 73-percent owned), so tomorrow’s options are slim. De La Rosa brings strikeouts but also walks; still, he carries a 13-to-2 K/BB ratio into this road start, so perhaps he can continue to keep his command issues under control.

Prospect(s) to watch: Joc Pederson, LAD OF, and Mookie Betts, BOS 2B
Pederson and Betts will likely not be up any time soon, as they’re blocked by some pretty large figures at their respective positions. But given the hype surrounding a couple of 2014’s call-ups in George Springer and, most recently, Gregory Polanco, it’s good to know who the next impact players will be. Pederson is batting .327/.437/.615 with 16 home runs and 14 steals. Are you serious? I think he’s a bit too far off the pace for a 40/40 season, but 30/30 is probably within reach at this point. It’s unfortunate the Dodgers are letting him rot in the minors beneath a pile of unmovable cash in their impacted outfield. Betts recently moved up to Triple-A Pawtucket; prior to this move, he stole 22 bases in 285 plate appearances while batting .346 with almost twice as many walks as strikeouts. He’s going to be really good, with astounding plate discipline, decent speed and a little bit of pop, too. If you hear Pederson’s and Betts’s names, or the names of the players blocking them (Yasiel Puig, Carl Crawford, Matt Kemp, Andre Ethier, Dustin Pedroia…), in next month’s trade talks, get ready to prospectively add, add, add.