# Pitchers due for strikeout regression using PITCHf/x data

If FanGraphs were a home, or a hotel, or even a tent, I’d live there. I would swim in its oceans of data, lounge in its pools of metrics.

It houses a slew of PITCHf/x data: the numbers collected by the systems installed in all MLB ballparks that measure the frequency, velocity and movement of every pitch by every pitcher. It’s pretty astounding, but it’s also difficult for the untrained eye to make something of the numbers aside from tracking the declining velocities of CC Sabathia’s and Yovani Gallardo’s fastballs.

I used linear regression to see how a pitcher’s contact, swinging-strike and other measurable rates affect his strikeout percentage, and how that translates to strikeouts per nine innings (K/9). Ultimately, the model spits out a formula to generate an expected K/9 for a pitcher. I pulled data from FanGraphs covering all qualified pitchers from the last four years (2010 through 2013).

The idea is this: A pitcher who can miss more bats will strike out more batters. FanGraphs’ “Contact %” statistic captures this directly: the lower the contact rate, the better. Similarly, a pitcher who can generate more swinging strikes (“SwStr %”) is more likely to strike out batters.
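For readers who want to try something similar, here is a minimal sketch of an ordinary least squares fit, assuming NumPy is available. It is not the exact model used for this post: the function names `fit_k9_model` and `expected_k9` are made up for illustration, the sketch predicts K/9 directly rather than going through strikeout percentage first, and the numbers in it are synthetic placeholders, not FanGraphs data.

```python
import numpy as np


def fit_k9_model(rates, k9):
    """Fit k9 = b0 + b1*rate_1 + b2*rate_2 + ... by ordinary least squares.

    rates: (n_pitchers, n_features) array, e.g. columns for Contact % and SwStr %
    k9:    (n_pitchers,) array of actual K/9 values
    Returns the coefficient vector with the intercept first.
    """
    rates = np.asarray(rates, dtype=float)
    k9 = np.asarray(k9, dtype=float)
    # Prepend a column of ones so the model includes an intercept.
    design = np.column_stack([np.ones(len(rates)), rates])
    coefs, *_ = np.linalg.lstsq(design, k9, rcond=None)
    return coefs


def expected_k9(coefs, rates):
    """Apply fitted coefficients to one or more rows of rates."""
    rates = np.asarray(rates, dtype=float)
    return coefs[0] + rates @ coefs[1:]


# Synthetic stand-in data (NOT real pitchers): rates expressed as fractions.
rng = np.random.default_rng(0)
contact = rng.uniform(0.70, 0.86, size=200)
swstr = rng.uniform(0.05, 0.14, size=200)
# Invented relationship, used only to generate plausible-looking K/9 values.
k9 = 20.0 - 20.0 * contact + 35.0 * swstr + rng.normal(0.0, 0.1, size=200)

coefs = fit_k9_model(np.column_stack([contact, swstr]), k9)
print("intercept, contact, swstr coefficients:", coefs)
print("expected K/9 at 78% contact, 10% SwStr:", expected_k9(coefs, [0.78, 0.10]))
```

With real data, each row would be one pitcher-season and the extra "other measurable rates" would simply be more columns in `rates`.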

Using this theory coupled with the aforementioned data, I “corrected” the K/9 rates of all 2013 pitchers who notched at least 100 innings. Instead of detailing the full results, here are the largest differentials between expected and actual K/9 rates. (I will list only pitchers I deem fantasy relevant.)

Largest positive differential (Name — expected K/9 – actual K/9 = difference):

1. Martin Perez — 7.77 – 6.08 = +1.69
2. Jarrod Parker — 7.74 – 6.12 = +1.62
3. Dan Straily — 8.63 – 7.33 = +1.30
4. Jered Weaver — 8.09 – 6.82 = +1.27
5. Hiroki Kuroda — 7.93 – 6.71 = +1.22
6. Kris Medlen — 8.38 – 7.17 = +1.21
7. Francisco Liriano — 10.31 – 9.11 = +1.20
8. Ervin Santana — 8.06 – 6.87 = +1.19
9. Ricky Nolasco — 8.47 – 7.45 = +1.02
10. Tim Hudson — 7.42 – 6.51 = +0.91

Largest negative differential:

1. Tony Cingrani — 8.15 – 10.32 = -2.17
2. Ubaldo Jimenez — 7.68 – 9.56 = -1.88
3. Cliff Lee — 7.11 – 8.97 = -1.86
4. Jose Fernandez — 8.15 – 9.75 = -1.60
5. Shelby Miller — 7.20 – 8.78 = -1.58
6. Scott Kazmir — 7.71 – 9.23 = -1.52
7. Yu Darvish — 10.41 – 11.89 = -1.48
8. Lance Lynn — 7.58 – 8.84 = -1.26
9. Justin Masterson — 7.84 – 9.09 = -1.25
10. Chris Tillman — 6.60 – 7.81 = -1.21
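The arithmetic behind both lists is just expected K/9 minus actual K/9, sorted. A short sketch using a handful of the figures listed above (the function name `differentials` is my own, chosen for illustration):

```python
# (name, expected K/9, actual K/9), copied from the lists above
pitchers = [
    ("Martin Perez", 7.77, 6.08),
    ("Jarrod Parker", 7.74, 6.12),
    ("Tony Cingrani", 8.15, 10.32),
    ("Ubaldo Jimenez", 7.68, 9.56),
    ("Yu Darvish", 10.41, 11.89),
]


def differentials(rows):
    """Return (name, expected - actual) pairs, most positive first."""
    diffs = [
        (name, round(expected - actual, 2))
        for name, expected, actual in rows
    ]
    return sorted(diffs, key=lambda pair: pair[1], reverse=True)


for name, diff in differentials(pitchers):
    # A positive value means the pitcher struck out fewer than the model expected.
    print(f"{name}: {diff:+.2f}")
```

A positive differential flags a pitcher whose strikeouts lagged his whiff rates; a negative one flags a pitcher who outran them.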

There’s a lot to digest here, so I’ll break it down. It appears Perez was the unluckiest pitcher last year, of the ones who qualified for the study, notching almost 1.7 fewer strikeouts per nine innings than he would be expected to, given the rate of whiffs he induced. Conversely, rookie sensation Cingrani notched almost 2.2 more strikeouts per nine innings than expected.

There is a caveat. I was not able to account for facets of pitching such as a pitcher’s ability to hide the ball well, or his tendency to draw called strikes. With that said, a majority of the so-called lucky ones are pitchers who, in 2013, experienced a breakout (Cingrani, Fernandez, Miller, Darvish, Masterson, Tillman) or a renaissance (Jimenez, Kazmir, Masterson; whoa, all Cleveland pitchers). Is it possible these pitchers can all repeat their performances, especially the ones who have disappointed us for years? Perhaps not.

(Update, Jan. 24: Cliff Lee’s mark of -1.86 is, amazingly, not unusual for him. Over the last four years, the average difference between his expected and actual K/9 rates is … drum roll … -1.88. Insane!)

Darvish and Liriano were in a league of their own in terms of inducing swings and misses, notching almost 30 percent each. (Anibal Sanchez was third-best with 27 percent. The average is about 21 percent.) However, Darvish recorded 2.78 more K/9 than Liriano. Is there any rhyme or reason to that? Darvish is, without much argument, the better pitcher — but is he that much better? I don’t think so. Darvish was expected to notch 10.41 K/9 given his contact rate. Any idea what his 2012 K/9 rate was? Incredibly: 10.40 K/9.

More big names produced equally interesting results. King Felix Hernandez recorded a career-best 9.51 K/9, but he was expected to produce something closer to 8.57 K/9. His rate the previous three years? 8.52 K/9.

Dan Haren didn’t produce much in the way of ERA in 2013, but he did see a much-needed spike in his strikeout rate, jumping above 8 K/9 for the first time since 2010. His expected 7.07 K/9 says otherwise, though, and it fits perfectly with how his K/9 rate was trending: 7.25 K/9 in 2011, 7.23 K/9 in 2012.

I think my models tend to exaggerate the more extreme results (most of which are noted in the lists above) because they could not account for intangibles in a player’s natural talent. However, they could prove to be excellent indicators of who’s due for regression.

Only time will tell. Maybe Jose Fernandez isn’t the elite pitcher we already think he is — not yet, at least.

————

Notes: The data roughly follows a normal distribution, with 98 of the 145 observations (67.6 percent) falling within one standard deviation (1.09 K/9) of the mean value (7.19 K/9), and 140 of 145 (96.6 percent) falling within two standard deviations. For comparison, a true normal distribution would put about 68 percent and 95 percent of observations in those ranges. The median value is 7.27 K/9, slightly above the mean, indicating the distribution is very slightly skewed left.
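The check described in the notes can be sketched with the standard library alone. The helper `share_within` is a hypothetical name, and I am assuming a population standard deviation here; the notes do not say which kind was used.

```python
from statistics import mean, pstdev


def share_within(values, k):
    """Fraction of values lying within k standard deviations of the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD; an assumption, see above
    inside = sum(1 for v in values if abs(v - mu) <= k * sigma)
    return inside / len(values)


# Shares implied by the counts reported in the notes.
print(round(98 / 145 * 100, 1))   # 67.6
print(round(140 / 145 * 100, 1))  # 96.6
```

Given the full list of K/9 values, `share_within(values, 1)` and `share_within(values, 2)` could be compared against the roughly 68 and 95 percent a normal distribution would produce.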