Tagged: R.A. Dickey

Predicting pitchers’ walks using xBB%

The other day, I discussed predicting pitchers’ strikeout rates using xK%. I will conduct the same exercise today in regard to predicting walks. Using my best intuition, I want to see how well a pitcher’s walk rate (BB%) actually correlates with what his walk rate should be (expected BB%, henceforth “xBB%”). Similarly to xK%, I used my intuition to best identify reliable indicators of a pitcher’s true walk rate using readily available data.

An xBB% metric, like xK%, would indicate not only whether a pitcher perennially over-performs (or under-performs) his walk rate but also whether he happened to do so in a given year. This article will conclude by looking at how the difference in actual and expected walk rates (BB% – xBB%) varied between 2014 and career numbers, lending some insight into the (un)luckiness of each pitcher.

Courtesy of FanGraphs, I constructed another set of pitching data spanning 2010 through 2014. This time, I focused primarily on what I thought would correlate with walk rate: inability to pitch in the zone and inability to incur swings on pitches out of the zone. I also throw in first-pitch strike rate: I predict that counts that start with a ball are more likely to end in a walk than those that start with a strike. Because FanGraphs’ data measures ability rather than inability — “Zone%” measures how often a pitcher hits the zone; “O-Swing%” measures how often batters swing at pitches out of the zone; “F-Strike%” measures the rate of first-pitch strikes — each variable should have a negative coefficient attached to it.

I specify a handful of variations before deciding on a final version. Instead of using split-season data (that is, each pitcher’s individual seasons from 2010 to 2014) for qualified pitchers, I use aggregated statistics because the results better fit the data by a sizable margin. This surprised me because there were about half as many observations, but it’s also not surprising because each observation is, itself, a larger sample size than before.

At one point, I tried creating my own variable: looks (non-swings) at pitches out of the zone. I found the percentage of pitches out of the zone (1 – Zone%) and multiplied it by how often batters declined to swing at them (1 – O-Swing%). This version of the model produced a nice fit, but it was slightly worse than leaving the variables separated. Also, I ran separate but otherwise identical regressions for PITCHf/x data and FanGraphs’ own data. The PITCHf/x data appeared to be slightly more accurate, so I proceeded using them.
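The combined "looks" variable described above can be sketched in a few lines of Python. This is only an illustration: the function name is my own, the rates are assumed to be fractions between 0 and 1, and the inputs in the example are made up rather than taken from any pitcher.

```python
def out_of_zone_look_rate(zone_pct: float, o_swing_pct: float) -> float:
    """Share of all pitches that miss the zone AND are not swung at.

    zone_pct    -- Zone%, as a fraction (0.50 means 50 percent)
    o_swing_pct -- O-Swing%, as a fraction
    """
    return (1.0 - zone_pct) * (1.0 - o_swing_pct)


# Hypothetical inputs: half of all pitches in the zone, 30 percent chase rate.
look_rate = out_of_zone_look_rate(zone_pct=0.50, o_swing_pct=0.30)
print(f"{look_rate:.1%}")  # 35.0%
```

Because the product folds two rates into one number, the regression can no longer weight them separately, which may be why it fit slightly worse.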

The graph plots actual walk rates versus expected walk rates. The regression yielded the following equation:

xBB% = .3766176 – .2103522*O-Swing%(pfx) – .1105723*Zone%(pfx) – .3062822*F-Strike%
R-squared = .6433
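
For anyone who would rather not do the arithmetic by hand, here is a minimal Python sketch of the equation above. The coefficients are copied from the regression; the function name is my own, I am assuming every rate is entered as a fraction (0.60 for 60 percent), and the example inputs are hypothetical.

```python
def expected_walk_rate(o_swing: float, zone: float, f_strike: float) -> float:
    """xBB% from PITCHf/x O-Swing%, PITCHf/x Zone% and F-Strike% (all fractions)."""
    return (0.3766176
            - 0.2103522 * o_swing
            - 0.1105723 * zone
            - 0.3062822 * f_strike)


# Hypothetical pitcher: 30% O-Swing, 50% Zone, 60% first-pitch strikes.
xbb = expected_walk_rate(o_swing=0.30, zone=0.50, f_strike=0.60)
print(f"xBB% = {xbb:.1%}")  # xBB% = 7.4%
```

Subtracting this figure from a pitcher's actual BB% over the same span gives the diff% discussed later in the post.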

Again, R-squared indicates how well the model fits the data. An R-squared of .64 is not as exciting as the R-squared I got for xK%; it means the model explains about 64 percent of the variation in walk rate, and the other 36 percent is explained by things I haven’t included in the model. Certainly, more variables could help explain xBB%. I am already considering combining FanGraphs’ PITCHf/x data with some of Baseball Reference’s data, which does a great job of keeping track of the number of 3-0 counts, four-pitch walks and so on.

And again, for the reader to use the equation above to his or her benefit, one would plug in the appropriate values for a player in a given season or time frame and determine his xBB%. Then one could compare the xBB% to single-season or career BB% to derive some kind of meaningful results. And (one more) again, I have already taken the liberty of doing this for you.

Instead of including every pitcher from the sample, I narrowed it down to only pitchers with at least three years’ worth of data in order to yield some kind of statistically significant results. (Note: a three-year sample is a small sample, but three individual samples of 160+ innings are large enough to produce some arguably robust results.) “Avg BB% – xBB%” (or “diff%”) takes the average of a pitcher’s difference between actual and expected walk rates from 2010 to 2014. It indicates how well (or poorly) he performs compared to his xBB%: the lower a number, the better. This time, I included “t-score”, which measures how reliable diff% is. The key value here is 1.96; anything greater than that (in absolute value) means his diff% is reliable. (1.00 to 1.96 is somewhat reliable; anything less than 1.00 is very unreliable.) Again, this is slightly problematic because there are five observations (years) at most, but it’s the best and simplest usable indicator of reliability.
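The post does not spell out how the t-score is computed, so the Python sketch below assumes the textbook one-sample t-statistic: the average diff% divided by its standard error. The function names and the yearly diff values are hypothetical; only the 1.96 and 1.00 cutoffs come from the paragraph above.

```python
from math import sqrt
from statistics import mean, stdev


def diff_summary(yearly_diffs):
    """Return (average diff, sample std dev, t-score) for yearly BB% - xBB% values."""
    n = len(yearly_diffs)
    avg = mean(yearly_diffs)
    sd = stdev(yearly_diffs)       # sample standard deviation (n - 1 denominator)
    t = avg / (sd / sqrt(n))       # assumed formula: one-sample t-statistic vs. zero
    return avg, sd, t


def reliability(t):
    """Label a t-score using the cutoffs quoted in the post."""
    size = abs(t)
    if size > 1.96:
        return "reliable"
    if size >= 1.00:
        return "somewhat reliable"
    return "very unreliable"


# Hypothetical pitcher who beats his xBB% by about one point every year.
hypothetical_diffs = [-0.010, -0.012, -0.008, -0.011, -0.009]
avg, sd, t = diff_summary(hypothetical_diffs)
print(f"diff% = {avg:+.2%}, StdDev = {sd:.2%}, t = {t:.2f} ({reliability(t)})")
```

A pitcher whose yearly diffs are identical has a standard deviation of zero, so this sketch would need a guard against dividing by zero before being used on real data.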

Thus, Mark Buehrle, Mike Leake, Hiroki Kuroda, Doug Fister, Tim Hudson, Zack Greinke, Dan Haren and Bartolo Colon can all reasonably be expected to consistently out-perform their xBB% in any given year. Likewise, Aaron Harang, Colby Lewis, Ervin Santana and Mat Latos can all reasonably be expected to under-perform their xBB%. For everyone else, their diff% values don’t mean a whole lot. For example, R.A. Dickey’s diff% of +0.03% doesn’t mean he’s more likely than someone else to pitch exactly as well as his xBB% predicts him to; in fact, his standard deviation (StdDev) of 0.93% indicates he’s less likely than just about anyone to do so. (What it really means is there is only a two-thirds chance his diff% will be between -0.90% and +0.96%.)
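The two-thirds figure for Dickey is the usual one-standard-deviation rule, which holds if yearly diff% values are roughly normally distributed (my assumption; the post does not test it). A short Python sketch, using only the two Dickey numbers quoted above and a function name of my own:

```python
from statistics import NormalDist


def one_sigma_band(mean_diff, std_dev):
    """Range covering one standard deviation either side of the average diff%."""
    return mean_diff - std_dev, mean_diff + std_dev


# Dickey's figures from the paragraph above, in percentage points.
low, high = one_sigma_band(mean_diff=0.03, std_dev=0.93)
print(f"{low:+.2f}% to {high:+.2f}%")  # -0.90% to +0.96%

# Share of a normal distribution within one standard deviation of its mean.
standard_normal = NormalDist()
coverage = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)
print(f"{coverage:.1%}")  # 68.3%
```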

As with xK%, I compiled a list of fantasy-relevant starters with only two years’ worth of data who saw sizable fluctuations between 2013 and 2014. Their data is impossible (nay, ill-advised) to interpret at this point, but it is worth monitoring.

Name: [2013 diff%, 2014 diff%]

Miller is an interesting case: he was atrociously bad about gifting free passes in 2014, but his diff% was only marginally worse than it was in 2013. It’s possible that he was a smart buy-low for the Braves, but it’s also possible that Miller not only perennially under-performs his xBB% but is also trending in the wrong direction.

Here are fantasy-relevant players with a) only 2014 data, and b) outlier diff% values:

I’m not gonna lie, I have no idea why Cobb, Corey Kluber and others show up as having only one year of data when they have two in the xK% dataset. I only noticed this now. Their exclusion doesn’t change the model’s fit whatsoever because it did not rely on split-season data; I’m just curious why they didn’t show up in FanGraphs’ leaderboards. Oh well.

Implications: Richards and Roark perhaps over-performed. Meanwhile, it’s possible that Odorizzi, Ross and Ventura will improve (or regress) compared to last year. I’m excited about all of that. Richards will probably be pretty over-valued on draft day.

2014’s SP projections: the best and the worst

Maybe this is absurd, but I’ve never honestly checked the accuracy of my projections. It’s partly because I have placed a lot of trust in a computer that runs regressions with reliable data I have supplied, but it’s mostly because I originally started doing this for my own sake. I used to rely on ESPN’s projections, but the former journalist in me started to realize: it has a customer to please, and the customer may not be pleased, for example, if he sees Corey Kluber ranked in the Top 60 starting pitchers for 2014. (At this point, I am giving ESPN an out, given that everyone at FanGraphs and elsewhere knew the kind of upside he possessed.) Kluber is not the issue, however; the issue is that although ESPN (probably) wants to do its best, it also does not want to alienate its readers who, given its enormous audience, are more likely to be less statistically-inclined than FanGraphs’ faction of die-hards.

In sum: I started doing this because I no longer trusted projections put forth by popular media outlets.

So I didn’t really care how every single projection turned out. I wanted to find the players I thought were undervalued. For three years, it has largely worked in my rotisserie league. (Honestly, I am a complete mess when I enter a snake draft.)

Anyway. All of that is no longer. I quickly sampled 2014’s qualified pitchers — 88 in all — to investigate who panned out and who didn’t. I will ignore wins because they are pretty difficult to project with accuracy; I’m more concerned about ERA, WHIP and K’s.

Here is a nifty table that quickly summarizes what would have been tedious to transcribe. You will see a lot of repeat offenders, which should come as no surprise. At least there is some semblance of a pattern for the misses: I underestimated unknown quantities (and aces, who all decided to set the world ablaze in 2014) and overestimated guys in their decline. There isn’t much of a pattern to the guys I got right. Just thank mathematics and intuition for that.

Here is a shortlist of my most accurate projections from last year, measured by me using the eye test:

Name: 2014 projected stats (actual stats)

Nathan Eovaldi: 5 W, 3.82 ERA, 1.32 WHIP, 6.4 K/9 (6 W, 4.37 ERA, 1.33 WHIP, 6.4 K/9)
R.A. Dickey: 11 W, 3.84 ERA, 1.24 WHIP, 7.2 K/9 (14 W, 3.71 ERA, 1.23 WHIP, 7.2 K/9)
Alex Cobb: 12 W, 3.49 ERA, 1.17 WHIP, 8.1 K/9 (10 W, 2.87 ERA, 1.14 WHIP, 8.1 K/9)
Hiroki Kuroda: 11 W, 3.60 ERA, 1.18 WHIP, 6.7 K/9 (11 W, 3.71 ERA, 1.14 WHIP, 6.6 K/9)
John Lackey: 10 W, 3.67 ERA, 1.25 WHIP, 7.5 K/9 (14 W, 3.82 ERA, 1.28 WHIP, 7.5 K/9)
Kyle Lohse: 9 W, 3.60 ERA, 1.17 WHIP, 6.1 K/9 (13 W, 3.54 ERA, 1.15 WHIP, 6.4 K/9)
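
The eye test can be backed up with a quick calculation. The Python sketch below computes the mean absolute error for ERA, WHIP and K/9 across the six pitchers listed above; wins are left out, as in the rest of the post. The numbers are copied from the list, while the variable and function names are my own.

```python
STATS = ("ERA", "WHIP", "K/9")

# name: ((projected ERA, WHIP, K/9), (actual ERA, WHIP, K/9))
projections = {
    "Nathan Eovaldi": ((3.82, 1.32, 6.4), (4.37, 1.33, 6.4)),
    "R.A. Dickey":    ((3.84, 1.24, 7.2), (3.71, 1.23, 7.2)),
    "Alex Cobb":      ((3.49, 1.17, 8.1), (2.87, 1.14, 8.1)),
    "Hiroki Kuroda":  ((3.60, 1.18, 6.7), (3.71, 1.14, 6.6)),
    "John Lackey":    ((3.67, 1.25, 7.5), (3.82, 1.28, 7.5)),
    "Kyle Lohse":     ((3.60, 1.17, 6.1), (3.54, 1.15, 6.4)),
}


def mean_absolute_error(table):
    """Average size of the miss for each stat, ignoring direction."""
    totals = [0.0] * len(STATS)
    for projected, actual in table.values():
        for i, (p, a) in enumerate(zip(projected, actual)):
            totals[i] += abs(p - a)
    return {stat: total / len(table) for stat, total in zip(STATS, totals)}


mae = mean_absolute_error(projections)
for stat in STATS:
    print(f"{stat}: {mae[stat]:.3f}")
```

On this small, hand-picked sample the ERA misses are by far the largest, mostly because of Eovaldi and Cobb; the WHIP and K/9 projections landed much closer.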

If it brings consolation to the reader, I have since tightened the part of the projection system that predicts win totals. I’m not gonna lie, it was pretty primitive last year because I figured wins were a crapshoot to begin with. Obviously, it shows, even in the small sample above. Wins are still difficult to project given the volatility inherent in the category, but the formulas are now more precise.

Time to panic? Pitcher edition, week 1

Should I panic? How can I even tackle this question right now? The breadth of pitchers who have performed poorly so far is astonishing, so it’s understandable why you might not want to start the Philadelphia Phillies’ Cliff Lee in his next start, or why you might want to cut ties with Chicago White Sox closer Nate Jones altogether. There are times you should panic, and there are times you should remain calm. I’m here to help you tell the difference.

Disclaimer: I get kind of annoyed when analysts waffle with guys, like, “well, I know he’s going to fall apart, but I’ll give him one more chance”. NO! You know he’s going to fall apart, but you’re giving yourself an out! I’m drawing a line in the sand, across this line YOU DO NOT — also, Dude, Chinaman is not the preferred nomenclature. … Wait, where was I? Anyway, I’m not letting myself off the hook. I am here to make the impulse decisions with (and maybe for) you, because sometimes, these impulse decisions make or break a season. Unfortunately, making them really early in the season is an absolutely horrifying experience.

Alex Cobb, SP (TB)
Dilemma: He was less than sharp, and although he gave up only five hits in five innings, he managed to walk more batters than he struck out (four to three). This is highly unlike Cobb, and that’s why I’m more inclined to think it was a case of first-start jitters rather than the beginning of a depressing trend.
Verdict: Don’t panic.

Homer Bailey, SP (CIN)
Dilemma: Lots of hits with as many walks as strikeouts. It was ugly, but he did face the Cardinals, which is no easy task. It’s hard to cut Bailey loose with how much you invested in him on draft day (outside of keeper leagues), but his breakout last year didn’t come out of nowhere, to which his second-half-of-2012 owners can attest. Unfortunately, he faces the Cardinals again in his next start. I’m not one to sit a guy early in the season, and I think it’s Bailey who will make adjustments the second time around, not the Cardinals.
Verdict: Don’t panic.

Stephen Strasburg, SP (WAS)
Dilemma: A 6.00 ERA?! Yeah, but 10 strikeouts in six innings and only a 1.167 WHIP. He got pretty unlucky, and that will happen from time to time. I would be more amped about the other batters he humiliated.
Verdict: Don’t panic.

CC Sabathia, SP (NYY)
Dilemma: Well, uh, he looked horrible. Against the Astros. It’s fine and dandy that he struck out a batter per inning and only walked one, but his fastball has become too hittable with that diminished velocity. I expect the trend to continue, and I think the solid strikeout total is the result of a free-swinging, hapless Astros offense. Remember, I said these are impulse decisions I’m making here. With a bevy of young pitching talent on waivers, I say…
Verdict: Panic.

C.J. Wilson, SP (LAA)
Dilemma: Kind of the same as Strasburg’s. High strikeouts and lots of hits sounds like an old wives’ tale about bad luck on balls in play that I’ve heard many a time. Wilson is not a second-tier starter anymore like he used to be, but he’s solid, and there’s no reason to fret.
Verdict: Don’t panic.

R.A. Dickey, SP (TOR)
Dilemma: Wow… Wow. Six walks. That hurts. I don’t know the first thing about throwing a knuckleball, and I’m sure if you have a bad day, it can really be bad. But six walks? At least the strikeouts are there, but if your league is anything like any of mine, you probably got Dickey on the cheap. If I saw enticing performances by Seattle’s James Paxton or Toronto’s Drew Hutchison, I may cut ties, too. Surely no one else will touch him with a 10-foot pole until after his next start.
Verdict: Panic.

Corey Kluber, SP (CLE)
Dilemma: If you follow this website, you know how much I love Kluber, and how I preemptively purchased a five-year membership to the Society. Everything about the start is concerning, but I’m too proud to cut him loose. If you got him cheap, you can let him go and try your luck later. And I truly think he will break out; his peripherals were simply too good last year, and I don’t think you can fluke your way into talent like that. But perhaps I’m wrong…
Verdict: Don’t panic.

Cliff Lee, SP (PHI)
Dilemma: Wait, is this a serious question? Look, I know that sucked, but he’s freakin’ Cliff Lee. Calm down.
Verdict: Don’t panic.

Jonathan Papelbon, RP (PHI)
Dilemma: Dude, if you wanted to know what the end of the world would look like, this is it. Except in the form of a metaphor called Jonathan Papelbon.
Verdict: Panic.

Jim Johnson, RP (OAK)
Dilemma: I’ve expressed my distaste for Johnson before. He’s simply not good, and fantasy owners are blinded by two straight seasons of 50-plus saves. He would be lucky to save 35 this year without trouble; it looks like he may not get the chance to save 20 by the end of the week.
Verdict: Panic.

Nate Jones, RP (CHW)
Dilemma: The closer role was never a lock for him to keep. It looks like he agrees. Two hits, three walks and four earned runs without recording an out. Making Casper Wells look like a Cy Young candidate.
Verdict: Panic.

The role of luck in fantasy baseball

I apologize for being that guy that ruins that ooey gooey feeling you get when you think about the fantasy league you won last year. As much as you want to think you are a fantasy master, perhaps even a fantasy god, you should acknowledge that you probably benefited from a good deal of luck. Sure, for your sake, I will admit you made a great pick with Max Scherzer in the fifth round. But did you, in all your mastery, predict he would win 21 games?

Don’t say yes. You didn’t. And frankly, you would be crazy to say he’ll do it again.

I focus primarily on pitching in this blog, and let it be known that pitchers are not exempt from luck in the realm of fantasy baseball. If you’re playing in a standard rotisserie league, you probably have a wins category. In a points league, you likely award points for wins.

Wins. Arguably the most arbitrary statistic in baseball. Let’s not have that discussion, though, and instead simply accept the win as it is. The win has the most drastic uncontrollable effect on a fantasy pitcher’s value. (ERA and WHIP experience similar statistical fluctuations, but at least they aren’t arbitrary.)

I had an idea, but before I proceed, let me interject: if you’re drafting for wins, you’re doing it wrong. But, as I said, you can’t ignore wins.

But let’s say you did, and drafted strictly on talent, or “stuff” (which, here, factors in a pitcher’s durability). How would the top 30 pitchers change? Here’s my “stuff” list, which you can compare with the base projections:

  1. Clayton Kershaw
  2. Adam Wainwright
  3. Felix Hernandez
  4. Max Scherzer
  5. Cliff Lee
  6. Yu Darvish
  7. Chris Sale
  8. Cole Hamels
  9. Jose Fernandez
  10. Madison Bumgarner
  11. Stephen Strasburg
  12. David Price
  13. Justin Verlander
  14. Alex Cobb
  15. Homer Bailey
  16. Mat Latos
  17. Gerrit Cole
  18. Michael Wacha
  19. Anibal Sanchez
  20. James Shields
  21. Danny Salazar
  22. Marco Estrada
  23. A.J. Burnett
  24. Corey Kluber
  25. Brandon Beachy
  26. Zack Greinke
  27. Matt Cain
  28. Sonny Gray
  29. Hisashi Iwakuma
  30. Gio Gonzalez

Here are the five players with the biggest positive change and a breakdown of each:

  1. Brandon Beachy, up 23 spots
    His injury history has weakened his wins column projection. Consequently, the number of innings Beachy is expected to throw is significantly less than a full season. But if he managed to stay healthy for the full year (say, 200 innings)? He’s a top-10 pick based on pure stuff. If you draft with the philosophy that you can always find a viable replacement on waivers, Beachy could be your big sleeper.
  2. Marco Estrada, up 22 spots
    Estrada’s diminished expected win total is more a function of his terrible team than of his ability. Estrada has underperformed the past two years, Ricky Nolasco style, but if he can pull it together, he’s a top-30 pitcher based on “stuff.” And hey, maybe he can luck into some extra wins. However, if he can’t pull it together (Ricky Nolasco style), he’ll be relegated to fringe starter.
  3. Danny Salazar, up 9 spots
    Salazar has immense potential. His injury history led the Indians to cap his per-game pitch count last year, and that has been factored into his projection. But if he’s a full-time, 200-inning starter? He’s a top-25 starter with top-15 upside. Again, this is in terms of “stuff”. But is Ivan Nova better than Felix Hernandez because he can magically win more games? Of course not. Among a slew of young studs, including Jose Fernandez, Shelby Miller, Michael Wacha and so on, Salazar is a diamond in the rough.
  4. A.J. Burnett, up 8 spots
    His projection is already plenty good. But you saw how many games he won in 2013. Anything can happen.
  5. Corey Kluber, up 8 spots
    Most people were probably scratching their heads when they saw Kluber’s name listed above. Frankly, I’m in love with him, and it’s because he’s a stud with a great K/BB ratio. I understand why someone may be inclined to dismiss it as an aberration, but his swinging strike and contact rates are truly excellent. Even if they regress, he should be a draft-day target.

Here are the three starting pitchers with the biggest negative change.

  1. Anibal Sanchez, down 10 spots
    He’s great, but he also plays for a great team. Call it Max Scherzer syndrome. He carries as big a risk as any other player to pitch great but only win five or six games, as do the next two players.
  2. Hisashi Iwakuma, down 6 spots
  3. Zack Greinke, down 4 spots
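
The "up N spots" and "down N spots" figures are just the difference between a pitcher's rank on two lists. Here is a minimal Python sketch of that bookkeeping. The function name is my own and the four placeholder pitchers are hypothetical, since the base projection ranks are not reproduced in this post.

```python
def rank_changes(base_order, stuff_order):
    """Spots gained (positive) or lost (negative) moving from base to stuff ranks.

    Both arguments are lists of names, best first. Names missing from the
    base list are skipped because there is nothing to compare against.
    """
    base_rank = {name: i for i, name in enumerate(base_order, start=1)}
    return {
        name: base_rank[name] - stuff_rank
        for stuff_rank, name in enumerate(stuff_order, start=1)
        if name in base_rank
    }


# Hypothetical four-pitcher example, not real rankings.
base = ["Pitcher A", "Pitcher B", "Pitcher C", "Pitcher D"]
stuff = ["Pitcher C", "Pitcher A", "Pitcher B", "Pitcher D"]
changes = rank_changes(base, stuff)
for name, change in sorted(changes.items(), key=lambda item: -item[1]):
    print(f"{name}: {change:+d}")
```

Sorting by the change, as the loop does, puts the biggest risers first and the biggest fallers last.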

Let me be clear that although I created a hypothetical scenario where wins didn’t exist, I don’t advocate for blindly drafting based on “stuff.” It’s important to acknowledge that certain players have a much better chance to win than others. Chris Sale of the Chicago White Sox could win 17 games just as easily as he could win seven. It’s about playing the odds, and unless a pitcher truly pitches terribly, don’t blame the so-called experts for your bad luck. They probably put their money where their mouths are, too, and are suffering along with you.

Here is a more comprehensive list of pitchers ranked by “stuff,” if that’s the way you sculpt your strategy:

  1. Clayton Kershaw
  2. Adam Wainwright
  3. Felix Hernandez
  4. Max Scherzer
  5. Cliff Lee
  6. Yu Darvish
  7. Chris Sale
  8. Cole Hamels
  9. Jose Fernandez
  10. Madison Bumgarner
  11. Stephen Strasburg
  12. David Price
  13. Justin Verlander
  14. Alex Cobb
  15. Homer Bailey
  16. Mat Latos
  17. Gerrit Cole
  18. Michael Wacha
  19. Anibal Sanchez
  20. James Shields
  21. Danny Salazar
  22. Marco Estrada
  23. A.J. Burnett
  24. Corey Kluber
  25. Brandon Beachy
  26. Zack Greinke
  27. Matt Cain
  28. Sonny Gray
  29. Hisashi Iwakuma
  30. Gio Gonzalez
  31. Doug Fister
  32. Jordan Zimmermann
  33. Alex Wood
  34. Kris Medlen
  35. Jeff Samardzija
  36. Mike Minor
  37. Jake Peavy
  38. Kevin Gausman
  39. Tyson Ross
  40. Patrick Corbin
  41. Lance Lynn
  42. Francisco Liriano
  43. Andrew Cashner
  44. Ricky Nolasco
  45. CC Sabathia
  46. Hiroki Kuroda
  47. Tim Lincecum
  48. Tim Hudson
  49. Jered Weaver
  50. Shelby Miller
  51. Clay Buchholz
  52. Tony Cingrani
  53. Matt Garza
  54. John Lackey
  55. Ubaldo Jimenez
  56. Justin Masterson
  57. Julio Teheran
  58. R.A. Dickey
  59. A.J. Griffin
  60. Hyun-Jin Ryu
  61. Dan Haren
  62. Johnny Cueto
  63. C.J. Wilson
  64. Ian Kennedy
  65. Chris Archer
  66. Kyle Lohse
  67. Scott Kazmir
  68. Carlos Martinez
  69. Jon Lester
  70. Ervin Santana
  71. Jose Quintana
  72. Derek Holland
  73. Garrett Richards
  74. Dan Straily
  75. Tyler Skaggs

Early SP rankings for 2014

I wouldn’t say pitching is deep, but I’m surprised by the pitchers who didn’t make my top 60.

Note: I have deemed players highlighted in pink undervalued and worthy of re-rank. Do not be alarmed just yet by what you may perceive to be a low ranking.

2014 STARTING PITCHERS

  1. Clayton Kershaw
  2. Adam Wainwright
  3. Max Scherzer
  4. Yu Darvish
  5. Felix Hernandez
  6. Cliff Lee
  7. Stephen Strasburg
  8. Jose Fernandez
  9. Cole Hamels
  10. Justin Verlander
  11. Anibal Sanchez
  12. Chris Sale
  13. Mat Latos
  14. Madison Bumgarner
  15. Alex Cobb
  16. Homer Bailey
  17. Gerrit Cole
  18. Zack Greinke
  19. David Price
  20. James Shields
  21. Jordan Zimmermann
  22. Michael Wacha
  23. Danny Salazar
  24. Jered Weaver
  25. A.J. Burnett *contingent on if he retires
  26. Kris Medlen
  27. Mike Minor
  28. Jake Peavy
  29. Corey Kluber
  30. Lance Lynn
  31. Matt Cain
  32. Hisashi Iwakuma
  33. CC Sabathia
  34. Gio Gonzalez
  35. Doug Fister
  36. Patrick Corbin
  37. Francisco Liriano
  38. Sonny Gray
  39. Ricky Nolasco
  40. Hiroki Kuroda
  41. Tim Hudson
  42. Marco Estrada
  43. Shelby Miller
  44. Trevor Rosenthal
  45. Tony Cingrani
  46. A.J. Griffin
  47. Brandon Beachy
  48. Tim Lincecum
  49. Clay Buchholz
  50. Ubaldo Jimenez
  51. Alex Wood
  52. Julio Teheran
  53. Tyson Ross
  54. Hyun-jin Ryu
  55. Matt Garza
  56. Andrew Cashner
  57. Johnny Cueto
  58. C.J. Wilson
  59. John Lackey
  60. Justin Masterson
  61. R.A. Dickey
  62. Kevin Gausman
  63. Jon Lester
  64. Dan Haren
  65. Ervin Santana
  66. Derek Holland
  67. Chris Archer
  68. Jeff Samardzija
  69. Bartolo Colon
  70. Ivan Nova
  71. Matt Moore
  72. Ian Kennedy
  73. Dan Straily
  74. Rick Porcello
  75. Jarrod Parker
  76. Carlos Martinez
  77. Jeremy Hellickson
  78. Kyle Lohse
  79. Scott Kazmir
  80. Jason Vargas
  81. Tommy Milone
  82. Wade Miley
  83. Dillon Gee
  84. Brandon Workman
  85. Chris Tillman
  86. Zack Wheeler
  87. Yovani Gallardo
  88. Miguel Gonzalez
  89. Jose Quintana
  90. Garrett Richards
  91. Robbie Erlin
  92. Felix Doubront
  93. Jhoulys Chacin
  94. Jonathon Niese
  95. Chris Capuano
  96. Nick Tepesch
  97. Alexi Ogando
  98. Bronson Arroyo
  99. Travis Wood
  100. Trevor Cahill
  101. Tyler Skaggs
  102. Randall Delgado
  103. Martin Perez
  104. Mike Leake
  105. Carlos Villanueva
  106. Todd Redmond
  107. Brandon Maurer
  108. Tyler Lyons
  109. Ryan Vogelsong
  110. Zach McAllister
  111. Wily Peralta
  112. Brett Oberholtzer
  113. Erik Johnson
  114. Jorge De La Rosa
  115. Paul Maholm
  116. Hector Santiago
  117. Burch Smith
  118. Jeff Locke
  119. Joe Kelly
  120. Jason Hammel
  121. Jake Odorizzi
  122. Danny Hultzen
  123. Anthony Ranaudo
  124. Archie Bradley
  125. Rafael Montero
  126. James Paxton
  127. Taijuan Walker
  128. Yordano Ventura