Category: Metrics

Predicting HR/FB rates for hitters using weighted pitch values

(If you care only about results and not about the process, scroll down to the section aptly titled HERE YA GO.)

Victor Martinez indirectly and semi-strangely inspired this post. I was browsing FanGraphs’ weighted pitch values for hitters — something I hadn’t done before, as I’ve really only used the metric for pitchers — for 2014 and my thought process went something like this:

Jose Abreu feasted on fastballs; V-Mart feasted on sliders… Wow, V-Mart actually fared better against sliders and curveballs than fastballs and cutters. I wonder if that has any correlation with his plate discipline.

In short: no. A hitter’s success versus pitches according to weighted pitch values (per 100 of that pitch) determines about 40 percent of his walk rate and barely 4 percent of his strikeout rate. (I’m ballparking it on the K% figure.)

But I got to thinking a little more: these weighted pitch values have to be good for something other than scouting hitters (which, moving forward, maybe we start throwing Abreu some more offspeed stuff? I don’t know).

Alas, I took a crack at it: I tested the correlation between weighted pitch values and home runs per fly ball (HR/FB) rates. And I was very pleasantly surprised.

Let’s start with context. Each player, very obviously, records his own HR/FB rate each year. Players with more power will record higher HR/FB rates, and players with less power will record lower rates. Therefore, each player, in a sense, creates his own benchmark (which, arguably, is his career HR/FB rate: he hits this many home runs as a percentage of fly balls on average). However, we know that HR/FB fluctuates annually: a player with a 15% career-HR/FB does not hit exactly 15 of every 100 fly balls over outfield walls every season like clockwork. Still, there is an expectation that he will hit a certain number of them out — hence, the benchmark.

Using regression analysis, the idea of the benchmark can be captured by seeing how, say, 2014’s HR/FB rates correlate with 2013’s rate, as well as 2012’s, 2011’s and so on. I downloaded all available ball-in-play data for “qualified” hitters dating back to 2002, treating each player-season as a separate observation, thereby representing an exhaustive list. The line of best fit looks as follows, where L1 represents the year prior, L2 two years prior, L3 three years prior:

x(HR/FB) = .018 + .321*L1.(HR/FB) + .252*L2.(HR/FB) + .228*L3.(HR/FB)
Between R-squared: .74
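For anyone who wants to plug numbers into the line above, here is a minimal sketch. The function name and the sample hitter are made up for illustration; only the coefficients come from the equation.

```python
def expected_hr_fb_lagged(l1, l2, l3):
    """xHR/FB from the three prior seasons' HR/FB rates, all as decimals.

    l1 = one year prior, l2 = two years prior, l3 = three years prior.
    Coefficients are the rounded ones printed in the equation above.
    """
    return 0.018 + 0.321 * l1 + 0.252 * l2 + 0.228 * l3


if __name__ == "__main__":
    # Hypothetical hitter: 15% last year, 12% two years ago, 10% three years ago.
    print(f"{expected_hr_fb_lagged(0.15, 0.12, 0.10):.1%}")
    # A hitter with zero home runs in all three years still gets the intercept.
    print(f"{expected_hr_fb_lagged(0.0, 0.0, 0.0):.1%}")
```

The second call is the zero-homer case: everything but the intercept drops out.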

One might astutely observe that a player who hit exactly zero home runs the three previous years can still be expected to hit about 1.8 percent of his fly balls over the wall, and one might call to arms to force the intercept term to zero. It seems absurd, nay, impossible that a player who never hits home runs could be expected to suddenly hit one, but let us not forget we witnessed the impossible happen just last year. That’s what makes baseball a beautiful sport: anything can happen.

Anyway, the equation above is actually really helpful in predicting expected HR/FB; its R-squared indicates the line explains almost three-quarters of the variation in HR/FB. It also bestows the greatest weight on the most recent year as measured by its coefficient, with declining weight as years become further removed, which makes sense. But… BUT.

It’s not helpful in predicting HR/FB for hitters who have been in the league fewer than three years. Moreover, it seems especially difficult to predict future HR/FB rates for hitters with only one year of data, such as the monstrous Abreu. (Maybe Abreu did inspire this post after all.) Observe:

x(HR/FB) = .032 + .694*L1(HR/FB)

After a little bit of algebra, we can intuit that the equilibrium HR/FB rate is roughly 10.4 percent. I use the term “equilibrium” because it appears that no matter what HR/FB a hitter posted in his first career season, his next-year HR/FB will be expected to converge (aka regress) toward the magical number of 10.4 percent. Again, observe:

.032 + .694*(12%) = 11.5%
.032 + .694*(8%) = 8.7%

You can perform this exercise with any value, and the results will be the same: a 2014 HR/FB rate lower than ~10.4 percent will be expected to increase in 2015, and a rate higher than ~10.4 percent will be expected to decrease in 2015. Now this, this, is actually absurd. Granted, the equation is communicating what would happen on average, but hitters are not homogeneous.
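The "little bit of algebra" is just solving x = a + b*x for x, which gives x = a / (1 - b). Here is a rough sketch of that, using the rounded coefficients printed above; the function names are mine. With the rounded numbers the fixed point lands near 10.46 percent, a hair above the ~10.4 quoted, which I assume comes down to rounding in the published coefficients.

```python
INTERCEPT = 0.032  # rounded intercept from the one-year equation
SLOPE = 0.694      # rounded coefficient on last year's HR/FB


def next_year_hr_fb(rate):
    """Expected next-year HR/FB given this year's rate (as a decimal)."""
    return INTERCEPT + SLOPE * rate


def equilibrium():
    """Fixed point of the one-year equation: x = a + b*x  =>  x = a / (1 - b)."""
    return INTERCEPT / (1 - SLOPE)


def iterate(rate, years):
    """Apply the one-year equation repeatedly to see where a rate drifts."""
    for _ in range(years):
        rate = next_year_hr_fb(rate)
    return rate


if __name__ == "__main__":
    print(f"equilibrium: {equilibrium():.2%}")
    for start in (0.08, 0.12, 0.27):
        print(f"{start:.0%} -> {next_year_hr_fb(start):.1%} next year")
```

Any starting rate, fed through the equation enough times, ends up at the same number, which is exactly the homogeneity problem described above.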

This is all a very long-winded way of saying two things:

1) When the sample is incredibly small — namely, one observation — using history as a guide fails us.
2) I think I may have found an alternative that relies not on a single year’s worth of HR/FB data but on a single year’s worth of weighted pitch value data.

HERE YA GO

Let me be clear, up front: I know there will be a lot of multicollinearity inherent in this analysis — that is, HR/FB and weighted pitch values are dependent on each other in some fashion. I don’t know how weighted pitch values are calculated exactly — it would behoove me to look it up, but I am lazy, a current self-descriptor of which I am not proud — but, intuitively, a hitter who hits home runs more frequently off of particular pitch types will likely record higher weighted values for those pitches. Essentially, the weighted values are calculated using home run frequency, and I am now trying to reverse-engineer it.

But I don’t see that as a bad thing. There is a profound correlative capability in the data, and using that information to glean whether or not a hitter was, perhaps, a bit lucky when it came to his HR/FB frequency is, I hope, less preposterous than pulling a number out of your rear-end.

HERE YA GO, FOR REALSIES

I will use strictly weighted pitch values per 100 pitches (denoted wXX/C, where XX represents the pitch abbreviation). I omit knuckleballs because not all players saw them, and I omit splitfingers because they are statistically insignificant, probably because they aren’t thrown very often, rendering the weighted pitch values more volatile. I also add K% and BABIP presuming the following: strikeout rates are positively correlated with HR/FB rates, and BABIP, which positively correlates with hard-hit balls such as line drives, is likely to also positively correlate with similarly-hard-hit balls such as home runs. (A regression that includes only weighted pitch values and excludes K% and BABIP produces an adjusted R-squared of .45.) The line of best fit equation is as follows:

x(HR/FB) = .2049 + .0352*(wFB/C) + .0081*(wSL/C) + .0014*(wCT/C) + .0041*(wCB/C) + .0063*(wCH/C) + .5244*K% - .6706*BABIP
Adjusted R-squared: .75
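Here is a minimal sketch of how one might apply the equation above to a single hitter-season and back out the "actual minus expected" differential used in the rest of this post. The function names, the dictionary layout and the sample inputs are all assumptions for illustration; the coefficients are the rounded ones from the equation.

```python
INTERCEPT = 0.2049
PITCH_COEFS = {
    "wFB/C": 0.0352,  # fastball
    "wSL/C": 0.0081,  # slider
    "wCT/C": 0.0014,  # cutter
    "wCB/C": 0.0041,  # curveball
    "wCH/C": 0.0063,  # changeup
}
K_COEF = 0.5244
BABIP_COEF = -0.6706


def expected_hr_fb(pitch_values, k_rate, babip):
    """xHR/FB as a decimal.

    pitch_values: dict of weighted pitch values per 100 pitches, keyed as above.
    k_rate and babip are decimals (0.20 for a 20% strikeout rate, 0.300 BABIP).
    """
    total = INTERCEPT + K_COEF * k_rate + BABIP_COEF * babip
    for name, coef in PITCH_COEFS.items():
        total += coef * pitch_values[name]
    return total


def differential(actual_hr_fb, pitch_values, k_rate, babip):
    """Actual minus expected HR/FB; positive means the hitter out-performed."""
    return actual_hr_fb - expected_hr_fb(pitch_values, k_rate, babip)


if __name__ == "__main__":
    # Made-up hitter, not a real stat line.
    sample = {"wFB/C": 1.5, "wSL/C": 0.5, "wCT/C": 0.0, "wCB/C": -0.3, "wCH/C": 0.8}
    x = expected_hr_fb(sample, k_rate=0.20, babip=0.300)
    print(f"xHR/FB: {x:.2%}, diff vs. a 15% actual: {0.15 - x:+.2%}")
```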

Again, the model produces a great line of best fit per its R-squared — almost identical to its lagged-variable counterpart. As it should; if there’s multicollinearity, it should. (And there is.) But reverse-engineering the process should create accurate predictions of what should have been a hitter’s HR/FB rate in a given season because of the multicollinearity; in this instance, it’s not a bad thing.

Some trends emerge instantly, trends similar to those I saw in the xK% and xBB% studies I performed earlier: regardless of a player’s power potential, he will over-perform or under-perform his expected HR/FB rate, and he will do so with consistency. For example, Adam LaRoche, despite his apparent power stroke, consistently under-performs his xHR/FB:

HR/FB: actual minus expected
2010: -1.89%
2012: -1.87%
2013: -1.77%
2014: -0.75%

Meanwhile, Albert Pujols consistently out-performs his xHR/FB:

2010: +2.02%
2011: +5.67%
2012: +1.80%
2014: +2.13%

Each data set has its noise, but you can see based on these limited samples where each hitter experienced a bit of luck: LaRoche, in 2014, saw a minor spike, and Pujols saw a major spike in 2011.

Rather than going through each player individually, I will highlight a few extreme, fantasy-relevant outliers from 2014 and reflect accordingly. Without further ado (and in alphabetical order by first name):

Adam Eaton, -8.03%
This is the largest negative differential in the 2014 data. Without another full season of data to compare, this huge difference is likely a sign of bad luck, although there is a chance that he is a severe under-performer in the same vein as Matt Carpenter (who has under-performed his xHR/FB by about 7 percent the past two years). I already liked the guy for his speed and control of the strike zone, and the prospect of a pending power spike is enticing.

Coco Crisp, -5.78%
Crisp is a great case study: he notched a career-high 12.4-percent HR/FB in 2013, then promptly slid back down to single digits in 2014. His 2014 xHR/FB, however, indicates his HR/FB should have been closer to 11.5 percent, almost 6 percent higher than his actual mark and only 1.2 percent less than 2013. Meanwhile, his 2012 and 2013 expected and actual HR/FB rates are almost identical. His power-speed combination was pretty valuable two years ago — when he wasn’t on the disabled list, at least.

Curtis Granderson, -5.92%
Granderson bottomed out in woeful aplomb last year, but his xHR/FB offers a glimmer of hope. I’ll be honest, though, I can’t remember the last time this guy was fantasy relevant. But if you’re looking for sneaky power at the expense of everything ever, he could be your guy.

Giancarlo Stanton, +5.33%
The Artist Formerly Known as Mike posted positive differentials in 2011 and 2013, but they were, respectively, one-half and one-third the magnitude of last year’s differential. His 2013 and 2014 xHR/FBs are practically identical — 20.16% and 20.17% — so it looks like Stanton chose a good year to get a little bit lucky.

Jason Heyward, -5.56%
Speaking of bottoming out, Heyward’s power all but evaporated last year. Fear not, however, as his 2014 xHR/FB is only 4 percentage points less than 2013’s — which still sucks, but at least it’s not as bad as a whopping 10 percentage points. It’s probably too obvious to count on a comeback, but no matter.

Jason Kipnis, -4.39%
His year-by-year differentials: -0.01%, -2.61%, -4.39%. His year-by-year xHR/FB: 9.71%, 15.01%, 9.19%. I don’t know what to believe, really, because it’s hard to tell what’s real here and what’s not. But, again, here ye beholdeth another bounceback candidate.

Jonathan Lucroy, -3.77%
His 2014 xHR/FB was a percentage point better than 2013’s. The dude is too good.

Jose Abreu, +8.52%
Now this man, THIS MAN, is the real reason why we’re all here. What can we make of that? We know that prodigious power hitters such as Pujols and Stanton can exceed expectations. But this expectation is set pretty high. I think we’re all expecting regression, but it’s everyone’s best guess as to how much. I’m thinking a drop from 27-ish percent closer to a Chris Davis-esque 22 percent.

Lucas Duda, -3.38%
I don’t have any other reliable full-season data for Duda to compare, but at least it wasn’t a positive differential. The negative implies that last year’s breakout was probably legit — and maybe there’s still room for improvement.

Matt Adams, -3.49%
Similarly to Duda, Adams’ only full season came last year. But the mammoth power we saw in 2013 didn’t disappear as much as it did suffer some bad luck. His 2014 xHR/FB of 12.19 percent still isn’t where any of us would like it to be, but again, maybe there’s still room for improvement.

Matt Holliday, -3.09%
Holliday, who perennially out-performs his xHR/FB, appears to have gotten pretty unlucky last year. Of the last five years (dating back to 2010), 2014’s xHR/FB was right in the middle. I know he’s getting old, but man, he’s a monster, and I think there’s juice still in the tank.

Nick Castellanos, -5.20%
Might be a little more pop in that bat than we know.

Nori Aoki, -6.08%
His power simply vanished, but the xHR/FB is in line with past years. He could return to his 10-HR, 25-SB ways in short order.

Robinson Cano, +2.33%
This is my absolute favorite result in the entire 2014 data set. Cano always out-performs his xHR/FB; that part does not concern me. It’s the xHR/FB itself: it dropped off almost 7 percent from 2013 to 2014. Seven percent! Say what you will about Safeco Field sapping power, but methinks a larger share of that 7 percent is a 32-year-old man in decline.

Xander Bogaerts, -3.88%
See Castellanos, Nick.

Yasiel Puig, -4.58%
Remember how Puig hit way fewer home runs last year and all that stuff? Hey, I traded him midseason (he will cost only $13 next year, but I won my league so it all works out) for Carlos Gomez and a closer. In the moment, I think I made the right move: Puig’s home run rate never really improved. But his 2013 differential was +5.24%. Cutting the crap, his 2013 and 2014 xHR/FB rates were 16.56% and 15.68%, respectively — smack-dab between his actual HR/FB marks from the two years. Thus, taking the average of the two may not be such a bad method for projection after all.

OK, that’s everything. The players listed above were merely a sample and are by no means exhaustive when it comes to the peculiar splits I saw. More importantly, the implications are most interesting where they are hardest to draw: players such as Abreu and Eaton very clearly seem to have benefited (and suffered) at the hands of luck, and we can surely expect regression. But… how much? ‘Tis the question of the day, my friends.

Edit (1/8/15, 11:42 am): FanGraphs’ Mike Podhorzer, who coincidentally posted an xHR/FB metric for pitchers today, developed a similar metric for hitters a while back, to the tune of a .65 adjusted R-squared. I feel pretty good about my work now.

Predicting pitchers’ walks using xBB%

The other day, I discussed predicting pitchers’ strikeout rates using xK%. I will conduct the same exercise today in regard to predicting walks. Using my best intuition, I want to see how well a pitcher’s walk rate (BB%) actually correlates with what his walk rate should be (expected BB%, henceforth “xBB%”). Similarly to xK%, I used my intuition to best identify reliable indicators of a pitcher’s true walk rate using readily available data.

An xBB% metric, like xK%, would indicate not only if a pitcher perennially over-performs (or under-performs) his walk rate but also if he happened to do so in a given year. This article will conclude by looking at how the difference in actual and expected walk rates (BB% – xBB%) varied between 2014 and career numbers, lending some insight into the (un)luckiness of each pitcher.

Courtesy of FanGraphs, I constructed another set of pitching data spanning 2010 through 2014. This time, I focused primarily on what I thought would correlate with walk rate: inability to pitch in the zone and inability to incur swings on pitches out of the zone. I also throw in first-pitch strike rate: I predict that counts that start with a ball are more likely to end in a walk than those that start with a strike. Because FanGraphs’ data measures ability rather than inability — “Zone%” measures how often a pitcher hits the zone; “O-Swing%” measures how often batters swing at pitches out of the zone; “F-Strike%” measures the rate of first-pitch strikes — each variable should have a negative coefficient attached to it.

I specify a handful of variations before deciding on a final version. Instead of using split-season data (that is, each pitcher’s individual seasons from 2010 to 2014) for qualified pitchers, I use aggregated statistics because the results better fit the data by a sizable margin. This surprised me because there were about half as many observations, but it’s also not surprising because each observation is, itself, a larger sample size than before.

At one point, I tried creating my own variable: looks (non-swings) at pitches out of the zone. I created a variable by finding the percentage of pitches out of the zone (1 – Zone%) and multiplied it by how often a batter refused to swing at them (1 – O-Swing%). This version of the model predicted a nice fit, but it was slightly worse than leaving the variables separated. Also, I ran separate-but-equal regressions for PITCHf/x data and FanGraphs’ own data. The PITCHf/x data appeared to be slightly more accurate, so I proceeded using them.

The graph plots actual walk rates versus expected walk rates. The regression yielded the following equation:

xBB% = .3766176 – .2103522*O-Swing%(pfx) – .1105723*Zone%(pfx) – .3062822*F-Strike%
R-squared = .6433

Again, R-squared indicates how well the model fits the data. An R-squared of .64 is not as exciting as the R-squared I got for xK%; it means the model explains about 64 percent of the variation in walk rate, and the remaining 36 percent is explained by things I haven’t included in the model. Certainly, more variables could help explain xBB%. I am already considering combining FanGraphs’ PITCHf/x data with some of Baseball Reference’s data, which does a great job of keeping track of the number of 3-0 counts, four-pitch walks and so on.
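As a rough sketch, the xBB% equation can be written as a small function. I have also included the combined "looks at pitches out of the zone" variable described earlier, though the final model does not use it. Function names and the sample inputs are hypothetical; all rates are decimals.

```python
def expected_bb_rate(o_swing, zone, f_strike):
    """xBB% as a decimal, from PITCHf/x O-Swing%, PITCHf/x Zone% and F-Strike%."""
    return 0.3766176 - 0.2103522 * o_swing - 0.1105723 * zone - 0.3062822 * f_strike


def looks_out_of_zone(zone, o_swing):
    """Share of all pitches that are out of the zone AND not swung at.

    This is the home-made variable that was tried and then set aside.
    """
    return (1 - zone) * (1 - o_swing)


if __name__ == "__main__":
    # Made-up pitcher: 30% O-Swing, 50% Zone, 60% first-pitch strikes.
    xbb = expected_bb_rate(o_swing=0.30, zone=0.50, f_strike=0.60)
    print(f"xBB%: {xbb:.2%}")
    print(f"diff% vs. a 7.0% actual walk rate: {0.070 - xbb:+.2%}")
    print(f"looks out of zone: {looks_out_of_zone(0.50, 0.30):.1%}")
```

All three coefficients are negative, so raising any of the three inputs lowers the expected walk rate.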

And again, for the reader to use the equation above to his or her benefit, one would plug in the appropriate values for a player in a given season or time frame and determine his xBB%. Then one could compare the xBB% to single-season or career BB% to derive some kind of meaningful results. And (one more) again, I have already taken the liberty of doing this for you.

Instead of including every pitcher from the sample, I narrowed it down to only pitchers with at least three years’ worth of data in order to yield some kind of statistically significant results. (Note: a three-year sample is a small sample, but three individual samples of 160+ innings are large enough to produce some arguably robust results.) “Avg BB% – xBB%” (or “diff%”) takes the average of a pitcher’s difference between actual and expected walk rates from 2010 to 2014. It indicates how well (or poorly) he performs compared to his xBB%: the lower a number, the better. This time, I included “t-score”, which measures how reliable diff% is. The key value here is 1.96; anything greater than that means his diff% is reliable. (1.00 to 1.96 is somewhat reliable; anything less than 1.00 is very unreliable.) Again, this is slightly problematic because there are five observations (years) at most, but it’s the best and simplest usable indicator of reliability.

Thus, Mark Buehrle, Mike Leake, Hiroki Kuroda, Doug Fister, Tim Hudson, Zack Greinke, Dan Haren and Bartolo Colon can all reasonably be expected to consistently out-perform their xBB% in any given year. Likewise, Aaron Harang, Colby Lewis, Ervin Santana and Mat Latos can all reasonably be expected to under-perform their xBB%. For everyone else, their diff% values don’t mean a whole lot. For example, R.A. Dickey’s diff% of +0.03% doesn’t mean he’s more likely than someone else to pitch exactly as well as his xBB% predicts him to; in fact, his standard deviation (StdDev) of 0.93% indicates he’s less likely than just about anyone to do so. (What it really means is there is only a two-thirds chance his diff% will be between -0.90% and +0.96%.)
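For the curious, here is one way the diff% summary could be computed. The post does not spell out the t-score formula, so this sketch assumes the usual one-sample version (mean divided by standard deviation over the square root of the number of seasons); the function names and the sample seasons are invented.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_diffs(diffs):
    """Return (average diff%, sample std dev, t-score) for a pitcher's seasons.

    diffs: yearly BB% minus xBB% values, in percentage points.
    Assumes at least two seasons and a non-zero standard deviation.
    """
    avg = mean(diffs)
    sd = stdev(diffs)
    t_score = avg / (sd / sqrt(len(diffs)))
    return avg, sd, t_score


def one_sd_band(avg, sd):
    """Range covering roughly two-thirds of outcomes if diffs are about normal."""
    return avg - sd, avg + sd


if __name__ == "__main__":
    # Made-up pitcher who beats his xBB% by about a point every year.
    avg, sd, t = summarize_diffs([-1.0, -1.2, -0.8])
    print(f"avg {avg:+.2f}, sd {sd:.2f}, |t| {abs(t):.2f}, reliable: {abs(t) > 1.96}")
    # The Dickey numbers quoted above: +0.03 average, 0.93 standard deviation.
    low, high = one_sd_band(0.03, 0.93)
    print(f"one-SD band: {low:+.2f} to {high:+.2f}")
```

The comparison uses the absolute value of the t-score, since a consistently negative diff% is just as "reliable" as a consistently positive one.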

As with xK%, I compiled a list of fantasy-relevant starters with only two years’ worth of data that see sizable fluctuations between 2013 and 2014. Their data is impossible (nay, ill-advised) to interpret at this point, but it is worth monitoring.

Name: [2013 diff%, 2014 diff%]

Miller is an interesting case: he was atrociously bad about gifting free passes in 2014, but his diff% was only marginally worse than it was in 2013. It’s possible that he was a smart buy-low for the Braves — but it’s also possible that Miller not only perennially under-performs his xBB% but is also trending in the wrong direction.

Here are fantasy-relevant players with a) only 2014 data, and b) outlier diff% values:

I’m not gonna lie, I have no idea why Cobb, Corey Kluber and others show up as only having one year of data when they have two in the xK% dataset. This is something I noticed now. Their exclusion doesn’t fundamentally change the model’s fit whatsoever because it did not rely on split-season data; I’m just curious why it didn’t show up in FanGraphs’ leaderboards. Oh well.

Implications: Richards and Roark perhaps over-performed. Meanwhile, it’s possible that Odorizzi, Ross and Ventura will improve (or regress) compared to last year. I’m excited about all of that. Richards will probably be pretty over-valued on draft day.

Predicting pitchers’ strikeouts using xK%

Expected strikeout rate, or what I will henceforth refer to as “xK%,” is exactly what it sounds like. I want to see if a pitcher’s strikeout rate actually reflects how he has pitched in terms of how often he’s in the zone, how often he causes batters to swing and miss, and so on. Ideally, it will help explain random fluctuations in a pitcher’s strikeout rate, because even strikeouts have some luck built into them, too.

An xK% metric is not a revolutionary idea. Mike Podhorzer over at FanGraphs created one last year, but he catered it to hitters. Still, it’s nothing too wild and crazy like WAR or SIERA or any other wacky acronym. (A wackronym, if you will.)

Courtesy of Baseball Reference, I constructed a set of pitching data spanning 2010 through 2014. I focused primarily on what I thought would correlate highly with strikeout rates: looking strikes, swinging strikes and foul-ball strikes, all as a percentage of total strikes thrown. I didn’t want the model specification to be too close to a definition, so it’s beneficial that these rates are on a per-strike, rather than per-pitch, basis.

The graph plots actual strikeout rates versus expected strikeout rates with the line of best fit running through it. I ran my regression using the specification above and produced the following equation:

xK% = -.6284293 + 1.195018*lookstr + 1.517088*swingstr + .9505775*foulstr
R-squared = .9026

The R-squared term can, for ease of understanding, be interpreted as how well the model fits the data, from 0 to 1. An R-squared, then, of .9026 represents approximately a 90-percent fit. In other words, these three variables are able to explain 90 percent of the variation in strikeout rate. (The remaining 10 percent is, for now, a mystery!)
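A minimal sketch of the xK% equation as a function, in case it helps. The name and the sample inputs are hypothetical; the three inputs are looking, swinging and foul-ball strikes as shares of total strikes thrown (decimals), matching the per-strike basis described above.

```python
def expected_k_rate(lookstr, swingstr, foulstr):
    """xK% as a decimal from per-strike looking, swinging and foul-ball rates."""
    return -0.6284293 + 1.195018 * lookstr + 1.517088 * swingstr + 0.9505775 * foulstr


if __name__ == "__main__":
    # Made-up pitcher: 28% looking, 16% swinging, 28% foul-ball strikes.
    xk = expected_k_rate(lookstr=0.28, swingstr=0.16, foulstr=0.28)
    print(f"xK%: {xk:.1%}")
    print(f"diff% vs. a 23% actual strikeout rate: {0.23 - xk:+.1%}")
```

Swinging strikes carry the largest coefficient, so a point of swinging-strike rate moves xK% more than a point of either of the other two.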

In order for the reader to use this equation to his or her own benefit, one would insert a pitcher’s looking strike, swinging strike and foul-ball strike percentages into the appropriate variables. Fortunately, I already took the initiative. I applied the results to the same data I used: all individual qualified seasons by starting pitchers from 2010 through 2014.

The results have interesting implications. Firstly, one can see how lucky or unlucky a pitcher was in a particular season. Secondly, and perhaps most importantly, one can easily identify which pitchers habitually over- and under-perform relative to their xK%. Lastly, you can see how each pitcher is trending over time. Every pitcher is different; although the formula will fit most ordinary pitchers, it goes without saying that the aces of your fantasy squad are far from ordinary, and they should be treated on an individual basis.

(Keep in mind that a lot of these players only have one or two years’ worth of data (as indicated by “# Years”), so the average difference between their xK% and K% as a representation of a pitcher’s true skill will be largely unreliable.)

It is immediately evident: the game’s best pitchers outperform their xK% by the largest margins. Cliff Lee, Stephen Strasburg, Clayton Kershaw, Felix Hernandez and Adam Wainwright are all top-10 (or at least top-15) fantasy starters. But let’s look at their numbers over the years, along with a few others at the top of the list.

Kershaw and King Felix have not only been consistent but also look like they’re getting better with age. Wainwright’s difference between 2013 and 2014 is a bit of a concern; he’s getting older, and this could be a concrete indicator that perhaps the decline has officially begun. Darvish’s line is interesting, too: you may or may not remember that he had a massive spike in strikeouts in 2013 compared to his already-elite strikeout rate the prior year. As you can see, it was totally legit, at least according to xK%. But for some reason, even xK% can fluctuate wildly from year to year. I see it in the data, anecdotally: Anibal Sanchez’s huge 6.7-percent spike in xK% from 2012 to 2013 was followed by a 5.5-percent drop from 2013 to 2014. Conversely, David Price’s 5-percent decrease in xK% from 2012 to 2013 was followed by an almost perfectly-equal 5-percent increase from 2013 to 2014. So the phenomenon seems to work both ways. Thus, perhaps it shouldn’t have come as a surprise when Darvish couldn’t repeat his 2013 success. To the baseball world’s collective dismay, we simply didn’t have enough data yet to determine which Yu was the true Yu. I plan to do some research to see how often these severe spikes in xK% are mere aberrations versus how often they are sustained over time, indicating a legitimate skills improvement.

I have also done my best to compile a list of players with only one or two years’ worth of data who saw sizable spikes and drops in their K% minus xK% (“diff%”). The idea is to find players for whom we can’t really tell how much better (or worse) their actual K% is compared to their xK% because of conflicting data points. For example, will Corey Kluber be a guy who massively outperforms his xK% as he did in 2014, or does he only slightly outperform as he did in 2013? I present the list not to provide an answer but to posit: Which version of each of these players is more truthful? I guess we will know sometime in October.

Name: [2013 diff%, 2014 diff%]

And here are some fantasy-relevant guys with only data from 2014:

On Dee Gordon’s breakout, and what to expect rest-of-season

Let’s be honest: Did anyone see Los Angeles Dodgers shortstop Dee Gordon’s breakout coming? No. Not one person. It was fair to say he could hold his own, maybe fight off Cuban import Alexander Guerrero for a month or two. But Gordon, who hit .229 across 2012 and 2013, did not really give any indication that he’d be this valuable.

So I want to amend the question. Rather than did anyone see it coming, could anyone see it coming? Perhaps the answer is yes.

His first year in the majors was, by most measures, pretty successful. A 23-year-old Gordon batted .304 with 24 stolen bases in 56 games. It’s no wonder why people have hoped for Gordon to break out and have been wildly disappointed in his failure to do so. Leading up to 2014, his strikeout rate skyrocketed from 11.6 to 19.8 percent, and his low batting average on balls in play (BABIP) relative to other speedsters coupled with an absolute lack of power made for poor batting and on-base rates.

Fast-forward to 2014, and Gordon has shaved his strikeout rate by 4.5 percent, a huge margin. Meanwhile, his BABIP is way up — at .378, I can tell you without looking that it’s one of the highest in Major League Baseball. Thing is, he’s a guy with enough speed to make it work, especially if he keeps racking up hits on bunts and balls in the infield. When I say “make it work,” though, I simply mean he will maintain an above-average BABIP, maybe in the .325-.335 range, rather than stay lofted in the .370s.

Meanwhile, the steals… Oh, man, the steals. They are legit, people. It’s hard to believe that he’s stealing in almost half of his opportunities, but he is. I thought, maybe the guy is getting lucky with the number of stolen base opportunities relative to all other baserunners. According to Baseball Reference, the average baserunner has the next base available to him about 37 percent of the time. So Gordon must have, like, a rate north of 40 or even 50 percent, right? Nay, squire — Gordon has had an open base before him only 33 percent of the time.

I want to do two things, now: predict Gordon’s end-of-season stats, and predict his rest-of-season stats. Without further ado:

Revised end-of-season 2014 projection: .276/.316/.352, 82 R, 2 HR, 42 RBI, 81 SB (156 games)

That’s right, folks. This is the Billy Hamilton you were looking for. It’s important to note that I project him for 156 games, but there’s a possibility that if he falls into a deep funk, Guerrero could usurp Gordon’s role. Worse, Guerrero could do so before a slump even hits, given the $28 million the Dodgers are now watching waste away in Triple-A.

As much as it is important to see Gordon’s end-of-year stat line, it’s the rest-of-year stats that truly matter most, especially if you’re trying to decide whether to sell high on the guy or simply hang tight.

Rest-of-season projection: .261/.300/.326, 58 R, 1 HR, 31 RBI, 57 SB (118 games)

Bottom line: he’s worth his weight in gold based solely on his steals. But a .261 batting average and .300 on-base percentage don’t bode especially well for his high runs tally as well as the frequency at which he will be able to steal bases. With almost a steal every other game, though, you’re nitpicking if you are complaining about a few percentage points of OBP affecting his steals.

Although I just trivialized OBP, it is worth monitoring his decline, because it will happen — trust me. Dee Gordon is not a .322 hitter, let alone a .300 hitter. He may be able to luck his way to a nice batting average, though, with a few more bunt base-hits here and there.

Overall, though, he is still not a great hitter and doesn’t get on base as much as you’d like to make him much more than a one-category player. If you’ve already staked yourself to a massive lead in steals, I’d sell high — although when I say high, I mean really high. Fifty-seven swipes in a rotisserie league is incredibly valuable. My main roto league has 80 percent more home runs than steals — that is, a home run is worth about 55 percent of every steal. Now, that’s not to say Gordon is worth a guy who can hit 103 home runs, because 1) that’s impossible, 2) Gordon simply doesn’t contribute in many categories other than maybe runs, and 3) he’s no guarantee to finish out the year at second base. But you could probably get a really solid, well-rounded top pick (read: top-50 player) for Gordon in a trade today — maybe better.
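The steals-to-homers conversion above is simple scarcity arithmetic. Here it is as a sketch, assuming (as the paragraph does) that the exchange rate is just the league-wide ratio of home runs to steals; the names are mine.

```python
HR_PER_SB = 1.8  # league has 80 percent more home runs than steals


def steals_in_home_runs(steals):
    """Home-run total with roughly the same category scarcity as these steals."""
    return steals * HR_PER_SB


def home_run_value_in_steals():
    """How much of a steal one home run is worth under the same ratio."""
    return 1 / HR_PER_SB


if __name__ == "__main__":
    print(f"57 SB ~ {steals_in_home_runs(57):.0f} HR")
    print(f"1 HR ~ {home_run_value_in_steals():.1%} of a steal")
```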

When it’s all said and done, I think Gordon could finish as high as top-30 on the ESPN Player Rater if he can last an entire season with Guerrero breathing down his neck. And I would hold on to him until I observe a sizable downward trend in his on-base abilities midseason.

Pitchers to sell high, buy low or cut bait

All right. It’s April. It’s horrifying, unless you’re doing well, and then it’s not. But, full disclosure, I’m not. Chicago White Sox staff ace Chris Sale just hit the 15-day disabled list yesterday, joining the Philadelphia Phillies’ Cole Hamels, Seattle Mariners’ James Paxton, Tampa Bay Rays’ Alex Cobb, Cincinnati Reds’ Mat Latos, New York Yankees’ David Robertson and the Detroit Tigers’ Doug Fister on my teams’ DLs. It’s killing me, really. It’s incredibly painful.

What I’m saying is I’ve spent more time than I’d care to admit frolicking in free agency, trying to figure out which early-season studs are legit or not. I’ve been pondering various buy-low situations as well. So I jumped into a pool of peripherals and PITCHf/x data to look for answers.

The list below is not remotely exhaustive. It’s mostly players I am watching or already using as replacements for my teams. Here they are, in no particular order.

Jake Peavy, BOS | 0-0, 3.33 ERA, 1.48 WHIP, 9.25 K/9
Peavy’s prime came and went about five years ago, so, full disclosure, I don’t know as much about him off the top of my head as I should. But I do know one thing: he doesn’t strike out a batter per inning anymore. In his defense, batters’ contact rate against him is the best it has been since 2009, his last truly good year. So maybe he will strike out a few more batters than last year, but I think it’ll be closer to 2012’s 7.97 K/9, not 2009’s 9.74 K/9. The WHIP is atrocious; the walk rate is through the roof. If there’s a guy in your league who will pay for what will end up being the illusion of ERA and strikeouts, by all means, trade him. He’s owned in 100 percent of leagues but doesn’t deserve to be.
Verdict: Sell high

John Lackey, BOS | 2-2, 5.25 ERA, 1.46 WHIP, 8.63 K/9
Another Boston pitcher, another bad start to the season. I like Lackey a lot more, though, for a variety of reasons. One, last year’s renaissance was legitimate. Two, he’s not walking many batters right now, so his unspectacular ratios are more a result of an unlucky batting average on balls in play (.333 BABIP) than incompetence. Three, his swinging strike and contact rates are currently career bests. Again, we’re working with small sample sizes here, and this could easily regress. But considering his velocity is also at a career high, I don’t find it improbable that Lackey actually does better than he did last season. If an owner in your league has already dropped him, put in your waiver claim now.
Verdict: Buy low

Jesse Chavez, OAK | 1-0, 1.38 ERA, 0.92 WHIP, 9.69 K/9
Talk about unexpected. Chavez, who has been relevant about zero times, is making for an intriguing play in all leagues. It’s a given he will regress, especially considering the .242 BABIP, but his improved walk rate could be here to stay, as he is pounding the zone more than he ever has in his career. The strikeouts are somewhat of a mirage, but it looks like he can be a low-WHIP, moderate-strikeout guy, and that’s still valuable.
Verdict: Sell really high, or just ride the hot hand

Nathan Eovaldi, MIA | 1-1, 3.55 ERA, 1.14 WHIP, 8.17 K/9
I wouldn’t call Eovaldi a trendy sleeper, but he certainly was a sleeper coming into 2014. It was all about whether he could command his pitches better — and, like magic, it appears he has, walking only 1.07 batters per nine innings as opposed to 3.39 per nine last year. The swinging strike and contact rates are concerning, as they are the worst of his career, so it’s hard to see his strikeout rate going anywhere but down. However, he’s throwing 65 percent of his pitches in the strike zone, the highest rate among all qualified pitchers. So there are two ways to look at this. His control has probably legitimately improved. Unfortunately, even the masterful Cliff Lee threw only 53.3 percent of his pitches in the zone last year, and I am hesitant to claim Eovaldi has better control than Lee. This could be a “breakout” year of sorts for Eovaldi, but I’m using that term liberally here. He’s only owned in 20.5 percent of leagues, so this makes him more of a ride-the-hot-hand type, like Mr. Chavez above.
Verdict: Eventually drop, ideally before he does damage to your team

Mark Buehrle, TOR | 4-0, 0.64 ERA, 0.93 WHIP, 6.11 K/9
Look, I have had a long-standing man crush on Buehrle, but this is ridiculous. You know better than I that these happy dreams will soon become nightmares, not because Buehrle is awful or anything, but because regression rears its head in occasionally very brutal ways.
Verdict: Sell high

Alfredo Simon, CIN | 0.86 ERA, 0.81 WHIP, 5.57 K/9
Something isn’t right here. A 0.81 WHIP and… fewer than six strikeouts per nine innings? As you become more familiar with sabermetrics, you quickly realize certain things don’t mesh. A low WHIP combined with the low strikeout rate is one of those things. I can tell you without looking that his BABIP is impossibly low — and, now looking, I see I’m right: it’s .197. Tristan H. Cockcroft of ESPN is all about Simon, and in his defense, Simon’s PITCHf/x data foreshadows some positive regression coming his way in the strikeout department. But it can only get worse from here for Simon. However, I think he has a bit of a Dan Straily look to him, and that’s certainly serviceable.
Verdict: Sell high, or just ride the hot hand

Yovani Gallardo, MIL | 1.46 ERA, 1.09 WHIP, 6.93 K/9
This is a disaster waiting to happen. Like Simon, his strikeout rate is low, but for Gallardo, it is deservedly so: his swinging strike and contact rates are, by far, career worsts. Meanwhile, his ratios are buoyed by a .264 BABIP and 89.8% LOB% (left-on-base percentage), despite his 74.7% career LOB%. The Brewers will fall with him. Sell high, and sell fast.
Verdict: Sell high

Shelby Miller, STL | 3.57 ERA, 1.50 WHIP, 8.34 K/9
Miller is the first pitcher on this list in whom owners actually invested a lot. Be patient. The 98.3 percent of owners who didn’t cut bait before his last start were surely rewarded. I imagine he’s leaving his pitches up in the zone, given his increased percentage of pitches thrown in the zone coupled with his home run rate. Speaking of which, he shouldn’t be walking five batters per nine innings when he’s throwing more than 50 percent of his pitches in the zone. He’ll be fine.
Verdict: Buy low

Homer Bailey, CIN | 5.75 ERA, 1.87 WHIP, 11.07 K/9
Two words: .421 BABIP. Yowza. Again, owners invested way too much in this guy. Perfect buy-low opportunity here if you know your fellow owner is impatient.
Verdict: Buy low

Drew Hutchison, TOR | 3.60 ERA, 1.45 WHIP, 10.80 K/9
I’ll be honest, I was surprised to see Hutchison’s xFIP stand at 3.43. It seems like he has been much worse — but has he really? The walks are problematic but not unmanageable (see: Matt Moore), and they’ve actually shored up a bit in his last couple of starts. Moreover, he is still striking out batters at an elite rate, and the PITCHf/x data supports his success, albeit probably not with quite as much success as he’s having now. As for the WHIP? A .365 BABIP sure doesn’t help. Hutchison was once a highly-touted prospect. Your window of opportunity to gamble on this live arm may be closing if he can keep his ERA down.
Verdict: Add via free agency, sooner rather than later

Belt, Trumbo, home runs, and knowing when to sell high

San Francisco Giants first baseman Brandon Belt will never be more valuable than he is now. Many expected his breakout, and it seems those who invested in the late bloomer will be rewarded handsomely, depending on how much they paid for him or in which round they drafted him. He is tied for the MLB lead in home runs (5) with Arizona Diamondbacks outfielder Mark Trumbo, a free-swinging, powerful fella. Those are important words, because that is exactly what Belt has been so far.

The sample size is very small — 35 plate appearances — but the statistics are telling: He has 10 strikeouts and zero walks. Meanwhile, Belt is batting .343, which is buoyed by a .350 batting average on balls in play (BAbip). Savvy readers will be quick to point out that his 2012 and 2013 BAbips were both .351, so perhaps that’s his baseline. And it’s possible. But that would be his saving grace. If his BAbip fell to a league-average level around .300, we’re looking at Trumbo numbers, or maybe even (Pittsburgh Pirates third baseman) Pedro Alvarez numbers.
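To see how much of that average rides on BAbip, here is a minimal sketch of the arithmetic. It assumes all 35 plate appearances were at-bats (he has no walks; hit-by-pitches, sacrifices and sacrifice flies are assumed to be zero, which the numbers above do not confirm), and it uses the standard BAbip definition, (H - HR) / (AB - K - HR + SF). The function name is purely illustrative.

```python
def average_from_babip(babip, at_bats, strikeouts, home_runs, sac_flies=0):
    """Batting average implied by a given BAbip, holding K and HR fixed.

    Rearranges BAbip = (H - HR) / (AB - K - HR + SF) to solve for hits.
    """
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    hits = home_runs + babip * balls_in_play
    return hits / at_bats


# Belt so far: 35 PA, 10 K, 5 HR, 0 BB. AB = 35 is an ASSUMPTION (see above).
AT_BATS, STRIKEOUTS, HOME_RUNS = 35, 10, 5

current = average_from_babip(0.350, AT_BATS, STRIKEOUTS, HOME_RUNS)
league_average_luck = average_from_babip(0.300, AT_BATS, STRIKEOUTS, HOME_RUNS)

print(f"AVG at .350 BAbip: {current:.3f}")
print(f"AVG at .300 BAbip: {league_average_luck:.3f}")
```

Under those assumptions the first figure lines up with the .343 average quoted above. The second holds the home run pace fixed, so it flatters him: five homers in 35 at-bats is not a pace anyone keeps.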

It’s realistic to think he will walk a little more and strike out a little less. His fly ball rate is conducive to home runs given his power, but it’s unrealistic to think he will hit a third of all fly balls out of the park. That’s territory reserved for, well, no one. Only a dozen batters hit 15 percent of fly balls as home runs (15% HR/FB), all of them fabled power hitters. Even Toronto Blue Jays first baseman Edwin Encarnacion and Boston Red Sox designated hitter David Ortiz notched HR/FB rates of 14.0 percent and 12.6 percent, respectively.

I think projecting a HR/FB rate of 13 percent is fair, and it would afford him 30 to 35 home runs for the season — a tremendous performance, indeed. But the batting average is bound to plummet (not that it took a rocket scientist to know he can’t sustain a .343 batting average), and how far it falls depends entirely on his plate discipline and whether or not his BAbip is actually real. Today’s power hitters have pretty polarized BAbips, and it mostly comes down to their plate discipline: Ortiz, Detroit Tigers first baseman Miguel Cabrera, Los Angeles Angels of Anaheim outfielder Mike Trout and Diamondbacks first baseman Paul Goldschmidt all struck out in, at most, 20 percent of plate appearances last year, and all of them posted BAbips above .320. Meanwhile, Alvarez, Oakland Athletics first baseman Brandon Moss, New York Yankees outfielder Alfonso Soriano, and Chicago White Sox designated hitter Adam Dunn all struck out in at least 25 percent of plate appearances, and only Moss posted a BAbip above .300 (fun fact: it was .301).
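The home run projection at the top of that paragraph is just fly balls multiplied by HR/FB, so it can be run backwards to see how many fly balls the 30-to-35 range implies. This is a rough sketch with illustrative function names; the fly-ball totals it prints are derived from the projection, not pulled from Belt’s actual batted-ball data.

```python
def projected_home_runs(fly_balls, hr_per_fb):
    """Home runs expected from a fly-ball total at a given HR/FB rate."""
    return fly_balls * hr_per_fb


def implied_fly_balls(home_runs, hr_per_fb):
    """Fly balls needed to reach a home run total at a given HR/FB rate."""
    return home_runs / hr_per_fb


HR_PER_FB = 0.13  # the 13 percent projection from the text

low = implied_fly_balls(30, HR_PER_FB)
high = implied_fly_balls(35, HR_PER_FB)
print(f"Implied fly balls: {low:.0f} to {high:.0f}")
```

In other words, the projection only holds if he puts roughly 230 to 270 balls in the air over the season.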

It’s possible that Belt is a unique breed of hitter that can strike out a lot and hit for a high batting average on balls in play, and it’s certainly possible he sustains it for the rest of the season. But strikeout-prone power hitters tend to be batting average liabilities — one of the reasons why Baltimore Orioles first baseman Chris Davis is, I think, due for some heavy batting average regression.

This has all been a long-winded way of me saying: Belt’s batting average will regress to the mean, but it’s impossible to know whether he’ll end up hitting .295 or .245. Even somewhere in the middle means it’s a long way to fall for Belt.

I would absolutely sell high on Belt, depending on the format. If I’m in a dynasty league, or I can keep him next year at a discount, then I would be inclined to keep him. But if I owned him and had the opportunity to swipe Cincinnati Reds outfielder Jay Bruce from a panicked owner, I would pull the trigger. Bruce will probably hit more home runs the rest of the way, and his batting average will only trend upward while Belt’s trends downward.

When it comes down to it, I think Belt will hit about .275 and end up with 32 home runs. But I also think the possibility of him pulling a Justin Upton or Domonic Brown circa 2013, during which both players hit 12 home runs in one month and slept the rest of the year, is very real.

——————-

Meanwhile, Trumbo has also hit five home runs. This isn’t anything new from him, although the frequency and earliness of the bombs are surely delightful for owners. It’s worth keeping in mind that Trumbo hit no fewer than five home runs and no more than seven in any given month last year. It’s possible he surpasses his monthly high from last year by next week, but it’s also worth noting he hit seven, nine and eight home runs in May through July of 2012, only to go cold in the other three months. Every player has ups and downs, and I would be wary that such a high in April will lead to, say, an equally low August, as he regresses to the mean.

It probably sounds like I’m super down on these guys, but I’m not. I swear! It’s just that a smart fantasy owner knows when to sell high and buy low, and even Trumbo can be a sell-high candidate. He will probably also hit 32 home runs, just like Belt, but if you can somehow trade him for a slow-to-start Encarnacion, who has the potential to hit 40 bombs, I would again pull the trigger. That could be as many as eight more home runs than you would have gotten had you kept Trumbo all year, and Encarnacion will hit for a better average in the long run.

Other home run leaders, per ESPN’s MLB home page: Blue Jays outfielders Melky Cabrera and Jose Bautista (both at 4), Tigers outfielder Torii Hunter (3), White Sox outfielder Alejandro De Aza (3), Milwaukee Brewers outfielder Ryan Braun (3), and Colorado Rockies outfielder Carlos Gonzalez (3). Bautista, Braun and Gonzalez are legit. Cabrera is not legit, but that’s not to say he doesn’t have power. I projected him for 14 home runs and 11 stolen bases, but at this point I think he’s well on his way to a 15/15 season supplemented by a .280 batting average at the top of Toronto’s batting order. De Aza and Hunter also have pop, but they are not noteworthy hitters — go ahead and sell high, but they are still valuable commodities otherwise.

Pitchers due for strikeout regression using PITCHf/x data

If FanGraphs were a home, or a hotel, or even a tent, I’d live there. I would swim in its oceans of data, lounge in its pools of metrics.

It houses a slew of PITCHf/x data — the numbers collected by the systems installed in all MLB ballparks that measure the frequency, velocity and movement of every pitch by every pitcher. It’s pretty astounding, but it’s also difficult for the untrained eye to make something of the numbers aside from tracking the declining velocities of CC Sabathia’s and Yovani Gallardo’s fastballs.

I used linear regression to see how a pitcher’s contact, swinging strike and other measurable rates affect his strikeout percentage, and how that translates to strikeouts per nine innings (K/9). Ultimately, the model spits out a formula to generate an expected K/9 for a pitcher. I pulled data from FanGraphs covering all qualified pitchers from the last four years (2010 through 2013).

The idea is this: A pitcher who can miss more bats will strike out more batters. FanGraphs’ “Contact %” statistic illustrates this, where a lower contact rate is better. Similarly, a pitcher who can generate more swinging strikes (“SwStr %”) is more likely to strike out batters.

Using this theory coupled with the aforementioned data, I “corrected” the K/9 rates of all 2013 pitchers who notched at least 100 innings. Instead of detailing the full results, here are the largest differentials between expected and actual K/9 rates. (I will list only pitchers I deem fantasy relevant.)
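For anyone who wants to try something similar, here is a minimal sketch of that kind of regression in plain Python. It is not the actual model: the exact variables and coefficients aren’t given here, so the pitcher-seasons below are made up, the relationship between them is invented, and the 4.2 batters faced per inning used to turn K% into K/9 is an assumed placeholder. All function names are illustrative.

```python
def fit_ols(rows, targets):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend a column of ones for the intercept
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [rv - factor * cv for rv, cv in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def predict(coefs, row):
    """Fitted value for one observation."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def k_pct_to_k9(k_pct, batters_per_inning):
    """K/9 = 9 * (K / batters faced) * (batters faced / IP); k_pct is in percent."""
    return 9 * (k_pct / 100) * batters_per_inning


# HYPOTHETICAL pitcher-seasons: (Contact %, SwStr %). Not real data.
features = [(80.0, 8.0), (75.0, 11.0), (82.0, 7.0), (78.0, 10.0), (70.0, 12.0)]
# Made-up relationship, used only so the fit has something to recover.
k_pcts = [30 - 0.2 * contact + 1.0 * swstr for contact, swstr in features]

coefs = fit_ols(features, k_pcts)
expected_k_pct = predict(coefs, (77.0, 9.5))
expected_k9 = k_pct_to_k9(expected_k_pct, batters_per_inning=4.2)  # 4.2 is assumed
print([round(c, 3) for c in coefs], round(expected_k9, 2))
```

With real data you would swap in each qualified pitcher’s rates and actual K%, and likely lean on a stats package rather than hand-rolled elimination; the shape of the calculation is the same.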

Largest positive differential (Name — expected K/9 – actual K/9 = +/- change):

  1. Martin Perez — 7.77 – 6.08 = +1.69
  2. Jarrod Parker — 7.74 – 6.12 = +1.62
  3. Dan Straily — 8.63 – 7.33 = +1.30
  4. Jered Weaver — 8.09 – 6.82 = +1.27
  5. Hiroki Kuroda — 7.93 – 6.71 = +1.22
  6. Kris Medlen — 8.38 – 7.17 = +1.21
  7. Francisco Liriano — 10.31 – 9.11 = +1.20
  8. Ervin Santana — 8.06 – 6.87 = +1.19
  9. Ricky Nolasco — 8.47 – 7.45 = +1.02
  10. Tim Hudson — 7.42 – 6.51 = +0.91

Largest negative differential:

  1. Tony Cingrani — 8.15 – 10.32 = -2.17
  2. Ubaldo Jimenez — 7.68 – 9.56 = -1.88
  3. Cliff Lee — 7.11 – 8.97 = -1.86
  4. Jose Fernandez — 8.15 – 9.75 = -1.60
  5. Shelby Miller — 7.20 – 8.78 = -1.58
  6. Scott Kazmir — 7.71 – 9.23 = -1.52
  7. Yu Darvish — 10.41 – 11.89 = -1.48
  8. Lance Lynn — 7.58 – 8.84 = -1.26
  9. Justin Masterson — 7.84 – 9.09 = -1.25
  10. Chris Tillman — 6.60 – 7.81 = -1.21

There’s a lot to digest here, so I’ll break it down. It appears Perez was the unluckiest pitcher last year, of the ones who qualified for the study, notching almost 1.7 fewer strikeouts per nine innings than he would be expected to, given the rate of whiffs he induced. Conversely, rookie sensation Cingrani notched almost 2.2 more strikeouts per nine innings than expected.
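The differentials are nothing fancier than expected K/9 minus actual K/9, sorted. A quick sketch using a handful of rows copied from the two lists (the variable and function names are just illustrative):

```python
# (name, expected K/9, actual K/9), copied from the lists above.
PITCHERS = [
    ("Martin Perez", 7.77, 6.08),
    ("Jarrod Parker", 7.74, 6.12),
    ("Dan Straily", 8.63, 7.33),
    ("Francisco Liriano", 10.31, 9.11),
    ("Yu Darvish", 10.41, 11.89),
    ("Cliff Lee", 7.11, 8.97),
    ("Ubaldo Jimenez", 7.68, 9.56),
    ("Tony Cingrani", 8.15, 10.32),
]


def differential(expected, actual):
    """Positive means the pitcher struck out fewer batters than expected."""
    return round(expected - actual, 2)


# Most "unlucky" first, most "lucky" last.
ranked = sorted(
    ((name, differential(expected, actual)) for name, expected, actual in PITCHERS),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, diff in ranked:
    print(f"{name:<20}{diff:+.2f}")
```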

There is a caveat. I was not able to account for facets of pitching such as a pitcher’s ability to hide the ball well, or his tendency to draw strikes-looking. With that said, a majority of the so-called lucky ones are pitchers who, in 2013, experienced a breakout (Cingrani, Fernandez, Miller, Darvish, Masterson, Tillman) or a renaissance (Jimenez, Kazmir, Masterson — whoa, all Cleveland pitchers). Is it possible these pitchers can all repeat their performances — especially the ones who have disappointed us for years? Perhaps not.

(Update, Jan. 24: Cliff Lee’s mark of -1.86 is, amazingly, not unusual for him. Over the last four years, the average difference between his expected and actual K/9 rates is … drum roll … -1.88. Insane!)

Darvish and Liriano were in a league of their own in terms of inducing swings and misses, notching almost 30 percent each. (Anibal Sanchez was third-best with 27 percent. The average is about 21 percent.) However, Darvish recorded 2.78 more K/9 than Liriano. Is there any rhyme or reason to that? Darvish is, without much argument, the better pitcher — but is he that much better? I don’t think so. Darvish was expected to notch 10.41 K/9 given his contact rate. Any idea what his 2012 K/9 rate was? Incredibly: 10.40 K/9.

More big names produced equally interesting results. King Felix Hernandez recorded a career-best 9.51 K/9, but he was expected to produce something closer to 8.57 K/9. His rate the previous three years? 8.52 K/9.

Dan Haren didn’t produce much in the way of ERA in 2013, but he did see a much-needed spike in his strikeout rate, jumping above 8 K/9 for the first time since 2010. His expected 7.07 K/9 says otherwise, though, and it fits perfectly with how his K/9 rate was trending: 7.25 K/9 in 2011, 7.23 K/9 in 2012.

I think my models tend to exaggerate the more extreme results (most of which are noted in the lists above) because they could not account for intangibles in a player’s natural talent. However, they could prove to be excellent indicators of who’s due for regression.

Only time will tell. Maybe Jose Fernandez isn’t the elite pitcher we already think he is — not yet, at least.

————

Notes: The data almost replicates a normal distribution, with 98 of the 145 observations (67.6 percent) falling within one standard deviation (1.09 K/9) of the mean value (7.19 K/9), and 140 of 145 (96.6 percent) falling within two standard deviations. The median value is 7.27 K/9, indicating the distribution is very slightly skewed left.
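As a quick sanity check on those notes, the sketch below compares the observed shares against what a true normal distribution would put within one and two standard deviations, using only the counts and summary figures quoted above. The theoretical shares come from the error function; the names are illustrative.

```python
import math


def share_within(z):
    """Share of a normal distribution within z standard deviations of the mean."""
    return math.erf(z / math.sqrt(2))


TOTAL = 145
observed = {1: 98 / TOTAL, 2: 140 / TOTAL}

for z, share in observed.items():
    print(f"within {z} SD: observed {share:.1%}, normal {share_within(z):.1%}")

# A mean below the median points to a longer left tail.
MEAN, MEDIAN = 7.19, 7.27
skew_direction = "left" if MEAN < MEDIAN else "right"
print(f"skew: {skew_direction}")
```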