# Predicting HR/FB rates for hitters using weighted pitch values

(If you care only about results and not about the process, scroll down to the section aptly titled HERE YA GO.)

Victor Martinez indirectly and semi-strangely inspired this post. I was browsing FanGraphs’ weighted pitch values for hitters for 2014 (something I hadn’t done before, as I’ve really only used the metric for pitchers), and my thought process went something like this:

Jose Abreu feasted on fastballs; V-Mart feasted on sliders… Wow, V-Mart actually fared better against sliders and curveballs than fastballs and cutters. I wonder if that has any correlation with his plate discipline.

In short: no. A hitter’s success versus pitches according to weighted pitch values (per 100 of that pitch) explains about 40 percent of the variation in his walk rate and barely 4 percent of the variation in his strikeout rate. (I’m ballparking it on the K% figure.)

But I got to thinking a little more: these weighted pitch values have to be good for something other than scouting hitters (which, moving forward, maybe we start throwing Abreu some more offspeed stuff? I don’t know).

Alas, I took a crack at it: I tested the correlation between weighted pitch values and home runs per fly ball (HR/FB) rates. And I was very pleasantly surprised.

Let’s start with context. Each player, very obviously, records his own HR/FB rate each year. Players with more power will record higher HR/FB rates, and players with less power will record lower rates. Therefore, each player, in a sense, creates his own benchmark (which, arguably, is his career HR/FB rate: he hits this many home runs as a percentage of fly balls on average). However, we know that HR/FB fluctuates annually: a player with a 15% career-HR/FB does not hit exactly 15 of every 100 fly balls over outfield walls every season like clockwork. Still, there is an expectation that he will hit a certain number of them out — hence, the benchmark.

Using regression analysis, the idea of the benchmark can be captured by seeing how, say, 2014’s HR/FB rates correlate with 2013’s rate, as well as 2012’s, 2011’s and so on. I downloaded all available ball-in-play data for seasons by “qualified” hitter as separate seasons dating back to 2002, thereby representing an exhaustive list. The line of best fit looks as follows, where L1 represents the year prior, L2 two years prior, L3 three years prior:

x(HR/FB) = .018 + .321*L1.(HR/FB) + .252*L2.(HR/FB) + .228*L3.(HR/FB)
Between R-squared: .74

One might astutely observe that a player who hit exactly zero home runs the three previous years can still be expected to hit about 1.8 percent of his fly balls over the wall, and one might call to arms to force the intercept term to zero. It seems absurd, nay, impossible that a player who never hits home runs could be expected to suddenly hit one, but let us not forget we witnessed the impossible happen just last year. That’s what makes baseball a beautiful sport: anything can happen.
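To make the lagged equation concrete, here is a minimal Python sketch that simply evaluates it. The function name and the sample inputs are made up for illustration; only the coefficients come from the fit above, and rates are assumed to be entered as fractions.

```python
# Coefficients copied from the three-year lagged fit above.
INTERCEPT = 0.018
LAG_WEIGHTS = (0.321, 0.252, 0.228)  # L1, L2, L3


def expected_hr_fb_lagged(l1, l2, l3):
    """Evaluate the lagged line of best fit; rates are fractions (0.15 = 15%)."""
    lags = (l1, l2, l3)
    return INTERCEPT + sum(w * x for w, x in zip(LAG_WEIGHTS, lags))


# Hypothetical hitter who posted 15% in each of the last three seasons.
print(f"{expected_hr_fb_lagged(0.15, 0.15, 0.15):.1%}")
# A hitter with zero home runs in all three seasons still gets the intercept.
print(f"{expected_hr_fb_lagged(0.0, 0.0, 0.0):.1%}")
```

The second call is the zero-homer case described above: everything but the intercept drops out.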

Anyway, the equation above is actually really helpful in predicting expected HR/FB; its R-squared indicates the line explains almost three-quarters of the variation in HR/FB. It also gives the greatest weight to the most recent year, as measured by its coefficient, with declining weight as years become further removed, which makes sense. But… BUT.

It’s not helpful in predicting HR/FB for hitters who have been in the league fewer than three years. Moreover, it seems especially difficult to predict future HR/FB rates for hitters with only one year of data, such as the monstrous Abreu. (Maybe Abreu did inspire this post after all.) Observe:

x(HR/FB) = .032 + .694*L1.(HR/FB)

After a little bit of algebra, we can intuit that the equilibrium HR/FB rate is roughly 10.4 percent. I use the term “equilibrium” because it appears that no matter what HR/FB a hitter posted in his first career season, his next-year HR/FB will be expected to converge (aka regress) toward the magical number of 10.4 percent. Again, observe:

.032 + .694*(12%) = 11.5%
.032 + .694*(8%) = 8.7%

You can perform this exercise with any value, and the results will be the same: a 2014 HR/FB rate lower than ~10.4 percent will be expected to increase in 2015, and a rate higher than ~10.4 percent will be expected to decrease in 2015. Now this, this, is actually absurd. Granted, the equation is communicating what would happen on average, but hitters are not homogeneous.
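The "little bit of algebra" is just solving x = .032 + .694x for x. A quick Python sketch of that step, using the rounded coefficients printed above (the function names are mine, and the result may differ slightly from the ~10.4 percent quoted because of that rounding):

```python
# Coefficients copied from the one-year lagged fit above.
INTERCEPT = 0.032
SLOPE = 0.694


def next_year_hr_fb(last_year):
    """Expected HR/FB next season given last season's rate (as a fraction)."""
    return INTERCEPT + SLOPE * last_year


def equilibrium_hr_fb():
    """Rate that maps to itself: solve x = INTERCEPT + SLOPE * x."""
    return INTERCEPT / (1 - SLOPE)


eq = equilibrium_hr_fb()
for rate in (0.08, 0.12, 0.27):
    direction = "up" if next_year_hr_fb(rate) > rate else "down"
    print(f"{rate:.1%} -> {next_year_hr_fb(rate):.1%} ({direction})")
print(f"equilibrium: {eq:.2%}")
```

Any starting rate below the equilibrium is nudged up and any rate above it is nudged down, which is exactly the one-size-fits-all behavior complained about here.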

This is all a very long-winded way of saying two things:

1) When the sample is incredibly small — namely, one observation — using history as a guide fails us.
2) I think I may have found an alternative that relies not on a single year’s worth of HR/FB data but on a single year’s worth of weighted pitch value data.

## HERE YA GO

Let me be clear, up front: I know there will be a lot of multicollinearity inherent in this analysis — that is, HR/FB and weighted pitch values are dependent on each other in some fashion. I don’t know how weighted pitch values are calculated exactly — it would behoove me to look it up, but I am lazy, a current self-descriptor of which I am not proud — but, intuitively, a hitter who hits home runs more frequently off of particular pitch types will likely record higher weighted values for those pitches. Essentially, the weighted values are calculated using home run frequency, and I am now trying to reverse-engineer it.

But I don’t see that as a bad thing. There is a profound correlative capability in the data, and using that information to glean whether or not a hitter was, perhaps, a bit lucky when it came to his HR/FB frequency is, I hope, less preposterous than pulling a number out of your rear-end.

## HERE YA GO, FOR REALSIES

I will use strictly weighted pitch values per 100 pitches (denoted wXX/C, where XX represents the pitch abbreviation). I omit knuckleballs because not all players saw them, and I omit splitfingers because they are statistically insignificant, probably because they aren’t thrown very often, rendering the weighted pitch values more volatile. I also add K% and BABIP presuming the following: strikeout rates are positively correlated with HR/FB rates, and BABIP, which positively correlates with hard-hit balls such as line drives, is likely to also positively correlate with similarly-hard-hit balls such as home runs. (A regression that includes only weighted pitch values and excludes K% and BABIP produces an adjusted R-squared of .45.) The line of best fit equation is as follows:

x(HR/FB) = .2049 + .0352*(wFB/C) + .0081*(wSL/C) + .0014*(wCT/C) + .0041*(wCB/C) + .0063*(wCH/C) + .5244*K% - .6706*BABIP
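For anyone who wants to plug in their own numbers, here is a rough Python sketch of that equation. The coefficients are copied from the line above; the function name, the dictionary layout, and the sample stat line are hypothetical, and I am assuming K% and BABIP go in as fractions (0.20 and 0.300) rather than whole percentages.

```python
# Coefficients copied from the line of best fit above.
INTERCEPT = 0.2049
PITCH_COEFS = {"wFB/C": 0.0352, "wSL/C": 0.0081, "wCT/C": 0.0014,
               "wCB/C": 0.0041, "wCH/C": 0.0063}
K_COEF = 0.5244
BABIP_COEF = -0.6706


def expected_hr_fb(pitch_values, k_rate, babip):
    """xHR/FB as a fraction; k_rate and babip are assumed to be fractions."""
    pitch_term = sum(PITCH_COEFS[name] * pitch_values.get(name, 0.0)
                     for name in PITCH_COEFS)
    return INTERCEPT + pitch_term + K_COEF * k_rate + BABIP_COEF * babip


# Hypothetical stat line, not a real player's.
sample = {"wFB/C": 1.0, "wSL/C": 0.5, "wCT/C": 0.0, "wCB/C": -0.5, "wCH/C": 0.2}
print(f"{expected_hr_fb(sample, k_rate=0.20, babip=0.300):.1%}")
```

Subtracting this value from a hitter's actual HR/FB gives the "actual minus expected" differential used in the rest of the post.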

Again, the model produces a great line of best fit per its R-squared — almost identical to its lagged-variable counterpart. As it should; if there’s multicollinearity, it should. (And there is.) But reverse-engineering the process should create accurate predictions of what should have been a hitter’s HR/FB rate in a given season because of the multicollinearity; in this instance, it’s not a bad thing.

Some trends emerge instantly, trends similar to those I saw in the xK% and xBB% studies I performed earlier: regardless of a player’s power potential, he will over-perform or under-perform his expected HR/FB rate, and he will do so with consistency. For example, Adam LaRoche, despite his apparent power stroke, consistently under-performs his xHR/FB:

HR/FB: actual minus expected
2010: -1.89%
2012: -1.87%
2013: -1.77%
2014: -0.75%

Meanwhile, Albert Pujols consistently out-performs his xHR/FB:

2010: +2.02%
2011: +5.67%
2012: +1.80%
2014: +2.13%

Each data set has its noise, but you can see based on these limited samples where each hitter experienced a bit of luck: LaRoche, in 2014, saw a minor spike, and Pujols saw a major spike in 2011.
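One way to eyeball those spikes, sketched in Python: treat each hitter's median differential as his personal baseline and flag the season that strays farthest from it. The median baseline is my own shorthand for illustration, not part of the regression, and the helper name is made up.

```python
from statistics import median

# Actual-minus-expected HR/FB, in percentage points, as listed above.
DIFFERENTIALS = {
    "Adam LaRoche": {2010: -1.89, 2012: -1.87, 2013: -1.77, 2014: -0.75},
    "Albert Pujols": {2010: 2.02, 2011: 5.67, 2012: 1.80, 2014: 2.13},
}


def biggest_departure(by_year):
    """Return (year, gap) for the season farthest from the hitter's median."""
    baseline = median(by_year.values())
    year = max(by_year, key=lambda y: abs(by_year[y] - baseline))
    return year, by_year[year] - baseline


for name, by_year in DIFFERENTIALS.items():
    year, gap = biggest_departure(by_year)
    print(f"{name}: {year} ({gap:+.2f} points vs. his median differential)")
```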

Rather than going through each player individually, I will highlight a few extreme, fantasy-relevant outliers from 2014 and reflect accordingly. Without further ado (and in alphabetical order by first name):

This is the largest negative differential in the 2014 data. Without another full season of data to compare, this huge difference is likely a sign of bad luck, although there is a chance that he is a severe under-performer in the same vein as Matt Carpenter (who has under-performed his xHR/FB by about 7 percent the past two years). I already liked the guy for his speed and control of the strike zone, and the prospect of a pending power spike is enticing.

Coco Crisp, -5.78%
Crisp is a great case study: he notched a career-high 12.4-percent HR/FB in 2013, then promptly slid back down to single digits in 2014. His 2014 xHR/FB, however, indicates his HR/FB should have been closer to 11.5 percent, almost 6 percent higher than his actual mark and only 1.2 percent less than 2013. Meanwhile, his 2012 and 2013 expected and actual HR/FB rates are almost identical. His power-speed combination was pretty valuable two years ago — when he wasn’t on the disabled list, at least.

Curtis Granderson, -5.92%
Granderson bottomed out in woeful aplomb last year, but his xHR/FB offers a glimmer of hope. I’ll be honest, though, I can’t remember the last time this guy was fantasy relevant. But if you’re looking for sneaky power at the expense of everything ever, he could be your guy.

Giancarlo Stanton, +5.33%
The Artist Formerly Known as Mike posted positive differentials in 2011 and 2013, but they were one-half and one-third the magnitude of last year’s differential, respectively. His 2013 and 2014 xHR/FBs are practically identical (20.16% and 20.17%), so it looks like Stanton chose a good year to get a little bit lucky.

Jason Heyward, -5.56%
Speaking of bottoming out, Heyward’s power all but evaporated last year. Fear not, however, as his 2014 xHR/FB is only 4 percentage points less than 2013’s — which still sucks, but at least it’s not as bad as a whopping 10 percentage points. It’s probably too obvious to count on a comeback, but no matter.

Jason Kipnis, -4.39%
His year-by-year differentials: -0.01%, -2.61%, -4.39%. His year-by-year xHR/FB: 9.71%, 15.01%, 9.19%. I don’t know what to believe, really, because it’s hard to tell what’s real here and what’s not. But, again, here ye beholdeth another bounceback candidate.

Jonathan Lucroy, -3.77%
His 2014 xHR/FB was a percentage point better than 2013’s. The dude is too good.

Jose Abreu, +8.52%
Now this man, THIS MAN, is the real reason why we’re all here. What can we make of that? We know that prodigious power hitters such as Pujols and Stanton can exceed expectations. But this expectation is set pretty high. I think we’re all expecting regression, but it’s anyone’s guess as to how much. I’m thinking a drop from 27-ish percent to something closer to a Chris Davis-esque 22 percent.

Lucas Duda, -3.38%
I don’t have any other reliable full-season data for Duda to compare, but at least it wasn’t a positive differential. The negative implies that last year’s breakout was probably legit — and maybe there’s still room for improvement.

Similarly to Duda, Adams’ only full season came last year. But the mammoth power we saw in 2013 didn’t disappear as much as it did suffer some bad luck. His 2014 xHR/FB of 12.19 percent still isn’t where any of us would like it to be, but again, maybe there’s still room for improvement.

Matt Holliday, -3.09%
Holliday, who perennially out-performs his xHR/FB, appears to have gotten pretty unlucky last year. Of the last five years (dating back to 2010), 2014’s xHR/FB was right in the middle. I know he’s getting old, but man, he’s a monster, and I think there’s juice still in the tank.

Nick Castellanos, -5.20%
Might be a little more pop in that bat than we know.

Nori Aoki, -6.08%
His power simply vanished, but the xHR/FB is in line with past years. He could return to his 10-HR, 25-SB ways in short order.

Robinson Cano, +2.33%
This is my absolute favorite result in the entire 2014 data set. Cano always out-performs his xHR/FB; that part does not concern me. It’s the xHR/FB itself: it dropped off almost 7 percent from 2013 to 2014. Seven percent! Say what you will about Safeco Field sapping power, but methinks a larger share of that 7 percent is a 32-year-old man in decline.

Xander Bogaerts, -3.88%
See Castellanos, Nick.

Yasiel Puig, -4.58%
Remember how Puig hit way fewer home runs last year and all that stuff? Hey, I traded him midseason (he will cost only \$13 next year, but I won my league so it all works out) for Carlos Gomez and a closer. In the moment, I think I made the right move: Puig’s home run rate never really improved. But his 2013 differential was +5.24%. Cutting the crap, his 2013 and 2014 xHR/FB rates were 16.56% and 15.68%, respectively — smack-dab in the middle of both years. Thus, taking the average of the two may not be such a bad method for projection after all.

OK, that’s everything. The players listed above were merely a sample and are by no means exhaustive when it comes to the peculiar splits I saw. More importantly, the implications are most interesting where they are hardest to draw: players such as Abreu and Eaton very clearly seem to have benefited (and suffered) at the hands of luck, and we can surely expect regression. But… how much? ‘Tis the question of the day, my friends.

Edit (1/8/15, 11:42 am): FanGraphs’ Mike Podhorzer, who coincidentally posted an xHR/FB metric for pitchers today, developed a similar metric for hitters a while back, to the tune of a .65 adjusted R-squared. I feel pretty good about my work now.

# 2014 Rankings: Catcher

Rankings based on standard 5×5 rotisserie league.

Name – R / RBI / HR / SB / BA

1. Buster Posey – 69 / 85 / 20 / 2 / .299
2. Wilin Rosario – 67 / 78 / 27 / 4 / .267
3. Jonathan Lucroy – 60 / 78 / 17 / 6 / .293
4. Yadier Molina – 59 / 78 / 16 / 4 / .296
5. Brian McCann – 58 / 82 / 20 / 3 / .281
6. Joe Mauer – 72 / 66 / 10 / 5 / .297
7. Wilson Ramos – 54 / 73 / 26 / 0 / .283
8. Carlos Santana – 76 / 75 / 20 / 5 / .253
9. Matt Wieters – 60 / 73 / 23 / 2 / .256
10. Evan Gattis – 52 / 76 / 24 / 0 / .264
11. Miguel Montero – 65 / 75 / 15 / 0 / .260
12. A.J. Pierzynski – 58 / 64 / 15 / 1 / .277
13. Jason Castro – 74 / 58 / 17 / 1 / .245
14. Salvador Perez – 54 / 71 / 12 / 0 / .270
15. Yan Gomes – 56 / 53 / 16 / 1 / .278

Thoughts:

• Gomes is Cleveland’s starting catcher; my projection doesn’t account for that. Give him a full year of at-bats and he’s easily a top-10 catcher who threatens the top 5.
• Perez is “due” for a breakout, as everyone says. Try him for a bargain pick, but I think there are adequate substitutes you can still get for cheap without risking the lack of counting stats.
• Castro may be safer than Gattis, but even an underwhelming Gattis may still rival Castro’s numbers fueled by an anemic team and spacious ballpark.
• I love Lucroy but, like yesterday’s shortstops, he’s not definitively better than Molina. Just look at their projections — they’re almost identical. Go with your gut.

# Brian McCann is a top-3 (or top-4… or AT LEAST a top-5) catcher

Atlanta Braves catcher Brian McCann doesn’t show up on any leaderboards because he’s a handful of plate appearances short of qualification — off-season shoulder surgery to repair a torn labrum sidelined him for the first couple of months of the season — but that doesn’t mean he a) doesn’t exist, and b) isn’t playing extremely well.

This is my very scholarly presentation titled “Brian McCann is having a monster year and I think he’s really great”. During my presentation I will say nothing. Zero words. I will instead let McCann’s statistics do the talking and prove my point for me.

Well, not quite. But here goes.

Statistics presented are current through August 3. All comparisons involve catchers who are relevant in standard-format fantasy leagues. Sorry, John Jaso. Your .387 OBP is not welcome here.

BA (batting average):

2. Joe Mauer, .320
3. Buster Posey, .308
4. Brian McCann, .286

McCann is already in good company among his catching brethren, and his .285 BAbip suggests he’s back to form. Mauer and Posey hit for average perennially, so it’s no knock on McCann to be behind those guys. (Side note: Mauer’s batting average is just under his career mark while his wild .384 BAbip is 46 points higher than his career mark. It’s not insane for Mauer to bat .320, but that BAbip is crazy.)

OBP (on-base percentage):

1. Joe Mauer, .402
2. Buster Posey, .378
3. Carlos Santana, .374
4. Brian McCann, .372

The Minnesota Twins’ Joe Mauer and the San Francisco Giants’ Buster Posey will lead this category perennially, and that’s part of what makes them so valuable. But McCann has kept pace with them and other touted catchers.

ISO (isolated power):

1. Brian McCann, .256
2. Evan Gattis, .249
3. Jonathan Lucroy, .218

With the exception of El Oso Blanco (Atlanta Braves catcher Evan Gattis), who is probably not even human (because he’s a bear, right?), nobody else comes close to McCann’s isolated power. His power numbers are through the roof, albeit unsustainable, but he likely won’t fall off as severely as teammate Justin Upton did. McCann has hit 18 to 24 home runs in each year since 2006, though, including his dismal 2012 season, so the power is consistent and certainly not a fluke. He also sports a 25-percent line drive rate. Frankly, he’s crushing the ball.

(Let us take a moment to acknowledge the Milwaukee Brewers’ Jonathan Lucroy. He didn’t get any preseason love from the so-called experts despite hitting .284 with 16 home runs in about half a season’s worth of at-bats last year.)

PA/HR (plate appearances per home run), where a smaller ratio is better:

1. Brian McCann, 16.3 (read “one home run per 16.3 plate appearances”)
2. Evan Gattis, 16.7
3. J.P. Arencibia, 21.6

I sort of alluded to this, but again, it’s the Braves’ catchers leading the pack. Except did you really think McCann was hitting home runs more frequently than Gattis? Me neither. The Blue Jays’ J.P. Arencibia has pop and leads all MLB catchers with 17 home runs but comes at the steep price of a .214 batting average. Speaking of which…

Home runs:

1. J.P. Arencibia, 17
2. Brian McCann, 16
Jonathan Lucroy, 16
3. Matt Wieters, 15
Evan Gattis, 15
Wilin Rosario, 15

McCann hit as many home runs as Lucroy in 106 fewer plate appearances. With all this talk about power, let’s take a look at OPS (on-base percentage plus slugging percentage) and BRA (on-base percentage times slugging percentage, giving more weight to OBP). McCann leads in both categories (.914 and .202, respectively), meaning Mauer’s or Posey’s elevated OBP may not necessarily warrant the praise or favoritism it gets when valuing those players.
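As a sanity check on those two figures, here is a small Python sketch that backs McCann's slugging percentage out of his OPS and then multiplies it by his OBP. The helper names are mine; the inputs are the .372 OBP and .914 OPS quoted in this post.

```python
def slugging_from_ops(obp, ops):
    """OPS is OBP plus SLG, so SLG is the remainder."""
    return ops - obp


def bra(obp, slg):
    """BRA as defined above: on-base percentage times slugging percentage."""
    return obp * slg


# McCann's OBP (.372) and OPS (.914) as quoted in this post.
mccann_slg = slugging_from_ops(0.372, 0.914)
print(f"SLG {mccann_slg:.3f}, BRA {bra(0.372, mccann_slg):.3f}")
```

The implied .542 slugging percentage also matches his .286 batting average plus his .256 ISO, and the product rounds to the .202 BRA quoted above.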

(R+RBI)/PA, or how many runs and RBI a player records per plate appearance, where a larger number is better:

1. Evan Gattis, 0.299
2. Brian McCann, 0.280
3. Wilin Rosario, 0.274
4. Jarrod Saltalamacchia, 0.256

This isn’t a fancy metric, just a simpler way to measure production frequency instead of breaking it up for each category (R and RBI). The number associated with each player may be difficult for some readers to process without a picture painted for them, so I’ll paint one. If McCann and the St. Louis Cardinals’ Yadier Molina each recorded exactly 500 plate appearances, McCann would produce 140 runs and RBI (think 70 R, 70 RBI) to Molina’s 128 (64 R, 64 RBI).
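The picture above is nothing more than rate times plate appearances. A tiny Python sketch, with a made-up function name; McCann's 0.280 is from the list, while Molina's 0.256 is only implied by the 128-in-500 figure and is not listed directly.

```python
def projected_runs_plus_rbi(rate_per_pa, plate_appearances=500):
    """Scale an (R+RBI)/PA rate to a plate-appearance total."""
    return rate_per_pa * plate_appearances


# McCann's rate is from the list above; Molina's is inferred from the text.
for name, rate in (("Brian McCann", 0.280), ("Yadier Molina", 0.256)):
    print(f"{name}: {projected_runs_plus_rbi(rate):.0f} R+RBI per 500 PA")
```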

BB% (BB/PA, or walk percentage):

1. Joe Mauer, 12.2%
2. Russell Martin, 11.8%
3. Brian McCann, 11.1%
4. Miguel Montero, 10.9%

On top of this…

K/BB (ratio of strikeouts to walks), where a smaller number is better:

1. Buster Posey, 1.26
2. Carlos Santana, 1.34
3. Brian McCann, 1.41
4. Victor Martinez, 1.43

… McCann is third best in his strikeout rate relative to his walk rate.

Lastly — and perhaps most importantly — is WAR. Here’s how the WAR leaderboard for catchers looks according to FanGraphs:

2. Joe Mauer, 4.2
3. Buster Posey, 3.9
4. Russell Martin, 3.6
5. Jonathan Lucroy, 3.1
6. Brian McCann, 2.9

… which aggregates not only offensive but also defensive performance. Remember that McCann has about 60 percent of the plate appearances of other “full-time” catchers (and he’s already sixth in WAR, wow!). If I normalize each player’s WAR to, say, WAR per 100 plate appearances, the list now looks like this:

1. Brian McCann, 1.11
3. Russell Martin, 1.03
4. Joe Mauer, 0.95
5. Buster Posey, 0.93
6. Jonathan Lucroy, 0.84

That’s right, folks. McCann has the best WAR relative to his playing time.
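The normalization is simple division, sketched here in Python. One loud assumption: I never list plate-appearance totals in this post, so McCann's are inferred from his 16 home runs at one per 16.3 plate appearances, and the function name is made up.

```python
def war_per_100_pa(war, plate_appearances):
    """Normalize WAR to a per-100-plate-appearance rate."""
    return war / plate_appearances * 100


# Assumption: McCann's plate appearances are inferred from figures quoted in
# this post (16 home runs at one per 16.3 PA), not taken from a stat page.
mccann_pa = 16 * 16.3
print(f"PA ~{mccann_pa:.0f}, WAR/100 PA {war_per_100_pa(2.9, mccann_pa):.2f}")
```

That inferred total of roughly 261 plate appearances reproduces the 1.11 at the top of the list.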

This concludes the bulk of my presentation. Brian McCann is indeed having a monster year. But is he a top-3 catcher?

I respect Tristan H. Cockcroft’s opinions on matters such as these, especially because he’s a genius. In his most recent Hit Parade column, he ranked McCann as the seventh-best catcher. There are a lot of factors at play, including Mauer’s and Posey’s high BAs and OBPs but offensively miserable lineups in which they are entrenched, Colorado Rockies catcher Wilin Rosario’s poor plate discipline and Lucroy’s OBP that leaves something to be desired.

Since McCann’s first game (May 6)
Victor Martinez: .289 BA, .770 OPS, 8 HR, 39 R, 46 RBI, 26 XBH
Joe Mauer: .333 BA, .901 OPS, 6 HR, 40 R, 29 RBI, 31 XBH
Buster Posey: .314 BA, .869 OPS, 10 HR, 32 R, 40 RBI, 31 XBH
Brian McCann: .286 BA, .913 OPS, 16 HR, 29 R, 44 RBI, 26 XBH
Jonathan Lucroy: .307 BA, .916 OPS, 13 HR, 24 R, 42 RBI, 31 XBH
Wilin Rosario: .262 BA, .722 OPS, 8 HR, 29 R, 33 RBI, 21 XBH
Carlos Santana: .238 BA, .720 OPS, 6 HR, 31 R, 35 RBI, 25 XBH

Performance-wise, McCann is probably fourth-best of the list, although it’s very close. Are six home runs worth more than 28 points of batting average? Possibly. Ultimately, it’s a small sample size, and it’s not going to tell us a whole lot. But McCann has clearly outperformed Rosario and Santana, and all three of their career numbers indicate the trend will likely continue. I like McCann a lot going forward, based pretty heavily on the fact that he’s a part of a good offensive lineup. Mauer, Posey and Lucroy, not so much. And if McCann just steals even a couple of bases like he has done in the past, it will boost his value greatly relative to everyone else.

So yes, I will be bold and declare it: Brian McCann is a top-4 catcher! At least for the rest of the season. Or until Molina comes back. Geez, so many conditions. But if McCann can steal even just three bags, which Martinez has barely done cumulatively in 11 professional years, it will elevate McCann to the top 3. I love Lucroy, but I’m having a hard time not being a little bit skeptical. He rounds out the top 5.

One last bonus stat to bolster McCann’s case…

AB/HR, all MLB players (not limited to catchers):

1. Chris Davis, 10.0
2. Miguel Cabrera, 11.9
3. Pedro Alvarez, 13.6
4. Raul Ibanez, 13.8
5. Edwin Encarnacion, 14.0
6. Brian McCann, 14.2

WOWZA!!!

Oh, and for anyone who dismissed McCann’s .230 batting average and decline in production last year as his demise: sorry.