# Predicting HR/FB rates for hitters using weighted pitch values

(If you care only about results and not about the process, scroll down to the section aptly titled HERE YA GO.)

Victor Martinez indirectly and semi-strangely inspired this post. I was browsing FanGraphs’ weighted pitch values for hitters (something I hadn’t done before, as I’ve really only used the metric for pitchers) for 2014, and my thought process went something like this:

Jose Abreu feasted on fastballs; V-Mart feasted on sliders… Wow, V-Mart actually fared better against sliders and curveballs than fastballs and cutters. I wonder if that has any correlation with his plate discipline.

In short: no. A hitter’s success versus pitches according to weighted pitch values (per 100 of that pitch) explains about 40 percent of the variation in his walk rate and barely 4 percent of the variation in his strikeout rate. (I’m ballparking it on the K% figure.)

But I got to thinking a little more: these weighted pitch values have to be good for something other than scouting hitters (which, moving forward, maybe we start throwing Abreu some more offspeed stuff? I don’t know).

Alas, I took a crack at it: I tested the correlation between weighted pitch values and home runs per fly ball (HR/FB) rates. And I was very pleasantly surprised.

Let’s start with context. Each player, very obviously, records his own HR/FB rate each year. Players with more power will record higher HR/FB rates, and players with less power will record lower rates. Therefore, each player, in a sense, creates his own benchmark (which, arguably, is his career HR/FB rate: he hits this many home runs as a percentage of fly balls on average). However, we know that HR/FB fluctuates annually: a player with a 15% career-HR/FB does not hit exactly 15 of every 100 fly balls over outfield walls every season like clockwork. Still, there is an expectation that he will hit a certain number of them out — hence, the benchmark.

Using regression analysis, the idea of the benchmark can be captured by seeing how, say, 2014’s HR/FB rates correlate with 2013’s rate, as well as 2012’s, 2011’s and so on. I downloaded all available ball-in-play data for “qualified” hitters, with each season as a separate observation, dating back to 2002, thereby representing an exhaustive list. The line of best fit looks as follows, where L1 represents the year prior, L2 two years prior, L3 three years prior:

x(HR/FB) = .018 + .321*L1.(HR/FB) + .252*L2.(HR/FB) + .228*L3.(HR/FB)
Between R-squared: .74

One might astutely observe that a player who hit exactly zero home runs the three previous years can still be expected to hit about 1.8 percent of his fly balls over the wall, and one might call to arms to force the intercept term to zero. It seems absurd, nay, impossible that a player who never hits home runs could be expected to suddenly hit one, but let us not forget we witnessed the impossible happen just last year. That’s what makes baseball a beautiful sport: anything can happen.
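To make the lagged equation concrete, here is a minimal Python sketch that plugs three prior-season HR/FB rates into the coefficients quoted above. The function name `x_hr_fb_lagged` and the example hitter are illustrative assumptions, not part of the original analysis, and the coefficients are the rounded values as printed, so results may differ slightly from the unrounded fit.

```python
# Rounded coefficients copied from the lagged line of best fit above.
INTERCEPT = 0.018
LAG_WEIGHTS = (0.321, 0.252, 0.228)  # weights on L1, L2, L3


def x_hr_fb_lagged(l1, l2, l3):
    """Expected HR/FB (as a fraction) from the three prior seasons' HR/FB."""
    lags = (l1, l2, l3)
    return INTERCEPT + sum(w * x for w, x in zip(LAG_WEIGHTS, lags))


if __name__ == "__main__":
    # Hypothetical hitter: 15%, 12% and 10% HR/FB over the last three years.
    print(f"{x_hr_fb_lagged(0.15, 0.12, 0.10):.4f}")
    # A hitter with no home runs in any of the three years gets the intercept.
    print(f"{x_hr_fb_lagged(0.0, 0.0, 0.0):.4f}")
```

The second call is the zero-homer case discussed above: with every lag at zero, only the intercept of about 1.8 percent is left.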

Anyway, the equation above is actually really helpful in predicting expected HR/FB; its R-squared indicates the model explains almost three-quarters of the variation in HR/FB rates. It also gives the greatest weight to the most recent year as measured by its coefficient, with declining weight as years become further removed, which makes sense. But… BUT.

It’s not helpful in predicting HR/FB for hitters who have been in the league fewer than three years. Moreover, it seems especially difficult to predict future HR/FB rates for hitters with only one year of data, such as the monstrous Abreu. (Maybe Abreu did inspire this post after all.) Observe:

x(HR/FB) = .032 + .694*L1.(HR/FB)

After a little bit of algebra, we can intuit that the equilibrium HR/FB rate is roughly 10.4 percent. I use the term “equilibrium” because it appears that no matter what HR/FB a hitter posted in his first career season, his next-year HR/FB will be expected to converge (aka regress) toward the magical number of 10.4 percent. Again, observe:

.032 + .694*(12%) = 11.5%
.032 + .694*(8%) = 8.7%

You can perform this exercise with any value, and the results will be the same: a 2014 HR/FB rate lower than ~10.4 percent will be expected to increase in 2015, and a rate higher than ~10.4 percent will be expected to decrease in 2015. Now this, this, is actually absurd. Granted, the equation is communicating what would happen on average, but hitters are not homogeneous.
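The "little bit of algebra" is just solving for the rate that maps to itself: set r = .032 + .694*r, so r = .032 / (1 - .694). Here is a minimal Python sketch of that, using the rounded coefficients as printed; the function names and the sample rates are illustrative assumptions. With the rounded coefficients the fixed point comes out near 10.5 percent, which is in line with the roughly 10.4 quoted above if the original figure came from unrounded coefficients (an assumption on my part).

```python
# Rounded coefficients copied from the one-year equation above.
INTERCEPT = 0.032
SLOPE = 0.694


def next_year_hr_fb(rate):
    """Expected next-season HR/FB from a single prior season's HR/FB."""
    return INTERCEPT + SLOPE * rate


def equilibrium():
    """The rate r that satisfies r = INTERCEPT + SLOPE * r."""
    return INTERCEPT / (1.0 - SLOPE)


if __name__ == "__main__":
    print(f"equilibrium: {equilibrium():.4f}")
    # Hypothetical first-year rates on either side of the equilibrium.
    for rate in (0.08, 0.12, 0.20):
        projected = next_year_hr_fb(rate)
        direction = "up" if projected > rate else "down"
        print(f"{rate:.2f} -> {projected:.4f} ({direction})")
```

Because the slope is below 1, every starting rate gets pulled toward the same point, which is exactly why this one-year model treats all hitters as if they were the same hitter.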

This is all a very long-winded way of saying two things:

1) When the sample is incredibly small — namely, one observation — using history as a guide fails us.
2) I think I may have found an alternative that relies not on a single year’s worth of HR/FB data but on a single year’s worth of weighted pitch value data.

## HERE YA GO

Let me be clear, up front: I know there will be a lot of multicollinearity inherent in this analysis — that is, HR/FB and weighted pitch values are dependent on each other in some fashion. I don’t know how weighted pitch values are calculated exactly — it would behoove me to look it up, but I am lazy, a current self-descriptor of which I am not proud — but, intuitively, a hitter who hits home runs more frequently off of particular pitch types will likely record higher weighted values for those pitches. Essentially, the weighted values are calculated using home run frequency, and I am now trying to reverse-engineer it.

But I don’t see that as a bad thing. There is a profound correlative capability in the data, and using that information to glean whether or not a hitter was, perhaps, a bit lucky when it came to his HR/FB frequency is, I hope, less preposterous than pulling a number out of your rear-end.

## HERE YA GO, FOR REALSIES

I will use strictly weighted pitch values per 100 pitches (denoted wXX/C, where XX represents the pitch abbreviation). I omit knuckleballs because not all players saw them, and I omit splitfingers because they are statistically insignificant, probably because they aren’t thrown very often, rendering the weighted pitch values more volatile. I also add K% and BABIP presuming the following: strikeout rates are positively correlated with HR/FB rates, and BABIP, which positively correlates with hard-hit balls such as line drives, is likely to also positively correlate with similarly-hard-hit balls such as home runs. (A regression that includes only weighted pitch values and excludes K% and BABIP produces an adjusted R-squared of .45.) The line of best fit equation is as follows:

x(HR/FB) = .2049 + .0352*(wFB/C) + .0081*(wSL/C) + .0014*(wCT/C) + .0041*(wCB/C) + .0063*(wCH/C) + .5244*K% - .6706*BABIP

Again, the model produces a great line of best fit per its R-squared — almost identical to its lagged-variable counterpart. As it should; if there’s multicollinearity, it should. (And there is.) But reverse-engineering the process should create accurate predictions of what should have been a hitter’s HR/FB rate in a given season because of the multicollinearity; in this instance, it’s not a bad thing.
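For anyone who wants to plug in their own numbers, here is a minimal Python sketch of the equation above. The coefficients are the rounded values as printed. Two things are assumptions on my part and not stated in the post: that K% and BABIP enter as fractions (0.20, not 20), and the stat line of the hypothetical hitter in the example. The function and dictionary names are also just illustrative.

```python
# Rounded coefficients copied from the weighted-pitch-value equation above.
INTERCEPT = 0.2049
COEFFS = {
    "wFB/C": 0.0352,
    "wSL/C": 0.0081,
    "wCT/C": 0.0014,
    "wCB/C": 0.0041,
    "wCH/C": 0.0063,
    "K%": 0.5244,      # assumed to be a fraction, e.g. 0.20
    "BABIP": -0.6706,  # assumed to be a fraction, e.g. 0.300
}


def x_hr_fb(stats):
    """Expected HR/FB (as a fraction) from one season of inputs."""
    missing = [name for name in COEFFS if name not in stats]
    if missing:
        raise KeyError(f"missing inputs: {missing}")
    return INTERCEPT + sum(COEFFS[name] * stats[name] for name in COEFFS)


if __name__ == "__main__":
    # Hypothetical hitter: a bit above average against fastballs and sliders.
    hitter = {
        "wFB/C": 1.0, "wSL/C": 0.5, "wCT/C": 0.0, "wCB/C": 0.0,
        "wCH/C": 0.0, "K%": 0.20, "BABIP": 0.300,
    }
    print(f"xHR/FB: {x_hr_fb(hitter):.4f}")
```

Subtracting this expected value from a hitter's actual HR/FB gives the actual-minus-expected differential used in the rest of the post.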

Some trends emerge instantly, trends similar to those I saw in the xK% and xBB% studies I performed earlier: regardless of a player’s power potential, he will over-perform or under-perform his expected HR/FB rate, and he will do so with consistency. For example, Adam LaRoche, despite his apparent power stroke, consistently under-performs his xHR/FB:

HR/FB: actual minus expected
2010: -1.89%
2012: -1.87%
2013: -1.77%
2014: -0.75%

Meanwhile, Albert Pujols consistently out-performs his xHR/FB:

2010: +2.02%
2011: +5.67%
2012: +1.80%
2014: +2.13%

Each data set has its noise, but you can see based on these limited samples where each hitter experienced a bit of luck: LaRoche, in 2014, saw a minor spike, and Pujols saw a major spike in 2011.
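One rough way to spot those spikes is to compare each season's differential with the hitter's own average differential. The Python sketch below does that with the LaRoche and Pujols numbers listed above; the helper names are illustrative, and "farthest from his own average" is my simplification of "a bit of luck," not a formal test.

```python
from statistics import mean

# Actual-minus-expected HR/FB differentials, in percentage points, as listed above.
DIFFERENTIALS = {
    "Adam LaRoche": {2010: -1.89, 2012: -1.87, 2013: -1.77, 2014: -0.75},
    "Albert Pujols": {2010: 2.02, 2011: 5.67, 2012: 1.80, 2014: 2.13},
}


def typical_gap(by_year):
    """Average differential across the listed seasons."""
    return mean(by_year.values())


def largest_departure(by_year):
    """Season whose differential sits farthest from the hitter's own average."""
    avg = typical_gap(by_year)
    return max(by_year, key=lambda year: abs(by_year[year] - avg))


if __name__ == "__main__":
    for name, by_year in DIFFERENTIALS.items():
        print(name, round(typical_gap(by_year), 2), largest_departure(by_year))
```

On these four-season samples the flagged years are 2014 for LaRoche and 2011 for Pujols, matching the spikes called out above.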

Rather than going through each player individually, I will highlight a few extreme, fantasy-relevant outliers from 2014 and reflect accordingly. Without further ado (and in alphabetical order by first name):

This is the largest negative differential in the 2014 data. Without another full season of data to compare, this huge difference is likely a sign of bad luck, although there is a chance that he is a severe under-performer in the same vein as Matt Carpenter (who has under-performed his xHR/FB by about 7 percent the past two years). I already liked the guy for his speed and control of the strike zone, and the prospect of a pending power spike is enticing.

Coco Crisp, -5.78%
Crisp is a great case study: he notched a career-high 12.4-percent HR/FB in 2013, then promptly slid back down to single digits in 2014. His 2014 xHR/FB, however, indicates his HR/FB should have been closer to 11.5 percent, almost 6 percent higher than his actual mark and only 1.2 percent less than 2013. Meanwhile, his 2012 and 2013 expected and actual HR/FB rates are almost identical. His power-speed combination was pretty valuable two years ago — when he wasn’t on the disabled list, at least.

Curtis Granderson, -5.92%
Granderson bottomed out in woeful aplomb last year, but his xHR/FB offers a glimmer of hope. I’ll be honest, though, I can’t remember the last time this guy was fantasy relevant. But if you’re looking for sneaky power at the expense of everything ever, he could be your guy.

Giancarlo Stanton, +5.33%
The Artist Formerly Known as Mike posted positive differentials in 2011 and 2013, but they were one-half and one-third the magnitude, respectively, of last year’s differential. His 2013 and 2014 xHR/FBs are practically identical (20.16% and 20.17%), so it looks like Stanton chose a good year to get a little bit lucky.

Jason Heyward, -5.56%
Speaking of bottoming out, Heyward’s power all but evaporated last year. Fear not, however, as his 2014 xHR/FB is only 4 percentage points less than 2013’s — which still sucks, but at least it’s not as bad as a whopping 10 percentage points. It’s probably too obvious to count on a comeback, but no matter.

Jason Kipnis, -4.39%
His year-by-year differentials: -0.01%, -2.61%, -4.39%. His year-by-year xHR/FB: 9.71%, 15.01%, 9.19%. I don’t know what to believe, really, because it’s hard to tell what’s real here and what’s not. But, again, here ye beholdeth another bounceback candidate.

Jonathan Lucroy, -3.77%
His 2014 xHR/FB was a percentage point better than 2013’s. The dude is too good.

Jose Abreu, +8.52%
Now this man, THIS MAN, is the real reason why we’re all here. What can we make of that? We know that prodigious power hitters such as Pujols and Stanton can exceed expectations. But this expectation is set pretty high. I think we’re all expecting regression, but it’s anyone’s guess as to how much. I’m thinking a drop from 27-ish percent closer to a Chris Davis-esque 22 percent.

Lucas Duda, -3.38%
I don’t have any other reliable full-season data for Duda to compare, but at least it wasn’t a positive differential. The negative implies that last year’s breakout was probably legit — and maybe there’s still room for improvement.

Similarly to Duda, Adams’ only full season came last year. But the mammoth power we saw in 2013 didn’t disappear as much as it did suffer some bad luck. His 2014 xHR/FB of 12.19 percent still isn’t where any of us would like it to be, but again, maybe there’s still room for improvement.

Matt Holliday, -3.09%
Holliday, who perennially out-performs his xHR/FB, appears to have gotten pretty unlucky last year. Of the last five years (dating back to 2010), 2014’s xHR/FB was right in the middle. I know he’s getting old, but man, he’s a monster, and I think there’s juice still in the tank.

Nick Castellanos, -5.20%
Might be a little more pop in that bat than we know.

Nori Aoki, -6.08%
His power simply vanished, but the xHR/FB is in line with past years. He could return to his 10-HR, 25-SB ways in short order.

Robinson Cano, +2.33%
This is my absolute favorite result in the entire 2014 data set. Cano always out-performs his xHR/FB; that part does not concern me. It’s the xHR/FB itself: it dropped off almost 7 percent from 2013 to 2014. Seven percent! Say what you will about Safeco Field sapping power, but methinks a larger share of that 7 percent is a 32-year-old man in decline.

Xander Bogaerts, -3.88%
See Castellanos, Nick.

Yasiel Puig, -4.58%
Remember how Puig hit way fewer home runs last year and all that stuff? Hey, I traded him midseason (he will cost only \$13 next year, but I won my league so it all works out) for Carlos Gomez and a closer. In the moment, I think I made the right move: Puig’s home run rate never really improved. But his 2013 differential was +5.24%. Cutting the crap, his 2013 and 2014 xHR/FB rates were 16.56% and 15.68%, respectively, smack-dab in between his actual rates for the two years. Thus, taking the average of the two may not be such a bad method for projection after all.

OK, that’s everything. The players listed above were merely a sample and are by no means exhaustive when it comes to the peculiar splits I saw. More importantly, the implications are most interesting where they are hardest to draw: players such as Abreu and Eaton very clearly seem to have benefited (and suffered) at the hands of luck, and we can surely expect regression. But… how much? ‘Tis the question of the day, my friends.

Edit (1/8/15, 11:42 am): FanGraphs’ Mike Podhorzer, who coincidentally posted an xHR/FB metric for pitchers today, developed a similar metric for hitters a while back, to the tune of a .65 adjusted R-squared. I feel pretty good about my work now.

# Panning for gold using spring stats, hitter edition

You’ve probably heard a hundred times this month alone: spring training statistics don’t mean anything. Too many times a player has had a monster spring only to completely flop during the season (do Aaron Hicks or Jackie Bradley circa 2013 ring a bell?). Still, in disbelief we all watched Julio Teheran’s monster spring last year, and he humiliated batters and baserunners throughout his rookie campaign.

Ultimately, spring stats do tell a story, albeit a short or biased one. But if you know where to look — that is, if you know the stats on which to focus your attention — you can maybe decipher which spring performances are legit and which are not.

Important stats: 12 for 42 (.286 BA), 9 SB, 8 K
Why they’re important: Well, holy smokes. Look at those steals. We’ve always known he’s fast, but wow. Also, he has struck out in only 19 percent of at-bats, which certainly isn’t the worst thing in the world. What I’m looking for here is if he can hold his own at the plate, even if it’s just for a month or two, and right now he’s hitting .286 — nothing spectacular, but not miserable, either. Oh, and did I mention he has four triples already?  Gordon isn’t a top-10 second baseman, but handcuff him to Alexander Guerrero (or simply jump ship when Guerrero finally gets the call) and this could be a great draft strategy.

Billy Hamilton, CIN CF
Important stats: 10 for 33 (.303 BA), 9 SB, 4 K, 6 BB
Why they’re important: Not only is Hamilton stealing bases at an unfathomable rate, he is also barely striking out (only 12 percent of at-bats have ended in a K) and has actually walked more times than he has struck out. Everyone and their mothers were worried Hamilton would be overpowered at the plate. Don’t get caught in the hype, I hear them saying. Yet I can’t help myself. If he keeps putting the bat on the ball the way he’s doing, he will get on base, he will steal, and he will score runs.

Billy Burns, OAK LF
Important stats: 8 SB, 13 K in 52 AB
Why they’re important: OK, maybe I was a little too obvious when I sorted MLB.com’s spring training stats by stolen bases. Burns is getting way more hype than anyone in spring training right now, or at least it seems that way. He’s effectively blocked in the A’s outfield, but his speed, plate discipline and glove-work will fast-track him to the majors. Unfortunately, 25 percent of at-bats are ending in strikeouts, so he may be overmatched. No skin off our backs, though, especially if he doesn’t start this year in the majors.

Other stolen base leaders who are legitimate fantasy options: Jarrod Dyson (6 SB) and Rajai Davis (5 SB). I’ve raved about Davis’ fantasy value before.

Mike Moustakas, KC 3B
Important stats: 17 for 35 (.486 BA), 4 HR, 4 K, 6 BB
Why they’re important: Moustakas has been mostly a letdown during his major league career. He’s crushing home runs right now and has walked more than he’s struck out, and people are starting to be optimistic about the guy. I’m hesitant, and I would still leave him undrafted in standard mixed leagues, but he could be worth an extra couple of dollars in AL-only leagues. I’ll watch his name as the season progresses, though. He’s worth following if you’re picking a risky or injury-prone third base asset such as Ryan Zimmerman or Aramis Ramirez.

Important stats: 14 for 34 (.412 BA), 2 3B, 4 HR, 1 SB
Why they’re important: Guys… are you serious. I cannot love this guy any more. And he’s still hitting triples!!! It’s not a fluke, people. I think Miller is the second coming of Ian Desmond.

Jason Heyward, ATL RF
Important stats: 14 for 40 (.350 BA), 3 HR, 1 SB
Why they’re important: …Jason Heyward? Is that really you?

Javier Baez, CHC SS
Important stats: .297/.297/.703, 4 HR, 1 SB, 11 K, 0 BB
Why they’re important: Is Baez even a real person? The split between his slugging and on-base percentages is impossibly large. Meanwhile, zero walks and 11 K’s in 37 at-bats. This kid is going to be amazing, if not occasionally frustrating at first.

Other business-as-usual home run hitters: Russell Martin (kind of — he had a huge spring last year, too, if I remember correctly), Hunter Pence (4 HR), Giancarlo Stanton (4 HR), Jose Bautista (3 HR), Miguel Cabrera (3 HR), Chris Davis (3 HR), Andrew McCutchen (3 HR).

Nick Castellanos, DET 3B (formerly LF)
Important stats: 18 for 45 (.400 BA), 7 2B, 2 HR, 2 SB, 16 RBI
Why they’re important: Castellanos is a highly touted prospect with too little major-league exposure for us to form solid opinions about him. But nine extra-base hits in 45 at-bats, plus a pair of bombs and swipes, makes it look like this kid is the real deal, regardless of his sort of lackluster minor-league stats. Don’t get too enamored with the RBI total, but clearly he’s not afraid of so-called clutch situations, either.

Dustin Ackley, SEA LF (formerly 2B)
Important stats: .432/.462/.703, 1 HR, 6 K in 37 AB
Why they’re important: Maybe the former No. 2 pick can recoup some of his losses. He had a somewhat strong showing in the latter half of 2013. It will be interesting to see if it carries over. As the Magic 8-Ball might say, “All signs point to yes.” Or something like that.

As for players who scare me right now, Corey Hart is batting .129/.250/.161 with 16 strikeouts in 31 at-bats; B.J. Upton is batting .297/.366/.351 but with 14 strikeouts in 37 at-bats, an unsustainable rate for that batting average; and Domonic Brown is batting a miserable .171/.326/.229 with 12 strikeouts in 35 at-bats, albeit with eight walks.
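Since strikeouts per at-bat is the stat I keep leaning on, here is a minimal Python sketch that computes it for the spring lines quoted in this post. The helper name `k_rate` is illustrative; the strikeout and at-bat counts are taken straight from the text above, and the rate is per at-bat rather than per plate appearance, matching how the percentages are quoted here.

```python
def k_rate(strikeouts, at_bats):
    """Strikeouts as a share of at-bats (not plate appearances)."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return strikeouts / at_bats


# (strikeouts, at-bats) as quoted in this post.
SPRING_LINES = {
    "Billy Hamilton": (4, 33),
    "Billy Burns": (13, 52),
    "Corey Hart": (16, 31),
    "B.J. Upton": (14, 37),
    "Domonic Brown": (12, 35),
}

if __name__ == "__main__":
    # Sort from the lowest strikeout rate to the highest.
    ranked = sorted(SPRING_LINES.items(), key=lambda item: k_rate(*item[1]))
    for name, (strikeouts, at_bats) in ranked:
        print(f"{name}: {k_rate(strikeouts, at_bats):.1%}")
```

Sorting this way puts Hamilton at one end and Hart, who is striking out in more than half of his at-bats, at the other.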

Do your own research, form your own opinions. This is just a sampling of the many names that are shining bright or falling flat. And, of course, it’s simply too risky to make a decision on such a small sample size. But it never hurts to remember a name or two.

# Breaking down recent trades and signings. Who won?

I been gone a long time. Sorry, folks. Let’s break down some recent trades now that the stove hath been declared “hot.”

DET 1B Prince Fielder for TEX 2B Ian Kinsler
With the emergence of second baseman Jurickson Profar, the Rangers had a logjam in the middle infield, especially after extending Elvis Andrus’ contract. Trading Kinsler was the solution, and any chatter about trading Profar to St. Louis for outfield prospect Oscar Taveras was promptly silenced. The Rangers will take on about \$10 million more in salary per year, not to mention all the additional years at the tail end of Fielder’s contract, but will be able to replace the floundering Mitch Moreland at first base. Some analysts (and Detroit fans) have sworn off Fielder and declared his power decline already in motion. I’ll get to that.

The Tigers had more needs to fill. Infielders Jhonny Peralta and Omar Infante are free agents, and both are coming off solid years and will likely test the market. Trading for Kinsler fills one of these needs, and quite soundly, too. Kinsler will bring veteran presence and skills to an already highly-talented team. Moreover, moving Fielder away from Detroit, where his (alleged) declining production, poor postseason performance and lukewarm-at-best fan relations have alienated him, frees up salary space to offer Cy Young pitcher Max Scherzer a long-term contract. The one thing I haven’t seen discussed much: two-time AL MVP Miguel Cabrera was plagued by nagging injuries the last month of the season. Moving him to first base will alleviate defensive problems, yes, but it will also give him a chance to heal at a less intensive defensive position. I don’t know who will play third base, but rookie Nick Castellanos played third base before the Tigers moved him to left field.

As for Fielder’s power and production, let’s do a simplistic blind resume.

Player A – .313, 30 HR, 82 R, 108 RBI
Player B – .279, 25 HR, 83 R, 106 RBI

Player B is Fielder in 2013; Player A is Fielder in 2012. The big difference? He hit fewer than 30 home runs for the first time in forever, and his on-base-plus-slugging (OPS) is way down. You can complain about the batting average, too, but the 2012 average is the anomaly here, not the 2013 average. Looking more deeply into his peripherals, though, Fielder hit a boatload of line drives: 26 percent of all balls in play, in fact. Fielder’s average line drive percentage is 21 percent; the MLB average is 19 percent. He also put more balls into play than in any other season of his career, with his lowest ratio of home runs to fly balls (HR/FB). Unless Fielder was trying to hit line drives all last year, which he likely wasn’t, I expect 32-or-so home runs from Fielder in 2014. My one concern is his depleted walk rate (although his on-base percentage is still very solid), but the dude also dealt with a divorce all year. I can’t say I’ve ever been divorced before, so I don’t know what it’s like, but I can’t imagine it’s always pretty.

I understand both sides of this trade, though, and hesitate to declare one team the winner over the other. Kinsler and Fielder are getting old, so declines in production should be expected. I think the winner of this trade will be decided by whether Profar pans out and whether Scherzer lives up to his ace potential and reputation.

Winners: Detroit Tigers, Texas Rangers

SF Giants sign SP Tim Hudson
Even as someone who doesn’t identify as an Atlanta Braves fan, it’s sad to see Hudson go. However, I don’t know how much good this does the Giants. Their farm system is weak and their rotation pitiful. Adding Hudson for a couple of years for back-end rotation help and veteran presence is not going to produce another championship. The team needs to focus on rebuilding, and shedding salary may be a good first step in doing so. Also, why didn’t the Braves re-sign him? Their rotation is very young with zero veteran presence. At least Hudson could fill up a back-end spot that would surely be better than what Paul Maholm could muster. Then again, they could probably turn me into a quality starting pitcher with the magic they evidently possess.

Winner: Tim Hudson
Losers: San Francisco Giants, Atlanta Braves

KC Royals sign SP Jason Vargas
Who considers this a major baseball-related announcement? Jokes aside, Vargas was probably the Angels’ most reliable pitcher and has been better than decent the past two or three years for the Los Angeles Angels of Anaheim and Seattle Mariners. For a team that’s working toward a postseason berth, this isn’t a bad play. Besides, who could be worse than Bruce Chen?

Winners: Kansas City Royals, Jason Vargas

STL 3B David Freese for LAA OFs Peter Bourjos, Randal Grichuk (AAA)
David Freese is an average third baseman who is widely (and incorrectly) perceived as an above-average player because of his postseason heroics. Freese is simply average, though, and shipping him to Anaheim makes it clear that the Angels are going to shop Mark Trumbo. It’s their best chance at getting some prospects, of which their depleted farm system has none.

Trading Freese allows the once-utility second baseman Matt Carpenter to move to third base in order to free up space for rookie Kolten Wong. With outfielder Carlos Beltran, a free agent, likely on the move, I expect to see über-prospect Oscar Taveras man center field while Matt Holliday and Allen Craig play left and right field, with Matt Adams at first base. Another scenario could see Taveras getting the call sometime in May or June and Bourjos manning center field until then. Yet another possibility — the least optimal of them — would see an outfield of Jon Jay, Bourjos and Holliday, with Craig at first base and Adams relegated to the bench.

Either way, the Cardinals are even more stacked than they were before the trade. Getting rid of Freese was probably difficult, but it was necessary for progress. The Angels made a decent move in shedding extra outfield pieces, but sending Randal Grichuk, the Angels’ No. 2 prospect, to a loaded St. Louis farm system (where Grichuk will likely rank no better than 10th) further guts the Angels’ minor league system. Rancho Cucamonga is a barren wasteland at this point. (Thank you, Cliff Clinton, for enlightening me as to who Grichuk is. Even as an Angels fan I sure as hell didn’t know.)

We’ll have to wait and see who the Angels get in return for Trumbo, but it won’t change the fact that they lost this trade.

Winner: St. Louis Cardinals
Loser: Los Angeles Angels of Anaheim

Fan report: Dan Uggla has put his Atlanta home up for sale. (Thanks, Charles Henninger, for this tidbit.)
Let’s get his ass out of there. I’ll never forgive you, Dan, for singlehandedly losing me the 2012 fantasy baseball title.