Major League Equivalencies for The Negro Leagues
February 9, 2020 by Kevin Johnson · 2 Comments
Major League Equivalents (MLEs) are a series of calculations designed to take non-major league baseball performance and estimate what that performance’s results would look like statistically in the context of the Major Leagues. Bill James gets credit for popularizing MLEs, as he outlined his method for minor league batters in the 1985 Baseball Abstract. James was only interested at the time in making sense of minor league statistics, but MLEs can be used to evaluate ANY baseball performance, including minor league, Japanese or other foreign league, Negro League, NCAA leagues, etc. You can also use the basic MLE procedure to evaluate the performance of an American League player relative to the National League, or perhaps calculate what type of batting statistics Ty Cobb’s 1909 performance would look like in the 2007 AL.
Of course, creating MLEs at all is a bit of a fool’s errand. We can’t really KNOW how any player would have played if placed in a different environment, especially one that is drastically different regarding its level of competition. Some players can adapt and adjust their play when faced with different settings, and some have difficulty. However, it can be a fun and enlightening exercise.
For current baseball players, MLEs can also be used to build predictive models of future play results. This is really what James had in mind – a way to use past non-major league data to predict future major league performance of young players by converting that non-major league data into data that would approximate MLB level play, then using THAT data along with any MLB historical data on the player to give a greater sample size on which to make predictions. Today, everyone from major league team executives to fantasy league players relies on predictions built upon some basic framework for MLEs.
Besides executives and fantasy baseball players, MLEs can be useful for baseball historians and baseball ‘gamers’ (those who play simulation games like Diamond Mind, Out of the Park, APBA). MLEs can help to answer questions such as:
How would Ted Williams, Bob Gibson, Ty Cobb, and Barry Bonds do if all placed in a league together?
What if Japanese League players had been allowed in MLB beginning in the 1960’s?
What if the Major Leagues had integrated in the 1920’s?
MLEs can give us somewhat realistic “What ifs?” that can be analyzed, simulated, and just plain enjoyed.
Creating good MLEs involves these basic steps:
1. Determine the relative strengths between the FROM League environment vs. the TO League Environment.
Ideally, you’d have actual data from which to do this, such as players who move from PCL to NL within the same year. Compare their stats between the two, adjust for quantity (player may have only 5 PAs in NL and 450 in PCL that year, while another has 400 and 70, for example), adjust for selective sampling if needed, sum up, and compare. For Japanese Leagues, you generally only have players moving to and from MLB BETWEEN seasons, so you would want to pair one season to the following season, but since the player would be a year older the 2nd year, maybe make a slight adjustment for age to make those pairs comparable. For Negro Leagues, you may have only limited pairs in the 1940’s, or almost no pairs in the 1920’s, in which case you have to make some assumptions (educated guesses) about league strengths.
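As a rough illustration of the "adjust for quantity, sum up, and compare" idea, here is a minimal Python sketch. It is not the author's method: the `PairedSeason` record, the choice of a harmonic-mean weight, and the sample numbers are all assumptions made for the example. The harmonic mean is used only because it naturally discounts a pair in which one side has very few plate appearances.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PairedSeason:
    """One player's rate stat in both leagues (hypothetical record layout)."""
    from_rate: float  # e.g. runs created per PA in the FROM league
    from_pa: int
    to_rate: float    # the same stat in the TO league
    to_pa: int


def harmonic_weight(pa_a: int, pa_b: int) -> float:
    """Weight a pair by the harmonic mean of its two PA totals.

    A 450 PA / 5 PA pair gets a weight near 10, while a 400 PA / 70 PA
    pair gets a weight near 119, so tiny samples count for little.
    """
    if pa_a <= 0 or pa_b <= 0:
        return 0.0
    return 2.0 * pa_a * pa_b / (pa_a + pa_b)


def league_strength_ratio(pairs: Iterable[PairedSeason]) -> float:
    """Weighted TO/FROM ratio of the rate stat across all paired seasons."""
    weighted_from = 0.0
    weighted_to = 0.0
    for p in pairs:
        w = harmonic_weight(p.from_pa, p.to_pa)
        weighted_from += w * p.from_rate
        weighted_to += w * p.to_rate
    if weighted_from == 0:
        raise ValueError("no usable paired seasons")
    return weighted_to / weighted_from


# Made-up numbers, for illustration only.
pairs = [
    PairedSeason(from_rate=0.150, from_pa=450, to_rate=0.050, to_pa=5),
    PairedSeason(from_rate=0.140, from_pa=400, to_rate=0.120, to_pa=70),
]
ratio = league_strength_ratio(pairs)
print(f"TO/FROM strength ratio: {ratio:.3f}")
```

A ratio below 1.0 would suggest that production shrinks when moving to the TO league. Age adjustments for season-to-season pairs and corrections for selective sampling are deliberately left out of this sketch.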
2. Determine the differences in League Run Environments.
This SHOULD be straightforward, but it’s not. For example, if 10 Runs per Game are scored in the PCL, and 8 Runs per Game are scored in the NL, you would think that the PCL stats for the MLE calculation would need to be decreased by 20% for batters and pitchers (lower runs allowed for pitchers). However, ballparks on average may be a little smaller in the PCL, and perhaps if the PCL had played their season in MLB parks, they would have scored only 15% more than the NL instead of 25% more. If that’s the case, then a BATTER moving from the PCL to the NL is going to see his offensive production decline by even MORE than 20%, while a PCL pitcher would actually see his Runs Allowed IMPROVE by more than 20%! This means you need the next step:
3. Determine the differences in Ballparks (and other factors) between leagues.
As mentioned, league run environments are impacted by the parks, the balls, and the bats (like NCAA players moving from aluminum bats to wooden bats). If a player like Tuffy Rhodes is moving from the NL to Japan, he’s moving to a run-scoring environment around 6% LOWER than MLB’s, so we would expect his stat line adjustment in Step #2 to be 6% worse. However, partially due to parks and partially due to the baseball, the park run environment in Japan (pre-2012) is much more hitter friendly ON AVERAGE than parks in MLB, perhaps as much as 13% more hitter friendly. So, not only does Tuffy get around a 10% boost in Step #1 for moving to a weaker league, he gets another 7% boost (13% – 6%) from Steps #2 and #3 together.
Calculating this step is tricky, because the evidence is intertwined with the league run scoring environment. The best estimating technique is to look at the DIFFERENCE between batters and pitchers who move between the same league environments. For example, if the empirical evidence shows that PCL batters hit 15% worse in MLB, while PCL pitchers allow only 5% more runs after moving to MLB, that’s evidence that the PCL parks are around 5% more hitter friendly on average than MLB parks: half the gap between the two figures, (15% – 5%) / 2 = 5%.
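The batter-versus-pitcher comparison can be written out as a small Python sketch. It rests on a simplifying assumption that is not stated in the article: that the competition effect and the park effect simply add, so batters lose (quality + park) while pitchers give up (quality – park) more runs. The function and field names are hypothetical.

```python
from typing import NamedTuple


class LeagueGap(NamedTuple):
    quality: float  # share attributed to tougher competition in the TO league
    park: float     # share attributed to friendlier parks in the FROM league


def split_quality_and_park(batter_decline: float,
                           pitcher_runs_increase: float) -> LeagueGap:
    """Separate the two effects under an additive assumption.

    batter_decline:        fractional drop in batter production, FROM -> TO
    pitcher_runs_increase: fractional rise in runs allowed, FROM -> TO

    Assumed model (an illustration, not a fitted result):
        batter_decline        = quality + park
        pitcher_runs_increase = quality - park
    """
    quality = (batter_decline + pitcher_runs_increase) / 2
    park = (batter_decline - pitcher_runs_increase) / 2
    return LeagueGap(quality=quality, park=park)


# The PCL example from the text: batters 15% worse, pitchers 5% more runs.
gap = split_quality_and_park(0.15, 0.05)
print(f"quality gap ~ {gap.quality:.0%}, park gap ~ {gap.park:.0%}")
```

Under that additive assumption, the same two numbers that yield a park effect of about 5% also imply a competition effect of about 10%, which is why batters and pitchers need to be looked at together.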
4. Determine the differences in Ballparks WITHIN leagues.
Step #3 uses the ‘average’ parks for the FROM and TO leagues, but the specific park a batter played in, and the specific park he’s being calculated into, should be adjusted for if estimates are known.
There have been several good publicly available methods already created to calculate MLEs. Bill James of course had his formulas in the 1985 Abstract, specifically for AA and AAA players going to MLB. James then had the “Willie Davis Method” in his Historical Baseball Abstract, specifically to convert any one major league batting season into a ‘neutral’ major league. Dan Szymborski does MLEs called ZiPS that are calculated very similarly to Bill James’s for batters, only he also has formulas for pitchers.
I too have my own MLE calculations, with batting MLEs based primarily on the “Odds Ratio” method outlined in many blogs over the years by Tom M. Tango, author of “The Book” and currently the Senior Database Architect of Stats for MLB Advanced Media. For pitching MLEs, my calculations closely follow the method of Sean Smith, whose method was previously used by Baseball-Reference.com for their neutralized stat calculations.
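For readers unfamiliar with the odds ratio idea, here is a minimal Python sketch of its generic form: convert rates to odds, scale the player's odds by the ratio of the two league-average odds, and convert back. This is only the textbook shape of the technique, not the KJOK calculation itself. The function names and the sample rates are made up, and the league-strength and park adjustments from the steps above are left out.

```python
def odds(p: float) -> float:
    """Convert a probability (0 < p < 1) to odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("rate must be strictly between 0 and 1")
    return p / (1.0 - p)


def from_odds(o: float) -> float:
    """Convert odds back to a probability."""
    return o / (1.0 + o)


def odds_ratio_translate(player_rate: float,
                         from_league_rate: float,
                         to_league_rate: float) -> float:
    """Move a per-PA event rate from one league context to another.

    The player's odds relative to the FROM league average are kept
    constant and re-applied to the TO league average.
    """
    translated = odds(player_rate) * odds(to_league_rate) / odds(from_league_rate)
    return from_odds(translated)


# Made-up rates: a batter homers in 2% of PA in a league averaging 1%,
# translated into a league averaging 3.5%.
hr_rate = odds_ratio_translate(0.02, 0.01, 0.035)
print(f"translated HR rate: {hr_rate:.3f}")
```

One reason to work in odds rather than multiplying raw rates is that the result can never climb above 100%, which matters when the two environments are far apart.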
Since here at Seamheads we specialize in the rich history of Negro Leaguers, one question that often lurks in the background, and sometimes the foreground, is “Just how good WERE those guys?” MLEs are the tool that, along with those important environment variables and caveats above, can help us down the road a bit to answering that question.
Getting into the nitty-gritty details of the calculations is for a future article, but to demonstrate the power of MLEs we will take the Negro League stats for Wilber “Bullet” Rogan, who as a two-way player will provide us with both batting and pitching stats to work with, and as a Hall of Fame performer will give us an idea of how good the top players in the Negro Leagues might have been. We’ll use my “KJOK” method, and see what the results look like using 2019 NL as the MLE “TO” season:
Here are Rogan’s raw stats from the Negro Leagues (per seamheads.com)
Here are the translated stats using my method:
Some general observations on the results:
While the method does try to ‘shape’ statistics for a change in eras where the distribution of Singles, Home Runs, etc. is vastly different, like the 1920’s Negro Leagues versus 2019 National League, the larger the differences in distribution, the harder it is for the model to create realistic stats in the new environment. We are not only moving from the Negro Leagues to MLB, we are also moving from the 1920s to 2019. If we had moved into the 1920s National League environment, the model would do a better job. So instead of hitting .340 in the 2019 era, maybe a better estimate would be .320 but with a few more extra base hits. The model adds plate appearances for the difference in league game schedules between the leagues, so the higher HR numbers are a combination of a much higher HR environment and a longer season schedule.
On the pitching side, the combination of Rogan striking out batters at a much higher rate than his contemporary Negro League pitchers, put into the high strikeout 2019 environment, results in translated strikeout totals that may be a bit too high to be realistic.
Admittedly these are Rogan’s prime, best seasons, but the translations do seem to confirm his reputation as a great two-way player. Note again that this does not mean Rogan is PREDICTED to hit 36 Home Runs if he played in the 2019 NL instead of the 1922 NNL. It just means that given what he did in the 1922 NNL, and making some assumptions about the quality of play and the ballparks, what he did do at the plate would be approximately EQUIVALENT to hitting 36 home runs, batting .300, etc.
Assumptions of course can be wrong. The point is not necessarily that the MLEs are ‘correct’, but that we are now starting to have the data for players, leagues, ballparks, etc. that, combined with statistical tools, can be used to approximate “how good these guys really were” as opposed to just purely guessing based on anecdotal stories or very incomplete statistics that do not have any league scoring context to provide an analytical framework.
In future articles we’ll step back into the detailed data a bit and discuss how to analyze players when we don’t have 100% complete data to work with, like missing strikeouts for batters. How do we get around that? Even if we can approximate batting or pitching performance, what about defense, or even baserunning? Do we have ways to approximate those also, or are they hopelessly lost to history? Stay tuned….
Have you ever considered adding MLEs as a part of player pages? Or publishing a work presenting your results for at least some of the top players? Maybe a Top 100 of all-time to lend perspective. I realize MLEs are based on assumptions and the article does a great job explaining both the utility and limitations of MLEs but I believe it would be a welcome addition to the site to have them added… or given some of the care that may need to go into customizing each calculation, at least some of the greats.
Thanks for everything y’all do on the site. It’s invaluable.
Hello Mark:
It has been on our list of potential projects for several years. We want to do it as ‘correctly’ as possible of course. Once we finish getting fielding data for all the 1920 – 1948 seasons, we may tackle it.
In the meantime, you may want to check out Eric Chalek’s work on Negro League MLEs here: https://homemlb.wordpress.com/the-negro-leagues/
Glad you like the site.