NORMALIZING NEGRO LEAGUE STATISTICS

February 13, 2020 by · 8 Comments

Most baseball fans are familiar with the concept of ‘normalizing’ statistics. For MLB statistics, the most basic adjustment is to normalize for park effects. The simplest park normalization takes the impact of a team’s park on runs scored and divides that number, whether positive or negative, in half, since a team plays only about half of its games at home. That adjustment is then applied to a player’s OPS, ERA, wRC, etc. to get a normalized performance (usually indicated as OPS+, ERA+, wRC+). If you want to compare players from different leagues or seasons, add an adjustment for the individual league scoring rates and, voilà, you have a normalized statistic.
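
As a rough illustration of the halving step, here is a minimal Python sketch. The function names, the `raw_park_factor` input and the generic 'plus' index are assumptions made for this example; the published OPS+, ERA+ and wRC+ formulas have more moving parts than this.

```python
def halved_park_factor(raw_park_factor):
    """Halve a park's run effect, since roughly half of a team's games are at home.

    raw_park_factor is a hypothetical input: 1.10 means the park inflates
    run scoring by 10%, 0.90 means it suppresses scoring by 10%.
    """
    return 1.0 + (raw_park_factor - 1.0) / 2.0


def plus_index(player_rate, league_rate, raw_park_factor, lower_is_better=False):
    """Simplified 'plus' index where 100 is league average.

    Illustrative only: not the exact formula behind any published stat.
    """
    applied = halved_park_factor(raw_park_factor)
    if lower_is_better:  # ERA-style rate stats, where a smaller number is better
        return 100.0 * league_rate * applied / player_rate
    return 100.0 * player_rate / (league_rate * applied)


# A pitcher with a 3.00 ERA in a 4.00 ERA league, in a park that boosts runs by 10%
print(round(plus_index(3.00, 4.00, 1.10, lower_is_better=True)))
```

The 10% park effect is applied as 5%, so the pitcher gets some extra credit for working in a hitter's park, but only half as much as the raw park number would suggest.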

However, the reason simple park calculations ‘work’ for normalization is that there is an underlying assumption that, except for home parks, players within a league all face almost identical conditions under which their teams perform. Those conditions include:

1.     Playing the same number of games as all other teams.
2.     Playing schedules with close to the same difficulty.
3.     Playing an equal number of home and away games (and not
playing any neutral site games).
4.     Playing most or all home games in the same park.

Teams, and within teams, the individual players, do not EXACTLY all meet these conditions. Some teams play more difficult schedules than others. Some batters may, by schedule or just bad luck, face better pitchers on average than other batters, and vice versa for pitchers. Some players may play more or fewer home games. But those are exceptions, and unless there’s a need to make really fine distinctions between very similar players, adjustments are typically not made for strength of competition, or for the fact that players play better at home than on the road, etc.

For the Negro Leagues, those assumed conditions all fall apart. Not just for the ‘pre-league’ 1900-1919 era, but even after formal leagues formed, the following conditions still prevailed in the Negro Leagues:

1.     Teams played varying numbers of total games.
2.     Teams played differing numbers of games against other
league teams.
3.     Teams played an unbalanced number of home and away games.
4.     Teams played in multiple ‘neutral’ parks.

As a result, the simple park calculation won’t work for the Negro Leagues. To do a good enough job of normalization, we need to adjust for frequency of home field advantage, the strength of the opponent’s batters and pitchers, and finally the combination of parks played in, both at home and on the road.

The steps used to normalize Negro League stats on seamheads.com are:

1.     Estimate the Negro Leagues home field advantage in runs per game.

2.     Calculate each team’s Simple Rating System (SRS) number in
runs per game. SRS uses the run difference in each game
between teams plus an adjustment from #1 above based on the
game being home/away/neutral to come up with a Strength of
Schedule which feeds back into the final SRS rating.
Baseball-reference.com calculates SRS for MLB teams (in 2011
the Yankees led MLB with a 1.4 RPG SRS while Houston had a
-1.2 SRS). For more details on the calculation see the
football example at: http://www.pro-football-reference.com/blog/?p=37

3.     Break each team’s SRS down into offense and
defense/pitching components, estimated from Runs
Scored/Allowed. Using 2011 MLB as an example, perhaps the
Yankees’ 1.4 SRS would be 1.3 for offense and 0.1 for
defense/pitching. So if our team is playing the Yankees, our
pitchers are going to get a lot more ‘credit’ for having
faced the Yankee batters than our batters will for having to
face the Yankee pitchers/defense.

4.     Calculate a park factor adjustment for every park played in. We do this by calculating a ‘lifetime’ park factor for each Negro League park played in, including neutral sites, then we ‘resize’ to each league/season so that all the parks in a league/season average to 1.00.

5.     For each game, apply a run adjustment based on the opponent
SRS (again with batters and pitchers getting separate
adjustments), adjust for whether the game was at home, away
or on a neutral site, apply the specific park adjustment,
then add those all together for final batter and pitcher
difficulty runs adjustments for that game. Finally, all of
the team’s individual game adjustments are then summed and
averaged for the normalizing factor for batters and pitchers
for that team.
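
To make steps 2 and 4 a little more concrete, here is a minimal Python sketch. It is not the seamheads.com code: the game tuple layout, the function names, the damped iteration used to solve for the ratings, and the unweighted mean used to 'resize' park factors are all assumptions chosen for illustration.

```python
from collections import defaultdict


def simple_rating_system(games, home_field_advantage, iterations=500):
    """Estimate SRS (runs per game) from site-adjusted run margins.

    games: iterable of (team, opponent, runs_for, runs_against, site), where
    site is "home", "away" or "neutral" from `team`'s point of view
    (a hypothetical layout for this example).
    """
    # Remove the home edge from a home team's margin, add it back for a road team.
    site_shift = {
        "home": -home_field_advantage,
        "away": home_field_advantage,
        "neutral": 0.0,
    }
    margins = defaultdict(list)    # team -> adjusted margin in each game
    opponents = defaultdict(list)  # team -> opponent faced in each game
    for team, opp, runs_for, runs_against, site in games:
        adjusted = (runs_for - runs_against) + site_shift[site]
        margins[team].append(adjusted)
        opponents[team].append(opp)
        margins[opp].append(-adjusted)
        opponents[opp].append(team)

    avg_margin = {t: sum(m) / len(m) for t, m in margins.items()}
    ratings = {t: 0.0 for t in avg_margin}
    for _ in range(iterations):
        updated = {}
        for t in ratings:
            # Strength of schedule: average current rating of the opponents faced.
            sos = sum(ratings[o] for o in opponents[t]) / len(opponents[t])
            # Damped update (an assumption) so tiny leagues do not oscillate.
            updated[t] = 0.5 * ratings[t] + 0.5 * (avg_margin[t] + sos)
        ratings = updated
    return ratings


def resize_park_factors(lifetime_factors):
    """Rescale one league/season's park factors so they average 1.00.

    Uses a plain unweighted mean, which is an assumption; weighting by
    games played in each park would be another reasonable choice.
    """
    mean = sum(lifetime_factors.values()) / len(lifetime_factors)
    return {park: factor / mean for park, factor in lifetime_factors.items()}


# One game: A beats B 5-3 at home, with an assumed 0.5 runs/game home edge.
print(simple_rating_system([("A", "B", 5, 3, "home")], 0.5))
```

In the one-game example, A's two-run win is trimmed to 1.5 runs once the home edge is removed, and that margin is split evenly between the two teams' ratings. With a real schedule, the strength-of-schedule term is what lets a team that played fewer or tougher league games be compared with one that played more or easier ones.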

Comments

8 Responses to “NORMALIZING NEGRO LEAGUE STATISTICS”
  1. Justin Oakes says:

    Thanks for an interesting post, Kevin.

    1. Are the Negro League stats presented (so far) on the site raw or normalized?

    2. When do you envision the stats from 1923-1947 being added to the database?

    Thanks,

    Justin

  2. Justin:

    OPS+ and ERA+ are normalized for 1916-1922. We haven’t yet normalized the Cuban Winter League seasons. They weren’t a priority because:
    1. Almost all of the games were played in the same park.
    2. They actually did play a balanced, round-robin schedule.
    However, strength of schedule is important especially when you sometimes only have 3 teams in the league, so we will get those Cuban Leagues done soon.

    We will have the 1923 NNL added very soon, but after that the plan is to add 1903-08 Cuban Summer Leagues plus the 1902/03 and 1903/04 Cuban Winter leagues before the end of the year.

    For next year, we will be adding the 1908-1915 Negro League Teams data.

    We do also hope to add the 1923 ECL next year, but 1924 and later is still to be determined for some time 2013 and beyond…

  3. Chris D. says:

    Which Negro League season(s) do you find to be the most complete in terms of all the stats being gathered, or that give the most complete picture of statistical accuracy? Seems like 1926 would be an example. When would you have the 1930 or 1925 fielding stats available? Thank you for your fine work.

  4. The seasons in the 1920s are the most complete, including 1926. We do need to add fielding to some of those, including 1925. At the moment, we are working backwards from 1942 to add more data, so it could be a year or so before 1930 and 1925 fielding stats are available.

  5. Chris D. says:

    1937-1938, 1943-1948: would these also be considered more complete? Why are some teams in the standings shown as playing far fewer games than most of the others? Is it because there is incomplete data on them, or did some teams quit playing or disband? Thank you.

  6. For 1937-38, the issue is that not every game against other major black teams had a published box score. For 1943-48, especially for the NAL, that problem became even worse, so those seasons are more incomplete.

    As far as standings go, some teams disbanded, some teams played more games against semi-pro teams instead of playing league games, which we don’t yet show in the standings.

  7. art kyriazis says:

    it seems to me you have a chicken and egg problem with “normalizing” Negro League stats relative to AL and NL stats, which is this:

    The AL and NL stats from 1890 to 1946 were the product of a segregated, all white baseball.

    Whereas, the Negro League stats allowed everyone to play, Latin, Negro, Native American.

    We’re not on numbers here–we’re on computer modeling what the AL/NL would have been assuming they were integrated from 1890-1946, and then normalizing the negro league stats to THAT set of numbers.

    Because as it is, 1890-1946 “major” league stats are kind of bogus.

    Art K

  8. Hello Art – Yes, just to be clear, what we are NOT doing is modeling what the AL/NL would have been if they were integrated. What we are doing is modeling what would happen if we took ONE Negro League player and dropped him into the AL/NL, with the impact of just one player assumed to be minimal to the overall stats of the AL/NL. Trying to model what would happen if 100 Negro League players REPLACED 100 AL/NL players would be orders of magnitude more difficult, and way beyond any simple modeling.
