The Seamheads.com Parkfactors Database
(Seamheads Parkfactors 2019_December.accdb)
Release Date: December 14, 2019
----------------------------------------------------------------------
README CONTENTS
0.1 Copyright Notice
0.2 Contact Information
1.0 Release Contents
1.1 Introduction
1.2 What's New
1.3 Acknowledgements
1.4 Using this Database
2.0 Data Tables
2.1 Home Main Data Tables
2.2 Visitor Main Data Table
2.3 ParkConfig Table
2.4 RH_LH_Data Tables
2.5 Parks Table
2.6 Teams Table
2.7 Leagues Table
2.8 LeagueTeams Table
2.9 x-Reference Table
3.0 Data Issues
4.0 Online Version of Database
----------------------------------------------------------------------
0.1 Copyright Notice & Limited Use License

This database is copyright 2019 by Kevin D. Johnson. A license is granted for individual use for research purposes. It may not be redistributed without permission. Any commercial use or other dissemination of the database, in part or in whole, is prohibited. Use of this database constitutes acceptance of these terms.

For licensing information or further information, contact me at kjokbaseball@yahoo.com.
----------------------------------------------------------------------
0.2 Contact Information

Groups IO egroup: kjokbaseball
E-Mail          : kjokbaseball@yahoo.com
----------------------------------------------------------------------
1.0 Release Contents

MS Access Versions:
    Seamheads Parkfactors 2019_December.accdb
    Seamheads Parkfactors 2019_December Documentation.txt

Comma Delimited Version:
    Seamheads Parkfactors 2019_December Documentation.txt
    Active Park Events.csv
    Home Main Data.csv
    Home Main Data With Parks Breakout.csv
    Leagues.csv
    LeaguesTeams.csv
    Park Events.csv
    ParkConfig.csv
    Parks.csv
    Retroshet_BBDB_Team_XREF.csv
    RH_LH_ALL_HR.csv
    RH_LH_Boxscore.csv
    RH_LH_Data.csv
    Teams.csv
    Visitor_Main_Data.csv
----------------------------------------------------------------------
1.1 Introduction

This database contains batting statistics by ballpark for Major League Baseball from 1871 through 2019. It includes data from the two current leagues (American and National), the four other "major" leagues (American Association, Union Association, Players League, and Federal League), and the National Association of 1871-1875.

This database also contains park configuration data by year for each major league ballpark used. The configuration data is based on many reported measurements, some of which may be unreliable or may conflict with other reported measurements. Some reported data has been corrected using maps and geometric approximations to make the data as accurate as possible.

If you have any problems or find any errors, please let me know. Any feedback is appreciated.
----------------------------------------------------------------------
1.2 What's New

2019 December Version data changes (12/14/2019)

The 2019 version now includes 2018, 1906 and 1907 home and away, and LH and RH splits based on Retrosheet play-by-play and Boxscore data. (Note: bathand is sometimes unknown for switch-hitters and for some other batters pre-1926.)
Added numerous configuration data updates.
----------------------------------------------------------------------
1.3 Acknowledgements

This database has been built from the data in many other sources, with help from many people over the years, including:

The MacMillan Encyclopedia - Original source of Home and Road Runs for 20th century thru 1987
Retrosheet.org - Game Logs source of Home and Road data. Event Files source of LH/RH H/A by Park Data
The SABR Home Run Log (David Vincent) - source of Home and Road Home Runs
Dan Hirsch - LH/RH H/A breakdowns of Retrosheet.org Event and Boxscore file data
Howard Johnson - LH/RH H/A breakdowns of Retrosheet.org Event and Boxscore File data
Brian Cartwright - LH/RH H/A breakdowns of Retrosheet.org Event File data
Mark Miller - LH/RH H/A breakdowns of Retrosheet.org Event File data
Clem Comley - LH/RH H/A breakdowns of Retrosheet.org Event File data
Victor Wilson - LH/RH H/A HR breakdowns
Charles Saeger - Original source of 19th Century Home and Road Home Runs
Paul Wendt - Various 19th century ballpark usage issues
Green Cathedrals by Philip Lowry - primary source of Park Configuration data
Ballparks of the Deadball Era by Ron Selter - primary source of Deadball era Park Configuration data
Ballparks.com - Secondary source of Park Configuration data
Ballparksofbaseball.com - Secondary source of Park Configuration data
The Lahman Database - Source of Team/League Data
Tom M. Tango (TangoTiger) - Database Design Assistance
Sean Forman - Database Design Assistance
Eric Jones - Detailed 1914 & 1915 Federal League Home and Road Data

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".

For the online version of this database at www.seamheads.com, special thanks to Mike Lynch for his leadership and persistence and Dan Hirsch for his technical expertise in setting up the user interface.
Thanks to all, and if I missed anyone, please let me know.
----------------------------------------------------------------------
1.4 Using this Database

This version of the database is available in Microsoft Access format or in a generic, comma-delimited format. Because this is a relational database, you will not be able to use the data in a flat-database application.

Please note that this is not a stand-alone application. It requires a database application or some other application designed specifically to interact with the database.
------------------------------------------------------------------------------
2.0 Data Tables

The design follows these general principles. Each park is assigned a unique number (ParkID). All of the information relating to that park is tagged with its ParkID. The ParkIDs are linked to Teams and Years in the main data tables.

The database consists of the following main tables:

  Home Main Data W Park BreakOUT - Runs and Home Runs for Home and Opponent teams for all seasons by Park by Team by Season. Other offensive events (doubles, triples, etc.) for the seasons 1906-2018 and 1871, 1872, 1874 (NA)
  Home Main Data - Runs and Home Runs for Home and Opponent teams for all seasons by Team by Season. Other offensive events (doubles, triples, etc.) for the seasons 1906-2018 and 1871, 1872, 1874 (NA)
  Visitor Main Data - Runs and Home Runs for Visitor and Opponent teams for all seasons by Team by Season. Other offensive events (doubles, triples, etc.) for the seasons 1906-2018 and 1871, 1872, 1874 (NA)
  ParkConfig - Data on descriptive park items such as foul line distances, fence heights, capacities, etc.

It is supplemented by these tables:

  RH_LH_Data - Subset of Main Data Tables broken down by LH/RH for most offensive events within H/A for years where available (note that some years before 1974 are incomplete). Box Score Data was used to supplement the data for seasons prior to 1941.
  RH_LH_ALL_HR - Home Runs broken down by LH/RH within H/A for all seasons by Team by Season (may not match Home Runs in RH_LH_Data for any years where Retrosheet play by play data is incomplete).
  Parks - Park ID Master Table
  Teams - Team ID Master Table
  Leagues - League ID Master Table
  LeagueTeams - League/Team Valid Combinations Master Table
  Retrosheet_BBDB_Team_Xref - Cross reference of team IDs between Retrosheet and Baseball Databank where different
  Park Events - Firsts and Lasts, special events per park for selected historical parks (under construction)
  No-Hitters - Firsts and lasts, special events per park (under construction)

Sections 2.1 through 2.9 of this document describe each of the tables in detail and the fields that each contains.
---------------------------------------------------------------------------
2.1 Home Main Data With Parks Breakout

Year        Season
TeamID      Lahman Database Team Identifier
ParkID      Park Code based on Retrosheet Park IDs (NOT in Home_Main_Data_WO_Parks)
LgID        League (NL, AL, AA, etc.)
SEQ         Numeric Code which helps link a team's "main" home park data with the road data for that same year (NOT in Home Main Data W/O Parks)
GP_H        Games Played as Home Team
R_Off_H     Runs Scored as Home Team
R_Def_H     Runs Allowed as Home Team
HR_Off_H    Home Runs Hit as Home Team
HR_Def_H    Home Runs Allowed as Home Team
AB_Off_H    At Bats as Home Team
H_Off_H     Base Hits as Home Team
D_Off_H     Doubles as Home Team
T_Off_H     Triples as Home Team
RBI_Off_H   RBI's as Home Team
SH_Off_H    Sacrifices as Home Team
SF_Off_H    Sacrifice Flies as Home Team
HBP_OFF_H   Hit By Pitches as Home Team
BB_Off_H    Base on Balls as Home Team
IW_Off_H    Intentional Walks as Home Team
K_Off_H     Strikeouts as Home Team
SB_Off_H    Stolen Bases as Home Team
CS_Off_H    Caught Stealing as Home Team
GDP_Off_H   Grounded Into Double Plays as Home Team
AB_Def_H    At Bats for Opposition when Home Team
H_Def_H     Base Hits Allowed as Home Team
D_Def_H     Doubles Allowed as Home Team
T_Def_H     Triples Allowed as Home Team
RBI_Def_H   RBI's Allowed as Home Team
SH_Def_H    Sacrifices Allowed as Home Team
SF_Def_H    Sacrifice Flies Allowed as Home Team
HBP_Def_H   Hit By Pitches Allowed as Home Team
BB_Def_H    Base on Balls Allowed as Home Team
IW_Def_H    Intentional Walks Given as Home Team
K_Def_H     Strikeouts Made as Home Team
SB_Def_H    Stolen Bases Allowed as Home Team
CS_Def_H    Caught Stealing Made as Home Team
GDP_Def_H   Grounded Into Double Plays Made as Home Team
------------------------------------------------------------------------------
2.11 Home Main Data

Year        Season
TeamID      Lahman Database Team Identifier
LgID        League (NL, AL, AA, etc.)
GP_H        Games Played as Home Team
R_Off_H     Runs Scored as Home Team
R_Def_H     Runs Allowed as Home Team
HR_Off_H    Home Runs Hit as Home Team
HR_Def_H    Home Runs Allowed as Home Team
AB_Off_H    At Bats as Home Team
H_Off_H     Base Hits as Home Team
D_Off_H     Doubles as Home Team
T_Off_H     Triples as Home Team
RBI_Off_H   RBI's as Home Team
SH_Off_H    Sacrifices as Home Team
SF_Off_H    Sacrifice Flies as Home Team
HBP_OFF_H   Hit By Pitches as Home Team
BB_Off_H    Base on Balls as Home Team
IW_Off_H    Intentional Walks as Home Team
K_Off_H     Strikeouts as Home Team
SB_Off_H    Stolen Bases as Home Team
CS_Off_H    Caught Stealing as Home Team
GDP_Off_H   Grounded Into Double Plays as Home Team
AB_Def_H    At Bats for Opposition when Home Team
H_Def_H     Base Hits Allowed as Home Team
D_Def_H     Doubles Allowed as Home Team
T_Def_H     Triples Allowed as Home Team
RBI_Def_H   RBI's Allowed as Home Team
SH_Def_H    Sacrifices Allowed as Home Team
SF_Def_H    Sacrifice Flies Allowed as Home Team
HBP_Def_H   Hit By Pitches Allowed as Home Team
BB_Def_H    Base on Balls Allowed as Home Team
IW_Def_H    Intentional Walks Given as Home Team
K_Def_H     Strikeouts Made as Home Team
SB_Def_H    Stolen Bases Allowed as Home Team
CS_Def_H    Caught Stealing Made as Home Team
GDP_Def_H   Grounded Into Double Plays Made as Home Team
-----------------------------------------------------------------------------
2.2 Visitor Main Data

Year        Season
TeamID      Lahman Database Team Identifier
ParkID      Park Code based on Retrosheet Park IDs
LgID        League (NL, AL, AA, etc.)
SEQ         Numeric Code which helps link a team's "main" home park data with the road data for that same year
GP_A        Games Played as Visiting Team
R_Off_A     Runs Scored as Visiting Team
R_Def_A     Runs Allowed as Visiting Team
HR_Off_A    Home Runs Hit as Visiting Team
HR_Def_A    Home Runs Allowed as Visiting Team
AB_Off_A    At Bats as Visiting Team
H_Off_A     Base Hits as Visiting Team
D_Off_A     Doubles as Visiting Team
T_Off_A     Triples as Visiting Team
RBI_Off_A   RBI's as Visiting Team
SH_Off_A    Sacrifices as Visiting Team
SF_Off_A    Sacrifice Flies as Visiting Team
HBP_OFF_A   Hit By Pitches as Visiting Team
BB_Off_A    Base on Balls as Visiting Team
IW_Off_A    Intentional Walks as Visiting Team
K_Off_A     Strikeouts as Visiting Team
SB_Off_A    Stolen Bases as Visiting Team
CS_Off_A    Caught Stealing as Visiting Team
GDP_Off_A   Grounded Into Double Plays as Visiting Team
AB_Def_A    At Bats for Opposition when Visiting Team
H_Def_A     Base Hits Allowed as Visiting Team
D_Def_A     Doubles Allowed as Visiting Team
T_Def_A     Triples Allowed as Visiting Team
RBI_Def_A   RBI's Allowed as Visiting Team
SH_Def_A    Sacrifices Allowed as Visiting Team
SF_Def_A    Sacrifice Flies Allowed as Visiting Team
HBP_Def_A   Hit By Pitches Allowed as Visiting Team
BB_Def_A    Base on Balls Allowed as Visiting Team
IW_Def_A    Intentional Walks Given as Visiting Team
K_Def_A     Strikeouts Made as Visiting Team
SB_Def_A    Stolen Bases Allowed as Visiting Team
CS_Def_A    Caught Stealing Made as Visiting Team
GDP_Def_A   Grounded Into Double Plays Made as Visiting Team
------------------------------------------------------------------------------
2.3 ParkConfig table

ParkID      Park Code based on Retrosheet Park IDs
Name        Most common name used for park IN THAT SEASON
Year        Season
Capacity    Estimated Normal Maximum Capacity
Surface     Type of Surface (N=Natural, T=Turf)
Area_Fair   Square Feet of Fair Territory, estimated in thousands of Square Feet
Cover       Type of Roof (O=Open, D=Dome, R=Retractable)
LF_Dim      Left Field Line Fence Distance in Feet at the Foul Pole
SLF_Dim     Straightaway Left Field Distance in Feet approx. 15 degrees in from foul line
LFA_Dim     Left Field Power Alley Distance in Feet approx. 22.5 degrees in from foul line
LC_Dim      Left Center Field Distance in Feet approx. 30 degrees in from foul line
RCC_Dim     Right Centerfield Corner Distance in Feet between Right Center and CF
CF_Dim      Centerfield (straightaway) Fence Distance in Feet 45 degrees in from foul lines
LCC_Dim     Left Centerfield Corner Distance in Feet between Left Center and CF
RC_Dim      Right Center Field Distance in Feet approx. 30 degrees in from foul line
RFA_Dim     Right Field Power Alley Distance in Feet approx. 22.5 degrees in from foul line
SRF_Dim     Straightaway Right Field Distance in Feet approx. 15 degrees in from foul line
RF_Dim      Right Field Line Fence Distance in Feet at the Foul Pole
Backstop    Distance from Home Plate to Stands
Foul        Square Feet of Foul Territory, estimated in thousands of Square Feet OR (L=Large, N=Normal, S=Small)
LF_W        Left Field Wall Height in Feet
LC_W        Left Center Field Wall Height in Feet
CF_W        Center Field Wall Height in Feet
RC_W        Right Center Field Wall Height in Feet
RF_W        Right Field Wall Height in Feet
Comments    Comments about remodeling, fires, special features, etc.
------------------------------------------------------------------------------
2.4 RH_LH_Data

Year        Season
TeamID      Lahman Database Team Identifier
Park_ID     Park Code based on Retrosheet Park IDs
Off_Def     Indicator of team being on offense or defense
H_A         Indicator of team being Home or Visitors
Bathand     R=Right, L=Left, B=Switch-hitter, bathand unknown, U=bathand unknown
AB          At Bats
1B          Singles
2B          Doubles
3B          Triples
HR          Home Runs
RBI         Runs Batted In
BB          Base on Balls
IBB         Intentional Walks
K           Strikeouts
HBP         Hit By Pitches
SF          Sacrifice Flies
SH          Sacrifices
GDP         Grounded into Double Plays
PA          Plate Appearances
R           Runs
G           Games
SB          Stolen Bases (based on batter handedness)
CS          Caught Stealing (based on batter handedness)
ROE         Reached on Error
------------------------------------------------------------------------------
2.41 RH_LH_ALL_HR

Year        Season
TeamID      Lahman Database Team Identifier
Off_Def     Indicator of team being on offense or defense
H_A         Indicator of team being Home or Visitors
Bathand     R=Right and L=Left
HR          Home Runs
------------------------------------------------------------------------------
2.5 Parks

PARKID      Park Code based on Retrosheet Park IDs
NAME        Most Common Name for Park DURING ITS LIFETIME
CITY        City Location of Park
STATE       State or Province Location of Park
START       Date of first major league game at Park
END         Date of last major league game at Park
LEAGUE      League that Park was most often used in
NOTES       Various Notes about Park
AKA         Other Names Park may have been known as
EXACT       Latitude and Longitude are exact known coordinates
Latitude    Latitude location of park in degrees decimal
Longitude   Longitude location of park in degrees decimal
Altitude    Altitude of park in thousands of feet

NOTE: The EXACT locations of all parks ever used in MLB games are now in the Parks table, except for four parks:

  Monumental Park - Baltimore, MD
  Ludlow Park - Ludlow, KY
  West New York Field - West New York, NJ
  Fireworks Park - Gloucester City, NJ

Any location information on these parks would be greatly appreciated.
------------------------------------------------------------------------------
2.6 Teams

TeamID      Lahman Database Team Identifier
League      League
Start Year  First Year of Team
End Year    Last Year of Team
City        City Name only of Team
Nickname    Nickname only of Team
------------------------------------------------------------------------------
2.7 Leagues

LgID        League (NL, AL, AA, etc.)
LgName      Name of League
Start_Year  First Major League Season of League
End_Year    Last Major League Season of League
Comments    Comments about league history
------------------------------------------------------------------------------
2.8 LeagueTeams table

LgID        League (NL, AL, AA, etc.)
TeamID      Lahman Database Team Identifier
Year        Season
------------------------------------------------------------------------------
2.9 Retrosheet_BBDB_Team_Xref

Year        Season
RetroID     Retrosheet Team ID
BBDBID      Baseball-Databank Team ID
------------------------------------------------------------------------------
2.10 Park Events

ParkID First_Game Score_Winner First_Attendance First_Starting_Pitchers
First_Batter First_Game_Result First_Hit First_Result_Inning First_Run
First_RBI FIRST_HR Date_Game_No First K_Batter First_Win First_Loss
First_Grand_Slam First_Inside_the_Park_HR First_No_Hitter
Last_Game Score_Winner Last-Attendance Last_Starting_Pitchers
Last_Batter Last_Game_Result Last_Hit Last_Result_Inning Last_Run
Last_RBI Last_HR Last_K_Batter Last_Win Last_Loss Last_Grand_Slam
Last_Inside_the_Park_HR Last_Most_Recent_No_Hitter Trivia
------------------------------------------------------------------------------
2.11 No-Hitters

Under construction
------------------------------------------------------------------------------
3.0 Data Issues

The RH_LH_Data Table MAY not tie exactly to the Home_Main_Data Tables and Visitor_Main_Data in some seasons, due to incomplete play by play data in Retrosheet Event Files.

RH_LH_ALL_HR_Only Table home runs MAY not tie exactly to RH_LH_Data table home runs, due to incomplete play by play data in Retrosheet Event Files.
LH/RH HR data for 1906 thru 1925 seasons have records marked as "U" for "Unknown" as the handedness of some batters is not known.

LH/RH data for 1906 thru 1936 seasons have records marked as "B" for "switch-hitter, handedness unknown".

If anyone knows where to find the exact numbers for these items, please let me know.
------------------------------------------------------------------------------
4.0 Online Version of Database

The online database version includes the following additional data:

  Map of park location
  Percentage calculations by season of Turf and Roof types.
  Averages by season of field dimensions, wall heights, fair territory, and backstop distances.
  Totals of games by team and by city.

1 Year Park Factors:
The 1 year park factors are based on UNREGRESSED observed data. An 'other parks corrector' calculation is applied because the road parks, taken together, differ from the league average by an amount that offsets the rating of the park being rated. In other words, if you're calculating factors for Coors Field, then Coors itself is not part of the 'road' set of parks it is being compared against. That road set of parks is therefore slightly pitcher friendly (assuming Coors is hitter friendly that year) instead of being 100% neutral, and the 'other parks corrector' adjusts for that fact. Except for the other parks corrector calculation, the 1-year park factors are simple rates of components per At Bat at home divided by the rate of components per AB on the road.

3 Year Park Factors:
The 3 year park factors are REGRESSED, and meant as an estimate of the park's 'true' impact on batting components. We calculate this factor slightly differently from Total Baseball/Baseball-Reference.com (see http://www.baseball-reference.com/about/parkadjust.shtml for an explanation of that calculation).
For a given park/season, we use the 1-year factor for that park/season plus the 1-year factor for the PREVIOUS season plus the 1-year factor for the FOLLOWING season. We weight each based on the number of home games for each season, but otherwise we weight them equally (we don't add weight to the current season). Then we regress that number 25% towards that park's long-term historical factor for that component. The long-term historical factor is the sum of all 1-year factors for the history of the park weighted by home games each season (that is, a games-weighted average). We believe this gives us the closest possible approximation of the 'true' park factor without adding more complicating variables such as modifications to park characteristics, new parks in the league, weighting the long term factor by number of seasons, etc.

For the very first and very last year of a park, since there is no PREVIOUS or FOLLOWING season in those cases, we chose to use an additional following season (season + 2) for the first park year and an additional previous season (season - 2) for the last park year. This results in the first TWO seasons and the last TWO seasons of the 3 year calculations being the same! What we're saying is that, lacking an adjacent season, our best guess for the first and last seasons of a park's existence is the same guess we use for the 2nd season and the next to last season. If anyone wants to prove that there is a more accurate way to handle these 'end' seasons, we are very open to ideas.
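
As a rough illustration only, the 1 year and 3 year calculations described above can be sketched in Python. This is not the code used for the online database: the function names and the (factor, home games) input layout are hypothetical, the 'other parks corrector' is left out because its formula is not given here, and the handling of parks with fewer than three seasons (use whatever seasons exist) is an assumption.

```python
from typing import List, Tuple

# Hypothetical layout: one (1-year factor, home games) pair per season,
# in chronological order, for a single park and a single component.
Season = Tuple[float, int]


def one_year_factor(home_events: float, home_ab: float,
                    road_events: float, road_ab: float) -> float:
    """Rate per AB at home divided by rate per AB on the road.

    The 'other parks corrector' is NOT applied in this sketch.
    """
    return (home_events / home_ab) / (road_events / road_ab)


def games_weighted_mean(seasons: List[Season]) -> float:
    """Average of 1-year factors, weighted by home games in each season."""
    total_games = sum(games for _, games in seasons)
    return sum(factor * games for factor, games in seasons) / total_games


def three_year_factor(seasons: List[Season], i: int,
                      regress: float = 0.25) -> float:
    """3 year factor for the season at index i.

    Uses the previous, current and following seasons. For the first and
    last season the window slides inward (season + 2 or season - 2), so
    the first two and the last two seasons share the same value.
    Assumption: with fewer than three seasons, all of them are used.
    """
    n = len(seasons)
    start = min(max(i - 1, 0), max(n - 3, 0))
    window = games_weighted_mean(seasons[start:start + 3])
    long_term = games_weighted_mean(seasons)
    # Regress 25% of the way from the 3-season value to the long-term value.
    return window + regress * (long_term - window)
```

For example, with made-up seasons `[(1.10, 81), (1.20, 81), (1.00, 81), (0.90, 81)]`, `three_year_factor(seasons, 0)` and `three_year_factor(seasons, 1)` return the same value, which matches the note above about the first TWO seasons being the same.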