Create a regression model with BigQuery DataFrames
Create a linear regression model on the body mass of penguins using the BigQuery DataFrames API.
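As a rough illustration of what fitting and scoring a linear regression involves, here is a minimal single-feature least-squares sketch in plain Python. It is not a description of how BigQuery DataFrames trains its model; the measurements and the helper names `fit_line` and `r_squared` are hypothetical and exist only for this example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(xs, ys, slope, intercept):
    """Return the coefficient of determination of the fitted line on (xs, ys)."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical flipper lengths (mm) and body masses (g), made up for illustration.
flipper_mm = [180.0, 185.0, 190.0, 195.0, 200.0]
body_mass_g = [3400.0, 3550.0, 3700.0, 3900.0, 4000.0]

slope, intercept = fit_line(flipper_mm, body_mass_g)
r2 = r_squared(flipper_mm, body_mass_g, slope, intercept)

# Predict the body mass of a penguin with a 192 mm flipper.
predicted = slope * 192.0 + intercept
```

The sample on this page uses several feature columns, including categorical ones such as `island` and `sex`, so the real model is more involved than this one-variable line.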
- The code uses the `bigquery-public-data.ml_datasets.penguins` dataset, restricted to the Adelie Penguin species.
- The script loads the data, filters by species, drops the species column, and removes rows with null values to build the training set; rows with a missing body mass form the test set.
- A `LinearRegression` model is created, trained, and scored on the specified feature and label columns, and predictions are made on the test set.
- The sample uses the BigQuery DataFrames library with Python.

Explore further
---------------

For detailed documentation that includes this code sample, see the following:

- [Use BigQuery DataFrames](/bigquery/docs/use-bigquery-dataframes)

Code sample
-----------

### Python

Before trying this sample, follow the Python setup instructions in the [BigQuery quickstart using client libraries](/bigquery/docs/quickstarts/quickstart-client-libraries).

For more information, see the [BigQuery Python API reference documentation](/python/docs/reference/bigquery/latest).

To authenticate to BigQuery, set up Application Default Credentials. For more information, see [Set up authentication for client libraries](/bigquery/docs/authentication#client-libs).

    from bigframes.ml.linear_model import LinearRegression
    import bigframes.pandas as bpd

    # Load data from BigQuery
    query_or_table = "bigquery-public-data.ml_datasets.penguins"
    bq_df = bpd.read_gbq(query_or_table)

    # Filter the data down to the Adelie Penguin species
    adelie_data = bq_df[bq_df.species == "Adelie Penguin (Pygoscelis adeliae)"]

    # Drop the species column
    adelie_data = adelie_data.drop(columns=["species"])

    # Drop rows with nulls to get training data
    training_data = adelie_data.dropna()

    # Specify your feature (or input) columns and the label (or output) column:
    feature_columns = training_data[
        ["island", "culmen_length_mm", "culmen_depth_mm", "flipper_length_mm", "sex"]
    ]
    label_columns = training_data[["body_mass_g"]]

    # Rows with a missing body mass are the ones to predict
    test_data = adelie_data[adelie_data.body_mass_g.isnull()]

    # Create the linear model
    model = LinearRegression()
    model.fit(feature_columns, label_columns)

    # Score the model
    score = model.score(feature_columns, label_columns)

    # Predict using the model
    result = model.predict(test_data)

What's next
-----------

To search and filter code samples for other Google Cloud products, see the [Google Cloud sample browser](/docs/samples?product=bigquery).
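As a supplement to the code sample, the row selection it performs can be mirrored locally without BigQuery. The sketch below uses plain Python dictionaries in place of a DataFrame; the rows are hypothetical values invented for illustration (not taken from the penguins table), and `None` stands in for a NULL.

```python
ADELIE = "Adelie Penguin (Pygoscelis adeliae)"

# Hypothetical rows for illustration only; None represents a NULL value.
rows = [
    {"species": ADELIE, "island": "Dream", "flipper_length_mm": 185.0,
     "sex": "MALE", "body_mass_g": 3650.0},
    {"species": ADELIE, "island": "Biscoe", "flipper_length_mm": 190.0,
     "sex": "FEMALE", "body_mass_g": None},
    {"species": ADELIE, "island": "Torgersen", "flipper_length_mm": None,
     "sex": "MALE", "body_mass_g": 3800.0},
    {"species": "Gentoo penguin (Pygoscelis papua)", "island": "Biscoe",
     "flipper_length_mm": 215.0, "sex": "MALE", "body_mass_g": 5200.0},
]

# Keep only Adelie rows and drop the now-constant species column.
adelie = [
    {key: value for key, value in row.items() if key != "species"}
    for row in rows
    if row["species"] == ADELIE
]

# Analogous to dropna(): training rows have no missing value in any column.
training = [row for row in adelie if all(v is not None for v in row.values())]

# Rows whose label is missing are the ones the model is asked to predict.
to_predict = [row for row in adelie if row["body_mass_g"] is None]
```

In this sketch the Torgersen row has a known body mass but a missing feature, so it lands in neither group: the null filter excludes it from training, and it is not selected for prediction because its label is present.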