Module model_selection (1.19.0)

Functions for train/test splitting and model tuning. This module is styled after scikit-learn's model_selection module: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.model_selection

Classes

KFold

KFold(n_splits: int = 5, *, random_state: typing.Optional[int] = None)

K-Fold cross-validator.

Splits data into train/test sets by dividing the dataset into k consecutive folds.

Each fold is then used once as the validation set while the remaining k - 1 folds form the training set.

Parameters

Name

Description

n_splits

int

Number of folds. Must be at least 2. Defaults to 5.

random_state

Optional[int]

A seed to use for randomly choosing the rows of the split. If not set, a random split will be generated each time. Defaults to None.
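
The fold scheme described above can be illustrated with a minimal pure-Python sketch. It is not the bigframes implementation: the helper name `k_fold_indices` is hypothetical, it works on row positions rather than BigQuery DataFrames, it does no random row selection, and the way leftover rows are spread over the first folds is an assumption borrowed from scikit-learn's convention.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_rows: int, n_splits: int = 5
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k consecutive folds."""
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")
    if n_splits > n_rows:
        raise ValueError("n_splits cannot exceed the number of rows")

    # Assumption: when n_rows is not divisible by n_splits, the first
    # (n_rows % n_splits) folds each receive one extra row.
    base, extra = divmod(n_rows, n_splits)
    start = 0
    for fold in range(n_splits):
        stop = start + base + (1 if fold < extra else 0)
        # The current fold is held out for validation ...
        test = list(range(start, stop))
        # ... and the k - 1 remaining folds form the training set.
        train = list(range(0, start)) + list(range(stop, n_rows))
        yield train, test
        start = stop


folds = list(k_fold_indices(n_rows=10, n_splits=5))
```

Each row position appears in exactly one validation fold, so every row is used for validation once and for training k - 1 times.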

Modules Functions

train_test_split

train_test_split(
    *arrays: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
    test_size: typing.Optional[float] = None,
    train_size: typing.Optional[float] = None,
    random_state: typing.Optional[int] = None,
    stratify: typing.Optional[bigframes.series.Series] = None
) -> typing.List[typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series]]

Splits dataframes or series into random train and test subsets.

Parameters

Name

Description

*arrays

bigframes.dataframe.DataFrame or bigframes.series.Series

A sequence of BigQuery DataFrames or Series that can be joined on their indexes.

test_size

default None

The proportion of the dataset to include in the test split. If None, this will default to the complement of train_size. If both are None, it will be set to 0.25.

train_size

default None

The proportion of the dataset to include in the train split. If None, this will default to the complement of test_size.

random_state

default None

A seed to use for randomly choosing the rows of the split. If not set, a random split will be generated each time.
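
The defaulting rules for test_size and train_size can be sketched in plain Python as follows. The helper name `resolve_split_sizes` is hypothetical and the range check is an assumption; the sketch only mirrors the behaviour described in the parameter table and is not taken from the library's source.

```python
from typing import Optional, Tuple


def resolve_split_sizes(
    test_size: Optional[float] = None,
    train_size: Optional[float] = None,
) -> Tuple[float, float]:
    """Return (train_size, test_size) after applying the documented defaults."""
    # If both are None, the test split defaults to 0.25.
    if test_size is None and train_size is None:
        test_size = 0.25
    # A missing size defaults to the complement of the other one.
    if test_size is None:
        test_size = 1.0 - train_size
    if train_size is None:
        train_size = 1.0 - test_size
    # Assumption: both values are proportions between 0 and 1.
    for name, value in (("train_size", train_size), ("test_size", test_size)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return train_size, test_size


default_sizes = resolve_split_sizes()
from_test = resolve_split_sizes(test_size=0.2)
from_train = resolve_split_sizes(train_size=0.6)
```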

Returns

Type

Description

List[Union[bigframes.dataframe.DataFrame, bigframes.series.Series]]

A list of BigQuery DataFrames or Series.
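
To illustrate how a seed makes a random split reproducible, here is a small stand-in that operates on plain Python lists instead of BigQuery DataFrames. The function `split_rows` is hypothetical; it aligns inputs by position rather than by index, and the flat train/test ordering of the returned list follows scikit-learn's convention, which is an assumption about this module rather than something stated above.

```python
import random
from typing import List, Optional, Sequence


def split_rows(
    *arrays: Sequence,
    test_size: float = 0.25,
    random_state: Optional[int] = None,
) -> List[list]:
    """Split position-aligned sequences into random train and test subsets."""
    if not arrays:
        raise ValueError("at least one array is required")
    n_rows = len(arrays[0])
    if any(len(a) != n_rows for a in arrays):
        raise ValueError("all arrays must have the same length")

    # A private Random instance keeps the seed local to this call.
    # With random_state=None, a different split is produced each time.
    rng = random.Random(random_state)
    order = list(range(n_rows))
    rng.shuffle(order)

    n_test = round(n_rows * test_size)
    test_idx, train_idx = order[:n_test], order[n_test:]

    # Assumed ordering: [a_train, a_test, b_train, b_test, ...].
    result: List[list] = []
    for a in arrays:
        result.append([a[i] for i in train_idx])
        result.append([a[i] for i in test_idx])
    return result


features = list(range(8))
labels = [x % 2 for x in features]
X_train, X_test, y_train, y_test = split_rows(
    features, labels, test_size=0.25, random_state=42
)
```

Because the same shuffled row order is applied to every input, each feature row stays paired with its label, and calling the function again with the same random_state returns the same split.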
