Class KFold (2.17.0)

KFold(n_splits: int = 5, *, random_state: typing.Optional[int] = None)


K-Fold cross-validator.

Split a dataset into train/test sets by dividing it into k consecutive folds.

Each fold is then used once as the validation set while the remaining k - 1 folds form the training set.

Examples:

>>> import bigframes.pandas as bpd
>>> from bigframes.ml.model_selection import KFold
>>> bpd.options.display.progress_bar = None
>>> X = bpd.DataFrame({"feat0": [1, 3, 5], "feat1": [2, 4, 6]})
>>> y = bpd.DataFrame({"label": [1, 2, 3]})
>>> kf = KFold(n_splits=3, random_state=42)
>>> for i, (X_train, X_test, y_train, y_test) in enumerate(kf.split(X, y)):
...     print(f"Fold {i}:")
...     print(f"  X_train: {X_train}")
...     print(f"  X_test: {X_test}")
...     print(f"  y_train: {y_train}")
...     print(f"  y_test: {y_test}")
...
Fold 0:
  X_train:    feat0  feat1
1      3      4
2      5      6
<BLANKLINE>
[2 rows x 2 columns]
  X_test:    feat0  feat1
0      1      2
<BLANKLINE>
[1 rows x 2 columns]
  y_train:    label
1      2
2      3
<BLANKLINE>
[2 rows x 1 columns]
  y_test:    label
0      1
<BLANKLINE>
[1 rows x 1 columns]
Fold 1:
  X_train:    feat0  feat1
0      1      2
2      5      6
<BLANKLINE>
[2 rows x 2 columns]
  X_test:    feat0  feat1
1      3      4
<BLANKLINE>
[1 rows x 2 columns]
  y_train:    label
0      1
2      3
<BLANKLINE>
[2 rows x 1 columns]
  y_test:    label
1      2
<BLANKLINE>
[1 rows x 1 columns]
Fold 2:
  X_train:    feat0  feat1
0      1      2
1      3      4
<BLANKLINE>
[2 rows x 2 columns]
  X_test:    feat0  feat1
2      5      6
<BLANKLINE>
[1 rows x 2 columns]
  y_train:    label
0      1
1      2
<BLANKLINE>
[2 rows x 1 columns]
  y_test:    label
2      3
<BLANKLINE>
[1 rows x 1 columns] 
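
The fold rotation in the output above can be sketched in plain Python. This is only an illustration of the k-fold idea on row positions, not the bigframes implementation: `k_fold_indices` is a hypothetical helper, and the rule for distributing leftover rows (the first `n_samples % n_splits` folds get one extra row) is an assumption borrowed from common practice.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, n_splits: int = 5
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each consecutive fold.

    Hypothetical helper for illustration; not part of bigframes.
    """
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")
    if n_splits > n_samples:
        raise ValueError("n_splits cannot exceed n_samples")

    # Assumption: the first (n_samples % n_splits) folds get one extra row.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        stop = start + base + (1 if fold < extra else 0)
        # The current fold is the test set; every other row is training data.
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


folds = list(k_fold_indices(n_samples=3, n_splits=3))
for i, (train, test) in enumerate(folds):
    print(f"Fold {i}: train={train} test={test}")
```

With three rows and three splits, each row position is used as the test set exactly once, which matches the row labels printed in the example above.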

Parameters

Name
Description
n_splits
int

Number of folds. Must be at least 2. Defaults to 5.

random_state
Optional[int]

A seed to use for randomly choosing the rows of the split. If not set, a different random split is generated each time. Defaults to None.
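
The effect of a fixed seed can be sketched with the standard library. This is a conceptual illustration only: `shuffled_row_order` is a hypothetical helper, and how bigframes actually applies `random_state` internally is not described here.

```python
import random
from typing import List, Optional


def shuffled_row_order(n_samples: int, random_state: Optional[int] = None) -> List[int]:
    """Return row positions in shuffled order.

    Hypothetical helper for illustration; not part of bigframes.
    """
    order = list(range(n_samples))
    # A dedicated Random instance leaves the global random state untouched.
    # With random_state=None it is seeded from a system source, so the
    # order generally differs between calls.
    random.Random(random_state).shuffle(order)
    return order


first = shuffled_row_order(10, random_state=42)
second = shuffled_row_order(10, random_state=42)
print(first == second)  # True: the same seed reproduces the same order
```

Passing the same integer seed on every run is what makes a split reproducible. Leaving the seed unset trades that reproducibility for a fresh split each time.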

Methods

get_n_splits

get_n_splits() -> int


Returns the number of splitting iterations in the cross-validator.

Returns
Type
Description
int
The number of splitting iterations in the cross-validator.

split

split(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    y: typing.Optional[
        typing.Union[
            bigframes.dataframe.DataFrame,
            bigframes.series.Series,
            pandas.core.frame.DataFrame,
            pandas.core.series.Series,
        ]
    ] = None,
) -> typing.Generator[
    tuple[
        typing.Union[
            bigframes.dataframe.DataFrame, bigframes.series.Series, NoneType
        ],
        ...,
    ],
    None,
    None,
]


Generate the training and test data for each split.

Parameters
Name
Description
X
bigframes.dataframe.DataFrame or bigframes.series.Series

BigFrames DataFrame or Series of shape (n_samples, n_features). Training data, where n_samples is the number of samples and n_features is the number of features.

y
bigframes.dataframe.DataFrame, bigframes.series.Series or None

BigFrames DataFrame, Series of shape (n_samples,) or None. The target variable for supervised learning problems. Defaults to None.

Yields
Name
Description
X_train
bigframes.dataframe.DataFrame or bigframes.series.Series

The training data for that split.

X_test
bigframes.dataframe.DataFrame or bigframes.series.Series

The testing data for that split.

y_train
bigframes.dataframe.DataFrame, bigframes.series.Series or None

The training label for that split.

y_test
bigframes.dataframe.DataFrame, bigframes.series.Series or None

The testing label for that split.
