Class XGBClassifier (0.14.1)

XGBClassifier(
    num_parallel_tree: int = 1,
    booster: typing.Literal["gbtree", "dart"] = "gbtree",
    dart_normalized_type: typing.Literal["tree", "forest"] = "tree",
    tree_method: typing.Literal["auto", "exact", "approx", "hist"] = "auto",
    min_tree_child_weight: int = 1,
    colsample_bytree: float = 1.0,
    colsample_bylevel: float = 1.0,
    colsample_bynode: float = 1.0,
    gamma: float = 0.0,
    max_depth: int = 6,
    subsample: float = 1.0,
    reg_alpha: float = 0.0,
    reg_lambda: float = 1.0,
    early_stop: bool = True,
    learning_rate: float = 0.3,
    max_iterations: int = 20,
    min_rel_progress: float = 0.01,
    enable_global_explain: bool = False,
    xgboost_version: typing.Literal["0.9", "1.1"] = "0.9",
)

XGBoost classifier model.

Parameters

Name

Description

num_parallel_tree

Optional[int]

Number of parallel trees constructed during each iteration. Default to 1.

booster

Optional[str]

Specify which booster to use: gbtree or dart. Default to "gbtree".

dart_normalized_type

Optional[str]

Type of normalization algorithm for the DART booster. Possible values: "tree", "forest". Default to "tree".

tree_method

Optional[str]

Specify which tree method to use. Possible values: "auto", "exact", "approx", "hist". Default to "auto", in which case XGBoost will choose the most conservative option available.

min_tree_child_weight

Optional[int]

Minimum sum of instance weight (hessian) needed in a child. Default to 1.

colsample_bytree

Optional[float]

Subsample ratio of columns when constructing each tree. Default to 1.0.

colsample_bylevel

Optional[float]

Subsample ratio of columns for each level. Default to 1.0.

colsample_bynode

Optional[float]

Subsample ratio of columns for each split. Default to 1.0.

gamma

Optional[float]

(min_split_loss) Minimum loss reduction required to make a further partition on a leaf node of the tree. Default to 0.0.

max_depth

Optional[int]

Maximum tree depth for base learners. Default to 6.

subsample

Optional[float]

Subsample ratio of the training instance. Default to 1.0.

reg_alpha

Optional[float]

L1 regularization term on weights (xgb's alpha). Default to 0.0.

reg_lambda

Optional[float]

L2 regularization term on weights (xgb's lambda). Default to 1.0.

early_stop

Optional[bool]

Whether training should stop after the first iteration in which the relative loss improvement is less than min_rel_progress. Default to True.

learning_rate

Optional[float]

Boosting learning rate (xgb's "eta"). Default to 0.3.

max_iterations

Optional[int]

Maximum number of rounds for boosting. Default to 20.

min_rel_progress

Optional[float]

Minimum relative loss improvement necessary to continue training when early_stop is set to True. Default to 0.01.

enable_global_explain

Optional[bool]

Whether to compute global explanations using explainable AI to evaluate global feature importance to the model. Default to False.

xgboost_version

Optional[str]

Specifies the XGBoost version for model training. Possible values: "0.9", "1.1". Default to "0.9".
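The interplay of early_stop, max_iterations, and min_rel_progress can be illustrated with a small pure-Python sketch. It assumes that relative loss improvement means (previous_loss - current_loss) / previous_loss and that the loss after each round is already known. The function name iterations_run and the loss values are hypothetical, and this is not the BigQuery ML implementation.

```python
from typing import Sequence


def iterations_run(
    losses: Sequence[float],
    early_stop: bool = True,
    max_iterations: int = 20,
    min_rel_progress: float = 0.01,
) -> int:
    """Return how many boosting rounds would run under the assumed rule.

    `losses[i]` is the (hypothetical) training loss after round i + 1.
    """
    limit = min(max_iterations, len(losses))
    for i in range(1, limit):
        prev, cur = losses[i - 1], losses[i]
        # Assumed definition of relative loss improvement.
        rel_improvement = (prev - cur) / prev if prev else 0.0
        if early_stop and rel_improvement < min_rel_progress:
            # Round i + 1 (1-based) is the first one with too little progress,
            # so training stops right after it.
            return i + 1
    return limit


losses = [1.0, 0.5, 0.4, 0.399, 0.39]
print(iterations_run(losses))                    # 4
print(iterations_run(losses, early_stop=False))  # 5
```

With early_stop=False the loop runs until max_iterations (or until the recorded losses run out in this toy setup), regardless of min_rel_progress.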

Methods

__repr__

__repr__()

Print the estimator's constructor with all non-default parameter values.
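As a rough illustration of what "constructor with all non-default parameter values" means, the sketch below compares each attribute with the default declared in __init__. ToyEstimator is a hypothetical stand-in written for this example; it is not the bigframes class, and the real output format may differ.

```python
import inspect


class ToyEstimator:
    """Hypothetical stand-in used only to illustrate the repr convention."""

    def __init__(self, max_depth: int = 6, learning_rate: float = 0.3,
                 booster: str = "gbtree"):
        self.max_depth = max_depth
        self.learning_rate = learning_rate
        self.booster = booster

    def __repr__(self) -> str:
        params = inspect.signature(type(self).__init__).parameters
        # Keep only the parameters whose current value differs from the default.
        changed = [
            f"{name}={getattr(self, name)!r}"
            for name, param in params.items()
            if name != "self" and getattr(self, name) != param.default
        ]
        return f"{type(self).__name__}({', '.join(changed)})"


print(ToyEstimator())                              # ToyEstimator()
print(ToyEstimator(max_depth=3, booster="dart"))   # ToyEstimator(max_depth=3, booster='dart')
```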

fit

fit(
    X: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
    y: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
) -> bigframes.ml.base._T

Fit gradient boosting model.

Note that calling fit() multiple times will cause the model object to be re-fit from scratch. To resume training from a previous checkpoint, explicitly pass xgb_model argument.

Parameters

Name

Description

X

bigframes.dataframe.DataFrame or bigframes.series.Series

Series or DataFrame of shape (n_samples, n_features). Training data.

y

bigframes.dataframe.DataFrame or bigframes.series.Series

DataFrame of shape (n_samples,) or (n_samples, n_targets). Target values. Will be cast to X's dtype if necessary.

Returns

Type

Description

XGBModel

Fitted Estimator.

get_params

get_params(deep: bool = True) -> typing.Dict[str, typing.Any]

Get parameters for this estimator.

Parameter

Name

Description

deep

bool, default True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

Type

Description

Dictionary

A dictionary of parameter names mapped to their values.

predict

predict(
    X: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series]
) -> bigframes.dataframe.DataFrame

Predict using the XGB model.

Parameter

Name

Description

X

bigframes.dataframe.DataFrame or bigframes.series.Series

Series or DataFrame of shape (n_samples, n_features). Samples.

Returns

Type

Description

DataFrame of shape (n_samples,)

Returns predicted values.

register

register(vertex_ai_model_id: typing.Optional[str] = None) -> bigframes.ml.base._T

After registering the model, go to the Google Cloud Console ( https://console.cloud.google.com/vertex-ai/models ) to manage the model registry. Refer to https://cloud.google.com/vertex-ai/docs/model-registry/introduction for more options.

Parameter

Name

Description

vertex_ai_model_id

Optional[str], default None

Optional string ID to use as the model ID in Vertex AI. If not set, defaults to 'bigframes_{bq_model_id}'. The Vertex AI model ID is truncated to 63 characters because of a Vertex AI limit.
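The defaulting and truncation rule for vertex_ai_model_id can be sketched in a few lines. The helper resolve_vertex_model_id is hypothetical and only mirrors the description above. In particular, it assumes that the 63-character truncation applies to both the default ID and a user-supplied one, which the text does not spell out.

```python
from typing import Optional

VERTEX_ID_MAX_LEN = 63  # limit stated in the parameter description


def resolve_vertex_model_id(
    bq_model_id: str, vertex_ai_model_id: Optional[str] = None
) -> str:
    """Hypothetical helper: pick the Vertex AI model ID as described above."""
    if vertex_ai_model_id is None:
        # Default pattern quoted in the documentation.
        vertex_ai_model_id = f"bigframes_{bq_model_id}"
    # Assumption: any ID longer than the limit is cut to 63 characters.
    return vertex_ai_model_id[:VERTEX_ID_MAX_LEN]


print(resolve_vertex_model_id("my_model"))             # bigframes_my_model
print(len(resolve_vertex_model_id("x" * 100)))         # 63
print(resolve_vertex_model_id("my_model", "custom"))   # custom
```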

score

score(
    X: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
    y: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric because it requires that the entire label set of each sample be correctly predicted.

Parameters

Name

Description

X

bigframes.dataframe.DataFrame or bigframes.series.Series

DataFrame of shape (n_samples, n_features). Test samples.

y

bigframes.dataframe.DataFrame or bigframes.series.Series

DataFrame of shape (n_samples,) or (n_samples, n_outputs). True labels for X.

Returns

Type

Description

 bigframes.dataframe.DataFrame

A DataFrame of the evaluation result.
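To make the difference between mean accuracy and subset accuracy concrete, here is a minimal pure-Python sketch of the metric described above. The function mean_accuracy and the sample labels are hypothetical; the actual score() method returns a DataFrame of evaluation results and is not implemented this way.

```python
from typing import Hashable, Sequence


def mean_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose prediction equals the true label exactly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Single-label case: 3 of 4 predictions are correct.
print(mean_accuracy(["cat", "dog", "cat", "bird"],
                    ["cat", "dog", "dog", "bird"]))  # 0.75

# Multi-label case (subset accuracy): a sample counts only if its whole
# label set matches, so the second sample is wrong despite sharing "a".
truth = [frozenset({"a", "b"}), frozenset({"a"}), frozenset({"c"})]
preds = [frozenset({"a", "b"}), frozenset({"a", "b"}), frozenset({"c"})]
print(mean_accuracy(truth, preds))
```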

to_gbq

to_gbq(model_name: str, replace: bool = False) -> bigframes.ml.ensemble.XGBClassifier

Save the model to BigQuery.

Parameters

Name

Description

model_name

str

The name of the model.

replace

bool, default False

Whether to replace the model if it already exists. Default to False.

Returns

Type

Description

XGBClassifier

Saved model.
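The methods documented on this page can be combined into a short end-to-end workflow. The sketch below is only an outline under assumptions: the function name train_and_save, its arguments, and the chosen hyperparameter values are hypothetical, and running it requires an installed bigframes package plus a configured BigQuery project. The bigframes imports are placed inside the function so that defining it has no such requirement.

```python
from typing import Sequence


def train_and_save(
    table_id: str,
    feature_columns: Sequence[str],
    label_column: str,
    model_name: str,
):
    """Fit, evaluate, and persist an XGBClassifier (hypothetical helper)."""
    # Lazy imports: only needed when the function is actually called.
    import bigframes.pandas as bpd
    from bigframes.ml.ensemble import XGBClassifier

    df = bpd.read_gbq(table_id)
    X = df[list(feature_columns)]
    y = df[[label_column]]

    # Parameter names come from the constructor signature on this page;
    # the values are arbitrary examples.
    model = XGBClassifier(max_depth=4, learning_rate=0.1, max_iterations=50)
    model.fit(X, y)

    predictions = model.predict(X)   # DataFrame of predicted values
    metrics = model.score(X, y)      # DataFrame of the evaluation result
    saved = model.to_gbq(model_name, replace=True)
    return predictions, metrics, saved
```

Scoring on the training data, as done here for brevity, overstates model quality; a held-out split would be the usual choice.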
