Class KMeans (2.29.0)

KMeans(
    n_clusters: int = 8,
    *,
    init: typing.Literal["kmeans++", "random", "custom"] = "kmeans++",
    init_col: typing.Optional[str] = None,
    distance_type: typing.Literal["euclidean", "cosine"] = "euclidean",
    max_iter: int = 20,
    tol: float = 0.01,
    warm_start: bool = False
)
 

K-Means clustering.

Examples:

>>> import bigframes.pandas as bpd
>>> from bigframes.ml.cluster import KMeans

>>> X = bpd.DataFrame({"feat0": [1, 1, 1, 10, 10, 10], "feat1": [2, 4, 0, 2, 4, 0]})
>>> kmeans = KMeans(n_clusters=2).fit(X)
>>> kmeans.predict(bpd.DataFrame({"feat0": [0, 12], "feat1": [0, 3]}))["CENTROID_ID"] # doctest:+SKIP
0    1
1    2
Name: CENTROID_ID, dtype: Int64

>>> kmeans.cluster_centers_ # doctest:+SKIP
   centroid_id feature  numerical_value categorical_value
0            1   feat0              5.5                []
1            1   feat1              1.0                []
2            2   feat0              5.5                []
3            2   feat1              4.0                []

[4 rows x 4 columns] 

Properties

cluster_centers_

Information of cluster centers.

Returns
Type
Description
DataFrame of cluster centers, containing the following columns:
  centroid_id: An integer that identifies the centroid.
  feature: The column name that contains the feature.
  numerical_value: If feature is numeric, the value of feature for the centroid that centroid_id identifies. If feature is not numeric, the value is NULL.
  categorical_value: A list of mappings containing information about categorical features. Each mapping contains the following fields:
    categorical_value.category: The name of each category.
    categorical_value.value: The value of categorical_value.category for the centroid that centroid_id identifies.
The output contains one row per feature per centroid.
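
The one-row-per-feature-per-centroid layout can be awkward to read when you want one record per centroid. The following is a minimal sketch of regrouping it in plain Python. The rows and the helper name centers_by_centroid are hypothetical and only mimic the documented columns; they are not output from a fitted model.

```python
from collections import defaultdict

# Hypothetical rows in the documented long format
# (one row per feature per centroid); the values are made up for illustration.
rows = [
    {"centroid_id": 1, "feature": "feat0", "numerical_value": 1.0, "categorical_value": []},
    {"centroid_id": 1, "feature": "feat1", "numerical_value": 2.0, "categorical_value": []},
    {"centroid_id": 2, "feature": "feat0", "numerical_value": 10.0, "categorical_value": []},
    {"centroid_id": 2, "feature": "feat1", "numerical_value": 2.0, "categorical_value": []},
]


def centers_by_centroid(rows):
    """Group long-format rows into {centroid_id: {feature: numerical_value}}."""
    centers = defaultdict(dict)
    for row in rows:
        # Non-numeric features have a NULL (None) numerical_value; skip them here.
        if row["numerical_value"] is not None:
            centers[row["centroid_id"]][row["feature"]] = row["numerical_value"]
    return dict(centers)


wide = centers_by_centroid(rows)
print(wide)
```

With a fitted model, rows of this shape could probably be obtained by materializing the property locally, for example with cluster_centers_.to_pandas().to_dict("records"); that step is not exercised here.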

Methods

__repr__

__repr__()
 

Print the estimator's constructor with all non-default parameter values.

detect_anomalies

detect_anomalies(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    *,
    contamination: float = 0.1
) -> bigframes.dataframe.DataFrame
 

Detect anomalous data points in the input.

Returns
Type
Description
DataFrame with the anomaly detection results.

fit

fit(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    y: typing.Optional[
        typing.Union[
            bigframes.dataframe.DataFrame,
            bigframes.series.Series,
            pandas.core.frame.DataFrame,
            pandas.core.series.Series,
        ]
    ] = None,
) -> bigframes.ml.base._T
 

Compute k-means clustering.

Parameters
Name
Description
X
bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series

DataFrame of shape (n_samples, n_features). Training data.

y
default None

Not used, present here for API consistency by convention.

Returns
Type
Description
KMeans
Fitted estimator.
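
For intuition about what "compute k-means clustering" involves, here is a textbook Lloyd iteration in plain Python. It is an illustrative sketch only: training for this class runs in BigQuery, and nothing here claims to match that implementation. The function name lloyd_kmeans, the explicit init_centers argument, and the reading of tol as a relative-improvement threshold on the loss are all assumptions made for the example.

```python
import math


def lloyd_kmeans(points, init_centers, max_iter=20, tol=0.01):
    """Textbook Lloyd iteration; returns (centers, labels) with 0-based labels."""
    centers = [tuple(c) for c in init_centers]
    prev_loss = None
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
        # Loss: sum of squared distances from each point to its assigned center.
        loss = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
        # Assumed stopping rule: stop once the relative improvement is at most tol.
        if prev_loss is not None and prev_loss - loss <= tol * prev_loss:
            break
        prev_loss = loss
    return centers, labels


# Same six points as the class-level example, as (feat0, feat1) pairs.
points = [(1, 2), (1, 4), (1, 0), (10, 2), (10, 4), (10, 0)]
centers, labels = lloyd_kmeans(points, init_centers=[(1, 2), (10, 2)])
print(centers, labels)
```

The labels in this sketch are 0-based list indices, whereas the CENTROID_ID values in the example output at the top of this page start at 1.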

get_params

get_params(deep: bool = True) -> typing.Dict[str, typing.Any]
 

Get parameters for this estimator.

Parameter
Name
Description
deep
bool, default True

Default True. If True, returns the parameters for this estimator and for contained subobjects that are estimators.

Returns
Type
Description
Dictionary
A dictionary of parameter names mapped to their values.

predict

predict(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
) -> bigframes.dataframe.DataFrame
 

Predict the closest cluster each sample in X belongs to.

Returns
Type
Description
DataFrame of shape (n_samples, n_input_columns + n_prediction_columns). Returns predicted labels.
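
Conceptually, assigning a sample to its closest cluster is a nearest-centroid lookup. The sketch below shows that idea in plain Python with Euclidean distance. The centroid coordinates and the helper name closest_centroid are hypothetical, chosen for illustration rather than read from a fitted model, and the actual prediction is computed in BigQuery.

```python
import math

# Hypothetical centroids keyed by centroid ID (not taken from a fitted model).
centroids = {1: (1.0, 2.0), 2: (10.0, 2.0)}


def closest_centroid(point, centroids):
    """Return the ID of the centroid nearest to `point` by Euclidean distance."""
    return min(centroids, key=lambda cid: math.dist(point, centroids[cid]))


# Two samples given as (feat0, feat1) pairs.
samples = [(0.0, 0.0), (12.0, 3.0)]
predicted = [closest_centroid(p, centroids) for p in samples]
print(predicted)
```

A model created with distance_type="cosine" would need a cosine distance in place of math.dist.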

register

register(vertex_ai_model_id: typing.Optional[str] = None) -> bigframes.ml.base._T
 

Register the model to Vertex AI.

After registering, go to the Google Cloud console (https://console.cloud.google.com/vertex-ai/models) to manage the model registry. Refer to https://cloud.google.com/vertex-ai/docs/model-registry/introduction for more options.

Parameter
Name
Description
vertex_ai_model_id
Optional[str], default None

Optional string ID to use as the model ID in Vertex AI. If not set, defaults to 'bigframes_{bq_model_id}'. The Vertex AI model ID is truncated to 63 characters because of a Vertex AI length limit.
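
The default and truncation rules for the model ID can be sketched as a small function. resolve_vertex_ai_model_id is a hypothetical helper written for this illustration, not part of the library, and it assumes that truncation keeps the leading 63 characters.

```python
from typing import Optional

MAX_VERTEX_ID_LENGTH = 63  # limit stated in the parameter description


def resolve_vertex_ai_model_id(
    bq_model_id: str, vertex_ai_model_id: Optional[str] = None
) -> str:
    """Hypothetical helper mirroring the documented default and truncation."""
    if vertex_ai_model_id is None:
        # Documented default when no ID is supplied.
        vertex_ai_model_id = f"bigframes_{bq_model_id}"
    # Assumption: truncation keeps the first 63 characters.
    return vertex_ai_model_id[:MAX_VERTEX_ID_LENGTH]


print(resolve_vertex_ai_model_id("my_model"))
print(len(resolve_vertex_ai_model_id("x" * 100)))
```

Supplying a short explicit ID avoids relying on the truncation behavior.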

score

score(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    y=None,
) -> bigframes.dataframe.DataFrame
 

Calculate evaluation metrics of the model.

Returns
Type
Description
DataFrame of the metrics.

to_gbq

to_gbq(model_name: str, replace: bool = False) -> bigframes.ml.cluster.KMeans
 

Save the model to BigQuery.

Returns
Type
Description
KMeans
Saved model.