Class PCA (2.28.0)

PCA(
    n_components: typing.Optional[typing.Union[int, float]] = None,
    *,
    svd_solver: typing.Literal["full", "randomized", "auto"] = "auto"
)

Principal component analysis (PCA).

Examples:

>>> import bigframes.pandas as bpd
>>> from bigframes.ml.decomposition import PCA
>>> X = bpd.DataFrame({"feat0": [-1, -2, -3, 1, 2, 3], "feat1": [-1, -1, -2, 1, 1, 2]})
>>> pca = PCA(n_components=2).fit(X)
>>> pca.predict(X) # doctest:+SKIP
    principal_component_1  principal_component_2
0              -0.755243               0.157628
1               -1.05405              -0.141179
2              -1.809292               0.016449
3               0.755243              -0.157628
4                1.05405               0.141179
5               1.809292              -0.016449
<BLANKLINE>
[6 rows x 2 columns]
>>> pca.explained_variance_ratio_ # doctest:+SKIP
    principal_component_id  explained_variance_ratio
0                       1                   0.00901
1                       0                   0.99099
<BLANKLINE>
[2 rows x 2 columns]

Parameters

Name

Description

n_components

int, float or None, default None

Number of components to keep. If n_components is not set, all components are kept, that is, n_components = min(n_samples, n_features). If 0 < n_components < 1, the number of components is chosen so that the fraction of variance explained is greater than the value of n_components.

svd_solver

"full", "randomized" or "auto", default "auto"

The solver to use to calculate the principal components. For details, see https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-pca#pca_solver
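The fractional form of n_components can be read as a threshold on cumulative explained variance. The following is a minimal sketch of that selection rule in plain Python. It is not the BigQuery ML implementation; the function name components_for_fraction and the sample ratios are hypothetical.

```python
from itertools import accumulate


def components_for_fraction(ratios, fraction):
    """Return the smallest k whose top-k ratios sum to more than `fraction`.

    `ratios` are explained variance ratios sorted in descending order.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must satisfy 0 < fraction < 1")
    for k, cumulative in enumerate(accumulate(ratios), start=1):
        if cumulative > fraction:
            return k
    # Floating-point rounding can leave the total just below `fraction`;
    # in that case keep every component.
    return len(ratios)


# Hypothetical ratios for a four-feature dataset.
ratios = [0.70, 0.20, 0.07, 0.03]
print(components_for_fraction(ratios, 0.85))  # 2
print(components_for_fraction(ratios, 0.95))  # 3
```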

Properties

components_

Principal axes in feature space, representing the directions of maximum variance in the data.

Returns

Type

Description

 bigframes.dataframe.DataFrame

DataFrame of principal components, with one row per feature per component. It contains the following columns:

principal_component_id: An integer that identifies the principal component.

feature: The name of the column that contains the feature.

numerical_value: If feature is numeric, the value of feature for the principal component that principal_component_id identifies. If feature isn't numeric, the value is NULL.

categorical_value: A list of mappings containing information about categorical features. Each mapping contains the following fields:

categorical_value.category: The name of each category.

categorical_value.value: The value of categorical_value.category for the principal component that principal_component_id identifies.

explained_variance_

The amount of variance explained by each of the selected components.

Returns

Type

Description

 bigframes.dataframe.DataFrame

DataFrame containing the following columns:

principal_component_id: An integer that identifies the principal component.

explained_variance: The eigenvalue of the principal component, that is, the factor by which its eigenvector is scaled. In PCA, eigenvalue and explained variance are the same concept.

explained_variance_ratio_

Percentage of variance explained by each of the selected components.

Returns

Type

Description

 bigframes.dataframe.DataFrame

DataFrame containing the following columns:

principal_component_id: An integer that identifies the principal component.

explained_variance_ratio: The ratio between the variance (eigenvalue) of this principal component and the total variance. The total variance is the sum of the variances (eigenvalues) of all of the individual principal components.
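As a worked check of the definition above, the ratios shown in the class example can be reproduced by hand. This sketch assumes that the features are standardized before decomposition (an assumption, not something this page states). Under that assumption the covariance matrix of two features is the 2x2 correlation matrix [[1, r], [r, 1]], whose eigenvalues are 1 + r and 1 - r. The helper names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def explained_variance_ratios(eigenvalues):
    """Divide each eigenvalue by the total variance (the sum of eigenvalues)."""
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]


# Same data as the class example.
feat0 = [-1, -2, -3, 1, 2, 3]
feat1 = [-1, -1, -2, 1, 1, 2]

r = pearson_r(feat0, feat1)
# Assumption: standardized features, so the eigenvalues are 1 + r and 1 - r.
ratios = explained_variance_ratios([1 + r, 1 - r])
print([round(ratio, 5) for ratio in ratios])  # [0.99099, 0.00901]
```

The two values agree with the explained_variance_ratio column in the class example, and they sum to 1 because every component is kept.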

Methods

__repr__

__repr__()

Print the estimator's constructor with all non-default parameter values.

detect_anomalies

detect_anomalies(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    *,
    contamination: float = 0.1
) -> bigframes.dataframe.DataFrame

Detect the anomaly data points of the input.

Parameters

Name

Description

X

bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series

Series or DataFrame in which to detect anomalies.

contamination

float, default 0.1

The proportion of anomalies in the training dataset that was used to create the model. The value must be in the range [0, 0.5].

Returns

Type

Description

 bigframes.dataframe.DataFrame

DataFrame of anomaly detection results.
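A call might be wrapped as in the sketch below, which checks the documented contamination range before delegating to detect_anomalies. The helper name detect_with_checked_contamination is hypothetical, the model argument is assumed to be an already fitted PCA, and actually running it against a real model requires a BigQuery session.

```python
def detect_with_checked_contamination(model, X, contamination=0.1):
    """Validate `contamination`, then call `model.detect_anomalies`.

    `model` is assumed to be a fitted bigframes.ml.decomposition.PCA.
    """
    # The documented valid range is [0, 0.5].
    if not 0 <= contamination <= 0.5:
        raise ValueError(
            f"contamination must be in [0, 0.5], got {contamination}"
        )
    # `contamination` is keyword-only in the documented signature.
    return model.detect_anomalies(X, contamination=contamination)
```

With the pca and X objects from the class example, a call would look like detect_with_checked_contamination(pca, X, contamination=0.05).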

fit

fit(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    y: typing.Optional[
        typing.Union[
            bigframes.dataframe.DataFrame,
            bigframes.series.Series,
            pandas.core.frame.DataFrame,
            pandas.core.series.Series,
        ]
    ] = None,
) -> bigframes.ml.base._T

Fit the model according to the given training data.

Parameters

Name

Description

X

bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series

Series or DataFrame of shape (n_samples, n_features). Training vector, where n_samples is the number of samples and n_features is the number of features.

y

default None

Ignored.

Returns

Type

Description

PCA

Fitted estimator.

get_params

get_params(deep: bool = True) -> typing.Dict[str, typing.Any]

Get parameters for this estimator.

Parameter

Name

Description

deep

bool, default True

If True, returns the parameters for this estimator and for contained subobjects that are estimators.

Returns

Type

Description

Dictionary

A dictionary of parameter names mapped to their values.

predict

predict(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
) -> bigframes.dataframe.DataFrame

Project each sample in X onto the fitted principal components.

Parameter

Name

Description

X

bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series

Series or DataFrame to predict.

Returns

Type

Description

 bigframes.dataframe.DataFrame

DataFrame of the projected samples, with one column per principal component.

register

register(vertex_ai_model_id: typing.Optional[str] = None) -> bigframes.ml.base._T

Register the model to Vertex AI. After registering, go to the Google Cloud console ( https://console.cloud.google.com/vertex-ai/models ) to manage the model registries. Refer to https://cloud.google.com/vertex-ai/docs/model-registry/introduction for more options.

Parameter

Name

Description

vertex_ai_model_id

Optional[str], default None

Optional string to use as the model ID in Vertex AI. If not set, defaults to 'bigframes_{bq_model_id}'. Because of a Vertex AI limitation, the model ID is truncated to 63 characters.
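The default and the length limit described above can be sketched in plain Python. This only mimics the documented rule and is not the library's internal code; the function name effective_vertex_model_id is hypothetical, and truncating from the end of the string is an assumption, since the description does not say which characters are dropped.

```python
MAX_VERTEX_ID_LENGTH = 63  # documented Vertex AI limit


def effective_vertex_model_id(bq_model_id, vertex_ai_model_id=None):
    """Mimic the documented default and truncation of the Vertex AI model ID."""
    if vertex_ai_model_id is None:
        # Documented default when no ID is passed.
        vertex_ai_model_id = f"bigframes_{bq_model_id}"
    # Assumption: characters beyond the limit are dropped from the end.
    return vertex_ai_model_id[:MAX_VERTEX_ID_LENGTH]


print(effective_vertex_model_id("my_pca_model"))  # bigframes_my_pca_model
print(len(effective_vertex_model_id("m" * 100)))  # 63
```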

score

score(X=None, y=None) -> bigframes.dataframe.DataFrame

Calculate evaluation metrics of the model.

Parameters

Name

Description

X

default None

Ignored.

y

default None

Ignored.

Returns

Type

Description

 bigframes.dataframe.DataFrame

DataFrame that represents model metrics.

to_gbq

to_gbq(model_name: str, replace: bool = False) -> bigframes.ml.decomposition.PCA

Save the model to BigQuery.

Parameters

Name

Description

model_name

str

The name of the model.

replace

bool, default False

Whether to replace the model if it already exists.

Returns

Type

Description

PCA

Saved model.
