PCA(n_components: typing.Optional[typing.Union[int, float]] = None, *, svd_solver: typing.Literal["full", "randomized", "auto"] = "auto")
Principal component analysis (PCA).
Examples:
>>> import bigframes.pandas as bpd
>>> from bigframes.ml.decomposition import PCA
>>> bpd.options.display.progress_bar = None
>>> X = bpd.DataFrame({"feat0": [-1, -2, -3, 1, 2, 3], "feat1": [-1, -1, -2, 1, 1, 2]})
>>> pca = PCA(n_components=2).fit(X)
>>> pca.predict(X) # doctest:+SKIP
   principal_component_1  principal_component_2
0              -0.755243               0.157628
1               -1.05405              -0.141179
2              -1.809292               0.016449
3               0.755243              -0.157628
4                1.05405               0.141179
5               1.809292              -0.016449
<BLANKLINE>
[6 rows x 2 columns]
>>> pca.explained_variance_ratio_ # doctest:+SKIP
   principal_component_id  explained_variance_ratio
0                       1                   0.00901
1                       0                   0.99099
<BLANKLINE>
[2 rows x 2 columns]
Parameters
n_components
int, float or None, default None
Number of components to keep. If n_components is not set, all components are kept, i.e. n_components = min(n_samples, n_features). If 0 < n_components < 1, the number of components is selected so that the fraction of variance explained is greater than the value of n_components.
svd_solver
"full", "randomized" or "auto", default "auto"
The solver to use to calculate the principal components. See https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-pca#pca_solver for details.
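For instance, a float n_components selects the smallest number of components whose explained variance exceeds that fraction. A minimal sketch of both parameter styles (constructor calls only, so no BigQuery session is needed):
>>> from bigframes.ml.decomposition import PCA
>>> pca_frac = PCA(n_components=0.95)  # keep components explaining at least 95% of variance
>>> pca_fixed = PCA(n_components=3, svd_solver="randomized")  # fixed count, explicit solver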
Properties
components_
Principal axes in feature space, representing the directions of maximum variance in the data.
explained_variance_
The amount of variance explained by each of the selected components.
explained_variance_ratio_
Percentage of variance explained by each of the selected components.
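These properties are populated once the model is fitted. A minimal sketch reusing X and the fitted pca from the example above (outputs depend on the BigQuery session, so they are skipped):
>>> pca.components_  # doctest:+SKIP
>>> pca.explained_variance_  # doctest:+SKIP
>>> pca.explained_variance_ratio_  # doctest:+SKIP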
Methods
__repr__
__repr__()
Print the estimator's constructor with all non-default parameter values.
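A minimal sketch; parameters left at their defaults (here svd_solver) are omitted from the output:
>>> PCA(n_components=3)
PCA(n_components=3)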
detect_anomalies
detect_anomalies(X: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series, pandas.core.frame.DataFrame, pandas.core.series.Series], *, contamination: float = 0.1) -> bigframes.dataframe.DataFrame
Detect the anomalous data points in the input.
X
bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series
Series or a DataFrame in which to detect anomalies.
contamination
float, default 0.1
The proportion of anomalies in the training dataset used to create the model. The value must be in the range [0, 0.5].
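A minimal sketch reusing the fitted pca and X from the example above; a higher contamination flags a larger share of rows as anomalous:
>>> pca.detect_anomalies(X)  # doctest:+SKIP
>>> pca.detect_anomalies(X, contamination=0.25)  # doctest:+SKIP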
fit
fit(X: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series, pandas.core.frame.DataFrame, pandas.core.series.Series], y: typing.Optional[typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series, pandas.core.frame.DataFrame, pandas.core.series.Series]] = None) -> bigframes.ml.base._T
Fit the model according to the given training data.
X
bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series
Series or DataFrame of shape (n_samples, n_features). Training vector, where n_samples is the number of samples and n_features is the number of features.
y
default None
Ignored.
Returns
PCA
Fitted estimator.
get_params
get_params(deep: bool = True) -> typing.Dict[str, typing.Any]
Get parameters for this estimator.
deep
bool, default True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
Dictionary
A dictionary of parameter names mapped to their values.
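A minimal sketch; the exact dictionary below is illustrative, assuming only the two documented constructor parameters:
>>> PCA(n_components=2).get_params()  # doctest:+SKIP
{'n_components': 2, 'svd_solver': 'auto'}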
predict
predict(X: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series, pandas.core.frame.DataFrame, pandas.core.series.Series]) -> bigframes.dataframe.DataFrame
Apply dimensionality reduction to X, projecting each sample onto the fitted principal components.
X
bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series
Series or a DataFrame to predict.
register
register(vertex_ai_model_id: typing.Optional[str] = None) -> bigframes.ml.base._T
Register the model with Vertex AI.
After registering, go to the Google Cloud console ( https://console.cloud.google.com/vertex-ai/models ) to manage the model registry. Refer to https://cloud.google.com/vertex-ai/docs/model-registry/introduction for more options.
vertex_ai_model_id
Optional[str], default None
Optional string to use as the model id in Vertex AI. If not set, defaults to 'bigframes_{bq_model_id}'. The Vertex AI model id will be truncated to 63 characters due to its length limit.
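A minimal sketch; the model must be fitted before registering, and 'my_pca_model' is a hypothetical Vertex AI model id:
>>> pca = PCA(n_components=2).fit(X)  # doctest:+SKIP
>>> pca.register(vertex_ai_model_id="my_pca_model")  # doctest:+SKIP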
score
score(X=None, y=None) -> bigframes.dataframe.DataFrame
Calculate evaluation metrics of the model.
X
default None
Ignored.
y
default None
Ignored.
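A minimal sketch reusing the fitted pca from the example above; since X and y are ignored, the call takes no arguments:
>>> pca.score()  # doctest:+SKIP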
to_gbq
to_gbq(model_name: str, replace: bool = False) -> bigframes.ml.decomposition.PCA
Save the model to BigQuery.
model_name
str
The name of the model.
replace
bool, default False
Whether to replace the model if it already exists. Defaults to False.
Returns
PCA
Saved model.
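A minimal sketch; 'my_dataset.my_pca_model' is a hypothetical model name in the session's default project:
>>> pca = PCA(n_components=2).fit(X)  # doctest:+SKIP
>>> pca.to_gbq("my_dataset.my_pca_model", replace=True)  # doctest:+SKIP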