Changelog

PyPI History

0.16.0 (2023-12-12)

Features

Add ARIMAPlus.predict parameters ( #264 ) ( 99598c7 )
Add DataFrame from_dict and from_records methods ( #244 ) ( 8d81e24 )
Add DataFrame.select_dtypes method ( #242 ) ( 1737acc )
Add nunique method to Series/DataFrameGroupby ( #256 ) ( c8ec245 )
Support dataframe.loc with conditional columns selection ( #233 ) ( 3febea9 )

Bug Fixes

Enforce pandas version requirement <2.1.4 ( #265 ) ( 9dd63f6 )
Exclude pandas 2.1.4 from prerelease tests to unblock e2e tests ( b02fc2c )
Fix value_counts column label for normalize=True ( #245 ) ( d3fa6f2 )
Migrate e2e tests to bigframes-load-testing project ( 8766ac6 )
Ml.sql logic ( #262 ) ( 68c6fdf )
Update the llm_kmeans notebook ( #247 ) ( 66d1839 )
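
The pandas pin in the first fix above can be checked against a local environment. This is a minimal sketch, assuming plain dotted numeric version strings with no pre-release suffixes. The helper names `parse_version` and `satisfies_pandas_pin` are hypothetical and not part of bigframes.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain dotted version such as '2.1.3' into (2, 1, 3)."""
    return tuple(int(part) for part in version.split("."))


# Upper bound taken from the changelog entry: pandas must be < 2.1.4.
PANDAS_UPPER_BOUND = parse_version("2.1.4")


def satisfies_pandas_pin(installed: str) -> bool:
    """Return True when the installed version is strictly below the bound."""
    return parse_version(installed) < PANDAS_UPPER_BOUND


for candidate in ("2.0.3", "2.1.3", "2.1.4"):
    print(candidate, satisfies_pandas_pin(candidate))
```

Real version strings can carry suffixes such as `rc1`, which this sketch does not handle; a dedicated version-parsing library is the safer choice in production code.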

Documentation

Add code samples for shape and head ( #257 ) ( 5bdcc65 )
Add example for dataframe.melt, dataframe.pivot, dataframe.stac… ( #252 ) ( 8c63697 )
Add example to dataframe.nlargest, dataframe.nsmallest, datafra… ( #234 ) ( e735412 )
Add examples for dataframe.cummin, dataframe.cummax, dataframe.cumsum, dataframe.cumprod ( #243 ) ( 0523a31 )
Add examples for dataframe.nunique, dataframe.diff, dataframe.a… ( #251 ) ( 77074ec )
Correct the docs for option_context ( #263 ) ( d21c6dd )
Correct the params rendering for ml.remote and ml.ensemble modules ( #248 ) ( c2829e3 )
Fix return annotation in API docstrings ( #253 ) ( 89a1c67 )

0.15.0 (2023-11-29)

⚠ BREAKING CHANGES

model.predict returns all the columns ( #204 )

Features

Add info and memory_usage methods to dataframe ( #219 ) ( 9d6613d )
Add remote vertex model support ( #237 ) ( 0bfc4fb )
Add the recent api method for ML component ( #225 ) ( ed8876d )
Model.predict returns all the columns ( #204 ) ( 416171a )
Send warnings on LLM prediction partial failures ( #216 ) ( 81125f9 )

Bug Fixes

Add df snapshots lookup for read_gbq ( #229 ) ( d0d9b84 )
Avoid unnecessary row_number() on sort key for io ( #211 ) ( a18d40e )
Dedup special character ( #209 ) ( dd78acb )
Invalid JSON type of the notebook ( #215 ) ( a729831 )
Make to_pandas override enable_downsampling when sampling_method is manually set. ( #200 ) ( ae03756 )
Polish the llm+kmeans notebook ( #208 ) ( e8532b1 )
Update the llm+kmeans notebook with recent change ( #236 ) ( f8917ab )
Use anonymous dataset to create remote_function ( #205 ) ( 69b016e )

Documentation

Add code samples for index and column properties ( #212 ) ( c88d38e )
Add code samples for df reshaping, function, merge, and join methods ( #203 ) ( 010486c )
Add examples for dataframe.kurt, dataframe.std, dataframe.count ( #232 ) ( f9c6e72 )
Add examples for dataframe.mean, dataframe.median, dataframe.va… ( #228 ) ( edd0522 )
Add examples for dataframe.min, dataframe.max and dataframe.sum ( #227 ) ( 3a375e8 )
Code samples for Series.dot and DataFrame.dot ( #226 ) ( b62a07a )
Code samples for Series.where and Series.mask ( #217 ) ( 52dfad2 )
Code samples for dataframe.any, dataframe.all and dataframe.prod ( #223 ) ( d7957fa )
Make the code samples reflect default bq connection usage ( #206 ) ( 71844b0 )

Miscellaneous Chores

Release 0.15.0 ( #241 ) ( 6c899be )

0.14.1 (2023-11-16)

Bug Fixes

Correctly handle null values when initializing fingerprint ordering ( #210 ) ( 8324f13 )

Documentation

Add an example notebook about line graphs ( #197 ) ( f957b27 )

0.14.0 (2023-11-14)

Features

Add ‘cross’ join support ( #176 ) ( 765446a )
Add ‘index’, ‘pad’, ‘nearest’ interpolate methods ( #162 ) ( 6a28403 )
Add series.sample (identical to existing dataframe.sample) ( #187 ) ( 37914a4 )
Add unordered sql compilation ( #156 ) ( 58f420c )
Log most recent API calls as recent-bigframes-api-xx labels on BigQuery jobs ( #145 ) ( 4ea33b7 )
Read_gbq creates order deterministically without table copy ( #191 ) ( 8ab81de )
Support date_series.astype("string[pyarrow]") to cast DATE to STRING ( #186 ) ( aee0e8e )
Support series.at[row_label] = scalar ( #173 ) ( 0c8bd33 )
Temporary resources no longer use BigQuery Sessions ( #194 ) ( 4a02cac )

Bug Fixes

All sort operations are now stable ( #195 ) ( 3a2761f )
Default to 7 days expiration for read_csv, read_json, read_parquet ( #193 ) ( 03606cd )
Deprecate the remote_service_type in llm model ( #180 ) ( a8a409a )
For reset_index on unnamed multiindex, always use level_[n] label ( #182 ) ( f95000d )
Match pandas behavior when assigning listlike to empty dfs ( #172 ) ( c1d1f42 )
Use anonymous dataset instead of session dataset for temp tables ( #181 ) ( 800d44e )
Use random table for read_pandas ( #192 ) ( 741c75e )
Use random table when loading data for read_csv, read_json, read_parquet ( #175 ) ( 9d2e6dc )
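
For readers unfamiliar with the term in the first fix above, a stable sort keeps rows with equal sort keys in their original relative order. The sketch below illustrates that property with Python's built-in `sorted`, which is guaranteed stable. It does not call bigframes, and the sample rows are made up for illustration.

```python
# Rows of (name, team); each team value appears twice.
rows = [("dana", "b"), ("alex", "a"), ("cory", "b"), ("blake", "a")]

# sorted() is stable: rows that tie on the key keep their input order,
# so "dana" stays ahead of "cory" and "alex" stays ahead of "blake".
by_team = sorted(rows, key=lambda row: row[1])

print(by_team)
```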

Documentation

Add code samples for read_gbq_function using community UDFs ( #188 ) ( 7506eab )
Add docstring code samples for Series.apply and DataFrame.map ( #185 ) ( c816d84 )
Add llm kmeans notebook as an included example ( #177 ) ( d49ae42 )
Use head() to get top n results, not to preview results ( #190 ) ( 87f84c9 )

0.13.0 (2023-11-07)

Features

to_gbq without a destination table writes to a temporary table ( #158 ) ( e1817c9 )
Add DataFrame.__iter__, DataFrame.iterrows, DataFrame.itertuples, and DataFrame.keys methods ( #164 ) ( c065071 )
Add Series.__iter__ method ( #164 ) ( c065071 )
Add interpolate() to series and dataframe ( #157 ) ( b9cb55c )
Support 32k text-generation and multilingual embedding models ( #161 ) ( 5f0ea37 )

Bug Fixes

Update default temp table expiration to 7 days ( #174 ) ( 4ff26cd )

0.12.0 (2023-11-01)

Features

Add DataFrame.melt ( #113 ) ( 4e4409c )
Add DataFrame.to_pandas_batches() to download large DataFrame objects ( #136 ) ( 3afd4a3 )
Add bigframes.options.compute.maximum_bytes_billed option that sets maximum bytes billed on query jobs ( #133 ) ( 63c7919 )
Add pandas.qcut ( #104 ) ( 8e44518 )
Add pd.get_dummies ( #149 ) ( d8baad5 )
Add unstack to series, add level param ( #115 ) ( 5edcd19 )
Implement operator @ for DataFrame.dot ( #139 ) ( 79a638e )
Populate ibis version in user agent ( #140 ) ( c639a36 )

Bug Fixes

Don’t override the global logging config ( #138 ) ( 2ddbf74 )
Fix bug with column names under repeated column assignment ( #150 ) ( 29032d0 )
Resolve plotly rendering issue by using ipython html for job pro… ( #134 ) ( 39df43e )
Use indexee’s session for loc listlike cases ( #152 ) ( 27c5725 )

Documentation

Add arithmetic df sample code ( #153 ) ( ac44ccd )
Fix indentation on read_gbq_function code sample ( #163 ) ( 0801d96 )
Link to ML.EVALUATE BQML page for score() methods ( #137 ) ( 45c617f )

0.11.0 (2023-10-26)

Features

Add back reset_session as an alias for close_session ( #124 ) ( 694a85a )
Change query parameter to query_or_table in read_gbq ( #127 ) ( f9bb3c4 )

Bug Fixes

Expose bigframes.pandas.reset_session as a public API ( #128 ) ( b17e1f4 )
Use series’s own session in series.reindex listlike case ( #135 ) ( 95bff3f )

Documentation

Add runnable code samples for DataFrames I/O methods and property ( #129 ) ( 6fea8ef )
Add runnable code samples for reading methods ( #125 ) ( a669919 )

0.10.0 (2023-10-19)

Features

Implement DataFrame.dot for matrix multiplication ( #67 ) ( 29dd414 )

0.9.0 (2023-10-18)

⚠ BREAKING CHANGES

rename bigframes.pandas.reset_session to close_session ( #101 )

Features

Add bigframes.options.bigquery.application_name for partner attribution ( #117 ) ( 52d64ff )
Add AtIndexer getitems ( #107 ) ( 752b01f )
Rename bigframes.pandas.reset_session to close_session ( #101 ) ( 36693bf )
Send BigQuery cancel request when canceling bigframes process ( #103 ) ( e325fbb )
Support external packages in remote_function ( #98 ) ( ec10c4a )
Use ArrowDtype for STRUCT columns in to_pandas ( #85 ) ( 9238fad )

Bug Fixes

Support multiindex for three loc getitem overloads ( #113 ) ( 68e3cd3 )

Performance Improvements

If primary keys are defined, read_gbq avoids copying table data ( #112 ) ( e6c0cd1 )

Documentation

Add documentation for Series.struct.field and Series.struct.explode ( #114 ) ( a6dab9c )
Add open-source link in API doc ( #106 ) ( db51fe3 )
Update ML overview API doc ( #105 ) ( 1b3f3a5 )

0.8.0 (2023-10-12)

⚠ BREAKING CHANGES

The default behavior of to_parquet is changing from no compression to 'snappy' compression.

Features

Support compression in to_parquet ( a8c286f )

Bug Fixes

Create session dataset for remote functions only when needed ( #94 ) ( 1d385be )

0.7.0 (2023-10-11)

Features

Add aliases for several series properties ( #80 ) ( c0efec8 )
Add equals methods to series/dataframe ( #76 ) ( 636a209 )
Add iat and iloc accessing by tuples of integers ( #90 ) ( 228aeba )
Add level param to DataFrame.stack ( #88 ) ( 97b8bec )
Allow df.drop to take an index object ( #68 ) ( 740c451 )
Use default session connection ( #87 ) ( 4ae4ef9 )

Bug Fixes

Change the invalid url in docs ( #93 ) ( 969800d )

Documentation

Add more preprocessing models into the docs menu. ( #97 ) ( 1592315 )

0.6.0 (2023-10-04)

Features

Add df.unstack ( #63 ) ( 4a84714 )
Add idxmin, idxmax to series, dataframe ( #74 ) ( 781307e )
Add ml.preprocessing.KBinsDiscretizer ( #81 ) ( 24c6256 )
Add multi-column dataframe merge ( #73 ) ( c9fa85c )
Add update and align methods to dataframe ( #57 ) ( bf050cf )
Support STRUCT data type with Series.struct.field to extract child fields ( #71 ) ( 17afac9 )

Bug Fixes

Avoid 403 response too large to return error with read_gbq and large query results ( #77 ) ( 8f3b5b2 )
Change return type of Series.loc[scalar] ( #40 ) ( fff3d45 )
Fix df/series.iloc by list with multiindex ( #79 ) ( 971d091 )

0.5.0 (2023-09-28)

Features

Add DataFrame.kurtosis / DF.kurt method ( c1900c2 )
Add DataFrame.rolling and DataFrame.expanding methods ( c1900c2 )
Add items, apply methods to DataFrame. ( #43 ) ( 3adc1b3 )
Add axis param to simple df aggregations ( #52 ) ( 9cf9972 )
Add index dtype, astype, drop, fillna, aggregate attributes. ( #38 ) ( 1a254a4 )
Add ml.preprocessing.LabelEncoder ( #50 ) ( 2510461 )
Add ml.preprocessing.MaxAbsScaler ( #56 ) ( 14b262b )
Add ml.preprocessing.MinMaxScaler ( #64 ) ( 392113b )
Add more index methods ( #54 ) ( a6e32aa )
Support calculate_p_values parameter in bigframes.ml.linear_model.LinearRegression ( c1900c2 )
Support class_weights="balanced" in LogisticRegression model ( c1900c2 )
Support df[column_name] = df_only_one_column ( c1900c2 )
Support early_stop parameter in bigframes.ml.linear_model.LinearRegression ( c1900c2 )
Support enable_global_explain parameter in bigframes.ml.linear_model.LinearRegression ( c1900c2 )
Support l2_reg parameter in bigframes.ml.linear_model.LinearRegression ( c1900c2 )
Support learn_rate_strategy parameter in bigframes.ml.linear_model.LinearRegression ( c1900c2 )
Support ls_init_learn_rate parameter in bigframes.ml.linear_model.LinearRegression ( c1900c2 )
Support max_iterations parameter in bigframes.ml.linear_model.LinearRegression ( c1900c2 )
Support min_rel_progress parameter in bigframes.ml.linear_model.LinearRegression ( c1900c2 )
Support optimize_strategy parameter in bigframes.ml.linear_model.LinearRegression ( c1900c2 )
Support casting string to integer or float ( #59 ) ( 3502f83 )

Bug Fixes

Fix header skipping logic in read_csv ( #49 ) ( d56258c )
Generate unique ids on join to avoid id collisions ( #65 ) ( 7ab65e8 )
LabelEncoder params consistent with Sklearn ( #60 ) ( 632caec )
Loosen filter items tests to accommodate shifting pandas impl ( #41 ) ( edabdbb )

Performance Improvements

Add ability to cache dataframe and series to session table ( #51 ) ( 416d7cb )
Inline small Series and DataFrames in query text ( #45 ) ( 5e199ec )
Reimplement unpivot to use cross join rather than union ( #47 ) ( f9a93ce )
Simplify join order to use multiple order keys instead of string. ( #36 ) ( 5056da6 )

Documentation

Link to Remote Functions code samples from README and API reference ( c1900c2 )

0.4.0 (2023-09-16)

Features

Add axis parameter to droplevel and reorder_levels ( 7c6b0dd )
Add bfill and ffill to DataFrame and Series ( 7c6b0dd )
Add DataFrame.combine and DataFrame.combine_first ( #27 ) ( 7c6b0dd )
Add DataFrame.nlargest, nsmallest ( 7c6b0dd )
Add DataFrame.pct_change and Series.pct_change ( 7c6b0dd )
Add DataFrame.skew and GroupBy.skew ( 7c6b0dd )
Add DataFrame.to_dict, to_excel, to_latex, to_records, to_string, to_markdown, to_pickle, to_orc ( 7c6b0dd )
Add diff method to DataFrame and GroupBy ( 7c6b0dd )
Add filter and reindex to Series and DataFrame ( 7c6b0dd )
Add reindex_like to DataFrame and Series ( 7c6b0dd )
Add swaplevel to DataFrame and Series ( 7c6b0dd )
Add partial support for Series.replace ( 7c6b0dd )
Support DataFrame.loc[bool_series, column] = scalar ( 7c6b0dd )
Support a persistent name in remote_function ( 7c6b0dd )

Bug Fixes

remote_function uses same credentials as other APIs ( 7c6b0dd )
Add type hints to models ( 7c6b0dd )
Raise error when ARIMAPlus is used with Pipeline ( 7c6b0dd )
Remove transforms parameter in model.fit (breaking change) ( 7c6b0dd )
Support column joins with “None indexer” ( 7c6b0dd )
Use Int64Dtype for literals in cut ( 7c6b0dd )
Use lowercase strings for parameter literals in bigframes.ml (breaking change) ( 7c6b0dd )

Performance Improvements

Add bigframes-api label to I/O query jobs ( 7c6b0dd )

Documentation

Document possible parameter values for PaLM2TextGenerator ( 7c6b0dd )
Document region logic in README ( 7c6b0dd )
Fix OneHotEncoder sample ( 7c6b0dd )

0.3.2 (2023-09-06)

Bug Fixes

Make release.sh script for PyPI upload executable ( #20 ) ( 9951610 )

0.3.1 (2023-09-05)

Bug Fixes

release: Use correct directory name for release build config ( #17 ) ( 3dd25b3 )

0.3.0 (2023-09-02)

Features

Add bigframes.get_global_session() and bigframes.reset_session() aliases ( a32b747 )
Add bigframes.pandas.read_pickle function ( a32b747 )
Add components_, explained_variance_, and explained_variance_ratio_ properties to bigframes.ml.decomposition.PCA ( 89b9503 )
Add fit_transform to bigquery.ml transformers ( a32b747 )
Add Series.dropna and DataFrame.fillna ( 8fab755 )
Add Series.str methods isalpha, isdigit, isdecimal, isalnum, isspace, islower, isupper, zfill, center ( a32b747 )
Support bigframes.pandas.merge() ( 8fab755 )
Support DataFrame.isin with list and dict inputs ( 8fab755 )
Support DataFrame.pivot ( a32b747 )
Support DataFrame.stack ( 89b9503 )
Support DataFrame - DataFrame binary operations ( 8fab755 )
Support df[my_column] = [a python list] ( 89b9503 )
Support Index.is_monotonic ( 8fab755 )
Support np.arcsin, np.arccos, np.arctan, np.sinh, np.cosh, np.tanh, np.arcsinh, np.arccosh, np.arctanh, np.exp with Series argument ( 89b9503 )
Support np.sin, np.cos, np.tan, np.log, np.log10, np.sqrt, np.abs with Series argument ( 89b9503 )
Support pow() and power operator in DataFrame and Series ( 8fab755 )
Support read_json with engine=bigquery for newline-delimited JSON files ( 89b9503 )
Support Series.corr ( 89b9503 )
Support Series.map ( 8fab755 )
Support for np.add, np.subtract, np.multiply, np.divide, np.power ( 8fab755 )
Support MultiIndex for DataFrame columns ( a32b747 )
Use pandas.Index for column labels ( a32b747 )
Use default session and connection in ml.llm and ml.imported ( 8fab755 )

Bug Fixes

Add error message to set_index ( a32b747 )
Align column names with pandas in DataFrame.agg results ( 89b9503 )
Allow (but still not recommended) ORDER BY in read_gbq input when an index_col is defined ( 89b9503 )
Check for IAM role on the BigQuery connection when initializing a remote_function ( 89b9503 )
Check that types are specified in read_gbq_function ( a32b747 )
Don’t use query cache for Session construction ( a32b747 )
Include survey link in abstract NotImplementedError exception messages ( 89b9503 )
Label temp table creation jobs with source=bigquery-dataframes-temp label ( 89b9503 )
Make X_train argument names consistent across methods ( 8fab755 )
Raise AttributeError for unimplemented pandas methods ( 89b9503 )
Raise exception for invalid function in read_gbq_function ( a32b747 )
Support spaces in column names in DataFrame initializer ( 89b9503 )

Performance Improvements

Add local cache for __repr_*__ methods ( a32b747 )
Lazily instantiate client library objects ( 89b9503 )
Use row_number() filter for head / tail ( 8fab755 )

Documentation

Add ML section under Overview ( a32b747 )
Add release status to table of contents ( a32b747 )
Add samples and best practices to read_gbq docs ( a32b747 )
Correct the return types of Dataframe and Series ( a32b747 )
Create subfolders for notebooks ( a32b747 )
Fix link to GitHub ( 89b9503 )
Highlight bigframes is open-source ( a32b747 )
Sample ML Drug Name Generation notebook ( a32b747 )
Set options.bigquery.project in sample code ( 89b9503 )
Transform remote function user guide into sample code ( a32b747 )
Update remote function notebook with read_gbq_function usage ( 8fab755 )

0.2.0 (2023-08-17)

Features

Add KMeans.cluster_centers_.
Allow column labels to be any type handled by bq df, column labels can be integers now.
Add dataframegroupby.agg().
Add Series Property is_monotonic_increasing and is_monotonic_decreasing.
Add match, fullmatch, get, pad str methods.
Add series isin function.

Bug Fixes

Update ML package to use sessions for queries.
Optimize read_gbq with index_col set to cluster by index_col.
Raise ValueError if the location mismatched.
read_gbq no longer uses ‘time travel’ with query inputs.

Documentation

Add docstring to _uniform_sampling to avoid user using it.

0.1.1 (2023-08-14)

Documentation

Correct link to code repository in setup.py and use correct terminology for console.cloud.google.com links.

0.1.0 (2023-08-11)

Features

Add bigframes.pandas package with an API compatible with pandas. Supported data sources include: BigQuery SQL queries, BigQuery tables, CSV (local and GCS), Parquet (local and Cloud Storage), and more.
Add bigframes.ml package with an API inspired by scikit-learn. Train machine learning models and run batch prediction, powered by BigQuery ML.

0.0.0 (2023-02-22)

Empty package to reserve package name.