Module transfer_manager (2.14.0)

Parameters

Name

Description

blob

 Blob

The blob to be downloaded.

filename

str

The destination filename or path.

chunk_size

int

The size in bytes of each chunk to send. The optimal chunk size for maximum throughput may vary depending on the exact network environment and size of the blob.

download_kwargs

dict

A dictionary of keyword arguments to pass to the download method. Refer to the documentation for blob.download_to_file() or blob.download_to_filename() for more information. The dict is directly passed into the download methods and is not validated by this function. The keyword arguments "start" and "end" are not supported and will cause a ValueError if present. The key "checksum" is also not supported in download_kwargs; see the crc32c_checksum argument below, which does not go in download_kwargs.

deadline

int

The number of seconds to wait for all threads to resolve. If the deadline is reached, all threads will be terminated regardless of their progress and concurrent.futures.TimeoutError will be raised. This can be left as the default of None (no deadline) for most use cases.

worker_type

str

The worker type to use; one of google.cloud.storage.transfer_manager.PROCESS or google.cloud.storage.transfer_manager.THREAD . Although the exact performance impact depends on the use case, in most situations the PROCESS worker type will use more system resources (both memory and CPU) and result in faster operations than THREAD workers. Because the subprocesses of the PROCESS worker type can't access memory from the main process, Client objects have to be serialized and then recreated in each subprocess. The serialization of the Client object for use in subprocesses is an approximation and may not capture every detail of the Client object, especially if the Client was modified after its initial creation or if Client._http was modified in any way. THREAD worker types are observed to be relatively efficient for operations with many small files, but not for operations with large files. PROCESS workers are recommended for large file operations.

max_workers

int

The maximum number of workers to create to handle the workload. With PROCESS workers, a larger number of workers will consume more system resources (memory and CPU) at once. How many workers is optimal depends heavily on the specific use case, and the default is a conservative number that should work okay in most cases without consuming excessive resources.

crc32c_checksum

bool

Whether to compute a checksum for the resulting object, using the crc32c algorithm. As the checksums for each chunk must be combined using a feature of crc32c that is not available for md5, md5 is not supported.

Exceptions

Type

Description

concurrent.futures.TimeoutError

If the deadline is exceeded.

google.resumable_media.common.DataCorruption

If the download's checksum doesn't agree with the server-computed checksum. The google.resumable_media exception is used here for consistency with other download methods, despite the exception originating elsewhere.
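As an illustration of how the parameters above fit together, here is a minimal sketch. It assumes they belong to transfer_manager.download_chunks_concurrently, the first function in this module's function list. The helper check_download_kwargs, the wrapper download_large_blob, the 32 MiB chunk size, the 600 second deadline and the worker count are all hypothetical choices for the example, not library defaults. The library import sits inside the wrapper only so that the helper can be used without the library installed.

```python
import concurrent.futures

# Keys that the parameter table above says must not appear in download_kwargs.
UNSUPPORTED_DOWNLOAD_KWARGS = ("start", "end", "checksum")


def check_download_kwargs(download_kwargs):
    """Hypothetical pre-flight check: fail before any worker is started."""
    bad = sorted(
        key for key in (download_kwargs or {}) if key in UNSUPPORTED_DOWNLOAD_KWARGS
    )
    if bad:
        raise ValueError(f"unsupported download_kwargs keys: {bad}")


def download_large_blob(bucket_name, blob_name, filename, download_kwargs=None):
    """Download one blob in concurrent chunks; return False if the deadline passes."""
    # Imported lazily so check_download_kwargs works without the library.
    from google.cloud.storage import Client, transfer_manager

    check_download_kwargs(download_kwargs)
    blob = Client().bucket(bucket_name).blob(blob_name)
    try:
        transfer_manager.download_chunks_concurrently(
            blob,
            filename,
            chunk_size=32 * 1024 * 1024,  # arbitrary starting point, tune per network
            download_kwargs=download_kwargs,
            deadline=600,  # seconds; None (the default) means no deadline
            worker_type=transfer_manager.PROCESS,
            max_workers=8,
            crc32c_checksum=True,
        )
    except concurrent.futures.TimeoutError:
        # The destination file may be incomplete at this point.
        return False
    return True
```

A DataCorruption error is deliberately left to propagate in this sketch, since a checksum mismatch usually needs a different response than a timeout.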

Parameters

Name

Description

blob_file_pairs

List(Tuple(google.cloud.storage.blob.Blob, IOBase or str))

A list of tuples of a blob and a file or filename. Each blob will be downloaded to the corresponding file by using APIs identical to blob.download_to_file() or blob.download_to_filename() as appropriate. Note that blob.download_to_filename() does not delete the destination file if the download fails. File handlers are only supported if worker_type is set to THREAD. If worker_type is set to PROCESS, please use filenames only.

download_kwargs

dict

A dictionary of keyword arguments to pass to the download method. Refer to the documentation for blob.download_to_file() or blob.download_to_filename() for more information. The dict is directly passed into the download methods and is not validated by this function.

threads

int

DEPRECATED Sets worker_type to THREAD and max_workers to the number specified. If worker_type or max_workers are set explicitly, this parameter should be set to None. Please use worker_type and max_workers instead of this parameter.

deadline

int

The number of seconds to wait for all threads to resolve. If the deadline is reached, all threads will be terminated regardless of their progress and concurrent.futures.TimeoutError will be raised. This can be left as the default of None (no deadline) for most use cases.

raise_exception

bool

If True, exceptions will be raised instead of being added to the list of return values. Note that encountering an exception on one operation will not prevent other operations from starting. Exceptions are only processed and potentially raised after all operations have completed, whether in success or failure.

worker_type

str

The worker type to use; one of google.cloud.storage.transfer_manager.PROCESS or google.cloud.storage.transfer_manager.THREAD . Although the exact performance impact depends on the use case, in most situations the PROCESS worker type will use more system resources (both memory and CPU) and result in faster operations than THREAD workers. Because the subprocesses of the PROCESS worker type can't access memory from the main process, Client objects have to be serialized and then recreated in each subprocess. The serialization of the Client object for use in subprocesses is an approximation and may not capture every detail of the Client object, especially if the Client was modified after its initial creation or if Client._http was modified in any way. THREAD worker types are observed to be relatively efficient for operations with many small files, but not for operations with large files. PROCESS workers are recommended for large file operations. PROCESS workers do not support writing to file handlers. Please refer to files by filename only when using PROCESS workers.

max_workers

int

The maximum number of workers to create to handle the workload. With PROCESS workers, a larger number of workers will consume more system resources (memory and CPU) at once. How many workers is optimal depends heavily on the specific use case, and the default is a conservative number that should work okay in most cases without consuming excessive resources.

skip_if_exists

bool

Before downloading each blob, check if the file for the filename exists; if it does, skip that blob.

Exceptions

Type

Description

concurrent.futures.TimeoutError

If the deadline is exceeded.

Returns

Type

Description

list

A list of results corresponding to, in order, each item in the input list. If an exception was received, it will be the result for that operation. Otherwise, the return value from the successful download method is used (which will be None).
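Because the returned list mixes exceptions with None values, callers usually need to pair results back to their inputs. The sketch below shows one way to do that, assuming the parameters above belong to transfer_manager.download_many. The names summarize_results and download_blobs, and the name_to_path mapping, are hypothetical.

```python
def summarize_results(items, results):
    """Pair each input item with its result; return (succeeded, failed)."""
    succeeded, failed = [], []
    for item, result in zip(items, results):
        if isinstance(result, Exception):
            failed.append((item, result))
        else:
            # Successful downloads return None, so only the item is kept.
            succeeded.append(item)
    return succeeded, failed


def download_blobs(bucket_name, name_to_path):
    """Download several blobs to explicit local paths using THREAD workers."""
    # Imported lazily so summarize_results works without the library.
    from google.cloud.storage import Client, transfer_manager

    bucket = Client().bucket(bucket_name)
    names = list(name_to_path)
    pairs = [(bucket.blob(name), name_to_path[name]) for name in names]
    results = transfer_manager.download_many(
        pairs,
        raise_exception=False,  # keep exceptions in the result list
        worker_type=transfer_manager.THREAD,
        max_workers=8,  # arbitrary value for the example
        skip_if_exists=True,
    )
    return summarize_results(names, results)
```

Leaving raise_exception at False lets one failed download be reported without hiding the outcome of the others.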

Parameters

Name

Description

bucket

 Bucket

The bucket which contains the blobs to be downloaded.

blob_names

list(str)

A list of blob names to be downloaded. Each blob name in this list is also used to determine the destination file path. The full name of the blob must be blob_name_prefix + blob_name. The blob_name is separate from the blob_name_prefix because the blob_name also determines the name of the destination file. Any shared part of the blob names that need not be part of the destination path should be included in the blob_name_prefix.

destination_directory

str

A string that will be prepended (with os.path.join()) to each blob_name in the input list, in order to determine the destination path for that blob. For instance, if the destination_directory string is "/tmp/img" and a blob_name is "0001.jpg", with an empty blob_name_prefix, then the source blob "0001.jpg" will be downloaded to destination "/tmp/img/0001.jpg" . This parameter can be an empty string. Note that this parameter allows directory traversal (e.g. "/", "../") and is not intended for unsanitized end user input.

blob_name_prefix

str

A string that will be prepended to each blob_name in the input list, in order to determine the name of the source blob. Unlike the blob_name itself, the prefix string does not affect the destination path on the local filesystem. For instance, if the destination_directory is "/tmp/img/", the blob_name_prefix is "myuser/mystuff-" and a blob_name is "0001.jpg" then the source blob "myuser/mystuff-0001.jpg" will be downloaded to "/tmp/img/0001.jpg". The blob_name_prefix can be blank (an empty string).

download_kwargs

dict

A dictionary of keyword arguments to pass to the download method. Refer to the documentation for blob.download_to_file() or blob.download_to_filename() for more information. The dict is directly passed into the download methods and is not validated by this function.

threads

int

DEPRECATED Sets worker_type to THREAD and max_workers to the number specified. If worker_type or max_workers are set explicitly, this parameter should be set to None. Please use worker_type and max_workers instead of this parameter.

deadline

int

The number of seconds to wait for all threads to resolve. If the deadline is reached, all threads will be terminated regardless of their progress and concurrent.futures.TimeoutError will be raised. This can be left as the default of None (no deadline) for most use cases.

create_directories

bool

If True, recursively create any directories that do not exist. For instance, if downloading object "images/img001.png", create the directory "images" before downloading.

raise_exception

bool

If True, exceptions will be raised instead of being added to the list of return values. Note that encountering an exception on one operation will not prevent other operations from starting. Exceptions are only processed and potentially raised after all operations have completed, whether in success or failure. If skip_if_exists is True, 412 Precondition Failed responses are considered part of normal operation and are not raised as an exception.

worker_type

str

The worker type to use; one of google.cloud.storage.transfer_manager.PROCESS or google.cloud.storage.transfer_manager.THREAD . Although the exact performance impact depends on the use case, in most situations the PROCESS worker type will use more system resources (both memory and CPU) and result in faster operations than THREAD workers. Because the subprocesses of the PROCESS worker type can't access memory from the main process, Client objects have to be serialized and then recreated in each subprocess. The serialization of the Client object for use in subprocesses is an approximation and may not capture every detail of the Client object, especially if the Client was modified after its initial creation or if Client._http was modified in any way. THREAD worker types are observed to be relatively efficient for operations with many small files, but not for operations with large files. PROCESS workers are recommended for large file operations.

max_workers

int

The maximum number of workers to create to handle the workload. With PROCESS workers, a larger number of workers will consume more system resources (memory and CPU) at once. How many workers is optimal depends heavily on the specific use case, and the default is a conservative number that should work okay in most cases without consuming excessive resources.

skip_if_exists

bool

Before downloading each blob, check if the file for the filename exists; if it does, skip that blob. This only works for filenames.

Exceptions

Type

Description

concurrent.futures.TimeoutError

If the deadline is exceeded.

Returns

Type

Description

list

A list of results corresponding to, in order, each item in the input list. If an exception was received, it will be the result for that operation. Otherwise, the return value from the successful download method is used (which will be None).
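The naming rules for destination_directory and blob_name_prefix, and the directory traversal warning, can be sketched as follows. This assumes the parameters above belong to transfer_manager.download_many_to_path. The helpers plan_download and escapes_directory and the wrapper download_prefix are hypothetical; the traversal guard is one possible check for untrusted blob names, not something the library performs.

```python
import os


def plan_download(blob_name, destination_directory="", blob_name_prefix=""):
    """Return (source_blob_name, destination_path) per the rules described above."""
    source = blob_name_prefix + blob_name
    destination = os.path.join(destination_directory, blob_name)
    return source, destination


def escapes_directory(destination_directory, blob_name):
    """Hypothetical guard: True if the destination lands outside the directory."""
    root = os.path.realpath(destination_directory)
    target = os.path.realpath(os.path.join(destination_directory, blob_name))
    return os.path.commonpath([root, target]) != root


def download_prefix(bucket_name, blob_names, destination_directory, blob_name_prefix=""):
    """Download blobs into a directory after rejecting names that traverse out of it."""
    # Imported lazily so the helpers above work without the library.
    from google.cloud.storage import Client, transfer_manager

    unsafe = [n for n in blob_names if escapes_directory(destination_directory, n)]
    if unsafe:
        raise ValueError(f"blob names escape {destination_directory!r}: {unsafe}")

    bucket = Client().bucket(bucket_name)
    return transfer_manager.download_many_to_path(
        bucket,
        blob_names,
        destination_directory=destination_directory,
        blob_name_prefix=blob_name_prefix,
        create_directories=True,
        worker_type=transfer_manager.THREAD,
        max_workers=8,  # arbitrary value for the example
    )
```

With the values from the description above, plan_download("0001.jpg", "/tmp/img/", "myuser/mystuff-") pairs the source blob "myuser/mystuff-0001.jpg" with the destination "/tmp/img/0001.jpg".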

Parameters

Name

Description

filename

str

The path to the file to upload. File-like objects are not supported.

blob

 Blob

The blob to which to upload.

content_type

str

(Optional) Type of content being uploaded.

chunk_size

int

The size in bytes of each chunk to send. The optimal chunk size for maximum throughput may vary depending on the exact network environment and size of the blob. The remote API has restrictions on the minimum and maximum size allowable, see: https://cloud.google.com/storage/quotas#requests

deadline

int

The number of seconds to wait for all threads to resolve. If the deadline is reached, all threads will be terminated regardless of their progress and concurrent.futures.TimeoutError will be raised. This can be left as the default of None (no deadline) for most use cases.

worker_type

str

The worker type to use; one of google.cloud.storage.transfer_manager.PROCESS or google.cloud.storage.transfer_manager.THREAD . Although the exact performance impact depends on the use case, in most situations the PROCESS worker type will use more system resources (both memory and CPU) and result in faster operations than THREAD workers. Because the subprocesses of the PROCESS worker type can't access memory from the main process, Client objects have to be serialized and then recreated in each subprocess. The serialization of the Client object for use in subprocesses is an approximation and may not capture every detail of the Client object, especially if the Client was modified after its initial creation or if Client._http was modified in any way. THREAD worker types are observed to be relatively efficient for operations with many small files, but not for operations with large files. PROCESS workers are recommended for large file operations.

max_workers

int

The maximum number of workers to create to handle the workload. With PROCESS workers, a larger number of workers will consume more system resources (memory and CPU) at once. How many workers is optimal depends heavily on the specific use case, and the default is a conservative number that should work okay in most cases without consuming excessive resources.

checksum

str

(Optional) The checksum scheme to use: either "md5", "crc32c" or None. Each individual part is checksummed. At present, the selected checksum rule is only applied to parts and a separate checksum of the entire resulting blob is not computed. Please compute and compare the checksum of the file to the resulting blob separately if needed, using the "crc32c" algorithm as per the XML MPU documentation.

timeout

float or tuple

(Optional) The amount of time, in seconds, to wait for the server response. See: configuring_timeouts

retry

google.api_core.retry.Retry

(Optional) How to retry the RPC. A None value will disable retries. A google.api_core.retry.Retry value will enable retries, and the object will configure backoff and timeout options. Custom predicates (customizable error codes) are not supported for media operations such as this one. This function does not accept ConditionalRetryPolicy values because preconditions are not supported by the underlying API call. See the retry.py source code and docstrings in this package ( google.cloud.storage.retry ) for information on retry types and how to configure them.

Exceptions

Type

Description

concurrent.futures.TimeoutError

If the deadline is exceeded.
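Since chunk_size is subject to remote API limits, it can help to derive it from the file size. The sketch below assumes the parameters above belong to transfer_manager.upload_chunks_concurrently. MAX_PARTS is an assumed limit on the number of parts in a multipart upload and should be confirmed against the quotas page linked above. The helpers part_count and choose_chunk_size and the wrapper upload_large_file are hypothetical.

```python
import os

MAX_PARTS = 10_000  # assumed part-count limit; confirm against the quotas page


def part_count(file_size, chunk_size):
    """Number of chunks needed to cover file_size bytes (ceiling division)."""
    return -(-file_size // chunk_size)


def choose_chunk_size(file_size, preferred=32 * 1024 * 1024):
    """Double the preferred chunk size until the file fits in MAX_PARTS parts."""
    chunk_size = preferred
    while part_count(file_size, chunk_size) > MAX_PARTS:
        chunk_size *= 2
    return chunk_size


def upload_large_file(filename, bucket_name, blob_name):
    """Upload one local file in concurrent chunks using PROCESS workers."""
    # Imported lazily so the helpers above work without the library.
    from google.cloud.storage import Client, transfer_manager

    blob = Client().bucket(bucket_name).blob(blob_name)
    transfer_manager.upload_chunks_concurrently(
        filename,
        blob,
        chunk_size=choose_chunk_size(os.path.getsize(filename)),
        worker_type=transfer_manager.PROCESS,
        max_workers=8,  # arbitrary value for the example
        checksum="md5",  # applied to each part, not to the whole blob
    )
```

Under these assumptions, a 1 TiB file would be sent in 128 MiB chunks (8192 parts) rather than 32 MiB chunks (32768 parts).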

Parameters

Name

Description

file_blob_pairs

List(Tuple(IOBase or str, google.cloud.storage.blob.Blob))

A list of tuples of a file or filename and a blob. Each file will be uploaded to the corresponding blob by using APIs identical to blob.upload_from_file() or blob.upload_from_filename() as appropriate. File handlers are only supported if worker_type is set to THREAD. If worker_type is set to PROCESS, please use filenames only.

skip_if_exists

bool

If True, blobs that already have a live version will not be overwritten. This is accomplished by setting if_generation_match = 0 on uploads. Uploads so skipped will result in a 412 Precondition Failed response code, which will be included in the return value but not raised as an exception regardless of the value of raise_exception.

upload_kwargs

dict

A dictionary of keyword arguments to pass to the upload method. Refer to the documentation for blob.upload_from_file() or blob.upload_from_filename() for more information. The dict is directly passed into the upload methods and is not validated by this function.

threads

int

DEPRECATED Sets worker_type to THREAD and max_workers to the number specified. If worker_type or max_workers are set explicitly, this parameter should be set to None. Please use worker_type and max_workers instead of this parameter.

deadline

int

The number of seconds to wait for all threads to resolve. If the deadline is reached, all threads will be terminated regardless of their progress and concurrent.futures.TimeoutError will be raised. This can be left as the default of None (no deadline) for most use cases.

raise_exception

bool

If True, exceptions will be raised instead of being added to the list of return values. Note that encountering an exception on one operation will not prevent other operations from starting. Exceptions are only processed and potentially raised after all operations have completed, whether in success or failure. If skip_if_exists is True, 412 Precondition Failed responses are considered part of normal operation and are not raised as an exception.

worker_type

str

The worker type to use; one of google.cloud.storage.transfer_manager.PROCESS or google.cloud.storage.transfer_manager.THREAD . Although the exact performance impact depends on the use case, in most situations the PROCESS worker type will use more system resources (both memory and CPU) and result in faster operations than THREAD workers. Because the subprocesses of the PROCESS worker type can't access memory from the main process, Client objects have to be serialized and then recreated in each subprocess. The serialization of the Client object for use in subprocesses is an approximation and may not capture every detail of the Client object, especially if the Client was modified after its initial creation or if Client._http was modified in any way. THREAD worker types are observed to be relatively efficient for operations with many small files, but not for operations with large files. PROCESS workers are recommended for large file operations. PROCESS workers do not support writing to file handlers. Please refer to files by filename only when using PROCESS workers.

max_workers

int

The maximum number of workers to create to handle the workload. With PROCESS workers, a larger number of workers will consume more system resources (memory and CPU) at once. How many workers is optimal depends heavily on the specific use case, and the default is a conservative number that should work okay in most cases without consuming excessive resources.

Exceptions

Type

Description

concurrent.futures.TimeoutError

If the deadline is exceeded.

Returns

Type

Description

list

A list of results corresponding to, in order, each item in the input list. If an exception was received, it will be the result for that operation. Otherwise, the return value from the successful upload method is used (which will be None).
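With skip_if_exists enabled, the result list can contain three kinds of entries: None for a completed upload, a 412 Precondition Failed exception for a skipped upload, and any other exception for a real failure. The sketch below separates them, assuming the parameters above belong to transfer_manager.upload_many. The names classify_upload_results and upload_files are hypothetical, and detecting a skip through a code attribute equal to 412 is an assumption about how the exception exposes its HTTP status.

```python
def classify_upload_results(filenames, results):
    """Split results into (uploaded, skipped, failed) lists."""
    uploaded, skipped, failed = [], [], []
    for name, result in zip(filenames, results):
        if not isinstance(result, Exception):
            uploaded.append(name)
        elif getattr(result, "code", None) == 412:
            # Assumed: a skipped upload surfaces as an exception with code 412.
            skipped.append(name)
        else:
            failed.append((name, result))
    return uploaded, skipped, failed


def upload_files(bucket_name, path_to_blob_name):
    """Upload local files to named blobs without overwriting live versions."""
    # Imported lazily so classify_upload_results works without the library.
    from google.cloud.storage import Client, transfer_manager

    bucket = Client().bucket(bucket_name)
    paths = list(path_to_blob_name)
    # Filenames rather than open files, because PROCESS workers need filenames.
    pairs = [(path, bucket.blob(path_to_blob_name[path])) for path in paths]
    results = transfer_manager.upload_many(
        pairs,
        skip_if_exists=True,
        worker_type=transfer_manager.PROCESS,
        max_workers=8,  # arbitrary value for the example
    )
    return classify_upload_results(paths, results)
```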

Parameters

Name

Description

bucket

 Bucket

The bucket which will contain the uploaded blobs.

filenames

list(str)

A list of filenames to be uploaded. This may include part of the path. The file will be accessed at the full path of source_directory + filename .

source_directory

str

A string that will be prepended (with os.path.join() ) to each filename in the input list, in order to find the source file for each blob. Unlike the filename itself, the source_directory does not affect the name of the uploaded blob. For instance, if the source_directory is "/tmp/img/" and a filename is "0001.jpg", with an empty blob_name_prefix, then the file uploaded will be "/tmp/img/0001.jpg" and the destination blob will be "0001.jpg". This parameter can be an empty string. Note that this parameter allows directory traversal (e.g. "/", "../") and is not intended for unsanitized end user input.

blob_name_prefix

str

A string that will be prepended to each filename in the input list, in order to determine the name of the destination blob. Unlike the filename itself, the prefix string does not affect the location the library will look for the source data on the local filesystem. For instance, if the source_directory is "/tmp/img/", the blob_name_prefix is "myuser/mystuff-" and a filename is "0001.jpg" then the file uploaded will be "/tmp/img/0001.jpg" and the destination blob will be "myuser/mystuff-0001.jpg". The blob_name_prefix can be blank (an empty string).

skip_if_exists

bool

If True, blobs that already have a live version will not be overwritten. This is accomplished by setting if_generation_match = 0 on uploads. Uploads so skipped will result in a 412 Precondition Failed response code, which will be included in the return value, but not raised as an exception regardless of the value of raise_exception.

blob_constructor_kwargs

dict

A dictionary of keyword arguments to pass to the blob constructor. Refer to the documentation for blob.Blob() for more information. The dict is directly passed into the constructor and is not validated by this function. name and bucket keyword arguments are reserved by this function and will result in an error if passed in here.

upload_kwargs

dict

A dictionary of keyword arguments to pass to the upload method. Refer to the documentation for blob.upload_from_file() or blob.upload_from_filename() for more information. The dict is directly passed into the upload methods and is not validated by this function.

threads

int

DEPRECATED Sets worker_type to THREAD and max_workers to the number specified. If worker_type or max_workers are set explicitly, this parameter should be set to None. Please use worker_type and max_workers instead of this parameter.

deadline

int

The number of seconds to wait for all threads to resolve. If the deadline is reached, all threads will be terminated regardless of their progress and concurrent.futures.TimeoutError will be raised. This can be left as the default of None (no deadline) for most use cases.

raise_exception

bool

If True, exceptions will be raised instead of being added to the list of return values. Note that encountering an exception on one operation will not prevent other operations from starting. Exceptions are only processed and potentially raised after all operations have completed, whether in success or failure. If skip_if_exists is True, 412 Precondition Failed responses are considered part of normal operation and are not raised as an exception.

worker_type

str

The worker type to use; one of google.cloud.storage.transfer_manager.PROCESS or google.cloud.storage.transfer_manager.THREAD . Although the exact performance impact depends on the use case, in most situations the PROCESS worker type will use more system resources (both memory and CPU) and result in faster operations than THREAD workers. Because the subprocesses of the PROCESS worker type can't access memory from the main process, Client objects have to be serialized and then recreated in each subprocess. The serialization of the Client object for use in subprocesses is an approximation and may not capture every detail of the Client object, especially if the Client was modified after its initial creation or if Client._http was modified in any way. THREAD worker types are observed to be relatively efficient for operations with many small files, but not for operations with large files. PROCESS workers are recommended for large file operations.

max_workers

int

The maximum number of workers to create to handle the workload. With PROCESS workers, a larger number of workers will consume more system resources (memory and CPU) at once. How many workers is optimal depends heavily on the specific use case, and the default is a conservative number that should work okay in most cases without consuming excessive resources.

additional_blob_attributes

dict

A dictionary of blob attribute names and values. This allows the configuration of blobs beyond what is possible with blob_constructor_kwargs. For instance, {"cache_control": "no-cache"} would set the cache_control attribute of each blob to "no-cache". As with blob_constructor_kwargs, this affects the creation of every blob identically. To fine-tune each blob individually, use upload_many and create the blobs as desired before passing them in.

Exceptions

Type

Description

concurrent.futures.TimeoutError

If the deadline is exceeded.

Returns

Type

Description

list

A list of results corresponding to, in order, each item in the input list. If an exception was received, it will be the result for that operation. Otherwise, the return value from the successful upload method is used (which will be None).
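A common use of these parameters is uploading a whole directory tree. The sketch below assumes the parameters above belong to transfer_manager.upload_many_from_filenames. The helpers relative_filenames and plan_upload and the wrapper upload_directory are hypothetical; the cache_control value is taken from the additional_blob_attributes description above.

```python
import os


def relative_filenames(source_directory):
    """List every file under source_directory as a path relative to it."""
    names = []
    for root, _dirs, files in os.walk(source_directory):
        for filename in files:
            rel = os.path.relpath(os.path.join(root, filename), source_directory)
            # Use forward slashes so blob names look the same on every platform.
            names.append(rel.replace(os.sep, "/"))
    return sorted(names)


def plan_upload(filename, source_directory="", blob_name_prefix=""):
    """Return (source_path, destination_blob_name) per the rules described above."""
    return os.path.join(source_directory, filename), blob_name_prefix + filename


def upload_directory(bucket_name, source_directory, blob_name_prefix=""):
    """Upload every file under source_directory; return {filename: result}."""
    # Imported lazily so the helpers above work without the library.
    from google.cloud.storage import Client, transfer_manager

    bucket = Client().bucket(bucket_name)
    filenames = relative_filenames(source_directory)
    results = transfer_manager.upload_many_from_filenames(
        bucket,
        filenames,
        source_directory=source_directory,
        blob_name_prefix=blob_name_prefix,
        additional_blob_attributes={"cache_control": "no-cache"},
        worker_type=transfer_manager.PROCESS,
        max_workers=8,  # arbitrary value for the example
    )
    return dict(zip(filenames, results))
```

Keeping the relative path in each filename means subdirectories carry over into the blob names, while source_directory stays out of them.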

Modules Functions

download_chunks_concurrently

download_many

download_many_to_path

upload_chunks_concurrently

upload_many

upload_many_from_filenames
