Sliced object downloads

One strategy for downloading large files is called a sliced object download. In such a download, ranged GET requests are made in parallel, each writing its data to a portion of a temporary, pre-allocated destination file. Once all slices have finished downloading, the temporary file is renamed to the destination file.
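The per-slice byte ranges are simple arithmetic over offsets. As an illustrative sketch (not the implementation used by any particular tool), a file of known size can be split into inclusive ranges suitable for HTTP Range headers like this:

```python
def slice_ranges(total_size: int, slice_count: int) -> list[tuple[int, int]]:
    """Split total_size bytes into inclusive (start, end) byte ranges,
    one per slice, suitable for HTTP Range headers."""
    base, remainder = divmod(total_size, slice_count)
    ranges = []
    start = 0
    for i in range(slice_count):
        # Spread the remainder over the first slices so sizes differ by at most 1.
        length = base + (1 if i < remainder else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges

# A 10-byte object split into 3 slices:
print(slice_ranges(10, 3))  # [(0, 3), (4, 6), (7, 9)]
```

Each resulting tuple becomes the `bytes=start-end` value of one ranged GET request, and each slice writes at its own offset in the pre-allocated file.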

Sliced object downloads can be significantly faster if network and disk speed are not limiting factors; however, sliced object downloads cause multiple writes to occur at various locations on disk, so this download strategy can degrade performance for disks with slow seek times, especially when breaking a download into a large number of slices. Tools such as the Google Cloud CLI have low default values for the number of slices they create to minimize the possibility of performance impacts.

Sliced object downloads should always use a fast composable checksum (CRC32C) to verify the data integrity of the slices. To perform sliced object downloads, tools such as the gcloud CLI require a compiled version of crcmod on the machine performing the download. If compiled crcmod is not available, the gcloud CLI performs non-sliced object downloads instead.
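To make the checksum requirement concrete, here is a pure-Python sketch of CRC-32C (the Castagnoli polynomial). It is for illustration only: real tools rely on compiled implementations such as crcmod or google-crc32c for speed, and this sketch only shows extending a running checksum with in-order slices; combining checksums of out-of-order slices requires additional math not shown here.

```python
def _build_table(poly: int = 0x82F63B78) -> list[int]:
    # Byte-at-a-time lookup table for reflected CRC-32C.
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table

_TABLE = _build_table()

def crc32c(data: bytes, crc: int = 0) -> int:
    """CRC-32C of data; pass a previous result as `crc` to continue a stream."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

# Standard check value for CRC-32C:
assert crc32c(b"123456789") == 0xE3069283

# Slices verified in order can extend a running checksum instead of
# re-reading the whole file:
running = crc32c(b"12345")
assert crc32c(b"6789", running) == crc32c(b"123456789")
```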

How tools and APIs use sliced object downloads

Depending on how you interact with Cloud Storage, sliced object downloads might be managed automatically on your behalf. This section describes the sliced object download behavior for different tools and provides information for how you can modify the behavior.

Console

The Google Cloud console does not perform sliced object downloads.

Command line

By default, gcloud storage cp enables sliced object downloads. You can control how and when the gcloud CLI performs sliced object downloads by modifying the following properties:

  • storage/sliced_object_download_threshold: The minimum total file size for performing a sliced object download. You can disable all sliced object downloads by setting this value to 0.

  • storage/sliced_object_download_max_components: The maximum number of slices to use in the download. Set 0 for no limit, in which case the number of slices is determined solely by storage/sliced_object_download_component_size.

  • storage/sliced_object_download_component_size: The target size for each download slice. This property is ignored if the total file size is so large that downloading slices of this size would require more slices than allowed, as set in storage/sliced_object_download_max_components.

You can modify these properties by creating a named configuration and applying it either on a per-command basis by using the --configuration flag, or for all gcloud CLI commands by using the gcloud config set command.
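As an illustration of setting the properties listed above with gcloud config set (the property names come from this page; the threshold, slice count, and slice size values here are arbitrary examples, not recommendations):

```shell
# Only slice downloads of objects larger than 150 MiB.
gcloud config set storage/sliced_object_download_threshold 150M

# Use at most 4 slices per download.
gcloud config set storage/sliced_object_download_max_components 4

# Aim for roughly 64 MiB per slice.
gcloud config set storage/sliced_object_download_component_size 64M
```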

No additional local disk space is required when using the gcloud CLI to perform sliced object downloads. If the download fails prior to completion, run the command again to resume the slices that failed. Slices that were downloaded successfully before the failure are not re-downloaded when you retry, except in the case where the source object has changed between download attempts.

Temporary downloaded objects appear in the destination directory with the suffix _.gstmp in their name.

Client libraries

Java

For more information, see the Cloud Storage Java API reference documentation .

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for client libraries .

You can perform sliced object downloads by setting setAllowDivideAndConquerDownload to true on the TransferManagerConfig builder. For example:

import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.transfermanager.DownloadResult;
import com.google.cloud.storage.transfermanager.ParallelDownloadConfig;
import com.google.cloud.storage.transfermanager.TransferManager;
import com.google.cloud.storage.transfermanager.TransferManagerConfig;
import java.nio.file.Path;
import java.util.List;

class AllowDivideAndConquerDownload {

  public static void divideAndConquerDownloadAllowed(
      List<BlobInfo> blobs, String bucketName, Path destinationDirectory) {
    TransferManager transferManager =
        TransferManagerConfig.newBuilder()
            .setAllowDivideAndConquerDownload(true)
            .build()
            .getService();
    ParallelDownloadConfig parallelDownloadConfig =
        ParallelDownloadConfig.newBuilder()
            .setBucketName(bucketName)
            .setDownloadDirectory(destinationDirectory)
            .build();
    List<DownloadResult> results =
        transferManager.downloadBlobs(blobs, parallelDownloadConfig).getDownloadResults();
    for (DownloadResult result : results) {
      System.out.println(
          "Download of "
              + result.getInput().getName()
              + " completed with status "
              + result.getStatus());
    }
  }
}

Node.js

For more information, see the Cloud Storage Node.js API reference documentation .

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for client libraries .

You can perform sliced object downloads using the downloadFileInChunks method. For example:

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// The ID of your GCS bucket
// const bucketName = 'your-unique-bucket-name';

// The ID of the GCS file to download
// const fileName = 'your-file-name';

// The path to which the file should be downloaded
// const destFileName = '/local/path/to/file.txt';

// The size of each chunk to be downloaded
// const chunkSize = 1024;

// Imports the Google Cloud client library
const {Storage, TransferManager} = require('@google-cloud/storage');

// Creates a client
const storage = new Storage();

// Creates a transfer manager client
const transferManager = new TransferManager(storage.bucket(bucketName));

async function downloadFileInChunksWithTransferManager() {
  // Downloads the files
  await transferManager.downloadFileInChunks(fileName, {
    destination: destFileName,
    chunkSizeBytes: chunkSize,
  });

  console.log(
    `gs://${bucketName}/${fileName} downloaded to ${destFileName}.`
  );
}

downloadFileInChunksWithTransferManager().catch(console.error);

Python

For more information, see the Cloud Storage Python API reference documentation .

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for client libraries .

You can perform sliced object downloads using the download_chunks_concurrently method. For example:

def download_chunks_concurrently(
    bucket_name, blob_name, filename, chunk_size=32 * 1024 * 1024, workers=8
):
    """Download a single file in chunks, concurrently in a process pool."""

    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"

    # The file to be downloaded
    # blob_name = "target-file"

    # The destination filename or path
    # filename = ""

    # The size of each chunk. The performance impact of this value depends on
    # the use case. The remote service has a minimum of 5 MiB and a maximum of
    # 5 GiB.
    # chunk_size = 32 * 1024 * 1024 (32 MiB)

    # The maximum number of processes to use for the operation. The performance
    # impact of this value depends on the use case, but smaller files usually
    # benefit from a higher number of processes. Each additional process occupies
    # some CPU and memory resources until finished. Threads can be used instead
    # of processes by passing `worker_type=transfer_manager.THREAD`.
    # workers=8

    from google.cloud.storage import Client, transfer_manager

    storage_client = Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(blob_name)

    transfer_manager.download_chunks_concurrently(
        blob, filename, chunk_size=chunk_size, max_workers=workers
    )

    print("Downloaded {} to {}.".format(blob_name, filename))

REST APIs

Both the JSON API and XML API support ranged GET requests, which means you can use either API to implement your own sliced object download strategy.

To protect against data corruption caused by the source object changing during the download, provide the generation number of the source object in each download request for a slice of the object.
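As a sketch, the following builds (but does not send) the URL and Range header for one slice of a JSON API media download pinned to a specific generation. The bucket, object, and generation values in the usage line are placeholders, and the helper function itself is hypothetical, not part of any client library:

```python
from urllib.parse import quote


def build_slice_request(
    bucket: str, obj: str, generation: int, start: int, end: int
) -> tuple[str, dict]:
    """Return the URL and headers for one slice of a JSON API media download,
    pinned to a specific object generation."""
    url = (
        f"https://storage.googleapis.com/storage/v1/b/{quote(bucket, safe='')}"
        f"/o/{quote(obj, safe='')}?alt=media&generation={generation}"
    )
    # An inclusive byte range for this slice.
    headers = {"Range": f"bytes={start}-{end}"}
    return url, headers


# First 8 MiB of a (placeholder) object, pinned to a (placeholder) generation:
url, headers = build_slice_request(
    "my-bucket", "big/file.bin", 1234567890123456, 0, 8 * 1024 * 1024 - 1
)
print(url)
print(headers)
```

Because every slice names the same generation, a request for any slice fails rather than silently returning bytes from a newer version of the object if it is overwritten mid-download. Note that the object name is percent-encoded (including slashes), as the JSON API requires.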
