NOTE: This package is in preview. It is not stable, and is likely to change.
Lister
    type Lister struct {
    	// contains filtered or unexported fields
    }
Lister is used for interacting with Dataflux fast-listing. The caller should
initialize it with NewLister() instead of creating it directly.
Example
    package main

    import (
    	"context"
    	"log"

    	"cloud.google.com/go/storage"
    	"cloud.google.com/go/storage/dataflux"
    	"google.golang.org/api/iterator"
    )

    func main() {
    	ctx := context.Background()
    	// Pass in any client opts or set retry policy here.
    	client, err := storage.NewClient(ctx)
    	if err != nil {
    		log.Fatalf("storage.NewClient: %v", err)
    	}
    	defer client.Close()

    	// Create dataflux fast-list input and provide desired options,
    	// including number of workers, batch size, query to filter objects, etc.
    	in := &dataflux.ListerInput{
    		BucketName: "mybucket",
    		// Optionally specify params to apply to lister.
    		Parallelism:          100,
    		BatchSize:            500000,
    		Query:                storage.Query{},
    		SkipDirectoryObjects: false,
    	}

    	// Create Lister with fast-list input.
    	df := dataflux.NewLister(client, in)
    	defer df.Close()

    	var numOfObjects int

    	for {
    		objects, err := df.NextBatch(ctx)
    		// Check for iterator.Done before treating err as a failure:
    		// the final batch may be returned together with iterator.Done.
    		if err == iterator.Done {
    			numOfObjects += len(objects)
    			// No more objects in the bucket to list.
    			break
    		}
    		if err != nil {
    			log.Fatalf("df.NextBatch: %v", err)
    		}
    		numOfObjects += len(objects)
    	}
    	log.Printf("listing %d objects in bucket %q is complete.", numOfObjects, in.BucketName)
    }
NextBatch returns the next N objects in the bucket, where N is [ListerInput.BatchSize].
In case of failure, all processes are stopped and an error is returned immediately. Create a new Lister to retry.
For the first batch, worksteal listing and sequential listing run in parallel
to quickly list N objects in the bucket. For subsequent batches, only the
method that returned objects faster in the first batch is used.
For smaller datasets, sequential listing is expected to be faster. For larger datasets,
worksteal listing is expected to be faster.
The worksteal algorithm lists objects in a GCS bucket using multiple parallel
workers. Each worker in the list operation can steal work from its siblings
once it has finished all of its currently slated listing work.
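The stealing idea can be illustrated with a small, generic sketch. This is not the Dataflux implementation: integer ranges stand in for ranges of object names, and `span`, `worker`, and `listWithStealing` are hypothetical names. All work starts on one worker, and idle workers take the upper half of a sibling's remaining range.

```go
package main

import (
	"fmt"
	"sync"
)

// span is a half-open range [lo, hi) of work items. Integers stand in for
// ranges of object names (an assumption of this sketch).
type span struct{ lo, hi int }

// worker owns one span, guarded by a mutex so siblings can steal from it.
type worker struct {
	mu   sync.Mutex
	work span
}

// next claims the next item of the worker's own span, if any is left.
func (w *worker) next() (int, bool) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.work.lo >= w.work.hi {
		return 0, false
	}
	item := w.work.lo
	w.work.lo++
	return item, true
}

// steal gives away the upper half of the remaining span to an idle sibling.
func (w *worker) steal() (span, bool) {
	w.mu.Lock()
	defer w.mu.Unlock()
	remaining := w.work.hi - w.work.lo
	if remaining < 2 {
		return span{}, false
	}
	mid := w.work.lo + remaining/2
	stolen := span{mid, w.work.hi}
	w.work.hi = mid
	return stolen, true
}

// assign installs stolen work; it is only called when the own span is empty.
func (w *worker) assign(s span) {
	w.mu.Lock()
	w.work = s
	w.mu.Unlock()
}

// listWithStealing processes items 0..total-1 with parallelism workers
// (parallelism must be at least 1) and returns the per-worker item counts.
func listWithStealing(total, parallelism int) []int {
	workers := make([]*worker, parallelism)
	for i := range workers {
		workers[i] = &worker{}
	}
	workers[0].work = span{0, total} // all work starts on a single worker

	counts := make([]int, parallelism)
	var wg sync.WaitGroup
	for id := range workers {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			me := workers[id]
			for {
				// Finish all currently slated work first.
				for {
					if _, ok := me.next(); !ok {
						break
					}
					counts[id]++
				}
				// Then look for a sibling with work left to split.
				stole := false
				for _, victim := range workers {
					if victim == me {
						continue
					}
					if s, ok := victim.steal(); ok {
						me.assign(s)
						stole = true
						break
					}
				}
				if !stole {
					return
				}
			}
		}(id)
	}
	wg.Wait()
	return counts
}

func main() {
	counts := listWithStealing(1000, 4)
	sum := 0
	for _, c := range counts {
		sum += c
	}
	fmt.Println("per-worker counts:", counts) // varies from run to run
	fmt.Println("total:", sum)                // prints: total: 1000
}
```

Every item is claimed under a lock exactly once, so the total is fixed even though the split between workers changes from run to run.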
ListerInput
    type ListerInput struct {
    	// BucketName is the name of the bucket to list objects from. Required.
    	BucketName string

    	// Parallelism is number of parallel workers to use for listing.
    	// Default value is 10x number of available CPU. Optional.
    	Parallelism int

    	// BatchSize is the minimum number of objects to list in each batch.
    	// The number of objects returned in a batch will be rounded up to
    	// include all the objects received in the last request to GCS.
    	// By default, the Lister returns all objects in one batch.
    	// Optional.
    	BatchSize int

    	// Query is the query to filter objects for listing. Default value is nil.
    	// Use ProjectionNoACL for faster listing. Including ACLs increases
    	// latency while fetching objects. Optional.
    	Query storage.Query

    	// SkipDirectoryObjects is to indicate whether to list directory objects.
    	// Note: Even if directory objects are excluded, they contribute to the
    	// [ListerInput.BatchSize] count. Default value is false. Optional.
    	SkipDirectoryObjects bool
    }
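The BatchSize rounding described in the struct comments can be worked through with a little arithmetic. The sketch below assumes a single listing stream whose requests each return up to `pageSize` objects; `batchLen` and `pageSize` are hypothetical and not part of the package, and the real count may differ when several workers list in parallel.

```go
package main

import "fmt"

// batchLen estimates how many objects one batch contains when objects arrive
// in pages of up to pageSize (a hypothetical per-request size, must be
// positive) and remaining objects are still unlisted.
func batchLen(batchSize, pageSize, remaining int) int {
	if batchSize <= 0 || batchSize >= remaining {
		// BatchSize unset, or fewer objects left than requested: return all.
		return remaining
	}
	// Number of requests needed to reach at least batchSize objects.
	pages := (batchSize + pageSize - 1) / pageSize
	n := pages * pageSize
	if n > remaining {
		n = remaining
	}
	return n
}

func main() {
	fmt.Println(batchLen(12000, 5000, 100000)) // 15000: three full pages
	fmt.Println(batchLen(12000, 5000, 13000))  // 13000: the third page is short
	fmt.Println(batchLen(0, 5000, 42))         // 42: the default returns everything
}
```

The first case shows why BatchSize is a minimum: a request for 12000 objects is satisfied by the third page, and the whole page is included in the batch.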
ListerInput contains options for listing objects.
func NewLister
    func NewLister(c *storage.Client, in *ListerInput) *Lister
NewLister creates a new [Lister] that can be used to list objects in the given bucket.
func (*Lister) Close
    func (c *Lister) Close()
Close is used to close the Lister.
func (*Lister) NextBatch
    func (c *Lister) NextBatch(ctx context.Context) ([]*storage.ObjectAttrs, error)
Package cloud.google.com/go/storage/dataflux (v1.51.0)
Package dataflux provides an easy way to parallelize listing in Google
Cloud Storage.
More information about Google Cloud Storage is available at
https://cloud.google.com/storage/docs.
See https://pkg.go.dev/cloud.google.com/go for authentication, timeouts,
connection pooling and similar aspects of this package.
Note: To get more information about this package, such as access to older versions, view this package on pkg.go.dev (https://pkg.go.dev/cloud.google.com/go/storage/dataflux).
Last updated 2025-09-04 UTC.