Package dataflux provides an easy way to parallelize listing in Google Cloud Storage.
More information about Google Cloud Storage is available at https://cloud.google.com/storage/docs .
See https://pkg.go.dev/cloud.google.com/go for authentication, timeouts, connection pooling and similar aspects of this package.
NOTE: This package is in preview. It is not stable, and is likely to change.
Lister
type Lister struct {
	// contains filtered or unexported fields
}
Lister is used for interacting with Dataflux fast-listing. The caller should initialize it with NewLister() instead of creating it directly.
func NewLister
func NewLister(c *storage.Client, in *ListerInput) *Lister
NewLister creates a new dataflux Lister to list objects in the given bucket.
func (*Lister) Close
func (c *Lister) Close()
Close closes the range channel of the Lister.
func (*Lister) NextBatch
NextBatch runs the worksteal algorithm and sequential listing in parallel to quickly return a list of objects in the bucket. For smaller datasets, sequential listing is expected to be faster; for larger datasets, worksteal listing is expected to be faster.
ListerInput
type ListerInput struct {
	// BucketName is the name of the bucket to list objects from. Required.
	BucketName string

	// Parallelism is the number of parallel workers to use for listing.
	// Default value is 10x the number of available CPUs. Optional.
	Parallelism int

	// BatchSize is the number of objects to list. Default value returns
	// all objects at once. The number of objects returned will be
	// rounded up to a multiple of the GCS page size. Optional.
	BatchSize int

	// Query is the query to filter objects for listing. Default value is an
	// empty Query. Use ProjectionNoACL for faster listing. Including ACLs
	// increases latency while fetching objects. Optional.
	Query storage.Query

	// SkipDirectoryObjects indicates whether directory objects are skipped
	// during listing. Default value is false. Optional.
	SkipDirectoryObjects bool
}
ListerInput contains options for listing objects.