The following table lists the BigQuery, Cloud Storage, and other Google Cloud commands that you can use with Mainframe Connector.
bq export
Note: We recommend that you use the qsam decode and qsam encode commands to perform this task. For information on the advantages of using the qsam commands, see Advantages of qsam commands.
The bq export command supports some performance tuning capabilities. For more information, see Performance tuning configuration for the bq export command. You can use customized character sets with the bq export command. For more information, see Use customized character sets.
Note: The bq export command fails requests to export large BigQuery tables. To avoid errors, add the -allowLargeResults flag to the bq export command when you want to export large tables.
You can also use the bq mk command to generate a BigQuery table directly from parsing COBOL copybooks. For more information, see Create a BigQuery table from a copybook.
bq query
You can specify the query to run with the --sql flag or from the QUERY DD. If both are provided, the query in the --sql flag takes precedence.
Use the --follow=true flag to generate a report that displays the results of a select query. To write this report to a file on the mainframe, define a DD statement AUDITL that points to the file that should contain the audit logs report. Don't use the --follow flag if you want normal logging behavior.
Some query results can return a large number of rows, sometimes in the millions. To keep the output human readable, the number of lines displayed is capped. To control the number of rows displayed, use the --report_row_limit flag. For example, use --report_row_limit 10 to limit the results to 10 lines. By default, the number of lines displayed is limited to 30.
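For example, the following is a minimal sketch that combines the --sql and --report_row_limit flags described above; the project ID and query are placeholders, not values from this page:
bq query \
--project_id=example-project \
--report_row_limit 10 \
--sql='SELECT title FROM `bigquery-public-data.samples.wikipedia` LIMIT 100'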
To use bq query parameterization, see bq query parameterization.
bq rm
Use this command to delete a BigQuery table.
scp
Note: We recommend that you use the copy text command to perform this task, as it provides better capabilities.
You can also use this command to copy application source code defined in job control language (JCL).
gsutil utility
Note: We recommend that you use the qsam decode and qsam encode commands to perform this task. For information on the advantages of using the qsam commands, see Advantages of qsam commands.
If you want the command to read the data from a Data Source Name (DSN) file, use the following flags:
- --inDsn: The input dataset DSN. If provided, this flag overrides INFILE DD.
- --cobDsn: The copybook DSN. If provided, this flag overrides COPYBOOK DD.
Optionally, you can use this command to interact with the Mainframe Connector gRPC service running on a VM on the mainframe. To do so, set the SRVHOST and SRVPORT environment variables, or provide the hostname and port number using command-line options. When the gRPC service is used, the input dataset is first copied to Cloud Storage by Mainframe Connector, and then a remote procedure call (RPC) is made to instruct the gRPC service to transcode the file.
You can also perform the following tasks with the gsutil cp command:
- Perform a dry run of the gsutil cp command
- Copy a file from Cloud Storage to your Mainframe
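For example, the following is a minimal sketch that reads the input and copybook from DSNs and hands the work to the gRPC service; the DSNs, host, port, and bucket are placeholders:
# Point the command at the gRPC service (host and port are placeholders).
export SRVHOST=10.0.0.1
export SRVPORT=51770
# --inDsn and --cobDsn override the INFILE and COPYBOOK DD statements.
gsutil cp \
--replace \
--inDsn=HLQ.INPUT.DATA \
--cobDsn=HLQ.COPYBOOK.DATA \
gs://example-bucket/output.orc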
gszutil utility
The gszutil utility runs using the IBM JZOS Java SDK and provides a shell emulator that accepts gsutil and BigQuery command line invocations using JCL.
Note: We recommend that you use the qsam decode and qsam encode commands to perform this task. For information on the advantages of using the qsam commands, see Advantages of qsam commands.
The gszutil utility extends the functionality of the gsutil utility by accepting a schema in the form of a COPYBOOK DD, using it to transcode COBOL datasets directly to ORC before uploading them to Cloud Storage. The gszutil utility also lets you execute BigQuery query and load jobs using JCL.
The gszutil utility works with the gRPC server, which helps you reduce the million instructions per second (MIPS) consumption. We recommend using the gszutil utility in your production environment to convert binary files in Cloud Storage to the ORC format.
qsam commands
The qsam decode command transcodes a QSAM file to the output format that you specify with the --output-format argument. The original QSAM file is split into chunks based on the value you specify with the --max-chunk-size argument. The transcoded output is saved in the target path as lexicographically sorted files. The qsam encode command performs the reverse conversion, encoding data to the QSAM format from the input format that you specify with the --input-format argument.
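The following is a hedged sketch of a qsam decode invocation assembled from the arguments above; the input DD name, the target path, and the positional argument order are assumptions, not confirmed syntax:
# Transcode a QSAM file to ORC in 128 MiB chunks (INFILE and bucket are placeholders).
qsam decode \
--output-format=orc \
--max-chunk-size=128MiB \
INFILE \
gs://example-bucket/transcoded/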
curl
gcloud dataflow flex-template run
systemreport
Based on the flag you use, the systemreport command prints the following system data:
- --supported_ciphers: Supported ciphers
- --available_security_providers: Available security providers
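For example, the following minimal sketch prints the supported ciphers; no other flags are assumed:
systemreport --supported_ciphers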
Use customized character sets
Mainframe Connector supports different character sets that decode bytes into BigQuery strings, and the other way around. Mainframe Connector lets you configure your own customized charset. You can configure a customized character set by building a Unicode Character Mapping (UCM) file. Mainframe Connector supports the following subset of the UCM format:
<code_set_name> "<name>"
<uconv_class> "SBCS"
<subchar> \x1A #Example
CHARMAP
#_______ _________
<U0000> \x00 |0 #For the third column, only 0 is supported.
<U0001> \x01 |0
#etc
END CHARMAP
If you want to use a customized character set, define a configuration file in the UCM format. You can use this customized character set with the gsutil cp or bq export commands by setting the --encoding=charset flag.
When you create a customized character set, verify the following:
- While defining a UCM file, keep the following in mind:
  - Mainframe Connector only supports customized character sets using a single byte character set (SBCS).
  - Mainframe Connector only supports the UCM precision indicator |0.
  - Verify that the UCM files are located in z/OS Unix System Services (USS) and not on a multiple virtual storage partitioned dataset (MVS PDS).
  - Verify that the UCM files are saved in American Standard Code for Information Interchange (ASCII) format and not in Extended Binary Coded Decimal Interchange Code (EBCDIC) format.
- Provide an explicit mapping for every possible single byte value to a Unicode character. If you're unsure about which Unicode character you want to map a byte to, we recommend that you map it to U+FFFD. You can map different byte sequences to the same Unicode character. However, in these cases the mapping is not bidirectional; that is, when you load data to BigQuery and later export it back to a binary file, the output might differ from the original input.
- Verify that the byte sequences in the second column are unique. If multiple byte sequences map to the same Unicode character, this Unicode character is decoded to the byte sequence of the last mapping defined in the UCM file.
- Verify that Mainframe Connector can find the UCM file by setting the BQSH_FEATURE_CUSTOM_CHARSET environment variable to the UCM file's path. If you want to use multiple character sets, provide the paths to the character sets separated by the semicolon delimiter, for example, BQSH_FEATURE_CUSTOM_CHARSET=path1;path2. The path can either point to a local file or to a file stored on Cloud Storage. If you execute the gsutil cp or bq export commands with the --remote flag to perform remote transcoding, Mainframe Connector uses the local value set for the BQSH_FEATURE_CUSTOM_CHARSET environment variable. The same applies when you run Mainframe Connector in standalone mode. If the --encoding flag refers to a customized character set that doesn't correspond to the value you set for BQSH_FEATURE_CUSTOM_CHARSET (or if you haven't set BQSH_FEATURE_CUSTOM_CHARSET at all), the command exits with an error message.
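The following is a minimal sketch of wiring a customized character set into a transcoding job; the USS path and the charset name (assumed to match the <code_set_name> declared in the UCM file) are placeholders:
# Make the customized UCM file discoverable (USS path is a placeholder).
export BQSH_FEATURE_CUSTOM_CHARSET=/u/user/charsets/custom.ucm
# Reference the customized character set by name when transcoding.
gsutil cp --replace --encoding=CUSTOM gs://example-bucket/output.orc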
Performance tuning configuration for the bq export command
Mainframe Connector supports the following performance tuning configuration for the bq export command:
- exporter_thread_count: (Optional) Set the number of worker threads. The default value is 4.
- max_read_streams: (Optional) Set the maximum number of read streams. The default value is the same as the value set for exporter_thread_count.
- order_response: (Optional) If you set this flag to true, the exporter retains the query result order. This flag affects the export performance. The default value is false.
- max_read_queue: (Optional) Set the maximum number of read record queues. The default value is twice the number of threads.
- transcoding_buffer: (Optional) Set the size of the transcoding buffer per thread, in MB. The default value is 20 MB.
You can also try increasing the transport window size by setting the OVERRIDE_GRPC_WINDOW_MB environment variable to improve performance. The default window size is 4 MB.
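For example, the following is a minimal sketch of raising the transport window before an export; the 16 MB value is illustrative, not a recommendation:
# Increase the gRPC transport window from the default 4 MB to 16 MB.
export OVERRIDE_GRPC_WINDOW_MB=16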
Create a BigQuery table from a copybook
You can use the bq mk
command to generate a BigQuery
table directly from parsing COBOL copybooks. The native copybook parser extracts
default values from the VALUE
clause within a copybook, and assigns
them to the corresponding columns in a newly created BigQuery
table.
To help you test this feature, the bq mk
command also provides
a dry run
mode. This mode lets you preview the generated CREATE TABLE SQL
command without actually creating the table in
BigQuery.
The bq mk
command provides the following configuration options to
support this feature:
- --schema_from_copybook: Specifies the copybook to use to create the table.
- --dry_run: (Optional) When enabled, the command only prints the generated CREATE TABLE SQL command without executing it. This flag is set to false by default.
- --tablespec "[PROJECT_ID]:[DATASET].[TABLE]": Specifies the BigQuery project ID, dataset, and table name for the target table.
- --encoding: Specifies the encoding used to read the copybook file. The default value is CP037.
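For example, the following sketch previews the generated CREATE TABLE SQL command without creating the table; the project, dataset, and table names are placeholders, and supplying the copybook as DD:COPYBOOK is an assumption:
bq mk \
--schema_from_copybook=DD:COPYBOOK \
--tablespec "example-project:example_dataset.example_table" \
--dry_run=true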
The following VALUE clauses are supported:
VAR1 PIC 9(5) VALUE 55.                    *-- Set VAR1 to 55
VAR1 PIC X(5) VALUE aaaa.                  *-- Set VAR1 to aaaa
VAR1 PIC 9(3) COMP VALUE 3.                *-- Set VAR1 to 3 (binary)
VAR1 PIC [9(5), X(5)] VALUE <literal>.     *-- Set VAR1 to <literal>
VAR1 PIC [9(5), X(5)] VALUE ZERO.          *-- Set VAR1 to 0 or "0"
VAR1 PIC [9(5), X(5)] VALUE ZEROS.         *-- Set VAR1 to 0 or "00000"
VAR1 PIC [9(5), X(5)] VALUE ZEROES.        *-- Set VAR1 to 0 or "00000"
VAR1 PIC X(5) VALUE SPACE.                 *-- Set VAR1 to "     "
VAR1 PIC X(5) VALUE SPACES.                *-- Set VAR1 to "     "
HIGH-VALUE and LOW-VALUE clauses are supported for alphanumeric variables only.
VAR1 PIC X(5) VALUE HIGH-VALUE.            *-- Set VAR1 to X"FFFFFFFFFF"
VAR1 PIC X(5) VALUE HIGH-VALUES.           *-- Set VAR1 to X"FFFFFFFFFF"
VAR1 PIC X(5) VALUE LOW-VALUE.             *-- Set VAR1 to X"0000000000" (NULL)
VAR1 PIC X(5) VALUE LOW-VALUES.            *-- Set VAR1 to X"0000000000" (NULL)
VAR1 PIC X(5) VALUE QUOTE.                 *-- Set VAR1 to `"`
VAR1 PIC X(5) VALUE QUOTES.                *-- Set VAR1 to `""""`
VAR1 PIC [9(5), X(5)] VALUE NULL.          *-- Not defined and won't be supported
VAR1 PIC [9(5), X(5)] VALUE ALL <literal>. *-- Set all fields with the value ALL to <literal>
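For instance, the following is a small hypothetical copybook fragment; based on the clauses above, CUST-ID would default to 0 and CUST-NAME to blanks in the created table:
01 CUSTOMER-RECORD.
   05 CUST-ID    PIC 9(5)  VALUE ZEROS.  *-- numeric column, default 0
   05 CUST-NAME  PIC X(20) VALUE SPACES. *-- string column, default blanks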
bq query parameterization
Mainframe Connector lets you use parameterized queries with bq query.
The following is an example of how you can use a parameterized bq query:
Query file
SELECT * FROM `bigquery-public-data.samples.wikipedia` WHERE title = @xtitle
The following is an example with multiple parameters.
Query file
SELECT * FROM bigquery-public-data.samples.wikipedia WHERE title = @mytitle AND num_characters > @min_chars;
Execution example
bq query \
--project_id=mainframe-connector-dev \
--location="US" \
--parameters=mytitle::Hippocrates,min_chars:INT64:42600
Perform a dry run of the gsutil cp command
The gsutil cp
command decodes a QSAM file using a COBOL copybook,
and generates an ORC file on Cloud Storage. You can perform a dry run
of the gsutil cp
command using the dry_run
flag and
test the following steps:
- Parse a COBOL copybook or data file and check whether it is compatible with Mainframe Connector.
- Decode a QSAM file without writing it to Cloud Storage.
Use the following command to perform a dry run:
gsutil cp \
--dry_run \
gs://result-dir
If all steps are executed successfully, the command exits with return code 0. If any issues are encountered, an error message is displayed.
When you use the dry_run flag, all statistics, such as the total bytes read, the number of records written, and the total errors, are logged.
If you use the dry_run flag and the data source doesn't exist, the command doesn't return an error. Instead, it only checks the copybook parser and then completes execution.
Copy a file from Cloud Storage to your Mainframe
You can use the gsutil cp
command to copy a file from
Cloud Storage to a Mainframe dataset. Note that you cannot copy
partitioned data sets (PDS).
To copy a file from Cloud Storage to a Mainframe dataset, specify the DSN and space requirements of the file you want to download to the Mainframe in JCL, as shown in the following example:
//OUTFILE DD DSN=MAINFRAME.DSN.FILE,DISP=(,CATLG),
// RECFM=FB,DSORG=PS,
// SPACE=(10,(2,1),RLSE),
// AVGREC=M,
// UNIT=SYSDA
//SYSPRINT DD SYSOUT=*
//SYSDUMP DD SYSOUT=*
//STDIN DD *
Specify the gsutil cp command in the following format. If the file already exists on your Mainframe, verify that you add the --replace flag to the command.
gsutil cp GCS_URI DSN --recfm=RECFM --lrecl=LRECL --blksize=BLKSIZE --noseek
Replace the following:
- GCS_URI: The Cloud Storage uniform resource identifier (URI) of the Cloud Storage file. For example, gs://bucket/sample.mainframe.dsn.
- DSN: The DSN destination location on the Mainframe.
- RECFM: The record format (RECFM) of the Mainframe file. The valid values are F, FB, and U. Note that these values are case-insensitive.
- LRECL: (Optional) The record length (LRECL) of the file. The value must be an integer >= 0. If LRECL is not specified, the file is assumed to be in the undefined-length record format (U).
- BLKSIZE: (Optional) The block size of the file. If set to 0, the system determines the optimal block size. The value must be an integer >= 0. If you don't specify a value, the file is treated as an unblocked file.
- noseek: (Optional) Include this parameter if you want to improve download performance. This flag is set to false by default, that is, seek operations are enabled.
Execution example
gsutil cp gs://sample-bucket/MAINFRAME.DSN.FILE MAINFRAME.DSN.FILE \
--lrecl=16 --blksize=0 --recfm=fb
Performance tuning configuration for the gsutil cp command
Mainframe Connector supports the following performance tuning configuration for the gsutil cp command:
- Use the --parallelism flag to set the number of threads. The default value is 1 (single threaded).
- Use the --maxChunkSize argument to set the maximum size of each chunk. Each chunk has its own ORC file. Increase this value to reduce the number of chunks created, at the cost of larger memory requirements during the transcoding process. For details, see Parse the maxChunkSize argument. The default value is 128 MiB.
- Use the --preload_chunk_count argument to set the amount of data to preload to memory while all workers are busy. This argument can improve performance at the cost of memory. The default value is 2.
Execution example
gsutil cp \
--replace \
--parser_type=copybook \
--parallelism=8 \
--maxChunkSize=256MiB \
gs://$BUCKET/test.orc
In this example, we've assumed a large file, and so have used 8 threads, at which point line rate is reached. If you have enough memory, we recommend that you increase the chunk size to 256 MiB or even 512 MiB, since it reduces the overhead of creating and finalizing Cloud Storage objects. For small files, using fewer threads and smaller chunks might produce better results.
Parse the maxChunkSize argument
The maxChunkSize flag accepts values in the form of an amount and a unit of measurement, for example 5 MiB. You can use whitespace between the amount and the unit.
You can provide the value in the following formats:
- Java format: b/k/m/g/t, for byte, kibibyte, mebibyte, gibibyte, and tebibyte respectively
- International format: KiB/MiB/GiB/TiB, for kibibyte, mebibyte, gibibyte, and tebibyte respectively
- Metric format: b/kb/mb/gb/tb, for kilobyte, megabyte, gigabyte, and terabyte respectively
Data size parsing is case insensitive. Note that you can't specify partial amounts. For example, use 716 KiB instead of 0.7 MiB.