A Dataproc job for running Apache PySpark applications on YARN.
JSON representation

```json
{
  "mainPythonFileUri": string,
  "args": [
    string
  ],
  "pythonFileUris": [
    string
  ],
  "jarFileUris": [
    string
  ],
  "fileUris": [
    string
  ],
  "archiveUris": [
    string
  ],
  "properties": {
    string: string,
    ...
  },
  "loggingConfig": {
    object (LoggingConfig)
  }
}
```
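For concreteness, a populated `PySparkJob` might look like the following sketch. The bucket paths, file names, and property values are illustrative placeholders, not values defined by this reference.

```json
{
  "mainPythonFileUri": "gs://my-bucket/jobs/wordcount.py",
  "args": ["gs://my-bucket/input/", "gs://my-bucket/output/"],
  "pythonFileUris": ["gs://my-bucket/libs/helpers.zip"],
  "properties": {
    "spark.executor.memory": "4g"
  },
  "loggingConfig": {
    "driverLogLevels": {
      "root": "INFO"
    }
  }
}
```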
| Field | Description |
|---|---|
| `mainPythonFileUri` | Required. The HCFS URI of the main Python file to use as the driver. Must be a .py file. |
| `args[]` | Optional. The arguments to pass to the driver. Do not include arguments, such as `--conf`, that can be set as job properties, since a collision may occur that causes an incorrect job submission. |
| `pythonFileUris[]` | Optional. HCFS file URIs of Python files to pass to the PySpark framework. Supported file types: .py, .egg, and .zip. |
| `jarFileUris[]` | Optional. HCFS URIs of jar files to add to the CLASSPATHs of the Python driver and tasks. |
| `fileUris[]` | Optional. HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks. |
| `archiveUris[]` | Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip. |
| `properties` | Optional. A mapping of property names to values, used to configure PySpark. Properties that conflict with values set by the Dataproc API might be overwritten. Can include properties set in /etc/spark/conf/spark-defaults.conf and classes in user code. An object containing a list of `"key": value` pairs. Example: `{ "name": "wrench", "mass": "1.3kg", "count": "3" }`. |
| `loggingConfig` | Optional. The runtime log config for job execution. See `LoggingConfig`. |
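A `PySparkJob` is submitted as the `pysparkJob` field of a Dataproc `Job`. As a sketch of how the message fits into a `jobs.submit` request body, a minimal submission might look like this; the cluster name and file URI are placeholders:

```json
{
  "job": {
    "placement": {
      "clusterName": "example-cluster"
    },
    "pysparkJob": {
      "mainPythonFileUri": "gs://my-bucket/jobs/wordcount.py",
      "args": ["gs://my-bucket/input/"]
    }
  }
}
```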