This page lists the services that run on Dataproc cluster nodes, by Dataproc image version.
All nodes
The following services run on all nodes in a cluster.
Node type
Service
Image versions
Description
Standard clusters
The following services run on standard clusters.
Node type
Service
Image versions
Description
all
Manages Hive table metadata. By default, uses the local mariadb (image versions < 1.5) or mysql (image versions 1.5+) database on the master node as the Hive table metadata store.
Using the default database is not recommended because these databases are tied to the cluster's lifecycle. Instead, use either of the following as the Hive metastore database (in recommendation order):
all
< 1.5
A relational database used as the default underlying database for Hive
metastore in Dataproc < 1.5 images
1.5+
A relational database used as the default underlying database for Hive metastore
in Dataproc 1.5+ images
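The image-version rule described above (mariadb before 1.5, mysql from 1.5 onward) can be expressed as a small lookup. The following is a minimal sketch, not part of any Dataproc API: the function name `default_metastore_db` is hypothetical, and it assumes the image version string begins with a `major.minor` number such as `1.4` or `2.0.45-debian10`.

```python
import re


def default_metastore_db(image_version: str) -> str:
    """Return the default Hive metastore database for an image version.

    Hypothetical helper for illustration only. Assumes the version string
    starts with "major.minor" (for example "1.4" or "2.0.45-debian10").
    """
    match = re.match(r"(\d+)\.(\d+)", image_version)
    if match is None:
        raise ValueError(f"unrecognized image version: {image_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Image versions earlier than 1.5 default to mariadb; 1.5+ default to mysql.
    return "mariadb" if (major, minor) < (1, 5) else "mysql"
```

Comparing `(major, minor)` as a tuple of integers avoids the string-comparison trap where `"1.10"` would sort before `"1.5"`.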
HA Clusters
In Dataproc High Availability (HA) clusters, different services run on different master nodes, as shown below. HA cluster worker node services are the same as those listed for standard clusters.
Node type
Service
Image versions
Description
All masters
all
A quorum of journal nodes maintains an edit log of HDFS namespace modifications.
If a failover occurs, the Standby NameNode reads the edit log
and takes control from the Active NameNode.
all
Manages Hive table metadata. By default, uses the local mariadb (image versions < 1.5) or mysql (image versions 1.5+) database on the master node as the Hive table metadata store.
Using the default database is not recommended because these databases are tied to the cluster's lifecycle. Instead, use either of the following as the Hive metastore database (in recommendation order):
all
all
A ZooKeeper quorum is used for distributed coordination. In High Availability (HA) clusters, it is used for leader election among the HDFS NameNodes and among the YARN resource managers.
all
ZKFC is the ZKFailoverController process, which runs with the HDFS NameNode. It monitors the health of the NameNode and manages leader election via ZooKeeper in the event of a failover.
< 1.5
A relational database used as the default underlying database for Hive
metastore in Dataproc < 1.5 images
1.5+
A relational database used as the default underlying database for Hive metastore
in Dataproc 1.5+ images