This page lists the services that run on Dataproc cluster nodes and the Dataproc image versions in which each service runs.
All nodes
The following services run on all nodes in a cluster.
Node type
 
 Service
 
 Image versions
 
 Description
 
Standard clusters
The following services run on standard clusters.
Node type
 
 Service
 
 Image versions
 
 Description
 
all
 
 Manages Hive table metadata. By default, uses the local mariadb (image versions < 1.5) or mysql (image versions 1.5+) database on the master node as the Hive table metadata store. Using the default database is not recommended because these databases are tied to the cluster's lifecycle. Instead, use either of the following as the Hive metastore database (in recommendation order):
all
 
  
< 1.5
 
 A relational database used as the default underlying database for Hive
  metastore in Dataproc < 1.5 images
 
1.5+
 
 A relational database used as the default underlying database for Hive metastore
  in Dataproc 1.5+ images
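
The version rule in the rows above (mariadb for image versions below 1.5, mysql for 1.5 and later) can be expressed as a small lookup. This is a minimal sketch, not part of any Dataproc API: the function names `parse_image_version` and `default_metastore_database` are hypothetical, and the parser assumes version strings of the form `major.minor`, optionally followed by a sub-minor number and an OS suffix such as `-debian10`.

```python
from typing import Tuple


def parse_image_version(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a string such as '1.5' or '1.4.80-debian10'.

    Hypothetical helper: assumes the string starts with 'major.minor'.
    """
    numeric_part = version.split("-")[0]          # drop an OS suffix, if any
    major, minor = numeric_part.split(".")[:2]    # ignore any sub-minor number
    return int(major), int(minor)


def default_metastore_database(version: str) -> str:
    """Name the default local Hive metastore database for an image version."""
    # Tuple comparison orders versions numerically, so (1, 10) > (1, 5).
    return "mariadb" if parse_image_version(version) < (1, 5) else "mysql"


if __name__ == "__main__":
    for v in ("1.4.80-debian10", "1.5", "2.0-debian10"):
        print(v, default_metastore_database(v))
```

Comparing parsed integer tuples rather than raw strings avoids ordering mistakes such as treating "1.10" as lower than "1.5".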
 
HA Clusters
In Dataproc High Availability (HA) clusters, different services run on different master nodes, as shown below. HA cluster worker node services are the same as those listed for standard clusters.
Node type
 
 Service
 
 Image versions
 
 Description
 
All masters
 
  
 all
 
 A quorum of journal nodes maintains an edit log of HDFS namespace modifications.
If a failover occurs, the Standby NameNode reads the edit log
and takes control from the Active NameNode.
 
all
 
 Manages Hive table metadata. By default, uses the local mariadb (image versions < 1.5) or mysql (image versions 1.5+) database on the master node as the Hive table metadata store. Using the default database is not recommended because these databases are tied to the cluster's lifecycle. Instead, use either of the following as the Hive metastore database (in recommendation order):
all
 
  
all
 
 A ZooKeeper quorum is used for distributed coordination. In High Availability (HA) clusters, it is used for leader election among HDFS NameNodes and among YARN resource managers.
 
all
 
 ZKFC is the ZKFailoverController process, which runs with the HDFS NameNode. It monitors the health of the NameNode and manages leader election via ZooKeeper in the event of a failover.
 
< 1.5
 
 A relational database used as the default underlying database for Hive
  metastore in Dataproc < 1.5 images
 
1.5+
 
 A relational database used as the default underlying database for Hive metastore
  in Dataproc 1.5+ images
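
The journal-node row above describes a quorum that keeps the HDFS edit log so that a standby NameNode can replay it after a failover. The following toy model illustrates that idea only; it is not the HDFS implementation. The class `JournalQuorum` and its methods are hypothetical, and the sketch assumes that an edit counts as committed once a majority of journal nodes has stored it.

```python
from typing import List


class JournalQuorum:
    """Toy model of a journal-node quorum holding an edit log (illustrative only)."""

    def __init__(self, node_count: int = 3) -> None:
        self.logs: List[List[str]] = [[] for _ in range(node_count)]  # one log per node
        self.up: List[bool] = [True] * node_count                     # node reachability

    def _reachable_logs(self) -> List[List[str]]:
        return [log for log, ok in zip(self.logs, self.up) if ok]

    def write(self, edit: str) -> bool:
        """Store an edit on every reachable node; fail without a majority."""
        reachable = self._reachable_logs()
        if len(reachable) <= len(self.logs) // 2:
            return False  # no majority: the edit is not committed anywhere
        for log in reachable:
            log.append(edit)
        return True

    def read(self) -> List[str]:
        """Return the longest reachable log, which a standby would replay."""
        return list(max(self._reachable_logs(), key=len, default=[]))


if __name__ == "__main__":
    quorum = JournalQuorum(3)
    quorum.write("mkdir /data")
    quorum.up[0] = False                 # one journal node goes down
    quorum.write("rename /data /archive")
    print(quorum.read())                 # edits a standby would replay
```

Because any two majorities of the same node set share at least one node, a reader that can reach a majority always sees every committed edit in this model.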
 

