Enrichment
Enrichment uses the following methods to add context to a Unified Data Model (UDM) indicator or event:
- Identifies alias entities that describe an indicator, typically a UDM field.
- Populates the UDM message with additional details from the identified aliases or entities.
- Adds global enrichment data, such as GeoIP and VirusTotal, to UDM events.
Understand enrichment logic patterns
Google SecOps applies different logical patterns to data depending on the enrichment type. Use the following table to understand these patterns for troubleshooting and to explain why certain fields are populated, merged, or overwritten.
| Logic pattern | Description | Applicable enrichment |
|---|---|---|
| First-match | Follows a strict priority list. The pipeline queries only the first available value found in the sequence. | Artifact (file hashes) |
| Merged | Gathers and combines data from multiple fields simultaneously to build a single "golden" entity record. | Asset, User |
| Conditional fallback | A specific field is only used for enrichment if a higher-priority identifier is missing. | Asset (ip address) |
| Mapping & overwrite | Uses a unique ID (PSPI) to resolve entities. Aliased data from the enrichment source replaces existing parsed data. | Process |
Asset enrichment
For asset enrichment, the pipeline identifies unique assets by evaluating multiple UDM fields. Unlike artifact enrichment (which picks one), asset enrichment merges context from multiple IDs to build a complete asset profile.
Google SecOps enriches asset events that are classified with the same namespace.
For assets, the logic is cumulative rather than exclusive, except for specific fallback scenarios. The following details explain the behavior:
- Logic type: Merged or Fallback. The pipeline gathers data from all available fields to create a single "Entity" view, unless a fallback condition (like the asset_id check) is met.
- Field mappings:
  - Hostname, MAC, and asset_id: Treated as primary IDs. Aliasing results from all these fields are merged together to produce the final enriched asset profile.
  - IP address: Included in the enrichment query only if asset_id is unavailable.
For each asset event, the pipeline extracts the following UDM fields from the principal, src, and target entities:
| UDM field | Indicator type | Logic / precedence |
|---|---|---|
| hostname | HOSTNAME | Merged: Aliasing results from these fields are combined to produce the final enriched asset record. |
| asset_id | PRODUCT_SPECIFIC_ID | Merged: Primary identifier used to consolidate asset context. |
| mac | MAC | Merged: Used in conjunction with other identifiers to resolve the asset. |
| ip | IP | Fallback: Included in the enrichment query only if asset_id is not available. |
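The merged and fallback behavior in the table can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function name build_asset_indicators and the flat dictionary layout are hypothetical, and only the field names and indicator types come from the table.

```python
from typing import Dict, List, Tuple

# Field-to-indicator mapping taken from the table above.
# The helper itself is hypothetical and exists only for illustration.
PRIMARY_FIELDS = {
    "hostname": "HOSTNAME",
    "asset_id": "PRODUCT_SPECIFIC_ID",
    "mac": "MAC",
}


def build_asset_indicators(asset: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return the (indicator_type, value) pairs to include in the lookup."""
    # Merged: every primary identifier that is present contributes.
    indicators = [
        (indicator_type, asset[field])
        for field, indicator_type in PRIMARY_FIELDS.items()
        if asset.get(field)
    ]
    # Conditional fallback: the IP is queried only when asset_id is absent.
    if not asset.get("asset_id") and asset.get("ip"):
        indicators.append(("IP", asset["ip"]))
    return indicators
```

With this sketch, an asset that carries both an asset_id and an ip yields no IP indicator, while an asset without an asset_id does.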
User enrichment
User enrichment resolves identity data by looking for specific identifiers. Like artifact enrichment, this pipeline uses an order of preference to determine which identifier is used as the primary key for the lookup.
For each user event, the pipeline extracts the following UDM fields from principal, src, and target:
| UDM field | Indicator type | Logic or precedence |
|---|---|---|
| user.email_addresses | | Highest priority: The pipeline first attempts to enrich based on the user's primary or secondary email addresses. |
| user.windows_sid | WINDOWS_SID | Second priority: If no email is available, the pipeline uses the Windows Security Identifier (SID). |
| user.userid | USER_ID | Third priority: Used only if email and SID are missing; typically maps to local or application-specific IDs. |
| user.employee_id | EMPLOYEE_ID | Lowest priority: The final fallback for resolving a user identity. |
For each indicator, the pipeline performs the following actions:
- Retrieves a list of user entities. For example, the entities of principal.email_address and principal.userid might be the same, or they might be different.
- Chooses the aliases from the highest priority indicator type, using this priority order: WINDOWS_SID, EMAIL, USERNAME, EMPLOYEE_ID, and PRODUCT_OBJECT_ID.
- Populates noun.user with the entity whose validity interval intersects with the event time.
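The three actions above can be sketched as a small selection routine. This is an illustration under stated assumptions: the UserEntity class, the pick_user function, and the half-open validity interval are hypothetical. Only the priority order is taken from the list above, and the sketch assumes that no entity is chosen when the highest-priority indicator type has no entity valid at the event time.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Priority order as stated in the list above.
PRIORITY = ["WINDOWS_SID", "EMAIL", "USERNAME", "EMPLOYEE_ID", "PRODUCT_OBJECT_ID"]


@dataclass
class UserEntity:
    """Hypothetical user entity with a validity interval."""
    name: str
    valid_from: int  # inclusive, seconds since the UNIX epoch (assumption)
    valid_to: int    # exclusive, seconds since the UNIX epoch (assumption)


def pick_user(
    aliases: Dict[str, List[UserEntity]], event_time: int
) -> Optional[UserEntity]:
    """Pick the entity that would populate noun.user for one event."""
    for indicator_type in PRIORITY:
        entities = aliases.get(indicator_type)
        if not entities:
            continue
        # Only the highest-priority indicator type that returned aliases is used.
        for entity in entities:
            if entity.valid_from <= event_time < entity.valid_to:
                return entity
        return None  # assumption: lower-priority types are not consulted
    return None
```

Passing the event timestamp lets the same identifier resolve to different entities over time, for example when a user ID is reassigned.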
Process enrichment
Process enrichment focuses on providing visibility into execution events. The pipeline extracts process details and enriches them by cross-referencing file reputations and parent-child relationships.
Use process enrichment to map a product-specific process ID (product_specific_process_id), or PSPI, to the actual process and retrieve details about the parent process. This process relies on the EDR event batch type.
| UDM entity | Field source | Logic or priority |
|---|---|---|
| Primary entities | principal, src, target | Extraction: The pipeline extracts the PSPI from these top-level entities to initiate the lookup. |
| Parent processes | principal.process.parent_process, src.process.parent_process, target.process.parent_process | Mapping: The PSPI retrieves details about the parent process with process aliasing. |
| Data merging | noun.process (for example, principal.process) | Overwrite rule: Aliased fields take absolute priority. If both parsed data and aliased data exist for the same field, the pipeline replaces the parsed data with the aliased data. |
The pipeline uses process aliasing to identify the actual process from the PSPI and retrieves information about the parent process. It then merges this data into the corresponding noun.process field within the enriched message.
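The overwrite rule can be sketched as a dictionary merge in which aliased values win. The function merge_process and the flat field names are hypothetical and stand in for the nested noun.process message. Skipping empty aliased values is an assumption, based on the rule applying when both parsed and aliased data exist for a field.

```python
from typing import Any, Dict


def merge_process(parsed: Dict[str, Any], aliased: Dict[str, Any]) -> Dict[str, Any]:
    """Merge aliased process data into parsed data; aliased values win."""
    merged = dict(parsed)
    # Assumption: empty aliased values are skipped so they don't erase parsed data.
    merged.update({k: v for k, v in aliased.items() if v not in (None, "")})
    return merged
```

For example, if the parsed event carries one command line and the PSPI lookup returns another, the merged record keeps the aliased one and retains any parsed fields that the alias didn't supply.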
EDR indexed fields for process aliasing
When a process launches, the system collects metadata (for example, command lines, file hashes, and parent process details). The EDR software running on the machine assigns a vendor-specific process UUID.
The following table lists the fields that are indexed during a process launch event:
| UDM field | Indicator type |
|---|---|
| target.product_specific_process_id | PROCESS_ID |
| target.process | Whole process; not just the indicator |
In addition to the target.process field from the normalized event, Google SecOps collects and indexes parent process information.
Artifact enrichment
Artifact enrichment adds file hash metadata from VirusTotal and geolocation data for IP addresses. For file hashes, the pipeline stops at the first value found in a prioritized list; however, for IP addresses, it processes all entries in parallel. For each UDM event, the pipeline extracts and queries context data for the following artifact indicators from the principal, src, and target entities, where the enrichment behavior differs based on the indicator type:
- file.sha256
- file.sha1
- file.md5
- process.file.sha256
- process.file.sha1
- process.file.md5
The pipeline uses UNIX epoch and event hour to define the time range for the file artifact queries. If geolocation data is available, the pipeline overwrites the following UDM fields for the principal, src, and target entities, based on the origin of the geolocation data:
- artifact.ip
- artifact.location
- artifact.network (only if the data includes IP network context)
- location (only if the original data doesn't include this field)
If the pipeline finds file hash metadata, it adds that metadata to the file or process.file fields, depending on the origin of the indicator. The pipeline keeps any existing values that don't overlap with the new data.
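The first-match behavior for file hashes, and the way new metadata is added to existing fields, can be sketched as follows. The function names are hypothetical. The order sha256, sha1, md5 is assumed from the order of the list above; this page only states explicitly that sha256 has the highest priority.

```python
from typing import Any, Dict, Optional, Tuple

# Assumed priority order, following the order of the indicator list above.
HASH_PRIORITY = ("sha256", "sha1", "md5")


def first_match_hash(file_fields: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return the first available hash in priority order, or None."""
    for name in HASH_PRIORITY:
        value = file_fields.get(name)
        if value:
            # First-match: stop here, lower-priority hashes are not queried.
            return name, value
    return None


def add_metadata(existing: Dict[str, Any], metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Add new metadata; existing values that don't overlap are kept."""
    return {**existing, **metadata}
```

An event that carries both an md5 and a sha256 therefore produces a single lookup keyed on the sha256.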
IP geolocation enrichment
Geographic aliasing provides geolocation data for external IP addresses. For each unaliased IP address in the principal, target, or src field for a UDM event, an ip_geo_artifact subprotocol buffer is created with the associated location and ASN information.
Geographic aliasing doesn't use lookback or caching. Due to the high volume of events, Google SecOps maintains an index in memory.
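A minimal sketch of this per-IP lookup is shown below. Everything in it is an assumption made for illustration: GEO_INDEX stands in for the in-memory index, it is keyed by exact IP address for simplicity, its values are placeholders rather than real geolocation data, and the private-address check uses Python's standard ipaddress module as one possible way to tell external addresses from internal ones.

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical in-memory index with placeholder values.
GEO_INDEX: Dict[str, Dict[str, str]] = {
    "8.8.8.8": {"location": "example-location", "asn": "example-asn"},
}


def geo_artifact(ip: str) -> Optional[Dict[str, str]]:
    """Return geolocation data for an external IP, or None."""
    if ipaddress.ip_address(ip).is_private:
        return None  # internal addresses receive no geolocation data
    return GEO_INDEX.get(ip)


def enrich_event(event_ips: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Map each noun (principal, src, target) to its geo artifact, if any."""
    result: Dict[str, Dict[str, str]] = {}
    for noun in ("principal", "src", "target"):
        ip = event_ips.get(noun)
        if not ip:
            continue
        # Each IP is handled independently of the others.
        artifact = geo_artifact(ip)
        if artifact is not None:
            result[noun] = artifact
    return result
```

Because each address is evaluated on its own, an event with a private principal IP and a public target IP ends up with geolocation data on the target only.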
Enrich events with VirusTotal file metadata
Google SecOps enriches file hashes into UDM events and provides additional context during an investigation. Hash aliasing enriches UDM events by combining all types of file hashes and providing information about a file hash during a search.
Google SecOps integrates VirusTotal file metadata and relationship enrichment to identify patterns of malicious activity and track malware movements across a network.
A raw log provides limited information about the file. VirusTotal enriches the event with file metadata, including details about malicious hashes and files. The metadata includes information such as filenames, types, imported functions, and tags. You can use this information in the UDM search and detection engine with YARA-L to understand malicious file events and during threat hunting. For example, you can use the file metadata to detect modifications to the original file.
The following information is stored with the record. For a list of all UDM fields, see Unified Data Model field list.
| Type of data | UDM field |
|---|---|
| sha-256 | (principal \| target \| src \| observer).file.sha256 |
| md5 | (principal \| target \| src \| observer).file.md5 |
| sha-1 | (principal \| target \| src \| observer).file.sha1 |
| size | (principal \| target \| src \| observer).file.size |
| ssdeep | (principal \| target \| src \| observer).file.ssdeep |
| vhash | (principal \| target \| src \| observer).file.vhash |
| authentihash | (principal \| target \| src \| observer).file.authentihash |
| PE file metadata Imphash | (principal \| target \| src \| observer).file.pe_file.imphash |
| security_result.threat_verdict | (principal \| target \| src \| observer).(process \| file).security_result.threat_verdict |
| security_result.severity | (principal \| target \| src \| observer).(process \| file).security_result.severity |
| last_modification_time | (principal \| target \| src \| observer).file.last_modification_time |
| first_seen_time | (principal \| target \| src \| observer).file.first_seen_time |
| last_seen_time | (principal \| target \| src \| observer).file.last_seen_time |
| last_analysis_time | (principal \| target \| src \| observer).file.last_analysis_time |
| exif_info.original_file | (principal \| target \| src \| observer).file.exif_info.original_file |
| exif_info.product | (principal \| target \| src \| observer).file.exif_info.product |
| exif_info.company | (principal \| target \| src \| observer).file.exif_info.company |
| exif_info.file_description | (principal \| target \| src \| observer).file.exif_info.file_description |
| signature_info.codesign.id | (principal \| target \| src \| observer).file.signature_info.codesign.id |
| signature_info.sigcheck.verified | (principal \| target \| src \| observer).file.signature_info.sigcheck.verified |
| signature_info.sigcheck.verification_message | (principal \| target \| src \| observer).file.signature_info.sigcheck.verification_message |
| signature_info.sigcheck.signers.name | (principal \| target \| src \| observer).file.signature_info.sigcheck.signers.name |
| signature_info.sigcheck.status | (principal \| target \| src \| observer).file.signature_info.sigcheck.signers.status |
| signature_info.sigcheck.valid_usage | (principal \| target \| src \| observer).file.signature_info.sigcheck.signers.valid_usage |
| signature_info.sigcheck.cert_issuer | (principal \| target \| src \| observer).file.signature_info.sigcheck.signers.cert_issuer |
| file_type | (principal \| target \| src \| observer).file.file_type |
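Each row in the table is shorthand for several fully qualified UDM field names, one per noun. The following sketch expands that shorthand. The helper expand is hypothetical; the noun names and the process or file alternatives come from the table.

```python
from itertools import product
from typing import List, Sequence

# Nouns listed in the table above.
NOUNS = ("principal", "target", "src", "observer")


def expand(suffix: str, middles: Sequence[str] = ("file",)) -> List[str]:
    """Expand a table row into fully qualified UDM field names."""
    return [f"{noun}.{middle}.{suffix}" for noun, middle in product(NOUNS, middles)]
```

For example, expand("sha256") returns the four file.sha256 paths, and expand("security_result.severity", ("process", "file")) returns eight paths that cover both the process and file variants.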
Troubleshoot enrichment
If you notice that a UDM event is missing expected enrichment data, use the following suggestions to help resolve your issue.
General enrichment
If some of your events aren't enriched at all, a likely cause is that Google SecOps prioritizes delivery speed. A small percentage of events (<1%) may skip enrichment during the first pass. To solve this, check back in a few minutes. The system automatically reprocesses these events. If enrichment is still missing after an hour, verify that the log source is correctly parsed into UDM.
Artifact enrichment (first-match logic)
If your event has an MD5 and a SHA256 hash, but you can only see VirusTotal metadata for the SHA256, this is the result of first-match logic. The pipeline stops as soon as it finds the highest-priority hash (sha256). It doesn't query VirusTotal for the MD5 if a SHA256 is present.
If you see geolocation for principal.ip but not for target.ip, it's because parallel logic treats each IP independently. If one IP is internal or private (non-routable) and the other is public, only the public IP receives geolocation enrichment.
Asset enrichment (merged and fallback logic)
If the IP address field doesn't show enrichment data on your asset, the cause is conditional fallback logic. The IP is only used for an enrichment query if the asset_id (PSID) is missing. If an asset_id exists, the system relies on it and ignores the IP for that specific query to prevent redundant or conflicting data.
User enrichment (order of preference)
If the Department field shows "IT" when your identity source says "Security", it means that user enrichment prefers parsed fields over aliased fields. If your raw log was parsed with "IT", the enrichment pipeline doesn't overwrite it with the "Security" value from your identity source (for example, Okta or AD).
Process enrichment (mapping and overwrite)
If you see a process name in your raw log, but UDM search shows a different name, the cause is overwrite logic. Process enrichment prioritizes aliased fields. If the PSPI lookup returns a more accurate process name from the EDR context, it completely replaces the original parsed value.
What's next
For information about how to use enriched data with other Google SecOps features, see the following:
- Use context-enriched data in UDM Search.
- Use context-enriched data in rules.
- Use context-enriched data in reports.

