Enrichment


Enrichment uses the following methods to add context to a Unified Data Model (UDM) indicator or event:

  • Identifies alias entities that describe an indicator, typically a UDM field.
  • Populates the UDM message with additional details from the identified aliases or entities.
  • Adds global enrichment data, such as GeoIP and VirusTotal, to UDM events.

Understand enrichment logic patterns

Google SecOps applies different logical patterns to data depending on the enrichment type. Use the following table to understand these patterns for troubleshooting and to explain why certain fields are populated, merged, or overwritten.

Logic pattern | Description | Applicable enrichment
First-match | Follows a strict priority list. The pipeline queries only the first available value found in the sequence. | Artifact (file hashes)
Merged | Gathers and combines data from multiple fields simultaneously to build a single "golden" entity record. | Asset, User
Conditional fallback | A specific field is used for enrichment only if a higher-priority identifier is missing. | Asset (IP address)
Mapping and overwrite | Uses a unique ID (PSPI) to resolve entities. Aliased data from the enrichment source replaces existing parsed data. | Process

Asset enrichment

For asset enrichment, the pipeline identifies unique assets by evaluating multiple UDM fields. Unlike file hash enrichment, which queries only the first hash it finds, asset enrichment merges context from multiple IDs to build a complete asset profile.

Google SecOps enriches asset events that are classified with the same namespace.

For assets, the logic is cumulative rather than exclusive, except in specific fallback scenarios:

  • Logic type: Merged or Fallback. The pipeline gathers data from all available fields to create a single "Entity" view, unless a fallback condition (like the asset_id check) is met.
  • Field mappings:
    • Hostname, MAC, and asset_id: Treated as primary IDs. Aliasing results from all these fields are merged together to produce the final enriched asset profile.
    • IP address: Included in the enrichment query only if asset_id is unavailable.

For each asset event, the pipeline extracts the following UDM fields from the principal, src, and target entities:

UDM field | Indicator type | Logic or precedence
hostname | HOSTNAME | Merged: Aliasing results from these fields are combined to produce the final enriched asset record.
asset_id | PRODUCT_SPECIFIC_ID | Merged: Primary identifier used to consolidate asset context.
mac | MAC | Merged: Used in conjunction with other identifiers to resolve the asset.
ip | IP | Fallback: Included in the enrichment query only if asset_id is not available.
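The merged and fallback rules in the table can be sketched in Python. This is a minimal illustration, not the pipeline's implementation: it assumes each entity (principal, src, or target) is a plain dict with single-valued hostname, asset_id, mac, and ip keys, and the functions asset_indicators and merge_alias_results are hypothetical. How the real pipeline resolves conflicting values during the merge isn't described here, so the sketch simply keeps the first value it sees.

```python
from typing import Dict, List, Tuple

# Hypothetical (indicator type, value) pair used to query aliasing.
Indicator = Tuple[str, str]


def asset_indicators(asset: Dict[str, str]) -> List[Indicator]:
    """Collect the indicators to query for one asset entity (assumed dict shape)."""
    indicators: List[Indicator] = []
    # Primary IDs: always queried when present; their results are merged.
    if asset.get("hostname"):
        indicators.append(("HOSTNAME", asset["hostname"]))
    if asset.get("asset_id"):
        indicators.append(("PRODUCT_SPECIFIC_ID", asset["asset_id"]))
    if asset.get("mac"):
        indicators.append(("MAC", asset["mac"]))
    # Conditional fallback: ip is queried only when asset_id is missing.
    if asset.get("ip") and not asset.get("asset_id"):
        indicators.append(("IP", asset["ip"]))
    return indicators


def merge_alias_results(results: List[Dict[str, str]]) -> Dict[str, str]:
    """Merge aliasing results from all indicators into one asset profile.

    Assumption: on a conflict, the first result that set the field wins.
    """
    profile: Dict[str, str] = {}
    for result in results:
        for field, value in result.items():
            profile.setdefault(field, value)
    return profile


# Usage: asset_id is present, so the IP is left out of the query.
print(asset_indicators({"hostname": "web-01", "asset_id": "A-42", "ip": "10.0.0.5"}))
# [('HOSTNAME', 'web-01'), ('PRODUCT_SPECIFIC_ID', 'A-42')]
```

Removing asset_id from the input dict would add ("IP", "10.0.0.5") to the query list, which is the conditional fallback case.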

User enrichment

User enrichment resolves identity data by looking for specific identifiers. Like artifact enrichment, this pipeline uses an order of preference to determine which identifier is used as the primary key for the lookup.

For each user event, the pipeline extracts the following UDM fields from principal, src, and target:

UDM field | Indicator type | Logic or precedence
user.email_addresses | EMAIL | Highest priority: The pipeline first attempts to enrich based on the user's primary or secondary email addresses.
user.windows_sid | WINDOWS_SID | Second priority: If no email is available, the pipeline uses the Windows Security Identifier (SID).
user.userid | USER_ID | Third priority: Used only if email and SID are missing; typically maps to local or application-specific IDs.
user.employee_id | EMPLOYEE_ID | Lowest priority: The final fallback for resolving a user identity.

For each indicator, the pipeline performs the following actions:

  • Retrieves a list of user entities. For example, the entities of principal.email_address and principal.userid might be the same, or they might be different.
  • Chooses the aliases from the highest priority indicator type, using this priority order: WINDOWS_SID, EMAIL, USERNAME, EMPLOYEE_ID, and PRODUCT_OBJECT_ID.
  • Populates noun.user with the entity whose validity interval intersects with the event time.
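The alias selection steps in the list above can be sketched as follows, using the priority order given in that list. The UserEntity class, its valid_from and valid_until fields, and the choose_user function are hypothetical, and the sketch assumes a half-open validity interval (valid_from inclusive, valid_until exclusive); the source only says the interval must intersect the event time.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional

# Priority order from the list above, highest first.
PRIORITY = ["WINDOWS_SID", "EMAIL", "USERNAME", "EMPLOYEE_ID", "PRODUCT_OBJECT_ID"]


@dataclass
class UserEntity:
    """Hypothetical aliased user entity with a validity interval."""
    indicator_type: str
    valid_from: datetime
    valid_until: datetime
    attributes: Dict[str, str] = field(default_factory=dict)


def choose_user(aliases: List[UserEntity], event_time: datetime) -> Optional[UserEntity]:
    # Step 1: keep only the aliases from the highest-priority indicator type present.
    for indicator_type in PRIORITY:
        chosen = [a for a in aliases if a.indicator_type == indicator_type]
        if chosen:
            break
    else:
        return None  # No aliases of any listed indicator type.
    # Step 2: return the entity whose validity interval contains the event time
    # (assumed half-open: valid_from <= event_time < valid_until).
    for entity in chosen:
        if entity.valid_from <= event_time < entity.valid_until:
            return entity
    return None
```

The sketch applies the two steps in the order listed: it first narrows to one indicator type, then filters by time, so it doesn't fall back to a lower-priority type when no entity of the chosen type is valid at the event time.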

Process enrichment

Process enrichment focuses on providing visibility into execution events. The pipeline extracts process details and enriches them by cross-referencing file reputations and parent-child relationships.

Use process enrichment to map a product-specific process ID (product_specific_process_id), or PSPI, to the actual process and retrieve details about the parent process. This enrichment relies on the EDR event batch type.

UDM entity | Field source | Logic or priority
Primary entities | principal, src, target | Extraction: The pipeline extracts the PSPI from these top-level entities to initiate the lookup.
Parent processes | principal.process.parent_process, src.process.parent_process, target.process.parent_process | Mapping: The PSPI retrieves details about the parent process through process aliasing.
Data merging | noun.process (for example, principal.process) | Overwrite rule: Aliased fields take absolute priority. If both parsed data and aliased data exist for the same field, the pipeline replaces the parsed data with the aliased data.

The pipeline uses process aliasing to identify the actual process from the PSPI and retrieves information about the parent process. It then merges this data into the corresponding noun.process field within the enriched message.
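The overwrite rule can be sketched as a dictionary merge in which aliased values win. This is an illustration under assumptions: noun.process is modeled as a plain dict, and the aliases argument is a hypothetical PSPI-to-process lookup table standing in for process aliasing.

```python
from typing import Any, Dict


def enrich_process(parsed: Dict[str, Any], aliases: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Merge aliased process data into a parsed noun.process dict.

    `aliases` is a hypothetical PSPI -> aliased-process lookup table.
    """
    pspi = parsed.get("product_specific_process_id")
    aliased = aliases.get(pspi) if pspi else None
    if aliased is None:
        return dict(parsed)  # Nothing to merge; keep the parsed data.
    merged = dict(parsed)
    # Overwrite rule: aliased fields replace parsed fields with the same name,
    # and fields present only in the aliased data (such as parent_process) are added.
    merged.update(aliased)
    return merged
```

Parsed fields that the aliased data doesn't mention, such as a parsed pid, survive the merge; only overlapping fields are replaced.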

EDR indexed fields for process aliasing

When a process launches, the system collects metadata (for example, command lines, file hashes, and parent process details). The EDR software running on the machine assigns a vendor-specific process UUID.

The following table lists the fields that are indexed during a process launch event:

UDM field | Indicator type
target.product_specific_process_id | PROCESS_ID
target.process | Whole process (not just the indicator)

In addition to the target.process field from the normalized event, Google SecOps collects and indexes parent process information.
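A minimal sketch of the indexing step, assuming an in-memory dict keyed by the PSPI. The event layout, ProcessIndex, index_launch_event, and lookup_process are hypothetical; the point is that the whole target.process record, including any parent process, is stored under the PROCESS_ID indicator so that later events can resolve it.

```python
from typing import Any, Dict, Optional

# Hypothetical in-memory index: PSPI -> whole process record.
ProcessIndex = Dict[str, Dict[str, Any]]


def index_launch_event(index: ProcessIndex, event: Dict[str, Any]) -> None:
    """Index a process launch event under target.product_specific_process_id."""
    target = event.get("target", {})
    pspi = target.get("product_specific_process_id")
    process = target.get("process")
    if pspi and process:
        # Store the whole process (including parent_process if present),
        # not just the PROCESS_ID indicator value.
        index[pspi] = process


def lookup_process(index: ProcessIndex, pspi: str) -> Optional[Dict[str, Any]]:
    """Return the indexed process for a PSPI, or None if it was never indexed."""
    return index.get(pspi)
```

For example, after indexing a launch event whose target.process contains parent_process details, a later event that carries only the same PSPI can recover both the process and its parent through lookup_process.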

Artifact enrichment

Artifact enrichment adds file hash metadata from VirusTotal and geolocation data for IP addresses. For file hashes, the pipeline stops at the first value it finds in a prioritized list; for IP addresses, it processes all entries in parallel. For each UDM event, the pipeline extracts and queries context data for the following artifact indicators from the principal, src, and target entities. The enrichment behavior differs by indicator type:

Indicator type | Extraction logic | Precedence or order of operations
File hashes | First-match | The pipeline searches for hashes in the following order and queries VirusTotal with only the first one available: file.sha256, then file.sha1, then file.md5, then process.file.sha256, then process.file.sha1, then process.file.md5.
IP address | Parallel (repeated) | Every public or routable IP address is treated as an independent entry. There is no order of preference; each IP receives its own enrichment results.
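The two extraction behaviors in the table can be contrasted in a short sketch. It assumes an entity is flattened into a dict keyed by dotted field paths, and it approximates "public or routable" with Python's ipaddress is_global property; first_hash and ips_to_enrich are hypothetical names, not SecOps APIs.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# First-match order from the table above.
HASH_FIELDS = [
    "file.sha256", "file.sha1", "file.md5",
    "process.file.sha256", "process.file.sha1", "process.file.md5",
]


def first_hash(entity: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return the first available (field, hash); later hashes are never queried."""
    for field in HASH_FIELDS:
        value = entity.get(field)
        if value:
            return field, value
    return None


def ips_to_enrich(ips: List[str]) -> List[str]:
    """Parallel logic: every public IP is an independent entry."""
    result: List[str] = []
    for ip in ips:
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            continue  # Not an IP address; skip it.
        # Assumption: is_global stands in for "public or routable".
        if addr.is_global:
            result.append(ip)
    return result
```

With an entity that has both file.md5 and file.sha256, first_hash returns only the SHA-256, while ips_to_enrich keeps every public address and drops private ones such as 10.0.0.1.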

The pipeline uses UNIX epoch and event hour to define the time range for the file artifact queries. If geolocation data is available, the pipeline overwrites the following UDM fields for the principal, src, and target entities, based on the origin of the geolocation data:

  • artifact.ip
  • artifact.location
  • artifact.network (only if the data includes IP network context)
  • location (only if the original data doesn't include this field)

If the pipeline finds file hash metadata, it adds that metadata to the file or process.file fields, depending on the origin of the indicator. The pipeline keeps any existing values that don't overlap with the new data.
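The field-update rules above can be sketched as two small functions. Entities are modeled as plain dicts, the geo dict's keys (ip, location, network) are hypothetical, and the source doesn't say how overlapping hash metadata is resolved, so merge_hash_metadata assumes the new metadata wins on overlap while non-overlapping existing values are kept.

```python
from typing import Any, Dict


def apply_geo(entity: Dict[str, Any], geo: Dict[str, Any]) -> Dict[str, Any]:
    """Apply geolocation results to one entity (principal, src, or target)."""
    out = dict(entity)
    artifact = dict(out.get("artifact", {}))
    # artifact.ip and artifact.location are always overwritten.
    artifact["ip"] = geo["ip"]
    artifact["location"] = geo["location"]
    # artifact.network only if the data includes IP network context.
    if "network" in geo:
        artifact["network"] = geo["network"]
    out["artifact"] = artifact
    # location only if the original data doesn't include it.
    if "location" not in out:
        out["location"] = geo["location"]
    return out


def merge_hash_metadata(existing: Dict[str, Any], metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Keep existing non-overlapping values; assume new metadata wins on overlap."""
    merged = dict(existing)
    merged.update(metadata)
    return merged
```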

IP geolocation enrichment

Geographic aliasing provides geolocation data for external IP addresses. For each unaliased IP address in the principal, target, or src field of a UDM event, the pipeline creates an ip_geo_artifact subprotocol buffer with the associated location and ASN information.

Geographic aliasing doesn't use lookback or caching. Due to the high volume of events, Google SecOps maintains an index in memory.
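To make the "index in memory, no lookback or caching" idea concrete, here is a hypothetical sketch of an IPv4 geolocation index searched with binary search. The GeoRecord and InMemoryGeoIndex classes, the non-overlapping-network assumption, and the dict shape returned by ip_geo_artifacts are illustrative only; the real ip_geo_artifact message and index structure aren't described in this document.

```python
import bisect
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class GeoRecord:
    network: ipaddress.IPv4Network
    country: str
    asn: int


class InMemoryGeoIndex:
    """Hypothetical IPv4 geolocation index held entirely in memory.

    Assumes non-overlapping networks; lookups use binary search over the
    sorted network start addresses, with no lookback and no cache.
    """

    def __init__(self, records: List[GeoRecord]) -> None:
        self._records = sorted(records, key=lambda r: int(r.network.network_address))
        self._starts = [int(r.network.network_address) for r in self._records]

    def lookup(self, ip: str) -> Optional[GeoRecord]:
        addr = ipaddress.ip_address(ip)
        # Only external (globally routable) IPv4 addresses are geolocated here.
        if addr.version != 4 or not addr.is_global:
            return None
        i = bisect.bisect_right(self._starts, int(addr)) - 1
        if i >= 0 and addr in self._records[i].network:
            return self._records[i]
        return None


def ip_geo_artifacts(ips: List[str], index: InMemoryGeoIndex) -> List[Dict[str, object]]:
    """Build one ip_geo_artifact-like dict (hypothetical shape) per resolvable IP."""
    artifacts: List[Dict[str, object]] = []
    for ip in ips:
        record = index.lookup(ip)
        if record is not None:
            artifacts.append({"ip": ip, "country": record.country, "asn": record.asn})
    return artifacts
```

Each lookup is a fresh O(log n) search against the in-memory index, which fits the absence of lookback or caching; any sample records you load into it (networks, countries, ASNs) are placeholders, not real geolocation data.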

Google SecOps enriches UDM events with file hashes to provide additional context during an investigation. Hash aliasing enriches UDM events by combining all types of file hashes and providing information about a file hash during a search.

Google SecOps integrates VirusTotal file metadata and relationship enrichment to identify patterns of malicious activity and track malware movements across a network.

A raw log provides limited information about a file. VirusTotal enriches the event with file metadata, including details about malicious hashes and files, such as filenames, types, imported functions, and tags. You can use this information in UDM search and in the detection engine with YARA-L to understand malicious file events and during threat hunting. For example, you can use file metadata in threat detection to identify modifications to an original file.
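As one illustration of using file metadata to spot a modified or renamed file, the Python sketch below flags events whose observed filename differs from the VirusTotal exif_info.original_file value. This is not a YARA-L rule and not a SecOps API; the event dict layout and the renamed_binaries function are hypothetical, and the comparison is case-insensitive by assumption.

```python
import ntpath
from typing import Any, Dict, List


def renamed_binaries(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flag events whose observed filename differs from exif_info.original_file."""
    flagged: List[Dict[str, Any]] = []
    for event in events:
        file = event.get("target", {}).get("file", {})
        original = file.get("exif_info", {}).get("original_file")
        path = file.get("full_path")
        if not original or not path:
            continue  # Not enough metadata to compare.
        # ntpath.basename splits on both "\\" and "/" separators.
        observed = ntpath.basename(path)
        if observed.lower() != original.lower():
            flagged.append(event)
    return flagged
```

A binary at C:\Temp\svc.exe whose original_file is psexec.exe would be flagged, while C:\Tools\PsExec.exe would not.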

The following information is stored with the record. For a list of all UDM fields, see Unified Data Model field list.

Each entry lists the type of data, followed by its UDM field:

  • sha-256: (principal | target | src | observer).file.sha256
  • md5: (principal | target | src | observer).file.md5
  • sha-1: (principal | target | src | observer).file.sha1
  • size: (principal | target | src | observer).file.size
  • ssdeep: (principal | target | src | observer).file.ssdeep
  • vhash: (principal | target | src | observer).file.vhash
  • authentihash: (principal | target | src | observer).file.authentihash
  • PE file metadata (Imphash): (principal | target | src | observer).file.pe_file.imphash
  • security_result.threat_verdict: (principal | target | src | observer).(process | file).security_result.threat_verdict
  • security_result.severity: (principal | target | src | observer).(process | file).security_result.severity
  • last_modification_time: (principal | target | src | observer).file.last_modification_time
  • first_seen_time: (principal | target | src | observer).file.first_seen_time
  • last_seen_time: (principal | target | src | observer).file.last_seen_time
  • last_analysis_time: (principal | target | src | observer).file.last_analysis_time
  • exif_info.original_file: (principal | target | src | observer).file.exif_info.original_file
  • exif_info.product: (principal | target | src | observer).file.exif_info.product
  • exif_info.company: (principal | target | src | observer).file.exif_info.company
  • exif_info.file_description: (principal | target | src | observer).file.exif_info.file_description
  • signature_info.codesign.id: (principal | target | src | observer).file.signature_info.codesign.id
  • signature_info.sigcheck.verified: (principal | target | src | observer).file.signature_info.sigcheck.verified
  • signature_info.sigcheck.verification_message: (principal | target | src | observer).file.signature_info.sigcheck.verification_message
  • signature_info.sigcheck.signers.name: (principal | target | src | observer).file.signature_info.sigcheck.signers.name
  • signature_info.sigcheck.status: (principal | target | src | observer).file.signature_info.sigcheck.signers.status
  • signature_info.sigcheck.valid_usage: (principal | target | src | observer).file.signature_info.sigcheck.signers.valid_usage
  • signature_info.sigcheck.cert_issuer: (principal | target | src | observer).file.signature_info.sigcheck.signers.cert_issuer
  • file_type: (principal | target | src | observer).file.file_type
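In the UDM field column, a parenthesized group such as (principal | target | src | observer) means "any one of these", so each entry stands for several concrete field paths. The hypothetical helper below expands that notation; it is only a reading aid for the list above and assumes groups are never nested.

```python
import itertools
import re
from typing import List


def expand_udm_pattern(pattern: str) -> List[str]:
    """Expand "( a | b ).x" alternation groups into concrete field paths."""
    # With one capture group, re.split alternates literal text and group contents.
    parts = re.split(r"\(([^)]*)\)", pattern)
    choices: List[List[str]] = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            choices.append([option.strip() for option in part.split("|")])
        else:
            choices.append([part.strip()])
    return ["".join(combo) for combo in itertools.product(*choices)]


print(expand_udm_pattern("(principal | target | src | observer).file.sha256"))
# ['principal.file.sha256', 'target.file.sha256', 'src.file.sha256', 'observer.file.sha256']
```

Entries with two groups, such as the security_result fields, expand to every combination: four entities times two nouns gives eight paths.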

Troubleshoot enrichment

If you notice that a UDM event is missing expected enrichment data, use the following suggestions to help resolve your issue.

General enrichment

If some of your events aren't enriched at all, a likely cause is that Google SecOps prioritizes delivery speed: a small percentage of events (<1%) may skip enrichment during the first pass. Check back in a few minutes; the system automatically reprocesses these events. If enrichment is still missing after an hour, verify that the log source is correctly parsed into UDM.

Artifact enrichment (first-match and parallel logic)

If your event has an MD5 and a SHA256 hash, but you can see VirusTotal metadata only for the SHA256, this is expected first-match behavior. The pipeline stops as soon as it finds the highest-priority hash (sha256). It doesn't query VirusTotal for the MD5 if a SHA256 is present.

If you see geolocation for principal.ip but not for target.ip, this is because parallel logic treats each IP independently. If one IP is internal or private (non-routable) and the other is public, only the public IP receives geolocation enrichment.

Asset enrichment (merged and fallback logic)

If the IP address field on your asset doesn't show enrichment data, this is the result of conditional fallback logic. The IP is used for an enrichment query only if the asset_id (PSID) is missing. If an asset_id exists, the system relies on it and ignores the IP for that query to prevent redundant or conflicting data.

User enrichment (preference of order)

If the Department field shows "IT" but your identity source (for example, Okta or AD) lists "Security", this is because user enrichment prefers parsed fields over aliased fields. If your raw log was parsed with "IT", the enrichment pipeline doesn't overwrite it with the "Security" value from your identity source.

Process enrichment (mapping and overwrite)

If you see one process name in your raw log but a different name in UDM search, this is the result of overwrite logic. Process enrichment prioritizes aliased fields: if the PSPI lookup returns a more accurate process name from the EDR context, it completely replaces the original parsed value.

What's next

For information about how to use enriched data with other Google SecOps features, see the following:

