Correlate data with outer joins
This document describes outer joins (left join and right join). Join operations are used to correlate and combine data from multiple sources based on common field values. By combining related security events and entities into a single, comprehensive view, you can provide effective threat detection and investigation.
Unlike standard (inner) joins
, which require matching entries in both data
sources, an outer join retrieves all records from one side of the join, even
if there are no matching entries in the other. Unmatched fields from the other
side are typically filled with null
. This prevents you from losing data that
doesn't have a match.
How outer joins work
The concept of outer joins in YARA-L 2.0 is identical to standard SQL outer joins:
-
The left outer join preserves all records from the left side of the join.
-
The right outer join preserves all records from the right side of the join.
Outer join syntax (left join and right join) is supported for all
queries–both with and without a match
condition.
Understand the left outer join
A left outer join (or left join) preserves all records from the data source
on the left side
of the left join
keyword.
-
If a record from the left side has no match in the right event, the fields from the right event are returned as
null. -
Placeholder implication: Any placeholder variable used in the
matchsection must reference a field from the left event to ensure accurate data aggregation across the full result set.
Event-to-event left join example
The following example demonstrates a left outer join to correlate user login
events with subsequent network connection events occurring on the same host.
The left join ensures that all USER_LOGIN
events are preserved in the result
set. If a
matching NETWORK_CONNECTION
event ( $e2
) is found, its data is joined.
If no match is found, the fields for $e2
are null
.
$e1.metadata.event_type = "USER_LOGIN"
$e2.metadata.event_type = "NETWORK_CONNECTION"
left join $e1.principal.hostname = $e2.principal.hostname
Define the left event
The following query example defines the left side of the join ( $e1
), which is
the event set that is preserved in the final result:
$e1.metadata.event_type = "USER_LOGIN"
The following table represents the query result, identifying the initial set of user login events:
USER_LOGIN
|
kiosk-4
|
192.168.1.104
|
|---|
Define the right event
The following query example defines the right side of the join ( $e2
), which
is the event set that is matched against the left events:
$e2.metadata.event_type = "NETWORK_CONNECTION"
The following table represents the set of network connection events available for matching:
NETWORK_CONNECTION
|
kiosk-4
|
203.0.113.3
|
|---|
Join the events
with a match section
The following example demonstrates a match query using a left outer join on
the principal.hostname
field:
$e1.metadata.event_type = "USER_LOGIN"
$e2.metadata.event_type = "NETWORK_CONNECTION"
left join $e1.principal.hostname = $e2.principal.hostname
$host = $e1.principal.hostname
match:
$host over 5m
-
The left outer join ensures that every
USER_LOGINevent ($e1) is included in the final result set. -
The placeholder $host is assigned the value from
$e1.principal.hostname. The left outer join ensures the presence of event$e1, making sure that the $host variable is always populated for aggregation. -
The rule aggregates the results by the host for a 5-minute time window.
Join result
The resulting data shows the combination of the two events. All records
from the left table ( $e1
) are retained, and fields from the right table ( $e2
)
are set to null
when no matching hostname is found (for example, for server-db-03
).
USER_LOGIN
|
kiosk-4
|
192.168.1.104
|
NETWORK_CONNECTION
|
203.0.113.3
|
Match found |
|---|
Example left join queries
This section provides example left join queries.
Joins with a match condition
-
Event-entity
$e1.metadata.event_type = "NETWORK_CONNECTION" $g1.graph.metadata.entity_type = "ASSET" left join $e1.principal.asset.hostname = $g1.graph.entity.asset.hostname $host = $e1.principal.asset.hostname match: $host over 5m -
Event-datatable
$host = $e1.principal.hostname left join $e1.principal.hostname = %all_dt_column_types.hostname match: $host by 5m
Joins without a match condition
-
Event-event
$e1.metadata.event_type = "USER_LOGIN" $e1.principal.ip = "114.241.96.87" $e2.metadata.event_type = "NETWORK_CONNECTION" left join $e1.principal.hostname = $e2.principal.hostname -
Event-entity
$e1.metadata.event_type = "NETWORK_CONNECTION" $g1.graph.metadata.entity_type = "ASSET" left join $e1.principal.asset.hostname = $g1.graph.entity.asset.hostname $host = $e1.principal.asset.hostname -
Event-datatable
$host = $e1.principal.hostname left join $e1.principal.hostname = %all_dt_column_types.hostname
Right outer join
A right outer join (or right join) preserves all records from the data source on
the right
side of the right join
keyword.
-
If a record from the right event has no match in the left event, the fields from the left event are returned as
null. -
Placeholder implication:Any placeholder variable used in the
matchsection must reference a field from the right event to ensure accurate data aggregation across the full result set.
Event-to-Event right join example
The following example demonstrates a right outer join
to correlate user login
events with subsequent network connection events occurring on the same host.
The right join
ensures that all NETWORK_CONNECTION
events are preserved in the
result set. If a matching USER_LOGIN
event is found, its data is joined.
If no match is found, the fields for $e1
are null
.
$e1.metadata.event_type = "USER_LOGIN"
$e2.metadata.event_type = "NETWORK_CONNECTION"
right join $e1.principal.hostname = $e2.principal.hostname
Define left event
The following query defines the left side of the join ( $e1
), which is the optional
event set in the final result:
$e1.metadata.event_type = "USER_LOGIN"
The following table represents the result of the query, identifying the initial set of user login events:
USER_LOGIN
|
server-db-03
|
10.0.0.50
|
|---|
Define right event
The following query defines the right side of the join ( $e2
), which is the event set
that is preserved in the final result.
$e2.metadata.event_type = "NETWORK_CONNECTION"
The following table represents the set of network connection events available for matching.
NETWORK_CONNECTION
|
vm-unauth-05
|
203.0.113.3
|
|---|
Join the events
The following example shows a match query with a right outer join on the principal.hostname
field:
$e1.metadata.event_type = "USER_LOGIN"
$e2.metadata.event_type = "NETWORK_CONNECTION"
right join $e1.principal.hostname = $e2.principal.hostname
$host = $e1.principal.hostname
match:
$host over 5m
-
The right outer join ensures that every
NETWORK_CONNECTIONevent ($e2) is included in the final result set. -
The placeholder
$hostis assigned the value from$e2.principal.hostname. The right outer join ensures the presence of event$e2, making sure the$hostvariable is always populated for aggregation. -
The rule aggregates the results by the host for a 5-minute time window.
Join result
The resulting dataset shows the combination of the two events. All records from
the right table ( $e2
) are retained, and fields from the left table ( $e1
) are set
to null when no matching hostname is found (for example, vm-unauth-05
).
null
|
null
|
null
|
NETWORK_CONNECTION
|
vm-unauth-05
|
203.0.113.4
|
No match |
|---|
Example right join queries
This section provides example right join queries.
Joins with a match condition
-
Event-event
$e1.metadata.event_type = "USER_LOGIN" $e2.metadata.event_type = "NETWORK_CONNECTION" right join $e1.principal.hostname = $e2.principal.hostname $host = $e2.principal.hostname match: $host over 5m -
Entity-event
$e1.metadata.event_type = "NETWORK_CONNECTION" $g1.graph.metadata.entity_type = "ASSET" right join $g1.graph.entity.asset.hostname = $e1.principal.asset.hostname $host = $e1.principal.asset.hostname match: $host over 5m -
Datatable-event
$host = $e1.principal.hostname right join %all_dt_column_types.hostname = $e1.principal.hostname match: $host by 5m
Joins without a match condition
-
Event-event
$e1.metadata.event_type = "USER_LOGIN" $e1.principal.ip = "114.241.96.87" $e2.metadata.event_type = "NETWORK_CONNECTION" right join $e1.principal.hostname = $e2.principal.hostname -
Entity-event
$e1.metadata.event_type = "NETWORK_CONNECTION" $g1.graph.metadata.entity_type = "ASSET" right join $g1.graph.entity.asset.hostname = $e1.principal.asset.hostname $host = $e1.principal.asset.hostname -
Datatable-event
$host = $e1.principal.hostname right join %all_dt_column_types.hostname = $e1.principal.hostname
Limitations
Consider the following limitations when you create outer joins:
-
A full outer join (left join and right join together) is not supported.
-
The query time range for matchless joins is limited to a maximum of 14 days.
-
You cannot directly join two contextual sources (for example, an entity directly to a datatable).
-
The primary Unified Data Model (UDM) event must be the preserved side of the outer join. The query is invalid if the primary event is on the "nullable" side.
-
An event-entity join must be a left join. This correctly preserves the event (
$e1). -
An entity-event join must be a right join. This correctly preserves the event (
$e1).
The following example is invalid because the UDM event (
$e1) is on the left, but the right join preserves the right side ($g1), which violates the rule that the UDM event must be preserved:// Invalid query $e1.metadata.event_type = "NETWORK_CONNECTION" $g1.graph.metadata.entity_type = "ASSET" right join $e1.principal.asset.hostname = $g1.graph.entity.asset.hostname -
Best practices
To prevent slow performance and query timeouts in outer join queries, use specific and narrow filters.
For example, a broad query like the following:
$e1.metadata.event_type = "USER_LOGIN"
$e2.metadata.event_type = "NETWORK_CONNECTION"
right join $e1.principal.hostname = $e2.principal.hostname
Can be optimized by adding specific criteria follows:
$e1.metadata.event_type = "USER_LOGIN"
$e1.principal.ip = "121.121.121.121"
$e1.principal.user.userid = "alex"
$e2.metadata.event_type = "NETWORK_CONNECTION"
$e2.src.hostname = "altostrat.com"
$e1.principal.hostname = $e2.principal.hostname
Need more help? Get answers from Community members and Google SecOps professionals.

