Collect URLScan IO logs
This document explains how to ingest URLScan IO logs to Google Security Operations using Amazon S3.
Before you begin
Make sure you have the following prerequisites:
- A Google SecOps instance
- Privileged access to the URLScan IO tenant
- Privileged access to AWS (S3, IAM, Lambda, EventBridge)
Get URLScan IO prerequisites
- Sign in to URLScan IO.
- Click your profile icon.
- Select API Key from the menu.
- If you don't have an API key yet:
  - Click the Create API Key button.
  - Enter a description for the API key (for example, Google SecOps Integration).
  - Select the permissions for the key (for read-only access, select Read permissions).
  - Click Generate API Key.
- Copy and save the following details in a secure location:
  - API_KEY: The generated API key string (format: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
  - API Base URL: https://urlscan.io/api/v1 (this is constant for all users)
- Note your API quota limits:
- Free accounts: Limited to 1000 API calls per day, 60 per minute
- Pro accounts: Higher limits based on subscription tier
- If you need to restrict searches to your organization's scans only, note down:
  - User identifier: Your username or email (for use with the user: search filter)
  - Team identifier: If you use the teams feature (for use with the team: search filter)
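Before configuring AWS, you can optionally confirm that the API key works. The following is a minimal sketch that calls the same search endpoint used by the Lambda function later in this document; the placeholder <your-api-key> and the example query are assumptions you should replace with your own values.

# Minimal API key check (replace <your-api-key> before running).
import json
import urllib3

http = urllib3.PoolManager()
response = http.request(
    'GET',
    'https://urlscan.io/api/v1/search/',
    fields={'q': 'date:>now-1h', 'size': 1},
    headers={'API-Key': '<your-api-key>'},
)
print(response.status)  # 200 means the key was accepted
print(json.loads(response.data.decode('utf-8')).get('total'))  # number of matching scans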
Configure AWS S3 bucket and IAM for Google SecOps
- Create an Amazon S3 bucket following this user guide: Creating a bucket.
- Save the bucket Name and Region for future reference (for example, urlscan-logs-bucket).
- Create a User following this user guide: Creating an IAM user.
- Select the created User.
- Select the Security credentials tab.
- Click Create Access Key in the Access Keys section.
- Select Third-party service as the Use case.
- Click Next.
- Optional: Add a description tag.
- Click Create access key.
- Click Download CSV file to save the Access Key and Secret Access Key for future reference.
- Click Done.
- Select the Permissions tab.
- Click Add permissions in the Permissions policies section.
- Select Add permissions.
- Select Attach policies directly.
- Search for the AmazonS3FullAccess policy.
- Select the policy.
- Click Next.
- Click Add permissions.
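As an optional sanity check, you can confirm the downloaded access keys are valid before using them elsewhere. This is a minimal sketch assuming boto3 is installed; the key values are placeholders for the values in the CSV you downloaded.

# Hypothetical credential check (replace the placeholders with the values from the CSV).
import boto3

sts = boto3.client(
    'sts',
    aws_access_key_id='<ACCESS_KEY_ID>',
    aws_secret_access_key='<SECRET_ACCESS_KEY>',
)
print(sts.get_caller_identity()['Arn'])  # prints the IAM user ARN if the keys are valid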
Configure the IAM policy and role for S3 uploads
- In the AWS console, go to IAM > Policies.
- Click Create policy > JSON tab.
- Enter the following policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowPutObjects",
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::urlscan-logs-bucket/*"
    },
    {
      "Sid": "AllowGetStateObject",
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::urlscan-logs-bucket/urlscan/state.json"
    }
  ]
}

Replace urlscan-logs-bucket if you entered a different bucket name.
- Click Next > Create policy.
- Go to IAM > Roles > Create role > AWS service > Lambda.
- Attach the newly created policy.
- Name the role urlscan-lambda-role and click Create role.
Create the Lambda function
- In the AWS Console, go to Lambda > Functions > Create function.
- Click Author from scratch.
- Provide the following configuration details:
  - Name: urlscan-collector
  - Runtime: Python 3.13
  - Architecture: x86_64
  - Execution role: urlscan-lambda-role
- After the function is created, open the Code tab, delete the stub, and enter the following code (urlscan-collector.py):

import json
import os
import boto3
from datetime import datetime, timedelta
import urllib3
import base64

s3 = boto3.client('s3')
http = urllib3.PoolManager()

def lambda_handler(event, context):
    # Environment variables
    bucket = os.environ['S3_BUCKET']
    prefix = os.environ['S3_PREFIX']
    state_key = os.environ['STATE_KEY']
    api_key = os.environ['API_KEY']
    api_base = os.environ['API_BASE']
    search_query = os.environ.get('SEARCH_QUERY', 'date:>now-1h')
    page_size = int(os.environ.get('PAGE_SIZE', '100'))
    max_pages = int(os.environ.get('MAX_PAGES', '10'))

    # Load state
    state = load_state(bucket, state_key)
    last_run = state.get('last_run')

    # Prepare search query
    if last_run:
        # Adjust search query based on last run
        search_time = datetime.fromisoformat(last_run)
        time_diff = datetime.utcnow() - search_time
        hours = int(time_diff.total_seconds() / 3600) + 1
        search_query = f'date:>now-{hours}h'

    # Search for scans
    headers = {'API-Key': api_key}
    all_results = []

    for page in range(max_pages):
        search_url = f"{api_base}/search/"
        params = {
            'q': search_query,
            'size': page_size,
            'offset': page * page_size
        }

        # Make search request
        response = http.request('GET', search_url, fields=params, headers=headers)

        if response.status != 200:
            print(f"Search failed: {response.status}")
            break

        search_data = json.loads(response.data.decode('utf-8'))
        results = search_data.get('results', [])

        if not results:
            break

        # Fetch full result for each scan
        for result in results:
            uuid = result.get('task', {}).get('uuid')
            if uuid:
                result_url = f"{api_base}/result/{uuid}/"
                result_response = http.request('GET', result_url, headers=headers)
                if result_response.status == 200:
                    full_result = json.loads(result_response.data.decode('utf-8'))
                    all_results.append(full_result)
                else:
                    print(f"Failed to fetch result for {uuid}: {result_response.status}")

        # Check if we have more pages
        if len(results) < page_size:
            break

    # Write results to S3
    if all_results:
        now = datetime.utcnow()
        file_key = (
            f"{prefix}year={now.year}/month={now.month:02d}/day={now.day:02d}/"
            f"hour={now.hour:02d}/urlscan_{now.strftime('%Y%m%d_%H%M%S')}.json"
        )

        # Create NDJSON content
        ndjson_content = '\n'.join([json.dumps(r, separators=(',', ':')) for r in all_results])

        # Upload to S3
        s3.put_object(
            Bucket=bucket,
            Key=file_key,
            Body=ndjson_content.encode('utf-8'),
            ContentType='application/x-ndjson'
        )

        print(f"Uploaded {len(all_results)} results to s3://{bucket}/{file_key}")

    # Update state
    state['last_run'] = datetime.utcnow().isoformat()
    save_state(bucket, state_key, state)

    return {
        'statusCode': 200,
        'body': json.dumps({
            'message': f'Processed {len(all_results)} scan results',
            'location': f"s3://{bucket}/{prefix}"
        })
    }

def load_state(bucket, key):
    try:
        response = s3.get_object(Bucket=bucket, Key=key)
        return json.loads(response['Body'].read())
    except s3.exceptions.NoSuchKey:
        return {}
    except Exception as e:
        print(f"Error loading state: {e}")
        return {}

def save_state(bucket, key, state):
    try:
        s3.put_object(
            Bucket=bucket,
            Key=key,
            Body=json.dumps(state),
            ContentType='application/json'
        )
    except Exception as e:
        print(f"Error saving state: {e}")
- Go to Configuration > Environment variables.
- Click Edit > Add new environment variable.
- Enter the following environment variables, replacing the example values with your own:
  - S3_BUCKET: urlscan-logs-bucket
  - S3_PREFIX: urlscan/
  - STATE_KEY: urlscan/state.json
  - API_KEY: <your-api-key>
  - API_BASE: https://urlscan.io/api/v1
  - SEARCH_QUERY: date:>now-1h
  - PAGE_SIZE: 100
  - MAX_PAGES: 10
- After the function is created, stay on its page (or open Lambda > Functions > your-function).
- Select the Configuration tab.
- In the General configuration panel, click Edit.
- Change Timeout to 5 minutes (300 seconds) and click Save.
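Before adding a schedule, you can optionally invoke the function once and confirm it completes. This is a minimal sketch, assuming boto3 and AWS credentials allowed to call lambda:InvokeFunction; alternatively, use the Test button in the Lambda console with an empty event ({}).

# Hypothetical one-off test invocation of the collector function.
import json
import boto3
from botocore.config import Config

# Raise the read timeout because the function may run for several minutes.
lambda_client = boto3.client('lambda', config=Config(read_timeout=360))
response = lambda_client.invoke(FunctionName='urlscan-collector', Payload=b'{}')
print(json.loads(response['Payload'].read()))  # expect a 'Processed N scan results' message

# On success the function writes hour-partitioned NDJSON objects, for example:
# s3://urlscan-logs-bucket/urlscan/year=2025/month=01/day=15/hour=12/urlscan_20250115_120000.json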
Create an EventBridge schedule
- Go to Amazon EventBridge > Scheduler > Create schedule.
- Provide the following configuration details:
  - Recurring schedule: Rate (1 hour).
  - Target: your Lambda function urlscan-collector.
  - Name: urlscan-collector-1h.
- Click Create schedule.
Optional: Create read-only IAM user & keys for Google SecOps
- Go to AWS Console > IAM > Users.
- Click Add users.
- Provide the following configuration details:
  - User: Enter secops-reader.
  - Access type: Select Access key – Programmatic access.
- Click Create user.
- Attach a minimal read policy (custom): go to Users > secops-reader > Permissions > Add permissions > Attach policies directly > Create policy.
- In the JSON editor, enter the following policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::urlscan-logs-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::urlscan-logs-bucket"
    }
  ]
}

- Set the name to secops-reader-policy.
- Go to Create policy > search/select > Next > Add permissions.
- Go to Security credentials > Access keys > Create access key.
- Download the CSV (these values are entered into the feed).
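Optionally, verify that the new read-only credentials can reach the bucket before creating the feed. The following is a minimal sketch; the placeholder keys come from the CSV you just downloaded, and the bucket name matches the earlier example.

# Hypothetical read-only access check (replace the placeholders before running).
import boto3

s3 = boto3.client(
    's3',
    aws_access_key_id='<ACCESS_KEY_ID>',
    aws_secret_access_key='<SECRET_ACCESS_KEY>',
)

# ListBucket permission: enumerate a few objects under the feed prefix.
listing = s3.list_objects_v2(Bucket='urlscan-logs-bucket', Prefix='urlscan/', MaxKeys=3)
keys = [obj['Key'] for obj in listing.get('Contents', [])]
print(keys)

# GetObject permission: read the first line of one object, if any exist.
if keys:
    body = s3.get_object(Bucket='urlscan-logs-bucket', Key=keys[0])['Body'].read()
    lines = body.decode('utf-8').splitlines()
    print(lines[0][:200] if lines else '(empty object)')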
Configure a feed in Google SecOps to ingest URLScan IO logs
- Go to SIEM Settings > Feeds.
- Click Add New Feed.
- In the Feed name field, enter a name for the feed (for example, URLScan IO logs).
- Select Amazon S3 V2 as the Source type.
- Select URLScan IO as the Log type.
- Click Next.
- Specify values for the following input parameters:
  - S3 URI: s3://urlscan-logs-bucket/urlscan/
  - Source deletion options: Select the deletion option according to your preference.
  - Maximum File Age: Include files modified in the last number of days. Default is 180 days.
  - Access Key ID: User access key with access to the S3 bucket.
  - Secret Access Key: User secret key with access to the S3 bucket.
  - Asset namespace: The asset namespace.
  - Ingestion labels: The label applied to the events from this feed.
- Click Next.
- Review your new feed configuration in the Finalize screen, and then click Submit.
Need more help? Get answers from Community members and Google SecOps professionals.

