When your agent causes a security incident, "the agent did it" is not an acceptable incident report. Here's what a useful audit trail for AI agents actually looks like — and what most teams are missing.
What makes an audit trail useful
An audit trail's value is determined by what questions it can answer after an incident. The questions that matter for AI agent incidents are different from those for human user incidents. For human users: which account accessed what resource, when, and from where? For AI agents: which agent accessed what resource, in what run, triggered by which human, using what credentials, and did the access pattern match the expected behavior?
Most standard audit logs answer the first set of questions reasonably well. GitHub's audit log tells you which app made a request. Slack's audit log tells you which bot token was used. But neither tells you which agent run generated the call, which task was being executed, or whether the access was within the agent's expected behavioral envelope. That gap is what makes post-incident investigation for AI agents painful.
The audit trail you need has four attributes: per-event attribution (which agent, which run, which task), credential traceability (which token was used for each API call, where that token came from, what scope it carried), behavioral baseline comparison (was this access pattern consistent with prior runs), and coverage across all providers (a unified view of activity across GitHub, Slack, Google, and any other OAuth-connected service the agent uses).
The attribution gap in provider audit logs
Provider audit logs are designed for their own security model, not yours. GitHub's audit log records events by app installation — "GitHub App X created a commit." If ten different AI agents use the same GitHub App installation, the audit log attributes all of their activity to the same app. You cannot distinguish which agent made which commit without additional context you have to supply yourself.
The same problem exists for Slack bot tokens, Google service accounts, and most other OAuth-based integrations. The provider authenticates the app credential, not the agent using it. Their audit log reflects the credential, not the agent. For a system with one agent using one credential, this is workable. For a system with ten agents sharing credentials (a common pattern in early multi-agent deployments), provider audit logs are nearly useless for attribution.
The fix is to give each agent its own credentials and ensure those credentials appear in the provider audit log in a way that is attributable to the specific agent. Per-agent credentials are a prerequisite for per-agent attribution. This is one reason credential sharing across agents is a bad practice even setting aside the direct security risk: it makes post-incident investigation much harder.
What the Alter event schema looks like
Every event that Alter logs has a consistent schema regardless of the OAuth provider involved. This allows a single SIEM rule to process token events from GitHub, Slack, and Google without provider-specific parsers.
{
  "event_type": "token.minted",
  "timestamp": "2025-02-03T14:23:17.432Z",
  "agent_id": "agt_01hx4k7m9n2p3q5r6s7t8u9v0w",
  "agent_name": "pr-review-agent",
  "run_id": "run_7f3a9b2c1d4e5f6a",
  "task_context": "review-pr-4521",
  "provider": "github",
  "token_id": "tok_9a8b7c6d5e4f3a2b",
  "scope_requested": ["repo:write", "issues:write"],
  "scope_granted": ["repo:read", "issues:read"],
  "scope_downscoped": true,
  "ttl_seconds": 3600,
  "policy_id": "pol_code-review-agent-v3",
  "source_ip": "10.0.1.45"
}
The run_id correlates all token events within a single agent invocation. The task_context is set by the agent and describes what it was doing when it requested the token. The scope_downscoped flag surfaces cases where the agent requested broader access than its policy allowed. All of these fields are present for every event, which means SIEM dashboards built on this schema work without provider-specific customization.
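Because the schema is uniform, one detection rule can process events from every provider. Here is a minimal Python sketch of such a rule, assuming events arrive as parsed dicts with the fields shown above; the function name and alert labels are illustrative, not part of any Alter API:

```python
from datetime import datetime

def flag_token_event(event: dict) -> list[str]:
    """Return alert labels for a single Alter token event.

    Works identically whether the event came from GitHub, Slack,
    or Google, because the field names never change.
    """
    alerts = []

    # Agent asked for more scope than its policy allows.
    if event.get("scope_downscoped"):
        requested = set(event.get("scope_requested", []))
        granted = set(event.get("scope_granted", []))
        denied = sorted(requested - granted)
        alerts.append(f"downscoped: denied {denied}")

    # Token minted outside business hours (06:00-22:00 UTC).
    ts = datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
    if ts.hour < 6 or ts.hour > 22:
        alerts.append(f"off-hours mint at {ts.isoformat()}")

    return alerts
```

The same function covers both example searches shown in the Splunk section below, which is the practical payoff of a provider-agnostic schema.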
Connecting to Splunk
Alter exports events to Splunk via HTTP Event Collector (HEC). In Settings → Integrations → Splunk, paste your HEC URL and token. Alter starts sending events immediately. The index, sourcetype, and source are configurable; we recommend a dedicated index (alter_agent_tokens) to keep agent events searchable without noise from other sources.
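If you want to confirm the collector accepts events before enabling the export, you can post a hand-built test event. The sketch below builds such a request in Python; the /services/collector/event path and the "Splunk &lt;token&gt;" Authorization header are standard HEC conventions, while the helper name and payload contents are illustrative:

```python
import json
import urllib.request

def build_hec_request(hec_url: str, hec_token: str, event: dict) -> urllib.request.Request:
    """Build a Splunk HEC request carrying one event.

    hec_url is the collector base (e.g. https://splunk.example.com:8088,
    a hypothetical host). The index matches the dedicated index
    recommended above.
    """
    payload = {
        "index": "alter_agent_tokens",
        "sourcetype": "_json",
        "event": event,
    }
    return urllib.request.Request(
        hec_url.rstrip("/") + "/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {hec_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Sending requires a reachable HEC endpoint:
# urllib.request.urlopen(build_hec_request(url, token, {"event_type": "token.minted"}))
```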
Once events are flowing, a useful starting set of Splunk searches:
// All token events for a specific agent in the last 24h
index=alter_agent_tokens agent_id="agt_01hx4..." earliest=-24h
// Scope downscoping events (agent requested more than policy allows)
index=alter_agent_tokens scope_downscoped=true
// Token events for a specific run (incident investigation)
index=alter_agent_tokens run_id="run_7f3a9b2c1d4e5f6a"
// Unusual activity: token mints outside business hours
index=alter_agent_tokens event_type="token.minted"
| eval hour=tonumber(strftime(_time, "%H")) | where hour < 6 OR hour > 22
Connecting to Datadog
The Datadog integration uses the Datadog Logs API directly. In Settings → Integrations → Datadog, enter your API key and optionally your site (e.g., datadoghq.eu for EU customers). Alter tags all events with source:alter and service:agent-auth by default, which you can override to match your existing tagging conventions.
In Datadog Log Management, create facets for the agent_id, run_id, provider, and scope_downscoped fields. This makes them searchable and filterable in the Log Explorer without requiring full-text search on the raw event. The standard Datadog monitor for "unexpected scope downscoping" fires a P3 alert when any agent has more than 5 downscoping events in a 15-minute window — typically a sign that the agent's code has been updated to request broader access than its policy allows.
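The monitor's windowing logic is simple enough to replay locally, for example to tune the threshold against exported events before enabling the alert. The Python sketch below is an illustrative approximation (sliding 15-minute window, tuple input format assumed), not the monitor's actual implementation:

```python
from collections import defaultdict, deque
from datetime import timedelta

WINDOW = timedelta(minutes=15)
THRESHOLD = 5  # more than 5 events in the window triggers the alert

def detect_downscoping_bursts(events):
    """Flag agents exceeding THRESHOLD downscoping events in a sliding window.

    events: (timestamp, agent_id, downscoped) tuples, sorted by time.
    Returns (agent_id, timestamp) pairs, one per event that tipped an
    agent over the threshold.
    """
    windows = defaultdict(deque)  # agent_id -> recent downscoping timestamps
    alerts = []
    for ts, agent_id, downscoped in events:
        if not downscoped:
            continue
        q = windows[agent_id]
        q.append(ts)
        # Evict timestamps older than the window.
        while q and ts - q[0] > WINDOW:
            q.popleft()
        if len(q) > THRESHOLD:
            alerts.append((agent_id, ts))
    return alerts
```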
The incident timeline reconstruction problem
In a post-incident investigation, you typically need to answer: "What did the agent do between time T1 and T2, using which credentials, and what was the impact?" Without a credential-layer audit trail, this requires correlating provider audit logs across multiple services, mapping each event back to the agent by credential, and trying to infer the task context from the API calls made.
With Alter's event stream in your SIEM, the query is: filter by agent_id, filter by time range, sort by timestamp. The result shows every token that was minted and used, in order, with the task context label the agent provided. If you need to reconstruct exactly which API calls were made with each token, the token_id field lets you correlate against provider-side audit logs to get the complete picture.
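The same filter-and-sort can be run directly against an exported event stream, for example when the SIEM is unavailable mid-incident. A minimal Python sketch, assuming events are dicts in the schema shown earlier:

```python
from datetime import datetime

def reconstruct_timeline(events, agent_id, t1, t2):
    """Return one agent's events between t1 and t2, in chronological order.

    events: iterable of dicts in the Alter event schema; t1/t2 are
    timezone-aware datetimes bounding the investigation window.
    """
    def ts(e):
        return datetime.fromisoformat(e["timestamp"].replace("Z", "+00:00"))

    window = [e for e in events if e["agent_id"] == agent_id and t1 <= ts(e) <= t2]
    return sorted(window, key=ts)
```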
The reconstruction that takes two days without proper instrumentation takes 20 minutes with it. That is the operational value of getting the audit trail right before an incident happens, not after.
Retention and compliance
Alter retains audit events for 90 days on the standard plan, 1 year on the enterprise plan, and indefinitely for customers who archive to their own S3-compatible storage. For SOC 2 Type II compliance, the 1-year minimum is typical — check with your auditor for your specific requirements.
All events are immutable once written. The tamper-evident log property is enforced by chaining each event's hash to the previous event, similar to blockchain-style append-only logs. This is a requirement for audit logs that will be reviewed in legal or compliance contexts — a mutable log is not evidence.
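The chaining technique can be illustrated in a few lines. The Python sketch below shows the general scheme, not Alter's actual on-disk format; the field names, hash algorithm, and genesis value are assumptions:

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first event's chain

def chain_events(events):
    """Chain events so each hash covers the event plus the previous hash.

    Editing any historical event invalidates every later hash, which is
    what makes the log tamper-evident.
    """
    prev = GENESIS
    chained = []
    for event in events:
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        chained.append({"event": event, "prev_hash": prev, "hash": digest})
        prev = digest
    return chained

def verify_chain(chained):
    """Recompute every hash; True only if no event was altered."""
    prev = GENESIS
    for entry in chained:
        payload = json.dumps(entry["event"], sort_keys=True)
        expected = hashlib.sha256((prev + payload).encode()).hexdigest()
        if entry["prev_hash"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True
```

Note that verification needs only the hashes and a canonical serialization of each event, which is why an external auditor can check integrity without reading the raw event contents.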
If your compliance program requires log integrity verification, Alter provides a verification API that lets you check the hash chain for any time range. The verification can be run by your internal compliance tooling or by an external auditor without access to the raw event data.