L2: Session & Lifecycle Layer
Purpose
L2 manages the lifecycle of a task session — from creation through execution to termination. It defines the session state machine, standard event types for execution visibility, and mechanisms for checkpoint and recovery.
Session
A session represents a single task execution lifecycle. It is created when a task passes L3 safety validation and destroyed when the task reaches a terminal state.
Session Properties
| Property | Type | Description |
|---|---|---|
| `session_id` | string (UUID v4) | Unique session identifier, generated by the callee |
| `state` | enum | Current lifecycle state |
| `created_at` | timestamp | When the session was created |
| `updated_at` | timestamp | Last state transition time |
| `session_token` | string | L3-issued token authorizing this session |
| `risk_level` | enum (R1–R5) | Risk level assessed by L3 |
| `metadata` | object | Caller-provided and callee-assigned metadata |
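As an illustration of the properties above, the following is a minimal sketch of a session record in Python. The class names `Session`, `SessionState`, and `RiskLevel` are hypothetical, and two choices are assumptions rather than protocol requirements: the record starts in PENDING, and `session_token` and `risk_level` stay empty until L3 has assessed the task.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class SessionState(Enum):
    PENDING = "PENDING"
    REJECTED = "REJECTED"
    RUNNING = "RUNNING"
    PAUSED = "PAUSED"
    ABORTING = "ABORTING"
    ABORTED = "ABORTED"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


class RiskLevel(Enum):
    R1 = "R1"
    R2 = "R2"
    R3 = "R3"
    R4 = "R4"
    R5 = "R5"


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Session:
    # Generated by the callee as a UUID v4 string.
    session_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Assumption: a new record starts in PENDING.
    state: SessionState = SessionState.PENDING
    created_at: datetime = field(default_factory=_utc_now)
    updated_at: datetime = field(default_factory=_utc_now)
    # Assumption: both fields are filled in once L3 has assessed the task.
    session_token: Optional[str] = None
    risk_level: Optional[RiskLevel] = None
    metadata: dict = field(default_factory=dict)
```

Each `Session()` call produces a fresh identifier and its own `metadata` dictionary, so records do not share mutable state.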
Session State Machine
            TaskSubmit received
                    │
                    ▼
                 PENDING
                 │     │
         L3 Approve   L3 Reject
                 │     │
                 ▼     ▼
             RUNNING   REJECTED ─── (terminal)
             │  │  │
    ┌────────┘  │  └────────┐
    ▼           │           ▼
  PAUSED     RUNNING     ABORTING
    │                       │
    └───► RUNNING           ▼
                         ABORTED ──── (terminal)

             RUNNING
             │     │
             ▼     ▼
       COMPLETED   FAILED ──── (terminal)
       (terminal)
State Definitions
| State | Description | Trigger |
|---|---|---|
| PENDING | Task received, awaiting L3 safety check | task_submit received |
| REJECTED | Task failed L3 safety check (terminal) | L3 rejects task |
| RUNNING | Task is actively being executed by the callee harness | L3 approves task |
| PAUSED | Execution temporarily suspended, session preserved | Callee decision or checkpoint |
| ABORTING | Abort requested, callee is cleaning up | Caller sends abort |
| ABORTED | Execution aborted, cleanup complete (terminal) | Callee finishes abort cleanup |
| COMPLETED | Task finished successfully (terminal) | Callee reports success |
| FAILED | Task terminated due to unrecoverable error (terminal) | Callee reports failure |
Valid State Transitions
| From | To | Trigger |
|---|---|---|
| PENDING | RUNNING | L3 safety check passed |
| PENDING | REJECTED | L3 safety check failed |
| RUNNING | PAUSED | Callee pauses execution |
| RUNNING | ABORTING | Caller sends abort |
| RUNNING | COMPLETED | Task finishes successfully |
| RUNNING | FAILED | Unrecoverable error |
| PAUSED | RUNNING | Callee resumes execution |
| PAUSED | ABORTING | Caller sends abort while paused |
| ABORTING | ABORTED | Callee finishes cleanup |
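The transition table above can be encoded as a lookup that rejects anything not listed. This is a minimal sketch; the names `VALID_TRANSITIONS`, `TERMINAL_STATES`, `InvalidTransition`, and `transition` are hypothetical and not part of the protocol.

```python
# Allowed target states for each non-terminal state, copied from the table.
VALID_TRANSITIONS = {
    "PENDING": {"RUNNING", "REJECTED"},
    "RUNNING": {"PAUSED", "ABORTING", "COMPLETED", "FAILED"},
    "PAUSED": {"RUNNING", "ABORTING"},
    "ABORTING": {"ABORTED"},
}

# Terminal states have no outgoing transitions.
TERMINAL_STATES = {"REJECTED", "ABORTED", "COMPLETED", "FAILED"}


class InvalidTransition(Exception):
    """Raised when a requested transition is not in the table."""


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in VALID_TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current} -> {target} is not allowed")
    return target
```

For example, `transition("PAUSED", "ABORTING")` returns `"ABORTING"`, while `transition("COMPLETED", "RUNNING")` raises `InvalidTransition` because COMPLETED is terminal.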
Event Stream
During execution, the callee emits a stream of events to provide visibility into progress. All events are delivered through the L1 event channel.
Event Envelope
Events are carried in the payload of an L1 message with type: "event":
{
  "hcp_version": "1.0",
  "message_id": "...",
  "timestamp": "...",
  "session_id": "...",
  "type": "event",
  "payload": {
    "event_type": "progress",
    "sequence": 42,
    "data": { }
  }
}
| Field | Type | Description |
|---|---|---|
| `event_type` | string (enum) | Type of event |
| `sequence` | integer | Monotonically increasing sequence number within the session |
| `data` | object | Event-type-specific content |
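A callee might assemble the envelope with a small helper like the one below. This is a sketch only: `make_event` is a hypothetical name, and the `message_id` format (a UUID here) and the `timestamp` format (ISO 8601 in UTC here) are assumptions, since this section does not define them.

```python
import json
import uuid
from datetime import datetime, timezone


def make_event(session_id, event_type, sequence, data=None):
    """Build an L1 message of type "event" that carries one L2 event."""
    return {
        "hcp_version": "1.0",
        # Assumption: message_id format is not defined here; a UUID is used.
        "message_id": str(uuid.uuid4()),
        # Assumption: ISO 8601 timestamp in UTC.
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "type": "event",
        "payload": {
            "event_type": event_type,
            "sequence": sequence,
            "data": data if data is not None else {},
        },
    }


if __name__ == "__main__":
    event = make_event("session-123", "progress", 42,
                       {"stage": "phase-1", "message": "working"})
    print(json.dumps(event, indent=2))
```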
Standard Event Types
| Event Type | Description | Data Fields |
|---|---|---|
| `session_created` | Session has been created and execution is starting | `state`, `risk_level`, `session_token` |
| `state_changed` | Session state has transitioned | `from_state`, `to_state`, `reason` |
| `progress` | Execution progress update | `stage` (string), `percent` (number, optional), `message` (string) |
| `intermediate_result` | Partial or interim result available | `result_type`, `data`, `is_partial` |
| `log` | Execution log entry | `level` (info/warn/error), `message`, `details` |
| `warning` | Non-fatal warning | `code`, `message`, `details` |
| `error` | Error occurred but execution continues | `code`, `message`, `recoverable` |
| `checkpoint_created` | A checkpoint was saved | `checkpoint_id`, `description`, `resumable` |
| `session_closed` | Session has reached a terminal state | `final_state`, `reason` |
Event Ordering
- Events MUST be emitted with strictly increasing `sequence` numbers within a session.
- Consumers MUST process events in `sequence` order.
- If events arrive out of order (due to transport characteristics), the consumer SHOULD buffer and reorder.
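The buffer-and-reorder behaviour can be sketched as a small consumer-side class. `ReorderBuffer` is a hypothetical name, and the sketch assumes sequence numbers are contiguous and start at 1; the rules above only require them to be strictly increasing, so a real consumer may need a different release condition.

```python
class ReorderBuffer:
    """Hold out-of-order events and release them in sequence order.

    Assumption: sequence numbers are contiguous (1, 2, 3, ...).
    """

    def __init__(self, first_sequence=1):
        self.next_expected = first_sequence
        self._pending = {}  # sequence -> event payload held back

    def push(self, event):
        """Accept one event payload and return the events now ready, in order."""
        seq = event["sequence"]
        if seq < self.next_expected or seq in self._pending:
            return []  # already released or already buffered
        self._pending[seq] = event
        ready = []
        # Release the longest contiguous run starting at next_expected.
        while self.next_expected in self._pending:
            ready.append(self._pending.pop(self.next_expected))
            self.next_expected += 1
        return ready
```

Pushing sequence 2 before sequence 1 returns an empty list first; pushing sequence 1 afterwards returns both events in order.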
Integration with L1 Stream Continuity
The sequence number is the L2 mechanism that works with L1’s AMQP ACK-based stream continuity (see L1-transport-encoding.md — Stream Continuity) to provide lossless, deduplicated, ordered event delivery.
Caller-side per-session tracking:
The caller MUST maintain a last_processed_sequence value for each active session. This value is used for:
- Deduplication on redelivery: When L1 redelivers messages after a caller crash (AMQP requeue), the caller checks `event.sequence <= last_processed_sequence`. If true, the event has already been processed and is skipped (but still ACKed to advance the queue).
- Gap detection: If the caller receives `sequence = N+2` without having processed `N+1`, it detects a gap. This should not occur under normal AMQP delivery but serves as a safety check. On gap detection, the caller SHOULD log a warning and continue processing (the missing event may arrive via redelivery).
- Recovery after restart: On restart, the caller loads `last_processed_sequence` from persistent storage (if available) to resume deduplication. If not persisted, idempotent event processing (as recommended by L1) handles redeliveries.
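The three uses of `last_processed_sequence` can be sketched as a per-session tracker. `SequenceTracker`, `check`, and `mark_processed` are hypothetical names; the `initial` argument stands in for whatever the caller loads from persistent storage on restart, which the protocol does not define.

```python
import logging

logger = logging.getLogger("hcp.l2")  # hypothetical logger name


class SequenceTracker:
    """Track last_processed_sequence for each active session."""

    def __init__(self, initial=None):
        # session_id -> last_processed_sequence, optionally restored on restart.
        self.last_processed = dict(initial or {})

    def check(self, session_id, sequence):
        """Classify an incoming event as 'duplicate', 'gap', or 'in_order'."""
        last = self.last_processed.get(session_id)
        if last is None:
            return "in_order"  # nothing processed yet for this session
        if sequence <= last:
            return "duplicate"  # skip processing, but still ACK
        if sequence > last + 1:
            logger.warning("gap in session %s: expected %d, got %d",
                           session_id, last + 1, sequence)
            return "gap"  # caller SHOULD continue processing
        return "in_order"

    def mark_processed(self, session_id, sequence):
        """Advance the high-water mark after the event has been processed."""
        current = self.last_processed.get(session_id)
        if current is None or sequence > current:
            self.last_processed[session_id] = sequence
```

One property of this sketch is worth noting: because it keeps a single high-water mark, processing `N+2` after a gap causes a late `N+1` to be classified as a duplicate. A caller that wants to process such late events would need finer-grained tracking than this.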
Interaction model:
L1 (AMQP) L2 (Session) Caller Application
│ │ │
│ deliver event │ │
│ (delivery_tag=7) │ │
│─────────────────────────────►│ │
│ │ parse session_id, sequence │
│ │ check: seq > last_processed?│
│ │ │
│ │ ├─ Yes: forward to app ────►│ process event
│ │ │ update last_processed │
│ │ │ │
│ basic.ack(delivery_tag=7) ◄─┤ │ signal ACK to L1 │
│ │ │ │
│ │ └─ No (duplicate): skip │
│ basic.ack(delivery_tag=7) ◄─┤ ACK without processing │
│ │ │
Key principle: L1 ensures messages are never lost (AMQP durable delivery + manual ACK + requeue), which amounts to at-least-once delivery. L2 ensures events are not processed twice (sequence-based deduplication) and are processed in order (sequence-based ordering). Together, they provide effectively exactly-once processing at the application level.
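The interaction model above can be sketched as a delivery handler that processes first and ACKs afterwards. `StubChannel` is a hypothetical stand-in for an AMQP channel so that the example runs without a broker; its `basic_ack` method only mirrors the AMQP `basic.ack` operation. The sketch also assumes sequence numbers are greater than zero.

```python
import json


class StubChannel:
    """Hypothetical stand-in for an AMQP channel; records ACKed delivery tags."""

    def __init__(self):
        self.acked = []

    def basic_ack(self, delivery_tag):
        self.acked.append(delivery_tag)


def handle_delivery(channel, delivery_tag, body, last_processed, process):
    """Deduplicate by sequence, forward new events, then ACK the delivery.

    last_processed maps session_id -> last_processed_sequence.
    process is the application callback that receives the event payload.
    """
    message = json.loads(body)
    session_id = message["session_id"]
    sequence = message["payload"]["sequence"]

    # Assumption: sequences are > 0, so 0 means "nothing processed yet".
    if sequence > last_processed.get(session_id, 0):
        process(message["payload"])            # forward to the application
        last_processed[session_id] = sequence  # update only after success
    # ACK in both branches: new events after processing, duplicates at once.
    channel.basic_ack(delivery_tag)
```

If `process` raises, the handler never reaches `basic_ack`, so the message stays unacknowledged and L1 can redeliver it.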
Checkpoint & Recovery
For long-running tasks, checkpoints allow execution to be resumed after interruption.
Checkpoint
A checkpoint is a snapshot of the callee’s execution state at a given point. When a checkpoint is created, the callee emits a checkpoint_created event.
{
  "event_type": "checkpoint_created",
  "sequence": 100,
  "data": {
    "checkpoint_id": "ckpt-001",
    "description": "Completed phase 1: material preparation",
    "resumable": true,
    "created_at": "2025-01-15T10:00:00.000Z"
  }
}
Recovery
If a callee harness fails and restarts, it MAY resume from the latest checkpoint. Recovery is an internal concern of the callee — the protocol does not define how checkpoints are stored or how state is reconstructed. From the caller’s perspective:
- The event stream may have a gap (events between failure and recovery are lost).
- The callee SHOULD emit a `state_changed` event with `reason: "recovered_from_checkpoint"` upon recovery.
- Execution continues from the checkpoint, and new events are appended to the same session.
Session Timeout
- The callee SHOULD enforce a maximum session duration, derived from the task’s `max_duration` constraint (L4) or a system default.
- If execution exceeds the timeout, the callee transitions to FAILED with reason `"timeout"`.
- Idle sessions (no events emitted for a configurable period) MAY be cleaned up by the callee.
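The timeout rules above can be sketched as two pure checks that a callee runs periodically. The function names and the one-hour `DEFAULT_MAX_DURATION` are hypothetical, and measuring the duration from `created_at` is an assumption; the protocol does not say when the clock starts.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the system default is not specified here; one hour is a placeholder.
DEFAULT_MAX_DURATION = timedelta(hours=1)


def timeout_transition(created_at, now, max_duration=None):
    """Return the FAILED transition if the session exceeded its limit, else None.

    Assumption: duration is measured from the session's created_at.
    """
    limit = max_duration if max_duration is not None else DEFAULT_MAX_DURATION
    if now - created_at > limit:
        return {"to_state": "FAILED", "reason": "timeout"}
    return None


def is_idle(last_event_at, now, idle_period):
    """True if no event has been emitted for longer than idle_period."""
    return now - last_event_at > idle_period


if __name__ == "__main__":
    started = datetime(2025, 1, 15, 10, 0, tzinfo=timezone.utc)
    later = started + timedelta(hours=2)
    print(timeout_transition(started, later))
```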
Abort Protocol
When the caller sends an abort message:
- The session transitions to ABORTING.
- The callee begins cleanup: stops LLM calls, terminates running tools, releases resources.
- Cleanup SHOULD be bounded by a callee-defined abort timeout.
- Upon completion, the session transitions to ABORTED.
- The callee emits `session_closed` with `final_state: "ABORTED"`.
The callee MUST make a best-effort attempt at cleanup but is not required to guarantee resource release within any specific timeframe. The caller SHOULD treat ABORTING as a transient state that will resolve to ABORTED.
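The abort steps above can be sketched with a bounded cleanup using `asyncio.wait_for`. The function name, the dictionary-based session, and the `reason` values are hypothetical. Moving to ABORTED when the abort timeout expires is an assumption, based on the statement that ABORTING resolves to ABORTED.

```python
import asyncio

ABORTABLE_STATES = {"RUNNING", "PAUSED"}


async def abort_session(session, cleanup, abort_timeout):
    """Run the abort sequence and return the session_closed event payload.

    session is a dict with a "state" key; cleanup is an async callable that
    stops LLM calls, terminates tools, and releases resources.
    """
    if session["state"] not in ABORTABLE_STATES:
        raise ValueError(f"cannot abort from {session['state']}")

    session["state"] = "ABORTING"
    timed_out = False
    try:
        # Bound cleanup by the callee-defined abort timeout (in seconds).
        await asyncio.wait_for(cleanup(), timeout=abort_timeout)
    except asyncio.TimeoutError:
        # Best effort only: cleanup was cancelled before it finished.
        timed_out = True

    # Assumption: the session still resolves to ABORTED after a timeout.
    session["state"] = "ABORTED"
    return {
        "event_type": "session_closed",
        "data": {
            "final_state": "ABORTED",
            # Hypothetical reason values.
            "reason": "cleanup_timeout" if timed_out else "caller_abort",
        },
    }
```

A callee could run this with `asyncio.run(abort_session(session, cleanup, 5.0))` or await it from its own event loop.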