Janus: an Agentic Rule of Two proof of concept

Janus is a small proof of concept for the agentic Rule of Two, which says that an agent should never hold all three of the following within a single session: (A) untrusted input, (B) sensitive data, and (C) the ability to mutate the outside world. If it does, a single injected prompt can steer the agent into reading private data and exfiltrating it. The rule doesn’t “solve” prompt injection, but it does deterministically reduce the severity of failures by constraining the shape of the agent. The code lives at github.com/script3r/janus.

Why this is hard

Agentic systems blend inboxes, internal docs, and external actions in one loop. That is exactly the shape attackers want. The hard part is enforcing boundaries without killing usefulness. The Rule of Two frames the tradeoffs: you can keep power, but you have to pick which two capabilities are safe to combine in a single session.

The prototype in plain terms

Janus enforces “modes.” Each mode allows only two of the three capabilities at a time. To switch modes, the agent is restarted, gets a fresh identity, and receives only a sanitized handoff state. This breaks the chain between untrusted input and outbound actions.

  • Mode AB: untrusted input + sensitive data (no outbound comms)
  • Mode BC: sensitive data + comms (no untrusted input)
  • Mode AC: untrusted input + comms (no sensitive data)
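
The three modes can be written down as a small capability table. The following is a minimal sketch, not the Janus implementation: the constant names and the `validate_modes` and `is_allowed` helpers are hypothetical and exist only to make the rule checkable.

```python
# Hypothetical capability labels; the letters follow the mode names above.
UNTRUSTED_INPUT = "A"
SENSITIVE_DATA = "B"
EXTERNAL_COMMS = "C"
ALL_CAPABILITIES = frozenset({UNTRUSTED_INPUT, SENSITIVE_DATA, EXTERNAL_COMMS})

# Each mode grants exactly two of the three capabilities.
MODES = {
    "AB": frozenset({UNTRUSTED_INPUT, SENSITIVE_DATA}),
    "BC": frozenset({SENSITIVE_DATA, EXTERNAL_COMMS}),
    "AC": frozenset({UNTRUSTED_INPUT, EXTERNAL_COMMS}),
}


def validate_modes(modes):
    """Raise ValueError if a mode uses an unknown capability or more than two."""
    for name, caps in modes.items():
        if not caps <= ALL_CAPABILITIES:
            raise ValueError(f"mode {name} uses unknown capabilities")
        if len(caps) > 2:
            raise ValueError(f"mode {name} violates the Rule of Two")


def is_allowed(mode, capability):
    """Return True if `capability` is granted in `mode`; unknown modes grant nothing."""
    return capability in MODES.get(mode, frozenset())


validate_modes(MODES)  # passes: no mode holds all three capabilities
```

Checking the table once at startup means a misconfigured mode fails loudly instead of quietly granting all three capabilities.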

The proof of concept wires this up with a supervisor, a network guardian, and a simple handoff protocol. It’s not production-ready, but it makes the boundary concrete and testable.
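
To make the handoff idea concrete, here is one way a sanitized handoff state could be shaped. This is a sketch under assumptions: the field names (`action`, `recipient`, `body`), the allow-list, and the length limit are all hypothetical, and the actual Janus handoff format may look different.

```python
import json

# Hypothetical schema: the only fields allowed to cross a mode boundary.
ALLOWED_ACTIONS = {"send_message"}
MAX_BODY_CHARS = 500


def sanitize_handoff(raw):
    """Return a new dict holding only allow-listed, type-checked fields."""
    action = raw.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not permitted in handoff: {action!r}")
    recipient = raw.get("recipient")
    body = raw.get("body")
    if not isinstance(recipient, str) or not isinstance(body, str):
        raise ValueError("recipient and body must be strings")
    if len(body) > MAX_BODY_CHARS:
        raise ValueError("body exceeds the handoff size limit")
    # Unknown keys are dropped by construction: only these three are copied.
    return {"action": action, "recipient": recipient, "body": body}


def encode_handoff(raw):
    """Serialize the sanitized handoff for transport to the next session."""
    return json.dumps(sanitize_handoff(raw), sort_keys=True)
```

A fixed schema narrows the channel but does not make free text safe on its own. The `body` field is still agent-written, so a real deployment would likely want stricter rules, such as an allow-list of recipients.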

End-to-end test walkthrough

The test simulates a realistic failure mode: a malicious email tries to trick the agent into leaking calendar data. The flow looks like this:

sequenceDiagram
    autonumber
    participant Agent as Agent
    participant Egress as Guardian (Egress)
    participant Supervisor as Supervisor
    participant Asset as Asset (janus-net)
    Agent->>Egress: 🚀 GET http://asset-server/messages/read
    Egress->>Asset: 🔒 mTLS Request: /messages/read
    Agent->>Egress: 🚀 GET http://asset-server/calendar/read
    Egress->>Asset: 🔒 mTLS Request: /calendar/read
    Agent->>Supervisor: 📩 Petition for Mode BC
    Note right of Supervisor: Transitioning to execute sanitized plan
    Supervisor-->>Agent: ♻️ Restarted in Mode BC
    Agent->>Egress: 🚀 POST http://asset-server/messages/write
    Egress->>Asset: 🔒 mTLS Request: /messages/write
    Note over Asset: 📧 ACTION: Email Sent
    Asset-->>Egress: 200 OK
    Egress-->>Agent: 200 OK

Phase 1 runs in Mode AB. The agent can read both untrusted input and sensitive data, so it sees the attack and drafts a clean response plan, but it cannot send anything outward yet. To proceed, it petitions the supervisor (janusd), which kills the container, resets the identity, and restarts the agent in Mode BC. Only a sanitized handoff state crosses the boundary.
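
The transition can be modeled in a few lines. This is an in-memory sketch, not janusd itself: `Session`, `Supervisor`, `start`, and `petition` are hypothetical names, and flipping the `alive` flag stands in for killing and restarting a container.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Session:
    mode: str
    context: dict
    # A fresh random identity is minted for every session.
    identity: str = field(default_factory=lambda: uuid.uuid4().hex)
    alive: bool = True


class Supervisor:
    VALID_MODES = {"AB", "BC", "AC"}

    def __init__(self):
        self.sessions = {}

    def start(self, mode, context=None):
        if mode not in self.VALID_MODES:
            raise ValueError(f"unknown mode: {mode}")
        session = Session(mode=mode, context=dict(context or {}))
        self.sessions[session.identity] = session
        return session

    def petition(self, session, target_mode, handoff):
        """Terminate `session` and start a new one that sees only `handoff`."""
        if not session.alive:
            raise RuntimeError("session already terminated")
        if target_mode not in self.VALID_MODES:
            raise ValueError(f"unknown mode: {target_mode}")
        session.alive = False    # stands in for killing the container
        session.context.clear()  # nothing from the old context survives
        return self.start(target_mode, context={"handoff": dict(handoff)})
```

For example, a session started in Mode AB with a poisoned inbox in its context yields, after `petition(old, "BC", plan)`, a new session whose context holds the plan and nothing else, under a different identity.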

Phase 2 runs in Mode BC. The agent can send messages, but it can’t read untrusted input, so the injected prompt never reaches the new context. The network guardian (janus-net) enforces egress, and the agent sends the safe response based on the sanitized plan.
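
Egress enforcement of this kind can be approximated with a per-mode allow-list of requests. The sketch below uses the endpoints from the walkthrough, but the policy table and the `authorize` function are assumptions for illustration. In particular, treating `/messages/read` as untrusted input, `/calendar/read` as sensitive data, and `/messages/write` as outbound comms is an inference from the test flow, not something taken from the Janus code.

```python
from urllib.parse import urlsplit

ASSET_HOST = "asset-server"  # host name as it appears in the walkthrough

# Hypothetical policy: which (method, path) pairs each mode may request.
EGRESS_POLICY = {
    "AB": {("GET", "/messages/read"), ("GET", "/calendar/read")},
    "BC": {("GET", "/calendar/read"), ("POST", "/messages/write")},
    "AC": {("GET", "/messages/read"), ("POST", "/messages/write")},
}


def authorize(mode, method, url):
    """Return True only if the request is on the allow-list for `mode`."""
    parts = urlsplit(url)
    if parts.hostname != ASSET_HOST:
        return False  # deny any destination other than the asset server
    return (method.upper(), parts.path) in EGRESS_POLICY.get(mode, set())
```

The design is deny-by-default: an unknown mode, an unlisted path, or a different host all fall through to `False`, so a session in Mode BC cannot fetch `/messages/read` even if its plan asks for it.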

What I learned

Agentic security isn’t about perfect prompts. It’s about constraining which capabilities can coexist in one session. Janus shows that if you design for lifecycle resets, identity boundaries, and narrow capability sets, you can reduce the blast radius without neutering the agent. The Rule of Two isn’t a finish line: defense in depth, least privilege, and human approvals still matter. But it is a practical baseline for building agents that fail more safely.