Issue #25: Durable Agent Workflows with Policy-Bounded Execution

15 min read | April 25, 2026

A lot of agent workflow demos still look like a state machine with model calls bolted onto the side. They persist checkpoints, pass around status enums, and call that autonomy, but the model is not actually participating in a meaningful workflow. It is just filling in text between deterministic branches.

That is the wrong shape if you want an AI system that still feels like software. The model should do interpretation, drafting, and critique. Deterministic code should still own the safety boundary, execution semantics, and restart behavior. If those responsibilities are inverted, the system becomes harder to trust right at the point where it starts taking actions.

In this issue, we build a durable local-first agent workflow in C#, using home repair as the concrete execution scenario. Microsoft Agent Framework handles the agent runtime through an OpenAI-compatible local endpoint, but deterministic code still owns policy evaluation, action normalization, idempotent execution, persisted workflow state, and live operational logs.

What You Are Building

You are building a production-shaped repair workflow that keeps both the AI work and the control boundaries visible:

- Load runtime config from appsettings.json and REPAIRWF_ environment overrides
- Use CoordinatorAgent to turn free-text repair requests into a structured assessment plus a proposed action plan
- Use deterministic policy code to convert agent-assigned severity signals into a bounded urgency and action envelope
- Use ReviewerAgent to critique the policy-aligned plan before execution
- Persist workflow state, checkpoints, agent traces, and executed actions as JSON
- Execute the final action idempotently through a file-backed action gateway
- Recover pending executions after restart without duplicating side effects
- Print live runtime logs so you can see agent prompts, raw responses, policy corrections, and execution steps in sequence

This is a compact action workflow where the model participates in the reasoning path, but deterministic code still decides what is allowed to happen.

System Structure

The architecture is intentionally small. The app loads a validated runtime profile and captures a repair request. It then asks the first agent for a structured workflow draft, evaluates the deterministic policy envelope for that assessment, and aligns the plan to policy-safe internal actions. A reviewer agent critiques the aligned plan. Finally, the action runs through an idempotent gateway, so retries and restart recovery do not repeat its effect.

The diagram below shows the high-level control flow:

Runtime Configuration First

The app starts by loading and validating the workflow profile before any model call happens:

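// appsettings.json supplies defaults; REPAIRWF_-prefixed environment variables override them.
// The prefix is stripped and "__" maps to ":", so REPAIRWF_App__AgentModelId overrides App:AgentModelId.
var configuration = new ConfigurationBuilder()
  .SetBasePath(Directory.GetCurrentDirectory())
  .AddJsonFile("appsettings.json", optional: true, reloadOnChange: false)
  .AddEnvironmentVariables(prefix: "REPAIRWF_")
  .Build();

// Fail fast: reject an unusable profile before any model call is made.
var config = AppConfig.Load(configuration);
config.Validate();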

The default local profile in this repo:

{
  "App": {
    "AgentBaseUrl": "http://localhost:11434/v1",
    "AgentApiKey": "ollama",
    "AgentModelId": "qwen3:8b",
    "AgentTimeoutSeconds": 60,
    "DataDirectory": "data/home-repair-runtime",
    "DispatchProviderName": "North Home Response",
    "ActionLeadTimeMinutes": 20,
    "AutonomousApprovalCapUsd": 250
  }
}

This matters because the execution boundary is operational, not just logical. Endpoint, model identity, timeout budget, storage path, provider name, and approval cap all shape what the workflow can safely do.
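AppConfig.Load and AppConfig.Validate are not shown in this issue. The hypothetical RepairWorkflowSettings record below is a rough sketch of the fail-fast idea. It mirrors the keys of the App section and rejects a profile that could not run safely. The specific rules (absolute http(s) URL, positive timeout, non-negative cap) are assumptions, not the repo's actual checks.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for AppConfig; property names mirror the "App" section of appsettings.json.
public sealed record RepairWorkflowSettings(
    string AgentBaseUrl,
    string AgentModelId,
    int AgentTimeoutSeconds,
    string DataDirectory,
    int ActionLeadTimeMinutes,
    decimal AutonomousApprovalCapUsd)
{
    // Collects every problem first, then throws once with all of them.
    public void Validate()
    {
        var errors = new List<string>();

        // The agent client needs an absolute http(s) endpoint such as http://localhost:11434/v1.
        if (!Uri.TryCreate(AgentBaseUrl, UriKind.Absolute, out var uri)
            || (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps))
        {
            errors.Add("AgentBaseUrl must be an absolute http(s) URL.");
        }

        if (string.IsNullOrWhiteSpace(AgentModelId))
            errors.Add("AgentModelId is required.");

        if (AgentTimeoutSeconds <= 0)
            errors.Add("AgentTimeoutSeconds must be positive.");

        if (string.IsNullOrWhiteSpace(DataDirectory))
            errors.Add("DataDirectory is required.");

        if (ActionLeadTimeMinutes < 0)
            errors.Add("ActionLeadTimeMinutes cannot be negative.");

        // The policy rules shown later take Math.Min(AutonomousApprovalCapUsd, ...), so a negative cap would leak into every envelope.
        if (AutonomousApprovalCapUsd < 0)
            errors.Add("AutonomousApprovalCapUsd cannot be negative.");

        if (errors.Count > 0)
            throw new InvalidOperationException("Invalid workflow profile: " + string.Join(" ", errors));
    }
}
```

Collecting every error before throwing means a bad deployment reports all of its misconfigured keys at once, rather than one per restart.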

The App Wires One Durable Control Loop

The console host assembles the workflow runtime in one place:

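// Assumed for this excerpt: dataDirectory comes from config.DataDirectory (its resolution is not shown).
var dataDirectory = config.DataDirectory;
var runtimeLogger = new ConsoleWorkflowRuntimeLogger();
var workflowStore = new JsonWorkflowStore(dataDirectory);
var policyEngine = new HomeRepairPolicyEngine(config);
var timeProvider = TimeProvider.System;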

IHomeRepairAgentWorkflow agentWorkflow = new AgentFrameworkHomeRepairAgentWorkflow(
  new Uri(config.AgentBaseUrl),
  config.AgentApiKey,
  config.AgentModelId,
  TimeSpan.FromSeconds(config.AgentTimeoutSeconds),
  runtimeLogger);

var actionGateway = new FileBackedRecoveryActionGateway(
  dataDirectory,
  config.DispatchProviderName,
  config.ActionLeadTimeMinutes,
  timeProvider);

var engine = new HomeRepairWorkflowEngine(
  workflowStore,
  agentWorkflow,
  policyEngine,
  actionGateway,
  timeProvider,
  runtimeLogger);

That is the right level of indirection for a sample like this. There is one agent workflow, one deterministic policy engine, one durable store, and one action gateway. The moving parts stay small enough to inspect without pretending the problem is simpler than it is.

Two Agents, Two Narrow Roles

The system uses two ChatClientAgent instances over the same local OpenAI-compatible model endpoint:

_coordinatorAgent = new ChatClientAgent(
  chatClient,
  new ChatClientAgentOptions
  {
      Name = "CoordinatorAgent",
      Instructions = BuildCoordinatorInstructions()
  });

_reviewerAgent = new ChatClientAgent(
  chatClient,
  new ChatClientAgentOptions
  {
      Name = "ReviewerAgent",
      Instructions = BuildReviewerInstructions()
  });

The coordinator's role is broad, but only within a narrow contract:

You are CoordinatorAgent in a durable home repair workflow.

Your job is to read the customer request and produce one grounded workflow draft:
1. a compact structured assessment of the issue
2. a proposed next-step repair plan

Non-negotiable rules:
1. Stay grounded in the customer text only.
2. Do not invent diagnostics, technician findings, or policy rules.
3. Prefer one clear next action instead of a long checklist.
4. If the case is ambiguous, say so in the summary and use conservative actions.
5. Return JSON only.

The reviewer then acts as a second control boundary rather than another free-form writer:

You are ReviewerAgent in a durable home repair workflow.

You receive the original customer request, a structured issue assessment, a deterministic policy envelope,
and a policy-aligned proposed plan. Your job is to challenge weak plans without turning the workflow into
a human-approval bottleneck.

Non-negotiable rules:
1. Keep only actions that are supported by the request and allowed by the supplied policy envelope.
2. Prefer Approve when the plan is already reasonable.
3. Use ReplaceAction only when another policy-eligible action is clearly safer or more appropriate.
4. Use CloseGuidanceOnly only when the evidence is too weak for autonomous work.
5. Return JSON only.

That separation matters. The first agent interprets and proposes. The second agent critiques. Neither of them gets to directly execute side effects.

Structured Output Is the Shared Contract

The coordinator returns both assessment and plan as one structured draft:

{
  "assessment": {
    "issueType": "WaterLeak|HeatingOutage|ElectricalHazard|ApplianceFailure|StructuralDamage|GeneralRepair|Unknown",
    "severity": "Low|Medium|High|Critical",
    "emergencyDispatchNeeded": false,
    "sameDayVisitNeeded": false,
    "propertyUnsafe": false,
    "vulnerableOccupant": false,
    "summary": "string"
  },
  "plan": {
    "summary": "string",
    "issueCategory": "string",
    "operationalUrgency": "Low|Medium|High",
    "eligibleOptions": [
      "string"
    ],
    "recommendedAction": "string",
    "autoExecutable": true,
    "approvalAmountUsd": 0,
    "customerMessage": "string",
    "dispatchMessage": "string",
    "policyReason": "string",
    "confidence": "high|medium|low",
    "policyCorrections": []
  }
}

The reviewer then returns a second structured contract:

{
  "decision": "Approve|ReplaceAction|FallbackToDefault|CloseGuidanceOnly",
  "reviewedAction": "string",
  "reviewedApprovalAmountUsd": 0,
  "rationale": "string",
  "confidence": "high|medium|low"
}

This is where agent output stops being loose text. Once both stages are forced into stable shapes, deterministic code can parse, normalize, validate, and reject bad output without having to infer intent from paragraphs.

The division of responsibility matters in the current implementation. CoordinatorAgent assigns the initial issue classification and severity-like signals such as sameDayVisitNeeded or propertyUnsafe. HomeRepairPolicyEngine then computes the system-owned operational urgency that the workflow actually executes against. That urgency is not taken from the model's own operationalUrgency field.
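The following is a minimal sketch of the parse-and-reject step for the reviewer contract. It assumes System.Text.Json and a hypothetical ReviewVerdict DTO, because the repo's own types are not shown. Several kinds of bad output become an explicit failure instead of a guess:

- decision values outside the contract
- a missing decision
- prose wrapped around the JSON
- an unknown confidence label

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Decision values from the ReviewerAgent contract.
public enum ReviewDecision { Approve, ReplaceAction, FallbackToDefault, CloseGuidanceOnly }

// Hypothetical DTO mirroring the reviewer JSON; not the repo's actual type.
public sealed class ReviewVerdict
{
    // Nullable so that a missing "decision" is detectable instead of defaulting to Approve.
    public ReviewDecision? Decision { get; set; }
    public string ReviewedAction { get; set; } = "";
    public decimal ReviewedApprovalAmountUsd { get; set; }
    public string Rationale { get; set; } = "";
    public string Confidence { get; set; } = "";
}

public static class ReviewParser
{
    private static readonly JsonSerializerOptions Options = new()
    {
        PropertyNameCaseInsensitive = true,
        // Decisions must arrive as names like "Approve"; numeric values are rejected.
        Converters = { new JsonStringEnumConverter(allowIntegerValues: false) }
    };

    private static readonly string[] AllowedConfidence = { "high", "medium", "low" };

    public static bool TryParse(string rawModelOutput, out ReviewVerdict? verdict, out string error)
    {
        verdict = null;
        ReviewVerdict? parsed;
        try
        {
            parsed = JsonSerializer.Deserialize<ReviewVerdict>(rawModelOutput, Options);
        }
        catch (JsonException ex)
        {
            // Malformed JSON, prose around the JSON, and unknown decision names all end up here.
            error = "Reviewer output is not valid contract JSON: " + ex.Message;
            return false;
        }

        if (parsed?.Decision is null)
        {
            error = "Reviewer output is missing a decision.";
            return false;
        }

        if (Array.IndexOf(AllowedConfidence, parsed.Confidence) < 0)
        {
            error = $"Unknown confidence '{parsed.Confidence}'.";
            return false;
        }

        verdict = parsed;
        error = "";
        return true;
    }
}
```

The nullable Decision matters. With a plain enum, a reply that omits decision would silently deserialize to the first member, Approve.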

Policy Alignment Owns the Safety Boundary

The most important part of the architecture is that free-form model output is not treated as executable intent. It is aligned into a bounded policy envelope first:

var policyDecision = policyEngine.Evaluate(authoringResult.Draft.Assessment);
workflow.Plan = policyEngine.AlignPlan(
  authoringResult.Draft.Plan,
  authoringResult.Draft.Assessment,
  policyDecision);

The alignment step handles action replacement, approval caps, urgency normalization, and message regeneration explicitly. The excerpt below shows the action, cap, and message rules; urgency normalization is not shown:

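// Tracks whether any rule fired, so model-written messages can be regenerated at the end.
var planChanged = false;

// Only exact internal action names from the policy envelope are trusted.
if (!policyDecision.EligibleOptions.Contains(plan.RecommendedAction, StringComparer.Ordinal))
{
  planChanged = true;
  plan.PolicyCorrections.Add(
      $"Recommended action '{plan.RecommendedAction}' is not policy-eligible. Replaced with '{policyDecision.DefaultAction}'.");
  plan.RecommendedAction = policyDecision.DefaultAction;
}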

if (plan.ApprovalAmountUsd > policyDecision.MaxApprovalAmountUsd)
{
  planChanged = true;
  plan.PolicyCorrections.Add(
      $"Approval amount ${plan.ApprovalAmountUsd} exceeded the policy cap of ${policyDecision.MaxApprovalAmountUsd}. Clamped to the cap.");
  plan.ApprovalAmountUsd = policyDecision.MaxApprovalAmountUsd;
}

if (planChanged)
{
  plan.CustomerMessage = BuildDefaultCustomerMessage(plan);
  plan.DispatchMessage = BuildDefaultDispatchMessage(plan);
  plan.PolicyCorrections.Add("Customer and dispatch messages were regenerated to match the policy-aligned action.");
}

That is the core control idea in the repo. The model can describe a repair in natural language and assign the initial issue severity, but the workflow only trusts internal action names like ScheduleUrgentVisit, DispatchEmergencyTechnician, or GuidanceOnly, and it derives the operational urgency from deterministic policy code rather than from free-form model language.
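The excerpt above depends on repo types that are not shown. The self-contained sketch below implements the same three rules:

- replace an ineligible action with the default
- clamp the approval amount
- regenerate text when anything changed

PolicyEnvelope, DraftPlan, PlanAligner, and the message wording are hypothetical simplifications. The sketch also treats a negative amount as zero, which the excerpt does not show.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down shapes; the repo's HomeRepairPolicyDecision and plan types carry more fields.
public sealed record PolicyEnvelope(string DefaultAction, IReadOnlyList<string> EligibleOptions, decimal MaxApprovalAmountUsd);

public sealed class DraftPlan
{
    public string RecommendedAction { get; set; } = "";
    public decimal ApprovalAmountUsd { get; set; }
    public string CustomerMessage { get; set; } = "";
    public List<string> PolicyCorrections { get; } = new();
}

public static class PlanAligner
{
    public static DraftPlan Align(DraftPlan plan, PolicyEnvelope policy)
    {
        var changed = false;

        // Only exact internal action names are trusted; free-form text never matches.
        if (!policy.EligibleOptions.Contains(plan.RecommendedAction, StringComparer.Ordinal))
        {
            plan.PolicyCorrections.Add($"'{plan.RecommendedAction}' is not policy-eligible. Replaced with '{policy.DefaultAction}'.");
            plan.RecommendedAction = policy.DefaultAction;
            changed = true;
        }

        // Clamp into [0, cap]; assumes the cap itself is non-negative.
        var clamped = Math.Clamp(plan.ApprovalAmountUsd, 0m, policy.MaxApprovalAmountUsd);
        if (clamped != plan.ApprovalAmountUsd)
        {
            plan.PolicyCorrections.Add($"Approval amount {plan.ApprovalAmountUsd} adjusted to {clamped}.");
            plan.ApprovalAmountUsd = clamped;
            changed = true;
        }

        // Model-written text may still describe the rejected action, so rebuild it from the aligned fields.
        if (changed)
        {
            plan.CustomerMessage = $"We have arranged: {plan.RecommendedAction}.";
            plan.PolicyCorrections.Add("Customer message regenerated to match the policy-aligned action.");
        }

        return plan;
    }
}
```

Applied to the heating run shown later, the free-form "Schedule emergency boiler service..." proposal does not match any eligible option, so it is replaced by the default ScheduleUrgentVisit.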

Policy Rules Are Deterministic and Inspectable

The policy engine decides the allowed action set directly from the structured assessment, not from a second model call:

private HomeRepairPolicyDecision BuildHeatingDecision(HomeRepairAssessment assessment)
{
  var urgent = assessment.VulnerableOccupant || assessment.SameDayVisitNeeded;
  return new HomeRepairPolicyDecision
  {
      DefaultAction = urgent ? "ScheduleUrgentVisit" : "ScheduleStandardVisit",
      DefaultUrgency = urgent ? "High" : "Medium",
      EligibleOptions = urgent
          ? ["ScheduleUrgentVisit", "ScheduleStandardVisit", "AuthorizeTemporaryFix"]
          : ["ScheduleStandardVisit", "GuidanceOnly"],
      MaxApprovalAmountUsd = Math.Min(config.AutonomousApprovalCapUsd, urgent ? 120m : 0m),
      AutoExecutable = true,
      PolicySummary = urgent
          ? "Heating outages with urgency or vulnerable occupants should move to an urgent visit."
          : "Routine heating outages can be booked as a standard visit with guidance if needed."
  };
}

Water leaks use a different policy envelope:

private HomeRepairPolicyDecision BuildWaterLeakDecision(HomeRepairAssessment assessment)
{
  if (assessment.PropertyUnsafe || assessment.EmergencyDispatchNeeded)
  {
      return new HomeRepairPolicyDecision
      {
          DefaultAction = "DispatchEmergencyTechnician",
          DefaultUrgency = "High",
          EligibleOptions = ["DispatchEmergencyTechnician", "AuthorizeTemporaryFix", "GuidanceOnly"],
          MaxApprovalAmountUsd = Math.Min(config.AutonomousApprovalCapUsd, 250m),
          AutoExecutable = true,
          PolicySummary = "Unsafe water leaks can auto-dispatch an emergency technician and approve a temporary fix cap."
      };
  }

  return new HomeRepairPolicyDecision
  {
      DefaultAction = assessment.SameDayVisitNeeded ? "ScheduleUrgentVisit" : "ScheduleStandardVisit",
      DefaultUrgency = assessment.SameDayVisitNeeded ? "High" : "Medium",
      EligibleOptions = ["ScheduleUrgentVisit", "ScheduleStandardVisit", "AuthorizeTemporaryFix"],
      MaxApprovalAmountUsd = Math.Min(config.AutonomousApprovalCapUsd, 150m),
      AutoExecutable = true,
      PolicySummary = "Water leaks should be scheduled quickly, with temporary fix approval available when needed."
  };
}

The workflow is useful because these rules are boring. You can inspect them, test them, and reason about their failure modes without asking what the model might do today.
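Each rule is a pure function of the assessment and the configured cap, so plain assertions can pin it down. The hypothetical HeatingRule below restates the heating decision in isolation to make the cap arithmetic explicit. With the default 250 USD profile, an urgent heating case gets min(250, 120) = 120 USD of autonomous approval, and a routine one gets 0.

```csharp
using System;

// Hypothetical standalone restatement of BuildHeatingDecision, reduced to the fields that drive execution.
public static class HeatingRule
{
    public static (string DefaultAction, string DefaultUrgency, string[] EligibleOptions, decimal MaxApprovalUsd) Decide(
        bool vulnerableOccupant, bool sameDayVisitNeeded, decimal autonomousApprovalCapUsd)
    {
        // Either assessment signal escalates; the model never sets the operational urgency directly.
        var urgent = vulnerableOccupant || sameDayVisitNeeded;

        return urgent
            ? ("ScheduleUrgentVisit", "High",
               new[] { "ScheduleUrgentVisit", "ScheduleStandardVisit", "AuthorizeTemporaryFix" },
               Math.Min(autonomousApprovalCapUsd, 120m))
            : ("ScheduleStandardVisit", "Medium",
               new[] { "ScheduleStandardVisit", "GuidanceOnly" },
               Math.Min(autonomousApprovalCapUsd, 0m));
    }
}
```

The configured cap always wins when it is lower. An operator can tighten autonomy for every envelope by changing AutonomousApprovalCapUsd, without touching rule code.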

The Reviewer Changes the Plan Only Inside the Envelope

After alignment, the second agent can still critique and change the plan, but only inside policy:

var reviewResult = await agentWorkflow.ReviewAsync(
  request,
  workflow.Assessment,
  policyDecision,
  workflow.Plan,
  cancellationToken);

workflow.Plan = policyEngine.ApplyAutonomousReview(
  workflow.Plan,
  reviewResult.Review,
  policyDecision);

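The final application step still enforces policy defaults after review. For example, when the reviewer's confidence is low, any action other than GuidanceOnly or the policy default is downgraded to the default: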

if (review.Confidence == "low"
  && plan.RecommendedAction != "GuidanceOnly"
  && plan.RecommendedAction != policyDecision.DefaultAction)
{
  plan.PolicyCorrections.Add(
      $"ReviewerAgent confidence was low. Action downgraded from '{plan.RecommendedAction}' to default '{policyDecision.DefaultAction}'.");
  plan.RecommendedAction = policyDecision.DefaultAction;
  plan.ApprovalAmountUsd = 0;
}

This is the right place for a second agent. It is not there to simulate a debate. It is there to add a second judgment pass before execution, while deterministic code still owns the final action boundary.
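ApplyAutonomousReview is only partially shown. Below is a hedged sketch of how the four reviewer decisions could map onto the envelope. The Envelope, Review, and FinalPlan records, and the handling of CloseGuidanceOnly and FallbackToDefault, are assumptions. The eligibility check, the cap, and the low-confidence rule follow the excerpts above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Decision values from the ReviewerAgent contract.
public enum Verdict { Approve, ReplaceAction, FallbackToDefault, CloseGuidanceOnly }

// Hypothetical, simplified inputs and output.
public sealed record Envelope(string DefaultAction, IReadOnlyList<string> EligibleOptions, decimal MaxApprovalAmountUsd);
public sealed record Review(Verdict Decision, string ReviewedAction, decimal ReviewedApprovalAmountUsd, string Confidence);
public sealed record FinalPlan(string Action, decimal ApprovalAmountUsd, IReadOnlyList<string> Corrections);

public static class ReviewGate
{
    public static FinalPlan Apply(string alignedAction, decimal alignedAmount, Review review, Envelope policy)
    {
        var corrections = new List<string>();
        bool Eligible(string candidate) => policy.EligibleOptions.Contains(candidate, StringComparer.Ordinal);

        string action;
        decimal amount;
        switch (review.Decision)
        {
            case Verdict.Approve:
                (action, amount) = (alignedAction, alignedAmount);
                break;
            case Verdict.ReplaceAction when Eligible(review.ReviewedAction):
                // The reviewer may only swap to an action that is already inside the envelope.
                (action, amount) = (review.ReviewedAction, review.ReviewedApprovalAmountUsd);
                break;
            case Verdict.ReplaceAction:
                corrections.Add($"Reviewer replacement '{review.ReviewedAction}' is not policy-eligible; kept '{alignedAction}'.");
                (action, amount) = (alignedAction, alignedAmount);
                break;
            case Verdict.CloseGuidanceOnly:
                // Assumption: when GuidanceOnly is outside this envelope, use the default action instead.
                (action, amount) = (Eligible("GuidanceOnly") ? "GuidanceOnly" : policy.DefaultAction, 0m);
                break;
            default:
                // FallbackToDefault. Assumption: falling back also resets the approval amount.
                (action, amount) = (policy.DefaultAction, 0m);
                break;
        }

        if (amount > policy.MaxApprovalAmountUsd)
        {
            corrections.Add($"Approval {amount} clamped to cap {policy.MaxApprovalAmountUsd}.");
            amount = policy.MaxApprovalAmountUsd;
        }

        // Mirrors the low-confidence rule above.
        if (review.Confidence == "low" && action != "GuidanceOnly" && action != policy.DefaultAction)
        {
            corrections.Add($"Low reviewer confidence; '{action}' downgraded to '{policy.DefaultAction}'.");
            action = policy.DefaultAction;
            amount = 0m;
        }

        return new FinalPlan(action, amount, corrections);
    }
}
```

Every branch ends inside the envelope, so the worst a confused reviewer can do is fall back to the policy default.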

Live Runtime Logs Make the Workflow Inspectable

The sample prints runtime events as they happen so the workflow is visible instead of implied:

_runtimeLogger.Write("workflow", "Invoking CoordinatorAgent to classify the issue and draft the next action.");
_runtimeLogger.Write(
  "PolicyEngine",
  $"Policy changed the model output.\n" +
  $"Original model action: {authoringResult.Draft.Plan.RecommendedAction}\n" +
  $"Final aligned action: {workflow.Plan.RecommendedAction}\n" +
  $"Reason: {string.Join(" | ", workflow.Plan.PolicyCorrections)}");
_runtimeLogger.Write("workflow", $"Workflow is ready for durable execution of '{workflow.Plan.RecommendedAction}'.");

The logger is intentionally simple:

public sealed class ConsoleWorkflowRuntimeLogger : IWorkflowRuntimeLogger
{
  public void Write(string source, string message)
  {
      var prefix = $"[{DateTimeOffset.Now:HH:mm:ss}] [{source}] ";
      var normalized = message.Replace("\r\n", "\n", StringComparison.Ordinal);
      var lines = normalized.Split('\n', StringSplitOptions.None);

      foreach (var line in lines)
      {
          Console.WriteLine(prefix + line);
      }
  }
}

This sounds small, but it materially improves the sample. You can now watch the prompt go in, the raw model JSON come back, the policy correction happen, and the execution boundary fire in the same run.
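Because the logger sits behind an interface, tests can swap in a recorder and assert on the order of events. The sketch below assumes IWorkflowRuntimeLogger exposes only the Write(source, message) method seen in ConsoleWorkflowRuntimeLogger. RecordingWorkflowRuntimeLogger and its IndexOf helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the interface, inferred from ConsoleWorkflowRuntimeLogger.Write above.
public interface IWorkflowRuntimeLogger
{
    void Write(string source, string message);
}

// Hypothetical test double: records the same per-line events the console logger prints.
public sealed class RecordingWorkflowRuntimeLogger : IWorkflowRuntimeLogger
{
    private readonly List<(string Source, string Line)> _entries = new();
    private readonly object _gate = new();

    public IReadOnlyList<(string Source, string Line)> Entries
    {
        get { lock (_gate) { return _entries.ToArray(); } }
    }

    public void Write(string source, string message)
    {
        // Same normalization and line splitting as the console logger.
        var lines = message.Replace("\r\n", "\n", StringComparison.Ordinal).Split('\n');
        lock (_gate)
        {
            foreach (var line in lines)
            {
                _entries.Add((source, line));
            }
        }
    }

    // Position of the first line from `source` that starts with the given text, or -1.
    public int IndexOf(string source, string lineStartsWith)
    {
        var snapshot = Entries;
        for (var i = 0; i < snapshot.Count; i++)
        {
            if (snapshot[i].Source == source && snapshot[i].Line.StartsWith(lineStartsWith, StringComparison.Ordinal))
                return i;
        }
        return -1;
    }
}
```

A test can then check that the PolicyEngine correction line appears before the execution line. That is the ordering the live log is meant to make visible.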

Durable Workflow State Is Persisted as JSON

Every workflow is persisted as a JSON document under a workflow directory:

public async Task SaveAsync(HomeRepairWorkflow workflow, CancellationToken cancellationToken = default)
{
  var path = GetWorkflowPath(workflow.WorkflowId);
  var json = JsonSerializer.Serialize(workflow, JsonOptions);
  await File.WriteAllTextAsync(path, json, cancellationToken);
}

The persisted workflow captures request, assessment, plan, review, action result, checkpoints, revision, and raw agent trace metadata. A real saved run from the repo includes fields like:

{
  "workflowId": "HMR-20260425000533-602",
  "status": "ResolutionCompleted",
  "request": {
    "...": "..."
  },
  "assessment": {
    "...": "..."
  },
  "plan": {
    "...": "..."
  },
  "review": {
    "...": "..."
  },
  "actionResult": {
    "actionId": "ACT-B2FDEBEB",
    "actionType": "ScheduleUrgentVisit",
    "referenceCode": "URGENT-668D5A45"
  },
  "checkpoints": [
    "...",
    "...",
    "..."
  ],
  "agentTrace": [
    "...",
    "..."
  ]
}

That persistence layer is what makes the workflow durable instead of just conversational. The final state survives the process, and the full reasoning path is available for inspection later.
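One caveat on the SaveAsync excerpt: File.WriteAllTextAsync rewrites the target file in place. A crash mid-write can therefore leave a truncated document that recovery cannot parse. A common mitigation is to write a temporary file in the same directory and then swap it in with File.Move(source, destination, overwrite: true). This is sketched here as an assumption, not as what the repo does. WorkflowDocument and AtomicJsonWorkflowStore are hypothetical.

```csharp
using System;
using System.IO;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical minimal document; the repo's HomeRepairWorkflow carries many more fields.
public sealed record WorkflowDocument(string WorkflowId, string Status, int Revision);

public sealed class AtomicJsonWorkflowStore
{
    private static readonly JsonSerializerOptions JsonOptions = new(JsonSerializerDefaults.Web) { WriteIndented = true };
    private readonly string _directory;

    public AtomicJsonWorkflowStore(string directory)
    {
        _directory = directory;
        Directory.CreateDirectory(directory);
    }

    private string GetWorkflowPath(string workflowId) => Path.Combine(_directory, workflowId + ".json");

    public async Task SaveAsync(WorkflowDocument workflow, CancellationToken cancellationToken = default)
    {
        var path = GetWorkflowPath(workflow.WorkflowId);
        // Unique temp file in the same directory, so the final move is a rename on one volume.
        var tempPath = $"{path}.{Guid.NewGuid():N}.tmp";

        try
        {
            await using (var stream = new FileStream(tempPath, FileMode.CreateNew, FileAccess.Write, FileShare.None))
            {
                await JsonSerializer.SerializeAsync(stream, workflow, JsonOptions, cancellationToken);
                // Ask the OS to persist the bytes before the swap.
                stream.Flush(flushToDisk: true);
            }

            // On local file systems, readers typically see the old or the new revision, not a partial one.
            File.Move(tempPath, path, overwrite: true);
        }
        catch
        {
            // The previous revision is untouched; only the abandoned temp file needs cleaning up.
            File.Delete(tempPath);
            throw;
        }
    }

    public async Task<WorkflowDocument?> LoadAsync(string workflowId, CancellationToken cancellationToken = default)
    {
        var path = GetWorkflowPath(workflowId);
        if (!File.Exists(path)) return null;

        await using var stream = File.OpenRead(path);
        return await JsonSerializer.DeserializeAsync<WorkflowDocument>(stream, JsonOptions, cancellationToken);
    }
}
```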

Execution Is Idempotent

The action gateway uses an idempotency key so the final effect is not duplicated across retries or restart recovery:

var result = await actionGateway.ExecuteAsync(
  workflow.Request,
  workflow.Plan,
  workflow.IdempotencyKey,
  cancellationToken);

The gateway first checks the action ledger before creating a new result:

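var ledger = await LoadLedgerAsync(cancellationToken);
if (ledger.TryGetValue(idempotencyKey, out var existing))
{
  // Key already recorded: hand back the original result, flagged as a duplicate, without acting again.
  return Clone(existing, wasDuplicate: true);
}

// ... creation of the new `result` is omitted from this excerpt ...
ledger[idempotencyKey] = result;
await SaveLedgerAsync(ledger, cancellationToken);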

This matters in production. Once a workflow can dispatch a technician, issue an approval, or open a case, retries are no longer harmless, and idempotency becomes part of the correctness story.
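Below is a self-contained sketch of the ledger pattern. It assumes a JSON file keyed by idempotency key and a SemaphoreSlim that serializes load, check, and save within one process. IdempotentActionLedger, ActionRecord, and the key format are hypothetical.

The sketch shares one gap with any check-then-act ledger: the stand-in side effect runs before the ledger write. A crash between the two could still repeat the action on retry. Closing that window needs the downstream provider to honor the key, or a transactional store, which the Potential Enhancements section also points toward.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical result shape; the repo's action result carries more detail.
public sealed record ActionRecord(string ActionId, string ActionType, bool WasDuplicate);

public sealed class IdempotentActionLedger
{
    private readonly string _ledgerPath;
    // Serializes load, check, and save within this process only; other processes are not covered.
    private readonly SemaphoreSlim _gate = new(1, 1);

    public IdempotentActionLedger(string ledgerPath) => _ledgerPath = ledgerPath;

    // Counts how many times the stand-in side effect actually ran.
    public int SideEffectCount { get; private set; }

    public async Task<ActionRecord> ExecuteAsync(string idempotencyKey, string actionType, CancellationToken ct = default)
    {
        await _gate.WaitAsync(ct);
        try
        {
            var ledger = await LoadAsync(ct);
            if (ledger.TryGetValue(idempotencyKey, out var existing))
            {
                // Key seen before: return the original result instead of acting again.
                return existing with { WasDuplicate = true };
            }

            // Stand-in for the real effect (booking a visit, dispatching a technician, ...).
            SideEffectCount++;
            var result = new ActionRecord("ACT-" + Guid.NewGuid().ToString("N")[..8].ToUpperInvariant(), actionType, WasDuplicate: false);

            ledger[idempotencyKey] = result;
            await File.WriteAllTextAsync(_ledgerPath, JsonSerializer.Serialize(ledger), ct);
            return result;
        }
        finally
        {
            _gate.Release();
        }
    }

    private async Task<Dictionary<string, ActionRecord>> LoadAsync(CancellationToken ct)
    {
        if (!File.Exists(_ledgerPath)) return new Dictionary<string, ActionRecord>();

        var json = await File.ReadAllTextAsync(_ledgerPath, ct);
        return JsonSerializer.Deserialize<Dictionary<string, ActionRecord>>(json) ?? new Dictionary<string, ActionRecord>();
    }
}
```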

Restart Recovery Is Explicit

On startup, the app attempts to recover workflows that were already ready for execution but have no action result yet:

public async Task<IReadOnlyList<HomeRepairWorkflow>> ResumePendingExecutionsAsync(
  CancellationToken cancellationToken = default)
{
  var workflows = await workflowStore.ListAsync(cancellationToken);
  var ready = workflows
      .Where(workflow => workflow.Status == WorkflowStatus.ReadyForExecution && workflow.ActionResult is null)
      .ToList();

  var resumed = new List<HomeRepairWorkflow>(ready.Count);
  foreach (var workflow in ready)
  {
      resumed.Add(await TryExecuteAsync(workflow, cancellationToken));
  }

  return resumed;
}

That keeps the durability story honest. A workflow does not stop being durable just because the process restarts between review and execution.
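The status filter alone would re-run a workflow whose action already fired but whose result was never saved. The idempotency key is what turns that re-run into a ledger lookup. The sketch below simulates that crash window with hypothetical in-memory types (PendingWorkflow, InMemoryGateway, RecoverySweep) rather than the repo's store and gateway.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum WorkflowStatus { Intake, ReadyForExecution, ResolutionCompleted }

// Hypothetical minimal workflow; ActionResult stays null until the gateway's result is persisted.
public sealed class PendingWorkflow
{
    public string WorkflowId { get; init; } = "";
    public string IdempotencyKey { get; init; } = "";
    public WorkflowStatus Status { get; set; }
    public string? ActionResult { get; set; }
}

// Hypothetical gateway: the ledger maps each idempotency key to the action id it produced.
public sealed class InMemoryGateway
{
    private readonly Dictionary<string, string> _ledger = new(StringComparer.Ordinal);
    public int SideEffects { get; private set; }

    public string Execute(string idempotencyKey)
    {
        // A key that already ran returns its original action id with no new effect.
        if (_ledger.TryGetValue(idempotencyKey, out var existing)) return existing;

        SideEffects++;
        var actionId = $"ACT-{SideEffects:D4}";
        _ledger[idempotencyKey] = actionId;
        return actionId;
    }
}

public static class RecoverySweep
{
    // Same filter as ResumePendingExecutionsAsync: ready for execution, but no recorded result.
    public static IReadOnlyList<PendingWorkflow> SelectPending(IEnumerable<PendingWorkflow> workflows) =>
        workflows.Where(w => w.Status == WorkflowStatus.ReadyForExecution && w.ActionResult is null).ToList();

    public static int ResumeAll(IEnumerable<PendingWorkflow> workflows, InMemoryGateway gateway)
    {
        var pending = SelectPending(workflows);
        foreach (var workflow in pending)
        {
            workflow.ActionResult = gateway.Execute(workflow.IdempotencyKey);
            workflow.Status = WorkflowStatus.ResolutionCompleted;
        }
        return pending.Count;
    }
}
```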

Walking a Real Live Run

A real local run for the heating-outage sample produced:

[01:02:15] [workflow] Starting HMR-20260425000215-970 for Jonas Reed.
[01:02:15] [workflow] Invoking CoordinatorAgent to classify the issue and draft the next action.
[01:02:32] [CoordinatorAgent] Raw model response:
{
"assessment": {
  "issueType": "HeatingOutage",
  "severity": "High",
  "sameDayVisitNeeded": true
},
"plan": {
  "recommendedAction": "Schedule emergency boiler service for immediate same-day visit",
  "confidence": "medium"
}
}
[01:02:32] [PolicyEngine] Policy envelope selected.
[01:02:32] [PolicyEngine] Default action: ScheduleUrgentVisit
[01:02:32] [PolicyEngine] Eligible options: ScheduleUrgentVisit, ScheduleStandardVisit, AuthorizeTemporaryFix
[01:02:32] [PolicyEngine] Policy changed the model output.
[01:02:32] [PolicyEngine] Original model action: Schedule emergency boiler service for immediate same-day visit
[01:02:32] [PolicyEngine] Final aligned action: ScheduleUrgentVisit
[01:02:40] [ReviewerAgent] Raw model response:
{
"decision": "Approve",
"reviewedAction": "ScheduleUrgentVisit",
"confidence": "high"
}
[01:02:40] [execution] Completed | attempt 1 | Executed ScheduleUrgentVisit as action ACT-B52BFAD3.

How to interpret this:

- CoordinatorAgent produced a sensible free-form action proposal, but not an executable internal action name
- PolicyEngine rejected that proposal as not policy-eligible and replaced it with the envelope's default action, ScheduleUrgentVisit
- ReviewerAgent approved the aligned plan, not the raw free-form suggestion
- The action gateway then executed only the policy-safe, reviewed action and persisted the result

This is exactly the intended shape. The model remains useful because it interprets the situation. The workflow remains trustworthy because deterministic code converts that interpretation into bounded execution semantics.

Reading the Persisted Trace

The console also lets you reopen a workflow later and inspect the stored trace, including checkpoints like:

- intake | Completed | attempt 1 | Home repair request captured.
- CoordinatorAgent | Completed | attempt 1 | HeatingOutage / Schedule emergency boiler service for immediate same-day visit / confidence=medium
- PolicyEngine | Completed | attempt 1 | Heating outages with urgency or vulnerable occupants should move to an urgent visit. Default action: ScheduleUrgentVisit.
- alignment | Completed | attempt 1 | Policy corrections applied: Recommended action 'Schedule emergency boiler service for immediate same-day visit' is not policy-eligible. Replaced with 'ScheduleUrgentVisit'.
- ReviewerAgent | Completed | attempt 1 | Approve / action=ScheduleUrgentVisit / confidence=high
- finalize | Completed | attempt 1 | ReviewerAgent verdict: Approve. Final action: ScheduleUrgentVisit.
- execution | Completed | attempt 1 | Executed ScheduleUrgentVisit as action ACT-B52BFAD3.

That is a better debugging surface than raw chat transcripts alone. You can see where the request entered the system, where policy intervened, whether the reviewer changed anything, and whether the final action executed or merely became ready.
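The trace lines follow a stage | status | attempt | message layout. The hypothetical Checkpoint record below reproduces that layout. A helper separates an action that executed from one that only became ready. The attempt-numbering rule is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical checkpoint shape inferred from the printed trace format above.
public sealed record Checkpoint(string Stage, string Status, int Attempt, string Message, DateTimeOffset At)
{
    public override string ToString() => $"{Stage} | {Status} | attempt {Attempt} | {Message}";
}

public static class CheckpointTrail
{
    // Assumption: attempts count earlier checkpoints for the same stage, so a retry after a failure is attempt 2.
    public static Checkpoint Append(List<Checkpoint> trail, string stage, string status, string message, DateTimeOffset at)
    {
        var attempt = trail.Count(c => c.Stage == stage) + 1;
        var checkpoint = new Checkpoint(stage, status, attempt, message, at);
        trail.Add(checkpoint);
        return checkpoint;
    }

    // Only a Completed execution checkpoint counts as "executed"; anything else is at most "ready".
    public static bool Executed(IEnumerable<Checkpoint> trail) =>
        trail.Any(c => c.Stage == "execution" && c.Status == "Completed");
}
```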

Why This Architecture Works

The workflow works because the model and the code are doing different jobs on purpose:

- The first agent interprets messy user language and drafts structured intent
- The policy layer converts free-form suggestions into bounded internal actions
- The reviewer agent adds a second reasoning pass before side effects happen
- Durable JSON state keeps the workflow inspectable after the run ends
- The action gateway makes execution idempotent and restart-safe
- Live logs make the full handoff from agent output to deterministic execution visible

Potential Enhancements

To extend this project further, you can consider:

- Preserve more request details such as access notes and preferred windows in regenerated customer and dispatch messages
- Add explicit timeout classification so local-model cancellations are labeled as timeout failures instead of generic cancellation
- Persist a separate operational event stream alongside workflow JSON for easier analytics
- Replace the file-backed action ledger with a transactional store if you want concurrent execution guarantees
- Add scenario tests for timeout handling, malformed JSON, and reviewer downgrades under low confidence
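For the timeout-classification item, one possible approach is to link a timeout source to the caller's token. If cancellation fires while the caller never asked for it, the wrapper reports a TimeoutException instead. AgentCallGuard is hypothetical and wraps any model call passed in as a delegate. An HttpClient timeout also surfaces as a TaskCanceledException (an OperationCanceledException), so the same filter labels it as a timeout.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper: the repo's agent calls are not shown, so the delegate stands in for any model call.
public static class AgentCallGuard
{
    public static async Task<T> WithTimeoutAsync<T>(
        Func<CancellationToken, Task<T>> call, TimeSpan timeout, CancellationToken callerToken = default)
    {
        using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        timeoutCts.CancelAfter(timeout);

        try
        {
            return await call(timeoutCts.Token);
        }
        catch (OperationCanceledException ex) when (!callerToken.IsCancellationRequested)
        {
            // The caller did not cancel, so a timer did (this one or an inner HttpClient timeout).
            throw new TimeoutException($"Agent call exceeded {timeout.TotalSeconds:0.#}s.", ex);
        }
    }
}
```

Genuine caller cancellation still propagates as OperationCanceledException. Checkpoints can therefore record "timeout" and "cancelled" as different failure kinds.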

Final Notes

Agent workflows become more useful when the model is asked to do something distinct from the deterministic system around it.

If the model interprets and critiques, while code owns policy, persistence, idempotency, and restart behavior, you get a system that still feels like software even when it is genuinely agentic.

Explore the source code at the GitHub repository.

See you in the next issue.

Stay curious.
