
Most LLM reliability failures are context failures. The model is asked to decide with incomplete evidence, irrelevant evidence, or too much evidence for the available token window.
In this issue, we build a local-first deterministic context budgeting pipeline in C#. The running example is an e-commerce refund and resolution assistant for a late and damaged delivery case. The design is simple: reserve output tokens, reserve fixed prompt tokens, deterministically pack business context blocks, and fail fast if required evidence does not fit.
What You Are Building
A production-shaped prompt assembly and budgeting pipeline:
- Load app config from `appsettings.json` plus `DCB_` environment overrides
- Validate context window, reserved output, and fixed prompt budgets before execution
- Model business evidence as typed `ContextBlock` objects with priority and required flags
- Compute available context tokens deterministically
- Pack blocks in stable order and track exclusions with explicit reasons
- Compose an inspectable prompt with system instructions, user task, and included blocks
- Optionally invoke a local OpenAI-compatible model endpoint
- Fail fast when required context overflows the budget
While the model proposes, the budgeter decides what context enters the prompt.
System Structure
This system uses a gate-based control flow: configuration and budget constraints are validated first, context blocks are packed deterministically, and execution stops immediately if required evidence does not fit. When the budget is valid, the app composes a bounded prompt preview and, if enabled, invokes the model and cleans response artifacts like `<think>` tags before printing the final output.
The diagram below shows this high-level control flow:
Context Budget Is a Runtime Contract
The available context budget is explicit and computed from three values:
```csharp
public int AvailableContextTokens =>
    ModelContextWindowTokens - ReservedOutputTokens - FixedPromptTokens;
```

The request object enforces this contract up front:

```csharp
if (AvailableContextTokens <= 0)
{
    throw new InvalidOperationException(
        "No context budget available. ReservedOutputTokens + FixedPromptTokens must be less than ModelContextWindowTokens.");
}
```

No budget means no run. This prevents hidden truncation and silent prompt drift.
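With the defaults shown later in this issue (an 850-token window, 300 tokens reserved for output, 220 for the fixed prompt), the contract leaves 330 tokens for business context. A minimal standalone sketch of the same arithmetic and guard:

```csharp
using System;

// Standalone sketch of the budget contract, using the article's default values.
int modelContextWindowTokens = 850;
int reservedOutputTokens = 300;
int fixedPromptTokens = 220;

int availableContextTokens =
    modelContextWindowTokens - reservedOutputTokens - fixedPromptTokens;

if (availableContextTokens <= 0)
{
    throw new InvalidOperationException(
        "No context budget available. ReservedOutputTokens + FixedPromptTokens must be less than ModelContextWindowTokens.");
}

Console.WriteLine(availableContextTokens); // 330
```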
Runtime Config First
Configuration is loaded from file plus environment variables, then validated before any model call:
```csharp
var configuration = new ConfigurationBuilder()
    .SetBasePath(AppContext.BaseDirectory)
    .AddJsonFile("appsettings.json", optional: true, reloadOnChange: false)
    .AddEnvironmentVariables(prefix: "DCB_")
    .Build();
```

The strongly typed app settings carry explicit defaults:

```csharp
public int ModelContextWindowTokens { get; init; } = 850;
public int ReservedOutputTokens { get; init; } = 300;
public int FixedPromptTokens { get; init; } = 220;
public string Provider { get; init; } = "lmstudio";
public string BaseUrl { get; init; } = "http://localhost:1234/v1";
public string ApiKey { get; init; } = "not-needed";
public string ModelId { get; init; } = "deepseek/deepseek-r1-0528-qwen3-8b";
public float Temperature { get; init; } = 0.0f;
public bool EnableModelCall { get; init; } = true;
```

This keeps runtime behavior explicit and reproducible across local runs, CI, and operations.
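For reference, a minimal `appsettings.json` consistent with these defaults might look like the following (the `App` section name is inferred from the `App:` keys referenced later in this issue, so treat the exact shape as an assumption):

```json
{
  "App": {
    "ModelContextWindowTokens": 850,
    "ReservedOutputTokens": 300,
    "FixedPromptTokens": 220,
    "Provider": "lmstudio",
    "BaseUrl": "http://localhost:1234/v1",
    "ApiKey": "not-needed",
    "ModelId": "deepseek/deepseek-r1-0528-qwen3-8b",
    "Temperature": 0.0,
    "EnableModelCall": true
  }
}
```

Any value can then be overridden per environment, e.g. `DCB_App__EnableModelCall=false`.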
Model Context as Typed Blocks
Business evidence is represented as typed domain objects with validation at construction time:
```csharp
public sealed class ContextBlock
{
    public ContextBlock(
        string blockId,
        string source,
        string content,
        int priority,
        DateTimeOffset observedAtUtc,
        bool isRequired = false)
    {
        if (priority is < 0 or > 100)
        {
            throw new ArgumentOutOfRangeException(nameof(priority), "Priority must be in range 0..100.");
        }

        // Remaining validation and property assignments omitted for brevity.
    }
}
```

The scenario data mixes customer message, order data, policy, history, and operations playbook as independent blocks with explicit priorities:
```csharp
new(
    blockId: "CUSTOMER_MESSAGE",
    source: "support-ticket",
    content: """
        Customer says the blender jar arrived cracked and delivery was 4 days late.
        Customer requests full refund and asks if return shipping is required.
        Order ID: ORD-884120.
        """,
    priority: 100,
    observedAtUtc: new DateTimeOffset(2026, 3, 13, 13, 18, 0, TimeSpan.Zero),
    isRequired: true),
new(
    blockId: "REFUND_POLICY",
    source: "policy-service",
    content: """
        Damaged-on-arrival items are eligible for full refund or replacement.
        Photo evidence is recommended but can be waived for first-time damage claims under $200.
        Late-delivery compensation is 10% store credit when delay exceeds 2 days.
        """,
    priority: 95,
    observedAtUtc: new DateTimeOffset(2026, 2, 21, 9, 0, 0, TimeSpan.Zero),
    isRequired: true),
```

This shape makes context auditable before it reaches the model.
Deterministic Packing Algorithm
Packing order is stable and explicit:
```csharp
var orderedCandidates = request.Candidates
    .OrderByDescending(static block => block.IsRequired)
    .ThenByDescending(static block => block.Priority)
    .ThenByDescending(static block => block.ObservedAtUtc)
    .ThenBy(static block => block.BlockId, StringComparer.Ordinal);
```

Each block is estimated, included if it fits, and excluded with a concrete reason if it does not:

```csharp
if (tokenCount <= remainingContextTokens)
{
    included.Add(new PackedContextBlock(block, tokenCount));
    remainingContextTokens -= tokenCount;
    continue;
}

var reason = block.IsRequired
    ? ExclusionReason.RequiredBlockTooLargeForBudget
    : ExclusionReason.ExceedsRemainingBudget;
excluded.Add(new ExcludedContextBlock(block, tokenCount, reason));
```

```csharp
public enum ExclusionReason
{
    ExceedsRemainingBudget = 1,
    RequiredBlockTooLargeForBudget = 2
}
```

Required evidence and optional evidence are treated differently, which is critical for safe behavior.
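To see why this ordering is input-order independent, here is a minimal sketch of the same ordering rule applied to plain tuples (the tuple shape is illustrative; the repo's `ContextBlock` carries more fields):

```csharp
using System;
using System.Linq;

// Illustrative candidate blocks from the refund scenario, deliberately out of order.
var blocks = new[]
{
    (BlockId: "ORDER_DATA", IsRequired: false, Priority: 90,
        ObservedAtUtc: new DateTimeOffset(2026, 3, 13, 12, 0, 0, TimeSpan.Zero)),
    (BlockId: "CUSTOMER_MESSAGE", IsRequired: true, Priority: 100,
        ObservedAtUtc: new DateTimeOffset(2026, 3, 13, 13, 18, 0, TimeSpan.Zero)),
    (BlockId: "REFUND_POLICY", IsRequired: true, Priority: 95,
        ObservedAtUtc: new DateTimeOffset(2026, 2, 21, 9, 0, 0, TimeSpan.Zero)),
};

string[] Order(IEnumerable<(string BlockId, bool IsRequired, int Priority, DateTimeOffset ObservedAtUtc)> candidates) =>
    candidates
        .OrderByDescending(b => b.IsRequired)       // required blocks first
        .ThenByDescending(b => b.Priority)          // then business priority
        .ThenByDescending(b => b.ObservedAtUtc)     // then recency
        .ThenBy(b => b.BlockId, StringComparer.Ordinal) // stable tie-break
        .Select(b => b.BlockId)
        .ToArray();

var ordered = Order(blocks);
var orderedFromReversed = Order(blocks.Reverse()); // same result regardless of input order

Console.WriteLine(string.Join(", ", ordered));
// CUSTOMER_MESSAGE, REFUND_POLICY, ORDER_DATA
```

Because every comparison key is deterministic and the final tie-break is an ordinal `BlockId` sort, shuffling the candidate list cannot change the packing order.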
Token Estimation Strategy
The estimator is intentionally heuristic and provider-agnostic.
It estimates tokens from character length and word count, adds a small newline penalty, then uses the larger estimate.
This is simple but deterministic, which keeps budgeting behavior stable and testable.
```csharp
public int EstimateTokens(string text)
{
    if (string.IsNullOrWhiteSpace(text))
    {
        return 0;
    }

    var normalized = text.Trim();
    var characterEstimate = (int)Math.Ceiling(normalized.Length / 4.0d);
    var wordEstimate = (int)Math.Ceiling(CountWords(normalized) * 1.2d);
    var newlinePenalty = CountNewlines(normalized);
    return Math.Max(1, Math.Max(characterEstimate, wordEstimate) + newlinePenalty);
}
```

The goal is consistency, not tokenizer-perfect precision.
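To make the heuristic concrete, here is a self-contained version of the same estimate. The repo's `CountWords` and `CountNewlines` helpers are not shown in the article, so whitespace-splitting and `'\n'` counting are assumptions for this sketch:

```csharp
using System;
using System.Linq;

// Assumption: words are whitespace-delimited.
static int CountWords(string s) =>
    s.Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries).Length;

// Assumption: newlines are counted as '\n' characters.
static int CountNewlines(string s) =>
    s.Count(c => c == '\n');

static int EstimateTokens(string text)
{
    if (string.IsNullOrWhiteSpace(text))
    {
        return 0;
    }

    var normalized = text.Trim();
    var characterEstimate = (int)Math.Ceiling(normalized.Length / 4.0d);
    var wordEstimate = (int)Math.Ceiling(CountWords(normalized) * 1.2d);
    var newlinePenalty = CountNewlines(normalized);
    return Math.Max(1, Math.Max(characterEstimate, wordEstimate) + newlinePenalty);
}

Console.WriteLine(EstimateTokens(""));                      // 0
Console.WriteLine(EstimateTokens("hello world"));           // 3 (max of ceil(11/4)=3 and ceil(2*1.2)=3)
Console.WriteLine(EstimateTokens("Order ID: ORD-884120.")); // 6 (ceil(21/4)=6 beats ceil(3*1.2)=4)
```

Because both inputs always yield the same estimate, two runs over the same candidate set always pack the same blocks.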
Prompt Composition Is Inspectable
The composer builds one deterministic artifact with instructions, task, and each included block plus metadata:
```csharp
builder.AppendLine("Context Blocks:");

foreach (var packed in includedBlocks)
{
    var block = packed.Block;
    builder.AppendLine($"### Block {block.BlockId}");
    builder.AppendLine($"Source: {block.Source}");
    builder.AppendLine($"Priority: {block.Priority}");
    builder.AppendLine($"ObservedAtUtc: {block.ObservedAtUtc:O}");
    builder.AppendLine($"Required: {block.IsRequired}");
    builder.AppendLine($"EstimatedTokens: {packed.TokenCount}");
    builder.AppendLine("Content:");
    builder.AppendLine(block.Content);
    builder.AppendLine();
}
```

This makes prompt assembly inspectable and debuggable without guessing what the model actually received.
Execution Path and Failure Modes
The runtime explicitly stops when required context does not fit:
```csharp
if (!result.CanProceed)
{
    Console.Error.WriteLine();
    Console.Error.WriteLine("Budgeting failed: one or more required business context blocks do not fit.");
    Console.Error.WriteLine("Increase App:ModelContextWindowTokens, reduce App:ReservedOutputTokens, or shorten required blocks.");
    return;
}
```

Model invocation is also an explicit switch:

```csharp
if (!config.EnableModelCall)
{
    Console.WriteLine();
    Console.WriteLine("Model invocation disabled (`App:EnableModelCall=false`).");
    Console.WriteLine("Set `App:EnableModelCall` in appsettings.json or `DCB_App__EnableModelCall=true`.");
    return;
}
```

If a model returns hidden reasoning tags, the response is cleaned before printing:

```csharp
const string openTag = "<think>";
const string closeTag = "</think>";
```

The pipeline prefers explicit failure and explicit modes over silent behavior changes.
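The article shows only the tag constants, so here is one possible cleaning sketch built on them. It assumes at most one `<think>...</think>` span emitted before the final answer; the repo's actual implementation may differ:

```csharp
using System;

// Sketch: strip a <think>...</think> reasoning span from a model response.
static string CleanResponse(string response)
{
    const string openTag = "<think>";
    const string closeTag = "</think>";

    var start = response.IndexOf(openTag, StringComparison.Ordinal);
    if (start < 0)
    {
        return response.Trim(); // no reasoning tags: pass through
    }

    var end = response.IndexOf(closeTag, start, StringComparison.Ordinal);
    if (end < 0)
    {
        // Unterminated tag: drop everything from the open tag onward.
        return response[..start].Trim();
    }

    return (response[..start] + response[(end + closeTag.Length)..]).Trim();
}

var raw = "<think>reason about the refund policy</think>Full refund approved for ORD-884120.";
Console.WriteLine(CleanResponse(raw)); // Full refund approved for ORD-884120.
```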
Test Coverage That Locks Behavior
The tests target deterministic guarantees, not only happy paths:
- Stable selection when candidate input order changes
- Required overflow detection with `CanProceed` and `HasRequiredOverflow`
- Exact accounting for available, used, and remaining context tokens
- Stable tie-breaking with `BlockId` when priority and timestamp are equal
- Prompt composer preserving included block order

```csharp
Assert.Equal(new[] { "A", "D", "C" }, result1.IncludedBlocks.Select(static b => b.Block.BlockId).ToArray());
Assert.Equal(result1.IncludedBlocks.Select(static b => b.Block.BlockId), result2.IncludedBlocks.Select(static b => b.Block.BlockId));
```

Final Notes
Deterministic context budgeting is a practical reliability layer for LLM systems. It ensures the model sees the right evidence in a controlled order within a fixed budget.
When required context is guaranteed, optional context is bounded, and prompt assembly is inspectable, behavior becomes easier to trust and debug.
Explore the source code at the GitHub repository.
See you in the next issue.
Stay curious.