Research

SLM Reasoning Layers

A new architecture for trustworthy AI — small language models as deterministic verification agents for large model systems.

Kapil Chandwani, ANRAK AI · March 2026

~100ms
Verification latency
50x
Cheaper than LLM-as-judge
7
Cop types supported
95%+
Target accuracy per cop
Abstract

Large Language Models hallucinate. They fabricate facts, ignore constraints, and confidently present fiction as truth. Current approaches — prompt engineering, RLHF, constitutional AI, RAG — all ask the model to police itself. This is fundamentally flawed.

We propose Small Language Models (500M–7B parameters) deployed as independent, specialized verification layers that monitor larger models in real time. These SLMs are trained on narrow tasks with high reliability, incapable of the creative deception that makes large models untrustworthy, and fast enough to operate at inference time.

This is a new class of neurosymbolic AI where the “symbolic” layer is itself a neural network — but one small enough and deterministic enough to function as a reliable reasoning primitive.

The Problem

LLMs lie in five predictable ways

These are not edge cases. They are systematic, emergent properties of how large neural networks process and generate language.

01

Tool Use Fabrication

Claims to have used a tool or searched a database when it hasn't

02

Source Attribution

"Based on the document..." — then generates content not in the source

03

Retroactive Reasoning

Generates the answer first, then constructs reasoning to justify it

04

Confident Uncertainty

Presents uncertain info with the same conviction as known facts

05

Constraint Performance

Performs compliance rather than achieving it — finds creative workarounds

Architecture

The SLM Cop Framework

Multiple small models run in parallel between your primary model and the end user. Each checks one property. A deterministic verdict engine aggregates their outputs.

User Input (query or prompt)
  ↓
Primary Model (8B–70B+ generates response)
  ↓
SLM Cop Squad — runs in parallel (~100ms):
  Grounding (1B model) · Domain (1B model) · Consistency (1B model) · Reasoning (3B model)
  ↓
Verdict Engine (deterministic aggregation)
  ↓
All pass → serve response
Any fail → regenerate with feedback
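The verdict engine needs no ML at all: it is an AND over the cops' outputs. A minimal sketch, assuming judgments arrive as dicts shaped like {"cop": ..., "pass": ..., "reason": ...} — an illustrative shape, not a fixed ANRAK schema:

```python
def aggregate_verdicts(judgments):
    """Deterministic aggregation: serve only if every cop passes.

    `judgments` is a list of dicts like
    {"cop": "grounding", "pass": bool, "reason": str} (assumed shape).
    """
    failures = [j for j in judgments if not j["pass"]]
    if not failures:
        return {"action": "serve"}
    # Any fail: hand the cops' reasons back to the primary model.
    feedback = "; ".join(f"{j['cop']}: {j['reason']}" for j in failures)
    return {"action": "regenerate", "feedback": feedback}
```

Because this step is plain code rather than another model call, the same judgments always produce the same verdict.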
Key Insight

Less capability, more reliability

Small models are better judges because they lack the capacity for creative deception. They can't construct elaborate lies — they can only check the one thing they were trained on.

🔒

Can't lie convincingly

A 500M model lacks the representational capacity to construct multi-layered, contextually appropriate deceptions. It checks a property and reports.

🎯

No sycophancy training

Never trained on conversations or human preferences. Has no concept of "pleasing the user." Trained only on (input, judgment) pairs.

🛡️

Resistant to prompt injection

Doesn't process input as "instructions" — processes it as features for classification. No instruction-following pathway to exploit.

Deterministic by specialization

All 500M parameters dedicated to one task. Converges to near-deterministic behavior — the same input produces the same output.

The Honesty Spectrum

500M–1B: 95%
1B–3B: 90%
3B–7B: 85%
7B–13B: 65%
30B–70B: 40%
70B+: 25%

Reliability on narrow verification tasks
Cop Taxonomy

Seven specialized verification agents

Each cop answers exactly one question. This specificity is what makes them reliable.

📋

Grounding Cop

1B–3B

Is every claim traceable to the provided context? Catches hallucinated facts, wrong attributions, subtle distortions.

🚧

Domain Constraint Cop

1B–3B

Does the response violate domain rules? Catches medical advice from receptionists, legal opinions from chatbots.

🔄

Consistency Cop

1B–3B

Does this contradict anything said before? Catches 'We close at 5' followed by 'Open until 8.'

🧠

Reasoning Cop

3B–7B

Does the chain-of-thought support the conclusion? Catches logical jumps and circular reasoning.

🔧

Tool Use Cop

500M–1B

Did the model actually call the tools it claims? Does the response match tool outputs?

📏

Instruction Cop

500M–1B

Does the response follow all system prompt constraints? Formatting, tone, length, behavior.

⚙️

Custom Cop

Any

Your own verification logic for domain-specific checks not covered by built-in types.
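Because each cop answers exactly one question, the six built-in cops above can be declared as plain data. A sketch — the class and field names are illustrative, mirroring the taxonomy rather than any fixed ANRAK schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CopSpec:
    name: str      # which property this cop checks
    question: str  # the single question it answers
    size: str      # suggested parameter range

COP_SQUAD = [
    CopSpec("grounding",   "Is every claim traceable to the provided context?", "1B–3B"),
    CopSpec("domain",      "Does the response violate domain rules?",           "1B–3B"),
    CopSpec("consistency", "Does this contradict anything said before?",        "1B–3B"),
    CopSpec("reasoning",   "Does the chain-of-thought support the conclusion?", "3B–7B"),
    CopSpec("tool_use",    "Did the model actually call the tools it claims?",  "500M–1B"),
    CopSpec("instruction", "Does the response follow all system prompt constraints?", "500M–1B"),
]
```

A custom cop is just one more entry in this list, backed by your own fine-tuned model.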

Dual Deployment

Guards training data and production responses

📊

At Generation Time

Cops verify each sample during dataset creation. Failed samples are regenerated with the cop's feedback — so your training data is clean before it enters the pipeline.

1. Teacher generates sample
2. Cops verify against rules + KB
3. Failed? Regenerate with feedback
4. Passed? Include in dataset
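The generation-time loop above can be sketched as follows. `generate_sample` and `run_cops` are hypothetical stand-ins for your teacher-model and cop calls, passed in as parameters:

```python
def build_clean_dataset(prompts, generate_sample, run_cops, max_attempts=3):
    """Keep only samples that pass every cop; regenerate failures with feedback."""
    dataset = []
    for prompt in prompts:
        feedback = None
        for _ in range(max_attempts):
            sample = generate_sample(prompt, feedback)  # teacher model call
            judgments = run_cops(sample)                # verify vs rules + KB
            failed = [j for j in judgments if not j["pass"]]
            if not failed:
                dataset.append(sample)                  # passed: include it
                break
            # Failed: retry with the cops' critique attached.
            feedback = "; ".join(j["reason"] for j in failed)
    return dataset
```

Samples that still fail after `max_attempts` are simply dropped, so nothing unverified enters the training pipeline.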
🚀

At Inference Time

In production, cops check every response before it reaches the user. If rejected, the model regenerates with the cop's critique. Critical failures return a safe fallback.

1. Model generates response
2. Cops verify in parallel (~100ms)
3. Failed? Up to 2 regeneration attempts
4. Verified response served to user
Comparison

How SLM Cops compare

Approach | Limitation | SLM Cops Advantage
Guardrails (NeMo, etc.) | Relies on expensive LLM calls for judgment | Fine-tuned small models — cheap, reliable, fast
Constitutional AI | Model evaluates itself — same biases | Independent external models — no correlation
Process Reward Models | Reasoning steps only, training time only | Any verifiable property, training + inference
Mixture of Agents | Large models collaborating — expensive | Small models verifying — 50x cheaper
Regex / Rule-based | Semantically blind — misses meaning | Understands 9 AM = 9:00 AM, catches subtlety
Implementation

How cops connect to any LLM

The cop system is a verification loop — not a modification to the LLM itself. It works with any model, any API, any framework.

1

Call LLM

Claude, GPT, or your fine-tuned model generates a response

2

Cops verify

Small models check the response in parallel (~100ms total)

All cops pass → serve the response to the user
Any cop fails → feed the cop's reason back to the LLM, regenerate, and loop back to Step 1

The cop is just another LLM call

A cop model is a small language model (1B–3B parameters) running on any inference server — Ollama locally, a GPU server, or a cloud endpoint. You send it the primary model's response plus context, and it returns a JSON judgment. That's it.

The entire pattern
def get_verified_response(messages, context, max_retries=2):
    response = call_primary_llm(messages)          # Claude, GPT, your model
    for _ in range(max_retries):
        failed = [j for j in (call_cop_model(cop, response, context)
                              for cop in cop_squad)
                  if not j["pass"]]
        if not failed:
            return response                        # all cops passed: serve it
        reasons = "; ".join(j["reason"] for j in failed)
        messages += [
            {"role": "assistant", "content": response},
            {"role": "user", "content": f"Rejected: {reasons}. Regenerate."},
        ]
        response = call_primary_llm(messages)      # regenerate with cop feedback
    return response                                # retries exhausted: serve or fall back

Where cops run

💻
Ollama (local)
Free, private, ~50ms. Run cops on your laptop alongside your app.
☁️
ANRAK API
Deploy cops on ANRAK, call via API. Same infrastructure as your primary model.
🖥️
Any GPU server
vLLM, TGI, or llama.cpp. Self-hosted on any cloud provider.
Serverless (Modal, Replicate)
Pay per call. Scales to zero when idle.

Build it with any tool

The orchestration is simple. The value is in the trained cop models — small models fine-tuned to reliably detect hallucinations, rule violations, and inconsistencies in your specific domain.

🔄

n8n / Make

Webhook → HTTP node (LLM) → HTTP nodes (cops in parallel) → IF node (verdict) → loop on failure. Visual, no code.

🔗

LangChain / CrewAI

Primary agent generates, cop agents verify. Orchestrator manages the feedback loop. Each cop is a tool or agent.

🐍

Raw Python / cURL

40 lines of code. Call LLM API, call cop API, check JSON, loop. Works in any language, any framework.
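For the raw-Python option, a single cop call is one HTTP request. A minimal sketch assuming an Ollama-style /api/chat endpoint; the cop model name and prompt wording are illustrative, and the transport is injectable so you can swap in any client:

```python
import json
from urllib.request import Request, urlopen

def http_post_json(url, payload):
    """POST a JSON payload and return the decoded JSON reply (stdlib only)."""
    req = Request(url, data=json.dumps(payload).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read().decode())

def ask_cop(response_text, context, model="grounding-cop-1b",
            url="http://localhost:11434/api/chat", post=http_post_json):
    """Send the primary model's response + context to a cop; return its verdict.

    Assumes a cop fine-tuned to answer with {"pass": ..., "reason": ...} JSON.
    """
    prompt = (f"CONTEXT:\n{context}\n\nRESPONSE:\n{response_text}\n\n"
              'Reply with JSON only: {"pass": true|false, "reason": "..."}')
    reply = post(url, {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,   # single reply, not a token stream
        "format": "json",  # Ollama: constrain output to valid JSON
    })
    return json.loads(reply["message"]["content"])
```

Call the LLM, call `ask_cop` for each cop, check `["pass"]`, loop on failure — the same pattern works from cURL, TypeScript, or any other language that can POST JSON.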

“A genius who sometimes lies needs a simple, honest cop. Not a smarter genius.”

The path to trustworthy AI runs through building smaller, less capable models and ensuring they behave — then using them to police the large ones.

Build trustworthy AI today

Train your own verification models on ANRAK AI. Your domain expertise becomes an executable guardrail.