Public records retention is the least glamorous problem in government IT and one of the few that can actually end up in court. Every record an agency holds has a statutory clock on it. Classify it wrong and you either destroy something you were legally required to keep, or you hoard sensitive material long past the point you were allowed to. Automated retention classification in government is usually pitched as a job for Copilot. For the sensitive corpus, that is the wrong answer. Here is why, and what I built instead.

Retention Is a Legal Problem Wearing an IT Costume

In Washington, the destruction and preservation of public records runs through RCW 40.14, and the disposition of any given record type is governed by a retention schedule issued by the State Archives, most commonly the Local Government Common Records Retention Schedule. Each record category carries a Disposition Authority Number, a minimum retention period, and a cutoff trigger that starts the clock. Separately, RCW 42.56, the Public Records Act, governs what has to be disclosed. It also bars destroying a record that is the subject of a pending request, even if that record is already scheduled for destruction. A litigation hold freezes records the same way. Either one overrides a lapsed retention period.

So classification is not “what is this document about.” It is “which schedule rule governs this document, what is the minimum retention, and is anything currently freezing it in place.” That is a legal mapping with real consequences, and it is exactly the kind of judgment a generic chatbot is happy to fake.
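Those three questions map onto a small data structure. The sketch below is a minimal illustration, not the schedule's actual format. It assumes three things:

- a rule can be reduced to a DAN, a minimum retention in whole years, and a trigger;
- the trigger has already been resolved to a concrete cutoff date;
- the DAN values, field names, and hold label are hypothetical.

Real rules carry more nuance than whole years from a date. The point the sketch shows is ordering. A hold or pending Public Records Act request is checked before any retention arithmetic, so a lapsed clock never overrides a freeze.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionRule:
    dan: str             # Disposition Authority Number (hypothetical values below)
    title: str
    min_years: int       # minimum retention, simplified to whole years
    cutoff_trigger: str  # the event that starts the clock, kept as text here


def add_years(d: date, years: int) -> date:
    """Add whole calendar years, moving Feb 29 to Feb 28 when the target year is not a leap year."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def eligible_for_destruction(rule: RetentionRule, cutoff_date: date, today: date,
                             active_holds: set[str]) -> tuple[bool, str]:
    """Check holds first, then whether the minimum retention has run from the
    cutoff date. A hold always wins, even over a lapsed retention period."""
    if active_holds:
        return False, "frozen by: " + ", ".join(sorted(active_holds))
    earliest = add_years(cutoff_date, rule.min_years)
    if today < earliest:
        return False, f"retention under {rule.dan} runs until {earliest.isoformat()}"
    return True, f"retention under {rule.dan} lapsed on {earliest.isoformat()}"


rule = RetentionRule("HYPOTHETICAL-DAN-001", "Routine correspondence (example)", 2,
                     "end of calendar year")
print(eligible_for_destruction(rule, date(2021, 12, 31), date(2024, 3, 1), set()))
# (True, 'retention under HYPOTHETICAL-DAN-001 lapsed on 2023-12-31')
print(eligible_for_destruction(rule, date(2021, 12, 31), date(2024, 3, 1),
                               {"PRA request (hypothetical)"}))
# (False, 'frozen by: PRA request (hypothetical)')
```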

Why the Cloud LLM Is the Wrong Tool for the Sensitive Corpus

Plenty of records can go to a cloud model inside the boundary, and that is fine. But the corpus that needs classifying most urgently is usually the one you least want leaving the building: investigative files, sealed material, personnel records, anything PII-heavy or with a confidentiality obligation attached. For that content, “send it to a hosted model and trust the data handling addendum” is a conversation an agency records officer should not have to win.

There is also a quieter problem. A general-purpose large language model asked to assign a retention code will produce one whether or not it actually knows the schedule. It will sound confident and may cite a Disposition Authority Number that does not exist. In a domain where the classification is legally load-bearing, a plausible wrong answer is worse than no answer, because someone will act on it.

In retention, a confident wrong answer is worse than no answer. Someone destroys a record on the strength of it.

The Architecture: Local SLM Plus Vector Retrieval Over the Schedule

The pattern I built keeps inference on-prem and grounds every decision in the actual schedule. A small language model runs locally, on agency hardware, so no record content ever leaves the environment. Alongside it, an embedding model builds a vector index of two things: the retention schedule itself, broken into individual rules, and a set of already-classified exemplar documents.
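Below is a minimal sketch of that index, assuming each schedule rule and each exemplar becomes one entry.

The `embed` function is a deliberately crude stand-in that uses hashed word counts, so the example runs on the standard library alone. In the pipeline described above, it would be a call to the locally hosted embedding model. `VectorIndex`, `IndexEntry`, and the DAN strings are hypothetical names.

Each exemplar is stored with the DAN a specialist already assigned to it. A close exemplar match therefore still points back to a specific rule.

```python
import math
import re
import zlib
from dataclasses import dataclass


def embed(text: str, dim: int = 256) -> list[float]:
    """Toy stand-in for a locally hosted embedding model: hashed word counts,
    L2-normalized so the dot product of two vectors is their cosine similarity."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class IndexEntry:
    kind: str             # "rule" for a schedule rule, "exemplar" for a classified document
    dan: str              # the rule's DAN, or the DAN a specialist assigned to the exemplar
    text: str
    vector: list[float]


class VectorIndex:
    """Brute-force cosine search over schedule rules and exemplars."""

    def __init__(self) -> None:
        self.entries: list[IndexEntry] = []

    def add(self, kind: str, dan: str, text: str) -> None:
        self.entries.append(IndexEntry(kind, dan, text, embed(text)))

    def search(self, query: str, k: int = 5) -> list[tuple[float, IndexEntry]]:
        q = embed(query)
        scored = [(sum(a * b for a, b in zip(q, e.vector)), e) for e in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


index = VectorIndex()
index.add("rule", "HYPOTHETICAL-DAN-A",
          "Employee personnel files, performance evaluations, disciplinary actions")
index.add("rule", "HYPOTHETICAL-DAN-B", "Purchase orders, invoices and vendor payment records")
index.add("exemplar", "HYPOTHETICAL-DAN-B",
          "Invoice 4471 from vendor for office supplies, payment approved")
for score, entry in index.search("vendor invoice payment for printer toner", k=2):
    print(f"{score:.2f} {entry.kind} {entry.dan}")
```

Brute-force search is a reasonable starting point for a schedule of a few thousand rules. A large exemplar set would call for an approximate nearest-neighbor index instead.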

For each incoming document, the system embeds the content, retrieves the handful of schedule rules and prior examples that are semantically closest, and hands that bundle to the local model along with the document. The model is not asked to recall the schedule from memory. It is asked to reason over the specific rules placed in front of it and pick the one that governs, returning the Disposition Authority Number, the minimum retention, the cutoff trigger, and a short rationale. When the retrieved rules conflict or none of them fit cleanly, the document is routed to a records specialist instead of being force-fit into a category. The model’s job is to be right or to admit it is not sure, not to always have an answer.
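The classification step can be sketched as follows. It assumes retrieval has already produced a short list of candidate rules with similarity scores. `local_model` is a placeholder for whatever callable wraps the on-prem SLM, taking a prompt and returning text.

The following are assumptions for illustration:

- the JSON reply format;
- the `min_score` threshold of 0.35;
- the DAN strings and retention values in the stub.

The part that matters is validation. A reply is accepted only if it names a DAN that was actually among the retrieved rules. Everything else goes to a specialist instead of being forced into a category: a weak retrieval, unparseable output, or an explicit decline when rules conflict.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    dan: str          # Disposition Authority Number of a retrieved rule
    rule_text: str
    score: float      # retrieval similarity, assumed to be on a 0..1 scale


@dataclass
class Decision:
    status: str                  # "recommended" or "needs_specialist"
    dan: Optional[str]
    min_retention: Optional[str]
    cutoff_trigger: Optional[str]
    rationale: str


PROMPT = """You classify public records against a retention schedule.
Use ONLY the rules listed below. If none of them clearly governs, reply with "dan": null.
Reply with JSON only: {{"dan": ..., "min_retention": ..., "cutoff_trigger": ..., "rationale": ...}}

Rules:
{rules}

Document:
{document}
"""


def build_prompt(document: str, candidates: list[Candidate]) -> str:
    rules = "\n".join(f"- [{c.dan}] {c.rule_text}" for c in candidates)
    return PROMPT.format(rules=rules, document=document)


def classify(document: str, candidates: list[Candidate],
             local_model: Callable[[str], str], min_score: float = 0.35) -> Decision:
    """Ask the local model to choose among retrieved rules; route to a person on any doubt."""
    def route(reason: str) -> Decision:
        return Decision("needs_specialist", None, None, None, reason)

    if not candidates or max(c.score for c in candidates) < min_score:
        return route("no retrieved rule is a close enough match")
    try:
        answer = json.loads(local_model(build_prompt(document, candidates)))
    except json.JSONDecodeError:
        return route("model output was not valid JSON")
    if not isinstance(answer, dict):
        return route("model output was not a JSON object")
    dan = answer.get("dan")
    if dan is None:
        return route("model declined: " + str(answer.get("rationale", "")))
    if not isinstance(dan, str) or dan not in {c.dan for c in candidates}:
        # Citation-bound: a DAN the retriever never supplied is treated as invented.
        return route(f"model cited {dan!r}, which was not among the retrieved rules")
    return Decision("recommended", dan, answer.get("min_retention"),
                    answer.get("cutoff_trigger"), str(answer.get("rationale", "")))


# Usage with a stub standing in for the on-prem SLM.
candidates = [Candidate("HYPOTHETICAL-DAN-B", "Invoices and vendor payment records", 0.62),
              Candidate("HYPOTHETICAL-DAN-A", "Employee personnel files", 0.18)]


def stub(prompt: str) -> str:
    return json.dumps({"dan": "HYPOTHETICAL-DAN-B", "min_retention": "6 years",
                       "cutoff_trigger": "end of fiscal year",
                       "rationale": "Vendor invoice with payment approval."})


print(classify("Invoice 4471, office supplies", candidates, stub).status)  # recommended
```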

The leverage here is real. As a capability benchmark, this kind of pipeline saves on the order of 33,000 staff hours per million documents against manual classification. The point is not that it replaces the records officer. It is that it turns an impossible manual backlog into a review queue the officer can actually clear.
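Restated per document, the figure is easier to compare with an agency's own measured handling time. The arithmetic below uses only the 33,000-hours-per-million number above and introduces no other data.

```latex
\frac{33{,}000\ \text{staff hours}}{1{,}000{,}000\ \text{documents}}
  = 0.033\ \frac{\text{h}}{\text{doc}}
  = 0.033 \times 60\ \frac{\text{min}}{\text{doc}}
  = 1.98\ \frac{\text{min}}{\text{doc}} \approx 2\ \frac{\text{min}}{\text{doc}}
```

About two minutes per record is the number to check against how long a records specialist actually spends classifying one by hand.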

Citation-Bound by Design

Defensible destruction is the whole game. When an agency disposes of a record, it has to be able to show, later, that the record reached the end of its lawful retention and was not under any hold at the time. A classification with no reasoning attached cannot support that. A classification that names the exact schedule rule it applied, the retention period that followed from it, and the evidence it used, can.
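One way to make that trail concrete is to serialize each classification as a self-describing record at the moment it is made. This is a sketch, not a records-management standard, and the field names are hypothetical.

Hashing the content instead of copying it into the log is an assumed design choice. The entry can still prove which bytes were classified without duplicating sensitive material.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_entry(record_id: str, content: bytes, dan: str, min_retention: str,
                cutoff_trigger: str, rationale: str, evidence_ids: list[str],
                active_holds: list[str], model_id: str) -> str:
    """Serialize one classification as a JSON audit record."""
    entry = {
        "record_id": record_id,
        # Hash, not copy: proves which bytes were classified without duplicating them.
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "applied_rule": {"dan": dan, "min_retention": min_retention,
                         "cutoff_trigger": cutoff_trigger},
        "rationale": rationale,
        "evidence": sorted(evidence_ids),      # IDs of the retrieved rules and exemplars
        "active_holds": sorted(active_holds),  # empty means none were found when checked
        "model_id": model_id,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(entry, sort_keys=True)


print(audit_entry("REC-000123", b"...document bytes...", "HYPOTHETICAL-DAN-B", "6 years",
                  "end of fiscal year", "Vendor invoice with payment approval.",
                  ["rule:HYPOTHETICAL-DAN-B", "exemplar:4471"], [], "local-slm"))
```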

That is why the system is retrieval-grounded and citation-bound rather than generative free-text. Every recommendation carries its receipts. The result is audit-ready by construction, which is a very different claim than “it never makes a mistake.” It will. The design assumption is that it will, which is why a human approves disposition and why low-confidence calls never reach the destruction queue on their own.
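The approval rule in that paragraph can be expressed as a triage step in front of the destruction queue. This minimal sketch assumes three things:

- the records passed in have already reached the end of their retention period;
- the pipeline emits a confidence score between 0 and 1;
- hold status is looked up fresh at disposition time rather than reused from classification.

`Recommendation`, `triage_for_disposition`, and the 0.9 threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recommendation:
    record_id: str
    dan: str
    confidence: float           # pipeline confidence, assumed to be on a 0..1 scale
    approved_by: Optional[str]  # records officer who signed off, or None


def triage_for_disposition(recs: list[Recommendation],
                           active_holds: dict[str, list[str]],
                           min_confidence: float = 0.9) -> dict[str, list[str]]:
    """Sort retention-expired records into buckets. Checks run in precedence order:
    a hold beats everything, low confidence goes to a specialist, and only approved,
    high-confidence, hold-free records reach the destruction queue."""
    buckets: dict[str, list[str]] = {"on_hold": [], "specialist_review": [],
                                     "awaiting_approval": [], "destruction_queue": []}
    for r in recs:
        if active_holds.get(r.record_id):
            buckets["on_hold"].append(r.record_id)
        elif r.confidence < min_confidence:
            buckets["specialist_review"].append(r.record_id)
        elif r.approved_by is None:
            buckets["awaiting_approval"].append(r.record_id)
        else:
            buckets["destruction_queue"].append(r.record_id)
    return buckets


recs = [Recommendation("R1", "HYPOTHETICAL-DAN-B", 0.97, "records.officer"),
        Recommendation("R2", "HYPOTHETICAL-DAN-B", 0.55, "records.officer"),
        Recommendation("R3", "HYPOTHETICAL-DAN-A", 0.95, None),
        Recommendation("R4", "HYPOTHETICAL-DAN-B", 0.99, "records.officer")]
print(triage_for_disposition(recs, {"R4": ["litigation hold"]}))
```

Note that R2 stays in specialist review despite carrying an approval. In this sketch, a sign-off on a low-confidence call does not substitute for a person actually reclassifying it.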

Where GCC and On-Prem Meet

Keeping inference local does not mean ignoring the rest of the compliance posture. The same control objectives that govern an agency’s GCC (Government Community Cloud) footprint apply to the on-prem pipeline: access control, audit logging, data handling aligned to CMMC and NIST SP 800-171 objectives. The local model is a deliberate architectural choice for the most sensitive corpus, not an escape hatch from governance. For everything that can safely live in the cloud boundary, it should. For everything that cannot, this is how you still get the automation without the exposure.
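A small guard can make the cloud/on-prem split explicit in code rather than in policy alone. The sketch assumes records already carry a sensitivity label. The label names, the rule that only `public` and `internal` may use the cloud endpoint, and both endpoint URLs are assumptions.

It fails closed in two ways. Any label it does not recognize goes to the local model. The local endpoint is accepted only if it is a literal loopback or private IP address.

```python
import ipaddress
from urllib.parse import urlsplit

CLOUD_OK = {"public", "internal"}  # hypothetical labels allowed inside the cloud boundary


def is_on_prem(endpoint: str) -> bool:
    """True only if the endpoint host is a literal loopback or private IP address.
    Hostnames are rejected rather than trusted, since DNS could point them anywhere."""
    host = urlsplit(endpoint).hostname
    try:
        addr = ipaddress.ip_address(host or "")
    except ValueError:
        return False
    return addr.is_loopback or addr.is_private


def pick_backend(sensitivity: str, local_endpoint: str, cloud_endpoint: str) -> str:
    """Route by sensitivity label; unknown labels are treated as sensitive."""
    if sensitivity in CLOUD_OK:
        return cloud_endpoint
    if not is_on_prem(local_endpoint):
        raise RuntimeError(f"refusing to send {sensitivity!r} content to {local_endpoint}")
    return local_endpoint


print(pick_backend("personnel", "http://10.20.0.5:8000/v1", "https://gcc.example/v1"))
```

Rejecting hostnames outright is strict; even `localhost` fails the check. A deployment could relax that by resolving the name and checking the resulting address, at the cost of trusting DNS.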

One disclaimer worth stating plainly: none of this is legal advice, and the agency’s records officer owns the schedule and the final disposition decision. The system makes that person faster and gives them a defensible trail. It does not replace their authority.

Who Built This

I am Jacob, a U.S. Navy veteran and the engineer behind Puget Sound AI, a veteran-owned small business (VOSB) building AI and automation for government environments. The records pipeline above is the kind of system I design, build, and document myself, so your staff owns it after handoff. If you are sitting on a retention backlog that nobody has the headcount to touch, there is a way through it that keeps the sensitive material on your own hardware. Let’s talk.

Automating Public-Records Retention Classification Without Touching the Cloud: Local SLM + Vector Retrieval