What it is

Assessment without assumptions

Most documents get evaluated informally. Someone reads them, forms a view, and either approves them or sends them back with comments. That process works, up to a point. What it rarely produces is a consistent, repeatable assessment that separates what is genuinely strong from what merely looks polished.

The Document Quality Evaluator changes that. It provides a structured framework for assessing a document against eight defined quality dimensions, producing a scored evaluation that is honest, specific, and grounded in what the document actually contains, not in assumptions about how it was made or who made it.

The framework is designed to work without access to the development process. Whether you are the document's author reviewing your own work before sharing it, a reviewer assessing a document you have been handed, or an independent assessor with no knowledge of how it was produced, the evaluation draws solely from what the document reveals about itself.

Author self-review

You produced the document and want to assess it on its own terms, setting aside what you know about how it was developed. This is the most common use case before submission, sharing, or publication.

Assigned reviewer

You have been given the document to assess. You may or may not have development context. The framework allows you to choose whether to use that context or evaluate in isolation, and records which approach was taken.

Independent assessor

You have no development knowledge and may be assessing whether AI was involved in the document's production. In this position, the framework's AI Voice measure operates as a detection and characterisation tool rather than a quality score.


The framework

Eight dimensions of quality

Each dimension targets a distinct aspect of quality. Every dimension is scored on a five-band scale from Insufficient to Exemplary.

Dimension 01
Fit to Context

Does the document communicate its own purpose, audience, and constraints, and then consistently serve them?

Dimension 02
Evidence and Grounding

Are claims supported by evidence? Are specific assertions appropriately qualified where verification is not demonstrated?

Dimension 03
Analytical Depth

Given what this document appears to set out to do, does it contain the level of analytical work that purpose requires?

Dimension 04
Purposeful Structure

Does the structure serve the reader, guiding them efficiently toward understanding or decision, or does it impose form without function?

Dimension 05
Appropriate Register

Is the document's voice, formality, and language level consistent, coherent, and appropriate to its implied audience?

Dimension 06
Critical Integrity

Does the document say what its evidence actually supports? Are claims proportionate, uncertainty acknowledged, and conclusions honestly drawn?

Dimension 07
Internal Consistency

Does the document hold together as a coherent whole? Are claims consistent across sections, and does the conclusion follow from the argument made?

Dimension 08
Completeness against Evident Purpose

Does the document actually do what it sets out to do? Assessed against the purpose the document reveals about itself, does it arrive where it should?


How it works

Structure, modes, and scoring

Before you begin

Seven opening questions

Every evaluation begins with seven questions. They establish your relationship to the document, whether AI involvement is known, the document's type and developmental stage, what context is available beyond the document itself, and whether you want a full or light-touch review.

The answers calibrate how the framework is applied and are recorded in the evaluation report.

Operating modes

Inference and informed

The framework operates in one of two modes, determined by how much context is available. In inference mode, all scores are based solely on what the document reveals. In informed mode, contextual information you provide is used alongside the document.

Both modes are rigorous. The report always declares which was applied.

Scoring

The five-band scale

Scores reflect what is actually present, not what was intended.

Insufficient 0–2
Partial 3–4
Adequate 5–6
Capable 7–8
Exemplary 9–10

Adequate means the document has passed a minimum threshold and nothing more. Capable is genuinely strong work that falls short of excellence. Exemplary means nothing more could reasonably be asked of the document at this standard.
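The band boundaries above can be sketched as a simple lookup. This is an illustrative sketch only, not part of the framework itself; the band ranges come from the scale above, while the function and constant names are hypothetical.

```python
# Hypothetical sketch of the five-band scale described above.
# Band ranges are taken from the framework's scale; names are illustrative.

BANDS = [
    (0, 2, "Insufficient"),
    (3, 4, "Partial"),
    (5, 6, "Adequate"),
    (7, 8, "Capable"),
    (9, 10, "Exemplary"),
]

def band_for_score(score: int) -> str:
    """Return the quality band for an integer score from 0 to 10."""
    for low, high, label in BANDS:
        if low <= score <= high:
            return label
    raise ValueError(f"score out of range: {score}")
```

Under this sketch, the sample dimension score of 7/10 shown later on this page falls in the Capable band, matching the report excerpt.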


Evaluate a document

Two ways to evaluate

Choose the evaluation route that suits your purpose. Both routes produce a scored evaluation across all eight dimensions.

Self-assess
Evaluate it yourself

Work through all eight dimensions using the structured evaluation form. Select a score for each dimension, add notes where useful, and download a formatted PDF report at the close. No prior experience of the framework is required; the form provides the structure while you bring the judgement.

Open access, no login required
Start self-assessment
AI-assessed
Let the AI evaluate it

Provide your document and answer the seven opening questions. The AI applies the full framework, scores each dimension against the defined criteria, and returns a complete evaluation report. Available as a conversational evaluation or as a formatted PDF record.

Access requires authentication
AI assessment

See it in use

An evaluation in practice

Dimension 1 of 8
Fit to Context
Does the document serve its stated purpose for its intended audience and context?
7/10
Capable
Strengths

The document's purpose is clear and consistently served throughout: the argument that assessment needs redesign rather than restriction is signalled in the introduction and carried to the conclusion without deviation. Section headings are well-chosen and guide the reader efficiently. Author attribution and date are present. The implied audience — educated professionals in education, training, or policy — is coherently and consistently addressed. The public webspace context is well-served by the accessible but intellectually substantive register.


Weaknesses

The document carries no explicit framing statement of what it is: no genre label (think piece, position paper, commentary) and no stated audience. For a self-sufficient contextual artefact, this is a mild but real gap. The title signals a more radical departure than the conclusion ultimately delivers — the final paragraph explicitly dials back to "a direction of travel", creating a minor but noticeable tension between headline promise and substantive delivery.


Development area

Add a brief framing sentence at the outset — two lines would suffice — that locates the document as a position paper or think piece and identifies its intended readership. This resolves the self-sufficiency gap without altering the document's voice or argument.

Visibility note

Informed mode. Context used: intended for personal webspace as public-facing thought leadership. This informs the audience inference; without it, the implied audience would still be broadly inferable from the document's register and reference set but with slightly less precision.

Holistic Measure
AI Voice — Quality Assessment
Is AI contribution well-integrated, purposeful, and governed by consistent human direction?
7/10
Capable
Assessment

AI involvement is known. The document carries clear and consistent signals of human authorship governing the intellectual core. The "misdiagnosis" framing is a distinctive analytical position — not a repackaged AI-generated commentary. The decision to challenge assessment orthodoxy rather than simply assess AI's impact on it reflects genuine domain knowledge and intellectual positioning. The acknowledgement that the proposition is "not a finished model" reads as authentic authorial restraint rather than a formulaic hedge.

AI contribution is visible in places. The parallel three-part structure of the Implications section is a common AI organisational pattern. Several transitional sentences use AI-characteristic construction. The Risks section is relatively underdeveloped compared to the Proposition section — a pattern consistent with AI generating the positive case more fully than the critical countercase.

Overall characterisation: the document reads as genuine collaborative work in which AI has contributed to structure, some prose generation, and section scaffolding, but human direction is clearly governing throughout. The argument, the diagnostic framing, and the intellectual positioning are distinctively authored. AI contribution is purposeful and does not displace human thinking.


Development area

The most visible seam between AI and human contribution is in the Risks section, where the analytical depth falls below the level sustained in the Proposition section. Developing the equity paragraph and the qualification framework constraint with more specific analytical content would reduce this seam and produce a document in which the human voice governs even more consistently across all sections.

Overall Summary
Three findings + priority
01
The diagnostic framing and central proposition are the document's distinguishing qualities. The argument that assessment was already under strain before AI arrived, and that AI makes this more visible rather than introducing new weakness, is an analytically sharp and genuinely distinct position. These two elements place the document above most commentary in this space.
02
Evidence and Grounding is the document's most significant vulnerability. Multiple consequential empirical assertions are stated without citation or qualification. For a final public-facing piece by a doctoral-qualified practitioner, this gap is disproportionate to the document's intellectual ambitions. The reference set of three is lean.
03
The Risks section is structurally underdeveloped relative to the Proposition section, creating an asymmetry that slightly undermines critical integrity. The equity paragraph in particular identifies a significant concern and then resolves it with a rhetorical framing rather than analytical engagement.
Priority
Strengthen evidentiary grounding for key empirical assertions and develop the equity paragraph in Risks to analytical rather than rhetorical engagement. These two changes would move the document from a strong Adequate overall to a consistent Capable.

This excerpt covers two of nine measures. The full report includes all eight dimensions, the holistic AI Voice assessment, a complete score summary, and the full evaluation record.

Download full evaluation report