Testing & Assessment Suite

A test score is only as good as the items behind it.

ReliCheck helps teachers, instructional designers, and certification authors check whether each test item is doing the work it should. Item difficulty, discrimination, distractor patterns, skill alignment, and reliability appear automatically, so the scores you report are defensible to learners, families, administrators, and certifying bodies.

What a score alone can't tell you

Four questions a test score alone can't answer.

Every test score gets questioned eventually: by the learner who missed an item, the family asking for clarity, the administrator preparing a report, or the certifying body reviewing the program. The questions are predictable. The answers live in the item analysis.

Asked by the teacher

Did the test actually measure what we taught?

ReliCheck answers

Skill tag rollups show per-standard performance, so you see which objectives the test actually measured and which students mastered each one. The score becomes diagnostic, not just summative.

Asked by the parent

Why did my child miss this item?

ReliCheck answers

Per-item analysis shows whether the missed item was hard for everyone (a sign it may be poorly worded) or specifically hard for that student (a sign of where to focus). Distractor analysis shows whether a confusing wrong answer drew students who otherwise knew the material.

Asked by the principal or department

Did learners actually grow this year?

ReliCheck answers

Pre/post item analysis pairs effect sizes with the Reliable Change Index and labels each score lift as likely learning gain, possible measurement drift, or interpret with caution. The interpretation comes with the numbers that support it.

Asked by the certifying body or accreditor

How do you know each item is doing its job?

ReliCheck answers

Difficulty and discrimination for scored items, with distractor utilization for multiple-choice items. Flags identify items that are too easy, too hard, miskeyed, or have dead distractors. The report exports in a format external reviewers expect.

The best test next year is the one this year's items just told you to build.

From a score to a stronger test

How an item analysis turns this year's test into next year's.

Most platforms produce a score and stop. ReliCheck reads the score the way a measurement specialist would and surfaces the items that earned a place on the next administration, the items that need revision, and the items that should be retired.

Stage 1

Score collected

A test gets administered. Items are scored against the answer key. The class score arrives, but without item analysis the score is just a number, with no story about which items did the measuring.

Stage 2

Item analysis runs

Per-item difficulty, discrimination, and distractor usage are computed automatically. Skill tag rollups show per-standard performance. Reliability across the test is reported with KR-20 alongside alpha.

Stage 3

Revised test

Each item is flagged keep, revise, or drop. The next administration starts from a stronger item bank, and the score becomes evidence you can explain, not just a number someone might dispute.

In the platform

Item analysis built into every administration.

The checks a measurement specialist would run by hand, computed automatically the moment scored responses arrive. The reports come out in the formats teachers, departments, districts, and certifying bodies expect.

Score and item analysis
Difficulty and discrimination

Per-item p-values and point-biserial

Difficulty (the proportion of students answering correctly) is labeled easy, medium, or hard. Discrimination (the point-biserial correlation between the item and the total score) flags items where stronger students did worse than weaker students, which usually points to a miskey or a confusing prompt.
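For readers who want to see the arithmetic, here is a minimal Python sketch of the two textbook statistics named above. It is an illustration, not ReliCheck's implementation: the function names are hypothetical, items are assumed to be scored 0/1, and the total score is assumed to include the item itself.

```python
from statistics import mean, pstdev


def item_difficulty(item_scores):
    """Proportion of students who answered the item correctly (0/1 scores)."""
    return mean(item_scores)


def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item and the total score.

    Assumption: total_scores includes the item itself (uncorrected form).
    """
    p = mean(item_scores)
    sd = pstdev(total_scores)  # population SD of total scores
    if p in (0, 1) or sd == 0:
        return 0.0  # undefined when there is no variation; treated as 0 here
    mean_correct = mean(t for i, t in zip(item_scores, total_scores) if i == 1)
    mean_incorrect = mean(t for i, t in zip(item_scores, total_scores) if i == 0)
    return (mean_correct - mean_incorrect) / sd * (p * (1 - p)) ** 0.5


# Five students: the three highest scorers got the item right.
item = [1, 1, 1, 0, 0]
totals = [5, 4, 3, 2, 1]
print(item_difficulty(item))         # proportion correct
print(point_biserial(item, totals))  # positive: stronger students did better
```

A negative result means students who got the item right scored lower overall than students who got it wrong, which is the pattern described above as a likely miskey or confusing prompt.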

Distractor analysis

Per-option utilization and dead-distractor flags

For each multiple-choice item, ReliCheck reports how many students chose each option. Distractors no one picked are flagged as candidates to rewrite or drop, since they are not contributing to measurement.
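The option counts described here are simple to reproduce by hand. The sketch below is an assumption about how such a tally could look, not ReliCheck's code; `distractor_report` and its arguments are hypothetical names.

```python
from collections import Counter


def distractor_report(responses, options, key):
    """Share of students choosing each option, plus distractors nobody chose.

    responses: chosen option per student, e.g. ["A", "C", ...]
    options:   all options offered for the item, e.g. "ABCD"
    key:       the correct option
    """
    counts = Counter(responses)
    n = len(responses)
    usage = {opt: (counts[opt] / n if n else 0.0) for opt in options}
    # A "dead" distractor is a wrong option that no student selected.
    dead = [opt for opt in options if opt != key and counts[opt] == 0]
    return usage, dead


usage, dead = distractor_report(["A", "A", "C", "D", "A", "C"], "ABCD", key="A")
print(usage)  # share per option
print(dead)   # options to consider rewriting or dropping
```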

Reliability

KR-20, KR-21, alpha, and split-half

Reliability for each test is reported across the standard estimators, with item-if-deleted analyses that flag the items pulling reliability down. Useful when a department, district, or certifying body asks for the number.
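As an illustration of what sits behind the headline number, here is a minimal sketch of the standard KR-20 formula and an item-if-deleted pass. It assumes 0/1 item scores and population variances, and the function names are hypothetical; it is not a description of ReliCheck's internals. Under those assumptions KR-20 and Cronbach's alpha give the same value, because the variance of a 0/1 item is p(1 - p).

```python
from statistics import mean, pvariance


def kr20(matrix):
    """KR-20 for a list of student rows, each a list of 0/1 item scores."""
    k = len(matrix[0])                      # number of items
    totals = [sum(row) for row in matrix]   # total score per student
    var_total = pvariance(totals)
    if k < 2 or var_total == 0:
        raise ValueError("need at least 2 items and some score variation")
    # Sum of p * (1 - p) across items, where p is the proportion correct.
    sum_pq = 0.0
    for j in range(k):
        p = mean(row[j] for row in matrix)
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / var_total)


def kr20_if_deleted(matrix):
    """KR-20 recomputed with each item removed in turn."""
    k = len(matrix[0])
    return [kr20([row[:j] + row[j + 1:] for row in matrix]) for j in range(k)]


scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(scores))             # reliability of the full test
print(kr20_if_deleted(scores))  # one value per removed item
```

An item whose removal raises the estimate above the full-test value is a candidate for the "pulling reliability down" flag.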

Learning and reliability context
Skill rollups

Per-standard and per-objective performance

Tag each item to a skill, standard, or objective inside the builder. Reports group performance by skill, so the conversation shifts from "what was the score?" to "which standards did students master?"
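Grouping by tag is a small aggregation step. The sketch below shows one way it could work; the data shapes and the name `skill_rollup` are assumptions made for illustration, not the platform's actual data model.

```python
from collections import defaultdict


def skill_rollup(item_skills, responses):
    """Proportion correct per skill tag.

    item_skills: {item_id: skill_tag}
    responses:   one {item_id: 0 or 1} dict per student
    """
    correct = defaultdict(int)
    attempts = defaultdict(int)
    for student in responses:
        for item_id, score in student.items():
            skill = item_skills[item_id]
            correct[skill] += score
            attempts[skill] += 1
    return {skill: correct[skill] / attempts[skill] for skill in attempts}


item_skills = {"Q1": "linear", "Q2": "linear", "Q3": "quadratic"}
responses = [
    {"Q1": 1, "Q2": 0, "Q3": 1},
    {"Q1": 1, "Q2": 1, "Q3": 0},
]
print(skill_rollup(item_skills, responses))
```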

Pre/post change

Effect sizes and the Reliable Change Index

Paired-sample analyses with Cohen's d and the Reliable Change Index. A score that moved from pre to post is labeled with the evidence behind it: likely learning gain, possible measurement drift, or interpret with caution.
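Both statistics have several published variants, so the sketch below states its choices: Cohen's d is computed on the paired differences (often written d_z), and the Reliable Change Index follows the Jacobson and Truax form, using the pre-test standard deviation and a reliability estimate. Which variants ReliCheck uses is not stated here, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_dz(pre, post):
    """Paired-samples effect size: mean difference / SD of the differences."""
    diffs = [b - a for a, b in zip(pre, post)]
    sd = stdev(diffs)  # sample SD; needs at least two learners
    if sd == 0:
        raise ValueError("all differences are identical; d_z is undefined")
    return mean(diffs) / sd


def reliable_change_index(pre_score, post_score, sd_pre, reliability):
    """RCI for one learner (Jacobson-Truax form).

    sd_pre:      standard deviation of pre-test scores in the group
    reliability: reliability estimate of the test, e.g. KR-20
    """
    sem = sd_pre * sqrt(1 - reliability)  # standard error of measurement
    s_diff = sqrt(2) * sem                # standard error of the difference
    return (post_score - pre_score) / s_diff


print(cohens_dz([10, 12, 14, 16], [12, 15, 15, 20]))
print(reliable_change_index(10, 16, sd_pre=5, reliability=0.75))
```

By convention, an absolute RCI above 1.96 is read as change larger than measurement error alone would be expected to produce.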

Hosted delivery

Publish and share a hosted test

Publish the test to a shareable URL. Students take it in the browser. Responses score against the answer key automatically. Useful for online administrations, makeup sittings, and certification studies.

What you hand to parents, departments, and reviewers

The deliverables teachers and assessment authors export.

Every analysis produces an artifact ready for the parent conference, the department meeting, the principal's report, or the certifying-body packet. The math is exposed, the writing is editable, and the item-level evidence is right there.

Item analysis: Algebra Unit 4 Quiz

n = 28 students, KR-20 = 0.78, Mean = 76%

Item  Difficulty  Discrimination  Distractors  Flag
Q1    0.78        0.41            All used     Good
Q2    0.92        0.05            B unused     Too easy
Q3    0.54        0.48            All used     Good
Q4    0.18        -0.12           All used     Likely miskeyed
Q5    0.64        0.36            All used     Good
PDF

Learner score report

One per learner, broken down by skill tag. Item-level performance, missed items in context, and a plain-language paragraph instructors can share with learners, families, or advisors.

PDF

Group summary

Group-level distribution, per-skill mastery, and the items most learners missed. Useful for the next instructional block, department meeting, or certification review.

Revision plan

Item revision recommendations

Each flagged item comes with a recommended action: keep, revise the prompt, replace a distractor, double-check the key, or drop. The next test starts from those recommendations.

Change report

Pre/post change summary

Effect sizes, Reliable Change Index, and per-learner tier counts. The score lift is labeled with the evidence behind it: likely learning gain, possible measurement drift, or interpret with caution.

Show the items behind the score.

Start free, upload a test you already gave, or preview a sample item-health report and see how item evidence helps instructors, administrators, and certification teams explain the score.