Overview
Headline numbers + score distribution
Six KPI tiles (mean, median, range, SD, items, pass rate), a score-distribution histogram, the strongest and the flagged items, and an AI narrator in a teacher's voice pinned at the top.
The eight-tab Test analytics dashboard is built for classroom assessments where each item has a single correct answer. It covers reliability, item difficulty, item discrimination, distractor analysis, skill rollups, within-test growth, and item health.
Reliability of the test itself
Cronbach's alpha (which reduces to KR-20 on dichotomous items), McDonald's omega, Spearman-Brown-corrected split-half reliability, the standard error of measurement (SEM), and alpha-if-deleted with the change (delta) per item.
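As a rough illustration of the reliability numbers listed above, the sketch below computes KR-20, SEM, the Spearman-Brown correction, and alpha-if-deleted from a 0/1 score matrix. The function names and the `scores` layout (one row per student, one column per item) are assumptions for this example, not the dashboard's actual code. McDonald's omega is left out because it needs a fitted factor model.

```python
import math
from statistics import pstdev, pvariance


def kr20(scores):
    """KR-20 for a matrix of 0/1 item scores (rows = students, columns = items)."""
    n, k = len(scores), len(scores[0])
    totals = [sum(row) for row in scores]
    var_total = pvariance(totals)  # zero variance (everyone tied) would divide by zero
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n  # proportion correct on item j
        pq += p * (1 - p)                      # item variance for a 0/1 item
    return (k / (k - 1)) * (1 - pq / var_total)


def sem(scores, reliability):
    """Standard error of measurement: SD of total scores * sqrt(1 - reliability)."""
    totals = [sum(row) for row in scores]
    return pstdev(totals) * math.sqrt(1 - reliability)


def spearman_brown(r_half):
    """Step a half-test correlation up to full-test length."""
    return 2 * r_half / (1 + r_half)


def alpha_if_deleted(scores):
    """For each item: (index, KR-20 without that item, delta versus the full test)."""
    base = kr20(scores)
    out = []
    for j in range(len(scores[0])):
        reduced = [row[:j] + row[j + 1:] for row in scores]
        a = kr20(reduced)
        out.append((j, a, a - base))
    return out


# Hypothetical 5-student, 4-item example.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(scores), 3))  # 0.8
```

A positive delta from `alpha_if_deleted` means reliability would rise if that item were dropped, which is usually a reason to review the item.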
How hard is each question?
Percent correct per item with labels Very Easy (90%+), Easy, Moderate, Hard, Very Hard. Inline horizontal bar chart sorted by difficulty.
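A minimal sketch of the difficulty calculation follows. Only the 90% cutoff for Very Easy comes from the description above. The other cutoffs, the function name, and the hardest-first sort order are placeholders chosen for illustration.

```python
# Only the 0.90 cutoff is stated in the text; the remaining cutoffs are hypothetical.
CUTOFFS = (
    (0.90, "Very Easy"),
    (0.70, "Easy"),
    (0.40, "Moderate"),
    (0.20, "Hard"),
    (0.00, "Very Hard"),
)


def item_difficulty(scores, cutoffs=CUTOFFS):
    """Return (item index, proportion correct, label) tuples, hardest item first."""
    n = len(scores)
    rows = []
    for j in range(len(scores[0])):
        p = sum(row[j] for row in scores) / n
        # First cutoff the proportion meets or exceeds decides the label.
        label = next(name for cutoff, name in cutoffs if p >= cutoff)
        rows.append((j, p, label))
    return sorted(rows, key=lambda r: r[1])
```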
Does each item discriminate between strong and weak students?
| Item | % correct | r_pb | Label |
|---|---|---|---|
| Q07 | 71% | 0.62 | Strong |
| Q03 | 67% | 0.48 | Good |
| Q15 | 43% | 0.22 | Review |
| Q12 | 38% | 0.14 | Problem |
| Q05 | 28% | -0.08 | Negative |
Point-biserial correlation between the item and the rest-of-test total, plus the classical top-27% vs bottom-27% discrimination index. Labels Strong / Good / Review / Problem / Negative.
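The two discrimination statistics can be sketched as below, assuming a 0/1 score matrix with one row per student. The point-biserial is an ordinary Pearson correlation between the 0/1 item score and the total on the remaining items. The helper names and the rounding used to size the 27% groups are assumptions of this example.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))


def point_biserial_rest(scores, j):
    """Correlate item j with the rest-of-test total (total score minus item j)."""
    item = [row[j] for row in scores]
    rest = [sum(row) - row[j] for row in scores]
    return pearson(item, rest)


def discrimination_index(scores, j, frac=0.27):
    """Proportion correct in the top group minus proportion correct in the bottom group."""
    ranked = sorted(scores, key=sum, reverse=True)
    g = max(1, round(len(ranked) * frac))  # group size; rounding rule is an assumption
    top, bottom = ranked[:g], ranked[-g:]
    return mean([r[j] for r in top]) - mean([r[j] for r in bottom])
```

Using the rest-of-test total rather than the full total keeps the item from correlating with itself, which matters most on short tests.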
Are wrong answers doing their job?
Per-item per-option count plus the share of top-quartile students who chose each option. Flags non-functioning distractors and possible miskeys.
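One way to build the per-option table for a single item is sketched here. The flagging rules are assumptions for illustration: a distractor chosen by fewer than 5% of students is marked non-functioning, and a distractor that top-quartile students pick more often than the keyed answer is marked as a possible miskey. The dashboard's actual rules are not stated.

```python
from collections import Counter


def distractor_table(choices, totals, key, options="ABCD", min_share=0.05):
    """choices[i] is student i's selected option; totals[i] is that student's test total."""
    n = len(choices)
    counts = Counter(choices)
    # Top quartile = highest-scoring 25% of students (at least one student).
    order = sorted(range(n), key=lambda i: totals[i], reverse=True)
    top = order[: max(1, n // 4)]
    top_counts = Counter(choices[i] for i in top)
    key_top_share = top_counts[key] / len(top)

    rows = []
    for opt in options:
        share_all = counts[opt] / n
        share_top = top_counts[opt] / len(top)
        flag = ""
        if opt != key:
            if share_all < min_share:          # hypothetical 5% threshold
                flag = "non-functioning"
            elif share_top > key_top_share:    # hypothetical miskey rule
                flag = "possible miskey"
        rows.append(
            {"option": opt, "count": counts[opt], "top_share": share_top, "flag": flag}
        )
    return rows
```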
Which skills are sticking?
| Skill | Items | Mean | Band |
|---|---|---|---|
| Algebra | 9 | 81% | Mastery |
| Geometry | 8 | 71% | Proficient |
| Fractions | 3 | 79% | Proficient |
| Word problems | 5 | 41% | Foundational |
Mean percent correct by skill tag with mastery bands. Skill x question-range matrix (rows = skill tags, columns = item buckets) with cells color-coded by mastery band.
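The skill rollup can be approximated as a pooled percent correct per skill tag. In the sketch below, the band cutoffs (80% for Mastery, 60% for Proficient) are guesses that happen to agree with the sample table above. The real cutoffs and any additional bands are not given in the text, and the function name is hypothetical.

```python
from collections import defaultdict

# Hypothetical cutoffs, chosen only to be consistent with the sample table.
BANDS = ((0.80, "Mastery"), (0.60, "Proficient"), (0.00, "Foundational"))


def skill_rollup(scores, skill_of_item, bands=BANDS):
    """Return {skill: (item count, mean proportion correct, band)}.

    scores: rows of 0/1 item scores; skill_of_item[j] is the skill tag of item j.
    """
    correct = defaultdict(int)
    items = defaultdict(int)
    for j, skill in enumerate(skill_of_item):
        items[skill] += 1
        correct[skill] += sum(row[j] for row in scores)

    n = len(scores)
    out = {}
    for skill, k in items.items():
        m = correct[skill] / (n * k)  # pooled over all students and tagged items
        band = next(label for cutoff, label in bands if m >= cutoff)
        out[skill] = (k, m, band)
    return out
```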
Did students improve within the test?
Paired t-test between two user-selected item subsets (default first half vs second half). Cohen's d_z, plus per-respondent improved / unchanged / declined counts.
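A minimal sketch of the within-test comparison, assuming each student's score on a subset is their proportion correct on it. It returns the paired t statistic, its degrees of freedom, Cohen's d_z (mean difference divided by the SD of the differences), and the per-student counts. The function name and return shape are assumptions. The p-value is omitted because the standard library has no t-distribution; `scipy.stats.ttest_rel` is one option if SciPy is available.

```python
import math
from statistics import mean, stdev


def within_test_growth(scores, first=None, second=None):
    """Paired comparison of two item subsets; defaults to first half vs second half."""
    k = len(scores[0])
    if first is None:
        first = range(k // 2)
    if second is None:
        second = range(k // 2, k)

    a = [mean([row[j] for j in first]) for row in scores]
    b = [mean([row[j] for j in second]) for row in scores]
    diffs = [y - x for x, y in zip(a, b)]  # positive = did better on the second subset

    n = len(diffs)
    d_z = mean(diffs) / stdev(diffs)  # undefined if every student has the same difference
    return {
        "t": d_z * math.sqrt(n),      # same as mean(diffs) / (sd / sqrt(n))
        "df": n - 1,
        "d_z": d_z,
        "improved": sum(d > 0 for d in diffs),
        "unchanged": sum(d == 0 for d in diffs),
        "declined": sum(d < 0 for d in diffs),
    }
```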
Composite per-item status
One label per item: Strong / Easy but Discriminates / Too Easy / Hard but Discriminates / Problem / Possible Miskey. Built from difficulty, point-biserial, and distractor signals.