One Word, Three Reliabilities: Which One Does Your Study Actually Need?

Researchers say a measure is reliable as if that settles it. Reliability is not one thing, and reporting the wrong kind is a quiet way to answer a question nobody asked.

Reliability gets used like a single stamp of approval. In practice it is a family of related ideas, and the members answer different questions. Three come up most often. Knowing which one your study needs is the difference between evidence that speaks to your actual risk and a number that looks reassuring while missing the point.

Internal consistency: do the items agree with each other?

This is the one most people mean when they say reliability. Internal consistency, captured by Cronbach's alpha and McDonald's omega, asks whether the items on a single scale, measured at one sitting, move together as if they tap the same underlying thing. It is the right tool when your measure is a multi-item scale and your worry is that the items do not cohere. It says nothing about stability over time or agreement between people.
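For readers who want to see what alpha actually computes, here is a minimal sketch using only the Python standard library. It applies the textbook formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the response data are made up for illustration, and this is not a description of how any particular tool implements it.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Illustrative helper only: assumes complete data (no missing responses)
    and items already scored in the same direction.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    if any(len(r) != k for r in rows):
        raise ValueError("every respondent must answer every item")

    items = list(zip(*rows))                        # one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])    # variance of scale totals
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses: 5 people, 3 items on a 1-5 scale.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.3f}")
```

A high value here only says the items move together. It is silent on whether they form a single dimension, which is why a dimensionality check belongs next to it.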

Test-retest: does the same person score the same way twice?

When your measure is meant to capture something stable (a trait, an attitude, an ability), you want it to give the same person a similar score on Monday and again three weeks later. Test-retest reliability checks that. A low value is a fork in the road: either the instrument is noisy, or the thing you are measuring genuinely changed between the two points. Sorting out which is part of the job, and test-retest evidence only matters if stability is something your construct is supposed to have.

Inter-rater: do two observers see the same thing?

Whenever a human judgment produces the score (coding open responses, rating a behavior, scoring an interview), the threat is the rater, not the items. Inter-rater reliability asks whether two people applying the same rules land in the same place. A measure can have beautiful internal consistency and still fall apart here, because consistent items do not guarantee consistent judges.
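For two raters assigning categorical codes, a widely used statistic is Cohen's kappa, which discounts the agreement the raters would reach by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a minimal standard-library version. The function name and the codes are invented for illustration, and other designs (more than two raters, ordinal or continuous ratings) call for other statistics.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same units into categories.

    Illustrative helper only: assumes the lists are aligned by unit.
    """
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must code the same non-empty set of units")

    # Observed agreement: share of units given the same code by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: from each rater's own category proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if p_e == 1:
        raise ValueError("both raters used a single identical code; kappa is undefined")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes from two raters on ten open-ended responses.
rater_a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
rater_b = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "neg", "pos"]
kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the raters agree on 8 of 10 units, but half of that agreement is what chance alone would produce, so kappa comes out well below the raw 80 percent.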

Match the reliability to the threat

The useful move is to name what could make your measure untrustworthy, then report the reliability that speaks to it. Multi-item scale? Internal consistency. Repeated measurement of something stable? Test-retest. Human raters? Inter-rater agreement. Reporting only internal consistency for a study scored by human observers is not wrong so much as beside the point, because it answers a question your design never raised.
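The matching rule is simple enough to write down as a checklist. The function below is a hypothetical sketch of that rule, with made-up parameter names, and it returns every reliability a design calls for, since one study can face more than one threat.

```python
def reliabilities_to_report(multi_item_scale, stable_construct_measured_twice, human_raters):
    """Map the threats present in a design to the reliability evidence to report.

    Illustrative only: each argument is a bool describing the study design.
    """
    checks = []
    if multi_item_scale:
        checks.append("internal consistency")
    if stable_construct_measured_twice:
        checks.append("test-retest")
    if human_raters:
        checks.append("inter-rater agreement")
    return checks


# Hypothetical design: interview transcripts coded by two raters, single session.
print(reliabilities_to_report(
    multi_item_scale=False,
    stable_construct_measured_twice=False,
    human_raters=True,
))
```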

Where ReliCheck fits

ReliCheck computes internal consistency the honest way: alpha and omega together with item-level diagnostics and a dimensionality check, so your scale reliability is defensible rather than assumed. It also supports the agreement checks a rater-based design depends on. The point is not to produce a number. It is to produce the number that matches the threat to your measure, reported as evidence of consistency rather than proof of validity. Report reliability that way and it stops being a reflex and starts being an argument, which is exactly what a careful reader is looking for.

ReliCheck computes internal consistency with alpha, omega, item diagnostics, and a dimensionality read, and supports the agreement checks rater-based designs need, so you report the reliability your study actually calls for. See it at relichecksurvey.com.