Open almost any working survey and you will find entries that were never written as questions at all. A one-word label like "Department." A code like "Q1." A fragment like "Start date." They function as spreadsheet column headers, and everyone reading the survey silently fills in the real question they imply. That silent filling-in is the problem. A respondent-facing item has to stand on its own as an answerable prompt, and a label does not. A label works only because a human quietly translates it, and you cannot count on that translation being the same for everyone.
Two different questions, kept separate
It helps to split quality into two layers. The first is the stem: is this an answerable question, with a clear idea behind it and clear wording? The second is response fit: given a clear question, do the answer options match what it asks? The order matters. You cannot judge whether the response format fits until the question itself is interpretable. A label-only, coded, or fragmentary stem fails at the first layer, which means everything downstream (the scale, the reliability, the comparison) rests on a foundation that was never really there.
What makes a stem answerable
An answerable stem does three things. It names a single, clear idea, so the respondent knows what is being asked about. It is phrased as something a person can actually respond to, not a category to be filed under. And its wording leaves little room for two people to read it two different ways. Held to that standard, "Role level" is not a question; "How much authority do you have over your daily work?" is. The first is a filing label. The second is something a respondent can answer honestly and consistently, which is the only kind of item that earns a reliability claim.
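As a rough illustration, the form-level part of that standard can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not how any particular tool works: the function name classify_stem, the code pattern, the list of opening words, and the four-word threshold are all hypothetical choices made for the example. It checks only whether a stem is shaped like a prompt. Whether the stem names a single clear idea, and whether two people would read it the same way, still takes human review.

```python
import re

# Assumption: coded stems look like one to three letters followed by digits, e.g. "Q1" or "DM12a".
CODE_PATTERN = re.compile(r"^[A-Za-z]{1,3}\d+[A-Za-z]?$")

# Assumption: a short, illustrative list of words that commonly open a respondent-facing prompt.
QUESTION_OPENERS = {
    "how", "what", "which", "when", "where", "who", "why",
    "do", "does", "did", "are", "is", "have", "has",
    "please", "rate", "describe", "select", "indicate",
}


def classify_stem(stem: str) -> str:
    """Return 'code', 'label', 'fragment', or 'question' based on form alone."""
    text = stem.strip()
    words = text.split()
    if not words:
        return "fragment"
    if len(words) == 1:
        # A single token is either a code such as "Q1" or a one-word label.
        return "code" if CODE_PATTERN.match(text) else "label"
    first = words[0].lower().strip(",.:;")
    # Assumption: a prompt has at least four words and either ends with "?"
    # or opens with an interrogative or instruction word.
    if len(words) >= 4 and (text.endswith("?") or first in QUESTION_OPENERS):
        return "question"
    return "fragment"


for stem in ["Department", "Q1", "Start date", "Role level",
             "How much authority do you have over your daily work?"]:
    print(f"{stem!r}: {classify_stem(stem)}")
```

The line between "label" and "fragment" is arbitrary in this sketch, and it does not need to be precise: both fail the first layer in the same way, so both need real question text.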
Why this decides your writing quality rather than decorating it
This is not a style preference. If an item is not written as a respondent-facing, answerable prompt, it cannot be counted as strong writing, no matter how clean the rest of the survey looks. Label stems, coded stems, metadata fields, and bare fragments all fail the test, and a survey full of them is not a strong instrument with a few rough edges. It is an instrument whose questions have not been written yet. The honest move is to see that clearly before launch, not to discover it when the responses come back thin and hard to interpret.
Where the Survey Development System holds the line
ReliCheck's Survey Development System judges writing quality on exactly this basis. It separates whether each stem is an answerable question from whether the response format fits, and it will not let an item count as strong when the stem is only a label, a code, a metadata field, or a fragment. It tells you plainly how many of your items still need real question text and takes you to each one. That gate is deliberate, because a reliability number computed over unanswerable items is a number about nothing. ReliCheck is built to make sure the questions are real before the statistics pretend they are, which is what separates a survey you can defend from one that only looks finished.
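To make the idea of a gate concrete, here is a minimal Python sketch of stem-first checking that counts the items still needing question text and reports where they are. It is an illustration only and does not represent ReliCheck's implementation: the Item class, the looks_answerable stand-in check, and the report keys are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Item:
    stem: str
    options: list[str]


def looks_answerable(stem: str) -> bool:
    """Stand-in check (assumption): at least four words and ends with a question mark."""
    text = stem.strip()
    return len(text.split()) >= 4 and text.endswith("?")


def writing_report(items: list[Item]) -> dict:
    """Apply the stem layer first; only passing items move on to response-fit review."""
    needs_text = []
    ready = []
    for position, item in enumerate(items, start=1):
        if looks_answerable(item.stem):
            ready.append(position)
        else:
            needs_text.append(position)
    return {
        "total": len(items),
        "needs_question_text": len(needs_text),
        "positions": needs_text,
        "ready_for_response_fit": ready,
    }


items = [
    Item("Department", []),
    Item("Q1", ["Yes", "No"]),
    Item("How much authority do you have over your daily work?",
         ["None", "Some", "A lot"]),
]
report = writing_report(items)
print(report)
```

In this sketch the response options of a failing item are never inspected, which mirrors the ordering argument: the format cannot be judged for fit until the stem is an interpretable question.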
ReliCheck's Survey Development System scores writing quality on whether each stem is an answerable question, flags items that are only labels or codes, and counts how many still need real question text. See it at relichecksurvey.com.