Survey design guide · ReliCheck

Start with purpose

Every item should answer a single, named question. Before any words go on the page, write down what the answer would tell you and how you would act on it. If neither answer is clear, the item belongs in a parking lot, not a survey.

Constructs come first. A survey about "engagement" is not the same as a survey about "satisfaction," even if respondents would happily answer either. Decide which construct you intend to measure, define it in one sentence, and check every item against that sentence. Items that drift toward a related but different construct quietly weaken your reliability and confuse your interpretation.

Worth doing first

Write a one-sentence purpose for the whole survey, and a one-sentence definition for each scale inside it. Tape both to the wall. Every editorial decision after this point should reference them.

Item types and when to use each

Likert items

Five- or seven-point agreement or frequency scales. Use them when you need a number you can summarize, compare across groups, or feed into reliability statistics. Likert items are the workhorse of evaluation, climate, engagement, and satisfaction surveys.

Single-choice items

Pick exactly one option from a list. Use them for categories where the choices really are mutually exclusive: department, role, primary language. Avoid single-choice for items where respondents could honestly fit several categories.

Multi-select items

Pick any number of options. Useful for "select all that apply" inventories: features used, channels visited, services received. Multi-select items are not summable into a scale; treat each option as its own variable when you analyze.
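
As a rough sketch of "each option as its own variable," the snippet below turns one multi-select item into a set of 0/1 indicators and summarizes each option separately. The item, option names, and responses are invented for illustration.

```python
# Hypothetical multi-select item: "Which channels did you visit?"
OPTIONS = ["email", "chat", "phone"]

# One list of selected options per respondent (made-up data).
responses = [
    ["email", "chat"],
    ["phone"],
    [],
    ["email"],
]

def to_indicators(selected, options):
    """Return one 0/1 variable per option for a single respondent."""
    chosen = set(selected)
    return {option: int(option in chosen) for option in options}

rows = [to_indicators(selected, OPTIONS) for selected in responses]

# Analyze each option independently: share of respondents who selected it.
selection_rates = {
    option: sum(row[option] for row in rows) / len(rows)
    for option in OPTIONS
}
print(selection_rates)
```

Note that the indicators are never added together into a score; each rate is read on its own.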

Open-ended items

Free-text responses. Use them when you cannot anticipate the response space, or when the value of the answer is in the wording itself. Open-ended items are where qualitative depth lives, but they cost respondents time and analysts effort, so include only the ones you will actually read.

Mixing types

A balanced survey uses several types deliberately. Likert items quantify a construct, single-choice items segment respondents, multi-select items inventory behavior, and open-ended items surface what the closed items missed.

Question wording

Write at the reading level of the respondent, not the analyst. Plain words, short sentences, one concept per item. The quality of the data depends more on whether respondents understand the question than on how clever the analysis is afterward.

Avoid double-barreled items

An item is double-barreled when it bundles two ideas that a respondent could honestly answer differently. "The course was challenging and rewarding" is double-barreled because a course can be one and not the other. Split it into two items: one for challenge, one for reward.

Avoid leading wording

Leading items signal the answer the writer wants. "How great was the workshop?" presumes the workshop was great and nudges respondents toward a favorable answer. Rewrite as a neutral assessment: "How would you rate the workshop?" Or convert to a rating scale anchored from "Poor" to "Excellent."

Avoid jargon and abstraction

Words like "synergy," "alignment," "stakeholder engagement," or "value proposition" mean different things to different respondents. Replace them with concrete behaviors or outcomes. "I knew what I was supposed to do this week" is more useful than "My role had high alignment."

Avoid double negatives

Items that contain "not" combined with a negative modifier confuse respondents. "I do not think this course is unhelpful" forces the reader to parse two negatives at once. Rewrite in the positive: "I think this course is helpful."

Avoid absolutes unless the absolute is the point

Words like "always," "never," and "every" are extreme positions that few respondents will endorse honestly. Use them only when the construct truly requires the extreme; otherwise use frequency anchors that allow gradations.

Rule of thumb

If a respondent could agree with half of an item and disagree with the other half, the item is double-barreled. Split it.

Scale anchors

Anchors are the words attached to each numeric point on a scale. Good anchors give every respondent the same mental ruler. Bad anchors leave each respondent inventing their own.

Label every point

For agreement scales, label all five (or seven) points, not just the endpoints. "Strongly disagree, Disagree, Neither agree nor disagree, Agree, Strongly agree" beats endpoints-only labeling because respondents do not have to guess what the middle means.

Use balanced wording

The number of positive and negative options should match. A scale with three positive options and only one negative option pulls the average upward through asymmetry alone, regardless of what respondents actually think.
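
A small arithmetic illustration of the asymmetry, under a deliberately simple assumption: respondents with no real opinion pick among the options uniformly at random. The anchor labels are hypothetical.

```python
# Each option is (label, valence). Labels are illustrative only.
BALANCED = [
    ("Very poor", "negative"),
    ("Poor", "negative"),
    ("Good", "positive"),
    ("Very good", "positive"),
]
UNBALANCED = [
    ("Poor", "negative"),
    ("Good", "positive"),
    ("Very good", "positive"),
    ("Excellent", "positive"),
]

def share_positive_if_uniform(options):
    """Share of positive answers if every option is equally likely to be chosen."""
    positive = sum(1 for _, valence in options if valence == "positive")
    return positive / len(options)

print(share_positive_if_uniform(BALANCED))    # 0.5
print(share_positive_if_uniform(UNBALANCED))  # 0.75
```

Under that assumption, the unbalanced scale reports three quarters of answers as positive before any real attitude enters the picture.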

Prefer concrete frequency anchors

For frequency scales, "About once a week" tells respondents something specific. "Sometimes" leaves it to interpretation. Anchor frequency in observable behaviors when possible.

Decide on a midpoint with intent

A neutral midpoint ("Neither agree nor disagree") gives respondents an easy out and can mask weak attitudes. Removing the midpoint forces a direction but can frustrate respondents who genuinely have no opinion. Pick based on whether you need decisiveness or honest neutrality.

Reverse-scored items

Reverse-scored items belong in a scale only when there is a methodological reason: detecting straight-lining, balancing positive and negative wording within a scale, or matching a published instrument. They should always be flagged at analysis so that they are recoded before reliability statistics are computed.
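
For reference, reverse scoring on a numeric scale is usually done by subtracting the response from the sum of the scale's minimum and maximum. The sketch below assumes a 1-to-5 scale by default; the item names and the set of flagged items are hypothetical.

```python
def reverse_score(value, scale_min=1, scale_max=5):
    """Flip a response so the high end of the scale becomes the low end."""
    if not scale_min <= value <= scale_max:
        raise ValueError(f"{value} is outside {scale_min}..{scale_max}")
    return scale_min + scale_max - value

# Hypothetical: q3 is the negatively worded item in this scale.
REVERSED_ITEMS = {"q3"}

def apply_reverse_scoring(response, reversed_items, scale_min=1, scale_max=5):
    """Return a copy of one respondent's answers with flagged items flipped."""
    return {
        item: reverse_score(value, scale_min, scale_max)
        if item in reversed_items
        else value
        for item, value in response.items()
    }

raw = {"q1": 5, "q2": 4, "q3": 1}
scored = apply_reverse_scoring(raw, REVERSED_ITEMS)
print(scored)  # q3 moves from 1 to 5; q1 and q2 are unchanged
```

Keeping the flagged items in one explicit set makes it harder to forget the recode when the analysis is rerun.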

Question order

The order of items shapes the answers. Respondents bring their answer to the previous item with them when they read the next one.

Group by topic

Keep items about the same construct together so respondents stay in one mental frame at a time. Switching back and forth between topics increases cognitive load and noise in the data.

Watch for priming effects

An early item can lift or depress a later one. A satisfaction battery placed before a product question can raise the product score because respondents are already thinking positively. Pre-test the order, or randomize within blocks where the design allows.
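
One way to randomize within blocks is to shuffle the items inside each block while keeping the blocks themselves in their designed order. This is a minimal sketch; the item IDs are made up, and the seed parameter is only there so a given respondent's order can be reproduced.

```python
import random

def randomize_within_blocks(blocks, seed=None):
    """Shuffle items inside each block; keep the block order fixed."""
    rng = random.Random(seed)
    ordered = []
    for block in blocks:
        items = list(block)  # copy so the original design is not mutated
        rng.shuffle(items)
        ordered.extend(items)
    return ordered

# Hypothetical design: a satisfaction block followed by a product block.
blocks = [
    ["sat_1", "sat_2", "sat_3"],
    ["prod_1", "prod_2"],
]
order = randomize_within_blocks(blocks, seed=7)
print(order)
```

Shuffling inside a block spreads item-to-item order effects across respondents. It does not remove priming from one block to the next, which still needs a pre-test.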

Sensitive demographics last

Place demographic items, especially sensitive ones, at the end of the survey. By that point respondents have invested time and are more likely to complete sensitive items than to abandon the survey. Asking demographics first signals "this is about you, not the topic," which can suppress engagement.

Mind the cognitive arc

Open with easy, concrete items that build commitment. Move to substantive items in the middle, when the respondent is warmed up. Close with reflective or summary items that benefit from the context the survey has built.

Open-ended items

Free-text responses surface what closed items missed: the framing, the example, the unanticipated reaction. They cost respondents more effort, so use them sparingly and ask each one specifically.

Ask one specific thing

"What was the most useful part of the workshop?" produces codable, comparable answers. "Any feedback?" produces a mix of compliments, complaints, and tangents that are hard to summarize. Each open-ended item should target a defined slice of the experience.

Anchor to constructs

For mixed-methods designs, anchor open-ended items to the same construct as your scale items. If your Likert battery measures "instructor clarity," your open-ended item might ask "What would have made the instructor's explanations clearer?" Theme extraction can then be cross-tabulated with the quantitative score.
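
A minimal sketch of that cross-tabulation, assuming each respondent already has a scale score and a list of coded themes. The field names, themes, and scores are invented; how the themes get coded is outside this example.

```python
from collections import defaultdict
from statistics import mean

# Made-up data: one row per respondent.
respondents = [
    {"clarity_score": 2.0, "themes": ["pace too fast"]},
    {"clarity_score": 4.5, "themes": ["more examples"]},
    {"clarity_score": 2.5, "themes": ["pace too fast", "more examples"]},
    {"clarity_score": 4.0, "themes": []},
]

def mean_score_by_theme(rows, score_key="clarity_score"):
    """Average scale score among respondents who mentioned each theme."""
    scores = defaultdict(list)
    for row in rows:
        for theme in row["themes"]:
            scores[theme].append(row[score_key])
    return {theme: mean(values) for theme, values in scores.items()}

by_theme = mean_score_by_theme(respondents)
print(by_theme)
```

A respondent who mentions two themes counts toward both, so the groups overlap; read the averages as descriptive rather than as a formal comparison.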

Limit count and length

Two to four open-ended items in a typical survey is plenty. More than that and respondents start skipping or shortening their answers. Resist the urge to add an open-ended item to every section; the data quality drops fast.

Plan the analysis up front

Decide before you launch how you will read the responses. Will you code themes manually? Use AI theme extraction? Pull verbatim quotes for a report? The analysis plan tells you whether the question is worth asking and how to phrase it.

Piloting

Always pilot. Five to ten respondents from the target population catch wording problems faster than any AI review or expert read.

What to watch for

Read every open-ended response for signs of misinterpretation. Watch completion times, both by item and overall. Check that no closed item has near-zero variance, since an item that everyone answers the same way is providing no information. Look for items that take noticeably longer than their neighbors; long response times often signal confusion.
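
Two of these checks are easy to automate on pilot data. The sketch below flags closed items with near-zero variance and items whose median response time is well above the typical item. The thresholds (variance below 0.1, more than twice the median time) and all data are assumptions to adjust for your own pilot.

```python
from statistics import median, pvariance

def low_variance_items(responses_by_item, threshold=0.1):
    """Items whose responses barely vary across pilot respondents."""
    return [
        item
        for item, values in responses_by_item.items()
        if pvariance(values) < threshold
    ]

def slow_items(median_seconds_by_item, ratio=2.0):
    """Items whose median time exceeds `ratio` times the typical item's time."""
    typical = median(median_seconds_by_item.values())
    return [
        item
        for item, seconds in median_seconds_by_item.items()
        if seconds > ratio * typical
    ]

# Made-up pilot data: six respondents, three Likert items.
pilot_responses = {
    "q1": [4, 4, 4, 4, 4, 4],
    "q2": [2, 3, 4, 5, 3, 1],
    "q3": [1, 2, 5, 4, 3, 3],
}
pilot_median_seconds = {"q1": 6.0, "q2": 7.5, "q3": 21.0}

flat = low_variance_items(pilot_responses)
slow = slow_items(pilot_median_seconds)
print(flat, slow)
```

With five to ten pilot respondents these flags are prompts to reread an item, not verdicts on it.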

Pilot once with the real interface

The mobile experience is not the desktop experience. Pilot on the same devices respondents will use, in the same browsers, on the same screen sizes. A page that looks fine on a 27-inch monitor can collapse a Likert grid into something unreadable on a phone.

Treat pilots as data

Pilot responses are real responses to early versions of your items. If you fix wording mid-pilot, the early respondents answered different items than the later ones did. Either re-run the pilot from the start after the fix, or label the pilot as a separate phase and exclude it from the main analysis.

Common pitfalls

Writing for the analyst, not the respondent

Items that include statistical language ("on a balanced 5-point continuum") or jargon from the field confuse respondents. Use plain language even when the analysis will be sophisticated. The analyst is the only person who needs to understand the methodology; the respondent only needs to understand the question.

Cramming demographics into closed boxes

Items on race, ethnicity, gender, and disability often need an "other / self-describe" option to capture respondents who do not fit the closed categories. Closed-only demographic items leave gaps in your data and can erode trust.

Mixing constructs inside one scale

A "satisfaction" scale that includes items about price, quality, and customer service is really three scales pretending to be one. Internal consistency drops, the composite score loses meaning, and item-total correlations look weak even though each subset would be reliable on its own. Separate them.
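
To see the effect in numbers, the sketch below computes Cronbach's alpha, a common internal-consistency statistic, for two made-up subscales and for the two combined. The formula is the standard one, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores); the data and item names are invented for illustration.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns (one list of scores per item)."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per respondent, summing across items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Made-up responses from six respondents on a 1-to-5 scale.
price_items = [
    [1, 2, 3, 4, 5, 3],
    [2, 2, 3, 5, 5, 3],
]
service_items = [
    [5, 1, 4, 2, 3, 3],
    [5, 2, 4, 1, 3, 3],
]

alpha_price = cronbach_alpha(price_items)
alpha_service = cronbach_alpha(service_items)
alpha_combined = cronbach_alpha(price_items + service_items)
print(round(alpha_price, 2), round(alpha_service, 2), round(alpha_combined, 2))
```

In this invented data each pair of items is highly consistent on its own, while the four items together produce a much lower alpha, because the price and service answers do not move together.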

Letting AI write the items

AI-generated items are a starting point, not an ending point. They tend toward generic phrasing, miss the specific context of your respondents, and sometimes invent constructs that do not exist. Use AI to brainstorm or critique, then rewrite in your own voice.

Long surveys without breaks

Respondent fatigue is real. After 10 to 15 minutes, response quality drops noticeably: more straight-lining, more skipped items, shorter open-ended answers. If your survey runs longer, break it across pages with progress indicators, or split it into a multi-wave design.

No place for "I don't know"

Forcing an answer from a respondent who honestly does not know creates noise. For knowledge or fact items, include a "Don't know" or "Not applicable" option. Treat those responses as missing in analysis rather than as a midpoint score.
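
A short sketch of the difference, assuming responses arrive as numbers for scale points and as label strings for the opt-out options. The labels and data are illustrative.

```python
from statistics import mean

# Assumed opt-out labels; match these to whatever your survey exports.
MISSING_LABELS = {"Don't know", "Not applicable"}

def to_score(answer):
    """Return the numeric score, or None when the respondent opted out."""
    return None if answer in MISSING_LABELS else int(answer)

def item_mean(answers):
    """Mean of the substantive answers only; None if there are none."""
    scores = [score for score in map(to_score, answers) if score is not None]
    return mean(scores) if scores else None

answers = [4, 5, "Don't know", 2, "Not applicable"]

as_missing = item_mean(answers)

# For contrast: what happens if opt-outs are wrongly coded as the midpoint (3).
as_midpoint = mean(3 if a in MISSING_LABELS else int(a) for a in answers)
print(as_missing, as_midpoint)
```

Coding the opt-outs as 3 drags the item mean toward the middle of the scale and hides how many respondents could not answer.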

Quality checks before launch

Run a final pass before the survey goes live. The checklist below catches the issues that most often surface in post-launch debriefs.

AI question review. Run every item through AI review for double-barreled, leading, or vague wording. Address each flag, even if you ultimately decide to keep the item as-is.
Reverse-scored items flagged. Confirm every reverse-scored item is marked in the survey settings, so reliability statistics handle them correctly at analysis time.
Required-vs-optional flags reviewed. Required items lower completion rates. Make required only what is essential for analysis; everything else should be optional.
Mobile preview. Open the survey on a phone. Check Likert grids, long open-ended fields, and any conditional logic. Anything that requires zooming or horizontal scrolling needs a fix.
Construct tags applied. Tag the survey with its construct so the post-launch report knows what to compare reliability against. Untagged surveys produce technically correct numbers without context.
Respondent time estimate. Walk through the survey yourself and time it. Multiply by 1.5 for the average respondent. If the result exceeds 12 to 15 minutes, cut.
Privacy notice. Make sure respondents know who you are, what you will do with the data, and how long you will keep it. A clear privacy notice raises completion rates and reduces support load later.
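
The respondent time estimate in the checklist above is simple arithmetic. This sketch applies the 1.5 multiplier and compares the result to a limit; the default limit of 12 minutes is the lower end of the 12-to-15-minute range, chosen here as the cautious option.

```python
def estimated_respondent_minutes(author_minutes, multiplier=1.5):
    """Scale the author's own walkthrough time to an average respondent."""
    return author_minutes * multiplier

def needs_cutting(author_minutes, limit_minutes=12.0, multiplier=1.5):
    """True when the estimated respondent time exceeds the limit."""
    return estimated_respondent_minutes(author_minutes, multiplier) > limit_minutes

# Example: the author finishes the survey in 9 minutes.
print(estimated_respondent_minutes(9))     # 13.5
print(needs_cutting(9))                    # True against the 12-minute limit
print(needs_cutting(9, limit_minutes=15))  # False against the 15-minute limit
```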

Final question

If a respondent finished your survey and asked you what you were trying to learn, could you answer in one sentence? If yes, the survey is ready. If no, go back to Start with purpose.
