Start with purpose
Every item should answer a single, named question. Before any words go on the page, write down what the answer would tell you and how you would act on it. If you cannot state both, the item belongs in a parking lot, not a survey.
Constructs come first. A survey about "engagement" is not the same as a survey about "satisfaction," even if respondents would happily answer either. Decide which construct you intend to measure, define it in one sentence, and check every item against that sentence. Items that drift toward a related but different construct quietly weaken your reliability and confuse your interpretation.
Write a one-sentence purpose for the whole survey, and a one-sentence definition for each scale inside it. Tape both to the wall. Every editorial decision after this point should reference them.
Item types and when to use each
Likert items
Five- or seven-point agreement or frequency scales. Use them when you need a number you can summarize, compare across groups, or feed into reliability statistics. Likert items are the workhorse of evaluation, climate, engagement, and satisfaction surveys.
Single-choice items
Pick exactly one option from a list. Use them for categories where the choices really are mutually exclusive: department, role, primary language. Avoid single-choice for items where respondents could honestly fit several categories.
Multi-select items
Pick any number of options. Useful for "select all that apply" inventories: features used, channels visited, services received. Multi-select items are not summable into a scale; treat each option as its own variable when you analyze.
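As a minimal sketch of treating each option as its own variable, the snippet below turns a multi-select export into one 0/1 indicator per option. The semicolon-delimited format, the option names, and the `to_indicators` helper are assumptions for illustration, not a description of any particular survey tool's export.

```python
# Hypothetical raw export: one string per respondent, selections joined by ";".
responses = [
    "Email;Chat",
    "Phone",
    "",            # respondent selected nothing
    "Chat;Email",
]

OPTIONS = ["Email", "Chat", "Phone"]  # assumed option list for this item


def to_indicators(raw: str, options: list[str]) -> dict[str, int]:
    """Return one 0/1 variable per option for a single respondent."""
    selected = {part.strip() for part in raw.split(";") if part.strip()}
    return {option: int(option in selected) for option in options}


rows = [to_indicators(raw, OPTIONS) for raw in responses]

# Report each option as its own proportion instead of summing into a score.
selection_rates = {
    option: sum(row[option] for row in rows) / len(rows) for option in OPTIONS
}
print(selection_rates)  # {'Email': 0.5, 'Chat': 0.5, 'Phone': 0.25}
```

Each option gets its own rate, so "Phone" can be compared across groups without pretending the three options add up to a single scale.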
Open-ended items
Free-text responses. Use them when you cannot anticipate the response space, or when the value of the answer is in the wording itself. Open-ended items are where qualitative depth lives, but they cost respondents time and analysts effort, so include only the ones you will actually read.
A balanced survey uses several types deliberately. Likert items quantify a construct, single-choice items segment respondents, multi-select items inventory behavior, and open-ended items surface what the closed items missed.
Question wording
Write at the reading level of the respondent, not the analyst. Plain words, short sentences, one concept per item. The quality of the data depends more on whether respondents understand the question than on how clever the analysis is afterward.
Avoid double-barreled items
An item is double-barreled when it bundles two ideas that a respondent could honestly answer differently. "The course was challenging and rewarding" is double-barreled because a course can be one and not the other. Split it into two items: one for challenge, one for reward.
Avoid leading wording
Leading items signal the answer the writer wants. "How great was the workshop?" loads the dice toward a favorable rating. Rewrite as a neutral assessment: "How would you rate the workshop?" Or convert to a rating scale anchored from "Poor" to "Excellent."
Avoid jargon and abstraction
Words like "synergy," "alignment," "stakeholder engagement," or "value proposition" mean different things to different respondents. Replace them with concrete behaviors or outcomes. "I knew what I was supposed to do this week" is more useful than "My role had high alignment."
Avoid double negatives
Items that contain "not" combined with a negative modifier confuse respondents. "I do not think this course is unhelpful" forces the reader to parse two negatives at once. Rewrite in the positive: "I think this course is helpful."
Avoid absolutes unless the absolute is the point
Words like "always," "never," and "every" are extreme positions that few respondents will endorse honestly. Use them only when the construct truly requires the extreme; otherwise use frequency anchors that allow gradations.
If a respondent could agree with half of an item and disagree with the other half, the item is double-barreled. Split it.
Scale anchors
Anchors are the words attached to each numeric point on a scale. Good anchors give every respondent the same mental ruler. Bad anchors leave each respondent inventing their own.
Label every point
For agreement scales, label all five (or seven) points, not just the endpoints. "Strongly disagree, Disagree, Neither agree nor disagree, Agree, Strongly agree" beats endpoints-only labeling because respondents do not have to guess what the middle means.
Use balanced wording
The number of positive and negative options should match. Pairing three positive options with one negative option pulls the average upward through asymmetry alone, regardless of what respondents actually think.
Prefer concrete frequency anchors
For frequency scales, "About once a week" tells respondents something specific. "Sometimes" leaves it to interpretation. Anchor frequency in observable behaviors when possible.
Decide on a midpoint with intent
A neutral midpoint ("Neither agree nor disagree") gives respondents an easy out and can mask weak attitudes. Removing the midpoint forces a direction but can frustrate respondents who genuinely have no opinion. Pick based on whether you need decisiveness or honest neutrality.
Reverse-scored items
Reverse-scored items belong in a scale only when there is a methodological reason: detecting straight-lining, balancing positive and negative wording within a scale, or matching a published instrument. They should always be flagged at analysis so that reliability statistics are computed on correctly oriented scores.
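A minimal sketch of what flagging buys you at analysis time: a reverse-scored response is flipped with `scale_min + scale_max - value` before any scale statistic is computed. The item IDs and the `REVERSED_ITEMS` set below are hypothetical.

```python
def reverse_score(value: int, scale_min: int = 1, scale_max: int = 5) -> int:
    """Flip a response so that a high score always means more of the construct."""
    if not scale_min <= value <= scale_max:
        raise ValueError(f"{value} is outside {scale_min}-{scale_max}")
    return scale_min + scale_max - value


# Hypothetical item IDs; q3 is the negatively worded item.
REVERSED_ITEMS = {"q3"}

respondent = {"q1": 4, "q2": 5, "q3": 2}
scored = {
    item: reverse_score(value) if item in REVERSED_ITEMS else value
    for item, value in respondent.items()
}
print(scored)  # {'q1': 4, 'q2': 5, 'q3': 4}
```

If the flag is missed, q3 stays at 2 and looks like disagreement with the other two items, which drags down any internal-consistency estimate.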
Question order
The order of items shapes the answers. Respondents carry their answer to the previous item into their reading of the next one.
Group by topic
Keep items about the same construct together so respondents stay in one mental frame at a time. Switching back and forth between topics increases cognitive load and noise in the data.
Watch for priming effects
An early item can lift or depress a later one. A satisfaction battery placed before a product question can raise the product score because respondents are already thinking positively. Pre-test the order, or randomize within blocks where the design allows.
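One way to randomize within blocks, sketched under assumptions: block order stays fixed, item order inside each block is shuffled, and the shuffle is seeded by a respondent ID so the order can be reconstructed later. The block names, item IDs, and `order_for_respondent` helper are hypothetical.

```python
import random

# Hypothetical blocks: block order is fixed, item order inside each block varies.
BLOCKS = {
    "onboarding": ["on_1", "on_2", "on_3"],
    "support": ["su_1", "su_2", "su_3", "su_4"],
}


def order_for_respondent(respondent_id: str, blocks: dict[str, list[str]]) -> list[str]:
    """Shuffle items within each block, seeded so the order is reproducible."""
    rng = random.Random(respondent_id)
    ordered = []
    for items in blocks.values():
        shuffled = items[:]   # copy so the master list is untouched
        rng.shuffle(shuffled)
        ordered.extend(shuffled)
    return ordered


print(order_for_respondent("r-001", BLOCKS))
```

Storing the seed (here, the respondent ID) alongside the responses lets you check afterward whether an item's position is related to its score.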
Sensitive demographics last
Place demographic items, especially sensitive ones, at the end of the survey. By that point respondents have invested time and are more likely to complete sensitive items than to abandon the survey. Asking demographics first signals "this is about you, not the topic," which can suppress engagement.
Mind the cognitive arc
Open with easy, concrete items that build commitment. Move to substantive items in the middle, when the respondent is warmed up. Close with reflective or summary items that benefit from the context the survey has built.
Open-ended items
Free-text responses surface what closed items missed: the framing, the example, the unanticipated reaction. They cost respondents more effort, so use them sparingly and ask each one specifically.
Ask one specific thing
"What was the most useful part of the workshop?" produces codable, comparable answers. "Any feedback?" produces a mix of compliments, complaints, and tangents that are hard to summarize. Each open-ended item should target a defined slice of the experience.
Anchor to constructs
For mixed-methods designs, anchor open-ended items to the same construct as your scale items. If your Likert battery measures "instructor clarity," your open-ended item might ask "What would have made the instructor's explanations clearer?" Theme extraction can then be cross-tabulated with the quantitative score.
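A rough sketch of that cross-tabulation, assuming themes have already been coded per respondent and that a cutoff of 3.0 separates low from high clarity scores. The data, theme labels, and cutoff are invented for illustration.

```python
from collections import Counter

# Hypothetical coded data: each respondent's mean clarity score (1-5)
# and the themes assigned to their open-ended answer.
respondents = [
    {"clarity": 2.0, "themes": ["pace too fast", "more examples"]},
    {"clarity": 4.5, "themes": ["more examples"]},
    {"clarity": 1.5, "themes": ["pace too fast"]},
    {"clarity": 4.0, "themes": []},
    {"clarity": 2.5, "themes": ["pace too fast", "unclear slides"]},
]

CUTOFF = 3.0  # assumed split between low and high clarity scores

crosstab = {"low": Counter(), "high": Counter()}
for r in respondents:
    group = "low" if r["clarity"] < CUTOFF else "high"
    crosstab[group].update(r["themes"])

for group, counts in crosstab.items():
    print(group, dict(counts))
```

In this toy data "pace too fast" appears only among low scorers, which is the kind of pattern that tells you where the quantitative score is coming from.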
Limit count and length
Two to four open-ended items in a typical survey is plenty. More than that and respondents start skipping or shortening their answers. Resist the urge to add an open-ended item to every section; the data quality drops fast.
Plan the analysis up front
Decide before you launch how you will read the responses. Will you code themes manually? Use AI theme extraction? Pull verbatim quotes for a report? The analysis plan tells you whether the question is worth asking and how to phrase it.
Piloting
Always pilot. Five to ten respondents from the target population catch wording problems faster than any AI review or expert read.
What to watch for
Read every open-ended response for signs of misinterpretation. Watch completion times, both by item and overall. Check that no closed item has near-zero variance, since an item that everyone answers the same way is providing no information. Look for items that take noticeably longer than their neighbors; long response times often signal confusion.
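The two numeric checks above can be sketched with the standard library. The pilot data and both thresholds (`VARIANCE_FLOOR`, `SLOW_FACTOR`) are assumptions; pick values that suit your scale and sample.

```python
from statistics import median, pvariance

# Hypothetical pilot data: item -> responses (1-5) and item -> seconds spent.
answers = {
    "q1": [4, 5, 3, 4, 2, 5],
    "q2": [5, 5, 5, 5, 5, 5],   # everyone gave the same answer
    "q3": [2, 3, 4, 3, 1, 4],
}
seconds = {
    "q1": [6, 8, 7, 5, 9, 6],
    "q2": [5, 6, 4, 7, 5, 6],
    "q3": [21, 18, 25, 30, 19, 22],  # much slower than its neighbors
}

VARIANCE_FLOOR = 0.1  # assumed threshold for "near-zero variance"
SLOW_FACTOR = 2.0     # assumed: flag items at twice the typical item time

low_variance = [
    item for item, vals in answers.items() if pvariance(vals) < VARIANCE_FLOOR
]

item_medians = {item: median(vals) for item, vals in seconds.items()}
typical = median(item_medians.values())
slow_items = [
    item for item, m in item_medians.items() if m > SLOW_FACTOR * typical
]

print(low_variance)  # ['q2']
print(slow_items)    # ['q3']
```

Medians are used for timing because a single respondent who walked away mid-item would distort a mean.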
Pilot once with the real interface
The mobile experience is not the desktop experience. Pilot on the same devices respondents will use, in the same browsers, on the same screen sizes. A page that looks fine on a 27-inch monitor can collapse a Likert grid into something unreadable on a phone.
Treat pilots as data
Pilot responses are real responses to early versions of your items. If you fix wording mid-pilot, those early respondents are now answering different items than later ones. Either re-run the pilot from the start after the fix, or label the pilot as a separate phase and exclude it from the main analysis.
Common pitfalls
Writing for the analyst, not the respondent
Items that include statistical language ("on a balanced 5-point continuum") or jargon from the field confuse respondents. Use plain language even when the analysis will be sophisticated. The analyst is the only person who needs to understand the methodology; the respondent only needs to understand the question.
Cramming demographics into closed boxes
Items about race, ethnicity, gender, and disability often need an "other / self-describe" option to capture respondents who do not fit the closed categories. Closed-only demographic items leave gaps in your data and can erode trust.
Mixing constructs inside one scale
A "satisfaction" scale that includes items about price, quality, and customer service is really three scales pretending to be one. Internal consistency drops, the composite score loses meaning, and item-total correlations look weak even though each subset would be reliable on its own. Separate them.
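To see the effect, the sketch below computes Cronbach's alpha, used here as one common internal-consistency statistic (the choice of statistic is an assumption), for two small subscales and for the same items pooled into one scale. The data are invented so that price and service scores are unrelated to each other.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for a list of item columns (one list per item)."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(row) for row in zip(*items)]  # per-respondent scale totals
    item_variance_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses from six respondents on a 1-5 scale.
price = [[1, 2, 3, 4, 5, 3], [2, 2, 3, 5, 5, 3]]
service = [[4, 2, 5, 1, 3, 3], [5, 2, 4, 1, 3, 4]]

alpha_price = cronbach_alpha(price)
alpha_service = cronbach_alpha(service)
alpha_mixed = cronbach_alpha(price + service)

print(round(alpha_price, 2), round(alpha_service, 2), round(alpha_mixed, 2))
```

With these numbers each subscale is highly consistent on its own, while the pooled four-item "satisfaction" scale is close to zero, which is the pattern described above.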
Letting AI write the items
AI-generated items are a starting point, not an ending point. They tend toward generic phrasing, miss the specific context of your respondents, and sometimes invent constructs that do not exist. Use AI to brainstorm or critique, then rewrite in your own voice.
Long surveys without breaks
Respondent fatigue is real. After 10 to 15 minutes, response quality drops noticeably: more straight-lining, more skipped items, shorter open-ended answers. If your survey runs longer, break it across pages with progress indicators, or split it into a multi-wave design.
No place for "I don't know"
Forcing an answer when the respondent honestly does not know the answer creates noise. For knowledge or fact items, include a "Don't know" or "Not applicable" option. Treat those responses as missing in analysis rather than as a midpoint score.
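A small illustration of why the recode matters, assuming a 1-5 item with a separate "Don't know" option. The `DONT_KNOW` label and the responses are hypothetical.

```python
from statistics import mean

DONT_KNOW = "Don't know"  # assumed label for the opt-out option

raw = [4, 5, DONT_KNOW, 2, DONT_KNOW, 4]

# Problematic: recoding "Don't know" to the midpoint (3) invents an opinion.
as_midpoint = [3 if r == DONT_KNOW else r for r in raw]

# Better: leave it out of the score and report the count separately.
valid = [r for r in raw if r != DONT_KNOW]
dont_know_count = len(raw) - len(valid)

print(mean(as_midpoint))  # 3.5
print(mean(valid))        # 3.75
print(dont_know_count)    # 2
```

The midpoint recode pulls the mean toward 3 and hides the fact that a third of this sample had no basis for answering.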
Quality checks before launch
Run a final pass before the survey goes live. The checklist below catches the issues that most often surface in post-launch debriefs.
- AI question review. Run every item through AI review for double-barreled, leading, or vague wording. Address each flag, even if you ultimately decide to keep the item as-is.
- Reverse-scored items flagged. Confirm every reverse-scored item is marked in the survey settings, so reliability statistics handle them correctly at analysis time.
- Required-vs-optional flags reviewed. Required items lower completion rates. Make required only what is essential for analysis; everything else should be optional.
- Mobile preview. Open the survey on a phone. Check Likert grids, long open-ended fields, and any conditional logic. Anything that requires zooming or horizontal scrolling needs a fix.
- Construct tags applied. Tag the survey with its construct so the post-launch report knows what to compare reliability against. Untagged surveys produce technically correct numbers without context.
- Respondent time estimate. Walk through the survey yourself and time it. Multiply by 1.5 for the average respondent. If the result exceeds 12 to 15 minutes, cut.
- Privacy notice. Make sure respondents know who you are, what you will do with the data, and how long you will keep it. A clear privacy notice raises completion rates and reduces support load later.
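The respondent time estimate in the checklist above is simple arithmetic, sketched here with a hypothetical 11-minute walkthrough and the upper end of the 12 to 15 minute range as the limit.

```python
def estimated_minutes(own_minutes: float, multiplier: float = 1.5) -> float:
    """Scale your own walkthrough time to an average-respondent estimate."""
    return own_minutes * multiplier


LIMIT_MINUTES = 15  # upper end of the 12 to 15 minute range

own_walkthrough = 11.0  # hypothetical: minutes it took you to complete the survey
estimate = estimated_minutes(own_walkthrough)

print(estimate)                  # 16.5
print(estimate > LIMIT_MINUTES)  # True, so the survey needs cutting
```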
If a respondent finished your survey and asked you what you were trying to learn, could you answer in one sentence? If yes, the survey is ready. If no, go back to Start with purpose.