Steps in Test Construction
1. Choose the purpose of the instrument
Is it a screener?
What will people use the test for?
What decisions are you trying to make?
2. Identify the construct(s)
Review the literature
3. Construct initial item pool
Try to develop 3 times as many items as you wish to have on the final test.
Develop items based on the purpose of the test and based on who will take the test.
Take time now to develop "good" items -- try to cover the entire domain of the construct, but do not write items that are likely to "load" on other constructs.
Make items as simple to read as possible.
Reverse code some items.
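As a quick illustration of reverse coding, a negatively worded item on a 1-to-5 Likert scale is flipped around the scale midpoint before scoring. This is a minimal stdlib sketch; the function name and the item data are my own, not from the source.

```python
# Reverse-code selected Likert items so every item points the same
# direction before summing. On a 1..5 scale, reversed = (1 + 5) - raw.

def reverse_code(responses, reversed_items, scale_min=1, scale_max=5):
    """Return a copy of `responses` (item -> raw score) with the items
    named in `reversed_items` flipped around the scale midpoint."""
    flipped = dict(responses)
    for item in reversed_items:
        flipped[item] = (scale_min + scale_max) - responses[item]
    return flipped

raw = {"q1": 5, "q2": 1, "q3": 4}   # q2 is negatively worded
print(reverse_code(raw, {"q2"}))    # -> {'q1': 5, 'q2': 5, 'q3': 4}
```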
4. Review, Revise and Tryout
Conduct an expert review
Conduct a bias review
Administer the items to a few individuals who are representative of potential test takers and obtain their feedback (if possible).
5. Alpha study
Administer the items to a sample of subjects who are representative of potential test takers.
Do item analysis & choose final items (criteria include coefficient alpha, domain coverage, difficulty level, endorsement rates).
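The coefficient-alpha criterion above can be sketched in a few lines: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The respondent-by-item matrix below is invented for illustration, not data from the source.

```python
# Item analysis sketch: Cronbach's coefficient alpha from a
# respondents-by-items score matrix, using only the stdlib.
from statistics import variance

def cronbach_alpha(matrix):
    """matrix: list of respondent rows, each a list of item scores."""
    k = len(matrix[0])
    items = list(zip(*matrix))                      # transpose to items
    item_vars = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in matrix])
    return (k / (k - 1)) * (1 - item_vars / total_var)

data = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
]
print(round(cronbach_alpha(data), 2))   # -> 0.94
```

Items whose removal raises alpha are candidates for deletion, balanced against the domain-coverage criterion above.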
6. Beta study
Reliability studies (test-retest, alternate forms, internal consistency).
Validity studies (known groups, convergent & discriminant, predictive, factor analysis).
Collect normative data from a representative sample.
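Test-retest reliability from the list above is just the Pearson correlation between total scores at two administrations. A minimal sketch with invented scores (the helper name and data are mine):

```python
# Test-retest reliability sketch: Pearson r between time-1 and
# time-2 total scores for the same respondents.
from statistics import mean

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

time1 = [20, 15, 30, 25, 18]   # illustrative total scores
time2 = [22, 14, 29, 27, 17]
print(round(pearson_r(time1, time2), 2))   # -> 0.97
```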
Steps in Constructing the Employee Assistance Program Inventory
1. Choose the purpose of the instrument.
Intake/screening tool for professionals in EAPs who provide counseling to working adults.
Purpose: to identify rapidly common psychological problems of employed adults to guide referrals or short-term interventions.
2. Identify the construct(s).
Based on the authors' experience and the literature, 50 possible assessment areas were identified
A survey listing the 50 areas was sent to 200 professionals randomly selected from the EAPA database
The survey asked respondents to choose the 10 areas that would best meet needs for initial screening
See Table 7
Chose 10 content areas
See Table 8
3. Construct initial item pool.
Literature review conducted to identify the behavioral expression of each content area
Initial item pool: 344
4. Review, revise, & tryout
5-member bias panel (gender, ethnic background, religious belief)
10 items were identified as having potential bias (4 were deleted and 6 were rewritten)
5-member expert panel of PhD psychologists working in EAPs reviewed the items for scoring
Criterion: 4/5 agreement; panel failed to agree on 34 of 344 items
33 items deleted; one was rewritten (6 of these items were also considered problematic by the bias committee)
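The 4-out-of-5 agreement criterion can be expressed as a simple filter: an item survives only if at least 4 of the 5 panelists place it in the same category. The function and the sample votes below are my own illustration.

```python
# Expert-agreement screen: keep an item only when at least `required`
# of the panelists agree on its classification.
from collections import Counter

def passes_agreement(ratings, required=4):
    """ratings: one vote per panelist, e.g. ['A', 'A', 'A', 'A', 'B']."""
    return Counter(ratings).most_common(1)[0][1] >= required

print(passes_agreement(["A", "A", "A", "A", "B"]))   # -> True
print(passes_agreement(["A", "A", "B", "B", "C"]))   # -> False
```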
5. Alpha study
Remaining 307 items administered to 215 employed adults
Items were eliminated according to the following criteria:
a. all scales would contain an equal number of items
b. each scale would be as internally consistent as possible while still providing comprehensive sampling of the content area
c. when items had similar item characteristics, those providing the broadest domain coverage were retained, and items with significant relations to gender or ethnic group were eliminated.
Final items: 120
Alpha ranged from .73 to .92 (M = .86)
6. Beta study: detailed results.
Tips on Attitude Scale Construction
Proper sampling is not easy to achieve. It is not easy to show that proper sampling has been achieved, either (although techniques developed for showing "content validity" might be brought to bear). (Look up Lawshe if you want to start content validity readings.)
Researchers investigating the "authoritarian personality" lifted key phrases from people's writings, discussions, and interviews. In job satisfaction, representative samples of people were asked to respond to open-ended questions such as "What things do you like best about your job?" Bruce Rather (USF alum), in developing an alcohol expectancy scale, asked lots of people to write a response to the following: "Drinking alcohol makes me ..."
Where extensive prior work has been done, you can sample items from factors uncovered by factor analysis. There will be some decisions made as to how many items should be included for each factor; the important first step is to be sure that the waterfront has been covered. In marketing, they often use a technique called a focus group, in which several people are brought together to talk about the researcher's object of interest.
1. As my advisor used to say, "keep it simple, stupid."
Use simple words that everybody knows. (Utilize vocabulary sufficiently prevalent that the target population experiences little ambiguity). Eschew obfuscation.
Write "give up," not "abrogate."
2. Keep it short.
for [the purpose of]
[in order] to
[and things of such nature]
[at this point in time]
3. Avoid double-barreled items:
"The government should provide low-cost medical care because too many people are in poor health and doctors charge so much money."
4. Avoid absolutes: always, never, everybody, nobody:
"Sunday school is always fun for everyone."
5. Avoid questions that require little-known facts:
"The government should provide for no more medical care than what is implied in the Constitution."
6. Avoid "not" and "un."
I do not like class. Rewrite to: I hate class.
7. Minor changes in item wording can have a big effect on item responses. (You can't always tell what these will be before collecting data). The following 2 items have very different response characteristics:
1. It is the man who starts off bravely on his own who excites our admiration.
2. We should admire a man who starts out bravely on his own.
Response sets: acquiescence, yea- and nay-saying
Some people endorse most anything in an attitude scale. Others seem to disagree about most any attitude. "Oh, yeah?" Therefore, switch between positive & negative statements.
Some people respond in such a way as to look good, or to gain approval, even when there is a good chance the answers to questions will remain anonymous. The Marlowe-Crowne social desirability scale can serve as a control.
Sample question: "Before voting I thoroughly investigate the qualifications of all of the candidates."
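One common way to use a social desirability score as a control is to partial it out of the correlation between the attitude scale and a criterion. A sketch of the first-order partial correlation formula; the r values below are invented for illustration.

```python
# Partial correlation: relation of x (attitude scale) to y (criterion)
# with z (social desirability score) partialled out.
#   r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y controlling for z."""
    return (r_xy - r_xz * r_yz) / (((1 - r_xz**2) * (1 - r_yz**2)) ** 0.5)

# If scale-criterion r = .50 but both correlate .40 with desirability:
print(round(partial_r(0.5, 0.4, 0.4), 2))   # -> 0.4
```

If the partial correlation drops sharply, much of the apparent scale-criterion relation may reflect the response set rather than the attitude itself.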