Boston College Lynch School of Education

Psychometric Theory - ED669
(Spring 2000)

Psychometric Theory is that discipline which addresses the measurement and quantification of psychological phenomena (latent traits). Strictly speaking, psychological phenomena are not directly observable. Typically, they must be inferred from observations taken on some behavior that may be observed and is assumed to operationally define the unobservable characteristic that is of interest. An operational definition is most useful when it delineates boundaries of behavior and differential points between those boundaries. Ideally, a "scale" comprised of independent items is developed to measure a hypothesized unidimensional trait. Data are gathered and various statistical models are then employed to determine the extent to which the scale, or measurement instrument, functioned as intended.

Instructor:

Prof. Larry H. Ludlow
Campion Hall 336C
617-552-4221
Ludlow@bc.edu


Theme Quotes:

1. "The Reader may here observe the Force of Numbers, which can be successfully applied even to those things, which one would imagine are subject to no Rules. There are very few things which we know, which are not capable of being reduc'd to a Mathematical Reasoning; and when they cannot, it's a sign our Knowledge of them is very small and confus'd; and where a mathematical reasoning can be had, it's a great folly to make use of any other, as to grope for a thing in the dark, when you have a Candle standing by you." John Arbuthnot, 1692.
In I. Todhunter, A History of the Mathematical Theory of Probability (Macmillan, 1865, pp. 48-51).

2. "Psychometry, it is hardly necessary to say, means the art of imposing measurement and number upon operations of the mind...".
F. Galton, Psychometric Experiments. Brain, II, 149-162, 1879.

3. "...that until the phenomena of any branch of knowledge have been subjected to measurement and number, it cannot assume the status and dignity of a science."
Galton.

4. "I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind: it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science, whatever the matter may be."
Sir William Thomson, Lord Kelvin. Electrical Units of Measurement. Popular Lectures and Addresses, Vol 1 of 3. (London: Macmillan, 1889, p. 73-74)

5. "The grand, and indeed only, character of truth is its capability of enduring the test of universal experience, and coming unchanged out of every possible form of fair discussion".
Sir John Herschel.

6. "Whatever exists, exists in some amount."
E. L. Thorndike.

Ludlow's Challenge:

If it exists, it can be measured; If it can't be measured, it doesn't exist.

Ludlow, L.H. Psychometrics Lectures, Boston College, February 1996


Course Objectives

A) Introduce you to Classical Test Theory (True Score Theory), Item Response Theory, and the Rasch model (in particular); and

B) Provide an opportunity for you to develop competent, practical data analysis/consulting skills.

You will spend considerable time in the library and on the computer. [It is assumed that you will exert individual initiative in solving computing/analysis problems as they arise.]

COURSE ASSESSMENTS

You will be evaluated on the following components:

a) data analyses (Classical, Rasch)
b) literature reactions
c) measurement essay
d) Rasch presentation (Final)
e) class participation

Literature Reactions

The literature reactions (theory memos, reviews, reaction papers) will be 1-2 pages long (greater length is acceptable but not encouraged), typed and double-spaced. They will be handed in at the first seven class meetings after the initial lecture. Their purpose is to introduce the literature to you and, in turn, your interests to me.

1) Begin the main body of your discussion with a direct quote from the article and its page number. Following the quote, write an analysis of its meaning to you. Your analysis should not be a paraphrased rendition of the quote but should illustrate your independent thinking on an interesting idea. For example, identify what may be wrong with the author's thinking on a question and suggest how the approach could be improved. Or, when your quote captures the brilliance of someone's thinking, suggest ways its application may be broadened. Or, how can what we typically accept as "standard procedure" be improved by an obviously better way? Or, when you have encountered a particularly interesting topic, discuss its research potential for you or its potential for incorporation into your current employment. Or, you may wish to challenge "Ludlow's Challenge."

2) Organize the reaction papers and reviews according to the format shown below. In this form, your name and date are in the upper right hand corner and the full literature citation is in the upper left hand corner of the document.

 

Pearson, K. The Grammar of Science.                         Your Name
London: Adam and Charles Black, 1900.                       Date


" The classification of facts and the formation of absolute judgments upon the basis of this classification-judgments independent of the idiosyncrasies of the individual mind-essentially sum up the aim and method of modern science."

Page 6


Now would follow your reaction to the quote.

3) Your first Reaction Paper is to answer the question "What is Measurement?" You may consult any of the materials in this syllabus. BUT, I want to know what you in your own words think constitutes measurement. Your remaining Reaction Papers will be of the form presented in steps (1) and (2) above.

4) No papers are due for the evening in which analyses are submitted.

Data Analyses

The data analyses will consist of your output from the measurement programs and a complete report stating the results. You may supply your own data or you may solicit School of Education faculty for data. A reasonable way to satisfy this course component is to analyze the same data set for each psychometric model. The report should describe the sample, the variable being measured, items of the instrument (including their number and scoring format), the psychometric model and its psychometric properties, the interpretation of whether or not the data fit the model, and what modifications (if any) would improve the instrument.

Measurement Essay

The measurement essay will integrate your literature reactions and your understanding of class discussions. This is an opportunity for you to formally summarize your understanding of the essentials of measurement. One reasonable way in which to satisfy this component is to take a single topic and focus each reaction paper on some aspect of that topic. The measurement essay would then trace the development of your research. This essay should be 5-10 pages in length (potentially longer), typed, double-spaced, and fully referenced. In your essay you may include a discussion of topics that remain confusing, or appear as potentially researchable. Potential topics might include: authentic assessment, item banking, tailored testing, computer adaptive testing, Rasch applications, latent trait model fit, standard setting, one-parameter versus three-parameter models, differential item functioning (DIF), comparisons of estimation algorithms, goodness of fit tests, etc. You might even address how, if at all, your interpretation of the first reaction paper "What is measurement?" has shifted/clarified/been re-defined over the course of the semester.

Rasch Presentation

Your last data analysis will close with the Rasch model. You will provide a brief (15-20 minute) class presentation of your results.

Required Texts

Andrich, D. (1988). Rasch Models for Measurement. Newbury Park: Sage.

Crocker, L. & Algina, J. (1986). Introduction to Classical & Modern Test Theory. NY: Holt, Rinehart & Winston.

Hambleton, R.K., Swaminathan, H. & Rogers, H. J. (1991). Fundamentals of Item Response Theory. Newbury Park: Sage.

Wright, B.D. & Masters, G.N. (1982). Rating Scale Analysis. Chicago: MESA Press.


PROPOSED TOPICS

1. History: Psychophysics to Psychometrics.
Principles and theoretical development.

Required Readings:

1. Chapters 1 and 3 of Crocker & Algina.
2. "Foreword" of Wright & Masters.
3. Ludlow, L.H. (1998). Galton: The first psychometrician? Popular Measurement, 1, 13-14.
4. Thurstone, L.L., Psychology as a quantitative rational science. In Thurstone, L.L. The Measurement of Values. University of Chicago Press, 1959.
5. Boring, E.G. The beginning and growth of measurement in psychology. In Woolf, H. (Ed.) Quantification. Bobbs-Merrill, 1961.

Suggested Readings:

1. Thurstone, L.L., Attitudes can be measured. In Thurstone, L.L. The Measurement of Values. University of Chicago Press, 1959.
2. Boring, E.G. Gustav Theodor Fechner. In Boring, E.G. A History of Experimental Psychology (2nd ed.). Prentice-Hall, 1950.
4. Kuhn, T.S. The function of measurement in modern physical science. In Woolf, H. (Ed.) Quantification. Bobbs-Merrill, 1961.
5. Thurstone, L.L. Psychophysical analysis. In Thurstone, L.L. The Measurement of Values. University of Chicago Press, 1959.
6. Jones, L.V. The nature of measurement. In Thorndike, R.L. (Ed.) Educational Measurement (2nd ed.). American Council on Education, 1971.
7. Stevens, S.S. Mathematics, measurement, and psychophysics. In Stevens, S.S. (Ed). Handbook of Experimental Psychology. Wiley, 1951.
8. Galton, F. (1879). Psychometric experiments. Brain, II, 149-162.

2. Classical True Score Theory:
Theory, assumptions, applications.
SPSS reliability and factor analysis computer output interpretation of TASC data.

Required Readings:

1. Chapters 5-7, 13-14 of Crocker & Algina.
2. Spearman, C. (1904). The proof and measurement of association between two things. American Journal of Psychology, 15, 72-101.
3. Allen, M.J. & Yen, W.M. Classical True-Score Theory (Ch. 3) in Introduction to Measurement Theory. Monterey, CA: Brooks/Cole, 1979.

Suggested Readings:

1. Traub, R.E. & Rowley, G.L. (1991). Understanding reliability. Educational Measurement: Issues and Practice, 10, 37-45.
2. Loevinger, J. (1965). Person and population as psychometric concepts. Psychological Review, 72, 143-155.
3. Loevinger, J. (1954). The attenuation paradox in test theory. Psychological Bulletin, 51, 493-504.
4. Thurstone, L.L. Psychological Implications of Factor Analysis. Psychometric Laboratory Paper #44. The University of Chicago, Sept., 1947.
5. Thurstone, L.L. Psychological Assumptions of Factor Analysis. Psychometric Laboratory Paper #51. The University of Chicago, Feb., 1949.
6. Gould, S.J. (1981). Chapter 6 in The Mismeasure of Man. NY: Norton.
7. Spearman, C. (1904). "General Intelligence," Objectively Determined and Measured. American Journal of Psychology, 15, 201-293.
8. Hattie, J., Jaeger, R.M. & Bond, L. (1999). Persistent methodological questions in educational testing. Review of Research in Education, 24, Chapter 11. Washington, DC: AERA.
9. Traub, R.E. (1997). Classical test theory in historical perspective. Educational Measurement: Issues and Practice, 8-14.

3. Guttman's Scale Theory:
Theory, assumptions, applications.
Interpretation of Hillocks' Taxonomy of Reading Skills Hierarchy.

Required Readings:

1. Stouffer, S.A. An Overview of the Contributions to Scaling and Scale Theory. In Measurement and Prediction, Stouffer, S.A. et al., Princeton University Press, 1950.
2. Guttman, L.L. The Basis for Scalogram Analysis. In op cit.
3. Ludlow, L.H. & Hillocks, Jr., G. (1985). Psychometric Considerations in the Analysis of Reading Skill Hierarchies. Journal of Experimental Education, 54, 15-21.

4. Item Response Theory:
Basics - item and test characteristic curves, the information function, one-parameter dichotomous /rating scale/partial credit models.

Required Readings:

1. Chapter 15 of Crocker & Algina.
2. Jaeger, R.M. (1987). Two decades of revolution in educational measurement!? Educational Measurement: Issues and Practice, 6-14.
3. Ludlow, L.H. & Haley, K.C. (1999). Newton: The pinball wizard? Popular Measurement, 2, 5-7.

Suggested Overview Readings-Past/Present/Future:

1. Bock, R.D. (1997). A brief history of item response theory. Educational Measurement: Issues and Practice, 21-32.
2. Fischer, G.H. & Molenaar, I.W. (Eds.) (1995). Rasch Models: Foundations, recent developments, and applications. NY: Springer. (see Ch 1).
3. Hambleton, R. & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston: Kluwer. (see Ch 1 and 2)
4. Hambleton, R. (1989). Principles and selected applications of item response theory. In Linn, R.L. (Ed.). Educational Measurement. (3rd ed). NCME, AERA: Macmillan.
5. Mislevy, R.J. (1987). Recent developments in item response theory with implications for teacher certification. In Review of Research in Education, Rothkopf, E.F. (Ed.) Vol. 14, Washington: AERA.
6. Mislevy, R.J. (1996). Test theory reconceived. Journal of Educational Measurement, 379-416.
7. Reckase, M.D. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement, 21, 25-36.
8. Wainer, H. (1989). The future of item analysis. Journal of Educational Measurement, 26, 191-208.
9. Van der Linden, W. & Hambleton, R. (Eds.). (1997). Handbook of modern item response theory. NY: Springer. (see Ch 1).

General Measurement Articles:

1. Andrich, D. (1989). Distinctions between assumptions and requirements in measurement in the social sciences. In J. A. Keats, R. Taft, R. A. Heath & S. H. Lovibond (Eds.), Mathematical and Theoretical Systems: Proceedings of the 24th International Congress of Psychology of the International Union of Psychological Science, Vol. 4 (pp. 7-16). North-Holland: Elsevier Science Publishers.
2. Andrich, D. (1996). Measurement criteria for choosing among models with graded responses. In Categorical variables in developmental research: Methods of analysis (pp. 3-35). Academic Press, Inc.
3. Fisher, W. P., Jr. (1994). The Rasch debate: Validity and revolution in educational measurement. In M. Wilson (Ed.), Objective measurement: Theory into practice. Vol. II (pp. 36-72). Norwood, New Jersey: Ablex Publishing Corporation.
4. Michell, J. (1986). Measurement scales and statistics: A clash of paradigms. Psychological Bulletin, 100, 398-407.
5. Michell, J. (1997). Quantitative science and the definition of measurement in psychology. British Journal of Psychology, 88, 355-383.
6. Wright, B. D. (1984). Despair and hope for educational measurement. Contemporary Education Review, 3(1), 281-288.
7. Wright, B. D. (1999). Fundamental measurement for psychology. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every educator and psychologist should know. Hillsdale, NJ: LEA.

5. The Rasch Model:
Purpose, assumptions, estimation procedures, item and person fit, residual analysis, applications.
Computer output interpretation of TASC and TAMP data sets.

Required Readings:

1. Chapters 1-5 of Wright & Masters.
2. Wright, B.D. (1980). "Foreword" and "Afterword". In Rasch, G. Probabilistic Models for Some Intelligence and Attainment Tests. University of Chicago Press.
3. Ludlow, L.H. & Haley, S.M. (1995). Rasch model logits: Interpretation, use, and transformation. Educational and Psychological Measurement, 55, 967-975.
4. Ludlow, L.H. & O'Leary, M. (1999). Omitted and not reached items: Practical data analysis implications. Educational and Psychological Measurement, 59, 615-630.

Suggested Readings:

1. Wright, B.D. (1967). Sample-free test calibration and person measurement. In Proceedings of the 1967 Invitational Conference on Testing Problems. Princeton: Educational Testing Service, 85-101.
2. Whitely, S.E. & Dawis, R.V. (1974). The nature of objectivity with the Rasch model. Journal of Educational Measurement, 163-178.
3. Andrich, D. (1978). Relationships between the Thurstone and Rasch Approaches to Item Scaling. Applied Psychological Measurement, 449-460.
4. Engelhard, G. (1984). Thorndike, Thurstone, and Rasch: A Comparison of their methods of scaling psychological and educational tests. Applied Psychological Measurement, 21-38.
5. Hambleton, R. Principles and selected applications of item response theory. Chapter 4. In Educational Measurement (3rd ed). Linn, R. (Ed). NY: Macmillan.
6. Brink. N. (1972). Rasch's logistic model vs. The Guttman model. Educational and Psychological Measurement, 32, 921-927.
7. Hambleton, R. & Jones, R. (1993). An NCME instructional module on comparison of classical test theory and item response theory and their application to test development. Educational Measurement: Issues and Practice, 38-47.
8. Gable, R., Ludlow, L. & Wolf, M. (1990). The use of classical and Rasch latent trait models to enhance the validity of affective measures. Educational and Psychological Measurement, 50, 869-878.
9. McNamara, T. (1996). Raters and ratings: Introduction to multi-faceted measurement. Concepts and procedures in Rasch measurement. Ch 5 & 6 in Measuring Second Language Performance. London: Longman.

Two related articles:

1. Leonard, M. (1980). Rasch promises: A layman's guide to the Rasch method of item analysis. Educational Researcher, 22, 188-192.
2. Willmont, A. (1980). What does Rasch promise? A reply to Rasch promises by Martin Leonard. Educational Researcher, 22, 193-197.

Five related articles:

1. Divgi, D.R. (1986). Does the Rasch model really work for multiple choice items? Not if you look closely. Journal of Educational Measurement, 23, 283-298.
2. Henning, G.(1989). Does the Rasch model work for multiple-choice items? Take another look: A response to Divgi. Journal of Educational Measurement, 26, 91-97.
3. Andrich, D. (1989). Statistical reasoning in psychometric models and educational measurement. Journal of Educational Measurement, 26, 81-90.
4. Goldstein, H. (1979). Consequences of using the Rasch model for educational assessment. British Educational Research Journal, 5, 211-220.
5. Goldstein, H. (1980). Dimensionality, bias, independence and measurement scale problems in latent trait test score models. British Journal of Mathematical and Statistical Psychology, 33, 234-246.

Two general other-discipline articles:

1. Alphen A., Halfens, R., Hasman, A., & Imbos, T. (1994). Likert or Rasch? Nothing is more applicable than good theory. Journal of Advanced Nursing, 20, 196-201.
2. Spray, J. (1987). Recent developments in measurement and possible applications to the measurement of psychomotor behavior. Research Quarterly for Exercise and Sport, 58, 203-209.

Related Books:

1. Fischer, G.H. & Molenaar, I.W. Rasch Models: Foundations, Recent Developments, and Applications. NY: Springer, 1995.
2. Wilson, M. (Ed.). Objective Measurement: Theory Into Practice. Volumes 1-4. Norwood, NJ: Ablex, 1992-1997.
3. Wright, B.D. & Stone, M.H. Best Test Design. Chicago: MESA Press, 1979.

Other:

Any issue of Rasch Measurement: Transactions of the Rasch Measurement Special Interest Group. (see me for their location)

Variable Development and Application Examples:

1. Hillocks, Jr. G. & Ludlow, L.H. (1984). A taxonomy of skills in reading and interpreting fiction. American Educational Research Journal, 7-24.
2. Ludlow, L.H. (1985). A strategy for the graphical representation of Rasch model residuals. Educational and Psychological Measurement, 45, 851-860.
3. Ludlow, L.H. (1986). Graphical analysis of item response theory residuals. Applied Psychological Measurement, 10, 217-229.
4. Ludlow, L.H. & Hwang, R. (1990). Evaluating district-level performance relative to the system. Educational Research Quarterly, 14, 29-37.
5. Ludlow, L.H. & Guida, F.V. (1992). The Test Anxiety Scale for Children as a Measure of academic anxiety. Educational and Psychological Measurement, 51, 1013-1021.
6. Ludlow, L.H. & Lunz, M. (1998). The Job Responsibilities Scale: Invariance in a longitudinal prospective study. Journal of Outcome Measurement, 2, 326-337.
7. Ludlow, L.H. (1998). Scale invariance from a three-dimensional graphical perspective: Visualizing an eigenvector. Educational and Psychological Measurement, 58, 166-178.
8. Ludlow, L.H. (1999). The structure of the Job Responsibilities Scale: A multi-method analysis. Educational and Psychological Measurement, 59, 962-975.
9. Coster, W.J., Mancini, M.C. & Ludlow, L.H. (1999). Factor structure of the School Function Assessment. Educational and Psychological Measurement, 59, 665-677.
10. Coster, W., Ludlow, L.H. & Mancini, M. (1999). Using IRT variable maps to enrich understanding of rehabilitation data. Journal of Outcome Measurement, 3, 123-133.

TAMP/PEDI Projects:

1. Gans & Haley, et al. (1988). Description and interobserver reliability of the Tufts Assessment of Motor Performance. American Journal of Physical Medicine and Rehabilitation, 2, 202-210.
2. Haley & Ludlow, et al. (1991). Tufts Assessment of Motor Performance: An empirical approach to identifying motor performance categories, Archives of Physical Medicine and Rehabilitation, 72, 359-366.
3. Ludlow & Haley. (1991). Polytomous Rasch models for behavioral assessment: The Tufts Assessment of Motor Performance. In Objective Measurement, Vol. 1, Wilson, M. (Ed.) Ablex.
4. Ludlow, Haley & Gans. (1992). A hierarchical model of functional performance in rehabilitation medicine: The Tufts Assessment of Motor Performance. Evaluation and the Health Professions, 15, 59-74.
5. Haley & Ludlow. (1992). Applicability of the hierarchical scales of the Tufts Assessment of Motor Performance for school-aged children and adults with disabilities. Physical Therapy, 72, 191-206.
6. Fisher, A.G., Bryze, K.A., Granger, C.V., Haley, S.M., Hamilton, B.B., Heineman, A.W., Puderbaugh, J.K., Linacre, J.M., Ludlow, L.H., McCabe, M.A. & Wright, B.D. (1994). Applications of conjoint measurement to the development of functional assessment. International Journal of Educational Research, 21, 579-593.
7. Haley, S.M., Ludlow, L.H. & Coster, W.J. (1993). Pediatric Evaluation of Disability Inventory: Clinical Interpretation of summary scores using Rasch rating scale methodology. Physical Medicine and Rehabilitation Clinics of North America: New Developments in Functional Assessment, 4, 529-540.
8. Ludlow, L.H. & Haley, S.M. (1996). Effect of context in rating of mobility activities in children with disabilities. Educational and Psychological Measurement, 56, 122-129.

6. Operation of Psychometric computer programs:
SCALE, WINSTEPS, RUMM, PARSCALE, BILOG-MG.

7. The Two- and Three-parameter IRT Models:
Purpose, assumptions, estimation, model fit, applications.

· Baker, F.B. (1992). Item Response Theory: Parameter Estimation Techniques. NY: Marcel Dekker.
· Hambleton, R.K. (Ed) (1983). Applications of Item Response Theory. Vancouver, BC: Educational Research Institute of British Columbia.
· Hambleton, R.K. & Swaminathan, H. (1985). Item Response Theory: Principles and Applications, Boston: Nijhoff.
· Hambleton, R.K., Swaminathan, H. & Rogers, J. (1991). Fundamentals of Item Response Theory. Sage.
· Harris, D. (1989). Comparison of 1-,2-, and 3-parameter IRT models. Educational Measurement: Issues and Practice. NCME Instructional Module, Spring, 35-41.
· Hulin, C.L., Drasgow, F. & Parsons, C.K. (1983). Item Response Theory: Application to Psychological Measurement. Homewood, IL: Dow Jones-Irwin.
· Lord, F.M. (1980). Applications of Item Response Theory to Practical Testing Problems. Hillsdale, NJ: Erlbaum.
· Van der Linden, W. & Hambleton, R.K. (Eds.) (1997). Handbook of Modern Item Response Theory. NY: Springer.
· Wainer, H. & Messick, S. (1983). Principals of Modern Psychological Measurement. Hillsdale, NJ: Erlbaum.

8. Technical Applications of IRT:
Item banking, adaptive testing, item and test bias, equating, test construction, differential item functioning (DIF), scale anchoring, cut-scores, plausible values.

Differential Item Functioning:

· Berk, R.A. (Ed.) (1982). Handbook of Methods for Detecting Test Bias. Baltimore, MD: Johns Hopkins University Press.
· Holland, P.W. & Wainer, H. (Eds.) (1993). Differential Item Functioning. Hillsdale, NJ: Erlbaum.
· Zumbo, B. D. (1999). A Handbook on the Theory and Methods of Differential Item Functioning (DIF): Logistic Regression Modeling as a Unitary Framework for Binary and Likert-type (Ordinal) Item Scores. Ottawa ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.

Computerized Adaptive Testing:

· Sands, W.A., Waters, B.K. & McBride, J.R. (Eds). (1997). Computerized Adaptive Testing: From Inquiry to Operation. Washington, DC: APA.
· Wainer, H., Dorans, N.J., Flaugher, R., Green, B.F., Mislevy, R., Steinberg, L. & Thissen, D. (1990). Computerized Adaptive Testing: A Primer. Hillsdale, NJ: Erlbaum.

Equating:

· Angoff, W.H. (1984). Scales, Norms, and Equivalent Scores. Princeton: ETS.
· Holland, P.W. & Rubin, D.B. (1982). Test Equating. NY: Academic Press.
· Kolen, M.J. & Brennan, R.L. (1995). Test Equating: Methods and Practices. NY: Springer.
· Linn, R.L. & Kiplinger, V.L. (1995). Linking statewide tests to the NAEP: Stability of results. Applied Measurement in Education, 8, 135-155.
· Mislevy, R.J., Sheehan, K.M. & Wingersky, M. (1993). How to equate tests with little or no data. Journal of Educational Measurement, 30, 55-78.

Cut-scores:

· Berk, R.A. (1986). A consumer's guide to setting performance standards on criterion-referenced tests. Review of Educational Research, 56, 137-172.
· Glass, G.V. (1978). Standards and criteria. Journal of Educational Measurement, 15, 237-261.
· Jaeger, R.M. (1989). Certification of Student Competence. In R.L.Linn (Ed.), Educational measurement (3rd ed., pp 485-514). New York: American Council on Education and Macmillan.
· Kane, M. (1994). Validating the performance standards associated with cutscores. Review of Educational Research, 64, 425-461.


Thurstone Bibliography

General References on Science and Measurement


Rasch Bibliography
(in my files)


Wright Bibliography

Guttman Bibliography
(in my files)


Relevant Dissertations
(Chapters with Rasch model descriptions)


Special Edition Journals

Proceedings:



Relevant Journals


Web Sites

Finally, check these interesting IRT-related web sites:

http://www.rasch.org/
http://quarles.unbc.ca/psyc/itc/index.html

I have not checked this next site yet but it sounds interesting:

e-PSYCHOMETRICS, a user-friendly on-screen book, which provides useful internet resources for measurement theory. The main topics include introduction to measurement theory; introduction to reliability and validity; introduction to classical test theory and the corresponding procedures for estimating reliability and validity; introduction to item response theory including Rasch models, computerized adaptive testing (CAT), differential item functioning (DIF) and test equating. Other topics include online documentation and abstracts, books and journals, mail-servers, and professional organizations.

The address is: http://go.to/EricWong

Please note that it will take several minutes to download the webbook. Kindly send me comments and suggestions. Eric WONG


Classical True-Score Theory Assignment
(Spring 2000: 100 points)

Use SPSS procedures to perform a classical true-score theory (CTT) item analysis of your data set. Answer all of the following questions. An outline format is preferable. There is no need to try to write the assignment as a mini-publication at this point.

1. Instrument and sample:

Explain the purpose of your measurement instrument. What does the instrument purport to measure? Who developed it (wrote the items)? How many items are included? What is the scoring format? How many response options are provided? Is it a speeded test? How long does it take to answer? Is it a standardized or non-standardized instrument? Is it primarily for norm-referenced or criterion-referenced purposes?
Where did your sample come from? Who collected the data? How many subjects are there? Are they a subset of a larger study and, if so, briefly explain why they were specifically chosen. Are there any special characteristics about them? What is the population to whom they are generalizable?

2. Measurement model:

Explain the statistical form of the true score model (present the relevant equations and explain them). What are its primary assumptions (present the equations and explain them)? Do they appear reasonably well met for your data?
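For reference, one common textbook statement of the model and its assumptions is sketched below in LaTeX. The notation (X for the observed score, T for the true score, E for error) is an assumption here and should be checked against the form used in Crocker & Algina.

```latex
\begin{align}
X &= T + E && \text{observed score = true score + error} \\
\mathcal{E}(E) &= 0 && \text{errors average to zero} \\
\rho_{TE} &= 0 && \text{true scores and errors are uncorrelated} \\
\rho_{E_1 E_2} &= 0 && \text{errors on parallel measurements are uncorrelated} \\
\sigma^2_X &= \sigma^2_T + \sigma^2_E && \text{variance decomposition implied by the above} \\
\rho_{XX'} &= \frac{\sigma^2_T}{\sigma^2_X} && \text{reliability as a ratio of variances}
\end{align}
```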
For your data, show how the following were computed (what equations led to the statistics): item difficulty (for dichotomous data), discrimination (corrected item-total correlation), reliability (for internal consistency), and standard error of measurement (based on the internal consistency estimate). Explain the various components of the equations. Why is the item-total correlation corrected?
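The statistics named above can be checked by hand against the SPSS output. The following is a minimal sketch in Python with NumPy (not part of the course software), assuming a persons-by-items matrix of 0/1 scores; the function name `ctt_item_analysis` and the sample data are hypothetical.

```python
import numpy as np

def ctt_item_analysis(scores):
    """Classical item statistics for a persons-by-items score matrix."""
    X = np.asarray(scores, dtype=float)
    n_items = X.shape[1]
    total = X.sum(axis=1)

    # Item difficulty: proportion answering correctly (dichotomous items).
    difficulty = X.mean(axis=0)

    # Corrected item-total correlation: each item is correlated with the
    # total score computed WITHOUT that item, so the item does not
    # inflate its own correlation.
    discrimination = np.array([
        np.corrcoef(X[:, j], total - X[:, j])[0, 1] for j in range(n_items)
    ])

    # Cronbach's alpha from the item variances and total-score variance.
    item_var = X.var(axis=0, ddof=1)
    total_var = total.var(ddof=1)
    alpha = (n_items / (n_items - 1)) * (1.0 - item_var.sum() / total_var)

    # Standard error of measurement based on the internal consistency estimate.
    sem = np.sqrt(total_var) * np.sqrt(1.0 - alpha)
    return difficulty, discrimination, alpha, sem

# Hypothetical data: 6 persons, 4 dichotomous items.
sample = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
difficulty, discrimination, alpha, sem = ctt_item_analysis(sample)
```

Small differences from SPSS may arise from how missing data are handled, which this sketch ignores.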
Explain the general purpose of a common factor analysis when it is applied to items of a test. Briefly explain what you think an eigenvalue is and explain what the factor loadings are. What is the purpose of the scree plot and a varimax rotation? What general procedures are normally conducted in order to determine the appropriateness of factoring a correlation matrix?
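As a rough illustration of what a scree plot displays, the sketch below extracts the eigenvalues of an inter-item correlation matrix with NumPy. It is an assumption-laden simplification: it decomposes the unreduced correlation matrix (as in a principal components extraction) rather than fitting a common factor model, and the function name `scree_eigenvalues` and the simulated data are hypothetical.

```python
import numpy as np

def scree_eigenvalues(scores):
    """Eigenvalues of the inter-item correlation matrix, largest first."""
    X = np.asarray(scores, dtype=float)
    R = np.corrcoef(X, rowvar=False)          # items are columns
    eigvals = np.linalg.eigvalsh(R)[::-1]     # eigvalsh returns ascending order
    proportion = eigvals / eigvals.sum()      # share of total variance per component
    return eigvals, proportion

# Hypothetical data: 50 persons, 5 items drawn at random.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 5))
eigvals, proportion = scree_eigenvalues(data)
```

Because the diagonal of a correlation matrix is all ones, the eigenvalues sum to the number of items, which is why each eigenvalue can be read as "how many items' worth" of variance a component accounts for.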

3. Analysis:

Discuss the distributional characteristics of your item difficulties and person total scores, e.g., are they as intended, are they surprising? Discuss whether your discrimination estimates are reasonable or not. Are there any particular items with statistical problems (what are the statistical problems)? What might have caused the problems, if there are any? Should any items be removed or revised? Interpret the Cronbach alpha you obtained.
Discuss the results of your initial factor analysis and how you subsequently decided on the number of final factors to retain. What percent of variance was extracted by those factors and what is your opinion of the magnitude of the percent that was accounted for? Was the rotated and plotted final solution interpretable (just plot the first two factors)? What verbal labels did you apply to "name" the factors (and explain why you applied those names)? Was your solution expected or surprising (did you have any idea about what might result from the factor analysis)? What is the reliability of each of the final factors in your solution? How many scores for each tested person would you recommend should be reported?

4. Submit your write-up and output. A useful way to write your analysis is to cut and paste into it the appropriate tables/graphs/figures that are output by SPSS rather than referring the reader to the pages of your output. (NOTE: pay attention to typos and notation errors.)


Item Response Theory Assignment
(Spring 2000: 100 points)

Use SCALE, WINSTEPS, RUMM, or PARSCALE to perform an item response theory analysis of your data set.

1. Instrument and sample:

Explain the purpose of your measurement instrument. Who developed it? How many items are included? What is the scoring format? How many response options are provided? Is it a speeded test? How long does it take to answer? Where did your sample come from? How many subjects are there? Are there any special characteristics about them? Basically, I want you to remind me of the characteristics of the data used for the classical analysis.
For your data, what is the "variable" that is being measured? That is, what is the hypothesized structure that is to be tested by the Rasch model?

2. Measurement model details:

Explain the statistical components of the Rasch model. Why is it called a one-parameter model when clearly there is a parameter for both persons and items? What are the primary assumptions of the model? Do they appear reasonably well met for your data?
Explain how the initial PROX person ability estimates and item difficulty estimates are computed. Why are persons and items with perfect correct or zero scores removed from analysis? What does the term "sufficient statistic" refer to? Explain (in your own words) what person and item "logits" are. How is the "expected" value for a person on any item computed?
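The sketch below illustrates the dichotomous Rasch probability (which is also the expected value of a 0/1 response) and a PROX-style calculation in Python with NumPy. The expansion factors follow the normal-approximation form as it is commonly presented (constants 2.89 and 8.35); treat them as an assumption and verify them against Wright & Masters before relying on them. The function names and sample data are hypothetical.

```python
import numpy as np

def rasch_probability(ability, difficulty):
    """P(correct) under the dichotomous Rasch model; also the expected score."""
    return 1.0 / (1.0 + np.exp(-(ability - difficulty)))

def prox_estimates(responses):
    """PROX-style person and item logits from a persons-by-items 0/1 matrix."""
    X = np.asarray(responses, dtype=float)
    n_persons, n_items = X.shape
    r = X.sum(axis=1)   # person raw scores (the sufficient statistics for ability)
    s = X.sum(axis=0)   # item raw scores (the sufficient statistics for difficulty)

    # Zero and perfect scores have no finite logit, so they must be removed first.
    if np.any((r == 0) | (r == n_items)) or np.any((s == 0) | (s == n_persons)):
        raise ValueError("remove persons/items with zero or perfect scores")

    person_logit = np.log(r / (n_items - r))
    item_logit = np.log((n_persons - s) / s)
    item_logit -= item_logit.mean()           # center the items at zero

    U = item_logit.var(ddof=1)                # spread of item logits
    V = person_logit.var(ddof=1)              # spread of person logits
    denom = 1.0 - U * V / 8.35
    if denom <= 0:
        raise ValueError("expansion factors undefined for this data set")
    person_expansion = np.sqrt((1.0 + U / 2.89) / denom)
    item_expansion = np.sqrt((1.0 + V / 2.89) / denom)
    return person_expansion * person_logit, item_expansion * item_logit

# Hypothetical data: 5 persons, 4 items, no extreme scores.
sample = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
abilities, difficulties = prox_estimates(sample)
expected = rasch_probability(abilities[:, None], difficulties[None, :])
```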
Explain the difference between the PROX and "UCON" estimation procedures. What does the term "likelihood" refer to? Explain, in general basic terms, how the Newton-Raphson algorithm operates. What is its function?
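To make the Newton-Raphson idea concrete, here is a minimal sketch in Python with NumPy that estimates one person's ability from a raw score while holding the item difficulties fixed. This is a simplification: a full unconditional procedure would alternate between updating persons and items. The function name `estimate_ability` and the example difficulties are hypothetical.

```python
import numpy as np

def estimate_ability(raw_score, difficulties, tol=1e-6, max_iter=50):
    """Maximum likelihood ability for a raw score, item difficulties known."""
    d = np.asarray(difficulties, dtype=float)
    n_items = d.size
    if not 0 < raw_score < n_items:
        raise ValueError("zero and perfect scores have no finite estimate")

    theta = np.log(raw_score / (n_items - raw_score))   # starting value
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(theta - d)))
        gradient = raw_score - p.sum()          # observed minus expected score
        information = (p * (1.0 - p)).sum()     # curvature of the log-likelihood
        step = gradient / information
        theta += step                           # Newton-Raphson update
        if abs(step) < tol:
            break

    p = 1.0 / (1.0 + np.exp(-(theta - d)))
    standard_error = 1.0 / np.sqrt((p * (1.0 - p)).sum())
    return theta, standard_error

# Hypothetical item difficulties, symmetric about zero.
items = [-1.0, -0.5, 0.5, 1.0]
theta_mid, se_mid = estimate_ability(2, items)
theta_high, se_high = estimate_ability(3, items)
```

Each pass asks how far the expected score is from the observed score and divides by how quickly the expected score changes with ability; the estimate stops moving when observed and expected agree.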
How are person and item weighted fit statistics computed? Explain how person and item positive and negative fit statistics may be interpreted. What might be done if an item or person is considered to misfit the model?
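As one illustration, the sketch below computes item mean-square fit statistics in Python with NumPy for dichotomous data, given ability and difficulty estimates. It stops at the mean squares and does not apply the standardizing transformation that yields signed (positive or negative) fit values; the function name `item_fit_mean_squares` and the data are hypothetical.

```python
import numpy as np

def item_fit_mean_squares(responses, abilities, difficulties):
    """Weighted and unweighted mean-square fit for each item."""
    X = np.asarray(responses, dtype=float)
    b = np.asarray(abilities, dtype=float)[:, None]
    d = np.asarray(difficulties, dtype=float)[None, :]

    P = 1.0 / (1.0 + np.exp(-(b - d)))   # expected response
    W = P * (1.0 - P)                    # model variance of each response
    sq_resid = (X - P) ** 2              # squared score residuals

    # Unweighted: average squared standardized residual.
    unweighted = (sq_resid / W).mean(axis=0)
    # Weighted: residuals weighted by their variance, which damps the
    # influence of persons located far from the item.
    weighted = sq_resid.sum(axis=0) / W.sum(axis=0)
    return weighted, unweighted

# Hypothetical check: when every expected value is 0.5, each squared
# residual equals its variance, so both mean squares equal 1.
weighted, unweighted = item_fit_mean_squares(
    [[1, 0], [0, 1], [1, 1]], abilities=[0.0, 0.0, 0.0], difficulties=[0.0, 0.0]
)
```

Mean squares near 1 indicate responses about as variable as the model expects; values well above 1 suggest unexpected responses, and values well below 1 suggest responses that are more predictable than the model expects.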

3. Analysis:

Discuss the initial distributional characteristics of your item difficulties (does it appear to be a relatively easy or hard instrument) and person abilities (do they appear relatively capable or not). Are these findings as intended? For your data, what do "difficulty" and "ability" translate into?
Are there any particular persons with statistical problems? What might have caused them if there are? Are there any particular items with statistical problems? What might have caused them if there are? (How are you defining a "problem" and what have you done to try to locate their source?)
Explain what the "variable map" is and what it reveals about your data. Was your solution expected or surprising? What modifications, if any, would you suggest if the instrument were to be revised and re-administered?
Finally, compare and contrast the Rasch results to your previous classical analysis results. For example, is there any additional insight you have gained about your data? In addition, how does the standard error of measurement associated with a person's performance differ between the two models?
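One way to frame the standard error comparison is sketched below in LaTeX. The notation is an assumption (s_X is the total-score standard deviation, r_XX' the reliability estimate, beta_n a person's ability, delta_i an item's difficulty) and should be matched to the course texts.

```latex
\begin{align}
\text{Classical:}\quad SEM &= s_X \sqrt{1 - r_{XX'}}
  && \text{one value applied to every person} \\
\text{Rasch:}\quad SE(\hat{\beta}_n) &\approx
  \frac{1}{\sqrt{\sum_{i} P_{ni}\,(1 - P_{ni})}},
  \qquad P_{ni} = \frac{\exp(\beta_n - \delta_i)}{1 + \exp(\beta_n - \delta_i)}
  && \text{varies with the person's location}
\end{align}
```

The Rasch standard error is smallest where the items are well targeted on the person and grows toward the extremes of the scale.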

4. Submit your write-up and output. A useful way to write your analysis is to cut and paste into it the appropriate tables/graphs/figures that are output by the software rather than referring the reader to the pages of your output. (NOTE: pay attention to typos and notation errors.)