Psychological Testing

Development and Background

In addition to going through the process of standardization and normative referencing, well-constructed psychological tests are evaluated for 1) reliability and 2) validity.  Reliability refers to consistency: a reliable test yields similar results when the same person is tested at time one and again at time two.  Reliability is an intuitive concept and is expected by examiners and examinees alike.  In medicine, for instance, blood pressure should be relatively stable over time.  If the patient is calm and rested, blood pressure measurements taken a week apart should correspond.  Similarly, many personal characteristics, such as intelligence and personality, are relatively stable over time.  Therefore, the results of an intelligence or personality test should be relatively consistent from year to year.  There are several methods by which a test's reliability is measured:

  1. Test-Retest Reliability: This is the most straightforward measure of reliability.  The same test is given to the same people on two occasions, and the two sets of scores are compared.
  2. Split-Half Reliability: This involves dividing a single test into two equivalent halves (for example, odd-numbered versus even-numbered items) and comparing the scores on each half.
  3. Alternative-Form Reliability: This time-consuming method entails developing two parallel versions of the same test.  Both forms are given to the same people, and their scores are compared for similarity.
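Each of the reliability methods above comes down to comparing two sets of scores from the same people, usually with a correlation coefficient.  The Python sketch below illustrates this under stated assumptions: the function names and all scores are hypothetical, Pearson correlation is assumed as the index of agreement, and the split-half function assumes an odd-even item split followed by the Spearman-Brown correction, a standard adjustment that estimates full-length reliability from the correlation between two half-length tests.  It is an illustration only, not the scoring procedure of any published instrument.

```python
from math import sqrt
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists of at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary to compute a correlation")
    return cov / sqrt(var_x * var_y)


def retest_reliability(time1: Sequence[float], time2: Sequence[float]) -> float:
    """Correlate each person's score at time one with their score at time two.

    Hypothetical helper.  Passing scores on Form A and Form B instead gives
    an alternative-form reliability estimate.
    """
    return pearson_r(time1, time2)


def spearman_brown(r_half: float) -> float:
    """Step a half-test correlation up to an estimate for the full-length test."""
    if r_half <= -1:
        raise ValueError("correction is undefined for r = -1")
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores: List[List[float]]) -> float:
    """Odd-even split: total the odd- and even-numbered items for each person,
    correlate the two half scores, then apply the Spearman-Brown correction."""
    odd_half = [sum(items[0::2]) for items in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(items[1::2]) for items in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_half, even_half))


if __name__ == "__main__":
    # Hypothetical total scores for five examinees, invented for illustration.
    week_1 = [98, 110, 105, 120, 91]
    week_2 = [101, 108, 107, 118, 95]
    print(f"test-retest r = {retest_reliability(week_1, week_2):.2f}")

    # Hypothetical item responses: one row per examinee, one column per item.
    items = [
        [1, 2, 1, 2, 2, 1],
        [3, 3, 2, 3, 3, 2],
        [2, 1, 2, 2, 1, 2],
        [4, 4, 3, 4, 4, 3],
        [2, 3, 3, 2, 2, 3],
    ]
    print(f"split-half (Spearman-Brown) r = {split_half_reliability(items):.2f}")
```

Values closer to 1 indicate more consistent scores.  Published tests are evaluated on far larger samples than this toy example.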

Once reliability has been established, validity must be demonstrated.  A valid test measures what it is supposed to measure.  Returning to the earlier example, a valid blood pressure cuff measures blood pressure.  Similarly, a valid depression inventory measures depressive symptoms, a valid intelligence test measures intelligence, and a valid personality test measures personality.  However, because many psychological variables are difficult to define, the tests that measure them are difficult to validate.  For example, who is to say exactly what intelligence is?  Intelligence is a word we use often but seldom think about.  In Plato’s Republic, Socrates reminds his contemporaries that abstract concepts like justice, though we all use them confidently, are difficult to define.  As Socrates gave thought to defining abstract terms, so do modern test developers.  The developers of mature measures establish adequate validity by reviewing the literature, research, and theory on the construct of interest.  In addition, they gather several forms of validity evidence:

  1. Content Validity: Does the test cover the pertinent content?  If a measure of depression is developed, it should check for all of the symptoms listed in the American Psychiatric Association’s Diagnostic and Statistical Manual of Mental Disorders (DSM).  Similarly, it should assess the depression-specific thoughts described by cognitive-behavioral theorists.  If the measure is representative of such content, it has content validity.  If the measure is missing content, or emphasizes thoughts to the exclusion of symptoms, it lacks content validity.
  2. Criterion-Related Validity: Criterion-related validity is established when test scores relate to a specific outcome, or criterion.  The criterion in question is generally something difficult or costly to obtain information about, such as college grade point average (GPA).  To obtain this information directly, you would have to send someone to college for four years and then calculate the average.  To avoid that time and cost, a test such as the SAT might be used instead.  Whereas college GPA accumulates over years, SAT results can be obtained quickly and cheaply.  Yet the SAT is only as useful as its ability to predict academic success in college, as measured by GPA.  If you doubt the SAT’s ability to make this prediction, you doubt its criterion-related validity.
  3. Convergent and Discriminant (or Divergent) Validity: In addition to establishing other forms of validity, a well-constructed psychological test should measure the trait of interest and not other traits.  Convergent validity is shown when the test agrees closely with other measures of the same trait; discriminant validity is shown when it agrees only weakly with measures of different traits.  So you want an intelligence test to measure intelligence, but not education.  Similarly, a company developing a test of anxiety must demonstrate that the new test measures anxiety specifically, and not depression or some other category of mental illness.
  4. Construct Validity: Finally, there is construct validity, a form of validity needed whenever a test measures an abstract construct.  Personality, for example, is an enduring, meaningful, and integral facet of personhood that psychologists want to measure, yet it is abstract.  Because it is abstract and intangible, there is ultimately no direct way to compare a person’s personality with the results of a personality test.  Given this limitation, construct validity is established through multiple methods.  For example, the construct validity of a new test can be supported by comparing it to already established measures, to published theories, and to expert judgment.
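Criterion-related, convergent, and discriminant validity are also commonly summarized with correlations.  The Python sketch below is a simplified illustration under stated assumptions: the function names and all scores are hypothetical, Pearson correlation is assumed as the index, and the check that convergent agreement exceeds discriminant agreement is a rough rule of thumb rather than a formal statistical test.  Content and construct validity rest largely on expert judgment and accumulated evidence, so they are not reduced to code here.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists of at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary to compute a correlation")
    return cov / sqrt(var_x * var_y)


def criterion_validity(test_scores: Sequence[float],
                       criterion_scores: Sequence[float]) -> float:
    """Hypothetical helper: correlate test scores with a later criterion,
    such as admission-test scores with eventual college GPA."""
    return pearson_r(test_scores, criterion_scores)


def convergent_discriminant(new_test: Sequence[float],
                            same_trait: Sequence[float],
                            other_trait: Sequence[float]) -> Dict[str, object]:
    """Hypothetical helper: compare a new test with an established measure of
    the same trait (convergent) and a measure of a different trait (discriminant)."""
    convergent_r = pearson_r(new_test, same_trait)
    discriminant_r = pearson_r(new_test, other_trait)
    return {
        "convergent_r": convergent_r,
        "discriminant_r": discriminant_r,
        # Assumed rule of thumb: agreement with the same trait should exceed
        # the strength of agreement with the different trait.
        "pattern_supported": convergent_r > abs(discriminant_r),
    }


if __name__ == "__main__":
    # Hypothetical data for six people, invented for illustration only.
    admission_test = [1050, 1200, 1320, 980, 1410, 1150]
    college_gpa = [2.8, 3.1, 3.5, 2.6, 3.7, 3.0]
    print(f"criterion validity r = {criterion_validity(admission_test, college_gpa):.2f}")

    new_anxiety = [12, 25, 18, 30, 9, 22]
    established_anxiety = [14, 27, 17, 29, 11, 20]
    depression = [20, 15, 22, 18, 16, 25]
    result = convergent_discriminant(new_anxiety, established_anxiety, depression)
    print(f"convergent r = {result['convergent_r']:.2f}, "
          f"discriminant r = {result['discriminant_r']:.2f}, "
          f"pattern supported: {result['pattern_supported']}")
```

In practice, validation studies use much larger samples and several established measures for each trait; the sketch shows only the basic comparison.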

Dr. Steven C. Hertler
10 Sycamore Avenue
Ho Ho Kus, New Jersey 07423

Second Location
218 Lorraine Avenue
Upper Montclair, New Jersey 07043