Norm-Referenced Evaluation Essay

Why are norm-referenced scores important?

Different groups of test takers have very different performance levels and therefore their scores differ quite a bit on scientifically developed measures of cognitive ability.  It is important to understand how your group of test takers compares to selected external comparison groups, for example, the population of comparable regional or national peer groups.

Norm-referenced scores bring value to your data. Test scores without comparison percentiles are essentially one-dimensional: while you can compare individual test scores to other members of your group, you cannot determine the relative strength of the group in comparison to others in the population.

Norm-referenced scoring brings dimensionality to your statistics. The true value of your data is revealed when you are able to relate the scores of your individual test-takers or entire group to appropriate comparison groups. Mean scores vary widely from one institution or agency to another. Comparing your individual and group scores to a similar national or regional sample of test-takers gives you a way to judge the relative strength of your test-takers and your programs. For example, if your group's mean Overall score is at the 50th percentile relative to a group of similar test-takers, then you would know that your group is performing as well, on average, as the national population of comparable students.

Reviewers, such as accrediting bodies, recognize the benefits of external benchmarking and often require this evidence as an indicator of organizational effectiveness or as documentation for self-studies.

Researchers use norm-based references to analyze long-term performance progress against learning outcome goals or to assess the effectiveness of educational programs.

When you implement a testing program, it is important that you gain the full value of your data by selecting the best test instrument with the strongest norm sets. Insight Assessment offers validated, objective assessments of critical thinking skills and mindset for students at all levels from K-12 through graduate professional programs.

Insight Assessment’s comprehensive comparison percentile norms give our clients a powerful tool to evaluate the relative strength of the performance of their test-takers and to guide the creation of learning outcomes goals. Our critical thinking skills test reports provide individual and group analytics on key components of critical thinking strength that can be benchmarked against a variety of external comparison groups.  

Contact us to learn how you can maximize your testing program data.

To learn more:

This article focuses on norm-referenced testing. Norm-referenced tests are assessments administered to students to determine how well they perform in comparison to other students taking the same assessment. The article describes how norm-referenced tests are created and developed, and gives examples of current widely used tests. The differences between norm-referenced tests and criterion-referenced tests are compared, as well as some advantages and disadvantages of norm-referenced testing. Scoring systems and techniques are also explained.

Keywords: Content Validity; Criterion-Referenced Test; High-Stakes Tests; Educational Assessment; Norm-Referenced Test; Normative Sample; Norming; Percentile; Percentile Rank; Sampling Error; SAT Test; Standardized Tests; Test Bias


Norm-referenced tests are assessments administered to students to determine how well they perform in comparison to other students taking the same assessment. Each student's current performance can be compared to that of a representative sample of students, which is known as a norm group or normative sample, who have previously taken the test. Norm-referenced tests differ from criterion-referenced tests in that criterion-referenced tests help show how a student stands in relation to a particular educational curriculum, with an emphasis not on comparing students with others taking the assessment but on whether a student has mastered specific skills that have been taught (Monetti & Hinkle, 2003). Very few, if any, students are expected to attain a perfect score on a norm-referenced test, and students are generally not encouraged to study for norm-referenced tests because they are intended to measure a broad range of general knowledge already attained in reading, language arts, mathematics, science and social studies (Miller-Whitehead, 2001).

Using the Test

Norm-referenced tests are used to try to predict how well students will do in certain situations, such as college. They can also be used to place students in gifted and talented or remedial programs (Bracey, 2000). The SAT and Preliminary SAT/National Merit Scholarship Qualifying Test (PSAT/NMSQT) are examples of norm-referenced tests. The SAT is used for college entrance, and the PSAT/NMSQT gives students practice for the SAT Reasoning Test and a chance to enter the National Merit Scholarship Corporation scholarship programs ("About PSAT/NMSQT," n.d.). The California Achievement Test and the Iowa Test of Basic Skills are also examples of norm-referenced tests. There is no passing or failing of norm-referenced tests, since each student receives scores compared to others who have taken the test. Test scores are generally given as a percentile.

Designing the Test

Step 1: Examining the Curriculum Materials

To develop a norm-referenced test, test publishers examine the curriculum materials produced by textbook and workbook publishers and then develop questions that measure the skills most commonly used in the materials they have reviewed. Experts then review the items to determine their content validity, or whether the test measures what it is supposed to measure. For example, a norm-referenced test that is purported to be a measure of students' reading skills but only assesses vocabulary would not have high content validity and should be redesigned. After proper content validity has been established, the test is tried out on a sample of students to see how the questions are answered. A norm-referenced test should not have items that too many students answer incorrectly, or items that too many students answer correctly. In general, norm-referenced tests only include items that between 30 percent and 70 percent of the students who have taken the test answer correctly. In addition, questions that students with high overall scores tend to answer incorrectly are removed, as are questions that students with low overall scores tend to answer correctly (Bracey, 2000).
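The 30-to-70-percent rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 0/1 response matrix, and the tryout data are hypothetical, and real test publishers use more elaborate item analysis than this single filter.

```python
def item_difficulty(responses):
    """Proportion of students answering each item correctly.

    responses: list of per-student lists of 0/1 item scores
    (hypothetical layout: one row per student, one column per item).
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [
        sum(student[i] for student in responses) / n_students
        for i in range(n_items)
    ]


def select_items(responses, low=0.30, high=0.70):
    """Indices of items whose proportion correct lies within [low, high]."""
    return [
        i for i, p in enumerate(item_difficulty(responses))
        if low <= p <= high
    ]


# Hypothetical tryout data: 5 students x 4 items (1 = correct).
tryout = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
]

print(item_difficulty(tryout))  # [1.0, 0.4, 0.2, 0.6]
print(select_items(tryout))     # [1, 3]
```

In this made-up tryout, item 0 is answered correctly by everyone and item 2 by only 20 percent of students, so both fall outside the band and would be dropped.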

Step 2: Interpreting Student Performance

Each student's performance on a norm-referenced test is assessed against the performance of a “normed” group, a larger set of students who have previously taken the exam in order to establish the norm. Because norming an exam (constructing its norms) is an expensive and involved process, test publishers usually use the same norms for about seven years. The results are reported as a percentile rank, meaning that a student with a percentile rank of 65 has performed as well as or better than 65 percent of the norming students, which, if the test is properly normed, is indicative of all students who have taken the norm-referenced test since it was first given (Bond, 1995). However, scores on norm-referenced tests typically rise the longer the test is in use, which could be attributed to changes in instruction or test preparation that instructors implement as they become familiar with the test questions (Linn, Graue & Sanders, 1990, as cited in Monetti & Hinkle, 2003). Knowing student rank can be useful in deciding whether students may need some remedial assistance in a subject area or should be included in a gifted and talented program. Norm-referenced test results cannot provide information about what exactly students know, only that they know more of the test content than a certain percentage of the students who make up the norm group (Bond, 1995).
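As a minimal sketch of how a percentile rank might be computed against a norm group, the Python below counts the share of norm-group scores at or below a student's raw score. The function name and the norm-group scores are hypothetical, and conventions differ: some definitions count only scores strictly below, or count half of the tied scores.

```python
def percentile_rank(score, norm_scores):
    """Percent of norm-group scores at or below `score`.

    Assumes the "as well as or better than" convention; other
    conventions (strictly below, or half of ties) give slightly
    different values.
    """
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100.0 * at_or_below / len(norm_scores)


# Hypothetical raw scores from a ten-student norm group.
norm_group = [12, 15, 18, 18, 20, 22, 25, 27, 30, 34]

print(percentile_rank(22, norm_group))  # 60.0
```

A raw score of 22 here matches or exceeds six of the ten norm-group scores, which yields a percentile rank of 60. The result says nothing about which items the student answered correctly.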

Step 3: Constructing Norms

The process of constructing norms is called "norming." The norms are entered into a chart that lets the test interpreter convert raw scores to derived scores, which makes it easier to compare one student's score to the norm group. There are four types of derived scores: percentiles, standard scores, developmental scales, and ratios and quotients. In order to construct norms, the population of interest needs to be identified. This might be anything from the student body of a certain school district, to all those who have applied to be part of a program, to all residents of one state, or to all students in a particular region such as the Midwest, Northeast, Pacific Northwest, or South. The most important statistics to be analyzed for the sample data should be determined, as should the tolerable amount of sampling error for those statistics. A procedure also needs to be devised for obtaining the norm group, and the sample size needs to be determined (Rodriguez, 1997).
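The conversion chart described above can be illustrated with one kind of derived score, the standard score. The sketch below builds a small lookup table from raw scores to z-scores and T-scores (mean 50, standard deviation 10). The function name, the tiny norm sample, and the choice of the sample standard deviation are assumptions made for illustration, not a description of any publisher's procedure.

```python
from statistics import mean, stdev


def build_norm_table(norm_scores):
    """Map each raw score seen in the norm group to standard scores.

    z = (raw - mean) / SD, and T = 50 + 10 * z.
    Uses the sample standard deviation (an assumption).
    """
    mu = mean(norm_scores)
    sigma = stdev(norm_scores)
    table = {}
    for raw in sorted(set(norm_scores)):
        z = (raw - mu) / sigma
        table[raw] = {"z": round(z, 2), "T": round(50 + 10 * z)}
    return table


# Hypothetical norm sample of raw scores.
norm_sample = [10, 20, 30, 40, 50]
norm_table = build_norm_table(norm_sample)

for raw, derived in norm_table.items():
    print(raw, derived)
```

A test interpreter would then look up a student's raw score in the table rather than recomputing the statistics; a raw score equal to the norm-group mean maps to a T-score of 50.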

What Are Norm-Referenced Tests?

Norm-referenced tests report student scores in relation to those pre-established by a norm, or average, group of similar students who have taken the same test. Norms are statistics that provide information about how a defined group of students performed on a given test. Many norm groups can be assigned for different tests, and a student's relative ranking is often not predictable, as it depends on which norm group was used for the comparison. To help states and districts select the norm-referenced test that best suits their needs, the normative sample should be described in enough detail to assist in the selection process. This means that the demographic characteristics of the norm group should be described in detail, including gender distribution, racial and ethnic background, geographic location, socioeconomic position, and education level (Rodriguez, 1997). This information will allow states and districts to assess whether the norm group offers a meaningful comparison for their students. If the demographic characteristics do not match up well with the students to be assessed, then that particular test should not be used, as the results would not be relevant.

Further Insights

Norm-referenced tests are assessments given to students to determine how they perform in comparison to their peers who have taken the same test. All students taking the test are compared to a norm group who were given the test before it was distributed for mass use. The norm groups can be national or local, and the choice depends on what type of comparison is being sought. Norm-referenced tests and criterion-referenced tests are vastly different but can be used in conjunction to provide an overall view of student performance, with norm-referenced tests providing a comparison with other students and criterion-referenced tests showing student mastery of subject matter (Monetti & Hinkle, 2003).



Among derived scores are percentiles, which are the most commonly used due to their ease of interpretation. The difference between percentile and percentile rank is that: “a percentile is a point in the distribution below which a certain percentage of the scores fall, and a percentile rank gives a student's relative position, or percentage, of student scores that fell below the obtained score. For example, the 90th percentile is the point below which 90 percent of the scores in the distribution fall; it does not mean that a student who has scored at the 90th percentile answered 90 percent of...
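To make the idea of a percentile as a point in the distribution concrete, the sketch below finds the score at a given percentile using the nearest-rank method, which is one of several conventions. The function name, the scores, and the 40-item test length are hypothetical. The example also shows percent correct, which is a separate quantity from a percentile.

```python
import math


def percentile_point(p, scores):
    """Nearest-rank percentile: the smallest score such that at
    least p percent of the scores are at or below it."""
    ordered = sorted(scores)
    k = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[k - 1]


# Hypothetical raw scores on a 40-item test.
scores = [12, 15, 18, 18, 20, 22, 25, 27, 30, 34]
total_items = 40

p90 = percentile_point(90, scores)
print(p90)                      # 30
print(100 * p90 / total_items)  # 75.0
```

In this made-up distribution the 90th percentile is a raw score of 30, which corresponds to 75 percent of the items answered correctly, not 90 percent.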

