Qualities of a good research instrument

Whether a test is standardized or teacher-made, it should possess the qualities of a good measuring instrument. This module discusses the qualities of a good test: validity, reliability, and usability.


After reading this module, students are expected to:

1. define and explain the characteristics of a good measuring instrument;
2. identify the types of validity;
3. describe what conditions can affect the validity of test items;
4. discuss the factors that affect the reliability of a test;
5. estimate test reliability using different methods;
6. enumerate and discuss the factors that determine the usability of a test; and
7. point out which is the most important characteristic of a good test.

Validity

Validity is the most important characteristic of a good test. It refers to the extent to which the test serves its purpose, or the efficiency with which it measures what it intends to measure.

The validity of a test concerns what the test measures and how well it does so. To judge the validity of a test, it is necessary to consider what behavior the test is supposed to measure.

A test may yield consistent scores, but if it does not serve its purpose, it is not valid. For example, a test intended for Grade V students is not valid when given to Grade IV students.

Validity is classified into four types: content validity, concurrent validity, predictive validity, and construct validity.

Content validity – the extent to which the content of the test is truly representative of the content of the course. A well-constructed achievement test should cover the objectives of instruction, not just its subject matter. Three domains of behavior are included: cognitive, affective, and psychomotor.

Concurrent validity – the degree to which the test agrees or correlates with a criterion set up as an acceptable measure. The criterion is always available at the time of testing.

Concurrent validity, also called criterion-related validity, relies on a statistical tool to interpret and correlate test results.

For example, a teacher wants to validate an achievement test in Science [X] that he constructed. He administers the test to his students. The results can be compared with those of another Science test [Y] that has already been proven valid. If the correlation between X and Y is high, the achievement test in Science is valid. According to Garrett, a highly reliable test is always a valid measure of some function.

Predictive validity – evaluated by relating the test to some later, actual achievement of the students that the test is supposed to predict. The criterion measure for this type is important because the future outcome of the testee is being predicted; the criterion measures against which the test scores are validated become available only after a long period.

Construct validity – the extent to which the test measures a theoretical trait. Test items must include factors that make up a psychological construct such as intelligence, critical thinking, reading comprehension, or mathematical aptitude.

Factors that influence validity are:

1. Inappropriateness of test items – items that measure knowledge cannot measure skill.

2. Directions – unclear directions reduce validity. Directions that do not clearly indicate how pupils should answer and record their answers affect the validity of test items.

3. Reading vocabulary and sentence structure – vocabulary and sentence structures that are too difficult or complicated prevent the test from measuring what it intends to measure.

4. Level of difficulty of items – test items that are too difficult or too easy cannot discriminate between bright and slow pupils, which lowers validity.

5. Poorly constructed test items – items that provide clues, or that are ambiguous, confuse students and will not yield a true measure.

6. Length of the test – a test should be of sufficient length to measure what it is supposed to measure. A test that is too short cannot adequately sample the performance we want to measure.

7. Arrangement of items – test items should be arranged by difficulty, from the easiest to the most difficult. Difficult items encountered early may cause a mental block and may cause students to spend too much time on those items.

8. Patterns of answers – when students can detect a pattern in the correct answers, they are liable to guess, and this lowers validity.

Reliability

Reliability means consistency and accuracy. It refers to the extent to which a test is dependable, self-consistent, and stable; in other words, the test agrees with itself. It is concerned with the consistency of responses from moment to moment: even if a person takes the same test twice, the test yields the same results.

For example, if a student got a score of 90 on an English achievement test on Monday and got 30 on the same test given on Friday, then neither score can be relied upon.

Inconsistency of individual scores may be caused by the person scoring the test, by limited sampling of certain areas of the subject matter, and particularly by the examinee himself. If the examinee's mood is unstable, this may affect his score.

Factors that affect reliability are:

1. Length of the test. As a general rule, the longer the test, the higher the reliability. A longer test provides a more adequate sample of the behavior being measured and is less distorted by chance factors like guessing.

2. Difficulty of the test. When a test is too easy or too difficult, it cannot show the differences among individuals; thus it is unreliable. Ideally, achievement tests should be constructed such that the average score is 50 percent correct and the scores range from near zero to near perfect.

3. Objectivity. Objectivity eliminates the bias, opinions, or judgments of the person who checks the test. Reliability is greater when tests can be scored objectively.

4. Heterogeneity of the student group. Reliability is higher when test scores are spread over a wide range of abilities; in a heterogeneous group, measurement errors are small relative to the spread of scores, so the test discriminates more reliably than it does in a homogeneous group.

5. Limited time. A test in which speed is a factor tends to yield higher reliability than a test administered with a generous time limit.

A reliable test, however, is not always valid.

Methods of Estimating Reliability of Test:

1. Test-retest method. The same instrument is administered twice to the same group of subjects. The scores from the first and second administrations are correlated using the Spearman rank correlation coefficient [Spearman rho] or the Pearson Product-Moment Correlation Coefficient.

The formula using Spearman rho is:

rs = 1 − (6ΣD²) / (N³ − N)

where: ΣD² = sum of the squared differences between ranks
N = total number of cases

For example, 10 students were used as samples to test the reliability of an achievement test in Biology. After two administrations of the test, the data and the computation of Spearman rho are presented in the table below:

           Scores       Ranks        Difference       Squared
Students   S1    S2     R1    R2     between ranks    difference
                                     D                D²

1          89    90     2     1.5    0.5              0.25
2          85    85     4.5   4      0.5              0.25
3          77    76     9     9      0                0
4          80    81     7.5   8      0.5              0.25
5          83    83     6     6.5    0.5              0.25
6          87    85     3     4      1.0              1.0
7          90    90     1     1.5    0.5              0.25
8          73    72     10    10     0                0
9          85    85     4.5   4      0.5              0.25
10         80    83     7.5   6.5    1.0              1.0

Total                                                 ΣD² = 3.5

rs = 1 – 6D2
N3 – N

= 1 –

= 1 –

= 1 – 0.0212

= 0.98 [very high relationship

The rs value obtained is 0.98, which indicates a very high relationship; hence, the achievement test in Biology is reliable.
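The hand computation above can be checked with a short script. This is a minimal sketch in plain Python (no libraries); the `rank_with_ties` helper is our own illustration, not part of the module, and it averages the ranks of tied scores exactly as the worked table does.

```python
# Spearman rho for the Biology test-retest data above.

def rank_with_ties(scores):
    """Rank scores from highest to lowest; tied scores share the average rank."""
    ordered = sorted(scores, reverse=True)
    avg_rank = {}
    for s in set(scores):
        positions = [i + 1 for i, v in enumerate(ordered) if v == s]
        avg_rank[s] = sum(positions) / len(positions)
    return [avg_rank[s] for s in scores]

s1 = [89, 85, 77, 80, 83, 87, 90, 73, 85, 80]  # first administration
s2 = [90, 85, 76, 81, 83, 85, 90, 72, 85, 83]  # second administration

r1, r2 = rank_with_ties(s1), rank_with_ties(s2)
sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))  # sum of squared rank differences
n = len(s1)
rho = 1 - 6 * sum_d2 / (n ** 3 - n)

print(sum_d2)          # 3.5, matching the table total
print(round(rho, 2))   # 0.98
```

The printed values reproduce ΣD² = 3.5 and rs = 0.98 from the worked example.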

Pearson Product-Moment Correlation Coefficient can also be used for the test-retest method of estimating the reliability of a test. The formula is:

r = [NΣXY − (ΣX)(ΣY)] / √{[NΣX² − (ΣX)²][NΣY² − (ΣY)²]}

Using the same data for Spearman rho, the scores for 1st and 2nd administration may be presented in this way:

X [S1]   Y [S2]   X²      Y²      XY

89       90       7921    8100    8010
85       85       7225    7225    7225
77       76       5929    5776    5852
80       81       6400    6561    6480
83       83       6889    6889    6889
87       85       7569    7225    7395
90       90       8100    8100    8100
73       72       5329    5184    5256
85       85       7225    7225    7225
80       83       6400    6889    6640

ΣX = 829   ΣY = 830   ΣX² = 68987   ΣY² = 69174   ΣXY = 69072

Could you now compute r using the formula above? Illustrate below:
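As a cross-check on the computation the module asks for, the Pearson r for the same ten pairs of scores can be computed directly from the sums in the table. A minimal sketch in plain Python:

```python
# Pearson product-moment r for the same test-retest data.
x = [89, 85, 77, 80, 83, 87, 90, 73, 85, 80]  # first administration
y = [90, 85, 76, 81, 83, 85, 90, 72, 85, 83]  # second administration

n = len(x)
sum_x, sum_y = sum(x), sum(y)                    # 829 and 830
sum_x2 = sum(a * a for a in x)                   # sum of X squared
sum_y2 = sum(b * b for b in y)                   # sum of Y squared
sum_xy = sum(a * b for a, b in zip(x, y))        # sum of cross-products

r = (n * sum_xy - sum_x * sum_y) / (
    ((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)) ** 0.5
)
print(round(r, 2))   # 0.97
```

The result, about 0.97, agrees closely with the Spearman rho of 0.98 obtained earlier for the same data.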

2. Alternate-forms method. This is the second method of establishing the reliability of test results. Two forms of a test, similar in content, type of items, difficulty, and other respects, are given in close succession to the same group of students. The scores on the two forms are then correlated [refer to the formula for the Pearson Product-Moment Correlation Coefficient].

3. Split-half method. The test is administered once, but the items are divided into two halves; the most common procedure is to divide the test into odd and even items. The scores on the two halves are correlated, and the r obtained is the reliability coefficient of a half test. The Spearman-Brown formula is then used, which is:

rt = 2rht / (1 + rht)

where: rt = reliability of the whole test
rht = reliability of half of the test

For example, rht is 0.69. What is rt?

rt = 2rht / (1 + rht)

= 2[0.69] / (1 + 0.69)

= 1.38 / 1.69

= 0.82

Since 0.82 indicates a very high relationship, the test is reliable.
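The Spearman-Brown step-up is simple enough to capture in one function. A minimal sketch (the function name is our own):

```python
def spearman_brown(r_half):
    """Step up a half-test reliability to the whole-test reliability."""
    return 2 * r_half / (1 + r_half)

print(round(spearman_brown(0.69), 2))   # 0.82, as in the worked example
```

Note that the stepped-up coefficient is always at least as large as the half-test coefficient, since a longer test samples the behavior more adequately.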

The split-half method is applicable only to measuring instruments that are not highly speeded. If the instrument consists of easy items and the subjects are able to answer correctly all or nearly all items within the time limit, the scores on the two halves would be nearly identical and the correlation would be close to +1.00.

4. Kuder-Richardson Formula 21 is the last method of establishing the reliability of a test. Like the split-half method, the test is administered only once. This method assumes that all items are of equal difficulty. The formula is:

rKR21 = [k / (k − 1)] × [1 − X̄(k − X̄) / (kS²)]

Where:

X̄ = the mean of the obtained scores
S = the standard deviation
k = the total number of items

Example: Mr. Marvin administered a 50-item test to 10 of his grade 5 pupils. The scores of his pupils are presented in the table below:

Pupils   Score [X]   X − X̄    (X − X̄)²

A        32           3.2      10.24
B        36           7.2      51.84
C        36           7.2      51.84
D        22          −6.8      46.24
E        38           9.2      84.64
F        15         −13.8     190.44
G        43          14.2     201.64
H        25          −3.8      14.44
I        18         −10.8     116.64
J        23          −5.8      33.64

ΣX = 288             Σ(X − X̄)² = 801.60

X̄ = 28.8   S² = 89.07 [computed with n − 1, so S ≈ 9.44]   k = 50

Show how the mean and standard deviation were obtained in the box below:

Could you now compute the reliability of the test by applying Kuder-Richardson Formula 21? Please try!
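A sketch of the computation the module asks for, using Mr. Marvin's scores. This is plain Python under one stated assumption: the variance is computed with n − 1 in the denominator, which reproduces the S² = 89.07 shown above.

```python
# Kuder-Richardson Formula 21 for Mr. Marvin's 50-item test.
scores = [32, 36, 36, 22, 38, 15, 43, 25, 18, 23]
k = 50                 # total number of items
n = len(scores)        # number of pupils

mean = sum(scores) / n                                 # X-bar = 28.8
var = sum((s - mean) ** 2 for s in scores) / (n - 1)   # S^2, about 89.07

kr21 = (k / (k - 1)) * (1 - mean * (k - mean) / (k * var))
print(round(kr21, 2))   # 0.88
```

The obtained coefficient of about 0.88 indicates that the test is reliable by Garrett's interpretation.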

Usability

Usability means the degree to which a test can be used without much expenditure of time, money, and effort; it also means practicability. The factors that determine usability are administrability, scorability, interpretability, economy, and the proper mechanical make-up of the test.

Administrability means that the test can be administered with ease, clarity and uniformity. Directions must be made simple, clear and concise. Time limits, oral instructions and sample questions are specified. Provisions for preparation, distribution, and collection of test materials must be definite.

Scorability is concerned with the scoring of the test. A good test is easy to score: the scoring directions are clear, the scoring key is simple, an answer key is available, and machine scoring is made possible as much as practical.

Interpretability means that test results are useful only if they are interpreted after evaluation. Correct interpretation and application of test results supports sound educational decisions.

An economical test is one of low cost. One way to economize is to use separate answer sheets so that test booklets can be reused. However, validity and reliability should never be sacrificed for economy.

Proper mechanical make-up of the test concerns how the test is printed, what font size is used, and whether the illustrations fit the level of the pupils/students.

Summary

A good measuring instrument possesses three qualities: validity, reliability, and usability.

Validity is the extent to which a test measures what it intends to measure. It has four types: content, construct, concurrent, and predictive. Test validity can be affected by inappropriateness of the test items, unclear directions, difficult vocabulary and sentence structure, level of difficulty, poor construction, length of the test, arrangement of items, and patterns of answers.

Reliability is the consistency of scores obtained by an individual given the same test at different times. It can be estimated using the test-retest method, alternate forms, the split-half method, and Kuder-Richardson Formula 21. The reliability of a test may be affected by the length of the test, the difficulty of the test items, the objectivity of scoring, the heterogeneity of the student group, and limited time.

Usability of a test means its practicability. This quality is determined by ease in administration [administrability], ease in scoring [scorability], ease in interpretation and application [interpretability], economy of materials, and the proper mechanical make-up of the test.

To be effective, a test must be valid. A valid test is always reliable, but not every reliable test is valid.

Learning Exercises

I. Multiple Choice: Encircle the correct answer.

1. Which statement concerning validity and reliability is most accurate?

a. A test cannot be reliable unless it is valid.
b. A test cannot be valid unless it is reliable.
c. A test cannot be valid and reliable unless it is objective.
d. A test cannot be valid and reliable unless it is standardized.

2. Which type of validity is appropriate for a criterion-referenced measure?

a. content validity c. construct validity
b. concurrent validity d. predictive validity

3. Which is directly affected by objectivity in scoring?

a. The validity of test c. The reliability of test
b. The usability of test d. The administrability of test

4. A teacher-made test overemphasized facts and underemphasized other objectives of the course for which it was designed. What can be said about the test?

a. It lacks content validity.
b. It lacks construct validity.
c. It lacks predictive validity.
d. It lacks criterion-related validity.

5. When an achievement test for Grade V pupils is administered to Grade VI, what is most affected?

a. reliability of the test c. usability of the test
b. validity of the test d. reliability and validity of the test

6. Which factor of usability is described by the wise use of testing materials?
a. scorability c. economy
b. administrability d. proper mechanical make-up

7. Clarity and uniformity in giving directions affect:
a. scorability of the test c. interpretability of the test
b. administrability of the test d. proper mechanical make-up
8. Which best describes validity?
a. consistency in test result
b. practicability of the test
c. homogeneity in the content of the test
d. objectivity in administration and scoring of test

II. Multiple Choice with Computation: Table 1 presents the scores of 10 students who were tested twice [test-retest] to estimate the reliability of the test. Complete the table and answer the questions below. Choose the correct answer and show the computation where it is needed.

           Scores       Ranks        Difference       Squared
Students   S1    S2     R1    R2     between ranks    difference
                                     D                D²

1 68 71
2 65 65
3 70 69
4 65 68
5 70 72
6 65 63
7 62 62
8 64 66
9 58 60
10 60 60

Total =

1. What is the sum of the squared differences between the ranks of the students?
a. 13 b. 14 c. 15 d. 16

2. Who got the highest score in the second administration of test?
a. student 1 c. student 5
b. student 3 d. student 7

3. What is the calculated rs?
a. 0.86 b. 0.88 c. 0.90 d. 0.92

4. What is Garrett's interpretation of the obtained rs [refer to question no. 3]?
a. negligible correlation c. marked relationship
b. low correlation d. high or very high relationship

5. Based on Garrett's interpretation of the calculated rs, what can you say about the test constructed?

III. Simple Recall:

1. Mr. Gwen administered a 40-item Mathematics test to his 10 students. Their scores on the first half and the second half are shown below. Find the reliability of the whole test using the split-half method. Is the test reliable? Justify.

1st half – 17 18 20 11 10 13 20 19 19 15
2nd half – 15 13 18 10 8 10 18 16 17 14

2. Ms. Pearl administered a 30-item Science test to her Grade 6 pupils. The scores are shown below. What is the reliability of the whole test using Kuder-Richardson Formula 21? Is the test reliable? Justify.

Pupils – A B C D E F G H I J K L
Scores – 25 22 30 22 17 15 18 24 27 18 23 26

IV. Essay:

1. In your own opinion, which is better: a valid test or a reliable test? Why?
2. Why do you think students' scores on a particular test sometimes vary?
3. Discuss what makes test items or test results invalid.

References

Asaad, Abubakar S. and Wilham M. Hailaya. Measurement and Evaluation,
Concepts and Principles, 1st Ed. Manila, Philippines: Rex Book Store,
Inc. 2004.

Calmorin, Laurentina. Educational Research, Measurement and Evaluation, 2nd Ed. Metro Manila, Philippines: National Book Store, Inc. 1994.

Oriondo, Leonora L. and Eleonor M. Antonio. Evaluating Educational Outcomes. Manila, Philippines: Rex Book Store, 1984.
