Psychological tests

Intelligence, aptitude and special aptitude tests

Intelligence

The purposes of intelligence testing can be broadly divided into educational, research, vocational and medical (Whitla 1968).

Educational

  • For measuring general learning readiness. Intelligence quotient scores are known to correlate with school achievement.
  • For identifying gifted children. The essence of educational guidance is to provide all children with instruction that is interesting in content and suited to their level of intellectual development.
  • For identifying mentally retarded children so that adequate provision can be made for them.
  • For homogeneous grouping of children for educational effectiveness.

Research

  • For indicating the extent of differences in intelligence quotient among children of the same calendar age. Such differences indicate the need to provide teaching materials at different levels of difficulty.
  • For studying mental growth. Mental abilities develop in a sequential order from birth onwards, and intelligence tests can be used to trace the direction of individual and group growth curves.

Vocational

  • For vocational guidance. Different vocations call for different aptitudes.

Medical

  • To define accurately the degree of mental retardation or defect so that adequate management strategies can be developed.

Aptitude test

An aptitude test is defined as a test of suitability that determines whether an individual is likely to develop the skills required for a particular kind of work (Encarta dictionary, 2008). Aptitude tests are used to gauge abilities developed over a long period of time and to predict future learning performance. Examples of aptitude tests are the Scholastic Assessment Test (SAT) and the American College Testing (ACT) examination. Both are college admission tests used to predict college success.

Special aptitude tests

Special aptitude testing raises the concepts of fidelity and bandwidth. Bandwidth is the breadth of the traits being evaluated, while fidelity is the extent to which a measure focuses on one particular attribute or quality. These tests are used to predict future performance in an area in which the person is not currently trained. Government parastatals, institutions and business organizations often apply specific aptitude tests when granting specific privileges to individuals. Furthermore, vocational guidance counseling may involve aptitude testing to help clarify individual career goals (Microsoft Encarta, 2008). If a person scores similarly to individuals already working in a particular profession, success in that occupation can be predicted with some probability. Certain aptitude tests have wide coverage and include skills germane to many different professions. The General Aptitude Test, for instance, measures general reasoning ability and also covers form perception, motor coordination, clerical perception, and manual and finger dexterity. Other tests concentrate on a single area such as art, engineering or modern languages (Microsoft Encarta, 2008). One example of a special aptitude test is the sensory or perceptual test, which concentrates on color discrimination and visual acuity and also covers the auditory senses.

Another example of a special aptitude test is the mechanical test, which includes tests of spatial relations and demands manual dexterity as well as space visualization. There are also paper-and-pencil tests, which include the Bennett Mechanical Comprehension Test and the Minnesota Paper Form Board.

Other special aptitude tests are listed below.

The clerical test

This includes the Minnesota Clerical Test, which consists of 200 pairs of numbers and 200 pairs of names. It also includes the clerical abilities test, a battery of 7 tests such as proofreading and copying.

The art and musical tests

The art ability test includes the art judgment test, in which the participant compares two pictures and chooses the better one. In the aesthetic perception test, the participant ranks 4 versions of the same project in order. The Graves Design Judgment Test likewise asks the participant to pick the best among a group of abstract pictures.

Musical ability test

The musical ability test gives an analytical assessment of musical ability, using tones and notes to evaluate 6 components of auditory discrimination. The Wing Standardized Tests of Musical Intelligence use recorded piano music to assess about 8 areas, including memory, chord analysis and rhythm.

Validity, reliability and standardization of tests

Analyzing validity and reliability is the foundation for determining whether an experiment uses proper instrumentation, follows an appropriate procedure and attains sensible results.

Why validity?

Validity is necessary for any worthwhile research project, whether quantitative or qualitative, because the objective is to approximate the truth as closely as possible.

Validity of a test

A well-designed test is one that is both valid and reliable. Validity is in some ways the most fundamental consideration. “A test is said to be valid if it measures what it claims to measure” (Kline, 1986). The Standards for Educational and Psychological Testing, put forward by the American Psychological Association, the American Educational Research Association and the National Council on Measurement in Education (Standards for Educational and Psychological Testing, 1999), state that validity “is a measure of the meaningfulness, usefulness and appropriateness of the specific inferences drawn from test scores. Test validation is therefore the process of gathering evidence to corroborate such inferences.”

Validity is judged in the context of the purposes for which a test score will be used. For a psychological test to be valid it must first be reliable, but not all reliable tests are valid. In the words of Greenland & Linn (1990), “Reliability is a necessary but not sufficient condition for validity.” There are different measures of validity depending on the purpose of the test; the most common types are construct validity, content validity, predictive validity and face validity. Unlike reliability, validity cannot be demonstrated by any single statistical approach.

Construct validity

Construct validity is “the degree to which one can infer certain constructs in a psychological theory from the test scores” (Haladyna, 2002). The concept was first used in relation to psychological tests of individual differences, such as assessments of hostility and anxiety (Mehrens & Lehmann, 1987; Mehrens, 2002). It has since become part of the discussion about the validity of achievement tests (Cronbach & Meehl, 1955; Cronbach, 1989; Mehrens, 2002).

Predictive validity

Predictive validity is established by finding a link between a test and a subsequent criterion. For instance, the validity of an intelligence test can be established by correlating scores on a test administered at age 10 with later measures of academic success, such as performance in college classes (Kline, 1986). Nevertheless, two problems arise in validating a test in this manner: first, finding a suitable subsequent criterion, and second, the practical difficulty of collecting the data (Kline, 1986).
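The correlation between a test and a later criterion can be illustrated with a short sketch. It assumes the Pearson product-moment coefficient is used as the validity coefficient, which the text does not specify; the function name and the score lists are hypothetical and made up for illustration only.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    mx, my = mean(x), mean(y)
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: test scores at age 10 and a later criterion
# (for example, college grade point average) for the same five people.
test_at_10 = [95, 100, 105, 110, 120]
later_grades = [2.0, 2.5, 3.0, 3.0, 3.8]

validity_coefficient = pearson_r(test_at_10, later_grades)
print(f"predictive validity coefficient: {validity_coefficient:.2f}")
```

A coefficient near 1 would suggest that the earlier test predicts the later criterion well; with real data the two problems noted above (choosing the criterion and collecting the follow-up data) determine how much such a number can be trusted.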

Face validity

Face validity is the weakest type of validity. It refers to how well the test, “on the face of it”, appears to measure what it is supposed to measure. A psychological test made up of psychological problems, for example, is regarded as having face validity. This type of validity is important mainly to test takers.

Content validity

This is the most valuable type of validity for the assessment of student learning. Content validity indicates the extent to which the test items correspond to what is learned in a particular course or in a similar knowledge framework. In other words, to what extent are the test items representative of the content or skills that were taught? Content validity can be promoted by carefully constructing the test to reflect what was taught. To achieve this, many test makers use a test design matrix in which the rows or columns are the key elements of the content and the cells hold the related test items. The test items are a sample of the universe of accomplishments within the framework, so they must be an appropriate sample (Mehrens & Lehmann, 1987).

Reliability

Reliability refers to the consistency of scores. In other words, would the same or similar scores be obtained if the test were administered at a different time of day, or if different raters scored it? Reliability also indicates how internally consistent the test is.

Reliability can be simply defined as “the degree of consistency between two measures of the same thing” (Mehrens & Lehmann, 1987). Reliability is commonly influenced by the length of the test, the speed of the test and group homogeneity. A longer test is generally more reliable. As test speed increases, estimating reliability with an equivalent-forms or test-retest approach becomes more important. The more heterogeneous a group is, the higher the reliability, because the variability of group scores increases while the standard error remains stable. All other things being equal, a test whose scores show little variability will be less reliable (Haladyna, 2002; Mehrens & Lehmann, 1987). The reliability coefficient should be at least .85 for tests with individual consequences and at least .65 for tests used to make decisions about groups (Mehrens & Lehmann, 1987). Several methods are used to estimate test reliability, including the inter-rater, test-retest, alternate-forms, split-halves, Kuder-Richardson (K-R) and Cronbach’s alpha methods.
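Two of the estimation methods named above, Cronbach’s alpha and split-halves, can be sketched as follows. This is a minimal illustration rather than a definitive procedure: it assumes an odd/even item split with the Spearman-Brown correction for the split-halves estimate, and the function names and the small score matrix are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of paired scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows, one row of item scores per test taker."""
    k = len(scores[0])  # number of items
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def split_half_reliability(scores):
    """Odd/even split-half estimate, stepped up with the Spearman-Brown formula."""
    first_half = [sum(row[0::2]) for row in scores]   # items 1, 3, 5, ...
    second_half = [sum(row[1::2]) for row in scores]  # items 2, 4, 6, ...
    r = pearson_r(first_half, second_half)
    return 2 * r / (1 + r)


# Hypothetical data: 5 test takers, 4 items scored 1 (right) or 0 (wrong).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(scores)
split_half = split_half_reliability(scores)
print(f"alpha = {alpha:.2f}, split-half = {split_half:.2f}")
```

Either estimate can then be compared with the thresholds quoted above (.85 for decisions about individuals, .65 for decisions about groups).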

Standardization

Standardized tests are administered to assist in appropriate academic placement, to assess academic achievement, to identify individual aptitudes, to explore vocational interests, and to examine personal characteristics. Standardized tests are also used to identify gifted students and those with special learning problems (Microsoft Encarta, 2008).

Test standardization involves the use of established rules in the administration and interpretation of a test. Standard measures are used both for scoring the test and for interpreting the results, and test administrators and proctors are used in every classroom in which the test is given. A standardized test is one prepared by experts and administered to a large group of students under prescribed conditions. It has a low correlation with short-term classroom learning, such as that seen in a single grading period. Standardized tests are broad surveys of accrued learning, as might occur over many years of instruction (Popham, 2000; Haladyna, 2002). Standardized tests include aptitude tests, intelligence tests, entry/exit tests and achievement tests. Standardized tests measure knowledge more readily than they assess skills (Haladyna, 2002).

Misuse and misinterpretation of psychological tests

Psychological tests are often used inappropriately, misinterpreted and overinterpreted in the forensic setting. This harms the person being evaluated and interferes with the course of justice. It also does a disservice to the reputation of psychologists and the science of psychology (Harris quoted in Ralph and Hallida). Commonly misinterpreted psychological tests include:

Drawing and projective tests

In the case of children’s drawing tests, interpretations are often not backed by experimental or empirical evidence, and no standard data establish their validity or reliability. Where drawings are used, the interpretation should be conservative and serve only to generate hypotheses to be explored. Projective tests also lack any appreciable validity and reliability. A review of the Draw-A-Person test in the Seventh Mental Measurements Yearbook stated that there appears to be very little evidence for the use of “signs” as valid indicators of personality characteristics. With children’s drawings there is so much variability from drawing to drawing that particular features of any one drawing are too unreliable to say anything about them (Harris quoted in Ralph and Hallida).

Rorschach test

Recommendations and conclusions about people’s lives are sometimes based on misinterpretation of this test, without recognition of its limitations. There is no empirical support for the validity of the test, and its clinical use is therefore limited. People should be evaluated on the basis of what they do rather than on the feelings, thoughts or inclinations inferred from the Rorschach. The responses given to the Rorschach are not reliable evidence of real psychopathology. The interpretation is therefore subjective and idiosyncratic. It is unscientific because it draws inferences from the supposed reality of unconscious processes in the mind.

The MMPI

This test is also often misinterpreted. Idiosyncratic interpretations are offered without support from the empirical literature, and such errors are not merely differences of opinion. More often than not, a computerized interpretation is used without regard to the particular characteristics of the individual.

K scale

This scale is also frequently misinterpreted. For instance, an elevated K scale in individuals taking the MMPI in prisons and courts, though common, does not necessarily indicate defensiveness as a personality attribute. It is usually an adaptive reaction to the setting and should not be overinterpreted.

Multiphasic sex inventory

This is a self-report questionnaire comprising statements about sexual experiences, difficulties and activities. It is a scaled assessment of openness about sexual attitudes. It is sometimes used with individuals who deny sexual abuse in order to determine whether they have in fact abused; this is a misapplication of the instrument and often leads to misinterpretation.

Penile plethysmograph

This is a method designed to help plan individual treatment programmes for sexual offenders. It is subject to error when it is used to test the truthfulness of an individual who denies ever having committed a sexual offence, because it generates false positive results that lead to misinterpretation (Ferral, quoted in Ralph and Hallida).

What is item analysis?

Item analysis is a set of statistics used to evaluate whether a particular test item is adequately measuring the same variable that the other test items measure. Ideally, an individual who understands the item picks the correct answer, while the responses of those who do not are evenly distributed across the wrong answers.

What then is the objective of item analysis?

Item analysis improves a test by identifying the good items as well as those that need to be revised or discarded. It also shows what test takers do and do not understand. In the hands of instructors, item analysis is a valuable tool that indicates where instruction can be improved. For this to work, the items analyzed must be valid measures of the instructional objectives. In the same vein, the items must be diagnostic. In other words, the incorrect options that students pick must reveal the nature of the misunderstanding and so point to the necessary remediation.

Item analysis provides the item writer with a record of student reactions to items. It gives little information about the appropriateness of an item for a course of instruction. The appropriateness, or content validity, of an item must be determined by comparing the content of the item with the instructional objectives (Academic technology services).

An item analysis report contains each student’s score and response to each test item; these data are processed further to generate the item analysis report file. The report provides a score distribution, which can be ordered by percentile rank, student number, alphabetical order or total percentage points. The item analysis statistics give the index of difficulty, which is the fraction of the total group that got an item wrong; a high index indicates a difficult item and a low index an easy one. For item analysis, the group is divided into upper, middle and lower groups based on the test scores. Item analysis also provides the index of discrimination, which is obtained by subtracting the fraction that answered the item correctly in the lower group from the fraction that answered it correctly in the upper group (Academic technology services).
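The two indices described above can be computed with a short sketch. The size of the upper and lower groups is not given in the text, so the `group_fraction` parameter (defaulting to the commonly used 27%) is an assumption, as are the function name and the small response matrix.

```python
def item_analysis(responses, group_fraction=0.27):
    """Return (difficulty, discrimination) for each item.

    responses: list of rows, one per student, of item scores 1 (right) or 0 (wrong).
    difficulty: fraction of the whole group that got the item wrong.
    discrimination: fraction right in the upper group minus fraction right
    in the lower group.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    # Rank students by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    # Assumed group size: a fixed fraction of the class, at least one student.
    group_size = max(1, round(n_students * group_fraction))
    upper, lower = ranked[:group_size], ranked[-group_size:]

    results = []
    for i in range(n_items):
        difficulty = sum(1 - row[i] for row in responses) / n_students
        p_upper = sum(row[i] for row in upper) / group_size
        p_lower = sum(row[i] for row in lower) / group_size
        results.append((difficulty, p_upper - p_lower))
    return results


# Hypothetical data: 6 students, 3 items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

results = item_analysis(responses, group_fraction=0.33)
for number, (difficulty, discrimination) in enumerate(results, start=1):
    print(f"item {number}: difficulty {difficulty:.2f}, discrimination {discrimination:.2f}")
```

An item with a discrimination index near zero or below zero is a candidate for revision, since the lower group answers it as well as or better than the upper group.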

In conclusion, item analysis also provides information about the maximum discriminating value, the discriminating efficiency and the biserial correlation (Academic technology services).

References

Academic technology services. “Introduction to item analysis.” Retrieved from www.ats.msu.edu on June 16, 2009.

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.

Cronbach, L. J. (1989). Construct validation after 30 years. In R. L. (Ed.), Intelligence: Measurement theory and public policy (pp. 147-171). Urbana: University of Illinois Press.

Guilford, J. P. (1968). “The Structure of intelligence.” In D. K. Whitla (Ed.), Handbook of Measurement and Assessment in the Behavioral Sciences. Reading, Mass: Addison-Wesley.

Haladyna, T. (2002). Essentials of Standardized Achievement Testing:  Validity and Accountability. Boston, MA: Allyn and Bacon.

Kline, P. (1986). A Handbook of Test Construction: Introduction to Psychometric Design. New York: Methuen, Inc.

Mehrens, W. A., & Lehmann, I. J. (1987). Using Standardized Tests in Education. White Plains, NY: Longman, Inc.

Ralph, U. & Hallida, W. “Misuse of psychological test in forensic settings, some horrible examples.” Retrieved from www.parentingplan.net on June 16, 2009.

Schnitzer, Phoebe Kazdin. “Psychological Testing.” Microsoft® Student 2008 [DVD]. Redmond, WA: Microsoft Corporation, 2007

Psychological tests

Psychological tests

Intelligence, Aptitude and special aptitude test

Intelligence

The purposes of intelligence testing can be broadly divided into educational, research, vocational and medical (Whitla 1968).

Educational

  • For measuring the general learning readiness .We know that intelligence quotient scores are correlated with school achievement
  • For identifying gifted children .The essence of educational guidance is in providing for all children instructions that is interesting in content and suitable to their level of intellectual development
  • For identifying mentally retarded children so as to make adequate provisions for them
  • For homogenous grouping of children for educational effectiveness

Research

  • For indicating the extent of differences of intelligence quotient among children of the same calendar age. This indicates the need for providing teaching materials at the different levels of difficulty
  • To study mental growth. Mental abilities develop in a sequential order from birth onwards. We can use intelligence test to see the direction of individuals and group curves

Vocational

  • For vocational guidance. Different vocations call for different aptitudes

Medical

  • To define accurately the degree of mental retardation or defects so as to evolve adequate management strategies

Aptitude test

Aptitude test is defined as the test of suitability to determine whether an individual is likely to develop the skills required for a particular kind of work (Encarta dictionary, 2008). Aptitude tests are used to calculate abilities over a long period of time, as well as to envisage future learning performance. Example of aptitude tests are the Scholastic Assessment Test (SAT) and the American College Testing (ACT)).These tests are both college admission tests explored in the prediction of college success.

Special aptitude tests

The purpose of special aptitude test raises the concept of fidelity and bandwith.Bandwith determines the breath of the traits that is being evaluated while fidelity refers to the extent to which a particular measure focuses on a particular attribute or quality. These tests are explored to prognosticate on the future performance in a subject that the person in question is not currently trained.Goverment parastatals,institutions and business organization often will apply specific aptitude tests  when  handing over specific privileges to certain individuals. .Furthermore, vocational guidance counseling may involve aptitude testing to help clarify individual career goals (Microsoft Encarta, 2008). If a person has a similar score in comparison to that of individuals already functioning in a particular profession, the probability of success in that occupation can be predicted by the use of aptitude tests. Certain aptitude tests have a wide coverage that includes skills germane to many different professions. The General aptitude test, for instance aside measuring the general reasoning ability also covers the areas of form perception, motor coordination, clerical perception as well as manual and finger dexterity. Other tests may concentrate on a single area such as the Art, Engineering and modern languages (Microsoft Encarta.2008). One of the examples of special aptitude tests is the sensory or perceptual test and this concentrates on the discrimination of color and visual acuity. It also involves the auditory senses.

Another example of special aptitude test is the mechanical test which includes the test of spatial relations and this demands manual dexterity as well as space visualization. There is also the paper and pencil test which includes the Bennet Mechanical comprehension Test and the Minnesota Paper Formboard.

Other special aptitude tests are listed below.

The clerical test

This includes the Minnesota clerical test that consists of 200 pairs of numbers and 200 pairs of

names. It also includes the clerical abilities which is an embodiment of 7 other tests like test of

proofreading and copying etc.

The art and musical test.

The Art ability test includes the Art judgment test in which the participant judges between two pictures and chooses the one that is better. In aesthetic perception test, the participant gives an orderly ranking of 4 versions of the same project. The grave design judgment test also allows the participant to adjudicate the best among a group of abstract pictures.

Musical ability test

The musical ability test gives an analytical assessment of musical ability and it makes use of tones as well as notes to evaluate 6 components of auditory discrimination. The wing standardized test of musical intelligence explores recorded pianoic songs to assess about 8 areas which include the memory, chord analysis and rhythm.

Validity, Reliability and standardization of test

Analyzing validity and reliability is the foundation to identifying whether an experiment makes use of proper instrumentation, attain sensible results and appropriate procedure.

Why validity?

Validity is a useful research tool which is necessary to carry out any worthwhile project and it is a must for both the quantitative and qualitative research as the objective is to estimate the truth to the maximum degree as possible.

Validity of a test

A well-designed test is that which is both valid and reliable. Validity is in some ways the most fundamental consideration.  “A test is said to be valid if it measures what it claims to measure” (Kline, 1986).  The Standards for Educational and Psychological Testing  put forward by American psychological Association, the American Educational Research Association, National Council of Measurement in Education (Standards for educational and psychological testing, 1999)  opine  that validity “ is a measure of the  meaningfulness, usefulness and appropriateness of the specific inferences  drawn  from test scores.  Test validation is therefore the process of gathering evidence to corroborate such inferences.”

Validity is often measured in the context of the purposes for which a test score will be used. There are a number of measures of validity. For a psychological test to be valid, the test must first be reliable, but not all reliable tests demonstrate validity.  In Greenland & Linn, 1990 “Reliability is a necessary but not sufficient condition for validity.”  There exist different measures of validity depending on the purpose of the test.  Most common types of validity include the construct validity, content validity, predictive validity, face validity. As opposed to reliability, there is no particular singular statistical approach that is used to demonstrate validity.

Construct validity

This implies “the degree to which one can infer certain construction in a psychological theory from the test scores.”(Haladyna, 2002) The concept was at first  used in relation to psychological testing that pertains to individual differences such as assessment of the level of  hostility and anxiety(Mehrens&Lehman,1987;Mehrens, 2002).It has now become part and parcel of the discussion about the  validity of the achievement test (Cronbach  &Mehrens,1955;Cronbach,1989;Mehrens,2002)

Predictive validity

This is realized by finding a link between test and subsequent criterion ,for instance by looking into the correlation between a test administered at age 10 with outcomes and performances on subsequent test of academic success such as college classes, one can establish the validity of an intelligence test(Kline 1986).Nevertheless, problem arises in validating variables in this manner.    One, for the problem of getting suitable subsequent criterion and two, the challenges posed by collection of statistics (Kline 1986)

Face validity.

Face validity is the least effective type of validity.  It refers to how well the test “on the face of it” appears like it measures what it is supposed to measure.  A psychological test made up of psychological problems is regarded to have face validity.  This type of validity is pivotal to test takers.

Content validity.

This is the most valuable type of validity for assessment of student learning.  Content validity indicates the extent to which the test items correspond to what is learned in a particular course or perhaps in a similar knowledge framework.  In order words, to what range are the test items indicative of the types of content or skills that were taught? Content validity can be facilitated by watchfully manipulating with the view to reflecting what was taught.   In order to realize this, many test makers explore a test design or medium where the rows or columns are the pivotal elements of the content and the boxes represent the related test items. The test item is a representative of the universe of accomplishments within the framework, hence the items must be the appropriate samples (Mehrens&Lehman, 1987)

Reliability

Reliability is a term that refers to the consistency of the scores.  In order words, the possibility to obtain the same or similar scores supposing the test was administered at a different time of day, or perhaps if different raters scored the test?  Reliability also indicates how internally consistent the test is.

Reliability can be simply defined as “the degree of consistency between two measures of the same thing” (Mehrens & Lehmann, 1987).Reliability is commonly influenced by strength of the test,the speed of the test and group homogeneity. A large test test is generally more reliable as test speed increases estimating reliability with equivalency or test-retest approach becomes more significant. The more heterogeneous a group is, the better the reliability and this is because of the increasing variability of group scores despite stability of standard error. Amidst difference score, the test with a little variability will be less reliable all other things being equal(Haladya,2002;Mehren&Lahman 1987) Reliability coefficient should be at least 85 and 65 respectively for tests with individual consequences and that used for making decisions about groups(Mehrens&Lehman 1987).Several methods are used to estimate test reliability and they include the interrate,test-retest,alternate forms, split- halves,Kinder-Richarson(K-R) method ,Cronbach’salpha method and Improving reliability method.

Standardization

Standardized tests are administered to assist in appropriate academic placement, to assess academic achievement, to identify individual aptitudes, to explore vocational interests, and to examine personal characteristics. Standardized tests are used also to identify gifted students and those with special learning problems. (Microsoft Encrata, 2008)

Test standardization involves the use of established rules in the administration as well as the interpretation of test. Here, standard measures are used for the assessment of the tests as well as the interpretation of the results. Test administrators and proctors being used in all classrooms when the test is given. Standardization of test gives description of a test prepared by learned individuals and administered to large group of students under certain prescribed conditions .It has a low correlation with short term classroom learning as seen in grading period.Standadized test are expansive survey of accrued learning as might occur over numerous years of instructions (Pophan, 2000; Halagyna2002).Standardized tests include aptitude test, intelligence test, entry/exit tests, achievement test. Test standardization helps to measure knowledge with ease and  to do this in a better way than assessment of skills (Haladyna 2002)

Misuse and misinterpretation of Psychological tests

Psychological tests are often used inappropriately and are misinterpreted and over interpreted in the forensic setting.  This harms the person being evaluated and interferes with the course of justice.  It also does a disservice to the reputation of psychologists and the science of psychology (Harris quoted in Ralph and Hallida).Commonly misinterpreted psychological tests are:

Drawing and projective tests

In the case of children drawing test, interpretation are often not backed up by experimental and empirical evidence. No standard data showcases validity as well as reliability .In a situation where drawing is used, to avoid misinterpretation, the interpretation should be conservative in order to generate hypothesis to be explored. Projective test also lack any appreciable validity and reliability. In a review of the Draw-A-Person test in the Seventh Mental Measurements Yearbook, it was said that there appears to be very little evidence for the use of “signs” as valid indicators of personality characteristics.   With children’s drawings there is so much variability from drawing to drawing that particular features of any one drawing are too unreliable to say anything about them (Harris quoted in Ralph and Hallida).

Rorschach test

Here, certain recommendations and conclusions about people lives are made based on misinterpretation of this test having failed to recognize its limitation as there is no empirical support for the validity of this test and therefore it becomes limited in its clinical use. Here, people should rather be evaluated on the basis of what they do as opposed to what they are feeling, their thoughts or inclinations  as seen in Rorscharch.The responses  given in Rorschach  is not a true reflection  of an evidence of a real psychopathology. The interpretation is therefore subjective and quirck.It is unscientific as it makes inferences based on supposed reality of unconscious process in the mind.

The MMPI

The MMPI is also often misinterpreted. Interpretations are frequently idiosyncratic and lack support in the empirical literature, which is a more serious problem than a mere difference of opinion. More often than not, a computerized interpretation is used without taking the particular characteristics of the individual into account.

K scale

The K scale of the MMPI is also commonly misinterpreted. For instance, an elevated K score is frequent among individuals who take the MMPI in prison and court settings, but it does not necessarily indicate defensiveness as a personality attribute. In these settings it is usually an adaptive reaction and should not be overinterpreted.

Multiphasic sex inventory

This is a self-report questionnaire comprising statements about sexual experiences, difficulties and activities. It is a scaled assessment of openness regarding sexual attitudes. It is sometimes used on individuals who deny sexual abuse in order to establish whether they actually committed abuse. This is a misplacement of purpose and often leads to misinterpretation.

Penile plethysmograph

This is a method used to design individual treatment programmes for sexual offenders. It is subject to error when it is applied to test the truthfulness of an individual who denies ever having committed a sexual offence, because it generates false positive results that lead to misinterpretation (Ferral quoted in Ralph and Hallida).

What is item analysis?

Item analysis is a set of statistics used to evaluate whether a particular test item is adequately measuring the same variable that the other test items measure. On a well-functioning item, individuals who understand the material pick the correct answer, while the responses of those who do not are distributed fairly evenly across the wrong answers.
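As a rough illustration of how responses spread across the answer options, the following Python sketch tallies the fraction of examinees choosing each option for a single item. The function name `option_distribution`, the four-option format and the response data are hypothetical and are not taken from the cited source.

```python
from collections import Counter


def option_distribution(responses, options):
    """Return the fraction of examinees who chose each option for one item."""
    counts = Counter(responses)
    total = len(responses)
    return {opt: counts.get(opt, 0) / total for opt in options}


# Hypothetical responses of ten examinees to one item whose key is "B".
responses = ["B", "B", "A", "B", "C", "B", "D", "B", "B", "B"]
dist = option_distribution(responses, options="ABCD")

# Print the share of examinees per option, keyed answer included.
for option, share in dist.items():
    print(option, share)
```

In this made-up example most examinees pick the keyed answer and the remainder spread evenly over the three wrong options, which is the pattern described above for an item that is working as intended.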

What then is the objective of item analysis?

Item analysis improves a test by identifying the good items and the items that need to be revised or discarded. It also shows what students do and do not understand, which makes it a valuable tool for guiding instructors on how to improve their teaching. For this to work, the items analyzed must be valid measures of the instructional objectives. The items must also be diagnostic. In other words, knowing which incorrect options students pick must give a clue to the nature of the misunderstanding and therefore point to the necessary remediation.

Item analysis provides the item writer with a record of student reactions to items. It gives little information about the appropriateness of an item for a course of instruction. The appropriateness or content validity of an item must be determined by comparing the content of the item with the instructional objectives (Academic technology services).

Item analysis starts from each student's score and response to each test item, which are processed to generate what is known as the item analysis report file. The report provides a score distribution, which can be ordered by percentile rank, student number or alphabetical order. It can also be arranged in order of total percentage points. The item difficulty index is the fraction of the total group that gets an item wrong, so a high index indicates a difficult item and a low index an easy one. The group is also divided into upper, middle and lower groups based on test scores. The index of discrimination is obtained by subtracting the fraction of the lower group that got the item right from the fraction of the upper group that got it right (Academic technology services).
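The two indices can be sketched in a few lines of Python. This is a minimal sketch, not the cited service's actual procedure. It takes difficulty as the fraction of the whole group answering wrongly, and discrimination as the upper-group fraction correct minus the lower-group fraction correct. The function names, the sample data and the `group_fraction=0.27` cut-off for the upper and lower groups are assumptions made for illustration. The 27% split is a common convention but is not stated in the text.

```python
def item_difficulty(item_correct):
    """Fraction of the whole group that got the item wrong (higher = harder)."""
    return 1 - sum(item_correct) / len(item_correct)


def discrimination_index(total_scores, item_correct, group_fraction=0.27):
    """Upper-group fraction correct minus lower-group fraction correct.

    group_fraction is an assumed cut-off for the size of each extreme group.
    """
    n = len(total_scores)
    k = max(1, round(n * group_fraction))  # examinees in each extreme group
    order = sorted(range(n), key=lambda i: total_scores[i])  # ascending score
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower


# Hypothetical data for ten examinees: total test score, and whether each
# examinee answered the item under study correctly (1) or not (0).
total_scores = [95, 90, 85, 80, 70, 65, 60, 50, 45, 30]
item_correct = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0]

difficulty = item_difficulty(item_correct)
discrimination = discrimination_index(total_scores, item_correct)
print(difficulty, discrimination)
```

A positive discrimination value means that high scorers answered the item correctly more often than low scorers. A value near zero or below suggests the item may need to be revised or discarded.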

In conclusion, item analysis also provides information about the maximum discriminating value, the discriminating efficiency and the biserial correlation (Academic technology services).
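For readers who want to see how an item-total correlation of this kind is computed, here is a minimal Python sketch of the point-biserial correlation, a close relative of the biserial correlation named above. Whether the cited report uses this exact variant is an assumption, and the function name and sample data are hypothetical.

```python
from statistics import mean, pstdev


def point_biserial(item_correct, total_scores):
    """Point-biserial correlation between a 0/1 item and total test scores."""
    right = [s for s, c in zip(total_scores, item_correct) if c]
    wrong = [s for s, c in zip(total_scores, item_correct) if not c]
    sd = pstdev(total_scores)  # population standard deviation of total scores
    if not right or not wrong or sd == 0:
        raise ValueError("correlation is undefined for this item")
    p = len(right) / len(total_scores)  # fraction answering correctly
    return (mean(right) - mean(wrong)) / sd * (p * (1 - p)) ** 0.5


# Hypothetical data: four examinees, the two highest scorers got the item right.
r = point_biserial(item_correct=[1, 1, 0, 0], total_scores=[4, 3, 2, 1])
print(round(r, 3))
```

A value close to 1 indicates that examinees who answer the item correctly also tend to score highly on the test as a whole.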

References

Academic technology services. “Introduction to item analysis.” Retrieved from www.ats.msu.edu on June 16, 2009.

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.

Cronbach, L. J. (1989). Construct validation after 30 years. In R. L. (Ed.), Intelligence: Measurement theory and public policy (pp. 147-171). Urbana: University of Illinois Press.

Guilford, J. P. (1968). “The Structure of intelligence.” In D. K. Whitla (Ed.), Handbook of Measurement and Assessment in the Behavioral Sciences. Reading, Mass: Addison-Wesley.

Haladyna, T. (2002). Essentials of Standardized Achievement Testing:  Validity and Accountability. Boston, MA: Allyn and Bacon.

Kline, P. (1986). A Handbook of Test Construction: Introduction to Psychometric Design. New York: Methuen, Inc.

Mehrens, W. A., & Lehmann, I. J. (1987). Using Standardized Tests in Education. White Plains, NY: Longman, Inc.

Ralph, U., & Hallida, W. “Misuse of psychological test in forensic settings, some horrible examples.” Retrieved from www.parentingplan.net on June 16, 2009.

Schnitzer, Phoebe Kazdin. “Psychological Testing.” Microsoft® Student 2008 [DVD]. Redmond, WA: Microsoft Corporation, 2007.
