20. International Conference
ETHICS IN BUSINESS & ECONOMICS:
CHALLENGES FOR HIGHER EDUCATION
Development of Educational Tests
(computer translation)
Dr. Vadim Avanesov
18-19 March 2002, Almaty, Kazakhstan
The aim of this report is to present to attendees the methodological aspects of developing and using tests in educational systems. Test development is the province of the science of Educational Measurement, which is practiced mainly in Western countries. Its subject matter is the development of high-quality tests for measuring students' level of knowledge. Such tests are now also used for ranking students, monitoring the educational process, adaptive learning and adaptive test control, and distance learning; generally speaking, tests are used in all modern educational technologies.
The test method is considered especially relevant because of its competitive advantages over other methods. The five key advantages are:
High scientific validity of the tests themselves, which allows objective assessment of examinees' level of knowledge;
Technological effectiveness of test methods;
Measurement accuracy;
Common rules for conducting test control and for adequate interpretation of test results;
Compatibility of testing technology with other modern educational technologies.
The system of definitions and terminology, together with the form and content of the test, makes up the theoretical and methodological basis of the test method. Methodological questions of test methods, quality-check criteria and mathematical models of educational measurement are not covered in this report.
There is a hierarchy of five basic, subordinated concepts. The author has investigated the first three, "item in test form", "test item" and "educational test", along with the associated terminology (1), and then defined them rigorously in (2). The two other system-forming concepts of the theory, "content" and "form", relate both to items and to the test as a whole. The results of this investigation are presented in detail in the author's papers (2) and (3).
Test Definition
The author defines an educational test as a system of parallel test items of specific form, certain content, and approximately evenly ascending difficulty. The system is intended to give an objective assessment of the structure of students' knowledge and to measure its level.
A brief interpretation of the key concepts helps in understanding the definition. "System" means that the test contains items with system-forming properties: all items belong to the same system of knowledge, e.g. the same subject, and the items correlate with one another and with the test scores. The items are arranged in ascending order, from the easiest to the most difficult. In other words, the arrangement of items by difficulty is one of the important system-forming characteristics of an educational test.
"Specific form" means that test items are neither questions nor problems: they are items designed as statements that become true or false depending on the answer chosen or supplied (see the examples below). Traditional questions cannot be true or false, and the answers to them are often vague and wordy, so teachers have to spend considerable intellectual effort to establish whether an answer is correct. In this sense traditional questions and answers are not technological and should not be included in a test.
"Certain content" means that the test uses only control material that corresponds to the content of the educational subject; nothing else may be included in an educational test. Assessment of intellectual potential, for example, is a matter for psychological testing.
Test content exists, is stored and is transferred in one of the four forms of items. Neither the test nor its content can exist outside the test form.
Item difficulty is the only theoretically justified criterion in educational measurement for ordering homogeneous test content. An educational test should not include any content unrelated to education (for instance, IQ assessment); that is a matter for psychological testing. Increasing difficulty can be compared to hurdles on a stadium track where each hurdle is higher than the previous one: only a better-trained runner will manage to run the whole distance and clear every hurdle.
Because the items are arranged by increasing difficulty, one examinee will fail on the first and easiest items, while others will fail on later ones. A student with a medium level of knowledge will succeed on only about half of the test. Finally, only the best-prepared students will solve the most difficult items, placed at the end of the test.
The difficulty of test items may be determined in two ways: a) speculatively, on the basis of the assumed amount and nature of the mental work needed to complete an item successfully; and b) empirically, by trying the items out and estimating the share of wrong answers. For many years the classical theory studied only empirical indicators of difficulty. Newer types of tests place the emphasis on the nature of the students' mental work.
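The empirical approach in b) can be illustrated with a short sketch. It assumes that trial results are stored as a matrix of 1s (correct) and 0s (wrong), one row per examinee; the function names and the sample data are hypothetical and serve only as an illustration.

```python
from typing import List


def item_difficulty(responses: List[List[int]]) -> List[float]:
    """Return the share of wrong answers for each item.

    responses[s][i] is 1 if examinee s answered item i correctly, else 0.
    """
    if not responses:
        raise ValueError("at least one examinee is required")
    n_examinees = len(responses)
    n_items = len(responses[0])
    if any(len(row) != n_items for row in responses):
        raise ValueError("every examinee needs one entry per item")
    return [
        sum(1 - row[i] for row in responses) / n_examinees
        for i in range(n_items)
    ]


def order_by_difficulty(responses: List[List[int]]) -> List[int]:
    """Return item indices sorted from the easiest to the most difficult."""
    difficulty = item_difficulty(responses)
    return sorted(range(len(difficulty)), key=lambda i: difficulty[i])


# Hypothetical trial: 4 examinees, 3 items in their draft order.
trial = [
    [0, 1, 1],
    [0, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
]
print(item_difficulty(trial))      # share of wrong answers per item
print(order_by_difficulty(trial))  # draft items re-ordered by difficulty
```

With an estimate of this kind the draft items can be re-arranged so that the test runs from the easiest item to the most difficult one.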
The answer to a test item is a brief judgment related in content and form to the content of the item. The answer to each item may be true or false. The test designers determine the correctness criteria in advance. Grading the offered answers by degree of correctness is not often used in testing practice, but if necessary an item can be designed so that all answers are true and differ only in degree of correctness. The instruction for examinees would then be "Circle the number of the most correct answer!"
The probability of a correct answer to any item depends on the relation between the student's level of knowledge and the difficulty of the item. This probability takes values from 0 to 1, provided that knowledge and difficulty are expressed on comparable scales. Analysis of each student's answers to the test items reveals the level and structure of his or her knowledge. The more correct answers given, the higher the examinee's individual test score. The test score is usually associated with the "level of knowledge" and is refined by means of an educational measurement model. The same level of knowledge can be reached through answers to different items. For example, suppose a student scores 10 points on a test of 30 items. Most likely the score comes from correct answers to the first 10, comparatively easy, items. Such a sequence of ones followed by zeros is considered the correct profile of a student's knowledge.
The opposite situation, where a student answers difficult items correctly and easy ones incorrectly, contradicts the logic of teaching, so such a knowledge profile can be called inverted. Inverted profiles are rare and arise mostly from a flaw in test design, where the items are not arranged by ascending difficulty. Provided the test is designed properly, each profile reveals a structure of knowledge. This structure can be called elementary (in contrast to the factor structures identified by factor-analysis methods).
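The distinction between a correct and an inverted profile can be sketched in a few lines. The sketch assumes that a profile is a list of 1s and 0s already ordered from the easiest item to the most difficult; the function names are hypothetical, and counting inverted pairs is our own illustrative measure, not one named in the report.

```python
from typing import List


def is_right_profile(profile: List[int]) -> bool:
    """True if the profile is a run of 1s followed only by 0s."""
    seen_zero = False
    for answer in profile:
        if answer == 0:
            seen_zero = True
        elif seen_zero:
            # A correct answer after an earlier (easier) wrong one.
            return False
    return True


def count_inversions(profile: List[int]) -> int:
    """Count pairs where an easier item is wrong and a harder one is right."""
    inversions = 0
    zeros_so_far = 0
    for answer in profile:
        if answer == 0:
            zeros_so_far += 1
        else:
            inversions += zeros_so_far
    return inversions


# Two examinees with the same score of 10 on a 30-item test.
right = [1] * 10 + [0] * 20      # correct answers on the 10 easiest items
inverted = [0] * 20 + [1] * 10   # correct answers on the 10 hardest items

print(sum(right), is_right_profile(right), count_inversions(right))
print(sum(inverted), is_right_profile(inverted), count_inversions(inverted))
```

Both examinees have the same test score, yet only the first profile has the correct structure, which is why the score alone does not describe the structure of knowledge.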
Every educational institution should aim above all at forming sound individual knowledge structures, without gaps in knowledge, and at raising the level of education. Japan and the rapidly developing Asia-Pacific countries evidently follow this principle. The level of knowledge depends mostly on the student's own work and abilities, whereas the structure of knowledge depends largely on the proper organization of the educational process, an individual approach to teaching, the teacher's competence and skill, and objective control: in fact, everything that we lack.
Form of test items
Instructors' attention is drawn first of all to the content and form of the test. Content is defined as the reflection of a fragment, or component, of a school subject in test form; form is defined as the way the components of an item are connected and ordered. Test content may exist, be stored and be transferred in any of the four main item forms. Neither the test nor its content can exist outside the test form.
All test items known in theory and practice can be divided into four major groups. The first group consists of items with a choice of one or more true answers. Items that offer a choice of answers (usually one true and several false) are better described as items with a choice of one true answer. For example:
1. NUMBER "ONE" IS CONSIDERED A
1) prime number
2) composite number
3) both prime and composite number
4) neither prime nor composite number
Such items have one true answer and several false but plausible ones. The latter are called distractors. The number of distractors may vary from 1 to 6. Items with a choice of several true answers are now widespread alongside one-answer items. They are more difficult in content than items with a choice of one answer. The instruction "Circle the numbers of all true answers" is given at the beginning. In this case the number of distractors increases and may reach 12-14.
1. RELATED TO PHILOSOPHY ARE
1) atom 2) knowledge 3) being 4) liberty 5) development 6) quality 7) culture 8) revolution 9) dialectics 10) quantity
5. EXCISABLE GOODS ARE
1) tobacco 2) jewelry 3) grain 4) cars 5) petrol 6) sausages 7) bread 8) alcoholic beverages
The examinee must decide which answers are true and which are false, and also judge the completeness of the answer. The second group consists of items that require the answer to be supplied, usually one word or symbol. The standard instruction is "Complete".
6. THE FIRST GREEK PHILOSOPHER WAS ____________.
7. EACH PLANET MOVES ALONG AN ELLIPSE, AT ONE FOCUS OF WHICH IS ________.
The third group consists of items whose elements are arranged in two columns. Such items are preceded by the instruction "Match":
8. NAME                  MEANING OF NAME
1. Fauna                 A) Doctor
2. Flora                 B) Luck
3. Megaera               C) Fighter
4. Aesculapius           D) Faithful wife
5. Penelope              E) Wicked woman
6. Narcissus             F) Self-enamored
7. Prometheus            G) Vegetative world
                         H) Mysterious man
                         I) Man of striking beauty
Answers: 1_, 2_, 3_, 4_, 5_, 6_, 7_.
In the blanks of the answer line, examinees enter the letter of the correct answer from the second column.
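Checking such an answer line is easy to automate. The sketch below assumes that the examinee's entries are stored as a mapping from the number in the first column to the letter entered; the function name is hypothetical. The report gives no answer key, so the key used here is our assumption and covers only four pairs that we read as unambiguous; the remaining names are left out rather than guessed.

```python
from typing import Dict


def score_matching(responses: Dict[int, str], key: Dict[int, str]) -> int:
    """Count the pairs in the key that the examinee matched correctly.

    Blank or missing entries count as wrong; letter case is ignored.
    """
    return sum(
        1
        for number, letter in key.items()
        if responses.get(number, "").strip().upper() == letter
    )


# Assumed partial key (not given in the report):
# Flora, Megaera, Aesculapius, Penelope.
ASSUMED_KEY = {2: "G", 3: "E", 4: "A", 5: "D"}

# Hypothetical answer line of one examinee.
answers = {1: "B", 2: "G", 3: "E", 4: "C", 5: "d", 6: "F", 7: ""}

print(score_matching(answers, ASSUMED_KEY))  # pairs matched correctly
```

Scoring one point per correct pair is only one possible rule; an item could equally be scored as a whole, with a point awarded only when every pair is correct.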
The fourth group includes items of a procedural or algorithmic nature. Consider items that test historical knowledge of the events of February-October 1917, which considerably influenced not only the history of Russia but the course of political events throughout the world. Naturally, students memorize facts when studying the course. But knowing history is not only knowing certain facts; it is first of all knowing the historical process, in which the facts studied are ordered in time. Each item is preceded by the instruction:
"Put in the correct order":
1. EVENTS OF FEBRUARY-OCTOBER 1917
?- Sixth Congress of the RSDLP
?- abdication of Tsar Nicholas II
?- arrival of Lenin
?- founding of the Petrograd Soviet
?- Kornilov rebellion
?- end of dual power
?- Second Congress of Soviets
The examinee enters ranking numbers in the boxes to the left of each element of the item. In computer testing the examinee works with a special software tool designed for this form of item; after a ranking number is entered, the cursor moves automatically to the next box. A second example:
23. ORDER OF THE CHAPTERS IN THE NOVEL "A HERO OF OUR TIME"
?- Bela
?- Taman
?- The Fatalist
?- Princess Mary
?- Maxim Maximovich
?- Author's introduction
?- Pechorin's journal
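An ordering item of this kind can be checked by comparing the ranking numbers entered with a key. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the item is scored as a whole, and the key is our reading of the usual published order of the novel's parts, which the report itself does not give.

```python
from typing import Dict, List


def is_correct_order(ranks: Dict[str, int], key: List[str]) -> bool:
    """True if every element received the rank it has in the key."""
    expected = {element: rank for rank, element in enumerate(key, start=1)}
    return ranks == expected


def misplaced(ranks: Dict[str, int], key: List[str]) -> List[str]:
    """List the elements whose entered rank differs from the key."""
    return [
        element
        for rank, element in enumerate(key, start=1)
        if ranks.get(element) != rank
    ]


# Assumed key: the parts in the order in which they are usually printed.
ASSUMED_KEY = [
    "Author's introduction",
    "Bela",
    "Maxim Maximovich",
    "Pechorin's journal",
    "Taman",
    "Princess Mary",
    "The Fatalist",
]

# Hypothetical response: "Taman" and "Princess Mary" are swapped.
response = {
    "Bela": 2,
    "Taman": 6,
    "The Fatalist": 7,
    "Princess Mary": 5,
    "Maxim Maximovich": 3,
    "Author's introduction": 1,
    "Pechorin's journal": 4,
}

print(is_correct_order(response, ASSUMED_KEY))
print(misplaced(response, ASSUMED_KEY))
```

Listing the misplaced elements is optional; it is included because it shows where in the sequence the examinee's knowledge of the process breaks down.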
Content of testing
The content of the test is an optimal reflection of the content of education in a system of test items. The words "optimal reflection" presume the need to select control material such that the answers to it indicate each student's preparedness with high probability (over 95%).
The requirement of optimal reflection entails periodic revision of the goals and meaning of pedagogical activity. Until recently the practice of general secondary education was reduced to mastering a known list of knowledge, abilities and skills. In the ministry of education this was supplemented by the controversial (if not harmful) idea of a so-called "educational minimum", which flatly contradicted the goals of genuine education and upbringing: full intellectual, cultural, moral, aesthetic and physical development. Orientation toward the minimum, and testing of the minimum only, is a consequence of a bureaucratic approach to the management of education and of the falseness of a totalizing, minimalist educational policy.
Optimization of content has been a leading idea of both traditional and adaptive testing: to optimize a test means to measure the knowledge of the maximum number of students quickly, with high quality, at minimal expense, with the minimum number of items and in the shortest time.
This idea is close to the task of raising the effectiveness of pedagogical activity through mass knowledge control. A generalization of a broader, ideological kind seems appropriate here: testing culture is of interest first of all to leaders who aim to raise that effectiveness.
Criteria for selecting test content:
Correspondence of the test content to the goals of testing;
Importance of the tested knowledge within the general system of knowledge;
Interrelation of content and form;
Correctness of the content of test items;
Representativeness of the content of the academic discipline in the content of the test;
Correspondence of the test content to the current state of science;
Completeness and balance of the test content;
Systematic nature of the content;
Variability of the content;
Correspondence between the degree of difficulty and the goal of the test.
The lack of scientific research on testing leads to the substitution of unscientific forms and methods for genuine tests. In Russia, for instance, instead of developing tests, funds drawn from the budget and from international loans are spent on developing pseudo-scientific control materials used for the Unified State Examination.
The negative aspects of the Russian Unified State Examination begin with its unrealistic goals. For example:
It is impossible to provide equal access to education while the general population is being impoverished;
Fighting corruption is ineffective without an anticorruption law;
Objective assessment of knowledge is impossible with low-quality tests.
The essential issues of the Unified State Examination that have not been worked out at all are legal, social (social consequences in particular), methodological and metric (issues of exact measurement). The main lines of work needed to give the testing process a scientific foundation are:
training specialists under the "Pedagogical Measurement" program;
postgraduate study and defense of theses on testing problems;
training trainees from secondary specialized educational institutions and schoolteachers in the methodology of test-based knowledge control;
publications on the subject.
A brief list of the author's publications
1. "Methodological and theoretic grounds of testing control". Thesis of the doctor of pedagogic science. State university, 1994. 339 p.
2. "Composition of test items". Testing center, 2002. 240 p.
3. Content of testing. Principles of developing the content of test. Logical requirements to the content of test. Knowledge as a subject of test control. Kinds of knowledge // Managing schools, N 36, 38, 42, 46, 1999 and N 2, 2000.
4. "Basic concepts of Educational Measurements" // Thesis report of the participants of workshop-school "Scientific problems of test control of knowledge", 14-18 March 1994. Center of Research of the problems of specialists' training quality, 1994, p. 105-108.
5. "Where will education go" // People's education, N 5, 2001, p. 26-31.
6. "How to overcome the precipice between secondary and high school?" // Managing schools, N 43, November 2000.
7. "Do we really want it?" // Russian Federation today, N 20, September 2001, p. 8-9.
8. Principles of scientific organization of pedagogic control in high school. M.: MISiS, 1989, p. 167.
9. Certification of tests in ministry fashion // Official documents in education, N 32 (167), November 2001, p. 99-102.