Assessment is needed to help teachers make decisions about students' linguistic abilities, their placement at appropriate levels, and their progress and achievement. The success of any assessment depends on the effective selection and use of appropriate tools and procedures, as well as on the proper interpretation of students' performance. Assessment tools and procedures, being essential for evaluating students' progress and achievement, also help in evaluating the suitability of the curriculum, the effectiveness of the teaching methodology and of the instructional materials used.
In this paper I would like to concentrate on the advantages and disadvantages of the following types of assessment: summative assessment (with cloze tests and MCQs as examples) and production testing (role plays and interviews). Cloze tests can also serve a diagnostic purpose and be used for literacy placement if administered before a student is assigned to a particular class.
As a student I mostly encountered assessment in the form of traditional paper-and-pencil tests. It was a summative evaluation of student achievement which focused on linguistic accuracy and mastery of discrete language points. The teaching was grammar-centered and text-centered, so the test items typically consisted of gap-filling (to test grammar, collocation, fixed phrases and reading comprehension) and MCQs (to test vocabulary and grammar).
Cloze tests have a number of advantages. They are quick and easy to prepare, and because of the randomness of the deleted words almost anything can be tested (grammar, collocation, fixed phrases and reading comprehension).
However, the actual score a student gets depends on the particular words deleted: some are easier to supply than others, and in some cases there are several possible answers (the problem of 'reliability').
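The cloze I describe below uses comprehension gaps rather than mechanical deletion, but the quick preparation mentioned above can be illustrated with the classic fixed-ratio cloze, in which a word is removed at regular intervals. Here is a minimal sketch in Python; the passage and the deletion ratio are purely illustrative:

    # A minimal sketch (Python) of the classic fixed-ratio cloze procedure:
    # every nth word is deleted and collected into an answer key.
    # The passage and the ratio n are illustrative only.
    def make_cloze(text, n=7):
        words = text.split()
        gapped, key = [], []
        for i, word in enumerate(words, start=1):
            if i % n == 0:                       # delete every nth word
                key.append(word)
                gapped.append(f"({len(key)}) ______")
            else:
                gapped.append(word)
        return " ".join(gapped), key

    passage = ("A roof is the top covering of a building that sheds rain "
               "or snow, keeping the building interior dry.")
    cloze_text, answer_key = make_cloze(passage, n=7)
    print(cloze_text)    # the gapped passage
    print(answer_key)    # the deleted words, i.e. the key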
In my teaching practice I sometimes use cloze tests and MCQs which I prepare for my students myself. Here is an example of such a cloze test, designed to check reading comprehension.
Here’s an extract from the text to read:
…A roof is the top covering of a building that sheds rain or snow, keeping the building interior dry. Roofs may be 'pitched', 'domed', 'low slope' or 'flat' in form; however, roofs are rarely truly flat. Flat roofs are commonly found on industrial type structures while low slope is found on pre-fabricated or steel structures such as arenas. Pitched roofs are the primary design found on residential homes…
The cloze looks like this:
Roofs……… the building interior from rain and snow.
There are ……… types of roofs.
Low slope roofs are typically not found on…
(Keys: protect, four, industrial structures and residential homes)
I think such a test is valid because it tests what it is supposed to test, i.e. reading comprehension. The test is also reliable, because the items do not allow a wide variety of acceptable answers and the results do not depend on the scorers. Some research on cloze tests indicates that they work best for reading comprehension at the sentence level and for reading for facts. When the order of sentences was scrambled, there was no difference in cloze scores. When cloze scores were compared with a multiple-choice test of facts, vocabulary and literal details, the cloze and multiple-choice scores correlated very highly. When the multiple-choice tests asked inference questions or questions that required students to draw conclusions, the correlation with cloze scores became lower, dropping from a high to a moderately high correlation.
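To make the correlation claims above concrete, here is a minimal sketch in Python of how cloze and multiple-choice scores might be compared with a Pearson correlation; the score lists are invented for illustration and are not data from the studies mentioned:

    # A minimal sketch (Python): comparing cloze scores with multiple-choice
    # scores via a Pearson correlation. All score lists are invented for
    # illustration; they are not data from the research mentioned above.
    from statistics import correlation   # Python 3.10+

    cloze_scores  = [12, 15, 9, 18, 14, 11, 16, 13]   # hypothetical cloze results
    mcq_literal   = [48, 55, 40, 66, 52, 45, 60, 50]  # hypothetical MCQ scores (literal facts)
    mcq_inference = [52, 47, 49, 55, 58, 40, 58, 41]  # hypothetical MCQ scores (inference items)

    print(round(correlation(cloze_scores, mcq_literal), 2))    # high, about 0.99 with these numbers
    print(round(correlation(cloze_scores, mcq_inference), 2))  # noticeably lower, about 0.5 here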
A positive backwash effect is created when one of the objectives of the course (as in my case) is to teach students reading comprehension and translation of engineering texts.
The MCQ (multiple choice question) is another type of indirect test which is widely used to assess vocabulary and grammar. A typical set of items looks like this:
- An ___ describes a person or a thing.
  A noun   B verb   C adjective
- Can you ___ a photo of us?
  A take   B make   C develop
- a: Where do you ___? b: I'm from Italy.
  A come to   B come from   C from
- She came for dinner and ___ some nice flowers for me.
  A bring   B taken   C brought
- He ___ at 9.30. He was late for college.
  A got down   B got   C got up
- We arrived ___ the station early.
  A at   B to   C in
The advantage of MCQs is that they are easy to mark. However, the incorrect choices (distractors) may confuse students and draw them away from the correct answer. Guessing can also play a role: in the example above, a student guessing blindly can expect to score about 33%, and with just two choices (as in True/False tests) guessing is worth about 50%. It is also important to note that a student who has simply been trained in MCQ technique may get better results than one who has not.
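To see where these figures come from: with k equally likely options per item, the expected score of a blind guesser is 1/k. A minimal simulation sketch in Python (the six-item test size echoes the example above; everything else is illustrative):

    # A minimal sketch (Python): expected score from blind guessing.
    # With k equally likely options per item the expected score is 1/k,
    # i.e. about 33% for three options and 50% for True/False items.
    # The six-item test size matches the example above; nothing else is real data.
    import random

    def blind_guessing_average(items=6, options=3, trials=100_000):
        total = 0.0
        for _ in range(trials):
            correct = sum(1 for _ in range(items)
                          if random.randrange(options) == 0)   # option 0 stands for the key
            total += correct / items
        return 100 * total / trials

    print(round(blind_guessing_average(options=3)))   # about 33
    print(round(blind_guessing_average(options=2)))   # about 50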
With the introduction of the communicative teaching methodology, the ability to speak has become central. In real life people speak when they have a purpose, so I try to use direct test item types which reflect real life: for example, role play activities in which students perform tasks such as ringing a hotel to book a room or buying something in a shop. Situations imitating real-life communication help to achieve validity, unless the language is all memorized, in which case the authenticity and the validity disappear.
Role plays are great fun and therefore motivating. They also allow shy students to be more direct in their opinions and behavior than they might be in real-life situations. Finally, they let students use a much wider variety of language than other, more task-centered activities. This task type is also well suited to cooperative learning.
As for the disadvantages of this test task type, the only one I can see is that the tester can be too subjective while assessing. The backwash effect is positive if the objective of the course is to develop the learner's communicative competence; I teach such a course, so the use of the described task is justified. If students know how they will be tested, they tend to pay attention to activities that look like the test.
Another speaking production test task I use is an interview in which students are questioned after they have studied some topical material. An example is below (Topic: Student Life):
- What faculty and department are you in?
- What year are you in?
- What is your major?
- What subjects do you study?
- What lectures do you attend?
- How do you prepare for your classes?
- Have you got any seminars? What seminars do you have?
- What do you like most about your studies? Etc.
Though the students' answers are not impromptu and the task does not seem very authentic, such an interview has its advantages. The students still have the freedom to give a variety of answers. Most of the questions are framed so that they require one- or two-word answers, which makes them more a test of listening comprehension than of speaking ability; this also makes sense if the students do not yet have much speaking mastery. The last question is an exception to this. Adding more open-ended questions to the mix seems to be a good idea; an example might be, "Tell me about what you did in your XXX class last week."
The task mentioned above has content validity as it is constructed to assess the student’s speaking ability on a certain topic.
GLOSSARY
(the definitions are borrowed from the "Glossary of Terms for Assessment, Examinations and Tests in the Education System" / "Глоссарий терминов по проведению оценки, экзаменов и тестов в системе образования")

Assessment
A general term used when evaluating or "measuring" behavior or a characteristic. The Latin root assidere means "to sit beside". Applied to education, assessment means the process of observing learning: describing, collecting, recording and interpreting information about one's own learning or a student's learning. Interim assessment is an intermediate stage in the learning process, part of reflection and of an autobiographical understanding of progress. Traditionally, students are assessed in order to decide which class to place them in, whether to move them up to the next class, to graduate them from school or to have them repeat a year. In the context of organizational accountability, assessment is used to determine the effectiveness of schools, of teaching programs and of teachers' work. In the context of school reform, assessment is a crucial tool for analyzing how effective changes in teaching and learning have been.

Summative Assessment
Assessment of what has been learned (at the end of a learning cycle, to reflect the level of students' achievement). Assessment carried out on completion of a unit or units of study, of an exercise or a plan, in order to determine or make a judgment about students' skills and knowledge, or about the effectiveness of a plan or activity. The results are the culmination of the teaching/learning process for a unit, a subject or a year-long program.

Backwash Effect
The influence (positive or negative) of an assessment scheme on the teaching/learning that precedes it. For example, examinations that test only knowledge of "isolated" facts may encourage rote learning in the classroom (negative backwash). A science examination that includes a test of practical laboratory skills may encourage teachers to use experimentation as a teaching tool (positive backwash).

Validity
A measure of the extent to which an examination measures what it is intended to measure.

Content Validity
Evidence of validity obtained by demonstrating that the content of a test represents the specified domain of behavior. In public examinations a high degree of content validity is achieved by ensuring that all tasks are designed to test the knowledge, understanding and skills included in the "syllabus", and that the balance of tasks testing different skills and topics matches the test specifications.

Reliability
The degree to which assessment results can be trusted and consistently measure particular knowledge and/or skills of students. Reliability is an indicator of how consistent the scores are when the assessment is carried out by different raters, at different times, or with different tasks that measure the same parameter. Reliability can therefore refer to: the relationship between test items intended to assess the same knowledge or skills (item reliability); the relationship between two administrations of the same test to the same student or students (test/retest reliability); the degree of agreement between two or more raters (rater reliability). It should be noted that an unreliable assessment cannot be valid.
Required Readings:
1. Hughes, A. (1989). Kinds of test and testing. In A. Hughes, Testing for language teachers. Cambridge: Cambridge University Press, pp. 9-21.
2. Brown, J.D. & Hudson, T. (1998, Winter). Alternatives in assessment. TESOL Quarterly, 32(4), 653-675.
3. Sook, K.H. (2003, December). L2 Language assessment in the Korean classroom. Asian EFL Journal. Retrieved March 7, 2004 from http://www.asian-efl-journal.com/dec_03_gl.pdf