Reflecting on Marking Approaches


According to Lyle Bachman, testing students is nothing less than a dilemma for language teachers, because language is both the tool of assessment and the object being assessed: teachers have to use language to assess the language ability of their students (Oxford University Press, 1996). It is indeed a dilemma, since testing is a complex procedure built on several important concepts, which on the one hand establish the accuracy of a test and on the other make the assessment more thorough. Tests measure the progress of the students and, in turn, show how effective the test itself was in assessing them. These two functions make an accurate marking system all the more important. The key areas to consider when developing a testing program include test validity, test authenticity and backwash.

A writing test, in which 90 students were asked to reflect on and discuss a past journey, was marked using two approaches. One applied a simple marking procedure; the other was more complex and was accompanied by a mark scheme. This paper examines the effectiveness of each approach in terms of validity, authenticity and backwash, and concludes which of the two better serves the purpose of the test.

The Purpose of the Test

Before we begin to assess the marking approaches, it is essential to establish the purpose of the test.

Broadly speaking, language tests are set for one of six reasons: proficiency, achievement, placement, diagnosis, progress, and aptitude.


‘Proficiency’ is global language ability. Proficiency tests measure a student’s ability in relation to a specific task they will encounter later, such as studying at a university. Examples of proficiency tests include TOEFL, Cambridge (IELTS) and Michigan (MELAB).


Achievement tests reflect the instruction given over a period of time and show how well the student has understood it and responds to it in the test.


Placement or diagnostic tests measure global language ability and sort the students into groups according to their abilities.


Progress tests are the common classroom tests given to motivate the students and to identify their strong and weak areas in the lessons taught.


Finally, aptitude tests measure proficiency in the language and its use, based on grammatical structure or sounds (Bachman, 1996).

The test conducted in the present case is best described as either a ‘progress test’ or an ‘aptitude test’.

More specifically, the purpose of the test was to assess grammatical structure and the students’ grasp of writing skills in the language. The two marking approaches share some aspects of assessment, and each takes some additional aspects into account.

Test Validity and Reliability

Validity and reliability are the two elements that together determine how well a test fulfils its original purpose. Test validity is the property by which a test measures what it is supposed to measure (Hughes, 2003). Validity reflects the adequacy and appropriateness of a test when set against its basic intentions.

Reliability is the property by which a test measures consistently what it is trying to measure. Work on reliability identifies the sources of error in measuring language skills, in an attempt to reduce their effects when language tests are developed. The fewer the errors in measurement, the greater the reliability. A test cannot be valid unless it is also reliable.
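As a rough illustration of reliability as consistency, the sketch below compares the marks that hypothetical teachers give to the same five scripts, each out of 25. The teacher names, the scores, and the choice of mean absolute difference as the measure are all assumptions made for illustration; none of them comes from the test discussed in this paper.

```python
from statistics import mean


def mean_absolute_difference(marks_a, marks_b):
    """Average gap between two raters' marks for the same scripts."""
    if len(marks_a) != len(marks_b):
        raise ValueError("both raters must mark the same scripts")
    return mean(abs(a - b) for a, b in zip(marks_a, marks_b))


# Invented marks out of 25 for the same five scripts (illustration only).
teacher_a = [18, 12, 21, 15, 9]
teacher_b = [17, 13, 20, 15, 10]  # marks close to teacher_a
teacher_c = [23, 8, 25, 19, 4]    # marks far from teacher_a

close_pair = mean_absolute_difference(teacher_a, teacher_b)
distant_pair = mean_absolute_difference(teacher_a, teacher_c)

print(f"A vs B: {close_pair:.1f} marks apart on average")
print(f"A vs C: {distant_pair:.1f} marks apart on average")
```

The smaller the average gap between raters, the more consistently the scripts are being measured, which is the sense of reliability used in this section.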

Marking Approach 1

How does Marking Approach 1 score in terms of test validity and reliability? Marking Approach 1 was simple. It laid down four areas as the basis for assessing the students’ work: spelling, punctuation, grammar, control and argument. The teachers were given equal sets of papers and asked to score the work in these areas out of a total of 25, then to assign a percentage to each area to show how well the student had done in it.

At a glance, this is a common practice, easy and straightforward. But an approach like this is useful mainly for a single teacher, who knows her own criteria for marking each area. She knows the standard and the levels of grammar to be assessed, and how to score the varying levels the students present. Thus, if a classroom teacher marks work on a regular basis, this approach would be appropriate, because the marking scheme is fixed in her mind. However, since the test here was distributed among several teachers, it is essential that they have a guide or common scheme in front of them that justifies their ratings, so that the same standard is followed in assessing each of the areas the test was meant to measure. Without a common and more detailed standard of measurement, the approach leaves the test lacking in reliability.

Marking Approach 2

Marking Approach 2 had just what Marking Approach 1 was missing: a marking scheme. The approach measured spelling, grammar, punctuation, vocabulary, cohesion and overall impression, and for each of these areas descriptive scoring criteria were given. For spelling, for example, three or four levels were described, so that a student’s work could be placed among them and marked accordingly. Thus, the results of tests assessed through this approach would be both valid and reliable, as they would be consistent. No script would receive an unreasonably high score because of one teacher’s lenient criteria. Since all of the teachers would be following the same standard and the same pattern, the marking becomes more objective, which makes it all the more reliable and valid.
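The idea of descriptive levels for one area can be sketched as a small lookup. The band boundaries, marks and descriptors below are hypothetical and are not taken from the actual mark scheme, which may well describe its levels in other terms than error counts; the sketch only shows why a shared scheme leads different teachers to the same mark for the same script.

```python
# Hypothetical spelling bands (illustration only):
# (maximum number of errors, marks awarded, descriptor)
SPELLING_BANDS = [
    (2, 5, "very few errors; meaning never obscured"),
    (6, 3, "some errors; meaning occasionally obscured"),
    (12, 1, "frequent errors; meaning often obscured"),
]

# Fallback when a script has more errors than any band allows.
BELOW_ALL_BANDS = (0, "errors so frequent that the text is hard to follow")


def spelling_mark(error_count):
    """Return (marks, descriptor) for a script with the given error count."""
    if error_count < 0:
        raise ValueError("error_count cannot be negative")
    for max_errors, marks, descriptor in SPELLING_BANDS:
        if error_count <= max_errors:
            return marks, descriptor
    return BELOW_ALL_BANDS


# Any teacher who counts 4 spelling errors arrives at the same mark.
marks, descriptor = spelling_mark(4)
print(marks, "-", descriptor)
```

Because the bands are written down, the mark depends on the script and not on which teacher happens to be marking it.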

Test Authenticity

Authenticity applies to tests that involve realistic simulations or criterion samples. An authentic test has positive consequences for teaching and learning, and for this reason such tests carry tacit validity standards. The major requirement of a highly authentic test measurement is that no aspect of the focal construct be left out of the assessment (Bachman, 1996).

Marking Approach 1

The test assignment required the students to produce an authentic account based on real-life experiences or simulations. The question is whether the marking approach was just as authentic. The answer is no. Just as a marking scheme is necessary for the reliability of a test, as argued above, a realistic marking scheme is equally important for the authenticity of its assessment. Thus, Marking Approach 1 lacks authenticity.

Marking Approach 2

Marking Approach 2, on the other hand, shows a high degree of authenticity in terms of test measurement. The evidence lies in the realistic, descriptive scoring criteria laid down for each aspect of the test to be assessed. Each scoring criterion describes situations in the students’ performance, such as where they have made certain mistakes or shown a lack or an improvement of skills, and the score is awarded on that basis. These criteria are developed on real grounds, taken from real test results similar to the one being conducted here. Thus, Marking Approach 2 is highly authentic.

Backwash

Backwash, also known as ‘washback’, is the extent to which a test influences teachers and learners to do things that they would not otherwise necessarily do. The notion of ‘backwash validity’ holds that a test’s validity should be gauged by the degree of positive influence it has on teaching. Backwash acts as feedback that reflects the teaching experience and allows the teacher to assess her own work and aim for improvement (McNamara, 2000).

The test in the present case is a good example of a test of language and writing skills, and its backwash can be judged through the marking approaches. Tests with beneficial backwash are those coupled with criterion samples for assessment. For beneficial backwash, there should be minimal difference between the lessons taught in class and the content assessed in the test (McNamara, 2000).

So how well does each of the marking approaches perform in terms of backwash?

Marking Approach 1

Marking Approach 1 takes into account the important areas of language testing but lacks criterion samples, and so reflects a potential lack of backwash. Although the results would have some useful influence on the teachers, that influence would be minimal compared with Marking Approach 2, whose realistic assessment criteria make it an example of valid, reliable and authentic assessment, all of which a test needs in order to produce beneficial backwash.

Marking Approach 2

Marking Approach 2 provides a detailed assessment of the test, with multiple scores based on authentic and descriptive criteria, and this points all the more to potentially beneficial backwash.


Language testing is a critical procedure that has to take into account the important elements of validity, reliability and authenticity, together with the notion of backwash. The two approaches used to mark the language test taken by 90 students were analyzed on the basis of these elements. And what was the result? While Marking Approach 1 offered a simple and easy way to assess the students’ work, it lacked reliability and authenticity and was likely to produce weak backwash. By contrast, the seemingly difficult and complex Marking Approach 2 is more valid, reliable and authentic, and has the potential to produce strongly beneficial backwash.