
Language assessment literacy and thought experiments - a collection

  • SABELETRAS
  • Mar 4
  • 10 min read

This is a collection of thought experiments to be used in language assessment literacy programs. Thought experiments (TEs) are fictitious narratives that foster problem solving and theoretical and conceptual understanding in many fields, such as philosophy, education, and physics. I recommend that TEs be used in language teacher education and to support the development of language assessment literacy among other language education stakeholders. The texts below were produced by the author of this blog and can be used in language assessment literacy projects with due credit.


Citation: Monteiro, N. P. (2025) Language assessment literacy and thought experiments: a collection. https://www.avaliacaolinguistica.com.br/post/language-assessment-literacy-and-thought-experiments-a-collection



The Forest Language Course: A Thought Experiment on Fairness


In the heart of a lush, green forest, a group of animals eagerly gathered for their language class. The class was taught by Giraffe, who was known for her towering height and excellent view of the entire forest. She had been teaching the animals a new language called Animish, which all the animals were excited to learn.


The students in Giraffe's class were a diverse group: Bee, Zebra, Turtle, Panther, and Butterfly. Each brought unique abilities and perspectives to the class, making the learning experience rich and varied.

The following week, the class was scheduled to take a listening skills test in Animish. Giraffe prepared diligently, creating a test she believed would be fair for all her students. She wanted to ensure that each animal could demonstrate their understanding of Animish.


When the test day arrived, the animals were both nervous and excited. They took their seats, and Giraffe began playing recordings in Animish, asking questions that the animals had to answer.

As the test progressed, it became clear that some animals were struggling more than others. Bee buzzed through the questions with ease, using her quick thinking and agile mind. Zebra, with his keen hearing and sharp focus, managed to keep up. Panther, too, showed great prowess, his speed and alertness helping him respond promptly.


However, Turtle and Butterfly faced difficulties. Turtle, known for his slow and steady pace, found it challenging to process the recordings and formulate answers quickly enough. He needed more time to comprehend each question and respond. Butterfly, delicate and thoughtful, also struggled to keep up with the swift pace of the test. Her gentle nature and slower response time were not suited to the fast rhythm of the assessment.


By the end of the test, Turtle and Butterfly were disheartened. They knew the material well but couldn't demonstrate their knowledge effectively under the time constraints. Giraffe, noticing their distress, realized that the test might not have been as fair as she had intended.

 


Thyssa - Reflections of a Silverfish


Thyssa had been living there for quite some time, nestled among the classic volumes on electricity and magnetism—James Clerk Maxwell’s forgotten works from the distant 1870s. The untouched collection had found its place on one of the last shelves in the old university library. Right next to it were the group study tables where students gathered to decipher challenges across various fields of knowledge. Thyssa listened to it all.


On Mondays, she caught up on the young students’ weekend lives: their outings, romances, family quarrels, and parties. Occasionally, older university students would complain about their married lives. “Such hypocrisy and triviality,” thought Thyssa, more accustomed to the historiography of complex calculations. But what truly irked the little silverfish were the students’ grievances about their academic lives. They incessantly complained about professors, assignment grades, and the difficulty of exams! And Thyssa wondered, “But weren’t they the ones who chose to be here?”


One day, Thyssa observed as the students debated which professor played the most tricks in their exams. “Of course, it’s Professor Alessandra,” asserted Maria. “She always includes questions that demand attention to the tiniest details in the wording.” “For me, it’s Professor Roberto who sets the trickiest questions,” Estevão chimed in. “Betão loves options with ‘not,’ ‘never,’ ‘always.’ If you’re not attentive, you’re done… I get it wrong every time,” he added. “Oh, for me, the most challenging ones are Professor Isadora’s essay questions,” an indignant Teresa contributed. “The texts are extensive. She insists we use them without copying, but you just can’t figure out how to fit them into your answer. The texts don’t align.”


Thyssa watched all this unfold, growing increasingly astounded by the stories. The little silverfish wondered whether those students thought the exams had “tricks” when they answered correctly. “Or perhaps these so-called tricks were just more of the students’ complaints,” she pondered, quickly correcting herself: “I don’t believe in tricks,” Thyssa thought. “To me, these exams were poorly designed.”


She recalled her time residing in a 2002 volume of Applied Measurement in Education, where she had the pleasure of reading the article “A Review of Multiple-Choice Item-Writing Guidelines for Classroom Assessment” by Haladyna, Downing, and Rodriguez. As she listened to the students’ accounts, guideline number 7 echoed in Thyssa’s mind: “Avoid trick items.” Restless in her cellulose cocoon, Thyssa remembered that, according to the authors, tricks involve the intention to deceive students. And Thyssa wondered whether the professors were truly malevolent or simply lacked the skill to craft good questions. “Is it intention, lack of knowledge, or both?”


Suddenly, Thyssa heard the sound of a cart. It was the librarian gathering some books. Thyssa swallowed hard. She hadn’t finished reading the chapter on electrostatic instruments.

 


The Trap

Célia, the English language teacher, prepared her semester exams to determine her students’ grades. Célia works at a school where the administrators believe in the power of traditional exams, requiring that 80% of the total score come from these exams. The remaining 20% is left to the teachers’ discretion. Célia decided to incorporate written text revisions and peer assessment into her evaluation process. At the end of the semester, each student selects four written productions, justifies their choices, and submits them for the teacher’s evaluation.


In a very different situation, her friend Marta works at a school where the administrators are averse to using traditional exams. They emphasize the importance of formative assessment, which tracks students’ progress throughout the academic term and values participation. Consequently, they have prohibited the use of exams but granted teachers the freedom to choose their assessment methods. The only requirement is that a record of the assessment process be kept in the school’s documents at the end of the term.


In their respective contexts, each teacher makes her own choices. Célia meticulously analyzes official documents, the pedagogical project, the textbook, extra activities, and her lesson plans. She ensures that her exams align with the content and skills taught in class and that the exam tasks resemble those encountered in the daily classroom routine. Célia drafts an assessment plan with clear definitions of what each question should cover, its format, and how the content, skills, and formats were addressed in routine activities throughout the semester. Based on this plan, she designs her exams and then combines the results with the points from the written text assessments.


Marta, on the other hand, doesn’t need to create exams but must submit a document with students’ grades to the school administration. Throughout the semester, she has already designed activities for her students and recorded their scores. Marta organized four activities for each of the five units in the textbook. Each activity is worth half a point, so the 20 activities add up to a total of 10 points. Each activity covers new content. Students who didn’t submit an activity or answered it incorrectly received a zero for it, while those who completed it correctly earned the agreed-upon half point. Their final grades are the sum of the points for the activities completed correctly.


Now, over coffee, the friends discuss their assessment approaches. Marta believes the era of exam tyranny has passed, while Célia asserts that well-designed exams can be one of several assessment tools. The dialogue continues until the friends realize the time and need to head home to prepare their lesson plans for the next morning.
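
The arithmetic behind the two grading schemes in this story can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the sketch for Célia assumes that both her exams and her written text assessments are scored on a 0 to 10 scale, which the story does not specify.

```python
# Marta's scheme as described in the story:
# 5 textbook units x 4 activities each, every activity worth half a point.
UNITS = 5
ACTIVITIES_PER_UNIT = 4
POINTS_PER_ACTIVITY = 0.5


def marta_final_grade(correct_activities: int) -> float:
    """Final grade = half a point per activity submitted and answered correctly."""
    total_activities = UNITS * ACTIVITIES_PER_UNIT  # 20 activities in all
    if not 0 <= correct_activities <= total_activities:
        raise ValueError("number of correct activities must be between 0 and 20")
    return correct_activities * POINTS_PER_ACTIVITY


def celia_final_grade(exam_score: float, writing_score: float) -> float:
    """80% from exams, 20% from written text assessments.

    Assumption (not stated in the story): both scores are on a 0-10 scale.
    """
    return 0.8 * exam_score + 0.2 * writing_score


if __name__ == "__main__":
    print(marta_final_grade(20))                  # all 20 activities correct
    print(marta_final_grade(15))                  # 15 of 20 activities correct
    print(round(celia_final_grade(7.0, 9.0), 2))  # hypothetical scores
```

Laying the two schemes side by side may help when discussing what each final grade does and does not reveal about what a student has learned.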


 

A Teacher’s Dilemma

Every year, Ângela, the English teacher, creates new questions for student assessments. As she develops these questions, she stores them in a folder on her computer, effectively creating a small question bank. Whenever needed, Ângela selects questions from this bank to construct her exams and assignments. Some of these questions have never been used, and now Ângela decides to give them a chance. She designs various activities for both formative and summative assessment using these questions.


However, Ângela begins to notice that some of her top-performing students are getting these questions wrong. She also observes that students are increasingly complaining about the exams. They claim they can’t quite grasp what is expected of them with these particular questions. Astonished, Ângela then decides to examine the questions and students’ answers, only to discover significant variation in the number of correct responses. There appears to be no consistent pattern in how students answer these questions, which makes it difficult to understand why certain students choose specific answers.


Since you are colleagues at the same school, Ângela turns to you for your opinion on what might be causing this situation. How would you respond to her? Let’s imagine the potential problem with Ângela’s questions and explore some possible solutions.
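
One possible first step, among several, would be a simple item analysis of the students' answers. The sketch below is only an illustration: the response data are invented, the function name `item_statistics` is hypothetical, and the upper-lower discrimination index is one common choice rather than anything the story prescribes. It computes, for each question, the proportion of correct answers and the difference between the proportion correct among the highest-scoring and the lowest-scoring thirds of the class.

```python
def item_statistics(responses):
    """Return a (facility, discrimination) pair for each item.

    responses: one list per student, holding 0/1 scores for each item.
    facility: proportion of all students who answered the item correctly.
    discrimination: proportion correct in the top third (by total score)
    minus proportion correct in the bottom third.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]

    # Rank students by total score, highest first.
    order = sorted(range(n_students), key=lambda i: totals[i], reverse=True)
    group = max(1, n_students // 3)
    upper, lower = order[:group], order[-group:]

    stats = []
    for j in range(n_items):
        facility = sum(row[j] for row in responses) / n_students
        p_upper = sum(responses[i][j] for i in upper) / group
        p_lower = sum(responses[i][j] for i in lower) / group
        stats.append((facility, p_upper - p_lower))
    return stats


# Invented data: six students (rows), four questions (columns).
RESPONSES = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]

if __name__ == "__main__":
    for number, (facility, discrimination) in enumerate(item_statistics(RESPONSES), 1):
        flag = "  <- review this question" if discrimination <= 0 else ""
        print(f"Q{number}: facility={facility:.2f}, discrimination={discrimination:+.2f}{flag}")
```

In this invented data set, the last question is answered correctly only by the lower-scoring students, which gives it a negative discrimination value. A pattern like that would suggest looking closely at the wording, the answer key, and the options of the question before drawing conclusions about the students.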

 


The Rater

Seated in the teachers’ lounge, engaged in lively conversation with colleagues, Demétrio is suddenly approached by Pedro, one of his students, in a state of desperation: “Mr. Demétrio, I failed your subject. Can you give me half a point?” “You had the entire semester to prepare and catch up. Now you’re asking for points? Not happening,” the resolute teacher replied, returning to his seat. “This is how it works with me,” Demétrio told Antônio, a fellow teacher. “No room for students to come begging at the end of the semester.” Antônio swallowed hard and remained silent, knowing he tended to be much more lenient. The previous week, Marina had asked him for a chance to redo an assignment and improve her grade. Antônio agreed. He reasoned that it might help the student finally grasp the material. Besides, he knew she had faced some challenges. “Just one point. Of course, I’ll evaluate it. I won’t give it away. I’m just giving her an opportunity. Losing a grandmother isn’t easy,” thought the insecure Antônio, as if he needed to justify his actions to himself.


Now, standing before him, was Demétrio—a staunch believer in the concept of merit taken to its extreme. Demétrio had no doubts or pangs of conscience about his conduct. On the contrary: “I’m promoting learning and responsibility among these young people. I don’t assign grades; the students do,” he declared to his colleagues in the lounge, who were more interested in drinking coffee and trivial discussions at this point in the semester. But Demétrio was unaware that Pedro had also faced challenges. His father had nearly died in a car accident during the exam week. Pedro’s mind simply wasn’t focused on studying. Yet there was something else escaping the awareness of Demétrio, Pedro, and all the students and teachers: in question 06, a half-point essay question, Demétrio’s grading lacked consistency. While he had been fair in correcting answers on Saturday morning, the exhausted Demétrio failed to maintain the same criteria late Sunday night when grading the last batch of exams.

 


A Nice Start

In a school, a team of teachers decides to create a series of mock exams to assess the students. The teachers gather and establish guidelines for question development and grading. They also create a calendar specifying when all students in the school will take these mock exams throughout the year. They agree that each mock exam should consist of 40 questions covering all subjects. Other teachers join forces to create and grade the questions, and some colleagues are chosen as assessment coordinators. Their task is to compile the questions and finalize the exam format. The mock exams will be administered every two months during a class immediately after the break.


Excitedly, the teachers present their project to the school administration and receive authorization to proceed. However, the administration sets some conditions: the team can use only four sheets of A4 paper per student for printing the exams, and there will be no additional expenses for any reason. The administration expects the team to provide data on student performance two weeks after each mock exam.


As if that weren’t enough, the team begins to notice that other colleagues are not as enthusiastic. Some teachers refuse to participate in question development or grading. A few even complain that the project disrupts their lesson planning, especially since the mock exam dates coincide with their regular classes. Other teachers fear that assessment results could be used against them. Almost no one wants to be an assessment coordinator. The students are also dissatisfied—they argue that they already take too many tests and see no point in yet another evaluation. Some parents question why their children are coming home earlier from school after assessments.


After eight months, the team decides to terminate the project, citing unfavorable conditions for its continuation.

 


The New Vision

In Nihiland, the Department of Educational Norms and Curricula initiated a local reform with the goal of updating education to align with the “vision of the new era.” All educational legislation and resulting regulations were revised to incorporate this brand-new perspective. Teachers received additional training, and textbooks were rewritten to achieve this alignment. However, the deeply entrenched pedagogical culture within institutions and the mindset of teachers proved resistant to the efforts of educational policymakers. In the classroom environment, teachers continued to employ the same practices and cover the same content.


After extensive discussions and consultations, the department made an unprecedented decision: to develop a comprehensive external assessment program. All schools would be required to conduct these assessments to remain operational, and each educational level would have its corresponding evaluation. The assessments would be fully aligned with the official curriculum and objectives. Students could progress to the next levels only if they passed these evaluations. Teachers would be hired based on their mastery of the curriculum. All textbooks would be scrutinized for alignment with the curriculum. Admission to universities and technical courses, and even certain job placements, would depend on students’ performance in the new curriculum.


Gradually, the anticipated changes took root. Universities began training teachers with the new mindset. Educational administrators closely monitored student performance in assessments, identifying teachers who deviated from the new guidelines—these teachers would undergo further training. Students and parents demanded increasingly focused and intensive teaching from schools, ensuring the future of the youth. The media championed the advantages of this new education. Educational studies and research embraced the new trends. Researchers highlighted various aspects of the new curriculum. Official statistics began demonstrating the progress of this novel approach to education. Within a few years, the entire community had embraced the vision of the new era, with few remembering the old times.

 


A lens to the mind

Marcos designed a reading test to assess his students’ abilities in comprehension, text interpretation, and recognition of the features and the communicative functions of discourse genres. Focused on their tasks, approximately 25 students delve into the twenty questions and five texts on a Tuesday afternoon. Meanwhile, across the street, Hans Baar, a physicist and neurocognitive scientist, tests his latest invention from his laboratory—a mental telescope capable of detecting people’s mental processes using its powerful upsilon rays, even from a distance of several hundred meters.

 

As he aims his equipment at Joana precisely when she is answering question number 08, he follows her reasoning up to the moment she selects an answer option, but he cannot see which option she marks. Shifting the telescope two meters to the right, the scientist then observes Júlio’s thoughts as that student responds to the same question. Without knowing which option Júlio marked either, Baar assumes that the two students chose different options, since their reasoning was different. However, it turns out that both students selected the same option. In turn, Marcos, the teacher, notes while grading the test that the two students answered the question in the same way and concludes that they therefore used the same mental processes.

 

 

 

 

 

 

 

 

 

 

 
 
 
