England's Exam Fiasco Shows How Not To Apply Algorithms To Complex Problems With Massive Social Impact

from the let-that-be-a-lesson-to-you-all dept

The disruption caused by COVID-19 has touched most aspects of daily life. Education is obviously no exception, as the heated debates about whether students should return to school demonstrate. But another tricky issue is how school exams should be conducted. Back in May, Techdirt wrote about one approach: online testing, which brings with it its own challenges. Where online testing is not an option, other ways of evaluating students at key points in their educational career need to be found. In the UK, the key test is the GCE Advanced level, or A-level for short, taken in the year when students turn 18. Its grades are crucially important because they form the basis on which most university places are awarded in the UK.

Since it was not possible to hold the exams as usual, and online testing was not an option either, England’s exams regulator, Ofqual, turned to technology. It came up with an algorithm to predict each student’s grades. The results of this high-tech approach have just been announced in England (other parts of the UK run their exams independently). It has not gone well. Large numbers of students have seen the grades their teachers predicted downgraded, sometimes substantially. An analysis from one of the main UK educational associations has found that the downgrading is systematic: “the grades awarded to students this year were lower in all 41 subjects than they were for the average of the previous three years.”

Even worse, the students hit hardest by the downgrading were those in poorly performing schools, typically in socially deprived areas, while students at schools that have historically done well, often in affluent areas or privately funded, saw their grades improve over teachers’ predictions. In other words, the algorithm perpetuates inequality, making it harder for brilliant students in poor schools or from deprived backgrounds to go to top universities. A detailed mathematical analysis by Tom SF Haines explains how this fiasco came about:

Let’s start with the model used by Ofqual to predict grades (p85 onwards of their 319 page report). Each school submits a list of their students from worst student to best student (it included teacher suggested grades, but they threw those away for larger cohorts). Ofqual then takes the distribution of grades from the previous year, applies a little magic to update them for 2020, and just assigns the students to the grades in rank order. If Ofqual predicts that 40% of the school is getting an A [the top grade] then that’s exactly what happens, irrespective of what the teachers thought they were going to get. If Ofqual predicts that 3 students are going to get a U [the bottom grade] then you better hope you’re not one of the three lowest rated students.
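Haines’ description of the large-cohort case boils down to a few lines of logic. The Python sketch below is a minimal illustration under stated assumptions, not Ofqual’s code: the function name `assign_by_rank` and the grade shares are hypothetical, the shares are taken as given (the "little magic" that updates historical distributions for 2020 is not modelled), and the special handling of small cohorts is left out. What it shows is that a student’s grade depends only on their rank and the predicted shares; the teacher’s suggested grade never enters the calculation.

```python
def assign_by_rank(ranked_students, grade_shares):
    """Hand out grades strictly in rank order (hypothetical sketch, not Ofqual's code).

    ranked_students: student names ordered from best to worst, as a school submits them.
    grade_shares: (grade, share) pairs, best grade first, giving the predicted fraction
        of the cohort at each grade. These stand in for the adjusted historical
        distribution and are assumed inputs here.
    """
    n = len(ranked_students)
    results = {}
    cumulative = 0.0
    start = 0
    for grade, share in grade_shares:
        cumulative += share
        # Everyone ranked between the previous boundary and this one gets `grade`.
        end = round(cumulative * n)
        for name in ranked_students[start:end]:
            results[name] = grade
        start = end
    # Rounding can leave the last few students unassigned; give them the lowest grade listed.
    for name in ranked_students[start:]:
        results[name] = grade_shares[-1][0]
    return results


# Invented example: a class of ten, ranked best to worst.
ranking = ["Ana", "Ben", "Cat", "Dev", "Eli", "Fay", "Gus", "Hal", "Ivy", "Jon"]
shares = [("A", 0.4), ("B", 0.3), ("C", 0.2), ("U", 0.1)]
grades = assign_by_rank(ranking, shares)
print(grades["Dev"], grades["Eli"], grades["Jon"])  # A B U
```

With these made-up shares, the student ranked last gets a U whatever their teacher predicted, which is exactly the outcome Haines warns about.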

As this makes clear, the inflexibility of the approach guarantees many cases of injustice, where bright and hard-working students are given poor grades simply because they were lower down in the class ranking, or because their school did badly the previous year. Twitter and UK newspapers are currently full of stories of young people whose hopes have been dashed by this effect: they have lost the university places they had been offered because of these poorer-than-expected grades. The problem is so serious, and the anger expressed by parents of all political affiliations so palpable, that the UK government has been forced to scrap Ofqual’s algorithmic approach completely and will now use the teachers’ predicted grades in England. Exactly the same happened in Scotland, which also applied a flawed algorithm, causing similarly huge anguish to thousands of students before it dropped the idea.

The idea of writing algorithms to solve this complex problem is not necessarily wrong. Other solutions, like using grades predicted by teachers, have their own issues, including bias and grade inflation. The problems in England arose because people did not think through the real-life consequences of the algorithm’s abstract rules for individual students, even though they were warned of the model’s flaws. Haines offers some useful, practical advice on how it should have been done:

The problem is with management: they should have asked for help. Faced with a problem this complex and this important they needed to bring in external checkers. They needed to publish the approach months ago, so it could be widely read and mistakes found. While the fact they published the algorithm at all is to be commended (if possibly a legal requirement due to the GDPR right to an explanation), they didn’t go anywhere near far enough. Publishing their implementations of the models used would have allowed even greater scrutiny, including bug hunting.

As Haines points out, last year the UK’s Alan Turing Institute published an excellent guide to implementing and using AI ethically and safely (pdf). At its heart lie the FAST Track Principles: fairness, accountability, sustainability and transparency. The fact that Ofqual evidently didn’t think to apply them to its exam algorithm means it only gets a U grade for its work on this problem. Must try harder.

Follow me @glynmoody on Twitter, Diaspora, or Mastodon.



Comments on “England's Exam Fiasco Shows How Not To Apply Algorithms To Complex Problems With Massive Social Impact”

31 Comments
Annonymouse says:

Re: Re: The bell curve claims another victim

I taught college-level continuing-education night classes for over a decade, and when I started as an instructor I worked with and was mentored by my former instructors and professors. The thing is, a homogeneous class usually does have a classic bell distribution, but that’s the crux of the matter: classes are not always homogeneous. This is most obvious in night school, where you have three distinct student types: adult education students, daytime students needing a flexible schedule, and those who failed the previous semester.
I almost always had a camel-humped (multimodal) distribution at the start of a semester, and only through diligence on my part and encouraging peer group study was I able to pull the curve together and push it up to my satisfaction.
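The commenter’s point about mixed classes can be illustrated numerically. The Python sketch below models a class as a mixture of normal distributions, one per student group, and finds the peaks of the combined distribution. All group shares, mean marks and spreads are invented for illustration, as is which group sits at which mark; the point is only that three well-separated groups produce three humps rather than a single bell.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))


def mixture_density(x, groups):
    """groups: (weight, mean, sd) triples; weights are each group's share of the class."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in groups)


def find_modes(groups, lo=0, hi=100):
    """Integer marks in [lo, hi] where the mixture density has a strict local maximum."""
    xs = list(range(lo, hi + 1))
    dens = [mixture_density(x, groups) for x in xs]
    return [xs[i] for i in range(1, len(xs) - 1)
            if dens[i] > dens[i - 1] and dens[i] > dens[i + 1]]


# Hypothetical night class: three student groups, one third each, with invented marks.
night_class = [(1 / 3, 45, 6), (1 / 3, 65, 6), (1 / 3, 85, 6)]
# Hypothetical homogeneous class: a single group.
homogeneous_class = [(1.0, 65, 10)]

print(find_modes(night_class))        # [45, 65, 85]
print(find_modes(homogeneous_class))  # [65]
```

The humps only stay separate when the groups’ typical marks differ by more than about twice their spread; closer groups blur back into something bell-like.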

Martin Thomas says:

Exam regulator rejected expert help

The Royal Statistical Society offered assistance and nominated two professors of statistics to help design a decent algorithm. But the professors withdrew when asked to sign a non-disclosure agreement that would have constrained them for five years.

"We get the point of non-disclosure agreements: you don’t want someone offering a running commentary while decisions are being made," said Sharon Witherspoon of the Royal Statistical Society. "But constraining independent academic experts from saying, ‘Well, looking at the data, I saw it was clear this would have this effect,’ didn’t fit our principles of transparency."
https://news.sky.com/story/a-levels-exam-regulator-ignored-expert-help-after-statisticians-wouldnt-sign-non-disclosure-agreements-12049289

PaulT says:

Re: Exam regulator rejected expert help

"We get the point of non-disclosure agreements: you don’t want someone offering a running commentary while decisions are being made,"

Well, sometimes you do. It seems this project failed at the design phase, so you probably should be consulting people who can spot fundamental design issues before you go live, especially with something this important. A-Levels are stressful at the best of times. I can’t imagine what it must be like for students who’ve worked their asses off during a pandemic to improve their grades enough to get into their preferred university, only for an algorithm to tell them they worked for nothing.

"But constraining independent academic experts from saying, ‘Well, looking at the data, I saw it was clear this would have this effect,’ didn’t fit our principles of transparency."

In other words, an open source methodology was required but they decided to create a proprietary solution.

Scary Devil Monastery says:

Re: Re: Exam regulator rejected expert help

"It seems this project failed at the design phase, so you probably should be consulting with people who are able to spot fundamental design issues before you go live."

There’s an extra helping of irony in that the people unable to seek expert knowledge from outside are also the people who in this case are tasked to teach those who will be tomorrow’s scientists.

"In other words, an open source methodology was required but they decided to create a proprietary solution."

I can somehow envision a table of stodgy deans straight out of Oxford going "Let the rabble – probably not even upper seconds – scrutinize and comment on our teaching methods? The Nerve!"

Pseudonymous Coward says:

Re: Re: Re: Exam regulator rejected expert help

There’s an extra helping of irony in that the people unable to seek expert knowledge from outside are also the people who in this case are tasked to teach those who will be tomorrow’s scientists.

It might reassure (or perhaps disappoint) you to know that no-one at Ofqual has even the slightest responsibility for teaching anyone!

The UK qualifications system has plenty of quirks, one of which is that universities have no real involvement in day-to-day running of the public exams they largely rely on for admissions.

Universities do have input to the curriculum students are taught (how much depends largely on central Government policy).

But the actual job of delivering qualifications is normally performed by "awarding organisations" – a mix of for-profit and non-profit bodies.

Ofqual is the regulatory body charged with making sure those awarding organisations do a good job of running their qualifications.

This year, the pandemic thoroughly disrupted normal assessments. That meant more central coordination than normal was needed, and we were asked/instructed to provide it.

But definitely no tables of stodgy deans (Oxford or otherwise), I’m afraid!

Anonymous says:

Re: Re: Re: Exam regulator rejected expert help

More accurately, one shouldn’t be writing such algorithms at all. The whole point of the tests should be to determine the student’s demonstrable knowledge of a given subject, not to make bureaucrats happy by assigning random percentages to random numbers on a list so they can continue passing their annual judgement on people. Such algorithms are completely baseless, as they are devoid of reliable and recent data. Their only purpose is political grandstanding.

Pseudonymous Coward says:

Re: Re: Re: Re: Exam regulator rejected expert help

People who don’t know their subject shouldn’t be writing code and algorithms to measure it.

The point of the algorithm wasn’t to measure subject knowledge. It was an honest (albeit flawed) attempt to bring some measure of consistency/plausibility to teachers’ judgements about students’ subject knowledge.

The decision to use teacher judgements was not ours – although I’m not sure there was a reasonable alternative.

And the system to support teacher judgements was assembled at speed; it would have been designed very differently if we’d had enough time to do so.

Most importantly, a lot of work to bring consistency to teachers’ judgements could and would have happened in advance. We could have taken steps to ensure the evidence used to inform judgements was the same across schools, and to train teachers to make good, consistent judgements.

The whole point of the tests should be to determine the student’s demonstrable knowledge of a given subject.

I don’t disagree. And that’s exactly what happens in a normal year. Students take the tests, and how they perform in them determines their outcomes.

But this year, nobody took the tests.

And because the UK higher education system relies so heavily on the results of these qualifications, we couldn’t say "you didn’t take the tests, so you get nothing". That would have been spectacularly unfair.

Whatever we did was bound to be worse than the normal approach of having students take the tests. That’s why taking the tests is the normal approach.

Did we make the best of a bad situation? No. But I think it’s only fair to acknowledge that the best approach (everyone taking the tests) just wasn’t available.

Anonymous says:

Re: Exam regulator rejected expert help

"We get the point of non-disclosure agreements: you don’t want someone offering a running commentary while decisions are being made,"

Which is why so many management-led disasters occur: it greatly reduces the chance of problems being spotted, or of mistaken assumptions being corrected by someone who understands what the managers are trying to do.

Scary Devil Monastery says:

Well, nothing like a pandemic to bring old news to life.

…because apparently, after half a century’s worth of universal complaints and tons of data assembled and analysed about the state of education, we have come to the same conclusion again: "Get more teachers!"

It bothers me to no end that, no matter the country, the education that is supposed to provide the foundation for the future always gets the first taste of the budget axe.

Or, as can be observed in the OP, the evaluation of a youth’s education gets handed to a complex template deployed by a computer algorithm. And to really guarantee failure, they used a model known to be flawed and kept the project in a closed workshop with no real expert audit.

What could possibly go wrong?

Annonymouse says:

Re: Well, nothing like a pandemic to bring old news to life.

Sadly, it’s not always true that education gets the axe.

Ontario is a prime example of ballooning budgets, shrinking class sizes and lowered expectations.
Throwing more money and more teachers at the problem never fixed anything; quite the contrary.
It only enriches the teachers and especially their unions.

Teaching is a profession, so treat teachers as such.
Pay them well, BUT keep their feet to the fire when it comes to skills and knowledge, with regular relicensing every few years.

Personally, I think the union bosses should all be tossed in a hole in Sudbury breaking big rocks into small rocks. None of them are teachers, so they are completely useless to actual education or the kids.

Here we have grade-school teachers who can’t pass a fifth-grade math test but have jobs for life.

Pseudonymous Coward says:

A view from the inside

Full disclosure: I work for Ofqual. I had no involvement in the development of the algorithm, but obviously know those who did. These are my personal views, not any sort of official response.

A few thoughts:

  1. Something that brought an extra dimension of difficulty here was our statutory duty to maintain standards over time. That means as well as worrying about fairness for this year’s students we also had to consider fairness over time. We were quite literally duty-bound to try not to treat this year’s students any more or less favourably than previous or future students.

I can say with absolute certainty that every decision we took was a genuine effort to maximise those twin aspects of fairness. And that there simply was no solution that was perfectly fair to all past, current and future students.

That’s not an excuse. I agree that some of the outcomes here weren’t right. But I think it’s important to understand what we were trying to achieve.

  2. I don’t think it’s accurate to suggest that students in affluent areas saw their grades systematically improve over what their teachers predicted. Very few students were "upgraded" by the algorithm.
  3. As the OP alludes to, one of the reasons for this is that teacher predictions were demonstrably over-optimistic. They also varied widely in their degree of optimism. Both are understandable and largely unavoidable, particularly since there wasn’t time to develop a system to help bring consistency to teacher judgements. But there were some predictions that really strained credibility – we saw instances of schools giving every student in some subjects top grades.

And that lack of consistency in judgement means relying solely on teacher predictions is inevitably unfair on those students whose teachers happened to be at the less-optimistic end of the scale.

It also means it’s not entirely meaningful/fair to measure the success/failure of the algorithm in terms of how students’ outcomes differed from their teacher’s predictions. In at least some cases, those differences will reflect error in teacher predictions more than any error on the part of the algorithm.

  4. Should we have released the model earlier to allow for more external scrutiny? With the benefit of hindsight, certainly. But one of the things we had to worry about was the possibility that releasing it could affect how some (less scrupulous) schools went about making their predictions.
  5. Another difficulty with a fully open-source model here is the nature of the data: highly sensitive personal information about every young person in the country. Not something we can legally make available to the public.
  6. Educational inequality is very real. In any normal year, a student’s exam results reflect a range of factors beyond their raw ability and potential. That’s a real problem, and not one that can or should be solved by the qualifications students take.

So, yes, it’s predictable that an algorithm designed to replicate the results of those qualifications in an extreme situation ended up reflecting those inequalities back. But could it ever have done anything else?

MathFox says:

Re: A view from the inside

Hi, thanks for your insightful post.

A view from the Netherlands: We’ve had our problems with the final exams too. Luckily we start the examination process early and the "school exams" that usually determine 50% of the exam results were (almost) done when schools closed. After some deliberation it was decided to cancel the "central written final exam" and grant students their diploma based on the results of the "school exams".

On the other hand, I understand the demand for consistent exam results over time; that’s also one of the Dutch concerns. However, the quality of education has improved over the decades and especially students from less privileged backgrounds do better than they used to do, which caused the average level in exam candidates to rise. AFAIK grading in Dutch exams allows for this gradual rise in level. Does Ofqual acknowledge such a rise and how does it integrate that in its standards?

Pseudonymous Coward says:

Re: Re: A view from the inside

However, the quality of education has improved over the decades and especially students from less privileged backgrounds do better than they used to do, which caused the average level in exam candidates to rise. AFAIK grading in Dutch exams allows for this gradual rise in level. Does Ofqual acknowledge such a rise and how does it integrate that in its standards?

We do.

There are two main ways we take account of this in a normal year.

  1. We look at students’ performance in previous public exams or national standardised tests. This is mainly designed to pick up the natural year-on-year variation in the ability profile of students as a whole, but also allows us to get some handle on improvement over time.
  2. More recently we’ve introduced a National Reference Test which is taken by a large, representative, sample of students each year. This deliberately uses very similar questions each year so provides more objective evidence of improvements over time.
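
The first approach, using prior attainment to anticipate a cohort’s results, can be sketched roughly as follows. This is a simplified illustration, not Ofqual’s actual model: the two bands, the historical grade probabilities and the function name `expected_grade_shares` are all hypothetical, and the National Reference Test adjustment is not modelled. The idea is that if this year’s students did less well in earlier national tests than previous cohorts did, the expected share of top grades falls accordingly, and vice versa.

```python
def expected_grade_shares(band_counts, grade_given_band):
    """Predict a cohort's grade distribution from its prior-attainment profile.

    band_counts: {band: number of this year's students in that prior-attainment band}
    grade_given_band: {band: {grade: historical probability of that grade}}
    Returns {grade: expected share of this year's cohort}.
    """
    total = sum(band_counts.values())
    expected = {}
    for band, count in band_counts.items():
        weight = count / total  # this band's share of the current cohort
        for grade, prob in grade_given_band[band].items():
            expected[grade] = expected.get(grade, 0.0) + weight * prob
    return expected


# Invented history: strong prior attainers mostly got A, weaker ones mostly B.
history = {"high": {"A": 0.6, "B": 0.4}, "low": {"A": 0.1, "B": 0.9}}

# A cohort with fewer strong prior attainers than a 50/50 split.
this_year = {"high": 30, "low": 70}
print(expected_grade_shares(this_year, history))
```

Here the expected share of A grades comes out at about 25%, below the 35% that a 50/50 cohort would get under the same invented history.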

Anonymous Anonymous Coward says:

Re: A view from the inside

Isn’t there a bit of a problem in ‘predicting’ how a student would do? Isn’t education about acquiring knowledge and, to some degree, skill? Saying that a student has worked hard and has potential isn’t the same thing as saying they have the knowledge/skill.

Down the road, in future courses (or jobs/careers/professions) that presume a certain level of knowledge/skill that isn’t actually there, problems could arise and remedial action would then become necessary. I am not suggesting that students be held back, but going forward, some consideration of what is actually happening is necessary, along with a plan to help students catch up on things they missed for reasons beyond their control.

Pseudonymous Coward says:

Re: Re: A view from the inside

Isn’t there a bit of a problem in ‘predicting’ how a student would do? Isn’t education about acquiring knowledge and, to some degree, skill? Saying that a student has worked hard and has potential isn’t the same thing as saying they have the knowledge/skill.

Absolutely. That’s why in more normal times we have written exams, and other forms of structured assessment for more practical skills.

But thanks to the pandemic, almost none of that was possible. All assessments were cancelled as a public health measure.

But because University admissions in the UK are based almost entirely on the results of these qualifications (a problem in and of itself, but I’ll save that rant for another day), we couldn’t just postpone everything until it was possible again.

This particular policy choice – leaving an entire year of students with nothing, or finding a way to estimate/predict how they would have performed had assessments taken place – was the easy one.

Where it got much harder – and went wrong – was when it came to figuring out how to do that estimating.

Bloof says:

It was a fiasco by design. For twenty years now, kids getting better exam results each year has been a major bugbear of the right-wing press in the UK: more kids getting into better universities on merit (Oxbridge excluded) means those from wealthier backgrounds actually have to work and compete for places. Making university more expensive hasn’t worked as well as they’d hoped, so when the pandemic rolled around, they saw a chance to kneecap poorer students and have a convenient tech scapegoat for doing so… but they didn’t expect the pushback to be nearly as universal.

Anonymous says:

Re: Re:

In a nation of millions, all answers may be correct to one degree or another. The conspiracy may not be as wide as thought, but can you say that nobody subscribes to it? Likewise incompetence, conflicting goals, and all the rest.

Somewhere out there, someone is blaming the test failure on homeopathic lager brewed on the other side of the flat earth.

Anonymous says:

Well, teachers’ predictions from every school should be made public along with last year’s exam results.
It is impossible for all students in a school to get top grades; every school has lazy students, good students, great students and average students, spread along the curve.
Also, poor students may not have laptops, broadband or a quiet room to study in. Not many poor students get to Harvard.
As with open-source security, any exam grading system must be open to public scrutiny.
Asking for non-disclosure agreements about a public exam grading system is pointless and unfair.
Like the law, it must be fair and be seen to be fair.
But the UK government is prone to secrecy even when it’s not needed.
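A check of the kind this comment has in mind, comparing each school’s predictions with its own recent results, might look like the Python sketch below. The points scale, the one-point threshold and the function names are hypothetical choices for illustration, not any official tariff or Ofqual procedure. Note its limitation: a gap between predictions and last year’s results cannot by itself distinguish an over-optimistic teacher from a genuinely stronger cohort.

```python
# Hypothetical points scale for illustration only (not an official tariff).
GRADE_POINTS = {"A*": 6, "A": 5, "B": 4, "C": 3, "D": 2, "E": 1, "U": 0}


def mean_points(grades):
    """Average points for a list of letter grades."""
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)


def flag_optimistic_schools(schools, threshold=1.0):
    """Return schools whose predicted grades beat their own recent results
    by more than `threshold` points per student on average.

    schools: {name: (predicted_grades, last_year_grades)}
    """
    flagged = []
    for name, (predicted, last_year) in schools.items():
        gap = mean_points(predicted) - mean_points(last_year)
        if gap > threshold:
            flagged.append(name)
    return flagged


# Invented schools: X predicts a top grade for everyone, Y is close to its history.
schools = {
    "School X": (["A*", "A*", "A*", "A*"], ["A", "B", "B", "C"]),
    "School Y": (["A", "B", "B", "C"], ["A", "B", "C", "C"]),
}
print(flag_optimistic_schools(schools))  # ['School X']
```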

Pseudonymous Coward says:

Re:

Well, teachers’ predictions from every school should be made public along with last year’s exam results

In an ideal world, I absolutely agree. Historic exam results are already public, but the challenges here are largely with the teacher predictions:

  • Teacher predictions didn’t exist until comparatively late in the process – because they needed time to come up with them. A proper open-source approach would have required them much earlier than was possible.
  • Teachers themselves were concerned about the possibility of predictions being made public. Many of them were uncomfortable enough making the predictions at all, so we felt the need to offer them a degree of protection from external pressure.

Did we get it right? No. But I’m genuinely unsure whether a more open-source model was deliverable in the time we had.

Max says:

Bullshit!

This is nothing short of "precrime" and therefore an utterly unacceptable abomination. NOBODY should EVER suffer the consequences of what they would be "likely" to do. ONLY those of what they actually DID DO. After they did it.

If I ever needed any proof that we are headed straight to hell in a handbasket at relativistic speeds, the very concept that anyone could entertain this idea with a straight face for any length of time has to be it.

The decision between coasting along towards whatever I am likely to get and making a redoubled effort to come in well higher on any exam should solely and exclusively be MINE, not up to a piece of retarded code whose creators think they have it all figured out.

Pseudonymous Coward says:

Re: Bullshit!

This is nothing short of "precrime" and therefore an utterly unacceptable abomination. NOBODY should EVER suffer the consequences of what they would be "likely" to do. ONLY those of what they actually DID DO. After they did it.

We’d have loved to do that, but unfortunately the pandemic had other ideas.

Thanks to that, none of this year’s students had an opportunity to actually DO anything, and waiting until they’d had one wasn’t an option.

