Ofsted Part 1 – Page 2
Her Majesty’s Chief Inspector, Amanda Spielman, is reputed to be the first Chief Inspector (since Ofsted’s creation in 1992) to have a strong background in research. In her speech at the Festival of Education in 2017 the Chief Inspector announced an expansion of Ofsted’s research function, while tacitly acknowledging that Ofsted had not previously had any mechanism in place for evaluating whether its judgments were either valid or reliable (let alone accurate, or fair):
“This is why I am expanding Ofsted’s research function. We’re going to be looking at the validity and reliability of our inspections, making sure we look at what really matters in education and that our judgements are consistent and reliable.” 
I can attest that being observed managing the extraordinarily dynamic and interactive process of teaching a class of children, in the knowledge that the judgment that will be passed by the observer may have very significant consequences for one’s own career, the careers of colleagues and the status of the school, can be remarkably stressful. That stress isn’t in any way alleviated by the understanding that inspectors are apparently endowed with powers of observation that the teacher whose lesson is being observed simply does not possess. So the process of lesson observation – and the judgments that flow from it – has long been a highly contentious practice.
In the years before Amanda Spielman took up her post, the Ofsted inspection framework had already come under much scrutiny and criticism in respect of the putative validity and reliability of its judgments. The process of lesson observation, and the weighting afforded to it in judging the overall quality of teaching and of learning in a school, was the core issue.
The 2014 Policy Exchange report ‘Watching the Watchmen’ was explicit:
‘The report also concludes that lesson observations – which take the majority of an inspection in terms of time and money – are neither valid nor reliable in their present form. Although the purpose seems sensible – to validate the quality of teaching in a school and check how young people are learning – the report is unequivocal in concluding that observations in their current format cannot make such a judgement, and that the consequences that flow from the practice of observations – whether it is schools preparing checklists of ‘Outstanding lessons’, conducting mock inspections, or teachers preparing ‘Ofsted lessons’ – are all both nugatory and avoidable.’ (pp. 6-7)
Robert Peal’s study of the same year for Civitas, ‘Playing the Game. The enduring influence of the preferred Ofsted teaching style’, reached a similar conclusion:
‘The judgement of individual inspectors on the quality of teaching within a school is too subjective, too imprecise and too contentious to continue in its current form. The practice of lesson observations has been shown to be an inexact science, with judgements that are both invalid and inconsistent. This is compounded by the negative effect of observations, distracting teachers from honest questions about what will help pupils learn, and towards the more cynical question of ‘what does Ofsted want to see?’’ (p.49)
In 2015, Ofsted conducted a single piece of research relating to the quality of its judgments: a small-scale study of 24 schools, all with the same starting point at the previous inspection (‘Do two inspectors inspecting the same school make consistent decisions?’).
The title is somewhat misleading, in that the study related to consistency between different inspectors’ judgments of the same observed classroom events rather than to their overall judgment about the school. It is therefore a study of reliability, that is, of whether different inspectors reach the same judgment. This is not the same as seeking to establish whether the judgments themselves were valid, that is, whether they measured what they purported to measure. Judgments that are reliable but invalid are clearly of no practical use.
Ofsted’s research study recognised just this point:
‘For instance, the positive findings from this current study will be largely irrelevant if the components of current inspection processes are found to have little association in determining school quality. Furthermore, the absence of strong evidence in the research literature on the validity of inspection suggests this will remain a priority for the sector going forwards.’ (p.41)