Using software to detect anomalies

By Peter Wanyonyi

March to April is a nervous time in Kenya, and this has nothing to do with our geographical vulnerability to the weather events (floods and droughts at opposite ends of the country) that occur at about this time. This is when high school students who sat their examinations the previous year receive their results. It is a bitter-sweet time: Kenya is, like many developing countries, a brutally competitive crucible, one in which it is all too easy for people to fall through the cracks in the fabric of society, particularly if they are not well-connected. The Kenya Certificate of Secondary Education, KCSE, is therefore an absolute separator for the vast majority of children in the country. One’s KCSE results determine which university one will join, which courses one will pursue, and whether one will receive a government scholarship. It is a do-or-die affair.

Because Kenya is a poor, badly-run country, KCSE invariably invites cheating. The 2015 KCSE examination had perhaps the most extensive cheating ever. The situation ended up costing the then Education minister his job, with a new minister brought in to try to get a handle on things. As we can see from the recently released 2015 KCSE results, however, the new minister has not made much of a difference. The statistics emerging from the results are staggering and unbelievable, and it will be surprising if any university takes those results seriously.

There are certain basics in statistics that one cannot run away from, and which have been demonstrated and proved across thousands of different disciplines. One of these is the “normal distribution curve”, a pattern observed throughout the natural and social sciences. Sometimes called the “Bell Curve”, it rests on the central limit theorem, which holds that the averages of many independently drawn random variables converge to the normal distribution.
What this means is that, if you randomly select a large enough sample and subject it to a standardized test, a small number of scores will be well above the average, another small number well below it, and the vast majority close to it. Plotted on a graph, the results take a simple bell-like shape, with relatively low numbers at either extreme and most of the sample clustered around the population average. Astonishingly, even though we do not live in a perfect world, this pattern recurs almost everywhere. Many human traits are distributed along bell curves: look around you, and try plotting a bell curve of apparent body weight as you walk along the street. There will be a few very thin people, a few very fat people, but the vast majority will be towards the middle. Such statistical tools also form the basis of government policy in health and education. In a population, for example, there will be a few people who fall really sick with very bad ailments, a few who never fall sick at all, and a vast majority who fall ill with fairly routine illnesses that can then be planned for.

[Figure: two screenshots, captured 2016-04-01, of the graph of one school’s KCSE results; images not reproduced]

The recently released KCSE results, therefore, make no sense. Mapping out one school’s performance produces the graph above. This was not an unusual case, as the pattern was repeated across the country. A distribution of this sort is so unlikely as to be a statistical impossibility, and it points to either a badly skewed sample or a test that was not standardized. We know that the KCSE examinations were not really standardized, because many candidates obtained examination papers before the examinations themselves, and this would explain why such strange patterns are repeated across the entire examination result set.
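To make the contrast concrete, here is a minimal simulation sketch, not an analysis of real KCSE data. It assumes a hypothetical test of 50 independent questions; the function names, the 50% and 95% success rates, and the cohort size are all illustrative assumptions. It compares the skewness of an honest cohort with that of a cohort that mostly knew the answers in advance.

```python
import random
import statistics


def simulate_scores(n_candidates=5000, n_questions=50, p_correct=0.5, seed=1):
    """Simulate a standardized test: every candidate answers n_questions
    independent questions, each correct with probability p_correct.
    All parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    return [
        sum(1 for _ in range(n_questions) if rng.random() < p_correct)
        for _ in range(n_candidates)
    ]


def sample_skewness(values):
    """Moment-based sample skewness: close to 0 for a symmetric bell shape,
    negative when scores pile up at the top with a tail towards low marks."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Honest cohort: each question is a coin flip, so totals form a bell curve.
honest = simulate_scores()

# Hypothetical "leaked paper" cohort: most candidates know most answers.
leaked = simulate_scores(p_correct=0.95, seed=2)

print("honest skewness:", round(sample_skewness(honest), 2))
print("leaked skewness:", round(sample_skewness(leaked), 2))
```

The honest cohort's skewness should come out near zero, while the leaked cohort's should be clearly negative, because its scores are crowded against the maximum mark. A real investigation would use actual grade counts and a proper goodness-of-fit test, so this only illustrates the idea.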
The eternal tragedy of Kenyan public policy, however, is that we still subscribe to the “bwana mkubwa amesema” (“the big boss has spoken”) paradigm, in which the word of some unnamed mandarin high up is always final. These results will therefore stand, even though everyone can see that they are nonsense.

In a different setting, statistical analysis software would be used to detect such anomalies quite easily. In the case of the KCSE results, for example, basic analysis using office software such as Microsoft Excel would quickly show that the examination as a whole was badly leaked. The problem would still be identifying who bought the examination papers and who did not. For this, the analysis would be applied to individual schools. A school whose distribution is piled up at the top of the scale (that is, one where more candidates obtained the highest score than the next-highest score) would immediately draw attention and investigation. After all, it is much easier to fail an examination than to pass it, so any skew in a standardized examination should normally show more people failing than passing. The reverse implies that the examination was too easy, which is what would happen when the examination papers leak before the examination itself.

Software tools play a big part in analysis and policy development around the world. With such tools commonly available today, government departments such as the Ministry of Education should employ them to analyse the results of government policy and make the necessary mitigating interventions. Not doing so is unforgivable and lazy.

The author is an ICT consultant based in New Zealand.
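The school-level check described in the article, flagging any school where more candidates obtained the highest grade than the next-highest grade, can be sketched in a few lines. This is an illustrative sketch only: the function name `flag_top_heavy`, the school names, and the grade lists are made up, and the grade ordering is an assumption about the KCSE letter scale.

```python
from collections import Counter

# KCSE-style letter grades from highest to lowest (assumed for this sketch).
GRADES = ["A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D", "D-", "E"]


def flag_top_heavy(results_by_school):
    """Return the schools where more candidates obtained the top grade
    than the next-highest grade, in the order they appear in the input."""
    flagged = []
    for school, grades in results_by_school.items():
        counts = Counter(grades)  # missing grades count as 0
        if counts[GRADES[0]] > counts[GRADES[1]]:
            flagged.append(school)
    return flagged


# Made-up results for two hypothetical schools.
results = {
    "School X": ["A"] * 40 + ["A-"] * 25 + ["B+"] * 10,
    "School Y": ["A"] * 2 + ["A-"] * 5 + ["B+"] * 12 + ["B"] * 20 + ["C+"] * 15,
}

print(flag_top_heavy(results))
```

A flag of this kind is only a prompt for investigation, not proof of cheating. A very small school can trip the rule by chance, so a practical version would probably also require a minimum number of candidates.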
