Friday, 3 May 2013

Guido Fawked – Data Journalism Fail

“Data Journalism at its finest ... evidence based blogging” proclaimed the perpetually thirsty Paul Staines yesterday evening. He was trying his best to promote the work of his newly anointed teaboy Alex Wickham, who had supposedly shown that the paper almost singlehandedly responsible for the recent upsurge in the use of the word “scrounger” was none other than the deeply subversive Guardian.

Really? Er ...

There was even a bar chart with the y-axis starting at zero – a rarity for the rabble at the Guido Fawkes blog – in support of the post “Guardian Uses Word ‘Scrounger’ More Than Any Other Paper”. At first glance, this looks almost credible. But seasoned watchers of the Fawkes folks will already have read ahead to the assertion that “Guido has been crunching the numbers” and duly smelt a rat.

Especially given the methodology: “Because neither Google or [sic] LexisNexis include all paywalled sites in their analysis, Guido used each newspaper website’s own internal search engine to determine in how many articles the word ‘scrounger’ appeared between 2010 and today. The respective Sunday editions of the titles were included with the daily for the purposes of this research”.

... maybe not

So this wasn’t a like-with-like comparison – unless each of those internal search engines worked the same way. And almost three hours before Staines made his lofty pronouncement, Declan Gaffney had well and truly skewered the Fawkes rabble with a more thorough analysis, which showed the Guardian to be well behind the Telegraph, Mail, Sun and Express (which topped the chart).

As Gaffney pointed out, the Fawkes method was never going to give a true picture because “(a) there is no reason to believe that all websites are equally representative of the content of titles (b) articles get deleted from websites but not print editions and (c) some titles have much more developed online content than others, notably the Guardian”. Quite.

Fawkes spin ...

So how did Gaffney get his numbers? “For our analysis we combined word-counting (on a set of 6,000 articles) with manual coding (of a 20% sample of the articles). We didn't just count words: using a custom-built database we were able to look at co-occurrences of different vocabularies in the same article. This was pretty time consuming, but it beats passing off the output of a couple of hours of timewasting on media search engines as serious analysis”. Ouch!
... versus reality

So when the Fawkes blog declares “That is why Owen [Jones] and the unpopular progressive sections of the media use the emotive term more than anyone on the welfare-reforming right”, you know they are once again talking weapons grade bullshit, and combining it effortlessly with the brass neck of their spin cycle.

Well done Declan Gaffney, and nul points to The Great Guido. Another fine mess.

1 comment:

  1. Excellent article, thanks. One point of criticism: your final graphic, the line graph, is unreadable (for slightly colour-blind me, at least). Rather than just using a key with coloured lines, would it be possible to use tabs or arrows identifying each publication too?
