“Data Journalism at
its finest ... evidence based blogging” proclaimed the perpetually thirsty
Paul Staines yesterday evening, trying his best to promote the work of his
newly anointed teaboy Alex Wickham in supposedly showing that the paper almost
singlehandedly responsible for the recent upsurge in the use of the word “scrounger” was none other than the
deeply subversive Guardian.
Really? Er ...
There was even a bar chart with the Y-Axis starting at zero –
a rarity for the rabble at the Guido Fawkes blog – in support of the post “Guardian
Uses Word ‘Scrounger’ More Than Any Other Paper”. At first glance, this
looks almost credible. But seasoned watchers of the Fawkes folks will already
have read ahead to the assertion that “Guido
has been crunching the numbers” and duly smelt a rat.
Especially the methodology: “Because neither Google or [sic] LexisNexis
include all paywalled sites in their analysis, Guido used each newspaper
website’s own internal search engine to determine in how many articles the word
‘scrounger’ appeared between 2010 and today. The respective Sunday editions of
the titles were included with the daily for the purposes of this research”.
... maybe not
So this wasn’t a like-with-like comparison – unless each of
those internal search engines worked the same way. And almost three hours
before Staines made his lofty pronouncement, Declan Gaffney had well and truly skewered the Fawkes
rabble with a more thorough analysis which showed the Guardian to be well behind the Telegraph,
Mail, Sun and Express (which
topped the chart).
As Gaffney pointed out, the Fawkes method was never going to
give a true picture because “(a) there is
no reason to believe that all websites are equally representative of the
content of titles (b) articles get deleted from websites but not print editions
and (c) some titles have much more developed online content than others,
notably the Guardian”. Quite.
Fawkes spin ...
So how did Gaffney get his
numbers? “For our analysis we combined
word-counting (on a set of 6,000 articles) with manual coding (of a 20% sample
of the articles). We didn't just count words: using a custom-built database we
were able to look at co-occurrences of different vocabularies in the same
article. This was pretty time consuming, but it beats passing off the output of
a couple of hours of timewasting on media search engines as serious analysis”.
Ouch!
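For anyone wondering what looking at "co-occurrences of different vocabularies in the same article" might involve, here is a minimal sketch in Python. It is not Gaffney's database or his code, which are not shown here: the vocabulary names, the word lists and the function names are all hypothetical, invented purely for illustration. It counts how many articles contain words from two different vocabularies at once, and nothing more.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical vocabularies: the labels and word lists are assumptions
# made up for this sketch, not taken from Gaffney's analysis.
VOCABULARIES = {
    "scrounger": {"scrounger", "scroungers"},
    "fraud": {"fraud", "cheat", "cheats"},
}


def vocabularies_in(text, vocabularies):
    """Return the names of the vocabularies with at least one word in the text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return {name for name, terms in vocabularies.items() if words & terms}


def co_occurrence_counts(articles, vocabularies):
    """Count the articles in which each pair of vocabularies appears together."""
    counts = Counter()
    for text in articles:
        present = sorted(vocabularies_in(text, vocabularies))
        # Each article adds at most one to any given pair, however
        # many times the individual words are repeated in it.
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts
```

Calling `co_occurrence_counts(list_of_article_texts, VOCABULARIES)` returns a counter keyed by alphabetically ordered pairs such as `("fraud", "scrounger")`. The point of counting per article rather than per word is that one piece repeating a term ten times does not outweigh ten pieces using it once. It says nothing about how the word was used, which is presumably where the manual coding comes in.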
... versus reality
That’s why, when the Fawkes blog claims “That is why Owen [Jones] and
the unpopular progressive sections of the media use the emotive term more than
anyone on the welfare-reforming right”, you know they are once again
talking weapons-grade bullshit, yet combining it effortlessly with the brass
neck of their spin cycle.
Well done Declan Gaffney, and nul points to The Great Guido. Another
fine mess.
Excellent article, thanks. One point of criticism: your final graphic, the line graph, is unreadable (for slightly colour-blind me, at least). Rather than just using a key with coloured lines, would it be possible to use labels or arrows identifying each publication too?