Shams, Scams and Wikipedia Scrapers

The many shams, scams and Wikipedia scrapers employed by certain online ‘publishers’ are now in their tenth year. Consider the book blurb containing the following:

“Please note that the content of this book primarily consists of articles available from Wikipedia or other free sources online.”

A company we won’t name, operating under a number of imprints we also won’t name, has been taking articles from Wikipedia and publishing them as books. On an industrial scale: tens of thousands of them. And charging anything from $10 to $100. With the dead giveaway tag line “High Quality Content by Wikipedia articles.” For ten years.

The packaging of Wikipedia articles into print-on-demand books is a massive, automated operation, powered by readily available scripts. This has been going on since at least 2010. Amazon has been at best impotent and at worst complicit in this practice. Wikipedia even has a page about it. As it mentions the publishers’ names, you can bet they’ve scraped it and turned it into a book.

The titles and contents are composed of a base Wikipedia page and then a gamut of other linked Wikipedia articles. These collections of Wikipedia articles aren’t even particularly coherent. It’s what you get when you web-scrape using dumb scripts with no form of context-checking. But that doesn’t matter. It’s all about building a big catalogue of niche titles and waiting for the careless clicks of gullible buyers. It’s a numbers game. From the reports of the last ten years, it seems they’re selling a fair stack of books for a relatively small outlay on software automation. Money for old rope – or in this case, old articles.

Republishing articles from Wikipedia for profit is not strictly illegal under the copyright statements of Wikipedia itself. This is re-packaged content originally published under an open license. It contains the original attribution and has a blatant notification of origin on the front. If readers wish to purchase said content in an offline format and are prepared to pay silly money for it, that is not illegal either.

However, it does fall foul of consumer law in many territories because the content is so poorly assembled, unedited and in many cases, incomplete to the point of incoherence. But only if you can track them down and prosecute. Which seems to be more of a problem.

It’s become such a massive enterprise that stopping it effectively seems beyond any agency or platform.

It’s the result of a huge flaw in the print-on-demand (POD) industry. POD services will accept submissions from anyone, and produce copies paid for in advance. A listing on the respectable online platforms encourages some people to buy without thoroughly checking the contents or description. It seems a lot of the sales come from people placing speculative orders on education or corporate accounts, only to find out the book is web-scraped garbage when it arrives.

It’s another example of scam-culture fostered by the Internet. For the platforms drowning in new content and trying to remove the harmfully illegal material, it’s all too easy to let this stuff slip under the radar.