Nobody loves spam. This page collects ways to combat spam on the semanticweb.org wiki and coordinates the community effort to do so.
Kinds of spam on the wiki
- Anonymous spam
- Spam from registered users
By page action
- Spamming on a user page
- Spamming by creating a new page
- Spamming on existing pages
By type of spam
- Posting links to websites
- Posting text with links that do not look like spam, for example links to a URL-shortener service
- Posting text without links
There are also several dozen pages that are not spam but have no useful content. Typically these are pages created to test Semantic MediaWiki features: they have SMW properties, and RDF can be exported from them. These pages should be removed, but their data dumps should be uploaded to http://sandbox.semantic.mediawiki.org.
Current vulnerabilities of the wiki
- Weak captcha. QuestyCaptcha is currently used with only about 5 different questions, so it can easily be broken.
- Registered users can post immediately
- Anonymous users can post links
Cleaning up the existing spam
Block spam users and delete spam pages
There are plenty of spam users who are not blocked yet and several pages that are entirely spam. Blocking the users and removing the pages they created can be done at once, using SecretaryBot, AutoWikiBrowser and Nuke (see #Tools).
Remove spam users' contribution to the wiki
This is trickier. A spammer may have written something on an existing page, and someone else may have edited the page afterwards. Several strategies are applicable here.
- The simplest case is when the wiki has experienced a mass spam attack. In this case we can do a mass rollback.
- Otherwise, we can first try to remove ONLY the spammer's contribution to the page and merge the result into the most recent version of the page. This is done with the simple Undo function. Is there a bot available that tries to undo all of a user's edits?
- However, sometimes it is not possible to merge two versions without conflicts. In this case all we can do is write a regular expression that undoes all similar changes.
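The regular-expression strategy in the last item can be sketched as follows. This is a minimal illustration, not an existing bot: the domain list is a placeholder, `strip_spam_lines` is a hypothetical helper, and it assumes that spam was added as whole lines containing a link to a known spam domain.

```python
import re

# Placeholder domains for illustration only; replace with domains
# actually observed in the spam edits.
SPAM_DOMAINS = ["cheap-pills.example", "casino.example"]

# Matches any http(s) URL that contains one of the spam domains.
SPAM_URL = re.compile(
    r"https?://[^\s\]|]*(?:" + "|".join(re.escape(d) for d in SPAM_DOMAINS) + r")",
    re.IGNORECASE,
)


def strip_spam_lines(wikitext: str) -> str:
    """Return the wikitext without the lines that link to a spam domain."""
    kept = [line for line in wikitext.splitlines() if not SPAM_URL.search(line)]
    return "\n".join(kept)
```

The pattern is deliberately broad (it also matches a spam domain that appears in the path or query of a URL), so the result should be reviewed before saving, for example by loading the same regex into AutoWikiBrowser and checking each diff.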
Prevent spam in the future
Add new expressions to spam filter
There is a regex-based spam filter extension installed on this wiki: SpamBlacklist (ConfirmEdit, also installed, provides the captcha). It uses two blacklists, MediaWiki:Spam-blacklist and meta:MediaWiki:Spam-blacklist, to check whether an edit is acceptable. Every time a user tries to save a page, the extension scans the text of the edit and denies the save if the text matches one of the regular expressions.
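The check described above can be sketched roughly as follows. This is an illustration and not the extension's own code: it assumes that each non-empty blacklist line is a regex fragment matched against the host part of a URL and that `#` starts a comment. The function names and the sample entries are hypothetical.

```python
import re


def compile_blacklist(blacklist_text: str):
    """Combine all blacklist lines into one regex, or return None if empty."""
    fragments = []
    for line in blacklist_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line:
            fragments.append(line)
    if not fragments:
        return None
    return re.compile(
        r"https?://[a-z0-9_\-.]*(?:" + "|".join(fragments) + r")",
        re.IGNORECASE,
    )


def is_spam(edit_text: str, blacklist) -> bool:
    """True if the edit contains a URL that matches the blacklist."""
    return blacklist is not None and blacklist.search(edit_text) is not None


# Sample entries (placeholders, not the wiki's real blacklist).
sample = compile_blacklist(
    "# sample entries\n"
    "cheap-pills\\.example\n"
    "casino\\d+\\.example  # numbered casino sites\n"
)
```

Adding a new expression then amounts to appending one line to the blacklist page; the sketch shows why a single overly broad fragment can block legitimate links as well.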
Currently the only captcha used on the wiki is QuestyCaptcha, which asks a question from a predefined set.
Another effective measure is a honeypot CAPTCHA, which targets spambots that fill in every field of the registration form. A honeypot CAPTCHA adds a hidden field that a human user will never change.
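The honeypot idea can be sketched in a few lines. The field name `website_url` and the dictionary-based form are assumptions made for illustration; a real form would hide the field with CSS and run this check on the server.

```python
# Hypothetical name of the hidden form field; humans never see or fill it.
HONEYPOT_FIELD = "website_url"


def is_probably_bot(form: dict) -> bool:
    """A non-empty hidden field suggests the form was filled in by a bot."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())
```

A registration handler would reject (or silently ignore) any submission for which `is_probably_bot` returns `True`.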
Change the policy of anonymous and newly registered users
The current policy is as follows:
- anonymous users can add links after solving the captcha
- anonymous users can add any other text without restriction
- registered users can add external links without restriction
A better captcha would improve security, but a second layer of protection is always worthwhile. Many spammers post right after registering.
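One possible tightened policy, sketched under stated assumptions: links from anonymous users and from new accounts both require a captcha. The thresholds (4 days, 10 edits) and the `User` type are hypothetical values chosen for illustration, not settings of this wiki.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumed thresholds; tune them to the wiki's actual spam pattern.
MIN_ACCOUNT_AGE = timedelta(days=4)
MIN_EDIT_COUNT = 10


@dataclass
class User:
    registered_at: Optional[datetime]  # None means an anonymous user
    edit_count: int = 0


def needs_captcha(user: User, adds_external_link: bool, now: datetime) -> bool:
    """Decide whether an edit must pass the captcha before it is saved."""
    if not adds_external_link:
        return False  # plain text stays unrestricted, as in the current policy
    if user.registered_at is None:
        return True  # anonymous users always solve the captcha for links
    too_new = now - user.registered_at < MIN_ACCOUNT_AGE
    too_few_edits = user.edit_count < MIN_EDIT_COUNT
    return too_new or too_few_edits
```

Under this sketch an established account is never interrupted, while an account created minutes ago is treated like an anonymous user when it adds links.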
Analyze spam-like behavior
There are several ways to analyze Bad Behavior. The first is to analyze the headers that the client sends to the server. The second is to analyze the client's actions on the wiki.
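Both kinds of analysis can be sketched as simple heuristics. The rules below (missing `User-Agent`, missing `Accept`, links added within a minute of registration) are illustrative assumptions, not the rules of any particular extension, and the function names are hypothetical.

```python
def suspicious_headers(headers: dict) -> list:
    """Return the reasons why a request's headers look bot-like."""
    normalized = {name.lower(): value for name, value in headers.items()}
    reasons = []
    if not normalized.get("user-agent", "").strip():
        reasons.append("missing User-Agent")
    if "accept" not in normalized:
        reasons.append("missing Accept")
    return reasons


def suspicious_actions(seconds_since_registration: float, links_added: int) -> list:
    """Return the reasons why a client's behavior on the wiki looks bot-like."""
    reasons = []
    if seconds_since_registration < 60 and links_added > 0:
        reasons.append("links added within a minute of registration")
    return reasons
```

Returning a list of reasons rather than a single flag makes it easier to log why an edit was held back and to adjust individual rules later.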
Create a team of volunteers who can periodically clean up spam
Some spam will get through even on the best-protected wiki. We need several people with Administrator rights who can review RecentChanges every week, blocking spammers and undoing spam revisions.
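A volunteer's weekly review could start from a query like the one sketched below, which builds a MediaWiki API request for the non-bot changes of the last seven days. The endpoint `API_URL` is a placeholder (the wiki's real `api.php` address is not given here), `recent_changes_url` is a hypothetical helper, and `now` is assumed to be a UTC time.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Placeholder endpoint; replace with the wiki's real api.php address.
API_URL = "https://wiki.example.org/api.php"


def recent_changes_url(now: datetime, days: int = 7, limit: int = 100) -> str:
    """Build a list=recentchanges query covering the last `days` days."""
    oldest = (now - timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%SZ")
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|user|timestamp|comment",
        "rcshow": "!bot",   # skip edits flagged as bot edits
        "rcend": oldest,    # results run from newest back to this time
        "rclimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(recent_changes_url(datetime.now(timezone.utc)))
```

The sketch only builds the URL; fetching it and paging through further results is left out.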
Tools for batch editing
- AutoWikiBrowser lets you build page lists quickly and interactively, edit pages by regex, and delete pages. It does not support batch blocking of users or batch removal of their contributions.
- secretaribot includes a script that shows each username together with its user page and lets you instantly delete the page and block the user.
- The spam blacklist cleanup script quickly cleans up from the wiki all spam URLs that have been added to MediaWiki:Spam-blacklist.
- Nuke deletes all pages created by a given user.
- DeleteBatch lets you create a list of pages and delete them in one click.