Blocking robots from Semantic MediaWiki special pages

From semanticweb.org


Pages such as Special:ExportRDF, Special:SearchByAttribute, and Special:Browse can tie up search engine bots for long periods and keep them from indexing the actual content pages. Search engines may then show RDF export pages to people searching for a term instead of the corresponding wiki page. This is likely to confuse and discourage those visitors.

For this to work, your wiki must use short URLs in which the script path (for example /w) differs from the article URL path (for example /wiki). You can then keep any robots you want away from these pages, and from other problematic MediaWiki pages, by adding the following rules to your robots.txt file under the User-agent line for each of those bots:

# Keep the trailing slash on /w/: rules match by prefix, so "Disallow: /w" would also block every /wiki/ URL.
Disallow: /w/
Disallow: /wiki/Special:SearchByAttribute/
Disallow: /wiki/Special:ExportRDF/
Disallow: /wiki/Special:Browse/
Disallow: /wiki/Special:Whatlinkshere/
Disallow: /wiki/Special:SearchByProperty/
Disallow: /wiki/Special:Recentchangeslinked/
Disallow: /wiki/Special:SearchByRelation/
Disallow: /wiki/Special:PageProperty/

Here, /w is the script directory, and /wiki is the virtual directory that the web server internally rewrites to /w/index.php.

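Edit links and other action URLs go through the script path (/w/index.php?title=...&action=edit). As a result, these rules also keep the blocked bots away from edit pages, which is useful in its own right.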
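Before deploying per-bot rules, you can test them offline. The following is a minimal sketch using Python's standard `urllib.robotparser`. The host `wiki.example.org` and the crawler names `ExampleBot` and `OtherBot` are hypothetical placeholders. This parser uses simple first-match prefix rules, so a particular search engine's crawler may interpret edge cases differently. The sketch shows three things: a record scoped to one bot does not affect other bots, edit URLs under /w/ are caught, and Disallow lines that are not preceded by a User-agent line are ignored.

```python
from urllib.robotparser import RobotFileParser

SITE = "https://wiki.example.org"  # hypothetical host; substitute your own

# A record that applies only to one (hypothetical) crawler.
PER_BOT = """\
User-agent: ExampleBot
Disallow: /w/
Disallow: /wiki/Special:ExportRDF/
Disallow: /wiki/Special:Browse/
"""

# Parse the text directly; parse() needs no network access.
parser = RobotFileParser()
parser.parse(PER_BOT.splitlines())


def can_fetch(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under PER_BOT."""
    return parser.can_fetch(agent, SITE + path)


print(can_fetch("ExampleBot", "/wiki/Special:ExportRDF/Main_Page"))          # False
print(can_fetch("ExampleBot", "/w/index.php?title=Main_Page&action=edit"))  # False
print(can_fetch("ExampleBot", "/wiki/Main_Page"))                            # True
# Crawlers not named in any record are unaffected by this record.
print(can_fetch("OtherBot", "/wiki/Special:ExportRDF/Main_Page"))            # True

# Disallow lines with no User-agent line before them belong to no record,
# so this parser ignores them and allows everything.
orphan = RobotFileParser()
orphan.parse(["Disallow: /wiki/Special:Browse/"])
print(orphan.can_fetch("ExampleBot", SITE + "/wiki/Special:Browse/Main_Page"))  # True
```

The last check is why the rules above need a User-agent line in front of them in a real robots.txt file.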

To block all bots from these pages, your robots.txt would look like this:

User-agent: *
Disallow: /w/
Disallow: /wiki/Special:SearchByAttribute/
Disallow: /wiki/Special:ExportRDF/
Disallow: /wiki/Special:Browse/
Disallow: /wiki/Special:Whatlinkshere/
Disallow: /wiki/Special:SearchByProperty/
Disallow: /wiki/Special:Recentchangeslinked/
Disallow: /wiki/Special:SearchByRelation/
Disallow: /wiki/Special:PageProperty/
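robots.txt rules are matched as path prefixes. This means the script-directory rule must be written as /w/ and not as /w, because a bare /w also matches every /wiki/ article URL. The sketch below checks this offline with Python's standard `urllib.robotparser`. It compares the all-bots rules written with /w/ against the same rules written with /w. The host `wiki.example.org` and the agent name `AnyBot` are hypothetical. Python's parser may not match every search engine's crawler in edge cases such as wildcards.

```python
from urllib.robotparser import RobotFileParser

SITE = "https://wiki.example.org"  # hypothetical host; substitute your own

ALL_BOTS = """\
User-agent: *
Disallow: /w/
Disallow: /wiki/Special:SearchByAttribute/
Disallow: /wiki/Special:ExportRDF/
Disallow: /wiki/Special:Browse/
Disallow: /wiki/Special:Whatlinkshere/
Disallow: /wiki/Special:SearchByProperty/
Disallow: /wiki/Special:Recentchangeslinked/
Disallow: /wiki/Special:SearchByRelation/
Disallow: /wiki/Special:PageProperty/
"""


def parser_for(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without any network access."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def allowed(rp: RobotFileParser, path: str) -> bool:
    # Any agent name works here: only the "User-agent: *" record exists.
    return rp.can_fetch("AnyBot", SITE + path)


good = parser_for(ALL_BOTS)
# The same rules with the trailing slash dropped from the script directory.
bad = parser_for(ALL_BOTS.replace("Disallow: /w/\n", "Disallow: /w\n"))

for path in ("/wiki/Main_Page",
             "/wiki/Special:Browse/Main_Page",
             "/w/index.php?title=Main_Page&action=edit"):
    # Columns: path, allowed with "/w/", allowed with "/w"
    print(path, allowed(good, path), allowed(bad, path))
```

With /w/, only the article page /wiki/Main_Page is allowed. With /w, the article page is blocked as well, so the whole wiki disappears from well-behaved crawlers.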

Alternatively, you could run the generateSitemap.php script in the maintenance directory to generate sitemaps for your site. Sitemaps encourage search engines to index the wiki pages rather than the special pages.
