
link spammers
-------------

Link spammers target Blogs/Boards/Forums and Wikis in particular to
publicate hyperlinks to their sites. Those links aren't intended for
humans, but placed there purely to push search engine rankings.

This is not a new trend, but an ongoing annoyance. Luckily ewiki
provides you with a small aresenal of useful counterfight plugins
and methods. There is a new plugin category ".../spam/" set up
exactly for this; the most useful plugins are however sprinkled
around - that's why this README tries to guide you.


what to do first
----------------

If you get spammed, you should first lock that particular page
against writing for a while - if only one is target of a massive
attack.


denying them success
--------------------

Chances are, you got targeted by a link spammer because your pages
are too open for indexing. Go on and prove check, that your site
layout script loads following include() fragments:

  <html>
   ...
   <?php
      include(".../ewiki/fragments/head/core.php");
      include(".../ewiki/fragments/head/meta.php");
   ?>
   ...
  </head>

The first prevents (old) ?version= pages from getting indexed by
search engines, and new revisions cannot be indexed in the first twelve
hours. If you ensure that, then spamming your site gets uninteresting
for the pundits. It is unlikely that any search engine bot would index
the spamlinks before you remove them.

The only problem here is, that most of the link spammers are only
semi-professionals and some won't notice this measure immediately or
at all. Nevertheless this is the most important step! Don't forget
that bringing crap onto your Wiki is only a time-consuming requirement
for spammers, what they are really targetting is Google & Co. and so
denying them success there has top priority.
Getting rid and preventing further spam is always only step two.

                                      (This attitude makes link spammers
                                      as angry, as their spam makes us.)


mass link spamming (so called "WikiSpam")
------------------

Spammers need to spam a lot, else it gets financially uninteresting.
We simply call this "brute force" spamming from here.

So most attacks against Wikis involve throwing a few dozens or even
hundreds (!!) of links onto a single page. This activity is easy to
track, and so if you see this kind of attack, all you have to do is
loading the according plugin:

  include(".../ewiki/plugins/edit/limitlinks.php");

This will reject any page edits with more than 5 new URLs, which
makes your Wiki a useless target for most spammers (those that operate
manually would get quickly annoyed here). Easy, Easy, Easy. Solves the
problem for most of us.

If you run a link directory with your Wiki, this plugin will soon show
its drawbacks - but you could then specifically disable it for some
pages.


single comment link spammers
----------------------------

Many attacks against Wikis are side-effects and were originally meant
for spamming blogs and boards. You can tell apart this sort of attack
because of
 - HTML or "BBcode" around the posted URL
 - it's just a single link
 - all previous content of the page is gone
 - many many page revisions are created

If this happens, then you are definetely attacked with a script. First
lock that particular page, after finding out the attackers entry point
(the edit/ page was typically bookmarked somewhere).

You won't be able to stop this form of attack with simple IP-blacklist
methods, because the spammer operates with trojan-governed zombie
machines from many different locations. Our advantage is, that we can
easily tell it apart because of the edit/POST frequency.

And also as easily we can get rid of bots, which do not specifically
target our Wiki software. Even simple captches (like the ProtectedEMail
checkbox) will normally do.

 - plugins/spam/antibot_checkbox.php
 - plugins/spam/antibot_captcha.php

Typically attacks start from bookmarked pages with some sort of VBscript
(this topic is not fully researched) running from zombie machines. So
there is little interaction, which makes the simple captchas successful.

Moreover those bots are pretty dumb and don't behave like humans. You
can detect and blqock them as well by simply looking at how much time
is between retrieving the edit page and hitting submit/save. There is
a plugin which does exactly that, and blocks if the timespan is below
five seconds:

 - plugins/spam/antibot_captcha.php

(Time is money for link spammers, they cannot even afford slowing their
scripts by even those 5 seconds.)

If only few pages are targeted, you can also disable most of the protection
mechanisms selectively. There are a few "trigger" and "anti-trigger"
plugins, which specifically enable or disable the rest of our anti-spam
extensions.


blocking URLs
-------------

There are also two plugins for blocking URL patterns. Look out for
"edit/spam_block" and "edit/spam_deface". For both of which you can
mantain patterns (on Wiki pages).

With "lib/spamblock_whois" you can even go so far and lock against
domain registrants or hosters, which will block most future spamming
attempts of a person, once you find out.

The "zero_pagerank" scripts and code is used to implement an
intermediate page, which again cripples page rank bonus of linked
pages. (Though this is less senseful now with the rise of hyperlink
flags as rel="NOFOLLOW"; read on...)

There are many shared URL black- and blocklists, but we haven't yet
prepared a cron script to syndicate or even exchange it. But this is
definitely an option and worth considering. You can find a few big
blacklist with search engines. Projects like WardsWiki, WikiPedia,
MeatBall, CommunityWiki, but also GeekLog, WordPress, PhpNuke and
other portal/CMS-ware maintain some.


NOFOLLOW
--------

Google and their page rank idea caused us all that trouble. They later
however decided to do something against link spamming, because they
itself suffered from a watered search database. It is possible to add
an  rel="NOFOLLOW"  to every  <a>  hyperlink tag. (IMHO it would be
more honest to call it "NOPAGERANK", but hey, better than nothing.)

Use ".../ewiki/plugins/linking/a_nofollow.php" as the quick-but-dumb
solution. It adds this to all links, and prevents additional page rank
bonus for any pages.

There is a variant of this (".../new_noffollow.php") which works
considerably slower, because it checks which links have been added the
last two weeks, and only adds the rel="NOFOLLOW" for fresh (=not yet
reviewed) links.


chinese spam
------------

If you don't have any chinese users or discussions, but are targeted
with links to there; you could lock against such content getting seen
by Google (remember: this is what we need to prevent, then spammers
will go away automatically).

 - plugins/meta/block_chinese.php

It works by setting the NOFOLLOW flag automatically, if it detects
foreign (chinese) html entities.


manual
------

Load ".../plugins/meta/meta.php" to add a smaller input box on the
edit pages. On the pages that get targetted by spammers, simply add
"robots: NOINDEX,NOFOLLOW"
so search engines will avoid them.

You can remove this again, if attacks go down. It turned out to be
too troublesome for most spammers to take care of this field.


last resort
-----------

If everything else fails, then try to get rif of link spammers by
opting out from getting indexed by search engines. Either add an
<meta name="robots" content="NOINDEX,NOFOLLOW"> to your sites layout
or create an all-blocking "/robots.txt" file.

Alternatively <meta name="robots" content="NOFOLLOW"> may be enough,
because search engines then wouldn't follow any spam links. You then
could operate your sites as normal, if you post a page with links to
all YOUR (real) pages on a non-blocked site. (Make your frontpage
static, and place a link to a less frequently renewed page index file;
everything else flagged with NOFOLLOW. That would work.)


edit locking
------------

Of course, if your Wiki is not publically-editable, then you won't
even have any of the above mentioned problems. But as you see, there
are enough countermeasures to live without any _PROTECTED_MODE setup.


edit captchas
-------------

Another useful workaround against automated link spamming are captchas
on the edit/ screen. In the simpler cases, something like the
ProtectedEMail checkbox will do. They work because most attacks origin
from script-operated zombie machines, and those bots are too dumb to
target all possible WWW forms (they usually were meant for blog comment
fields).

...


cleaning up the mess
--------------------

Many of the tools/ can help you to remove spammy pages. Especially
the "holes" and the "revert" tool are useful here. Though most people
prefer the WikiCommander or ewikictl to rip bad page revisions.


more notes
----------

See [http://erfurtwiki.sf.net/WikiSpam] for more informations and a
more up-to-date guide.

