Hiding from spammer robots with Javascript e-mail address writer
They say that within a few seconds of connecting a new computer to the internet, it will be pinged, probed, and hacked if it is unprotected. Kind of like the classic leg-in-piranha-infested-waters effect. Well, I don’t know if that’s true, but I do know some badbots have been snooping around my site, and I’ve had to take steps to keep them from impinging on my enjoyment of life.
How do I know badbots are visiting? A couple of ways: First, I am using a free service called IP2Location, which shows the general location of a computer based on its IP address. According to this, I routinely get visits from the Netherlands, Germany, France, Taiwan, and other “interesting” locales that – one would assume – do not have very many people interested in an Austin-specific web site. Hmm, I just got a “visitor” from China, apparently from the FUJIAN region – wherever that is. I have to assume these are robots and not humans. One giveaway is that the server logs show only certain files are requested, and not others that are necessary for human enjoyment of the site. For example, the home page is index.php, but when that page loads in a browser, as it would when a human requests it, a number of other files are required: a JavaScript file, a stylesheet, and several images. The Chinese visitor, as an example, only requested the root page (index.php) and the JavaScript file. That’s it - no images or styles. Without them the page would be meaningless.
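A rough sketch of that log check in JavaScript might look like the following. The { ip, path } entry shape, the sample addresses, and the name findLikelyBots are all made up for illustration; real server logs would have to be parsed into this shape first, and a returning human with a warm browser cache could be flagged by mistake.

```javascript
// Hypothetical sketch: flag visitors that fetched a page but none of the
// images or stylesheets a browser would normally request along with it.
// The { ip, path } entry shape is an assumption, not a real log format.
function findLikelyBots(entries) {
  const seen = new Map(); // ip -> { pages: number, assets: number }
  for (const { ip, path } of entries) {
    const stats = seen.get(ip) || { pages: 0, assets: 0 };
    if (/\.(css|png|gif|jpe?g|ico)$/i.test(path)) {
      stats.assets += 1; // stylesheet or image
    } else if (/\.php$/i.test(path) || path.endsWith("/")) {
      stats.pages += 1; // an actual page
    }
    seen.set(ip, stats);
  }
  // A page request with zero supporting assets suggests a non-browser client.
  return [...seen.entries()]
    .filter(([, s]) => s.pages > 0 && s.assets === 0)
    .map(([ip]) => ip);
}

// Sample data only; these addresses come from documentation-reserved ranges.
const sampleLog = [
  { ip: "203.0.113.7", path: "/index.php" },
  { ip: "203.0.113.7", path: "/site.js" },
  { ip: "198.51.100.4", path: "/index.php" },
  { ip: "198.51.100.4", path: "/style.css" },
  { ip: "198.51.100.4", path: "/logo.png" },
];
console.log(findLikelyBots(sampleLog)); // only the first visitor is flagged
```

The first sample visitor mirrors the pattern described above: the root page and the script file, but no images or styles.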
The other way I know some of these are badbots, and not helpful search engine bots (which I also get a fair share of, btw – Google almost nightly, plus Yahoo, Inktomi, AOL, and others) is that I’ve started to get comment and e-mail spam at my brand-new site. The comment spam appeared first – I received 4 comments to various blog postings at the same time, each with what seemed to be just a random collection of words strung together into something that looked kinda like sentences, but really weren’t. Intermixed with the random words were links, also with random words, but the weird thing is at least some of them seemed to point to a legitimate site like news.yahoo.com. I have to admit, I do not understand the purpose of this. Why would a spammer link to a Yahoo site? Maybe if I had examined the source code I would have figured out the trick, but I just deleted the comments instead.
Getting protection from comment spam was pretty easy – all I had to do was enable the Akismet plugin, which is very effective (so far) at keeping comment spam away from me. I did have to register at WordPress to get the API key that enables this plugin, but that was pretty painless and only took a few minutes. Hmm, I wonder if there were technical reasons for this, or if WP has stolen a page from the Microsoft playbook. Remember when MS tried to force everyone to get a Passport account? Supposedly you could not get anything out of MS web sites (and even some software) without a Passport. Thankfully that didn’t last very long, and although I managed to get by without a Passport, I now have an unused blog over at WordPress, just to keep spam out of my comment queue. Oh well.
Combating the other scourge, spam e-mail, took a little more doing. The primary line of defense for this is to keep the badbots from finding your addresses in the first place. I was negligent in this regard when I created the Contact Me page, and posted my 6 addresses in plain sight, both to humans and to robots. I knew I would have to come back and fix this, and now that the spam is here, it is time. Actually, it is a little late, because at least one spammer already knows the addresses. So far, I’ve received only one form of spam, so if I cross my fingers tight enough, maybe he’ll not bother to share them with anyone else.
There are a number of ways to hide e-mail addresses from robots, while making them visible to humans, each with varying levels of visitor convenience. There is a nice summary of these methods here.
Since one of my design objectives is to make this site fun and entertaining, visitor convenience ranks pretty high in priority. This means e-mail addresses need to be actual mailto: links, with text that can be easily copied and pasted if the user prefers to do that. Thus, address hiding methods that require any user effort were not considered, such as images that look like text (which require users to manually type in the entire address), or text descriptions (in the form “erwin AT austinmash DOT com”), or deliberate address munging (like erwin@REMOVE.austinmash.com).
I also discounted methods that rely on replacing regular text, like “@”, with the equivalent HTML or hex character encodings, like &#64;. The theory here is that the badbots go hunting for @ symbols, and if they don’t find one, they go away. However, given the declining cost of bandwidth and CPU cycles, and the fact this is easy to decode, I am not about to assume spammers are not smart enough to also be on the lookout for &#64;.
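To show just how easy that decoding is, here is a minimal sketch of what a harvester could do. The function names are hypothetical, and the address pattern is deliberately simplistic; this is an illustration of the idea, not a description of any real spam tool.

```javascript
// Hypothetical sketch: decode numeric character references (decimal like
// &#64; or hex like &#x40;), then scan the result for e-mail addresses.
function decodeNumericEntities(html) {
  return html
    .replace(/&#x([0-9a-f]+);/gi, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (_, dec) => String.fromCodePoint(parseInt(dec, 10)));
}

function harvestAddresses(html) {
  // Deliberately simple pattern; real address syntax is more permissive.
  const pattern = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g;
  return decodeNumericEntities(html).match(pattern) || [];
}

// An "encoded" link of the kind the entity trick would produce.
const encoded = '<a href="mailto:erwin&#64;austinmash&#x2E;com">write me</a>';
console.log(harvestAddresses(encoded)); // [ 'erwin@austinmash.com' ]
```

Two regular expressions are all it takes, which is why entity encoding on its own looked like thin protection.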
There are more elaborate schemes that rely on encryption, CSS tricks, Forms, or user action before revealing the addresses (as if it was a state secret or something). Most of these have maintenance drawbacks, and are just not worth the effort. Simple yet Effective, is my motto. So, I settled on a Javascript approach.
The basic concept is to have Javascript write the text to the page in chunks, such that any one chunk is not enough for a badbot to get anything useful. The end result on the page, however, is a complete e-mail address with mailto: functionality and easy-to-copy-and-paste text. Also, it can be done in a way that is easy to maintain, and if necessary, can even be generated with server-side code.
There are a lot of examples of this technique on the web, but most take an approach that seems convoluted and inelegant to me, such as this:
My e-mail address is <script>document.write("me");document.write("@");document.write("here.com")</script>
I suppose this is fine if this is the only address on the page, but I have 6 addresses on my contact page. So I came up with my own technique, which I think is more flexible and re-usable (although it’s pretty likely this has been done before, I did not run across any examples like this in my research).
First, I declared a number of functions to write certain portions of each address, with the intention of reusing common functions (chunks of text). Then, where I wanted the address to render to the page, I just called the functions in sequence.
This is the first part, the function definitions. I placed this in the <head> portion of the page, but it just as easily could have been in a separate .js file.
<script>
function hrf() {document.write("<a href=mailto:")}
function d() {document.write("@austin")}
function d1() {document.write("mash.com")}
function d2() {document.write(">")}
function erwin() {document.write("erwin")}
function press() {document.write("press")}
function sales() {document.write("sales")}
function h() {document.write("help")}
function s() {document.write("submit")}
function payments() {document.write("payments")}
</script>
Then, each time I need an address on the page, I do something like:
<p>Feel free to send me a note at:
<script>hrf();erwin();d();d1();d2();erwin();d();d1()</script></a>
Here is another part of the page. Note that I am reusing most of the same functions:
<p>If you have any difficulty buying pixels, please use this address:
<script>hrf();sales();d();d1();d2();sales();d();d1()</script></a>
If you want to see the end result, just see my Contact page.
If at some point I create a new page that also has e-mail addresses, I will probably go ahead and put the function definitions in an external file that both pages can access, and will probably re-name the functions to make them less ambiguous and less likely to collide with other functions of the same name. Also, I may go ahead and combine d() and d1() – the badbot already knows my domain name, so there’s probably little gained by trying to hide it.
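One way that clean-up could go is sketched below. The AustinMashMail name and its methods are hypothetical, invented here only to show the idea of a single namespaced helper in an external file; the domain is still kept in two pieces, as d() and d1() do above.

```javascript
// Hypothetical refactor: one namespaced helper instead of many short global
// functions. The AustinMashMail name and its methods are illustrative only.
const AustinMashMail = {
  // Domain kept in two pieces, mirroring d() and d1().
  domainParts: ["austin", "mash.com"],

  // Build the full anchor markup for a given mailbox name.
  linkMarkup(user) {
    const address = user + "@" + this.domainParts.join("");
    return '<a href="mailto:' + address + '">' + address + "</a>";
  },

  // Write the link into the page where the script runs (browser only).
  write(user) {
    document.write(this.linkMarkup(user));
  },
};

console.log(AustinMashMail.linkMarkup("sales"));
// <a href="mailto:sales@austinmash.com">sales@austinmash.com</a>
```

On the page, each address would then be a single call such as <script>AustinMashMail.write("erwin")</script>. The sketch also quotes the href value and writes the closing </a> itself, so nothing after the address ends up inside the link.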
The biggest drawback to this approach is that it won’t work for visitors that don’t have Javascript, or have it turned off. This was not a big issue for me, because without Javascript, the whole site is pretty useless and probably impossible to read anyway. This is a conscious decision on my part, and I’m ok with it. I mean, Disney can spend millions of dollars on Disney World and have some of the greatest rides in the world, but they can’t do anything about someone who refuses to get on because it may mess up their hair, or whatever. Hey – it’s their loss, right?
Oh OK - for the non-scripters, I added a <noscript> tag that says my address is erwin (at) austinmash (dot) com. There you go, you sit here and watch everyone else have fun.
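If the fallback text ever had to be produced for several addresses, it could be generated ahead of time rather than typed by hand. The sketch below assumes some server-side or build-time JavaScript (it cannot run in the visitor's browser, since the whole point is that scripting is off there), and both function names are hypothetical.

```javascript
// Hypothetical helper: build the human-readable fallback for <noscript>.
// Meant to run server-side or at build time, not in the visitor's browser.
function obfuscatedText(user, domain) {
  return user + " (at) " + domain.split(".").join(" (dot) ");
}

function noscriptMarkup(user, domain) {
  return "<noscript>" + obfuscatedText(user, domain) + "</noscript>";
}

console.log(noscriptMarkup("erwin", "austinmash.com"));
// <noscript>erwin (at) austinmash (dot) com</noscript>
```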
I don’t know how long it takes for an exposed computer to be eaten up by the piranhas of the internet, but I do know a new web site will start to get nibbles in a few weeks. I hope my new boots are thick enough for the coming onslaught.





