Hi all...

A few weeks ago someone wrote that, to stop the bots from getting your email address off your web site, you should use symbols instead of letters, i.e., "&uuml;" means the German ü. Anyway, can someone tell me where to find a complete list of these symbols, or at least the ABCs?

Thanks,
JIM

-- 
Jim Hatridge
Linux User #88484
------------------------------------------------------
WartHog Bulletin
Info about new German Stamps
http://www.fuzzybunnymilitia.org/~hatridge/bulletin

Viel Feind -- Viel Ehr' (Many enemies, much honor)
Anti-US Propaganda stamp collection
http://www.fuzzybunnymilitia.org/~hatridge/collection
Jim,

On Friday 12 November 2004 09:00, James Hatridge wrote:
> Hi all...
> A few weeks ago someone wrote that, to stop the bots from getting your email address off your web site, you should use symbols instead of letters, i.e., "&uuml;" means the German ü.
I'm not sure how helpful that technique really is, since HTML character entities (the proper term for these things) are designed for machine processing.
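A minimal Python sketch of why entities alone don't hide much: the standard library decodes them in one call, after which an ordinary address pattern matches. The page fragment, the address, and the regular expression are made-up placeholders, not taken from any real harvester.

```python
import html
import re

# Hypothetical page fragment: the address is written entirely as numeric
# character references (&#106; is "j", &#105; is "i", &#109; is "m", &#64; is "@").
page = '<a href="mailto:&#106;&#105;&#109;&#64;example.com">mail me</a>'

# A harvester only has to decode the entities first...
decoded = html.unescape(page)

# ...and then apply an ordinary email-address pattern to the result.
found = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", decoded)
print(found)
```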
> Anyway, can someone tell me where to find a complete list of these symbols, or at least the ABCs?
Anyway, this is the horse's mouth: <URL: http://www.w3.org/TR/REC-html40/sgml/entities.html>, but as a standard specification it isn't the most accessible. This one is a little more straightforward to use as an ordinary reference: <URL: http://www.w3schools.com/html/html_entitiesref.asp>. Most comprehensive books on HTML, e.g., O'Reilly's "HTML -- The Definitive Guide" or their "HTML Pocket Reference", include lists of character entities.
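For anyone who would rather pull the table up programmatically, Python's standard library ships the HTML 4 named-entity list as `html.entities.name2codepoint`. Plain letters have no named entities, so the ABCs are written as numeric character references: `&#97;` through `&#122;` for a to z and `&#65;` through `&#90;` for A to Z. A short sketch that prints both (code points and Unicode names are shown instead of the characters themselves, so it runs on any terminal):

```python
import string
import unicodedata
from html.entities import name2codepoint

# Every HTML 4 named entity, sorted, with its code point and Unicode name.
for name in sorted(name2codepoint):
    cp = name2codepoint[name]
    print(f"&{name};  U+{cp:04X}  {unicodedata.name(chr(cp), '?')}")

# The ABCs have no names, so they are written as numeric references.
for letter in string.ascii_lowercase + string.ascii_uppercase:
    print(f"{letter} = &#{ord(letter)};")
```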
> Thanks,
> JIM
Randall Schulz
Randall R Schulz wrote:
> I'm not sure how helpful that technique really is, since HTML character entities (the proper term for these things) are designed for machine processing.
It seems to work reasonably well. I had a couple of email addresses for testing, and the one in clear text got a hell of a lot more spam than the encoded one. However... the smarter bots will look for encoded mailto: links. So it helps to mix encoded and non-encoded text together, just to make it difficult, e.g.: mailto:someone@somewhere.com

jalal
-- 
GPG fingerprint = 3D45 5509 D380 26A4 523E A9D8 A66A 5F38 CA43 BB0E
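A minimal Python sketch of the "mix encoded and non-encoded text" idea: each character is left as-is or turned into a decimal or hexadecimal numeric reference at random, while "@", ":" and "." are always encoded so the raw HTML never contains the give-away characters. The function name `obfuscate`, its `always` parameter, and the fixed seeds are illustrative assumptions; a harvester that decodes entities before matching still recovers the address.

```python
import html
import random


def obfuscate(text: str, seed: int = 0, always: str = "@:.") -> str:
    """Randomly mix plain characters with &#NN; and &#xNN; references.

    Characters listed in `always` are never emitted in plain form.
    The seed only makes the output repeatable.
    """
    rng = random.Random(seed)
    out = []
    for ch in text:
        style = rng.choice(("plain", "dec", "hex"))
        if style == "plain" and ch not in always:
            out.append(ch)                   # leave the character as-is
        elif style == "hex":
            out.append(f"&#x{ord(ch):x};")   # hexadecimal reference
        else:
            out.append(f"&#{ord(ch)};")      # decimal reference
    return "".join(out)


address = "someone@somewhere.com"
href = obfuscate("mailto:" + address, seed=42)
text = obfuscate(address, seed=7)
link = f'<a href="{href}">{text}</a>'
print(link)
```

A browser renders the link normally, because both reference forms decode to the same characters.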
Jalal,

On Friday 12 November 2004 13:41, jalal wrote:
> It seems to work reasonably well. I had a couple of email addresses for testing, and the one in clear text got a hell of a lot more spam than the encoded one. However... the smarter bots will look for encoded mailto: links. So it helps to mix encoded and non-encoded text together, just to make it difficult, e.g.: mailto:someone@somewhere.com
But if you really want to shut out the bots, create an image file (GIF, PNG, JPEG, etc.) that displays the email address. If you want to be even more sure, include some graphical obfuscation that a person can easily disregard but that will confuse OCR software. I just signed up for online payment of my telephone bill, and one step of the process requires the user to read a series of alphanumeric characters in an image and enter them in a text box. The original image from which the human user must transcribe those letters and numbers has the appearance of yellow graph paper. I take it that is something OCR software has great difficulty dealing with.
Randall Schulz
On Friday 12 November 2004 05:10 pm, Randall R Schulz wrote:
> But if you really want to shut out the bots, create an image file (GIF, PNG, JPEG, etc.) that displays the email address. If you want to be even more sure, include some graphical obfuscation that a person can easily disregard but that will confuse OCR software. I just signed up for online payment of my telephone bill, and one step of the process requires the user to read a series of alphanumeric characters in an image and enter them in a text box. The original image from which the human user must transcribe those letters and numbers has the appearance of yellow graph paper. I take it that is something OCR software has great difficulty dealing with.
An easy way to make your email address unreadable on a web page is to use: http://automaticlabs.com/products/enkoderform/
Bruce,

On Friday 12 November 2004 14:51, Bruce Marshall wrote:
> ...
> An easy way to make your email address unreadable on a web page is to use:
Interesting. I hope you trust them to handle your email address, 'cause that's what I used to test it. Nonetheless, all the bot needs is an embedded JavaScript interpreter to thwart that scheme. It's still not as robust as something optical. RRS
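The encoder Bruce linked to is an external tool whose exact scheme isn't described here. The Python sketch below only shows the simplest variant of the general idea, assuming a `<script>` element that rebuilds the link from character codes at page-load time. The function name `js_mailto` is hypothetical. It also illustrates Randall's point: the address never appears in the HTML source, but anything that actually runs the JavaScript sees the finished link.

```python
def js_mailto(address: str, label: str) -> str:
    """Return a <script> element that writes a mailto: link when the page loads.

    The address only appears as a list of character codes, so a plain
    text scan of the HTML source finds nothing resembling an address.
    `label` is assumed to be plain text with no HTML special characters.
    """
    anchor = f'<a href="mailto:{address}">{label}</a>'
    codes = ",".join(str(ord(ch)) for ch in anchor)
    return (
        '<script type="text/javascript">\n'
        f"document.write(String.fromCharCode({codes}));\n"
        "</script>"
    )


snippet = js_mailto("someone@somewhere.com", "Email me")
print(snippet)
```

Visitors with JavaScript disabled see nothing at all, which is the usual trade-off of this approach.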
On Saturday 13 November 2004 00:58, Randall R Schulz wrote:
> Interesting. I hope you trust them to handle your email address, 'cause that's what I used to test it.
Anyone who posts to this list with their real address will have it in dozens of spam databases already, so 'trusting' one more is a little moot, no?
> Nonetheless, all the bot needs is an embedded JavaScript interpreter to thwart that scheme. It's still not as robust as something optical.
Something as simple as inserting a nonsense word in the address and giving instructions to remove it to get the real address will more than likely be enough to beat any bot. I find it difficult to believe that spammers would be sophisticated enough to write web crawlers with advanced parsers.
Anders wrote regarding 'Re: [SLE] html symbols question ..' on Fri, Nov 12 at 18:48:
> On Saturday 13 November 2004 00:58, Randall R Schulz wrote:
> [...]
> > Nonetheless, all the bot needs is an embedded JavaScript interpreter to thwart that scheme. It's still not as robust as something optical.
> Something as simple as inserting a nonsense word in the address and giving instructions to remove it to get the real address will more than likely be enough to beat any bot. I find it difficult to believe that spammers would be sophisticated enough to write web crawlers with advanced parsers.
Yup, there's that whole return-on-investment thing to worry about. If you're scouring the web looking for email addresses, are you gonna want to download linked JavaScript files and waste processor time parsing JavaScript to see if anything that looks like an email address gets document.write()'d, etc.? Probably not.

I'm partial to adding a removethis subdomain to addresses, myself. It's pretty easy to write an output filter (with Apache) that takes any email address and puts "removethis." after the @ when the part after the @ is one of our email domains. That has the added benefit of allowing the rejection of any message sent to removethis.mydomain, which rejects the message for all recipients when there are multiple recipients specified (using my MTA's setup, anyway).

--Danny
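Danny's actual Apache filter and MTA configuration aren't shown, so this is only a Python sketch of the two rules he describes: rewrite our own addresses to user@removethis.ourdomain in outgoing pages, and refuse any message with a recipient at a removethis. subdomain. The domain set `OUR_DOMAINS`, the regular expression, and both function names are hypothetical placeholders.

```python
import re

# Hypothetical set of domains we receive mail for.
OUR_DOMAINS = {"example.com", "example.org"}

ADDRESS = re.compile(r"([\w.+-]+)@([\w-]+(?:\.[\w-]+)+)")
MARKER = "removethis."


def add_removethis(page: str, domains=OUR_DOMAINS) -> str:
    """Rewrite user@ourdomain to user@removethis.ourdomain in page output."""
    def fix(m):
        local, domain = m.group(1), m.group(2)
        if domain.lower() in domains:
            return f"{local}@{MARKER}{domain}"
        return m.group(0)  # someone else's address, or already rewritten
    return ADDRESS.sub(fix, page)


def should_reject(recipients, domains=OUR_DOMAINS) -> bool:
    """True if any recipient is at removethis.<one of our domains>.

    Returning True for the whole list mirrors rejecting the message
    for all recipients, as Danny's MTA does.
    """
    for rcpt in recipients:
        domain = rcpt.rsplit("@", 1)[-1].lower()
        if domain.startswith(MARKER) and domain[len(MARKER):] in domains:
            return True
    return False


page = "<p>Write to danny@example.com or bob@elsewhere.net</p>"
print(add_removethis(page))
print(should_reject(["danny@removethis.example.com", "bob@elsewhere.net"]))
```

Already-rewritten addresses are left untouched because removethis.example.com is not itself in the domain set, so running the filter twice is harmless.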
James Hatridge wrote:
> A few weeks ago someone wrote that, to stop the bots from getting your email address off your web site, you should use symbols instead of letters, i.e., "&uuml;" means the German ü.
> Anyway, can someone tell me where to find a complete list of these symbols, or at least the ABCs?
On the DVD/CDs there's selfhtml, that great and comprehensive documentation about HTML, CSS, XML, JavaScript, Dynamic HTML, CGI/Perl, PHP and more. It is in German, though. Once you have it installed, go from the index page (which is then located at /usr/share/doc/selfhtml/index.htm) to HTML/XHTML --> HTML-Referenz (HTML reference) --> HTML-Zeichenreferenz (HTML character reference). BTW, selfhtml is also obtainable from www.selfhtml.org. The CSS part has already been translated into English, with more to come.

S.H.
Participants (7):
- Anders Johansson
- Bruce Marshall
- Danny Sauer
- jalal
- James Hatridge
- Randall R Schulz
- Sjoerd Hiemstra