TrackBack spam solved

Following up on the post I just made about the BotBlock plugin, I thought I should mention that the reason I'm getting comment spam is that I turned off the DisableComments plugin on LinLog. The reason I turned it off was to test out the TrackbackValidator plugin that comes standard with LnBlog 0.8.0.

The DisableComments plugin allows you to automatically turn off replies (TrackBacks and comments) on entries older than a given number of days. Since my big problem was TrackBack spam, and it was mostly on entries that were more than a month old, I "fixed" the problem by simply setting DisableComments to disable replies after 30 days. On the up side, this stopped the flood of TrackBack spam. On the down side, it stopped all legitimate replies too.
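
The age cutoff described above boils down to a single date comparison. Here is a minimal sketch in Python, used purely for illustration; the function name and the 30-day default are hypothetical and this is not the plugin's actual code.

```python
from datetime import datetime, timedelta


def replies_allowed(posted: datetime, now: datetime, max_age_days: int = 30) -> bool:
    """Return True while the entry is still young enough to accept replies."""
    # Entries older than max_age_days stop accepting comments and TrackBacks.
    return now - posted <= timedelta(days=max_age_days)
```

The trade-off is visible right in the comparison: it looks only at the entry's age, so a legitimate reply to an old entry is refused just like a spam ping.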

Well, it turns out that I don't really need the DisableComments plugin as much anymore. Happily, the new TrackbackValidator plugin, which only allows TrackBacks from URLs that actually link to you, has completely solved my TrackBack spam problem. My server access logs still show lots of TrackBack pings, but not a single spam ping has gotten through.

The only down side is that now I need to worry about comment spam on old entries.

BotBlock plugin

I added a new plugin to the plugins page today. It's called BotBlock, and it's just a simple attempt to keep robots from posting comments.

I wrote this because, for the past several days, I've been getting a lot of comment spam. The messages were coming in groups of two to six at a time, had varying content, and came from varying IP addresses. However, the general format of all the posts was the same (short fake greeting, followed by lines of URLs and two- or three-word descriptions) and they all targeted the same blog entry. So obviously these were either being posted by a robot or a very stupid human.

Thus I implemented this stop-gap solution. Basically, it just adds a hidden field to the comment form that contains a hash value based on your LnBlog configuration and the client IP address. When the client submits a comment, it checks this hash. If it's either missing or doesn't match the calculated value, the comment is rejected.
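
To make the idea concrete, here is a rough sketch in Python (for illustration only, not the plugin's code). It assumes the "LnBlog configuration" can be reduced to a single secret string, and it uses HMAC-SHA256 as the hash, which is a choice made for this example; the post does not say which hash function BotBlock uses. The names form_token and comment_allowed are hypothetical.

```python
import hashlib
import hmac
from typing import Optional


def form_token(site_secret: str, client_ip: str) -> str:
    """Value for the hidden form field: a keyed hash of the client IP."""
    # site_secret stands in for whatever per-install configuration is hashed (assumption).
    return hmac.new(site_secret.encode(), client_ip.encode(), hashlib.sha256).hexdigest()


def comment_allowed(site_secret: str, client_ip: str, submitted: Optional[str]) -> bool:
    """Reject the comment if the hidden field is missing or does not match."""
    if not submitted:
        return False
    expected = form_token(site_secret, client_ip)
    # Compare as bytes in constant time; a mismatch means the form was not
    # fetched from this site by this client.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

A bot that posts straight to the comment handler without loading the form has no token to send, and a token fetched from one address will not validate from another.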

Of course, this depends on the bot being relatively stupid and the spammer not being motivated enough to figure out your specific configuration. A determined spammer could bypass this protection without too much effort, which is why I call this a stop-gap. However, for small-time blogs like mine, which aren't worth the effort to crack, this solves the immediate problem.

LnBlog 0.8.0, "No Need for TrackBack"

Well, it's finally here: LnBlog 0.8.0 is now available. You can grab it here or go to the download page.

Several files have been removed in this release, so to upgrade, you should upload the new directory to your server and then copy/move your old userdata folder into it. Before that, though, you might want to take a look at the new system.ini file, just to see what the new options are. That part is optional, though.

Lots of changes and lots of bug fixes in this release. I won't bore you with all (or even most) of the specifics. That's what the changelog is for. However, for this version, I did do some extra testing and even documented the outstanding problems in this release. Fortunately, most of them are relatively small.

There are bunches of new features. First, LnBlog now has support for Pingbacks. You can turn on and off both the sending and receiving of Pingbacks on a per-entry basis. Note that there is an AllowLocalPingback setting in the entryconfig section of the system.ini file. If you set this to 0, then LnBlog will not send Pingback pings to entries on your own blog.

Second, there's a new standard plugin: the TrackbackValidator. Basically, this checks the URLs of incoming TrackBack pings to see if they link to your blog. This works on the principle that legitimate TrackBacks almost always link to you, but TrackBack spam almost never does. So far, it seems to have completely eliminated my TrackBack spam problem.

Third, there's now a standard profile.ini file. This adds a custom "contact me" link field for your profile. This field takes an HTML link as its input. If this is given, then your e-mail address will not be displayed in your profile. You can use this with the ContactForm plugin.

Fourth, LnBlog now has simple Podcast support. Basically, this means you can add an enclosure URL to your entries and it will be included as an RSS enclosure in the RSS 2.0 feed (if you have one). You can either enter the RSS attributes directly, specifying url="http://somehost/file" length="12345" type="audio/mpeg", or, if the file is on the same server as LnBlog, you can give the URL and let LnBlog compute the file size and MIME type. Note that you can also use LBCode-style relative URIs, giving only the name for files in the entry directory, or a path relative to the blog root.
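
For a file on the same server, computing the two missing attributes is straightforward. The following Python sketch (illustrative only, not LnBlog's code) shows one way to do it. The function name enclosure_tag is hypothetical, the caller is assumed to have already mapped the URL to a local file path, and the fallback MIME type is a choice made for this example.

```python
import mimetypes
import os
from xml.sax.saxutils import quoteattr


def enclosure_tag(url: str, local_path: str) -> str:
    """Build an RSS 2.0 <enclosure> element for a file stored on this server."""
    length = os.path.getsize(local_path)             # file size in bytes
    mime_type, _ = mimetypes.guess_type(local_path)  # guessed from the file extension
    if mime_type is None:
        mime_type = "application/octet-stream"       # assumed fallback for unknown types
    # quoteattr() adds the surrounding quotes and escapes special characters.
    return "<enclosure url=%s length=%s type=%s />" % (
        quoteattr(url), quoteattr(str(length)), quoteattr(mime_type))
```

Guessing the type from the extension is cheap but only as good as the file name, which is presumably why entering the attributes by hand remains an option.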

Last, I've reworked the post editor, including a lot more JavaScript. I've condensed the LBCode editor buttons, added a drop-down menu to add topics, and hidden the extra settings in an expandable box. I think it's much easier to use now. Also note that there's an EditorOnBottom setting in the entryconfig section of the system.ini file. Although it's not actually in the default file, the default value is 0, which puts the editor buttons above the text area. Add this setting with the value of 1 to put them below.
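
The "absent means 0" behavior can be sketched as follows. This is a Python illustration that assumes system.ini uses conventional INI syntax (a [entryconfig] section header followed by key = value lines); it says nothing about how LnBlog actually reads the file, and the function name is hypothetical.

```python
import configparser


def editor_on_bottom(ini_text: str) -> bool:
    """Return True only if EditorOnBottom is explicitly set to 1."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # fallback=0 covers both a missing key and a missing section,
    # matching the default of 0 described above.
    return parser.getint("entryconfig", "EditorOnBottom", fallback=0) == 1
```

Under that assumption, a file containing a [entryconfig] section with the line EditorOnBottom = 1 would put the buttons below the text area, and any file without the setting would leave them on top.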

I think that pretty much sums up the big things for this release. As usual, all forms of feedback are welcome. If you have any comments, find any bugs, or whatever, feel free to leave a comment, e-mail me, or whatever.

TrackbackValidator: sneak preview

I read an interesting article on TrackBack spam today, called Taking TrackBack Back (from Spam). The ideas seemed sensible to me, so I banged out a TrackbackValidator plugin to implement it.

The idea behind the paper and the plugin is simple: the sites linked to by TrackBack spam never contain a link back to your blog entry, but nearly every legitimate blog entry that pings you does. Therefore, rather than a fancy content filter, we can eliminate TrackBack spam by simply fetching the URL in the ping and only accepting it if the page actually contains a link to your site.

This plugin implements that idea. It also includes options to white-list all pings coming from your domain and to allow pings that link to files under your entry (e.g. an uploaded file or the comments page) rather than just the permalink. My spam rates have been way up lately, so I'll be uploading this tonight and we'll see how it goes.
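
The check can be sketched in a few lines of Python (an illustration of the idea, not the plugin's source). It assumes the caller has already fetched the pinging page, for example with urllib.request.urlopen, and passes in its HTML. All names and both option flags are hypothetical stand-ins for the options described above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collect the absolute targets of all <a href="..."> tags on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the pinging page's URL.
                    self.links.append(urljoin(self.base_url, value))


def ping_is_valid(ping_url, page_html, entry_url,
                  allow_subpaths=True, whitelist_own_domain=True):
    """Accept a TrackBack only if the pinging page links back to the entry."""
    if whitelist_own_domain and urlparse(ping_url).netloc == urlparse(entry_url).netloc:
        return True
    collector = _LinkCollector(ping_url)
    collector.feed(page_html)
    target = entry_url.rstrip("/")
    for link in collector.links:
        if link.rstrip("/") == target:
            return True   # links to the permalink itself
        if allow_subpaths and link.startswith(target + "/"):
            return True   # links to something under the entry, e.g. the comments page
    return False
```

The cost of this approach is one outbound HTTP request per ping, which is the price paid for not needing a content filter.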

Interesting spam influx

I've had an interesting influx of comment spam lately. You obviously won't find it on this site anymore, because I always delete spam immediately, but here are some excerpts from the comment notification e-mails.

Ive pretty much been doing nothing to speak of. My lifes been bland these days. I havent been up to much. I feel like a void.

My mind is like a fog. Such is life. So it goes. Ive basically been doing nothing to speak of.

I havent been up to anything recently. Such is life. What can I say? Not that it matters. I feel like a void, but pfft. I cant be bothered with anything these days, but its not important.

Sense a pattern there? This particular round of comment spam seems to be targeted at angsty, overly dramatic teenagers with LiveJournal accounts. The comment and homepage link are clearly commercial (I've had several for sites selling fake Rolex watches), but the body text seems specifically crafted to blend in with stereotypical bland, pointless weblog comments. I don't know why, but for some reason this strikes me as really funny. I guess I need to get out more.

Figuring out my comment spam

So I come home after a rotten day, feeling really down, and what do I find in my inbox? Twenty-four (yes, that's 24) e-mail notifications for comments on my blogs: all of them spam. Damned degenerate scumbags. I guess it's time to get serious about implementing a content filter, because these sub-human wastes of perfectly good carbon atoms just won't leave me alone. And I'm getting tired of deleting comments and trackbacks by these walking piles of monkey excrement, so my only choice is to get pro-active.

The thing that really pisses me off about today's hit and run is that it isn't even commercial spam. Oh, I hate the assholes who leave that too, but at least I can understand it. Deleting links to online gambling and loan refinancing sites is unpleasant, but at least the act of posting such links on blogs makes sense: more links = better Google ranking = more money. They're still slightly below flesh-eating bacteria on the scale of human worth, but at least their actions aren't completely incomprehensible.

Today's round of comment spam, however, is different. This isn't the first time I've suffered this type of attack, but it is the first time I ever stopped to analyze it. You see, there were two distinct types of comment. The first makes absolutely no sense to me. It is simply something@skepticats.com posted as the name, subject, and body of the comment. That's it. No links or anything. Just an invalid e-mail address at my domain. Does anybody have any clue what the purpose of such a comment could be? Does it have something to do with gaming e-mail harvesters? That's pretty much all I could think of.

The second type of message is significantly more complicated. Like the previous message type, it contains a random e-mail address at my domain in the subject and body of the comment. However, for the name field, it contains some variation on the following text:
to
Content-Type: multipart/alternative; boundary=912124b723a23f3d33ad518075fc69e8
MIME-Version: 1.0
Subject: carelessly. s no one in the hut, no
bcc: real_address_removed@aol.com

This is a multi-part message in MIME format.

--912124b723a23f3d33ad518075fc69e8
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

strove to compete with the steam packet, the dark smoke from which, like some demon, partly rested upon the vessel, partly
--912124b723a23f3d33ad518075fc69e8--

.
I actually had to look at the raw data files on my server to figure out that this was going in the name field. On the comments page, most of it actually showed up in the body. This seems to be because the comment class expects every field except the body to be one line, because that's the only way to enter it on the form.

I could be wrong, but this appears to be an attempt to piggyback on the comment notification system. Apparently the idea is that by injecting mail headers directly into the name field, they can fool the mailer into thinking they're real headers and sending a copy of the message to the address in the BCC line. Fortunately, it doesn't appear to work. However, I'm still concerned that there's no actual commercial content in the messages. They appear to be just text snippets taken at random from a story of some type. Why would anyone want to send that? Is somebody just using this as a test? What on earth is going on with these messages?
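
A common defence against this kind of header injection is to refuse line breaks in any field that is supposed to be a single line before it goes anywhere near a mail header. The Python sketch below illustrates that general technique; it is not something the post says LnBlog does, and both function names are hypothetical.

```python
import re

_LINE_BREAK = re.compile(r"[\r\n]")


def looks_like_header_injection(value: str) -> bool:
    """A one-line field (name, subject, e-mail) should never contain CR or LF."""
    return bool(_LINE_BREAK.search(value))


def single_line(value: str) -> str:
    """Keep only the first line of a field, dropping any smuggled headers."""
    return _LINE_BREAK.split(value.strip(), maxsplit=1)[0].strip()
```

Applied to the name field quoted above, single_line() would keep just the leading "to" and discard the Content-Type, Subject, and bcc lines.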

One problem down

Well, it looks like the new IP blacklisting plugin has solved my trackback spam problem. All the spam was coming from two or three subnets, so I just banned the entire subnet and I haven't had anything since.

In other news, I implemented a couple of fixes and features today. First, I fixed a problem with links to uploaded files being broken by the new pretty link feature. Second, I fixed that annoying problem where tags weren't preserved in the edit box when previewing an entry.

On the feature front, I added the ability to turn off trackbacks. For now, there's no separate setting for it - it just follows the setting for comments. I also added a plugin that allows you to turn off trackbacks or comments for an entire blog. That should be nice for people like me who never actually get any legitimate trackbacks.

Anyway, I want to do some testing, but I think I'm pretty much done with this version in terms of features. Look for a new release before the end of the week.

Gack! More trackback spam!

Well, it looks like testing out the IP banning plugin for a few days was a good idea, because it didn't work quite as well as I'd have hoped. In other words, I'm still getting trackback spam.

I took an hour or so to rework the plugin. This time I decided to keep it nice and simple. I made sure to remove all extraneous whitespace, separated the per-blog and global IP lists, and did the check with a simple preg_match() call. This has the added benefit that I can now use the same code to ban an entire subnet just by including a star in the IP.
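
The star-as-wildcard matching can be sketched like this, using Python's re module in place of preg_match() for illustration. This is not the plugin's code: the function names are hypothetical, and treating the star as "any run of digits and dots" is an assumption about how the wildcard is meant to behave.

```python
import re


def pattern_to_regex(pattern: str):
    """Turn an entry such as '192.168.*' into a compiled regular expression."""
    # Escape the dots so they match literally, then expand the star.
    escaped = re.escape(pattern.strip())
    return re.compile(escaped.replace(r"\*", r"[0-9.]+"))


def is_banned(ip: str, global_list, blog_list) -> bool:
    """Check an address against the global list and the per-blog list."""
    for pattern in list(global_list) + list(blog_list):
        if not pattern.strip():
            continue  # skip blank lines left over in the list file
        # fullmatch() keeps '10.0.0.1' from also banning '10.0.0.10'.
        if pattern_to_regex(pattern).fullmatch(ip):
            return True
    return False
```

Stripping whitespace before building the pattern matters here, since a stray trailing space or newline in the list would otherwise become part of the expression and silently stop the entry from matching.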

I think I should also include a setting to turn off trackbacks and/or comments altogether. I'll bet there are a lot of people who just don't feel like trying to keep up with the spammers. And at this point, I really can't say I blame them. My site isn't even that popular, and I've gotten nearly 50 trackback spams just since I implemented e-mail notification the other day.

Yay! Spam handling!

Well, now my spam problem is (partially) taken care of. Today I added trackbacks to the IP blacklisting plugin, added a plugin to send e-mail notifications for new trackback pings, and added an interface to delete trackback pings. After I burn them in for a few days, I'll put up a new release.

I would have liked to just release the plugins, but it turns out that isn't possible. You see, I haven't actually touched the trackback code since I first added it. As a result, the trackback class wasn't raising any events for plugins to hook into. In fact, the class didn't even have a delete method. So I had to add all of that. The good news is that now it's taken care of, so hopefully in the future I can just release plugins.

And here I was worried about comment spam

Sigh.... And here I had just finished an IP banning plugin for comment spam and I discover that I'm now getting trackback spam. Pornographic trackback spam, no less. And I didn't even bother to implement an interface for deleting trackbacks, much less blocking them. I guess I'll be doing that tomorrow. My apologies for the oversight.

So it looks like I've got another couple of plugins to implement before the next release. For one, I'll want e-mail notifications of trackbacks, so that I'll actually know when I get trackback spam. Second, I'll have to extend the comment banning to trackbacks. Third, I guess I'll have to start working on that keyword banning idea I was thinking of for comment spam.