I have a script that downloads all the posts from HN and I trained models that can predict based on the title: (1) will an article get > 10 votes, (2) will an article get a ratio of comments/votes > 0.5, and (3) will an article be [dead].
Model (1) sucks (AUC-ROC maybe 0.6), model (2) is better (AUC maybe 0.7), but model (3) got an AUC pushing 0.98, which seemed unreasonably high.
My mental model of "[dead]" was that it happens to articles that get popular but are about politics or some other bad subject. What I found, though, is that HN gets bursts of spam like the one you're experiencing. With the system I had, (i) the same headline would show up [dead] a large number of times and (ii) the same headline would show up in the train, eval, and test data sets, so of course the system got an unreasonably high score for [dead]: it was recognizing titles it had already seen rather than generalizing. That's how I learned that HN gets these spam waves.
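For anyone who wants to avoid the same leak: one way to keep a repeated headline from landing in more than one split is to assign the split from a hash of the normalized title instead of shuffling rows. This is only a rough sketch, not the code I actually ran. The names `normalize_title`, `assign_split` and `split_posts`, the 80/10/10 fractions, and the `{"title": ...}` post shape are all made up for illustration.

```python
import hashlib


def normalize_title(title: str) -> str:
    # Collapse case and whitespace so trivial variants of a spam title match.
    return " ".join(title.lower().split())


def assign_split(title: str, eval_frac: float = 0.1, test_frac: float = 0.1) -> str:
    # Hash the normalized title to a stable number in [0, 1).
    digest = hashlib.sha256(normalize_title(title).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    # The same title always maps to the same bucket, hence the same split.
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + eval_frac:
        return "eval"
    return "train"


def split_posts(posts):
    # posts: iterable of dicts with at least a "title" key (assumed shape).
    splits = {"train": [], "eval": [], "test": []}
    for post in posts:
        splits[assign_split(post["title"])].append(post)
    return splits
```

With this, a headline that gets spammed fifty times ends up fifty times in one split and zero times in the others. It only catches exact repeats after normalization, so spam that varies the wording a little would still leak and would need fuzzier grouping.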
Well, to be fair, the spam blocking is working great, you just chose to turn it off. I saw a prior run a few days ago, where they kept changing tactics, and HN kept reacting and blocking - no idea if that was automatic or if dang was reacting to it, but either way it was a fun day of watching the battle go on.
But as to why HN is a target, you don't need a high percentage of hits to make it worth spamming. Scams are lucrative. If even one in a million viewers actually follows the links and falls for the scam, that will more than cover the cost of spamming links. So they will attack any site where it looks like there is any chance of getting through.
1) SPAMmers can be quite dim - if they were smarter many of them would be doing more useful and more profitable things.
2) The dynamic auto-kill seems to be working the way it is intended to.
3) The current rather prolific idiot may be trying to probe for weaknesses, but is burning rather a lot of sockpuppet accounts and not being smart about the probes... See (1).
Thank you for the rundown. Very interesting to get some ideas of spammer tactics.