Talk:Spam blacklist
- Proposed additions
- Please provide evidence of spamming on several wikis. Spam that only affects a single project should go to that project's local blacklist. Exceptions include malicious domains and URL redirector/shortener services. Please follow this format. Reports can also be submitted through SBRequest.js. Please check back after submitting your report, as there may be questions regarding your request.
- Proposed removals
- Please check our list of requests which repeatedly get declined. Typically, we do not remove domains from the spam blacklist in response to site-owners' requests. Instead, we de-blacklist sites when trusted, high-volume editors request the use of blacklisted links because of their value in support of our projects. Please consider whether requesting whitelisting on a specific wiki for a specific use is more appropriate - that is very often the case.
- Other discussion
- Troubleshooting and problems - If there is an error in the blacklist (i.e. a regex error) which is causing problems, please raise the issue here.
- Discussion - Meta-discussion concerning the operation of the blacklist and related pages, and communication among the spam blacklist team.
- #wikimedia-external-links - Real-time IRC chat for co-ordination of activities related to maintenance of the blacklist.
- Whitelists
There is no global whitelist, so if you are seeking whitelisting of a URL on a particular wiki, please raise the matter on that wiki's MediaWiki talk:Spam-whitelist page, and consider using the template {{edit protected}} or its local equivalent to draw attention to your request.
Please sign your posts with ~~~~ after your comment. This leaves a signature and timestamp so conversations are easier to follow.
Completed requests are marked as {{added}}/{{removed}} or {{declined}}, and are generally archived quickly. Additions and removals are logged · current log 2025/10.
- snippet for logging
- {{sbl-log|29409180#{{subst:anchorencode:SectionNameHere}}}}
SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 3 days and sections whose most recent comment is older than 7 days.
Proposed additions
vinhomesgreenparadisecangio.com
vinanet.vn
trieuxuan.vn
dauthau.net
sunworldfansipan.net
uommam.vn
anhsieuviet.com
sportstatistik.nu
thhp3.vn
bemo-cloud.com
phatgiaodoisong.vn
phincafeviet.com
brainupcoffee.vn
hieuluat.vn
narenca.com.vn
chuyenthanglongdalat.edu.vn
Spam. See w:vi:Wikipedia:Citron/Spam. --Peterxy(talk) 01:02, 3 October 2025 (UTC)
- @Peterxy12:
Added to Spam blacklist. --Dirk Beetstra T C (en: U, T) 07:17, 3 October 2025 (UTC)
parivahansewa.app
External link spam. --Peterxy(talk) 09:33, 6 October 2025 (UTC)
- @Peterxy12: I only see link additions in enwiki where the domain is already blocked. Why does this domain need to go on the global SBL? Count Count (talk) 11:31, 6 October 2025 (UTC)
- Sorry, I'll withdraw this. Request withdrawn. Peterxy (talk) 15:13, 6 October 2025 (UTC)
Proposed removals
encyclopediadramatica.com
has been dead for more than 10 years now (according to archive.org).
So the redirects
encyclopediadramatica.net
encyclopediadramatica.org
are also useless and nobody would try to spam those links.
So I'll remove them from the SBL (but keep the .se version on the list) in the next few days, unless anybody objects.
-- seth (talk) 19:41, 1 October 2025 (UTC)
- @Lustiger seth they were not really added for spamming, but if I recall correctly because the sites were abused (including doxing?). It was quite some time ago; it may have changed. (Note: the site now seems to be edramatica.com??) Dirk Beetstra T C (en: U, T) 10:15, 2 October 2025 (UTC)
- Ok, so edramatica.com seems to be a forum for the website that is down for more than 10 years now.
- If nobody tries to link to the forum at wikipedia, then we don't need to add the forum to the blacklist.
- And the dead websites can be removed from the blacklist anyway, right?
- -- seth (talk) 15:37, 2 October 2025 (UTC)
- It's not down for 10 years, maybe their original domain is. I think this needs a wider discussion; I'm not convinced this is a problem worth solving, also see the discussion below. Dirk Beetstra T C (en: U, T) 22:17, 2 October 2025 (UTC)
Declined. As discussed below, let's do a more general clean-up.
- -- seth (talk) 13:22, 5 October 2025 (UTC)
hometown.aol.co.uk
has been dead for more than 10 years now (according to archive.org).
So I'll remove the entry from the SBL in the next few days, unless anybody objects.
-- seth (talk) 19:44, 1 October 2025 (UTC)
- More of a general answer to these types of removals: sometimes we leave a site on the list because it has been usurped or redirected to crap, or because removal would allow linking to archives from the time it was still alive, so you end up linking to crap that we did not want to link to 10 years ago. Dirk Beetstra T C (en: U, T) 10:18, 2 October 2025 (UTC)
- When a website is taken over by someone else, it is unlikely that the same people will link to it as before. Since we do not block all websites that are potentially spam or otherwise inappropriate, but only those that someone has actually tried to place in Wikipedia, I do not consider it reasonable to block domains on this basis. Otherwise, we could also add to the SBL any domains that have never been linked from our projects but are rubbish, which would cause the SBL to explode.
- The web archive is a valid point. However, we can assume that 99% of all websites that have been dead for a long time will not be used in Wikipedia again.
- -- seth (talk) 15:48, 2 October 2025 (UTC)
- One additional thing to this: usurping old websites is a thing (see the Judy case); once they become free, many dead domains are free to usurp and re-use (or abuse/misuse). Archive sites still have very old data from some of these sites (you say that archive.org has data from more than 10 years ago), but, especially if usurped, people should not be linking to the 'real' link anymore.
The gain in removing a few rules from the blacklist is minimal, and should not be our problem (the servers should be fast enough to assign enough time to it).
So, unless a website is repurposed by a new legitimate owner, there is no use in removing rules covering once-existing websites that are now dead or usurped. Removing those rules from blacklists is, at best, going to give you more work cleaning up when unknowing editors start adding them again as 'originals' for archived websites, so I don't think they are useless. Dirk Beetstra T C (en: U, T) 08:41, 3 October 2025 (UTC)
- Thanks for your assessment. I would like to add that the technical dimension does play a role: the SBL is not only an editorial tool for avoiding unwanted links, but also has a direct impact on server performance. Each additional entry increases the effort required to save each individual page, as the verification mechanisms consume more resources (see the toy sketch below). Especially on large wikis with high activity, this can add up and lead to noticeable delays.
- Against this background, the domain blocklist was also developed with the aim of reducing the load (because it does not require regular expressions). Entries for permanently dead domains that are neither actively abused nor have any realistic chance of legitimate reuse no longer contribute to quality assurance; they just place an unnecessary burden on the system.
- Yes, we have to be careful when it comes to potentially usurped domains, but only if there are attempts to place links. For websites that have been dead for over a decade and show no signs of reactivation or abuse, removal from the SBL seems reasonable. If abuse does occur later, they can be re-added at any time.
- In short: editorial considerations must be supplemented by technical common sense. The SBL is not an archive, but a tool -- and like any tool, it should be maintained regularly.
- -- seth (talk) 09:01, 3 October 2025 (UTC)
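A toy sketch of the cost argument above (illustrative only; this is not the actual SpamBlacklist extension code, and the entry patterns, the single combined regex and the numbers are made-up assumptions): every external link added by an edit is matched against patterns built from the blacklist, so each extra line adds a little work to every page save.

import re

# Toy blacklist; the real global list has roughly 16,000 lines.
blacklist_lines = [
    r"\bvinanet\.vn\b",
    r"\bhometown\.aol\.co\.uk\b",
    # ... thousands more ...
]

# Assumption for illustration: all entries are combined into one alternation
# and every link added by an edit is checked against it on save, so the work
# per save grows with the size of the list.
combined = re.compile("|".join(f"(?:{p})" for p in blacklist_lines), re.IGNORECASE)

def blocked_links(added_urls):
    """Return the added URLs that would trip the (toy) blacklist."""
    return [u for u in added_urls if combined.search(u)]

print(blocked_links(["http://hometown.aol.co.uk/page", "https://example.org/ok"]))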
- I do not fully disagree with that, Seth, and I have for a long time been pushing for an overhaul of the SBL mechanism (but that apparently never had proper priority). At the moment the BED mechanism is very insufficient: it does not allow for whitelisting (which is necessary for many domains), it still brings an unavoidable burden to the servers, and it burdens the maintaining administrators (whitelist requests for material on BED need a move of the rule from BED to SBL, then whitelisting, all with its own paperwork and confusion about where links are blocked (2 SBLs and 2 BEDs)).
Over the last month we have added 40 domains (and we do not blacklist everything that could be blacklisted; COIBot marks reports stale if editors wiped all links, even when there is crap that should just be blacklisted regardless). The main problem is that the system is not scalable enough. Even if you now remove 200 domains, that hardly affects the burden on the system (most of those rules will be the simple regexes, not the time-consuming ones; maybe a 1-2% time gain, while the list grows by multiples of that in a year). And the losses are that some of those domains do return in some form, even if only in good-faith edits, for which we do not have the manpower to follow up. And where do you put the cut-off? AOL is several years gone; encyclopediadramatica is at best a year, maybe still there under a different name (but hey, there was a reason why they were moving TLDs the whole time, and likely there is a reason to now move away from the expected domain name; I would not be surprised if they buy them back, or if the domains get usurped, since many webpages still link to them). My experience with spammers has been that they return (or never really go away) even after decades, and I know what is needed when they do return: one of (the few of) us doing the cleanup.
What could be a good effort is to look at the domains on the SBL that are 'true spam' (no whitelisted rules anywhere), take them off and move them to BED. That is a much larger gain than removing a small percentage of domains from the SBL because we perceive them as 'no risk'. Dirk Beetstra T C (en: U, T) 10:48, 3 October 2025 (UTC)
- Thinking about this, it is even way less. This list is 16000 lines. En.wikipedia has about 6500 lines (de.wiki fewer than 1000), and 1000 whitelist rules (de.wiki fewer than 100). So overall, for en, 22500 lines, making 200 removals less than 1%, and those are likely 'simple' regexes. Dirk Beetstra T C (en: U, T) 12:23, 3 October 2025 (UTC)
- Actually I would not just delete entries for old dead pages. It would be even better to delete all entries that have not triggered (according to the logs) during the last x years (for a reasonable value of x, e.g. 5). That way we would implicitly delete most of the dead-website entries.
- So I'm not talking about just 200 entries or less than 1%.
- 10 years back I already removed all entries from the SBL that had not triggered for about 1.5 years at that time.[1] I deleted ~124 kB at a time when the SBL was 282 kB, i.e. the SBL was reduced by 44%.
- Just now I checked how many of the entries I deleted in 2015 were re-added later, i.e. during the last 10 years, via `grep -Fxf current.txt old.txt | grep -Fvxf old_cleaned.txt` (entries that are in both the old and the current list but not in the cleaned old list; a rough Python equivalent is sketched below). It was even fewer than I expected: 5 entries, i.e. less than 0.1% of the deleted entries.
- Nevertheless, I agree that an additional transfer to the BED makes sense.
- -- seth (talk) 20:56, 4 October 2025 (UTC)
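A minimal Python equivalent of that grep check, as a sketch (file names as in the command above: old.txt is the list before the 2015 clean-up, old_cleaned.txt the list after it, current.txt today's list):

# Which entries removed in the 2015 clean-up are back on the current list?
def entries(path):
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n") for line in f if line.strip()}

old = entries("old.txt")
old_cleaned = entries("old_cleaned.txt")
current = entries("current.txt")

readded = (old - old_cleaned) & current   # removed in 2015, present again now
print(len(readded))
print(sorted(readded)[:10])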
- I'm not sure whether the number of entries re-added to the SBL is a good measure for that; it assumes that we efficiently detect that these sites are still being (ab/mis/re)used. Given that I encounter cases where sites were abused for months without anyone suggesting they be blacklisted, or even just removed, our detection is rather bad. And using redirects or archives of 'bad' sites is something that we also encounter.
Nonetheless, removing rules that haven't hit for a long time is a much better way to clean up. That won't remove (and that is what I disagree with) sites that went dead recently (I have encyclopediadramatica on my watchlist; there is a near-ongoing discussion about which site is the official website of the wiki, and whether the current domain is actually online or offline; if I am correct there has even been back-and-forth on the .es and .se). Some of the more recent ones could then go to BED (though that also doesn't work for encyclopediadramatica, because it has whitelist rules on several wikis).
(Apart from the question of cleanup, I would be interested in a tool that shows exactly when a site last triggered a blacklist entry - it is sometimes a lot of work to show that spammers are actually not 'long gone' after we blacklisted their site x years ago; a couple of months ago I declined a delisting request on en.wikipedia ('this site is not spam') because the site was used - possibly independently, but nonetheless - in cross-wiki campaigns, and in the past I have had delistings result in spammers happily returning shortly after delisting.) Dirk Beetstra T C (en: U, T) 23:48, 4 October 2025 (UTC)
- @Lustiger seth There are probably hundreds of dead sites on the SBL. Why do you want to remove this one? Count Count (talk) 11:10, 2 October 2025 (UTC)
- Yes, there are many of those. When I deleted a lot of duplicate entries in the SBL yesterday, I saw "aol" and wondered whether that website still exists.
- Furthermore, I wanted to see with these two examples (see also the request above) whether there are objections to deleting such old (and now useless) entries.
- In dewiki (and iirc enwiki too), we delete such entries from time to time. The SBL has an impact on the time needed for saving pages. So we should delete every entry that is unneeded.
- -- seth (talk) 15:55, 2 October 2025 (UTC)
- I think we have mentioned several general things which should be talked about in the general #Discussion section. I'll start a thread there soon.
- Concerning this specific AOL domain: the general proposal to remove unused domains from the SBL will also implicitly cover this domain, so there is no need to delete it now.
Declined
- -- seth (talk) 13:27, 5 October 2025 (UTC)
Troubleshooting and problems
Discussion
Tooling / cleaning
In #hometown.aol.co.uk we mentioned several ideas.
- It would be nice to have a tool that shows when an entry (or at least a specific domain or page) was last triggering a blacklist entry.
- As in 2015 [2] (see also Archives/2015-01) we should delete old entries that have not triggered the SBL for x years. (x = 5?)
- It might be reasonable to move simple SBL entries, i.e. plain domains (that are not locally whitelisted), to the global BED (list of blocked external domains). However, Special:BlockedExternalDomains is disabled. So is this an option now anyway?
Concerning 1.: I'm using a script for this, but for every domain it needs ~1000 db requests (one for each wiki). So I'm not sure whether I should put that in a public web interface. -- seth (talk) 14:58, 5 October 2025 (UTC)
- Re 1. The URL for SBL hits is encoded in logging.log_params in a non-indexable way (see e.g. quarry:query/97741). To make that feasible we would need to collect hits in a user db. I have been thinking about doing this for spamcheck for quite a while.
- Re 2. We could, but I don't see why that should be a priority IMHO.
- Re 3. There is no global BlockedExternalDomains, see phab:T401524. Once this is implemented with a way to allow local whitelisting we can move stuff over. Count Count (talk) 15:28, 5 October 2025 (UTC)
- 1. Yes, using the `logging` table in the db is what I do in my script and what I also did in 2015. I'm using the replica db at Toolforge. Using the replica directly, i.e. without a separate db that contains only the needed information, searching all ~1000 WMF wikis takes about 1 or 2 minutes for a given regexp (a rough sketch of such a per-wiki lookup is below).
- 2. I mentioned reasons in the thread above. In short: performance. However, you don't need to do anything. I'd do that.
- 3. "Once it is implemented [...]": I see. So let's skip that for now.
- -- seth (talk) 17:56, 5 October 2025 (UTC)
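To make the "~1000 db requests, one per wiki" concrete, here is a rough sketch of such a lookup on the Toolforge replicas. It is a sketch under assumptions, not seth's actual script: it assumes pymysql, the usual *.analytics.db.svc.wikimedia.cloud replica hosts with ~/replica.my.cnf credentials and the meta_p.wiki list of wikis, and that SBL hits sit in the logging table with log_type 'spamblacklist' and the blocked URL somewhere inside log_params (hence the slow, unindexed LIKE scan mentioned above).

import os
import pymysql

CNF = os.path.expanduser("~/replica.my.cnf")  # standard Toolforge replica credentials

def wiki_dbnames():
    """All open Wikimedia wikis, taken from the meta_p database."""
    conn = pymysql.connect(host="meta.analytics.db.svc.wikimedia.cloud",
                           database="meta_p", read_default_file=CNF)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT dbname FROM wiki WHERE is_closed = 0")
            return [r[0] if isinstance(r[0], str) else r[0].decode() for r in cur.fetchall()]
    finally:
        conn.close()

def last_sbl_hits(dbname, domain, limit=3):
    """Most recent spam-blacklist hits on one wiki whose logged URL mentions `domain`.

    Assumption: hits are logged with log_type = 'spamblacklist' and the URL is
    embedded in log_params, so this is an unindexed LIKE scan (hence slow)."""
    conn = pymysql.connect(host=f"{dbname}.analytics.db.svc.wikimedia.cloud",
                           database=f"{dbname}_p", read_default_file=CNF)
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT log_timestamp, log_title FROM logging"
                " WHERE log_type = 'spamblacklist' AND log_params LIKE %s"
                " ORDER BY log_timestamp DESC LIMIT %s",
                ("%" + domain + "%", limit))
            return cur.fetchall()
    finally:
        conn.close()

# One request per wiki, i.e. roughly a thousand requests for a full cross-wiki check.
for db in wiki_dbnames():
    for ts, title in last_sbl_hits(db, "hometown.aol.co.uk"):
        print(db, ts, title)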
- 1. I've back-parsed the db once (and some of that data is in the offline linkwatcher database); however, that takes a lot of time, and since those are one-off runs it does not stay up to date. A search engine would be nice per-wiki (just looking back for the last n additions, with n defaulting to 2 or 3, looking backward over a chosen timeframe), and one for cross-wiki (with an additional limitation to, say, 'the big 5 wikis' or 'the big 18 wikis + Commons and Wikidata'). For the application I suggested it does not have to find all additions, just the last couple.
2. I agree with the sentiment that it does not have priority; the performance loss is minimal, and I don't feel particularly worried that I will bring the wiki down if I blacklist 100 domains in one go. Cleanup is good, though; it has a couple of advantages in administration as well (the occasional 'this website was spammed 15 years ago, it has now been usurped by another company', editing speed on the lists, easier to find complex rules).
3. BED really needs the whitelist to work on it, otherwise especially a global BED is going to be a pain for local wikis. Dirk Beetstra T C (en: U, T) 06:02, 6 October 2025 (UTC)
- Unfortunately it seems that BlockedExternalDomains hit log entries are not being replicated to the Toolforge replicas. The log entries are just missing there. @Ladsgroup Is that on purpose? Count Count (talk) 07:56, 6 October 2025 (UTC)
- Compare e.g. de:Special:Redirect/logid/140172685 and quarry:/query/97766 Count Count (talk) 07:59, 6 October 2025 (UTC)
- @Count Count Hi. I don't think that's on purpose and fixing it is rather easy. Would you mind creating a phabricator ticket assigning it to me with link to this comment? Thanks Amir (talk) 15:32, 6 October 2025 (UTC)
- @Ladsgroup: Done, see phab:T406562. Thanks for having a look! Count Count (talk) 10:19, 7 October 2025 (UTC)
- Thanks! Amir (talk) 10:27, 7 October 2025 (UTC)
- 1. I wrote a script to fetch all SBL hit data from all WMF wikis since 2020 and write it into an SQLite db (the script needs ~7 minutes). That is not so big (3.4M rows in a 0.7 GB db file) and could be a) updated automatically (e.g. every day or every hour) and b) used in a little web interface to search the data (a minimal sketch of such a store is below). If I automatically delete all data that is older than 5 years, this might even scale. Once the bug Count Count mentioned is fixed, I could add the BED logs.
- -- seth (talk) 22:00, 7 October 2025 (UTC)
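A minimal sketch of the kind of local hit store described above (table and column names are made up for illustration; the actual script and schema may look different): one row per blacklist hit, pruned after five years, searchable by domain, which is roughly what a small web interface would need.

import sqlite3
import time

conn = sqlite3.connect("sbl_hits.sqlite")
conn.execute("""CREATE TABLE IF NOT EXISTS hits (
                    wiki      TEXT,
                    timestamp TEXT,   -- MediaWiki format, e.g. 20251007220000
                    page      TEXT,
                    url       TEXT)""")

def add_hits(rows):
    """rows: iterable of (wiki, timestamp, page, url) tuples collected from the replicas."""
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", rows)
    conn.commit()

def prune(years=5):
    """Drop hits older than `years` so the file stays small enough to update daily."""
    cutoff = time.strftime("%Y%m%d%H%M%S",
                           time.gmtime(time.time() - years * 365 * 86400))
    conn.execute("DELETE FROM hits WHERE timestamp < ?", (cutoff,))
    conn.commit()

def last_hit(domain):
    """Most recent logged hit whose URL mentions `domain` (what a web UI would show)."""
    cur = conn.execute("SELECT wiki, timestamp, page FROM hits"
                       " WHERE url LIKE ? ORDER BY timestamp DESC LIMIT 1",
                       ("%" + domain + "%",))
    return cur.fetchone()

add_hits([("dewiki", "20251001120000", "Some_page", "http://hometown.aol.co.uk/x")])
prune()
print(last_hit("hometown.aol.co.uk"))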