mkb2091 / blockconvert

Malware, advert and tracking blacklist

License: MIT License

Shell 0.07% Rust 97.54% CSS 0.04% JavaScript 0.19% TypeScript 2.16%
adblock-list adblocker-lists phishing-sites pihole-blocklists pihole-ads-list pihole-adblocker-list hosts hosts-file ads filterlist

blockconvert's Introduction

BlockConvert

Malware, advert and tracking blocklist which consolidates and improves upon many other blocklists.

What this blocks:

  • Malware/Phishing: Many malware lists are used in building this list, including multiple malware IP lists, which are used to find many more malware domains.

  • Adverts: Adblock syntax is partially supported, so this list is able to extract some advert domains. This list is pretty good at blocking adverts, but using an in-browser adblocker such as uBlock Origin in addition to hosts/DNS blocking is recommended.

  • Trackers: Many tracking domains are extracted from the lists used, including Privacy Badger data files which automatically identify trackers.

  • Coin mining: A few coin mining blocklists are used to stop browser-based coin miners from using your CPU.

Advantages of using this list:

  • Conversion of list types. As well as supporting many common filter list formats, it also supports Privacy Badger data files. Privacy Badger uses algorithms to detect trackers, which allows newly created trackers to be quickly detected and added to this blocklist without a human needing to spot them.

  • Reverse DNS and passive DNS on malware IP addresses. This allows the domains that a malware IP blacklist suggests could be dangerous to be found and blocked, including malware domains that haven't yet been added to other malware domain lists.

  • Use of a whitelist. Using a hosts file doesn't allow whitelisting, and many DNS-based blockers don't have great whitelist support. This list has its own whitelist, as well as using a few others to try to reduce false positives. The whitelist supports "*" in the subdomain and TLD positions, which makes it easy to fix many false positives at once. (If you do find a false positive, that is, a domain that shouldn't be blocked, then please make an issue and I will remove it.)

  • Use of DNS to check if domains still exist. Many lists contain domains that have expired and no longer exist. This makes those lists larger than needed, which wastes bandwidth and space and can slow blocking.
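
The "*" whitelist support mentioned above could work roughly as follows. This is only a minimal sketch, not the project's actual matcher: the function name `wildcard_match` is hypothetical, a leading "*." is assumed to match the bare domain as well as any subdomain, and a trailing ".*" is assumed to stand for a single-label TLD (so multi-label suffixes such as "co.uk" are not covered).

```rust
/// Hypothetical helper: returns true if `domain` matches `pattern`.
/// `pattern` may start with "*." (any subdomain) and/or end with ".*" (any TLD).
fn wildcard_match(pattern: &str, domain: &str) -> bool {
    // Domain names are case-insensitive, so compare in lowercase.
    let pattern = pattern.to_ascii_lowercase();
    let domain = domain.to_ascii_lowercase();

    // Peel off an optional leading "*." wildcard.
    let (any_subdomain, rest) = match pattern.strip_prefix("*.") {
        Some(rest) => (true, rest),
        None => (false, pattern.as_str()),
    };
    // Peel off an optional trailing ".*" wildcard.
    let (any_tld, core) = match rest.strip_suffix(".*") {
        Some(core) => (true, core),
        None => (false, rest),
    };

    // With a TLD wildcard, ignore the last label of the domain
    // (assumption: the TLD is exactly one label).
    let candidate = if any_tld {
        match domain.rsplit_once('.') {
            Some((head, _tld)) => head,
            None => return false,
        }
    } else {
        domain.as_str()
    };

    if any_subdomain {
        // Match the bare name or anything ending in ".<core>".
        candidate == core || candidate.ends_with(&format!(".{}", core))
    } else {
        candidate == core
    }
}
```

For example, `wildcard_match("*.example.com", "ads.example.com")` and `wildcard_match("example.*", "example.org")` both return true, while `wildcard_match("*.example.com", "notexample.com")` returns false because the suffix check includes the separating dot.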

How to use:

  • Pi-hole: Go to the web interface. Log in. Settings -> Blocklists. Copy the domain list URL (Pi-hole currently only supports domain lists) from the Links section below, and paste it into the textbox. Click Save.

  • Blokada: Open Blokada. Click the shield with the black middle which says "{number} in blacklist". Click the plus in the circle at the bottom of the screen. Copy and paste the hosts file link from the Links section. Click Save. WARNING: This list is large and might slow down your phone.

  • uBlock Origin: Click the uBlock Origin logo/uBlock Origin extension. Click Open dashboard (the 3 horizontal lines under the disable uBlock Origin button, on the right). Click Filter lists. Scroll to the bottom, and click Import (in the Custom section). Copy and paste the Adblock-style blocklist link from the Links section below.

Links

Adblock Plus format

Hosts file format

WARNING: Too large for Windows: #87

Domain list

Blocked IP address list

DNS Response Policy Zone (RPZ) format
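
To illustrate how the same blocked domain looks in each of these output formats, here is a small sketch. The `Format` enum and `render` function are hypothetical names, and the exact line layout of the published files is an assumption: the hosts sink address (`0.0.0.0`) and the RPZ action (`CNAME .`, which answers NXDOMAIN) are common conventions rather than something confirmed by this page.

```rust
/// Hypothetical enumeration of the blocklist output formats linked above.
enum Format {
    Adblock,
    Hosts,
    DomainList,
    Rpz,
}

/// Render one blocked domain as a single line in the given format.
fn render(domain: &str, format: &Format) -> String {
    match format {
        // Adblock Plus syntax: "||" anchors at the domain, "^" ends the host part.
        Format::Adblock => format!("||{}^", domain),
        // Hosts file: point the name at a non-routable address (assumed 0.0.0.0).
        Format::Hosts => format!("0.0.0.0 {}", domain),
        // Plain domain list: one domain per line.
        Format::DomainList => domain.to_string(),
        // RPZ zone record: "CNAME ." is the conventional NXDOMAIN action.
        Format::Rpz => format!("{} CNAME .", domain),
    }
}
```

Calling `render("ads.example.com", &Format::Hosts)` yields `0.0.0.0 ads.example.com`, and the Adblock variant yields `||ads.example.com^`.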

As well as generating blocklists, this project also generates whitelists which are used in the process. If you maintain your own blocklist, you may find one of the following whitelists useful:

Whitelisted domains

Whitelisted ABP format

The Process

  1. Download all filterlists whose local copy has expired

  2. Combine and split all the filterlists based on their type. This splits the lines into separate groups: Adblock rules, blocked domains, regexes of blocked domains, allowed domains, regexes of allowed domains, blocked IPs, allowed IPs, blocked subnets, and allowed subnets.

  3. Apply a regex to all the filterlists to extract domains and combine with other domains found via other means.

  4. For each of those domains, use DNS to check if the domain is still active. If the domain is blocked and isn't in the allowed domains list, doesn't match any of the allowed regexes, and isn't allowed by an adblock exception rule, or if one of its CNAMEs/IPs is blocked, then add it to the output.
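
The decision in step 4 can be sketched as below. This is an illustration under stated assumptions, not the project's code: the `Rules` and `DnsResult` types and the `should_block` function are hypothetical, allowed regexes and adblock exception rules are left out (the standard library has no regex support), and the DNS lookup is assumed to have happened already, with `None` meaning the domain no longer resolves. The sketch follows the literal reading of step 4, in which a blocked CNAME or IP puts a domain on the list even if the domain itself is on the allow list.

```rust
use std::collections::HashSet;
use std::net::IpAddr;

/// Hypothetical container for the rule groups produced in step 2.
struct Rules {
    blocked_domains: HashSet<String>,
    allowed_domains: HashSet<String>,
    blocked_ips: HashSet<IpAddr>,
}

/// Hypothetical result of a DNS lookup for one domain.
struct DnsResult {
    cnames: Vec<String>,
    ips: Vec<IpAddr>,
}

/// Decide whether `domain` goes into the output list.
/// `dns` is `None` when the domain did not resolve (treated as expired).
fn should_block(domain: &str, dns: Option<&DnsResult>, rules: &Rules) -> bool {
    // Dead domains are dropped so the output stays small.
    let dns = match dns {
        Some(dns) => dns,
        None => return false,
    };

    // Directly blocked, unless the allow list overrides it.
    let directly_blocked = rules.blocked_domains.contains(domain)
        && !rules.allowed_domains.contains(domain);

    // Blocked because the domain points at something that is blocked.
    let cname_blocked = dns.cnames.iter().any(|c| rules.blocked_domains.contains(c));
    let ip_blocked = dns.ips.iter().any(|ip| rules.blocked_ips.contains(ip));

    directly_blocked || cname_blocked || ip_blocked
}
```

Passing the DNS result in as a parameter keeps the decision logic free of network access, so it can be checked against fixed inputs.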

Sources: Sources

blockconvert's People

Contributors

mkb2091, t145

blockconvert's Issues

static.tacdn.com

Hi there!

Another false-positive found, static.tacdn.com. This is the static CDN provider for tripadvisor.ca, a travel and restaurant guide based in Canada.

Thanks!

mailchi.mp

Hi,
This is a well-known email subscription service. I believe this may be a false positive.

funk.eu

This site contains software downloads
Not malicious (maybe some piracy)

Whitelist Request - substrate.office.com

Microsoft To Do sync is broken if the domain is blocked. Initializing a sync command request results in only this domain being queried. No other domain is utilized after substrate.office.com is whitelisted (such as a tracker being blocked causing a subsequent failure).

I do not have sufficient knowledge if the domain is also utilized for data tracking from other MS based applications.

gcs.sc-cdn.net

gcs.sc-cdn.net is a false positive. This domain belongs to Snapchat's face filter feature. Blocking it breaks a major feature of the app.

us.archive.org

ia600903.us.archive.org
ia600905.us.archive.org
ia601400.us.archive.org
ia601403.us.archive.org
ia601506.us.archive.org
ia800905.us.archive.org
ia801400.us.archive.org
ia801402.us.archive.org
ia801405.us.archive.org
ia801407.us.archive.org
ia801502.us.archive.org
ia801503.us.archive.org
ia801505.us.archive.org
ia801506.us.archive.org
ia801507.us.archive.org
ia801509.us.archive.org

The domains above are on the blocklist. These specific domains provide CDN delivery for file downloads from the Internet Archive's Wayback Machine (archive.org). I believe they are also false positives.

IPv6 in IPv4

Hi,

https://raw.githubusercontent.com/mkb2091/blockconvert/master/output/ip_blocklist.txt
ends with fe80::ff:fe9c:89c5 and no EOL

Thanks
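
A check along the following lines could separate IPv4 entries from IPv6 entries in such a list. It is a minimal sketch using only the Rust standard library; the function name `split_ip_list` is hypothetical and it is not taken from the project. Because it iterates with `lines()`, a missing end-of-line on the last entry does not affect parsing.

```rust
use std::net::IpAddr;

/// Hypothetical helper: split a newline-separated IP list into
/// IPv4 addresses, IPv6 addresses, and lines that are not valid IPs.
fn split_ip_list(text: &str) -> (Vec<IpAddr>, Vec<IpAddr>, Vec<String>) {
    let mut v4 = Vec::new();
    let mut v6 = Vec::new();
    let mut invalid = Vec::new();

    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() {
            continue; // skip blank lines
        }
        match line.parse::<IpAddr>() {
            Ok(ip) if ip.is_ipv4() => v4.push(ip),
            Ok(ip) => v6.push(ip),
            Err(_) => invalid.push(line.to_string()),
        }
    }
    (v4, v6, invalid)
}
```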

False Positives found

I have been using the list for a little bit, I like the idea and it seems like a pretty solid block list but I have run into a few false positives in the last couple days.

ci4.googleusercontent.com
ci5.googleusercontent.com
ogs.google.com
online.jimmyjohns.com
vortex.accuweather.com

The first 3 are related to showing images in emails on gmail.
The 4th is a website for ordering food.
The 5th is for displaying radar/maps in the AccuWeather app.

disqus.com

This is a software platform that allows comments/discussions/replies, and is often built right into news articles.

sfdataservice.microsoft.com

Hi again,

Found a new false positive. The domain is sfdataservice.microsoft.com. Blocking this domain prevents payments through the Microsoft Store in desktop browsers. You can see this occur when attempting a purchase on a webpage like the Xbox Game Pass.

Thank you!

Wildcards in blacklist

Hi,
Correct me if I am wrong, but I believe blacklist.txt is supposed to be domains only
Out of 217244 domains, 5 domains are invalid due to the wildcard *
adservice.google.*
adskeeper.co.*
id.google.*
zzxosget.com*.crisp.chat
zzzmen99.had.su*.m.gxwztv.com
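
Entries like these could be caught with a plain syntactic check before a domain-only file is written. The following is a rough sketch, not the project's validation: the function name `is_plain_domain` is hypothetical, and it assumes that only ASCII letters, digits and hyphens are acceptable in a label (so underscores and non-ASCII names are rejected).

```rust
/// Hypothetical check: true if `s` looks like a plain domain name
/// with no wildcards or other non-hostname characters.
fn is_plain_domain(s: &str) -> bool {
    // Overall length limit for a DNS name in text form.
    if s.is_empty() || s.len() > 253 {
        return false;
    }
    let labels: Vec<&str> = s.split('.').collect();
    // Require at least "name.tld".
    if labels.len() < 2 {
        return false;
    }
    labels.iter().all(|label| {
        !label.is_empty()
            && label.len() <= 63
            && !label.starts_with('-')
            && !label.ends_with('-')
            && label.chars().all(|c| c.is_ascii_alphanumeric() || c == '-')
    })
}
```

With this check, all five entries listed above are rejected because each contains a `*`, while an ordinary name such as `id.google.com` passes.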

False positive found

blog.gab.com
btcpay.gab.com
business.gab.com
code.gab.com
develop.gab.com
develop.pro.gab.com
docker.develop.gab.com
eng-001.develop.gab.com
eng-002.develop.gab.com
gab.com
help.gab.com
invest.gab.com
mailer.gab.com
news.gab.com
not-develop.gab.com
ns1.gab.com
pro.gab.com
share.gab.com
shop.gab.com
trends.gab.com
unsubscribe.gab.com
www.blog.gab.com
www.gab.com
www.news.gab.com

These are safe domains; not sure why they would be blocked.

Query adlists

Hi,
Great tool. I love the idea of using DNS to remove dead domains and reduce my massive 2 billion domains down.
However, I am hoping you can add functionality similar to Pi-hole's, which lets you search through all of your lists and see which list a false-positive domain originates from.
I use this to find lists that have too many false positives, which I then remove.

soundcloud.com

soundcloud.com was recently added to the list.

soundcloud.com appears to be a false positive; however, the other telemetry links in the blocklist do not. Blocking soundcloud.com prevents connecting to soundcloud.com on desktop but, surprisingly, not from the mobile apps on the same network.

xvideos.com

Hi, this domain is on your list... you don't seem to be censoring XHamster or PornHub, so I think this one mistakenly snuck onto your list.
Please remove, as it's not malicious, and AFAIK you aren't meaning to censor!
Thanks

statics.streamable.com

statics.streamable.com is a false positive. With it blocked, the webpage for streamable.com, a low-bloat video hosting platform, will not display.

simply.com false positive

Hi

We acquired simply.com and are working on launching a webhosting project on it. However, the domain is blocked by you due to past advertising activity (before we bought it).

Can you delist the domain from your blocklists?

m.me

Hi,
This domain is simply Facebook Messenger's URL shortening service.

3p.ampproject.net

Hi,
This breaks all amp-powered articles on Google News on Android
Thanks

logincdn.msauth.net

logincdn.msauth.net is a false-positive. It is the content distribution network for Microsoft and Microsoft365 related services. Log-in pages will be affected and will not display content correctly if this domain is blacklisted.

accounts.nvgs.nvidia.com

Needed to log into GeForce Experience or Nvidia services on Nvidia Shield.
This is the accounts subdomain for managing accounts.

bitcoin.comnews.bitcoin.com bitcoin.cowww.bitcoin.com

I was trying to see if you were blocking some of zerodot's other mining domains, like news.bitcoin.com and found:

bitcoin.comnews.bitcoin.com
bitcoin.cowww.bitcoin.com
Pretty sure these are accidentally concatenated.
But, once you fix it, I'd really recommend not blacklisting these domains.

I've opened up issues at zerodot's gitlab, but he doesn't agree with me.
I agree with blocking javascript coinminers (as they are malicious), but when he starts blocking EVERY CRYPTO NEWS WEBSITE EVER, I think he started taking it too far.

Maybe, you want to include ONLY the browser mining, and not the main list. (Main list blocks everything like coingeek and news.bitcoin.com and coinmarketcap and lots of news sites)

discord.com

discord.com recently was added to the list.

Discord is a popular VoIP and chat communication service. I do not believe their webpage contains any analytical tracking for it to be placed on the list.

bit.ly

bit.ly found in blacklist.txt
This is a non-malicious URL shortening service

However, was not found in domains.txt
What is the difference between the 2 files?

tradingview.com

Hello!

Found more recent false-positives in the list. This domain has quite a few subdomains listed, not all of them should be whitelisted, as most of the tradingview.com entries belong on the list.

These domains should be whitelisted as they provide market graphs for stocks:
widgetdata.tradingview.com
s.tradingview.com
s3.tradingview.com

You can see this for example on Investopedia.com

Cheers

lnkd.in

Hi,
This is LinkedIn's URL shortener service.
