jjj333-p / spam-police

A matrix bot to monitor and respond to investment scam spamming across the matrix platform, for example in rooms with a permanently offline admin.

License: GNU Affero General Public License v3.0

JavaScript 99.92% Shell 0.08%

spam-police's Introduction

IMPORTANT !!

This instance of the bot is currently not being maintained. The project is not abandoned, but the bot is being rewritten, and this stable version kinda sucks right now. Proceed with caution and keep an eye on the 2024 rewrite branch. I will post an announcement when it's ready.

Spam Police

A Matrix bot to monitor and respond to investment scam spamming across the Matrix platform, for example in rooms with a permanently offline admin.

Warning

This bot does not support encrypted rooms yet. Encryption has been implemented in the SDK and could likely be added to the bot easily enough. However, I don't currently have the access to the bot account needed to set this up, because I can't set up my desktop workstation right now.

Discussion

Inviting the bot

You can use my instance: @anti-scam:matrix.org, or self-host your own!

To invite it, you can run the command below in #spam-police-bot-cmds:matrix.org.

+join [Alias/ID to room]

By using my instance you agree to everything applicable within https://pain.agency/legal/

Note

If you have problems inviting the bot, make sure the bot can join the room. If the room is invite-only, invite the bot account to the room first. If you still have problems, join our support room.

Note

My instance logs the scams it finds to #jjj-tg-scams:matrix.org

Commands/Usage

Command prefix:

All commands below require the prefix before the command unless specified otherwise. The prefix can be customized on a per-room basis; the bot's display name in the room is then set to [prefix] | Spam Police. The default prefix is +.

Example usage: +uptime to run the uptime command with the default prefix of +
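
As a rough illustration of the prefix rule, the sketch below splits a message body into a command and its arguments. The name parseCommand and its return shape are assumptions made for this example, not the bot's actual code.

```javascript
// Hypothetical helper, not taken from the bot's source.
// Returns { command, args } if `body` starts with `prefix`, otherwise null.
function parseCommand(body, prefix = "+") {
  if (typeof body !== "string" || !body.startsWith(prefix)) return null;

  // Drop the prefix, then split on runs of whitespace.
  const words = body.slice(prefix.length).trim().split(/\s+/);
  if (words[0] === "") return null; // the message was only the prefix

  const [command, ...args] = words;
  return { command: command.toLowerCase(), args };
}

console.log(parseCommand("+uptime"));
console.log(parseCommand("+join #room:example.org"));
console.log(parseCommand("hello")); // null, no prefix
```

A per-room prefix would simply be passed as the second argument.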

Basic commands

  • @anti-scam:matrix.org (no prefix) - pinging the bot will bring up a short introduction as well as a link back here, no need to bookmark this page!
  • uptime - displays the current bot uptime, doubles as a ping command

Essential Usage Commands

  • join [room ID/alias/matrix.to] - works only in #spam-police-bot-cmds:matrix.org. This command makes the bot join the referenced room. If the room is invite only (unsure why you'd use this bot in an invite only room) you'll need to invite the bot's account to the room and then run this command
  • mute - toggles mute mode on and off. In mute mode the bot will only react to detected scam messages, and will not display the warning message.

Moderation Related Commands

  • rules - generates a JSON file of all rules that apply to the room, sorted by banlist. Adding a way to filter for a specific user is planned but not currently a thing.

Note: the following commands require ban permission unless otherwise noted, because that is the power level required to perform these same actions manually (or with a selfbot).

  • followbanlist [add/remove] [room ID/alias/matrix.to] - subscribes or unsubscribes the room to a Mjolnir banlist, and performs mass moderation actions as Mjolnir would.
  • banlist [add/remove] [here / room ID/alias/matrix.to] [target user mxid] <reason> - writes or removes a Mjolnir ban recommendation policy in a "banlist" room with a given reason
    • requires ban permission or permission to write m.policy.rule.user state events, as that is the permission required to perform the same actions by hand or with a selfbot.

Administration Commands for Selfhosters

  • restart - runs process.exit(). The bot only restarts if you have it set to run again upon exit, such as through systemd (this is how I do it on the production system)
  • leave <room ID/alias/matrix.to> - makes the bot leave the specified room, if no room is specified it leaves the current room. Also adds the room to a blacklist to prevent it from being re-added to the room (also gets added if a room mod kicks the bot).
  • unblacklist [room ID/alias/matrix.to] - remove rooms from the administrator blacklist.

Self-hosting

Requirements

Note

The bot is very light at run time and barely uses any resources. However, it sees a large resource spike at startup due to the initial sync. This can cause it to crash on lower-powered systems (like the Hetzner CPX11 VPS) if you have an old sync token. What I do to fix this is run the bot on a higher-powered system like my dev machine (tbh anything with 4+ GB of real RAM should work; 8+ GB of RAM and a quad core should be enough for the spike to be unnoticeable), and then copy the sync token from that bot.json into the bot.json on my VPS. This issue does not occur on the Hetzner CPX21, which I now run. Please let me know or contribute if you know how to make the initial sync less resource intensive.

Instructions

  1. Download the latest stable version located in the branches
    • Stable branches are formatted as stable-vX.X.X-(version-X,-update-x,-patch-X)
    • Downloading as a ZIP and extracting it is recommended
      • Using git: git clone -b <branch> --single-branch https://github.com/archeite/spam-police.git

Note

For a development version, download from the master branch instead of a stable branch. The git command is shown below.

$ git clone -b master --single-branch https://github.com/archeite/spam-police.git

  2. Go into the folder you cloned (cd spam-police), create a directory named db (mkdir -p db), and enter it (cd db)

$ cd spam-police && mkdir -p db && cd db

  3. Copy the example configuration file from examples/login.yaml to db (cp ../examples/login.yaml ./)

  4. Edit the configuration file to your liking

  5. Go back to the root directory (cd ..) and create bot.json (touch bot.json)

Note

You don't need to put anything in bot.json, leave it empty

This appears to be how the bot SDK saves the sync token and stuff.

  6. To install dependencies, run npm install

  7. Start the bot with node index.js or node .

spam-police's People

Contributors

jjj333-p, archeite, dependabot[bot], fd1f, gravax, jokergermany

spam-police's Issues

string distance / fuzzy matching instead of hard substring keyword searching

It could be worthwhile to also implement simple edit-distance-based fuzzy keyword matching, with a configurable allowance for typos. Also, if a message contains too many characters that are not part of valid words in the sentence, that would be a red flag.

Each room is limited to a single language in 99% of the cases, thus posting foreign spam is already a red flag. This is important in the dozens of local language rooms where the indiscriminate English spammer sometimes joins as well. But also, dictionaries exist (see your package manager, or Wiktionary, Wikipedia, etc). Or you could just go through the chat log to collect words and sentences used by non-troll members in the past (=ham) to help discriminate it from unusual content (spam).
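
To make the edit-distance idea concrete, here is a minimal sketch using the standard Levenshtein distance. The function names, the word-splitting rule, and the default allowance of one typo are all assumptions for illustration; nothing here is taken from the bot's current keyword matcher.

```javascript
// Classic dynamic-programming Levenshtein distance between two strings.
function editDistance(a, b) {
  // `prev` is the row for the previous prefix of `a`; it starts as the row
  // for the empty prefix (distance to the first j characters of `b` is j).
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical matcher: true if any word of `message` is within
// `maxTypos` edits of `keyword`.
function fuzzyIncludes(message, keyword, maxTypos = 1) {
  const words = message.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
  return words.some((w) => editDistance(w, keyword.toLowerCase()) <= maxTypos);
}

console.log(fuzzyIncludes("Join my invesment group", "investment")); // true
console.log(fuzzyIncludes("hello world", "investment")); // false
```

A fixed allowance is crude for short keywords, where one edit can turn one common word into another, so a real implementation would probably scale the allowance with keyword length.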

element : after mentions makes bot think its a command

{
  "content": {
    "body": "+ | Spam Police: ",
    "format": "org.matrix.custom.html",
    "formatted_body": "<a href=\"https://matrix.to/#/@anti-scam:matrix.org\">+ | Spam Police</a>: ",
    "msgtype": "m.text"
  },
  "origin_server_ts": 1694056007492,
  "sender": "@ghost:waffle.tech",
  "type": "m.room.message",
  "unsigned": {},
  "event_id": "$_-dwV2AdmEcP2mXFzXRr5yb1YmsJvWHWgNxC57ha3IQ",
  "room_id": "!JglTjYmZcLVE6tFH:pain.agency"
}

I think I need to adjust the parsing so that it checks whether the mention is the only word, not whether the message consists of only the mention. Just the : at the end of the mention throws it off.
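
One possible shape for that check is sketched below. It treats a message as a bare mention even when a trailing ":" and whitespace follow the display name, as in the event above. The function name and parameters are assumptions for this example, not the bot's current parsing.

```javascript
// Hypothetical check, not the bot's current parsing.
// True when the message is nothing but a mention of the bot, allowing the
// trailing ":" (and whitespace) that appears after the mention in the event.
function isBareMention(body, displayName, userId) {
  const stripped = body.trim().replace(/:\s*$/, "").trim();
  return stripped === displayName || stripped === userId;
}

const name = "+ | Spam Police";
const mxid = "@anti-scam:matrix.org";
console.log(isBareMention("+ | Spam Police: ", name, mxid)); // true
console.log(isBareMention("+ | Spam Police: uptime", name, mxid)); // false
```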

creates files that are invalid in windows

I didn't realize this until I tried to run the bot on a Windows "server" I have, but my code creates files with @ in the name, which is invalid on Windows and causes it to crash. Need to fix ASAP.
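
A minimal sketch of one way around this is to map each Matrix ID to a sanitized file name before touching the disk. Windows reserves the characters < > : " / \ | ? * in file names, so the ":" in an ID such as @user:example.org is rejected there; the sketch replaces "@" as well, following the report above. The helper name is hypothetical.

```javascript
// Hypothetical helper: maps a Matrix ID to a name usable on Windows and POSIX.
// Replaces "@", the characters Windows reserves (< > : " / \ | ? *),
// and ASCII control characters with "_".
function toSafeFileName(matrixId) {
  return matrixId.replace(/[@<>:"\/\\|?*\x00-\x1f]/g, "_");
}

console.log(toSafeFileName("@ghost:waffle.tech")); // "_ghost_waffle.tech"
```

Two different IDs can collapse to the same name under this mapping (for example IDs that differ only in a replaced character), so a reversible encoding such as encodeURIComponent may be a safer choice if the original ID has to be recovered from the file name.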

Option to disable mod mentions

We sometimes invite it to rooms that are abandoned. These are very good inputs to the scam detector, but it makes no sense to mention "mods" if there is nobody there. The current workaround is if a mod mutes the whole room completely, which is not ideal.

It might be a good idea to have options to set this via #11

watch device lists

Most spammers are on matrix.org, they always reuse the same session names, and they usually have only a single device. With #57 it is also easier to find more device names.

add help command

bot responds to ping but it should also respond to help. most people try to run help instead of just mentioning the bot

Improve introduction text

The text (#16) should explain that if joining was unwanted, the operator should be contacted (within the support room?) and/or that the bot can be kicked.

But effective abuse control through kicking requires that it also mentions kicking in the control room, and that would be additional complexity.

Only accept commands in the commands room

  • At present, anyone could get your bot banned by manually inviting it to, or commanding it towards, a high number of rooms (or just a few with a trigger-happy moderator)
  • As an easy workaround, only accept bot commands within its designated command room
  • You may leave the room public while it is not abused, but as a benefit, you would see where it is commanded to and by whom (and ban those who abuse this right and/or make the room invite-only later on)

supply size info when uploading files

Currently no size information is supplied, which is within spec but which some clients don't like. Apparently buffer.length is a convenient way to get this information.
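
A minimal sketch of what that could look like: the content of an m.file message with info.size filled in from the buffer. The helper name, the file name, and the mxc:// URL are placeholders for this example; the upload call itself is left out.

```javascript
// Builds the content of an m.room.message event with msgtype m.file.
// `mxcUrl` is whatever the upload step returned; `buffer` is the uploaded data.
// The helper name and default mimetype are assumptions for this sketch.
function buildFileContent(fileName, mxcUrl, buffer, mimetype = "application/json") {
  return {
    msgtype: "m.file",
    body: fileName,
    url: mxcUrl,
    info: {
      mimetype,
      size: buffer.length, // size in bytes for a Node.js Buffer
    },
  };
}

const data = Buffer.from(JSON.stringify({ rules: [] }), "utf8");
console.log(buildFileContent("rules.json", "mxc://example.org/abc123", data));
```

Note that buffer.length is a byte count only for a Buffer; for a string it counts UTF-16 code units, so Buffer.byteLength(str) would be the safer call there.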

things to improve efficiency

  • central state fetching on event?
  • pass state through functions
  • ^ to config fetching from state
  • caching state for multi account?

handle when a room has no main alias

The bot writes "in null" if the room has no primary alias published (but it may have other aliases). Could we perhaps print (all of) its aliases in such cases?
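
A possible fallback order is sketched below: the main alias, then the published alternative aliases, then the room ID. It reads the alias and alt_aliases fields of the room's m.room.canonical_alias state event; the function name is an assumption, and aliases that were never published in that event would need a separate lookup that this sketch does not attempt.

```javascript
// `content` is the content of the room's m.room.canonical_alias state event
// (may be undefined if the event was never sent).
// Hypothetical helper name; not the bot's current code.
function describeRoom(roomId, content) {
  const main = content?.alias;
  if (main) return main;

  const alts = Array.isArray(content?.alt_aliases) ? content.alt_aliases : [];
  if (alts.length > 0) return alts.join(", ");

  return roomId; // last resort, never "null"
}

console.log(describeRoom("!abc:example.org", { alias: "#main:example.org" }));
console.log(describeRoom("!abc:example.org", { alt_aliases: ["#a:example.org", "#b:example.org"] }));
console.log(describeRoom("!abc:example.org", undefined));
```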

sharding / server straddling

  • main hs m.org?
  • send on pain.agency to avoid ratelimit
  • appservice api for deeper hs probing?
  • redaction of soft-failed events? (redaction on not main hs for ratelimit)
    • select * from events where event_id = '$myredactionevent'

Message purge functionality

add feature to be able to fetch and purge x amount of messages or filter them for stuff, similar to discord bots

New bot command to identify the bot

  • At present, it only introduces itself in the initial post
  • Provide an option for the bot to identify itself when queried (ideally by a moderator later on, but at present just by anyone).
  • As a simple implementation, it could always reply when @mentioned.

use curl to check t.me urls

when a telegram link is posted, the bot should use curl to check the name of the group which will help identify the scammers that post "crypto investment groups" links but with no other message to set off the filters
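
A rough sketch of the idea follows, using Node's built-in fetch (available in Node 18 and later) in place of curl. It assumes the t.me preview page exposes the group name in an og:title meta tag with the attributes in the order shown; that is an assumption about Telegram's page markup, not something confirmed here, and all function names are hypothetical.

```javascript
// Finds t.me links in a message body.
function extractTelegramLinks(body) {
  return body.match(/https?:\/\/t\.me\/[^\s<>"')]+/gi) ?? [];
}

// Pulls the og:title value out of an HTML page, if present.
// ASSUMPTION: the tag is written as <meta property="og:title" content="...">.
function extractOgTitle(html) {
  const m = html.match(/<meta\s+property="og:title"\s+content="([^"]*)"/i);
  return m ? m[1] : null;
}

// Fetches a link and returns the page's og:title, or null if unavailable.
async function fetchGroupName(url) {
  const res = await fetch(url);
  if (!res.ok) return null;
  return extractOgTitle(await res.text());
}

console.log(extractTelegramLinks("join https://t.me/example_group now"));
console.log(extractOgTitle('<meta property="og:title" content="Example Group">'));
```

The returned name could then be run through the same keyword filters as a normal message body.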

community management

  • mod room
  • state keys in mod room to identify child rooms
  • child rooms have shared secret in state to confirm, not room id for protection
  • bans from mod room don't identify the mod (maybe send event id, which will be hard to find outside and protect identity but be traceable)
  • protection against state divergence?
  • automated protection
  • banlist short coding

Bot commands

The bot could be controlled by slash commands in a designated room and/or by whoever invited it and/or from whichever room it was sent from there.

!police modmention false
or !police modmention #room:example.com false

race condition upon reacting with "banned" after banlisting a user

If the bot receives confirmations from multiple users at once, it banlists twice and tries to react twice, causing the bot to crash and restart.

Oct 01 14:30:24 snapshot-115842185-debian-2gb-hil-1 snap[721641]: MatrixHttpClient (REQ-64490) {
Oct 01 14:30:24 snapshot-115842185-debian-2gb-hil-1 snap[721641]:   errcode: 'M_DUPLICATE_ANNOTATION',
Oct 01 14:30:24 snapshot-115842185-debian-2gb-hil-1 snap[721641]:   error: "Can't send same reaction twice"
Oct 01 14:30:24 snapshot-115842185-debian-2gb-hil-1 snap[721641]: }
Oct 01 14:30:24 snapshot-115842185-debian-2gb-hil-1 snap[721641]: /home/joseph/spam-police/node_modules/matrix-bot-sdk/lib/http.js:95
Oct 01 14:30:24 snapshot-115842185-debian-2gb-hil-1 snap[721641]:         throw new MatrixError_1.MatrixError(errBody, response.statusCode);
Oct 01 14:30:24 snapshot-115842185-debian-2gb-hil-1 snap[721641]:               ^
Oct 01 14:30:24 snapshot-115842185-debian-2gb-hil-1 snap[721641]: MatrixError: M_DUPLICATE_ANNOTATION: Can't send same reaction twice
Oct 01 14:30:24 snapshot-115842185-debian-2gb-hil-1 snap[721641]:     at doHttpRequest (/home/joseph/spam-police/node_modules/matrix-bot-sdk/lib/http.js:95:15)
Oct 01 14:30:24 snapshot-115842185-debian-2gb-hil-1 snap[721641]:     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Oct 01 14:30:24 snapshot-115842185-debian-2gb-hil-1 snap[721641]:     at async descriptor.value (/home/joseph/spam-police/node_modules/matrix-bot-sdk/lib/metrics/decorators.js:19:32)
Oct 01 14:30:24 snapshot-115842185-debian-2gb-hil-1 snap[721641]:     at async descriptor.value (/home/joseph/spam-police/node_modules/matrix-bot-sdk/lib/metrics/decorators.js:19:32)
Oct 01 14:30:24 snapshot-115842185-debian-2gb-hil-1 snap[721641]:     at async descriptor.value (/home/joseph/spam-police/node_modules/matrix-bot-sdk/lib/metrics/decorators.js:19:32) {
Oct 01 14:30:24 snapshot-115842185-debian-2gb-hil-1 snap[721641]:   body: {
Oct 01 14:30:24 snapshot-115842185-debian-2gb-hil-1 snap[721641]:     errcode: 'M_DUPLICATE_ANNOTATION',
Oct 01 14:30:24 snapshot-115842185-debian-2gb-hil-1 snap[721641]:     error: "Can't send same reaction twice"
Oct 01 14:30:24 snapshot-115842185-debian-2gb-hil-1 snap[721641]:   },
Oct 01 14:30:24 snapshot-115842185-debian-2gb-hil-1 snap[721641]:   statusCode: 400,
Oct 01 14:30:24 snapshot-115842185-debian-2gb-hil-1 snap[721641]:   errcode: 'M_DUPLICATE_ANNOTATION',
Oct 01 14:30:24 snapshot-115842185-debian-2gb-hil-1 snap[721641]:   error: "Can't send same reaction twice",
Oct 01 14:30:24 snapshot-115842185-debian-2gb-hil-1 snap[721641]:   retryAfterMs: undefined
Oct 01 14:30:24 snapshot-115842185-debian-2gb-hil-1 snap[721641]: }
Oct 01 14:30:24 snapshot-115842185-debian-2gb-hil-1 snap[721641]: Node.js v18.18.0
Oct 01 14:30:24 snapshot-115842185-debian-2gb-hil-1 systemd[1]: spam-police.service: Main process exited, code=exited, status=1/FAILURE
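
One way to stop the crash is to treat M_DUPLICATE_ANNOTATION as "already done" rather than letting the rejection escape. The sketch below wraps whatever call sends the reaction; the wrapper name and the stand-in sender are assumptions for this example, and the errcode fields are read the way they appear in the log above.

```javascript
// Hypothetical wrapper. `sendReaction` is any async function that sends the
// reaction (for example a call into the Matrix SDK) and may reject.
// Resolves to true if the reaction was sent, false if it already existed.
async function reactOnce(sendReaction) {
  try {
    await sendReaction();
    return true;
  } catch (err) {
    // The log shows the code on both err.errcode and err.body.errcode.
    const code = err?.errcode ?? err?.body?.errcode;
    if (code === "M_DUPLICATE_ANNOTATION") return false; // already reacted
    throw err; // anything else is still a real failure
  }
}

// Usage with a stand-in sender that fails the way the log shows.
const fakeDuplicate = async () => {
  throw Object.assign(new Error("Can't send same reaction twice"), {
    errcode: "M_DUPLICATE_ANNOTATION",
  });
};
reactOnce(fakeDuplicate).then((sent) => console.log("sent:", sent)); // sent: false
```

This only keeps the process alive; it does not stop the banlist rule from being written twice, which would need a separate guard around the confirmation handler.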

enact moderation upon banlist writing

currently the bot does not do anything when a new banlist rule is written

  • it should go through all rooms, find ones following the banlist and perform that action
  • needn't do anything special for when the bot writes a banlist recommendation, as it will come back as an event in the client.on loop - need to make sure it doesn't filter out events originating from itself
  • only perform action if the member is in the room
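
The selection step in the list above could be sketched as a pure function: given the banlist room a rule was written to and the target user, return the rooms where the bot should act. The shape of the rooms array is hypothetical, chosen only to keep the example self-contained.

```javascript
// Hypothetical data shape:
//   rooms = [{ roomId, followedLists: string[], members: string[] }]
// Returns the IDs of rooms that follow `banlistRoomId` and currently
// contain `targetUserId`.
function roomsToActIn(banlistRoomId, targetUserId, rooms) {
  return rooms
    .filter((r) => r.followedLists.includes(banlistRoomId))
    .filter((r) => r.members.includes(targetUserId))
    .map((r) => r.roomId);
}

const rooms = [
  { roomId: "!a:example.org", followedLists: ["!list:example.org"], members: ["@spammer:example.org"] },
  { roomId: "!b:example.org", followedLists: ["!list:example.org"], members: [] },
  { roomId: "!c:example.org", followedLists: [], members: ["@spammer:example.org"] },
];
console.log(roomsToActIn("!list:example.org", "@spammer:example.org", rooms));
// [ '!a:example.org' ]
```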

Aggregate the rooms of occurrence per abuser

At present, the bot outputs a separate entry for each abuser.

In our ban reason, I mention all major rooms where the abuse happened. If I just blindly copy & paste these warnings, I will keep rewriting the rule for the user, thus losing the past list of abuse for him.

Hence it would be useful if the bot either replied to its previous warning for the same user, or if it edited the past warning to add new rooms there.
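
The bookkeeping this would need is small. The sketch below keeps, per user, the set of rooms where abuse was seen, so an edited warning could list all of them. The class name and the in-memory storage are assumptions for illustration; how the warning message itself gets edited or replied to is left out.

```javascript
// Hypothetical in-memory index: user ID -> set of rooms where abuse was seen.
class AbuseIndex {
  constructor() {
    this.roomsByUser = new Map();
  }

  // Records one sighting and returns every room seen so far for that user, sorted.
  record(userId, room) {
    if (!this.roomsByUser.has(userId)) this.roomsByUser.set(userId, new Set());
    const rooms = this.roomsByUser.get(userId);
    rooms.add(room);
    return [...rooms].sort();
  }
}

const index = new AbuseIndex();
index.record("@spammer:example.org", "#room-a:example.org");
console.log(index.record("@spammer:example.org", "#room-b:example.org"));
// [ '#room-a:example.org', '#room-b:example.org' ]
```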
