Coder Social home page Coder Social logo

banbuilder's Introduction

Flattr this git repo

Banbuilder

Banbuilder is a PHP function and bad word database for profanity filtering. The PHP script uses regex to intelligently look for "leetspeak"-style numeric or symbol replacements.

The databases of profanity are located in the lang/*.wordlist-regex.php files, and are in a simple PHP array. So far we have translations for:

  • US English
  • British English
  • Spain Spanish
  • France French
  • Korean (South)
  • Netherlands Dutch
  • Norwegian (Bokmål & various dialects)

We are actively looking for translation files!

Usage

Simply require the database file and the function file, and invoke the function:

 $badwords = array();
 include('lang/en-us.wordlist-regex.php');
 include('censor.function.php');
 $censored = censorString($input, $badwords);

To use multiple language dictionaries, just include them.

 $badwords = array();
 include('lang/en-us.wordlist-regex.php');
 include('lang/fr.wordlist-regex.php');
 include('lang/es.wordlist-regex.php');
 include('censor.function.php');
 $censored = censorString($input, $badwords);

Make sure you set the $badwords = array(); before including the dictionaries.

You end up with an array called $censored. You can access the original, uncensored string as $censored['orig'] and the newly censored string as $censored['clean'].

There is an optional string parameter that you can pass to use a different replacement character. For example:

 $censored = censorString($input, $badwords,'X');

Will replace "bitch" to "XXXXX" instead of the default "*****".

If the optional parameter is more than 1 character long, a bad words will be replaced by random strings which are composed by characters taken from the string. For example:

 $censored = censorString($input, $badwords,'X!°#%$@');

Could replace "bitch" to something like "%#@#@" or "%@!X#".

Summary

In a nutshell, this code takes your array of bad words and compares it to an array of common filter-evasion tactics. It then does a string replacement to insert regex parameters into your badwords array, and then evaluates your input string to that expanded banned word list.

So in your bad words array, you might have:

 [0] => 'ass'

The preg_replace functions replace all of the possible shenaningan letters with regex patterns (in lieu of adding the variants onto the end of the array), so the 'ass' in your array gets turned into this, right before the preg_replace checks for matches:

 [0] => /(a|a\.|a\-|4|@|Á|á|À|Â|à|Â|â|Ä|ä|Ã|ã|Å|å|α)(s|s\.|s\-|5|\$|§)(s|s\.|s\-|5|\$|§)/i

This means that a word can have none, one or any variety of leet replacements and it will still trip the trigger. Part of the leet filter includes stripping out letter-dash and letter-dots.

This means that the following all evaluate to the "bitch":

  • B1tch
  • bi7tch
  • b.i.t.c.h.
  • b-i-t-c-h
  • b.1.t.c.h.
  • ßitch
  • and so on....

Legacy Database

When this project was first started, it was used to compile a database of swear words in every permutation for scenarios where regex wasn't possible for whatever reason). While the regex method is much better, the legacy full (non-regexy) databases contain over 800 words ready to use in banned words lists for projects. You can find those badword lists in the word-dbs directory in the repo.

These files are extra and are not required for the PHP function to run.

Current file types are:

  • SQL
  • CSV
  • LaTex
  • YML
  • XML
  • PHP Array

banbuilder's People

Contributors

ad7six avatar jegtnes avatar michelebertoli avatar rtxanson avatar ryoshu avatar snipe avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.