leo-profanity
Profanity filter, based on "Shutterstock" dictionary
Installation
// npm
npm install leo-profanity
// Bower
bower install leo-profanity
// dictionary/default.json
Example usage for npm
var filter = require('leo-profanity');
filter.list()
// return all profanity words (Array.string)
filter.list();
filter.check(string)
See more examples under filter.clean
// output: true
filter.check('I have boob');
filter.clean(string, [replaceKey=*])
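// no bad word
// output: I have 2 eyes
filter.clean('I have 2 eyes');
// normal case
// output: I have ****, etc.
filter.clean('I have boob, etc.');
// mixed case (matching is case-insensitive)
// output: I have ****
filter.clean('I have BoOb');
// separated by comma and dot
// output: I have ****.
filter.clean('I have BoOb.');
// multiple occurrences
// output: I have ****,****, ***, and etc.
filter.clean('I have boob,boob, ass, and etc.');
// should not detect a word embedded in another word
// output: Buy classic watches online
filter.clean('Buy classic watches online');
// clean with a custom replacement character
// output: I have ++++
filter.clean('I have boob', '+');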
filter.add(string|Array.string)
// add a word
filter.add('b00b');
// add an array of words
// duplicates are checked automatically
filter.add(['b00b', 'a55']);
filter.remove(string|Array.string)
// remove a word
filter.remove('b00b');
// remove an array of words
filter.remove(['b00b', 'a55']);
filter.reset()
Reset the word list to the default dictionary (this also removes any words that were added manually)
filter.clearList()
Clear all profanity words
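The list-management methods above can be pictured with a small in-memory model. This is only a sketch: `DEFAULT_WORDS`, `addWords`, `removeWords`, `resetWords` and `clearWords` are hypothetical names chosen for illustration, and the two-word default list is a stand-in, not the library's dictionary or its internals.

```javascript
// Hypothetical in-memory model of the word list (not leo-profanity's code).
var DEFAULT_WORDS = ['boob', 'ass']; // stand-in for the default dictionary
var words = DEFAULT_WORDS.slice();

// Add a word or an array of words, skipping entries already present.
function addWords(input) {
  [].concat(input).forEach(function (word) {
    if (words.indexOf(word) === -1) {
      words.push(word);
    }
  });
}

// Remove a word or an array of words.
function removeWords(input) {
  var toRemove = [].concat(input);
  words = words.filter(function (word) {
    return toRemove.indexOf(word) === -1;
  });
}

// Restore the default list, discarding manual additions.
function resetWords() {
  words = DEFAULT_WORDS.slice();
}

// Empty the list entirely.
function clearWords() {
  words = [];
}

addWords('b00b');
addWords(['b00b', 'a55']); // 'b00b' is not added a second time
console.log(words); // [ 'boob', 'ass', 'b00b', 'a55' ]
```

Accepting either a string or an array through `[].concat(input)` keeps one code path for both call styles.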
Algorithm
This project splits the work into 2 parts, Sanitize and Filter.
The approaches considered for each part are described below.
Sanitize
Attempt 1 (1.1): convert the whole string to lowercase
Advantage:
- simple
Disadvantage:
- none
Attempt 2 (1.2): convert "similar-like" (look-alike) symbols to letters
e.g. convert `@` to `a`, `5` and `$` to `s`
Advantage:
- simple, and detects some disguised words (e.g. @ss, b00b)
Disadvantage:
- "false positive"
- limits user creativity (users cannot play with words)
e.g. joe@ssociallife.com
e.g. a user wants to try something funny like "a$$a$$in"
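A minimal sketch of attempt 1.2 shows both the benefit and the cost. The `LOOKALIKES` table and the `mapLookalikes` function are hypothetical; the `0` to `o` mapping is an assumption inferred from the `b00b` example and is not stated in the text above.

```javascript
// Hypothetical look-alike table; '0' -> 'o' is an assumption for illustration.
var LOOKALIKES = { '@': 'a', '5': 's', '$': 's', '0': 'o' };

// Replace every look-alike symbol with the letter it resembles.
function mapLookalikes(text) {
  return text.replace(/[@5$0]/g, function (symbol) {
    return LOOKALIKES[symbol];
  });
}

console.log(mapLookalikes('@ss'));                 // ass
console.log(mapLookalikes('b00b'));                // boob
console.log(mapLookalikes('joe@ssociallife.com')); // joeassociallife.com
console.log(mapLookalikes('I have 5 apples'));     // I have s apples
```

The last two calls illustrate the disadvantage: ordinary text is altered, and an email address ends up containing a listed word.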
Attempt 3 (1.3): replace `.` and `,` with a space to separate words
People often use `.` and `,` to join or end sentences, so a word can be attached to punctuation
Advantage:
- increases the chance of detection
e.g. I like a55,b00b
Disadvantage:
- none
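Attempts 1.1 and 1.3 together amount to a short normalisation step. The sketch below is an illustration under that reading; `sanitize` is a hypothetical name, not necessarily how the library structures its code.

```javascript
// Sketch of sanitising with attempts 1.1 and 1.3 (hypothetical helper).
function sanitize(text) {
  return text
    .toLowerCase()          // 1.1: ignore letter case
    .replace(/[.,]/g, ' '); // 1.3: treat '.' and ',' as word separators
}

console.log(sanitize('I like a55,b00b')); // i like a55 b00b
```

After this step, `a55` and `b00b` are separate space-delimited tokens, so a list lookup can find them if they are in the word list.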
Filter
Attempt 1 (2.1): split into an array (or use a regex)
Split the text on spaces into an array, then check each item against the profanity word list
Advantage:
- simple
Disadvantage:
- needs a proper word list
- some "false positives"
e.g. Great tit (https://en.wikipedia.org/wiki/Great_tit)
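Attempt 2.1 can be sketched as a whole-word lookup. The `wordList` contents and the `containsProfanity` name are assumptions made for this example only.

```javascript
// Hypothetical sample list, for illustration only.
var wordList = ['boob', 'ass', 'tit'];

// Split on spaces and report whether any whole token is in the list.
function containsProfanity(text) {
  var tokens = text.toLowerCase().split(' ');
  return tokens.some(function (token) {
    return wordList.indexOf(token) !== -1;
  });
}

console.log(containsProfanity('I have boob'));                // true
console.log(containsProfanity('Buy classic watches online')); // false
console.log(containsProfanity('Great tit'));                  // true
```

The last call is the false positive described above: the bird name matches because only whole tokens are compared, with no knowledge of context.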
Attempt 2 (2.2): match words anywhere in the text (with or without spaces)
Detect any run of letters that contains a profanity word (e.g. `thistextisfunnyboobsanda55`)
Advantage:
- simple
- can detect "un-spaced" profanity words
Disadvantage:
- many "false positives"
e.g. http://www.morewords.com/contains/ass/
e.g. the Clbuttic mistake (a filter error that turns "classic" into "clbuttic")
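Substring matching, as in attempt 2.2, is easy to sketch, and the same sketch reproduces the Clbuttic mistake. Both helper names are hypothetical.

```javascript
// Report whether any listed word occurs anywhere inside the text.
function containsAnywhere(text, list) {
  return list.some(function (word) {
    return text.indexOf(word) !== -1;
  });
}

// Replace every occurrence of badWord, even inside longer words.
function naiveReplace(text, badWord, replacement) {
  return text.split(badWord).join(replacement);
}

console.log(containsAnywhere('thistextisfunnyboobsanda55', ['boob'])); // true
console.log(containsAnywhere('Buy classic watches online', ['ass']));  // true
console.log(naiveReplace('classic', 'ass', 'butt'));                   // clbuttic
```

The second and third calls show why this approach produces many false positives: "classic" contains a listed word without being profane.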
Summary
- We cannot know every way of writing a profanity word (e.g. how many different ways can you enter a55?)
- There is no algorithm-based approach that fully achieves it (yet)
- People will always find a way to connect with each other (e.g. Leet)
So, this project decided to go with 1.1, 1.3 and 2.1. (*note - you can find other attempts in the "Reference" section)
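Putting 1.1, 1.3 and 2.1 together, a cleaning function could look roughly like the sketch below. It is an illustration of the chosen combination, not the library's source: `cleanSketch` and the two-word `profanity` list are hypothetical.

```javascript
// Hypothetical sample list, for illustration only.
var profanity = ['boob', 'ass'];

// Mask whole words found in the list, keeping spaces and punctuation intact.
function cleanSketch(text, replaceKey) {
  var key = replaceKey === undefined ? '*' : replaceKey;
  return text
    .split(/([ .,])/) // 1.3 + 2.1: split on space, '.', ','; keep separators
    .map(function (token) {
      // 1.1: compare in lowercase so 'BoOb' matches 'boob'
      var isProfane = profanity.indexOf(token.toLowerCase()) !== -1;
      return isProfane ? key.repeat(token.length) : token;
    })
    .join('');
}

console.log(cleanSketch('I have BoOb.'));               // I have ****.
console.log(cleanSketch('Buy classic watches online')); // Buy classic watches online
console.log(cleanSketch('I have boob', '+'));           // I have ++++
```

Using a capture group in the `split` pattern keeps the separators in the token array, so the original spacing and punctuation survive the round trip.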
TODO
- `add` method
- Filter HTML syntax
- Support multi-language
- Complete `clean` API
- Increase code coverage percentage
- Fix ESLint
- Demo page
- More word dictionary
- `setDictionary` function
- Encapsulate private function
- Order alphabetically
- Order by length
Other language / framework
- PHP on packagist.org
- Python on PyPI
- Java on Maven
- Wordpress on wordpress.org
Contribute
- Fork the repo
- Install Node.js and dependencies
- Make a branch for your change and make your changes
- Run `git add -A` to add your changes
- Run `npm run commit` (don't use `git commit`)
- Push your changes with `git push`, then create a Pull Request
Contribute for owner
$ npm install -g semantic-release-cli
$ semantic-release-cli setup
Use the commands above to set up "semantic-release"
Reference
- Inspired by jwils0n/profanity-filter
- Algorithm / Discussion
- "similar-like" symbol to alphabet
- Replace Bad words using Regex
- Clbuttic
- The Clbuttic Mistake
- The Clbuttic Mistake: When obscenity filters go wrong
- Obscenity Filters: Bad Idea, or Incredibly Intercoursing Bad Idea?
- How do you implement a good profanity filter?
- The Untold History of Toontown’s SpeedChat (or BlockChat™ from Disney finally arrives)
- Profanity Filter Performance in Java
- Resource bad-word list
- Tool