django-html_sanitizer's Introduction

Django HTML Sanitizer

Django HTML Sanitizer provides a set of utilities to sanitize, escape, or clean HTML input in Django. The app is built on top of bleach, the excellent Python HTML sanitizer.

Dependencies

Installation

First, install the package with pip (or download it manually from PyPI):

pip install django-html_sanitizer

Then add sanitizer to INSTALLED_APPS in Django's settings.py:

INSTALLED_APPS = (
    # other apps
    "sanitizer",
)

Model Usage

Like bleach, Django sanitizer is a whitelist-based HTML sanitizer: only the tags and attributes you specify are allowed. It provides two model fields that automatically sanitize text values: SanitizedCharField and SanitizedTextField.

These fields accept extra arguments:

  • allowed_tags: a list of allowed HTML tags
  • allowed_attributes: a list of allowed HTML attributes, or a dictionary that maps each tag name to its list of allowed attributes
  • allowed_styles: a list of allowed styles if "style" is one of the allowed attributes
  • strip: a boolean indicating whether offending tags/attributes should be escaped or stripped

Here's how to use these fields in Django models:

from django.db import models
from sanitizer.models import SanitizedCharField, SanitizedTextField

class MyModel(models.Model):
    # Allow only <a>, <p>, <img> tags and "href" and "src" attributes
    foo = SanitizedCharField(max_length=255, allowed_tags=['a', 'p', 'img'],
                             allowed_attributes=['href', 'src'], strip=False)
    bar = SanitizedTextField(max_length=255, allowed_tags=['a', 'p', 'img'],
                             allowed_attributes=['href', 'src'], strip=False)
    foo2 = SanitizedCharField(max_length=255, allowed_tags=['a', 'p', 'img'],
                             allowed_attributes={'img':['src', 'style']},
                             allowed_styles=['width', 'height'], strip=False)
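
To make the difference between escaping and stripping concrete, here is a toy whitelist sanitizer built only on the standard library. It is a sketch for illustration, not how bleach or this app is implemented: the names ToySanitizer and toy_sanitize are hypothetical, and the code ignores styles, comments, and void elements such as img.

```python
from html import escape
from html.parser import HTMLParser


class ToySanitizer(HTMLParser):
    """Toy whitelist sanitizer; illustration only, not bleach's algorithm."""

    def __init__(self, allowed_tags, allowed_attributes, strip):
        super().__init__(convert_charrefs=True)
        self.allowed_tags = set(allowed_tags)
        self.allowed_attributes = set(allowed_attributes)
        self.strip = strip
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.allowed_tags:
            # Keep the tag, but drop every attribute that is not whitelisted.
            kept = "".join(
                ' %s="%s"' % (name, escape(value or ""))
                for name, value in attrs
                if name in self.allowed_attributes
            )
            self.out.append("<%s%s>" % (tag, kept))
        elif not self.strip:
            # strip=False: keep the offending tag visible as escaped text.
            self.out.append(escape(self.get_starttag_text()))
        # strip=True: the offending tag is dropped, its inner text is kept.

    def handle_endtag(self, tag):
        if tag in self.allowed_tags:
            self.out.append("</%s>" % tag)
        elif not self.strip:
            self.out.append(escape("</%s>" % tag))

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def toy_sanitize(html, allowed_tags, allowed_attributes, strip=False):
    parser = ToySanitizer(allowed_tags, allowed_attributes, strip)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


source = '<a href="#">Example</a><script>alert("x")</script>'
print(toy_sanitize(source, ["a"], ["href"], strip=False))
print(toy_sanitize(source, ["a"], ["href"], strip=True))
```

With strip=False the script tags survive as visible, escaped text; with strip=True they disappear and only their inner text remains.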

Form Usage

Using Django HTML Sanitizer in Django forms is very similar to model usage:

from django import forms
from sanitizer.forms import SanitizedCharField

class MyForm(forms.Form):
    # Allow only <a>, <p>, <img> tags and "href" and "src" attributes
    foo = SanitizedCharField(max_length=255, allowed_tags=['a', 'p', 'img'],
                             allowed_attributes=['href', 'src'], strip=False)
    bar = SanitizedCharField(max_length=255, allowed_tags=['a', 'p', 'img'],
                             allowed_attributes=['href', 'src'], strip=False, widget=forms.Textarea)
    foo2 = SanitizedCharField(max_length=255, allowed_tags=['a', 'p', 'img'],
                             allowed_attributes={'img':['src', 'style']},
                             allowed_styles=['width', 'height'], strip=False)

Template Usage

Django sanitizer provides a few different ways of cleaning HTML in templates.

escape_html Template Tag

Example usage:

{% load sanitizer %}
{% escape_html post.content "a, p, img" "href, src, style" "width" %}

Assuming post.content contains the string '<a href ="#" style="width:200px; height:400px">Example</a><script>alert("x")</script>', the above tag will output:

'<a href ="#" style="width:200px;">Example</a>&lt;script&gt;alert("x")&lt;/script&gt;'

On Django 1.4 or newer you can also use keyword arguments:

{% escape_html '<a href="">bar</a>' allowed_tags="a,img" allowed_attributes="href,src" allowed_styles="width" %}
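
Both template tags take their tag, attribute, and style lists as comma-separated strings such as "a, p, img". A minimal sketch of how such a string could be turned into a list is shown below; parse_csv_arg is a hypothetical helper name, and the assumption that surrounding whitespace is ignored is based only on the examples above, not on the app's source.

```python
def parse_csv_arg(value):
    """Split a string like 'a, p, img' into ['a', 'p', 'img'].

    Whitespace around each item is trimmed and empty items are skipped.
    """
    return [item.strip() for item in value.split(",") if item.strip()]


print(parse_csv_arg("a, p, img"))
print(parse_csv_arg("href,src"))
```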

strip_html Template Tag

Example usage:

{% load sanitizer %}
{% strip_html post.content "a, p, img" "href, src" %}

If post.content contains the string '<a href ="#">Example</a><script>alert("x")</script>', this will give you:

'<a href ="#">Example</a>alert("x")'

escape_html Filter

Escapes HTML tags in a string based on settings. To use this filter, define these variables in settings.py:

  • SANITIZER_ALLOWED_TAGS - a list of allowed tags (defaults to an empty list)
  • SANITIZER_ALLOWED_ATTRIBUTES - a list of allowed attributes (defaults to an empty list)
  • SANITIZER_ALLOWED_STYLES - a list of allowed styles if the style attribute is set (defaults to an empty list)
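
The "defaults to an empty list" behaviour can be pictured as a settings lookup with a fallback. The sketch below uses a SimpleNamespace as a stand-in for django.conf.settings so that it runs without Django; sanitizer_setting is a hypothetical helper, and the app's real lookup code may differ.

```python
from types import SimpleNamespace

# Stand-in for django.conf.settings (assumption: lets the sketch run alone).
settings = SimpleNamespace(
    SANITIZER_ALLOWED_TAGS=["a"],
    SANITIZER_ALLOWED_ATTRIBUTES=["href"],
    # SANITIZER_ALLOWED_STYLES is deliberately left undefined.
)


def sanitizer_setting(name):
    """Return the named setting, or an empty list when it is not defined."""
    return getattr(settings, name, [])


allowed_tags = sanitizer_setting("SANITIZER_ALLOWED_TAGS")
allowed_styles = sanitizer_setting("SANITIZER_ALLOWED_STYLES")
print(allowed_tags, allowed_styles)
```

An undefined setting therefore behaves like an empty whitelist, so nothing of that kind is allowed until it is listed explicitly.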

For example, with SANITIZER_ALLOWED_TAGS = ['a'], SANITIZER_ALLOWED_ATTRIBUTES = ['href', 'style'] and SANITIZER_ALLOWED_STYLES = ['width'] in settings.py, the following template:

{% load sanitizer %}
{{ post.content|escape_html }}

If post.content contains the string '<a href ="#" style="width:200px; height:400px">Example</a><script>alert("x")</script>', it will give you:

'<a href ="#" style="width:200px;">Example</a>&lt;script&gt;alert("x")&lt;/script&gt;'

strip_html Filter

Similar to the escape_html filter, except that it strips out offending HTML tags instead of escaping them.

For example, with SANITIZER_ALLOWED_TAGS = ['a'] and SANITIZER_ALLOWED_ATTRIBUTES = ['href'] in settings.py, the following template:

{% load sanitizer %}
{{ post.content|strip_html }}

If post.content contains the string '<a href ="#">Example</a><script>alert("x")</script>', we will get:

'<a href ="#">Example</a>alert("x")'

Changelog

Version 0.1.5

  • Fixes for smart_unicode and basestring (python 3.x support)

Version 0.1.4

  • CharField, TextField, strip_html and escape_html now support allowed_styles (thanks cltrudeau)
  • Added an example of template tag usage using kwargs now that Django 1.4 is out

Version 0.1.2

  • allowed_tags and allowed_attributes in CharField and TextField now default to []

django-html_sanitizer's People

Contributors

cltrudeau, gbezyuk, kckrinke, selwin, the-glu

django-html_sanitizer's Issues

import problems using django_sanitizer as python egg

If I install django_sanitizer as a Python egg via buildout, the line below fails:
from sanitizer.models import SanitizedTextField
But if I simply clone the http-readonly repository, the import statement works fine.
It looks like something is wrong with the package configuration.

Incompatible with new bleach

In order to adapt to the API changes in the current html5lib release, bleach is making substantial changes of its own:

https://github.com/mozilla/bleach

It does not appear to me that django-html_sanitizer is compatible with the bleach changes. If I run test.py in python2.7 with the new bleach I get:

python tests.py
Traceback (most recent call last):
  File "tests.py", line 6, in <module>
    from sanitizer.templatetags.sanitizer import (sanitize, sanitize_allow,
ImportError: No module named templatetags.sanitizer

The new bleach is expected to be released next week.

clean() got an unexpected keyword argument 'styles'

Unsurprisingly, this looks like it's not compatible with Bleach 5.0.0 due to the styles kwarg change. We've had the older version of bleach pinned for a while, but would it be possible to update this library to support Bleach 5.0.0? I can create a PR with the change if needed (it seems fairly straightforward).

Python3 import smart_unicode

Hi, I get an import error:
  File "/lib/python3.4/site-packages/sanitizer/models.py", line 3, in <module>
    from django.utils.encoding import smart_unicode
ImportError: cannot import name 'smart_unicode'

After replacing 'smart_unicode' with 'smart_text as smart_unicode' in your models.py, the error is gone.
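
A version-tolerant variant of that workaround might look like the sketch below. It is not the fix that shipped in the package: load_smart_unicode is a hypothetical helper, and the final fallback to str exists only so that the snippet also runs where Django is not installed. The three names it tries are Django's own text-coercion helpers from different releases.

```python
import importlib


def load_smart_unicode():
    """Return the first text-coercion helper that this Django exposes."""
    try:
        encoding = importlib.import_module("django.utils.encoding")
    except ImportError:
        # Django is not installed: illustration-only fallback.
        return str
    # Older releases expose smart_unicode, later ones smart_text or smart_str.
    for name in ("smart_unicode", "smart_text", "smart_str"):
        func = getattr(encoding, name, None)
        if func is not None:
            return func
    return str


smart_unicode = load_smart_unicode()
print(smart_unicode("abc"))
```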

Settings.py issue

I've installed bleach and sanitizer, but I get ImportError: No module named sanitizer when trying to add "sanitizer" to INSTALLED_APPS.

Update python package

The Python package hasn't been updated to include the smart_unicode fix, which is causing errors.

Adding values in get_prep_value sometimes causes IntegrityError

If strip=False, escaping adds characters to the given string, which can lead to integrity constraint errors when saving to the database. For example, with a 100-character column and 80 characters of HTML, validation passes, but escaping can push the string past 100 characters by the time it is saved. The safe approach is to use strip=True. I don't know how this can be handled at this layer, but I wanted to give a heads-up.

Thanks for the library.
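
The length growth described in this report is easy to reproduce with the standard library. In the sketch below, html.escape stands in for the escaping applied when strip=False (an assumption; bleach's exact output may differ), and MAX_LENGTH is a hypothetical column size matching the 100-character example.

```python
from html import escape

MAX_LENGTH = 100  # hypothetical column size, as in the report above

raw = "<script>alert('x')</script>" * 3
# html.escape is a stand-in for the escaping done when strip=False.
escaped = escape(raw, quote=False)

fits_before = len(raw) <= MAX_LENGTH      # checked at validation time
fits_after = len(escaped) <= MAX_LENGTH   # what the database would receive
print(len(raw), fits_before)
print(len(escaped), fits_after)
```

Each "<" or ">" grows from one character to four, so input that passes a max_length check can exceed the column size once escaped.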
