Comments (3)
Honestly I don't think this is Bleach's responsibility. Do you have a use case where the app using Bleach doesn't know the incoming text charset?
I guess I'd call this a documentation issue and say if you know you're going to be passing in a bytestring with non-ASCII characters, convert it to a unicode object first.
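A minimal sketch of that advice in Python 3 terms (bytes to `str`), assuming the application already knows the incoming charset. The sample text and the Latin-1 charset are made up for illustration, and the Bleach call is left as a comment so the snippet runs without Bleach installed:

```python
# Bytes as they might arrive from a request body, encoded as Latin-1
# (the charset here is an assumption for illustration).
raw = "café <b>crème</b>".encode("latin-1")

# The application knows the charset, so it decodes before sanitizing.
text = raw.decode("latin-1")

# Guessing the wrong charset fails loudly here instead of silently
# corrupting the text: 0xE9 followed by a space is not valid UTF-8.
try:
    raw.decode("utf-8")
    wrong_charset_ok = True
except UnicodeDecodeError:
    wrong_charset_ok = False

# `text` is now a unicode string and can be passed on, e.g. bleach.clean(text).
```

Decoding at the boundary keeps the charset decision in the application, which is the only place that knows it.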
Calling this a documentation issue is a reasonable resolution. This line should be changed:
"NB: Bleach accepts bytestrings or unicode, but it always returns unicode."
To me, it implies that bleach is going to attempt to do the right thing when converting to unicode. Maybe something similar to this:
NB: Given a bytestring or unicode, Bleach will always return unicode; however, the conversion from bytestring to unicode does not respect character encoding. If you are using a specific encoding, please convert the bytestring to unicode before using Bleach to ensure document integrity.
Updated the README, and I also added an explicit cast to UTF8 for bytestrings, so your original example should just work now (as opposed to the implicit bytestring->unicode conversion that was happening). This makes `clean()` more consistent with `linkify()`.
It would be relatively easy to add an `encoding='utf-8'` kwarg to both `clean()` and `linkify()`, which would be passed along to `force_unicode()`. Bleach still wouldn't do charset detection but you wouldn't need to do the `decode()` yourself.
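A rough sketch of what that kwarg could look like from the caller's side. This is not Bleach's code: `force_text` and `clean_with_encoding` are hypothetical names, `force_text` only approximates what `force_unicode()` is described as doing, and `html.escape` stands in for the sanitizer so the example runs without Bleach installed:

```python
import html


def force_text(value, encoding="utf-8"):
    """Hypothetical stand-in for force_unicode(): decode bytes, pass str through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def clean_with_encoding(value, sanitizer, encoding="utf-8"):
    """Decode `value` with the caller-supplied encoding, then run the sanitizer."""
    return sanitizer(force_text(value, encoding))


# html.escape is only a placeholder for a real sanitizer such as bleach.clean.
result = clean_with_encoding(
    "naïve <i>".encode("latin-1"), html.escape, encoding="latin-1"
)
```

As in the proposal, nothing here detects the charset; the caller still has to name it, but no longer has to call `decode()` by hand.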