Coder Social home page Coder Social logo

Comments (12)

jsocol avatar jsocol commented on June 9, 2024

I don't consider linkifying something that looks like a domain a bug: the bug here is that path/to/file.sh shouldn't be considered to "look like a domain". Back into the URL regex to fix this.

from bleach.

kumar303 avatar kumar303 commented on June 9, 2024

One thing to consider is that top level xpi files might exist without any prefixed path, something like "somescript.sh" -- this would be linked by linkify, which is not desired for amo-validator output.

from bleach.

jsocol avatar jsocol commented on June 9, 2024

.sh is a valid TLD, this behavior is correct. Turning it off would also stop linkifying example.com in the text. If you don't want this behavior, why are you running linkify at all?

from bleach.

kumar303 avatar kumar303 commented on June 9, 2024

Right, losing example.com would be fine. We just use it for URLs that are already prefixed with http(s). We want those links to go through the outgoing server so using bleach for this seemed like a good idea.

from bleach.

jsocol avatar jsocol commented on June 9, 2024

The default behavior can't change, because I don't think this is expected behavior. I'd nominate aggressive or eager for the name of the kwarg to linkify().

from bleach.

SmileyChris avatar SmileyChris commented on June 9, 2024

An aside, these false positives are why django's urlize filter only linkifies urls prefixed with a schema or 'www.', or domain-only urls ending in .com, .org and .net.

from bleach.

jsocol avatar jsocol commented on June 9, 2024

@SmileyChris: That difference is actually one of the reasons this method exists in the first place, which is why I'm not keen on changing it.

from bleach.

SmileyChris avatar SmileyChris commented on June 9, 2024

Fair enough then! :)

from bleach.

kmike avatar kmike commented on June 9, 2024

Could linkify be more configurable? Sometimes example.com should be linked, sometimes not. Now the only possiblity to configure the linkify is to copy-paste the whole linkify method. I think it'll be good to do one of the following:

  • acccept callbacks for common tasks (e.g. post-process a node, find links, etc);
  • make utility methods module-level functions so one can reuse them and build custom linkify;
  • make linkify class with overridable methods + call for backward compatibility

My use case: linkify IDN domains (e.g. сайт.рф) + add target='_blank' to links. I can also imagine somebody wanting to add css classes to external links, to stop linking localhost url or to use custom domain list (maybe even different domain lists for different parts of site).

Would you accept patch with such changes or this is all unreasonable or you want to implement something yourselves?

from bleach.

jsocol avatar jsocol commented on June 9, 2024

The cleanest of those options feels like the class, but I really struggle to see how any of them could be completely backwards compatible (except maybe callbacks for "common tasks" but I don't want a 40-kwarg method). I'm open to any of them as long as import bleach; bleach.linkify(text) still works, none of the current defaults change, it's not a massive performance hit (it's slow enough as-is) and it's clean.

Any of these changes, I think, necessitates moving the linkify code into a new module. Probably clean, too, for consistency.

@kmike: I'm certainly open to a big rewrite but it's not my highest priority right now, especially not given how much bigger that scope is than this issue.

from bleach.

msabramo avatar msabramo commented on June 9, 2024

Another consequence of this eager linking is that linkify links the word "settings.py" on my crate.io page. Is there any way to tell it to suppress the link?

from bleach.

jsocol avatar jsocol commented on June 9, 2024

.py is the ccTLD for Paraguay. It's legit.

That said, #56 would make linkify customizable enough that anyone could refuse to linkify whatever TLDs they want. The solution will be to write callbacks that test it, e.g.:

def no_python(text, attrs):
    if attrs['href'].endswith('.py'):
        return None

Or more broadly:

def only_with_protocol(text, attrs):
    if not attrs['href'].startswith('http://'):
        return None

The latter seems like a logical thing to include, maybe not as a default, but as an option.

Closing this in favor of #56.

from bleach.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.