This has been a design flaw since the inception of the library, so, mea culpa on that.
Fundamentally, preserving, escaping, and encoding "reserved" characters is entirely the URL object's job, and it's failing at that. Possibly the most succinct demonstration of the problem is this:
```
>>> u = URL()
>>> u = u.child(u'/')
>>> u = u.asIRI()
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    u = u.asIRI()
  File "/Users/glyph/Library/Python/2.7/lib/python/site-packages/hyperlink/_url.py", line 1116, in to_iri
    fragment=_percent_decode(self.fragment))
  File "/Users/glyph/Library/Python/2.7/lib/python/site-packages/hyperlink/_url.py", line 861, in replace
    userinfo=_optional(userinfo, self.userinfo),
  File "/Users/glyph/Library/Python/2.7/lib/python/site-packages/hyperlink/_url.py", line 606, in __init__
    for segment in path))
  File "/Users/glyph/Library/Python/2.7/lib/python/site-packages/hyperlink/_url.py", line 606, in <genexpr>
    for segment in path))
  File "/Users/glyph/Library/Python/2.7/lib/python/site-packages/hyperlink/_url.py", line 410, in _textcheck
    % (''.join(delims), name, value))
ValueError: one or more reserved delimiters /?# present in path segment: u'/'
>>>
```
This is, obviously I hope, the wrong place to be failing with an error like this: `child(u'/')` is accepted, and the failure only surfaces later, deep inside `asIRI()`.
There was previously some attempt to preserve these characters in the data model and escape them only upon stringification, but d26814c wrecked these semantics. (In fairness: the attempt to do this was broken, and there are some places, like the scheme, where certain characters indeed cannot be represented, so this direction isn't entirely wrong.)
Fundamentally, if a user wants to encode slashes, question marks, hash signs, or anything else a human might type into a text field, it should be possible to do that.
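To make the "escape only upon stringification" idea concrete, here is a minimal sketch that uses the standard library's `urllib.parse.quote` and `unquote` as a stand-in for hyperlink's internals. Segments are stored exactly as the user typed them, and `/`, `?`, `#` and `%` are escaped only when the path is rendered as text. `path_to_text` and `path_from_text` are hypothetical helpers, not part of hyperlink.

```python
from urllib.parse import quote, unquote


# Hypothetical helpers: the data model keeps each path segment decoded,
# and escaping happens only when the path is turned into text.
def path_to_text(segments):
    # safe='' makes quote() escape '/' as well; '?', '#' and '%' are
    # always escaped because they are outside the unreserved set.
    return '/'.join(quote(segment, safe='') for segment in segments)


def path_from_text(text):
    # Split on the real delimiter first, then decode each segment.
    return tuple(unquote(part) for part in text.split('/'))


print(path_to_text(('/',)))                  # %2F
print(path_to_text(('a/b', '?#')))           # a%2Fb/%3F%23
print(path_from_text(path_to_text(('/',))))  # ('/',)
```

Because splitting on `/` happens before decoding, an escaped `%2F` inside a segment can never be confused with a segment boundary.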
We could fix this obvious manifestation of the problem by just putting back the escape-only-on-`asText` logic, but that still leaves an even more pernicious problem:
```
>>> u = URL(path=tuple([u'%2525']))
>>> u.asText()
u'%2525'
>>> u.asIRI().asText()
u'%25'
>>> u.asIRI().asIRI().asText()
u'%'
>>>
```
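The collapse shown above is what you get if each `asIRI()` call percent-decodes every escape in a segment. The sketch below reproduces it with the standard library's `urllib.parse.unquote`; `naive_iri_segment` is a hypothetical stand-in and is not claimed to match hyperlink's internal `_percent_decode`.

```python
from urllib.parse import unquote


# Hypothetical "to IRI" step that decodes every %XX escape it sees.
# Each call strips one layer of escaping, so repeated calls are not idempotent.
def naive_iri_segment(segment):
    return unquote(segment)


seg = '%2525'
history = [seg]
for _ in range(2):
    seg = naive_iri_segment(seg)
    history.append(seg)
print(history)  # ['%2525', '%25', '%']
```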
Clearly, multiple trips through `asIRI()` should not be un-escaping the escape character: the idea is that `.asIRI()` is a normalization step, which should be idempotent upon subsequent calls.
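If the object must keep storing escaped text, one possible way to make an `asIRI()`-style step idempotent is to decode only percent-escapes that form complete, valid non-ASCII UTF-8 characters, and to leave every escaped ASCII byte (`%25` and the `/?#` delimiters included) and every invalid sequence exactly as written. Decoded characters are non-ASCII, so they can never spell a new `%XX`, and a second pass leaves the result unchanged. This is a minimal sketch under that assumption, deliberately more conservative than a full URI-to-IRI conversion; `iri_segment` is a hypothetical name, not hyperlink's API.

```python
import re

# One or more consecutive %XX escapes.
_PCT_RUN = re.compile(r'(?:%[0-9A-Fa-f]{2})+')


def _utf8_length(lead):
    """Expected UTF-8 sequence length for a lead byte; 1 if not a multi-byte lead."""
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 1


def _decode_run(match):
    run = match.group(0)
    data = bytes.fromhex(run.replace('%', ''))
    out = []
    i = 0
    while i < len(data):
        n = _utf8_length(data[i])
        if n > 1:
            try:
                # Succeeds only for one complete, valid non-ASCII character.
                out.append(data[i:i + n].decode('utf-8'))
                i += n
                continue
            except UnicodeDecodeError:
                pass
        # ASCII byte (including '%', '/', '?', '#') or invalid UTF-8:
        # keep the original three-character escape untouched.
        out.append(run[3 * i:3 * i + 3])
        i += 1
    return ''.join(out)


def iri_segment(segment):
    """Hypothetical idempotent URI-to-IRI step for a single path segment."""
    return _PCT_RUN.sub(_decode_run, segment)


print(iri_segment('%2525'))                   # %2525
print(iri_segment('caf%C3%A9'))               # café
print(iri_segment(iri_segment('caf%C3%A9')))  # café
```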
For the moment, I'm not sure exactly what the correct fix is here, but the property I'd really like to preserve is that, for any `x`,

`URL.fromText(URL().child(x).<as many asIRI()s or asURI()s as you want>.asText()).<as many .asIRI()s as you want, although possibly not .asURI()s>.segments[0] == x`
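A sketch of one model that satisfies this property: keep segments decoded inside the object and treat IRI/URI conversion as a presentation switch that only affects serialization, roughly the escape-only-on-stringification direction mentioned earlier. `SketchURL`, its methods, and `round_trips` are hypothetical names, not hyperlink's API; the model covers only the path, and the check exercises a handful of sample strings with conversion chains of up to three calls rather than every possible `x`.

```python
import itertools
from urllib.parse import quote, unquote


class SketchURL:
    """Hypothetical path-only URL model: segments are always stored decoded."""

    def __init__(self, segments=(), iri=False):
        self.segments = tuple(segments)
        self.iri = iri  # affects only how to_text() renders segments

    def child(self, segment):
        return SketchURL(self.segments + (segment,), self.iri)

    def as_iri(self):
        # Changes presentation only; the decoded segments are untouched.
        return SketchURL(self.segments, iri=True)

    def as_uri(self):
        return SketchURL(self.segments, iri=False)

    def _escape(self, segment):
        if self.iri:
            # IRI form: keep non-ASCII literal, escape ASCII as in URI form.
            return ''.join(c if ord(c) > 0x7F else quote(c, safe='')
                           for c in segment)
        # URI form: escape everything outside the unreserved set, '/' included.
        return quote(segment, safe='')

    def to_text(self):
        return '/'.join(self._escape(s) for s in self.segments)

    @classmethod
    def from_text(cls, text):
        return cls(unquote(part) for part in text.split('/'))


def round_trips(x, conversions):
    """Check the property for one value and one as_iri/as_uri chain."""
    url = SketchURL().child(x)
    for name in conversions:
        url = getattr(url, name)()
    parsed = SketchURL.from_text(url.to_text())
    for _ in range(3):  # "as many .asIRI()s as you want" after parsing
        parsed = parsed.as_iri()
    return parsed.segments[0] == x


SAMPLES = ['/', '?', '#', '%', '%2525', 'caf\u00e9', 'a/b?c#d', '']
CHAINS = [chain for n in range(4)
          for chain in itertools.product(['as_iri', 'as_uri'], repeat=n)]
print(all(round_trips(x, c) for x in SAMPLES for c in CHAINS))  # True
```

Because conversion never touches the stored segments, no chain of calls can decode `%25` twice. The trade-off is that parsing decodes eagerly, so the original spelling of an escape (for example `%41` versus `A`) is not preserved.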