marrow / uri Goto Github PK
View Code? Open in Web Editor NEWA type to represent, query, and manipulate a Uniform Resource Identifier.
Home Page: https://pretty-rfc.herokuapp.com/RFC3986
License: MIT License
A type to represent, query, and manipulate a Uniform Resource Identifier.
Home Page: https://pretty-rfc.herokuapp.com/RFC3986
License: MIT License
>>> u=URI("ws://localhost:8080/blabla")
>>> str(u)
'ws:localhost:8080/blabla'
>>> u.uri
'ws:localhost:8080/blabla'
Notice, the "//" are gone.
This is a completely different URI that will not work
BTW pip list:
uri 2.0.1
I've recently had reason to curse out stdlib urllib.parse.url* and went looking for alternatives. I found uri
, furl
and yarl
, the latter is missing from the "Migrating"-section in the README.
I really enjoy the snark in the migrating-section so here's hoping ya'll have the spoons to take a look :)
Service is essentially no longer functional; switch over to rfc-editor.org.
>>> hash(URI('https://localhost'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'URI'
Could this be easily fixed? It would be nice to be able to use uri's as keys in a dict for instance, or to use efficiently in eg sets.
The current minimal pair covers HTTP and HTTPS, the likely candidates utilized by the from_wsgi
factory. This should be loaded from the system-standard /etc/services
data store, and/or a bundled (though possibly outdated version) should be vendored, if possible.
>>> uri = URI('http://foo.com')
>>> uri.path = 'blah'
>>> str(uri)
http://foo.comblah
Expected: Either reject paths with an initial slash, or reject paths without one.
Generally populate contents (for Path objects with only a protocol defined) from /etc/protocols
if available, otherwise provide a default set of basics. Allow entry_point
registration of handlers for those protocols. Desired syntax for use:
from uri.scheme import http
uri = http // "example.com" / "index.html"
Scheme
handler classes (or factories) via entry_points
.Scheme
) and URL (URLScheme
— utilizes ://
separator instead of :
) covering http
, https
, ftp
, file
, ldap
, telnet
, and irc
out of the box. (The default is URI-style.)SchemePart
.I get an error 'ImportError' when try run a simple code with uri.
code to reproduce:
from uri import URI
uri = URI('http://www.google.com/base?one=1&two=2')
print(uri)
Traceback:
Traceback (most recent call last):
File "E:\Work\Projects\python_projects\uri_test.py", line 2, in <module>
from uri import URI
File "E:\Work\Venvs\python_projects\lib\site-packages\uri\__init__.py", line 8, in <module>
from .bucket import Bucket
File "E:\Work\Venvs\python_projects\lib\site-packages\uri\bucket.py", line 5, in <module>
from collections import ItemsView, KeysView, MutableMapping, MutableSequence, ValuesView, deque, namedtuple
ImportError: cannot import name 'ItemsView' from 'collections' (C:\pythons\Python310\lib\collections\__init__.py)
Process finished with exit code 1
Used versions:
Python 3.10
uri 2.0.1
tcp://
pseudo-protocol; port identification required.udp://
pseudo-protocol; port identification required.SSL or TLS cryptography through addition of +ssl
or +tls
protocol suffixes.
https://docs.python.org/3/library/urllib.request.html
https://docs.python.org/3/library/http.client.html#module-http.client
class HTTPScheme(Scheme):
def open(self, uri:URI, mode:str='r', buffering=-1, encoding=None, errors=None, newline=None) -> HTTPResponse:
...
from uri import URI
from PIL import Image
with URI('https://httpbin.org/image/png').open('rb', True) as fh:
image = Image.open(fh)
...
When deserializing a URL provided at instantiation time, or on any assignment of a hostname, punycode encoding should be detected and decoded. When serializing (str()
, &c.) the punycode should encoded version of any non-ASCII-safe hostname should be used.
Reference in Marrow Mailer:
Current version in Pypi (2.0.1) is not compatible with recent Python releases, but the development branch is.
Can we publish it to the public repositories to make it easier to have uri
as a dependency?
Thank you!
Examples:
from uri import URI
base = URI("http://example.com/foo/bar.html")
base / "baz.html" # http://example.com/foo/baz.html
base // "cdn.example.com" / "baz.html" # http://cdn.example.com/baz.html
base / "/diz" # http://example.com/diz
base / "#diz" # http://example.com/foo/bar.html#diz
base / "https://example.com" # https://example.com
A concrete example:
base = URI("http://ats.example.com/job/listing")
# scrape the listing, identify a job URL from that listing
target = URI("detail/sample-job") # oh no, it's relative!
job = base / target # And it's resolved.
Got the following warnings by pytest:
/uri/bucket.py:5: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
from collections import ItemsView, KeysView, MutableMapping, MutableSequence, ValuesView, deque, namedtuple
uri/qso.py:5: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
from collections import Mapping, MutableMapping, deque, namedtuple
-- Docs: https://docs.pytest.org/en/stable/warnings.html
The library still seems to work in Python 3.9 (Tried it) but changing the import location would be reasonable.
This has dire consequences for checking for their presence in sets, or use as dictionary keys.
See https://tools.ietf.org/html/rfc3986#section-3.3 for reference
>>> # a relative 'baz' uri -- expected behavior.
>>> buri = URI('baz:relativepath')
>>> buri.authority
''
>>> str(buri)
'baz:relativepath'
>>>
>>> # a relative 'file' uri -- unexpected behavior
>>> furi = URI('file:relativepath')
>>> furi.authority # looking good..
''
>>> # If a URI does not contain an authority component,
>>> # then the path cannot begin with two slash characters ("//")
>>> str(furi)
'file://relativepath'
>>> URI(str(furi)).authority
'relativepath'
From the wiki https://en.wikipedia.org/wiki/Uniform_Resource_Identifier
If an authority component is present, then the path component must either be empty or begin with a slash (/).
So this should be correct uri
https://localhost
(notice no slash at the end)
However
>>> URI("http://local")
URI('http://local/')
URI does not recognize URI authority generally, that is scheme://authority/path
and somehow mangles it to a non-hierarchical URNs.
Examples of mishandling:
>>> a = uri.URI("ldap://[2001:db8::7]/c=GB?objectClass?one")
>>> a
URI('ldap://[2001:db8::7]/c=GB?objectClass?one')
>>> a.resolve("/c=NO")
URI('/c=NO') # expected: ldap://[2001:db8::7]/c=NO
>>> a = uri.URI("sftp://example.com/etc/passwd")
>>> a
URI('sftp:example.com/etc/passwd') # expected // to be preserved
>>> a.resolve("/root")
URI('sftp:/root') # expected sftp://example.com/root
>>> a = uri.URI("app://01e36b38-39a4-48b2-88d0-6e82717ee87f/nested/example")
>>> a
URI('app:01e36b38-39a4-48b2-88d0-6e82717ee87f/nested/example') # expected // to be preserved
>>> a.resolve("/folder/")
URI('app:/folder') # Expected app://01e36b38-39a4-48b2-88d0-6e82717ee87f/folder/
Instead there seems to be special handling of a few, selected schemes. This should be treated according to RFC3986 and respect //
in the incoming URI, no matter which scheme.
All String parse to lower case,makes hard to use base64 decode
Can't register another scheme without pull request
example: amqp:// and amqps://
really painful
While on the topic of operator overloading on the SO Python chat, I got the idea after chatting with @amcgregor that adding a slice notation to a URI would be a nice extension of this embedded DSL syntax:
Example:
url = URI["username":"password"]
This would be easy and intuitive in interactive sessions, without extra API namespace clutter.
RFC-3986 talks about rootless path URIs.
I interpret this to mean that a URI of file:relative/path.ext
is a perfectly reasonably URI indicating a relative path. Unfortunately this does not round trip:
>>> u = uri.URI("file:relative/path.ext")
>>> u
URI('file://relative/path.ext')
>>> u.path
PurePosixPath('relative/path.ext')
>>> u.uri
'file://relative/path.ext'
>>> u2 = uri.URI(u.uri)
>>> u2.path
PurePosixPath('/path.ext')
>>> u.hostname
>>> u2.hostname
'relative'
So it turns a relative URI into an absolute URI with a hostname. Am I misinterpreting RFC-3986? (I noticed this because I have some relative paths that I need to represent as URIs).
>>> a = URI("http://example.com/foo//bar/"); a
URI('http://example.com/foo/bar/')
>>> a / "" / "baz"
URI('http://example.com/foo/bar/baz')
>>> a / "//baz"
URI('http://example.com//baz')
Empty path segments must be allowed, and should be easier to work with / induce than including path element separators within a path segment.
As utilized frequently in HTML for external asset references, URI such as the following should be an identity transform:
//example.com/protocol/relative
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.