Parsing a URI
There are two ways to parse a URI with rfc3986:
1. rfc3986.uri_reference()
   This is best when you’re not replacing existing usage of urllib.parse. This also provides convenience methods around safely normalizing URIs passed into it.
2. rfc3986.urlparse()
   This is best suited to completely replace urllib.parse.urlparse(). It returns a class that should be indistinguishable from urllib.parse.ParseResult.
Let’s look at some code samples.
First we’ll parse the URL that points to the repository for this project.
url = rfc3986.urlparse('https://github.com/sigmavirus24/rfc3986')
Then we’ll replace parts of that URL with new values:
print(url.copy_with(
    userinfo='username:password',
    port='443',
).unsplit())
This, however, does not change the current
url instance of
ParseResult. As the method name might suggest,
we’re copying that instance and then overriding certain attributes.
In fact, we can make as many copies as we like and nothing will change.
print(url.copy_with(
    scheme='ssh',
    userinfo='git',
).unsplit())
We can do similar things with URI References as well.
uri = rfc3986.uri_reference('https://github.com/sigmavirus24/rfc3986')
print(uri.copy_with(
    authority='username:firstname.lastname@example.org:443',
    path='/sigmavirus24/github3.py',
).unsplit())
However, URI References may behave in unexpected ways because they follow the RFC strictly.
Finally, if you want to remove a component from a URI, you may pass
None as that component’s value to copy_with().
This will work on both URI References and Parse Results.
And Now For Something Slightly Unusual
If you are familiar with GitHub, GitLab, or a similar service, you may have interacted with the “SSH URL” for some projects. For this project, the SSH URL is:
email@example.com:sigmavirus24/rfc3986
Let’s see what happens when we parse this.
>>> rfc3986.uri_reference('email@example.com:sigmavirus24/rfc3986')
URIReference(scheme=None, authority=None, path=u'email@example.com:sigmavirus24/rfc3986', query=None, fragment=None)
There’s no scheme present, but it is apparent to our (human) eyes that
email@example.com should not be part of the path. This is one of the areas
where rfc3986 suffers slightly due to its strict conformance to
RFC 3986. In the RFC, an authority must be preceded by
//. Let’s see
what happens when we add that to our URI:
>>> rfc3986.uri_reference('//email@example.com:sigmavirus24/rfc3986')
URIReference(scheme=None, authority=u'email@example.com:sigmavirus24', path=u'/rfc3986', query=None, fragment=None)
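Somewhat better, but not much: an authority is now recognised, but sigmavirus24 ends up inside it, where a port would normally go, instead of in the path.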
The maintainers of
rfc3986 are working to discern better ways to
parse these less common URIs in a reasonable and sensible way without
losing conformance to the RFC.