
Easily extract interesting bits of information from URLs

License: GNU Lesser General Public License v3.0

Python 11.02% GAP 20.86% Hy 68.12%
python python3 hy library parsing url url-parser url-parsing


[Alpha] Url Match

Easily extract interesting bits of information from URLs.

Inspired by Clojure's Compojure — an awesome routing library.

Example usage

from url_match import make_schema, match

yt_be_schema = make_schema('https? youtu.be /:id {t=:ts}')
match(yt_be_schema, 'https://youtu.be/57Ykv1D0qEE?t=1m43s')
# => {'id': '57Ykv1D0qEE', 'ts': '1m43s'}

More examples

Project Maturity

The project is at an early stage of development, so anything could change.

Installation

pip3 install git+https://github.com/MelomanCool/url-match.git#egg=url_match

Schema syntax

A schema consists of four parts:

protocol hostname path query

They are separated by spaces.

Note: right now, the query part is mandatory. If you don't want to match any query parameters, just use {}.
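To make the four-part layout concrete, here is a minimal sketch that splits a schema string into its parts. It is an illustration only, not the library's parser; `split_schema` is a hypothetical helper name, and it assumes the parts are separated by plain whitespace.

```python
def split_schema(schema):
    """Split a schema string into protocol, hostname, path and query.

    Hypothetical helper for illustration; not part of url_match.
    """
    # The query part may itself contain spaces (e.g. "{v=:id t=:ts}"),
    # so split at most three times and keep the remainder in one piece.
    parts = schema.strip().split(None, 3)
    if len(parts) != 4:
        raise ValueError(
            "expected 4 parts (protocol hostname path query), got %d" % len(parts)
        )
    protocol, hostname, path, query = parts
    return {
        "protocol": protocol,
        "hostname": hostname,
        "path": path,
        "query": query,
    }


print(split_schema('https? www.?youtube.com /watch {v=:id t=:ts}'))
# => {'protocol': 'https?', 'hostname': 'www.?youtube.com', 'path': '/watch', 'query': '{v=:id t=:ts}'}
```

Leaving out the query part makes the sketch raise a ValueError, which mirrors the note above that the query part is mandatory.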

Protocol

Can be one of:
http
https
https? — matches both http and https.

Hostname

Consists of one or more subdomains, separated by the "." character. You can make a subdomain optional by writing ".?" after it.

Examples:
www.youtube.com — matches only www.youtube.com
www.?youtube.com — matches both www.youtube.com and youtube.com
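One way to picture the ".?" rule is as a translation into a regular expression. The sketch below is an assumption about how such a translation could look, not the library's actual code; `hostname_to_regex` and `hostname_matches` are hypothetical names.

```python
import re


def hostname_to_regex(pattern):
    """Translate a hostname pattern such as 'www.?youtube.com' into a regex.

    Hypothetical helper for illustration; not part of url_match.
    """
    # Split into labels and separators, keeping the separators:
    # 'www.?youtube.com' -> ['www', '.?', 'youtube', '.', 'com']
    tokens = re.split(r'(\.\??)', pattern)
    labels, separators = tokens[0::2], tokens[1::2]

    regex = ''
    for index, label in enumerate(labels):
        separator = separators[index] if index < len(separators) else ''
        if separator == '.?':
            # Optional subdomain: the label and its dot may both be absent.
            regex += '(?:' + re.escape(label) + r'\.)?'
        elif separator == '.':
            regex += re.escape(label) + r'\.'
        else:
            # Last label has no trailing separator.
            regex += re.escape(label)
    return regex


def hostname_matches(pattern, hostname):
    return re.fullmatch(hostname_to_regex(pattern), hostname) is not None


print(hostname_matches('www.?youtube.com', 'www.youtube.com'))  # => True
print(hostname_matches('www.?youtube.com', 'youtube.com'))      # => True
print(hostname_matches('www.youtube.com', 'youtube.com'))       # => False
```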

Path

Consists of several folders separated by the "/" character. Each folder is either a literal name, like foo, or a keyword, like :bar. Names are matched as is, while a keyword captures the corresponding folder of the URL and stores it under the keyword's name.

The last "/" can be made optional by writing "/?".

The shortest possible path consists of a single character, "/", but in general it's better to use "/?".

Examples:
/ will match only the path /.
/? will match both the empty path and /.
/r/:subreddit with the path /r/photoshopbattles will extract {'subreddit': 'photoshopbattles'}
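The path rules can be sketched as a translation into a regular expression with named groups. This is only an illustration under stated assumptions: `path_to_regex` and `extract_path` are hypothetical names, and the sketch assumes a keyword captures one non-empty folder containing no "/", which the text above does not spell out.

```python
import re


def path_to_regex(pattern):
    """Translate a path pattern such as '/r/:subreddit/?' into a regex.

    Hypothetical helper for illustration; not part of url_match.
    """
    # A trailing "/?" makes the final slash optional.
    optional_slash = pattern.endswith('/?')
    if optional_slash:
        pattern = pattern[:-2]

    converted = []
    for folder in pattern.split('/'):
        if folder.startswith(':'):
            # Keyword: capture one folder under the keyword's name
            # (assumed here to be any non-empty run without "/").
            converted.append('(?P<%s>[^/]+)' % folder[1:])
        else:
            # Literal name: matched as is.
            converted.append(re.escape(folder))

    regex = '/'.join(converted)
    if optional_slash:
        regex += '/?'
    return regex


def extract_path(pattern, path):
    """Return the captured keywords, or None if the path does not match."""
    found = re.fullmatch(path_to_regex(pattern), path)
    return found.groupdict() if found else None


print(extract_path('/r/:subreddit', '/r/photoshopbattles'))
# => {'subreddit': 'photoshopbattles'}
print(extract_path('/?', ''))
# => {}
print(extract_path('/', ''))
# => None
```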

Query

The query syntax consists of {} containing pairs of the form key=:val_name, separated by spaces. Each key's value is extracted and stored under the corresponding val_name.

The order of parameters doesn't matter. Treat {}-syntax as a dictionary.

Example: the pattern {v=:id t=:ts} with the query string v=4YabUd9imQU&t=9m14s will extract:

{'id': '4YabUd9imQU', 'ts': '9m14s'}
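The dictionary-like behaviour can be sketched with the standard library alone. This is not how url-match is implemented; `parse_query_pattern` and `extract_query` are hypothetical names, and the sketch assumes that every key listed in the pattern must be present in the query string for a match.

```python
from urllib.parse import parse_qs


def parse_query_pattern(pattern):
    """Turn '{v=:id t=:ts}' into {'v': 'id', 't': 'ts'}.

    Hypothetical helper for illustration; not part of url_match.
    """
    pattern = pattern.strip()
    if not (pattern.startswith('{') and pattern.endswith('}')):
        raise ValueError('query pattern must be wrapped in {}')
    mapping = {}
    for pair in pattern[1:-1].split():
        key, separator, name = pair.partition('=:')
        if not separator or not key or not name:
            raise ValueError('expected key=:val_name, got %r' % pair)
        mapping[key] = name
    return mapping


def extract_query(pattern, query_string):
    """Return the extracted values, or None if a listed key is missing."""
    mapping = parse_query_pattern(pattern)
    # parse_qs builds a dict, so the order of parameters is irrelevant.
    params = parse_qs(query_string)
    result = {}
    for key, name in mapping.items():
        if key not in params:
            return None  # assumption: all listed keys are required
        result[name] = params[key][0]
    return result


print(extract_query('{v=:id t=:ts}', 'v=4YabUd9imQU&t=9m14s'))
# => {'id': '4YabUd9imQU', 'ts': '9m14s'}
print(extract_query('{v=:id t=:ts}', 't=9m14s&v=4YabUd9imQU'))
# => {'id': '4YabUd9imQU', 'ts': '9m14s'}
```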

More Examples

yt_schema = make_schema('https? www.?youtube.com /watch {v=:id t=:ts}')
match(yt_schema, 'https://www.youtube.com/watch?v=57Ykv1D0qEE&t=1m43s')
# => {'id': '57Ykv1D0qEE', 'ts': '1m43s'}

# note that now there is no "www." in the URL
match(yt_schema, 'https://youtube.com/watch?v=57Ykv1D0qEE&t=1m43s')
# => {'id': '57Ykv1D0qEE', 'ts': '1m43s'}

reddit_schema = make_schema('https? www.?reddit.com /r/:subreddit/? {}')
match(reddit_schema, 'https://www.reddit.com/r/coolgithubprojects/')
# => {'subreddit': 'coolgithubprojects'}

# note that the last "/" is absent
match(reddit_schema, 'https://www.reddit.com/r/coolgithubprojects')
# => {'subreddit': 'coolgithubprojects'}

reddit_thread_schema = make_schema('https? www.?reddit.com /r/:subreddit/comments/:thread_id/:thread_name/? {}')
match(reddit_thread_schema, 
      'https://www.reddit.com/r/photoshopbattles/comments/8sgj7n/psbattle_english_football_team_riding_unicorns_in/')
# => {'subreddit': 'photoshopbattles', 'thread_id': '8sgj7n', 'thread_name': 'psbattle_english_football_team_riding_unicorns_in'}

Why?

The library was created because I needed to extract parts of the path and some query parameters from a YouTube URL. Parsing whole URLs with regular expressions was not an option, because query parameters can appear in any order. furl solves this problem, but I wanted something more declarative and specific.

In fact, the current implementation uses both regular expressions and furl, applying patterns to different parts of a furl object. The schema itself is parsed by Lark.
