
neturl

A Robust URL Parser and Builder for Lua

This small Lua library provides a few functions to parse URLs, including their query strings, and to build new URLs easily.

url = require "net.url"

URL parser

The library converts a URL into a table of its components as described in the RFC: scheme, host, path, etc.

u = url.parse("http://www.example.com/test/?start=10")
print(u.scheme)
-- http
print(u.host)
-- www.example.com
print(u.path)
-- /test/
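To make the decomposition concrete, here is a standalone sketch of how an absolute URL can be split into those fields with Lua patterns. The function name split_url is hypothetical and this is not neturl's implementation; it assumes an absolute URL with no userinfo (user@) and no IPv6 literal host.

```lua
-- Hypothetical helper, not neturl's implementation.
-- Assumes an absolute URL: no userinfo ("user@") and no IPv6 "[...]" host.
local function split_url(s)
  local parts = {}
  -- Split off the fragment, then the query, from the right-hand side.
  local rest, fragment = s:match("^([^#]*)#(.*)$")
  if rest then s, parts.fragment = rest, fragment end
  local before, query = s:match("^([^?]*)%?(.*)$")
  if before then s, parts.query = before, query end
  -- Scheme: a letter followed by letters, digits, "+", "." or "-".
  local scheme, after = s:match("^(%a[%w%+%.%-]*):(.*)$")
  if scheme then parts.scheme, s = scheme, after end
  -- Authority: everything between "//" and the next "/".
  local authority, path = s:match("^//([^/]*)(.*)$")
  if authority then
    local host, port = authority:match("^(.-):(%d+)$")
    parts.host = host or authority
    parts.port = port and tonumber(port) or nil
    s = path
  end
  -- Whatever is left is the path, possibly empty.
  parts.path = s
  return parts
end
```

Splitting off the authority before touching the path is what keeps a path such as //a/b/c from being mistaken for a host: split_url("http://example.com//a/b/c") yields host example.com and path //a/b/c.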

URL normalization

u = url.parse("http://www.FOO.com:80///foo/../foo/./bar"):normalize()
print(u)
-- http://www.foo.com/foo/bar
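The normalization above combines several independent steps. The sketch below is a hypothetical normalize function that works on already-split parts rather than neturl's actual method. It lowercases the scheme and host, drops a default port (assumed here to be 80 for http and 443 for https) and rewrites the path. Like the example output, it also collapses repeated slashes, which RFC 3986 itself does not require.

```lua
-- Hypothetical normalize(parts), not neturl's method. It works on
-- already-split parts; the default-port table is an assumption.
local DEFAULT_PORTS = { http = 80, https = 443 }

local function normalize(parts)
  local scheme = parts.scheme:lower()
  local host = parts.host:lower()
  -- tonumber also strips leading zeros, so "080" becomes 80.
  local port = parts.port and tonumber(parts.port)
  if port == DEFAULT_PORTS[scheme] then port = nil end
  -- Rebuild the path: skip empty segments (collapsing "//") and ".",
  -- and let ".." drop the previous segment.
  local segments = {}
  for seg in parts.path:gmatch("[^/]+") do
    if seg == ".." then
      segments[#segments] = nil
    elseif seg ~= "." then
      segments[#segments + 1] = seg
    end
  end
  local path = "/" .. table.concat(segments, "/")
  -- Keep a trailing slash such as the one in "/test/".
  if #segments > 0 and parts.path:sub(-1) == "/" then path = path .. "/" end
  return scheme .. "://" .. host .. (port and (":" .. port) or "") .. path
end
```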

URL resolver

URL resolution follows the examples provided in RFC 2396.

u = url.parse("http://a/b/c/d;p?q"):resolve("../../g")
print(u)
-- http://a/g
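The resolution example relies on two steps: merging the reference with the base path, then removing dot segments. The following sketch follows the algorithm in RFC 3986 section 5.2 (the successor to RFC 2396) and only handles path references; resolve_path and remove_dot_segments are illustrative names, not part of neturl.

```lua
-- Sketch of RFC 3986 section 5.2.4 (remove_dot_segments); not neturl's code.
local function remove_dot_segments(input)
  local output = {}
  while #input > 0 do
    if input:sub(1, 3) == "../" then
      input = input:sub(4)
    elseif input:sub(1, 2) == "./" then
      input = input:sub(3)
    elseif input:sub(1, 3) == "/./" then
      input = input:sub(3)            -- "/./g" -> "/g"
    elseif input == "/." then
      input = "/"
    elseif input:sub(1, 4) == "/../" then
      input = input:sub(4)            -- "/../g" -> "/g"
      table.remove(output)            -- drop the last output segment, if any
    elseif input == "/.." then
      input = "/"
      table.remove(output)
    elseif input == "." or input == ".." then
      input = ""
    else
      -- Move the first segment (with its leading "/", if any) to the output.
      local seg = input:match("^/?[^/]*")
      output[#output + 1] = seg
      input = input:sub(#seg + 1)
    end
  end
  return table.concat(output)
end

-- Resolve a path-only reference against a base path (sections 5.2.2-5.2.3).
-- Assumes the base URL has an authority and the reference has no scheme,
-- authority, query or fragment.
local function resolve_path(base_path, ref)
  if ref:sub(1, 1) == "/" then
    return remove_dot_segments(ref)
  end
  -- Merge: keep the base path up to and including its last "/".
  local dir = base_path:match("^(.*/)") or "/"
  return remove_dot_segments(dir .. ref)
end
```

With the base path /b/c/d;p, resolve_path returns /g for ../../g, which matches the http://a/g result above once the scheme and host are put back.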

Path builder

Path segments can be appended with the / operator (the __div metamethod) or with u.addSegment(). Each segment is percent-encoded, so a / inside a segment becomes %2F.

u = url.parse('http://example.com')
u = u / 'bands' / 'AC/DC'  -- a bare expression is not a valid Lua statement, so assign the result
print(u)
-- http://example.com/bands/AC%2FDC
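The %2F in that output comes from percent-encoding each segment before it is joined. Below is a minimal sketch, assuming only the RFC 3986 unreserved characters are left alone by default (neturl's actual default set may be larger); encode_segment is a hypothetical name, and its legal table lets a caller exempt extra characters, similar in spirit to the legal_in_path option.

```lua
-- Hypothetical encode_segment, not neturl's implementation.
-- Unreserved characters (RFC 3986 section 2.3) are kept; every other byte,
-- including "/", becomes %XX unless the caller marks it legal.
local function encode_segment(s, legal)
  legal = legal or {}
  return (s:gsub("[^%w%-%._~]", function(c)
    if legal[c] then return c end
    return string.format("%%%02X", c:byte())
  end))
end

local path = "/" .. encode_segment("bands") .. "/" .. encode_segment("AC/DC")
print(path)  -- /bands/AC%2FDC
```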

Module Options

  • separator sets the separator placed between query parameters. It is & by default.
  • cumulative_parameters is false by default. If true, query parameters with the same name are stored in a table.
  • legal_in_path is a table of characters that are not URL-encoded in path components.
  • legal_in_query is a table of characters that are not URL-encoded in query values. Query parameter names, on the other hand, only allow a small set of legal characters (-_.).
  • query_plus_is_space is true by default, so a plus sign in a query value is treated as a space and encoded as %20, not as %2B (a literal plus).

To keep the + sign unencoded in path segments, add it to the table of legal path characters. For example:

url = require "net.url"
url.options.legal_in_path["+"] = true;
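To show how such options could drive parsing, here is a sketch of a flat query-string parser that reads separator, cumulative_parameters and query_plus_is_space from an options table. The option names are borrowed from the list above, but parse_query and its behaviour are assumptions rather than neturl's code. In particular, without cumulative_parameters it keeps the first value of a repeated name, as reported in the "Ability to parse url arguments with the same name" issue.

```lua
-- Hypothetical parse_query, not neturl's code. Option names mirror the
-- documented ones; the default values below are assumptions.
local function decode(s, plus_is_space)
  -- Replace "+" before percent-decoding so that "%2B" still yields "+".
  if plus_is_space then s = s:gsub("%+", " ") end
  return (s:gsub("%%(%x%x)", function(h)
    return string.char(tonumber(h, 16))
  end))
end

local function parse_query(qs, opts)
  opts = opts or {}
  local sep = opts.separator or "&"            -- must be non-empty
  local plus = opts.query_plus_is_space ~= false
  local result = {}
  local start = 1
  while start <= #qs + 1 do
    -- Plain find (4th argument true): the separator is not a pattern.
    local stop = qs:find(sep, start, true) or (#qs + 1)
    local pair = qs:sub(start, stop - 1)
    if pair ~= "" then
      local k, v = pair:match("^([^=]*)=?(.*)$")
      k, v = decode(k, plus), decode(v, plus)
      if result[k] == nil then
        result[k] = v                          -- first occurrence
      elseif opts.cumulative_parameters then
        if type(result[k]) ~= "table" then result[k] = { result[k] } end
        table.insert(result[k], v)
      end                                      -- otherwise keep the first value
    end
    start = stop + #sep
  end
  return result
end
```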

Querystring parser

The library supports brackets in query strings, like PHP. This means you can use brackets to build multi-dimensional tables. The parsed query string has a tostring() helper. As usual in Lua, when no index is specified (as in a[]), indices start at 1.

query = url.parseQuery("first=abc&a[]=123&a[]=false&b[]=str&c[]=3.5&a[]=last")
print(query)
-- a[1]=123&a[2]=false&a[3]=last&b[1]=str&c[1]=3.5&first=abc
print(query.a[1])
-- 123
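Bracket handling can be reduced to turning a key such as a[] or x[a][b] into a list of table keys and walking down the result table. The sketch below reproduces the example; parse_brackets and its helpers are hypothetical names, it skips percent-decoding, and it keeps every value as a string.

```lua
-- Hypothetical parse_brackets, not neturl's code. Values stay strings and
-- are not percent-decoded; "&" is assumed to be the separator.
local function key_of(t, k)
  if k == "" then return #t + 1 end        -- "[]" appends to the list
  return tonumber(k) or k                  -- "[2]" becomes the integer key 2
end

local function set_bracketed(result, key, value)
  local name, rest = key:match("^([^%[]+)(.*)$")
  if not name then return end              -- ignore keys like "[x]" with no name
  local keys = { name }
  for inner in rest:gmatch("%[([^%]]*)%]") do
    keys[#keys + 1] = inner
  end
  -- Walk (and create) intermediate tables, then assign the last key.
  local t = result
  for i = 1, #keys - 1 do
    local k = key_of(t, keys[i])
    if type(t[k]) ~= "table" then t[k] = {} end
    t = t[k]
  end
  t[key_of(t, keys[#keys])] = value
end

local function parse_brackets(qs)
  local result = {}
  for pair in qs:gmatch("[^&]+") do
    local k, v = pair:match("^([^=]*)=?(.*)$")
    set_bracketed(result, k, v)
  end
  return result
end
```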

Querystring builder

u = url.parse("http://www.example.com")
u.query.foo = "bar"
print(u)
-- http://www.example.com/?foo=bar

u:setQuery{ json = true, skip = 100 }
print(u)
-- http://www.example.com/?json=true&skip=100
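A query table can be serialized by encoding each key and value and joining the pairs. This sketch sorts the keys so the output is deterministic, which happens to match the alphabetical order in the examples above; the build_query name and the sorting are assumptions, not a description of neturl's internals.

```lua
-- Hypothetical build_query, not neturl's code. Values go through tostring(),
-- so true becomes "true" and 100 becomes "100".
local function encode_component(s)
  return (tostring(s):gsub("[^%w%-%._~]", function(c)
    return string.format("%%%02X", c:byte())
  end))
end

local function build_query(params, sep)
  local keys = {}
  for k in pairs(params) do keys[#keys + 1] = k end
  -- pairs() order is unspecified, so sort for a stable result.
  table.sort(keys, function(a, b) return tostring(a) < tostring(b) end)
  local parts = {}
  for _, k in ipairs(keys) do
    parts[#parts + 1] = encode_component(k) .. "=" .. encode_component(params[k])
  end
  return table.concat(parts, sep or "&")
end

print(build_query{ json = true, skip = 100 })  -- json=true&skip=100
```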

Differences with luasocket/url.lua

  • Luasocket/url.lua can't parse http://www.example.com?url=net correctly because the URL has no path.
  • Luasocket/url.lua can't clean up and normalize URLs, for example by removing the default port, extra zeros in the port or an empty authority, or by lowercasing the scheme and domain name.
  • Luasocket/url.lua doesn't parse the query string parameters.
  • Luasocket/url.lua is less compliant with RFC 2396. With the base URL http://a/b/c/d;p?q, it resolves:
      • ../../../g to http://ag instead of http://a/g
      • ../../../../g to http://a../g instead of http://a/g
      • g;x=1/../y to http://a/b/c/g;x=1/../y instead of http://a/b/c/y
      • /./g to http://a/./g instead of http://a/g
      • g;x=1/./y to http://a/b/c/g;x=1/./y instead of http://a/b/c/g;x=1/y

Contributors

aleclarson, golgote, misiek08


Issues

Max number of parameters

Add a configuration parameter to set the maximum number of parameters allowed in the query string.

An exhaustive API reference in one place is badly needed. Referring to the RFC is not enough.

I am trying to migrate from another URL parser to this library. There are differences between that and this library with respect to the host-related values (specifically inclusion vs. exclusion of :<port> in the value). It would be super convenient to have each property and its exact semantics documented exhaustively at hand, instead of being referred to an RFC and having to wade through standardese.

URLs are not encoded on output.

When parsing a URL, this library decodes the percent-encoding in the path, but it does not re-apply the encoding when converting the URL back into a string:

th> u=url.parse("https://google.com/Link%20with%20a%20space%20in%20it/")
th> u
https://google.com/Link with a space in it/
th> {u:build()}
{
  1 : "https://google.com/Link with a space in it/"
}

The resulting URL should never have a space character in it.

Other characters are not escaped properly either. For example, https://google.com/a%2fb%2fc comes back as https://google.com/a/b/c, which changes the path's segments and could lead to some fun security issues...

Is this the right way to convert URLs back to strings? This seems like a pretty fundamental bug.
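One way to avoid both problems is to split the raw path on / before decoding, and to re-encode each segment when building the string. The sketch below is a hypothetical illustration of that approach (split_path and join_path are not neturl functions), not a patch for neturl; it emits uppercase hex digits, as RFC 3986 recommends.

```lua
-- Hypothetical lossless path round trip, not neturl's code.
-- Splitting on "/" before percent-decoding keeps "%2F" inside a segment,
-- and re-encoding each segment on output keeps spaces out of the result.
local function decode(s)
  return (s:gsub("%%(%x%x)", function(h) return string.char(tonumber(h, 16)) end))
end

local function encode(s)
  return (s:gsub("[^%w%-%._~]", function(c)
    return string.format("%%%02X", c:byte())
  end))
end

local function split_path(raw)
  local segments = {}
  -- "([^/]*)/" also yields empty segments, e.g. before a leading "/"
  -- or after a trailing one, so they survive the round trip.
  for seg in (raw .. "/"):gmatch("([^/]*)/") do
    segments[#segments + 1] = decode(seg)
  end
  return segments
end

local function join_path(segments)
  local out = {}
  for i, seg in ipairs(segments) do out[i] = encode(seg) end
  return table.concat(out, "/")
end
```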

Lazy parsing of get parameters

Since a query string can contain many parameters, it would be a good idea to parse them on demand (just in time).
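A metatable with an __index function is a common Lua way to defer work until a field is first read. The sketch below is hypothetical (lazy_query is not a neturl function) and skips percent-decoding. Its main trade-off is that pairs() on the proxy sees no keys, so iteration would need extra support such as a __pairs metamethod (Lua 5.2 and later).

```lua
-- Hypothetical lazy_query, not a neturl function. The raw string is only
-- parsed the first time a key is read; values are not percent-decoded.
local function lazy_query(raw)
  local parsed                              -- nil until the first lookup
  local function parse()
    parsed = {}
    for k, v in raw:gmatch("([^&=]+)=?([^&]*)") do
      parsed[k] = v                         -- a repeated name keeps its last value
    end
  end
  local proxy = setmetatable({}, {
    __index = function(_, key)
      if not parsed then parse() end
      return parsed[key]
    end,
  })
  -- The second return value reports whether parsing has happened yet.
  return proxy, function() return parsed ~= nil end
end
```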

Ability to parse url arguments with the same name

Some sites set multiple values for the same parameter name.

For example, a query on a Russian job search site:

https://hh.ru/search/vacancy?text=lua&salary=&currency_code=RUR&experience=doesNotMatter&employment=full&employment=part&employment=project&schedule=fullDay&schedule=flexible&schedule=remote&order_by=relevance&search_period=&items_on_page=20&no_magic=true

As you can see, it contains multiple values for one name:

schedule=fullDay&schedule=flexible&schedule=remote

local net_url = require 'thirdparty_libs.neturl_mod'
local u = net_url.parse [[https://hh.ru/search/vacancy?text=lua&salary=&currency_code=RUR&experience=doesNotMatter&employment=full&employment=part&employment=project&schedule=fullDay&schedule=flexible&schedule=remote&order_by=relevance&search_period=&items_on_page=20&no_magic=true]]
print(u.query.schedule)
-- "fullDay"

As you can see, all values other than the first are lost.
I propose an option to parse multiple values for one argument, for example:

net_url.allow_args_names_repetition = true
local u = net_url.parse [[https://hh.ru/search/vacancy?text=lua&salary=&currency_code=RUR&experience=doesNotMatter&employment=full&employment=part&employment=project&schedule=fullDay&schedule=flexible&schedule=remote&order_by=relevance&search_period=&items_on_page=20&no_magic=true]]
print(u.query.schedule)
-- "fullDay|flexible|remote"

Fix space and plus sign encoding

I started working on these fixes. RFC 3986 recommends that spaces are all encoded as %20 and plus signs as %2B to avoid confusion. Other special characters (e.g. ; = :) can be left alone or encoded depending on the application. I will add a list of reserved characters and make it configurable.

Consider publishing to OPM

OpenResty Debian Docker images now ship with OPM instead of LuaRocks. Would you consider also publishing to OPM going forward?

License

Hello,

can you please add a license to this?

Thank you!

r.

Support encoding of path segments

When building URLs, it would be useful to be able to encode individual path segments, so that they can contain reserved characters such as /.

For example, suppose I have a variable that contains the name of a band. I should be able to build up a URL that contains the name of the band as one of the path segments, e.g.

  • Pink Floyd: http://example.com/bands/Pink%20Floyd
  • AC/DC: http://example.com/bands/AC%2FDC

I think this is just a case of exposing the encodeSegment function, which would mean I could write:

path = '/' .. url.encodeSegment('bands') .. '/' .. url.encodeSegment('AC/DC')

Complete, exhaustive API reference?

Where can I find a complete API reference for this library, as opposed to the short readme that only touches a few APIs?

  • every method
  • every parameter
  • every value of every parameter that accepts values from a fixed set

UX report: after I studied everything about this library that can be found online, getting my head around how to use this library to manipulate query strings still involved WAY too much trial and error.

Add Max Nesting parameter

Add a parameter to configure the maximum number of nested levels in the query string parameters. PHP has this set to 64 by default IIRC.

Bug in url normalization

neturl = require "net.url"
u = neturl.parse("http://example.com//a/b/c"):normalize()

Expected outcome

-- u.host = example.com
-- u.path = /a/b/c

Actual outcome

-- u.host = a
-- u.path = /b/c

Whitespace on the host name

> url = require("net.url")
> res = url.parse("http:// spacehost.com")
> = res.scheme
http
> = res.host
 spacehost.com
>

In this case, shouldn't res.host be nil?
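A parser could reject such input by validating the host after splitting the authority. The hypothetical checked_host helper below, which is not part of neturl, accepts only letters, digits, - and . and returns nil otherwise; it deliberately ignores IPv6 literals and internationalized names.

```lua
-- Hypothetical strict host check, not neturl's code. Registered names only:
-- letters, digits, "-" and "."; anything else (e.g. a leading space) gives nil.
local function checked_host(host)
  if host and host:match("^[%w%-%.]+$") then
    return host
  end
  return nil
end
```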
