jurl's Introduction

jurl

Fast and simple URL parsing for Java, with UTF-8 and path resolving support. Based on Go's excellent net/url library.

Why

  • Easy-to-use API: you just want to parse a URL, after all.
  • Fast, 4+ million URLs per second on commodity hardware.
  • UTF-8 encoding and decoding.
  • Supports path resolving between URLs (absolute and relative).
  • Good test coverage with plenty of edge cases.
  • Supports IPv4 and IPv6.
  • No external dependencies.

Getting Started

Example:

// Parse URLs
URL base = URL.parse("https://user:secret@example♬.com/path/to/my/dir#about");
URL ref = URL.parse("./../file.html?search=germany&language=de_DE");

// Parsed base
base.getScheme(); // https
base.getUsername(); // user
base.getPassword(); // secret
base.getHost(); // example♬.com
base.getPath(); // /path/to/my/dir
base.getFragment(); // about

// Parsed reference
ref.getPath(); // ./../file.html
ref.getQueryPairs(); // Map<String, String> = {search=germany, language=de_DE}

// Resolve them!
URL resolved = base.resolveReference(ref); // https://user:secret@example♬.com/path/to/file.html?search=germany&language=de_DE
resolved.getPath(); // /path/to/file.html

// Escaped UTF-8 result
resolved.toString(); // https://user:secret@example%E2%99%AC.com/path/to/file.html?search=germany&language=de_DE

Setup

Add the JitPack repository to your build file.

For Gradle:

allprojects {
    repositories {
        maven { url 'https://jitpack.io' }
    }
}

For Maven:

<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

Add the dependency:

For Gradle:

dependencies {
    implementation 'com.github.anthonynsimon:jurl:v0.4.2' // use 'compile' only on Gradle versions older than 3.4
}

For Maven:

<dependencies>
    <dependency>
        <groupId>com.github.anthonynsimon</groupId>
        <artifactId>jurl</artifactId>
        <version>v0.4.2</version>
    </dependency>
</dependencies>

Issues

The recommended way to report and track issues is to open one on GitHub.

Contributing

Want to hack on the project? Any kind of contribution is welcome! Simply follow these steps:

  • Fork the project.
  • Create a new branch.
  • Make your changes and write tests when practical.
  • Commit your changes to the new branch.
  • Send a pull request; it will be reviewed shortly.

If you want to add a feature, please create a new issue and briefly explain what the feature would consist of. For bugs or requests, please check whether an issue already exists before creating a new one.

License

This project is licensed under the MIT license.

jurl's Issues

.toString() bugs

The string serialization seems incorrect. Maybe not all URL segments are properly URL-encoded?

Also, UTF-8 is assumed in URL-encoded parts, but my corpus from the web contains some Latin-1 encoded examples. I'm not sure what the standard says, but Chrome recognizes them.

import com.anthonynsimon.url.URL;

public class JurlTest {
  public static void main(String[] args) {
    test("http://abc.net/1160x%3E/quality/");
    test("http://db-engines.com/en/system/PostgreSQL%3BRocksDB");
    test("http://xzy.org/test/hei%DFfl"); // latin1
    test("http://www.net/decom/category/AA/A_%26_BBB/AAA_%26_BBB/"); // !!!
    test("https://en.wikipedia.org/wiki/Eat_one%27s_own_dog_food");
  }

  private static void test(String url) {
    try {
      Thread.sleep(10);
    } catch (InterruptedException e) {
    }
    try {
      URL parse = URL.parse(url);

      if(!parse.toString().equals(url)) {
        System.out.print("NOT EQUAL: ");
        System.out.println(url);
        System.out.println(parse.toString());
        System.out.println();
      }

    } catch (Exception e) {
      System.out.print("KAPUTT: ");
      System.out.println(url);
      e.printStackTrace();
      System.out.println();
    }
  }
}

results in

NOT EQUAL: http://abc.net/1160x%3E/quality/
http://abc.net/1160x>/quality/

NOT EQUAL: http://db-engines.com/en/system/PostgreSQL%3BRocksDB
http://db-engines.com/en/system/PostgreSQL;RocksDB

java.lang.StringIndexOutOfBoundsException: String index out of range: 15
	at java.lang.String.substring(String.java:1963)
	at com.anthonynsimon.url.PercentEncoder.decode(PercentEncoder.java:181)
	at com.anthonynsimon.url.DefaultURLParser.parse(DefaultURLParser.java:85)
	at com.anthonynsimon.url.URL.parse(URL.java:73)
	at JurlTest.test(JurlTest.java:20)
	at JurlTest.main(JurlTest.java:9)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
KAPUTT: http://xzy.org/test/hei%DFfl

NOT EQUAL: http://www.net/decom/category/AA/A_%26_BBB/AAA_%26_BBB/
http://www.net/decom/category/AA/A_&_BBB/AAA_&_BBB/

NOT EQUAL: https://en.wikipedia.org/wiki/Eat_one%27s_own_dog_food
https://en.wikipedia.org/wiki/Eat_one's_own_dog_food
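The output above shows escapes such as %3E and %27 being decoded during parsing and not restored on serialization, and the Latin-1 escape %DF failing outright. For comparison, here is a minimal sketch that uses only the JDK (java.net.URI and java.net.URLDecoder), not jurl's API. It keeps the raw, still-escaped path for round-tripping and decodes with an explicitly chosen charset. The class and method names are hypothetical, and the sketch assumes ASCII-only input URLs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;

final class RawPathSketch {
    // Returns the path exactly as written in the input, percent-escapes included.
    static String rawPath(String url) {
        return URI.create(url).getRawPath();
    }

    // Decodes percent-escapes with the given charset instead of assuming UTF-8.
    // Note: URLDecoder also turns '+' into a space (form-encoding rules).
    static String decodeWith(String escaped, String charsetName) {
        try {
            return URLDecoder.decode(escaped, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // The raw form survives, so serializing it again cannot change the URL.
        System.out.println(rawPath("http://abc.net/1160x%3E/quality/"));
        // %DF is a single Latin-1 byte; decoding it as ISO-8859-1 yields a sharp s.
        System.out.println(decodeWith("hei%DFfl", "ISO-8859-1"));
    }
}
```

Keeping the raw component next to the decoded one is one way to make toString() reproduce the input; whether jurl should adopt that design is a separate question for the maintainer.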

Library doesn't parse port

Your parser is the best solution I've found, but it does not parse the port. Is there any reason why you didn't implement port parsing?
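As a possible stopgap, the port can be read with the JDK's java.net.URI. This is a minimal sketch, not part of jurl; the class and method names are hypothetical, and it assumes the host is plain ASCII, since java.net.URI may not recognize hosts such as example♬.com as server-based authorities.

```java
import java.net.URI;

final class PortSketch {
    // Returns the explicit port of the URL, or -1 when none is given.
    static int portOf(String url) {
        return URI.create(url).getPort();
    }

    public static void main(String[] args) {
        System.out.println(portOf("http://user@domain1.com:12345/a/great/path/"));
        System.out.println(portOf("https://example.com/path"));
    }
}
```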

Slow parsing

Hi,

How did you get "4+ million URLs per second"?
I tried parsing 10,000,000 URLs and it took 75,416 ms:

final int count = 10000000;
List<String> urls = new ArrayList<>(count); // needs java.util.List and java.util.ArrayList
for (int i = 0; i < count; i++) {
  urls.add("http://user@domain" + i + ".com:12345/a/great/path/?with=query&unicode_parameter=😊&nothing#cool");
}

long start = System.currentTimeMillis();
for (String url : urls) {
  com.anthonynsimon.url.URL.parse(url);
}
System.out.println("time: " + (System.currentTimeMillis() - start) + " ms");
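The timing above works out to roughly 132,600 URLs per second (10,000,000 URLs / 75.416 s). A single timed pass also includes JIT warm-up, so it tends to understate steady-state throughput. The following is a minimal sketch of a measurement with untimed warm-up passes. All names are hypothetical, java.net.URI stands in for the parser under test, and a dedicated harness such as JMH would likely give more trustworthy numbers.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

final class ParseBenchmarkSketch {
    // Written on every parse so the JIT cannot discard the parser's result.
    static volatile int sink;

    // Runs warmupRounds untimed passes, then returns the duration of one timed pass.
    static long timeOnePassNanos(List<String> urls, Function<String, ?> parser, int warmupRounds) {
        for (int round = 0; round < warmupRounds; round++) {
            for (String url : urls) {
                sink += Objects.hashCode(parser.apply(url));
            }
        }
        long start = System.nanoTime();
        for (String url : urls) {
            sink += Objects.hashCode(parser.apply(url));
        }
        return System.nanoTime() - start;
    }

    // Converts a URL count and an elapsed time in nanoseconds to URLs per second.
    static double urlsPerSecond(int urlCount, long elapsedNanos) {
        return urlCount / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        int count = 100_000;
        List<String> urls = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            urls.add("http://user@domain" + i + ".com:12345/a/great/path/?with=query#cool");
        }
        // Stand-in parser; wrap the parser under test in a lambda to measure it instead.
        long elapsed = timeOnePassNanos(urls, URI::create, 3);
        System.out.printf("%.0f URLs/s%n", urlsPerSecond(count, elapsed));
    }
}
```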

Dropping trailing question mark

Possibly related to #2, but I have some URLs that do not round-trip, e.g. when there is a trailing ?

scala> import com.anthonynsimon.url.URL
scala> URL.parse("http://example.com/?")
res0: com.anthonynsimon.url.URL = http://example.com/
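Round-tripping this case requires telling "no query" apart from "empty query". Go's net/url, which jurl is based on, records the difference in a ForceQuery field. The sketch below shows the same distinction with the JDK's java.net.URI, where an absent query is null and an empty one is the empty string. It does not use jurl's API, and the class and method names are hypothetical.

```java
import java.net.URI;

final class EmptyQuerySketch {
    // True when the URL contains a '?' delimiter, even if nothing follows it.
    static boolean hasQueryDelimiter(String url) {
        return URI.create(url).getRawQuery() != null;
    }

    public static void main(String[] args) {
        System.out.println(hasQueryDelimiter("http://example.com/?"));
        System.out.println(hasQueryDelimiter("http://example.com/"));
        // java.net.URI returns the original input from toString(), so the '?' is kept.
        System.out.println(URI.create("http://example.com/?"));
    }
}
```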

Build src and doc jar

I'm not sure if this is because of jitpack.io, but I can't download the sources jar for jurl.
This makes debugging and understanding the library more difficult.

Maybe only an option is missing in the Gradle build file.

Thanks for the cool library!
