robots_txt's People

Contributors

alexander-irbis

robots_txt's Issues

Yandex-style matcher that reorders directives by prefix length

Using directives jointly

The Allow and Disallow directives from the corresponding User-agent block are sorted according to URL prefix length (from shortest to longest) and applied in order. If several directives match a particular site page, the robot selects the last one in the sorted list. This way the order of directives in the robots.txt file doesn't affect how they are used by the robot.
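
The rule above can be sketched in Rust as follows. This is a minimal illustration, not the crate's API: the `Rule` type and the `is_allowed` function are hypothetical names, matching is a literal prefix comparison (no `*` or `$` wildcards, no special handling of empty values), a path with no matching rule is assumed to be allowed, and ties between equal-length prefixes are assumed to go to `Allow` so that the result stays independent of file order.

```rust
/// A single Allow/Disallow rule (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
struct Rule {
    /// `true` for Allow, `false` for Disallow.
    allow: bool,
    /// URL path prefix taken from the directive.
    prefix: String,
}

/// Returns `true` if `path` may be fetched under `rules`.
fn is_allowed(rules: &[Rule], path: &str) -> bool {
    let mut sorted: Vec<&Rule> = rules.iter().collect();
    // Shortest prefix first. For equal lengths, Disallow sorts before Allow
    // (assumption), so Allow is the later entry and wins the tie.
    sorted.sort_by_key(|r| (r.prefix.len(), r.allow));
    sorted
        .iter()
        .rev() // the last matching rule in the sorted list decides
        .find(|r| path.starts_with(r.prefix.as_str()))
        .map(|r| r.allow)
        .unwrap_or(true) // assumption: no matching rule means allowed
}
```

With `Allow: /catalog/auto` and `Disallow: /catalog`, in either order, `/catalog/auto/bmw` is allowed because the longer prefix is applied last, while `/catalog/other` is disallowed.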

Got a panic with German umlauts

While comparing URL paths against a couple of robots.txt files, I ran into a panic caused by an ä (German umlaut).
Is this something that can be fixed, or do I have to normalize the path?

Error Stack:

thread '' panicked at 'byte index 9 is not a char boundary; it is inside 'ä' (bytes 8..10) of /infos/häufig-gestellte-fragen.html', src/libcore/str/mod.rs:2034:5
note: Some details are omitted, run with RUST_BACKTRACE=full for a verbose backtrace.
stack backtrace:
0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
at src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:39
1: std::sys_common::backtrace::_print
at src/libstd/sys_common/backtrace.rs:71
2: std::panicking::default_hook::{{closure}}
at src/libstd/sys_common/backtrace.rs:59
at src/libstd/panicking.rs:197
3: std::panicking::default_hook
at src/libstd/panicking.rs:211
4: <std::panicking::begin_panic::PanicPayload as core::panic::BoxMeUp>::get
at src/libstd/panicking.rs:474
5: std::panicking::continue_panic_fmt
at src/libstd/panicking.rs:381
6: std::panicking::try::do_call
at src/libstd/panicking.rs:308
7: ::type_id
at src/libcore/panicking.rs:85
8: core::str::traits::<impl core::slice::SliceIndex for core::ops::range::Range>::index::{{closure}}
at src/libcore/str/mod.rs:2034
9: core::ptr::real_drop_in_place
at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/libcore/str/mod.rs:1821
10: core::str::traits::<impl core::slice::SliceIndex for core::ops::range::RangeFrom>::index
at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/libcore/option.rs:388
11: core::str::traits::<impl core::slice::SliceIndex for core::ops::range::RangeTo>::index
at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/libcore/str/mod.rs:1821
12: core::str::traits::::eq
at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/libcore/str/mod.rs:1623
13: <core::str::LinesAnyMap as core::ops::function::FnMut<(&str,)>>::call_mut
at /Users/thorstenclaus/.cargo/registry/src/github.com-1ecc6299db9ec823/robots_txt-0.6.0/src/matcher.rs:51
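
The panic message says that a `&str` was sliced at byte index 9, which falls inside the two-byte character `ä` (bytes 8..10). The code at `matcher.rs:51` is not shown here, so the following is only a sketch of two ways to avoid this class of panic, assuming the matcher slices the path by the byte length of a rule. The helper names `has_prefix` and `safe_head` are hypothetical.

```rust
/// Prefix test that compares raw bytes, so it never slices `path`
/// at an arbitrary byte index and cannot hit a char-boundary panic.
fn has_prefix(path: &str, prefix: &str) -> bool {
    path.as_bytes().starts_with(prefix.as_bytes())
}

/// Returns the first `n` bytes of `path` as a string slice, or `None`
/// if `n` is out of range or falls inside a multi-byte character.
/// `path[..n]` would panic in those cases; `str::get` does not.
fn safe_head(path: &str, n: usize) -> Option<&str> {
    path.get(..n)
}
```

For `/infos/häufig-gestellte-fragen.html`, `safe_head(path, 9)` returns `None` instead of panicking, while `safe_head(path, 8)` returns `Some("/infos/h")`.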

Ignore bad `Host` domain

# Examples of Host directives that will be ignored

Host: www.myhost-.com
Host: www.-myhost.com
Host: www.myhost.com:100000
Host: www.my_host.com
Host: .my-host.com:8000
Host: my-host.com.
Host: my..host.com
Host: www.myhost.com:8080/
Host: 213.180.194.129
Host: www.firsthost.ru,www.secondhost.com
Host: www.firsthost.ru www.secondhost.com
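
The issue lists only rejected values, so the exact validation rules are not stated. The sketch below infers one plausible set from the examples: the value is `domain[:port]`, the port is a number from 1 to 65535, each dot-separated label is non-empty, uses only ASCII letters, digits, and hyphens, and does not start or end with a hyphen, and the last label is not purely numeric (which rules out IPv4 addresses). The function names `is_valid_host` and `is_valid_label` are hypothetical and are not part of the crate.

```rust
/// Returns `true` if `value` looks like `domain[:port]` under the
/// rules inferred from the rejected examples (assumed, not confirmed).
fn is_valid_host(value: &str) -> bool {
    // Split off an optional port at the last colon.
    let (domain, port) = match value.rsplit_once(':') {
        Some((d, p)) => (d, Some(p)),
        None => (value, None),
    };
    if let Some(p) = port {
        // Digits only: this also rejects a trailing slash such as "8080/".
        if p.is_empty() || !p.bytes().all(|b| b.is_ascii_digit()) {
            return false;
        }
        match p.parse::<u32>() {
            Ok(n) if (1..=65535).contains(&n) => {}
            _ => return false,
        }
    }
    let labels: Vec<&str> = domain.split('.').collect();
    // An empty or all-digit last label rejects trailing dots and IPv4 addresses.
    if labels.last().map_or(true, |l| l.bytes().all(|b| b.is_ascii_digit())) {
        return false;
    }
    labels.iter().all(|l| is_valid_label(l))
}

/// One DNS-style label: 1 to 63 ASCII letters, digits, or hyphens,
/// with no hyphen at either end.
fn is_valid_label(label: &str) -> bool {
    !label.is_empty()
        && label.len() <= 63
        && !label.starts_with('-')
        && !label.ends_with('-')
        && label.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-')
}
```

Under these assumptions every value in the list above is rejected, while a value such as `www.myhost.com` or `my-host.com:8080` is accepted.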
