udger / udger-java

Java agent string parser based on Udger https://udger.com/products/local_parser

License: MIT License

Java 100.00%
bot-detection device-detector mobile-detection user-agent-parser

udger-java's Introduction

Udger golang (format V3)

This package loads the entire Udger database into memory and lets you look up user agent metadata.

install

go get github.com/udger/udger

Documentation

For detailed documentation and basic usage examples, please see the package documentation at https://godoc.org/github.com/udger/udger

Automatic updates download

To update the data automatically, use the Udger data updater (https://udger.com/support/documentation/?doc=62).

old v2 format

If you still use the previous format of the db (v2), please see the branch old_format_v2

udger-java's People

Contributors

dimagolomozy, mallat, mnylen, skybber, svendiedrichsen

udger-java's Issues

NPE at org.udger.parser.UdgerParser.patchVersions(UdgerParser.java:667)

Given the input Mozilla/5.0 (X11; U; Linux x86_64; cs-CZ) AppleWebKit/533.3 (KHTML, like Gecko) rekonq Safari/533.3, org.udger.parser.UdgerParser.parseUa() fails inside org.udger.parser.UdgerParser.patchVersions() with an NPE.

Relevant stack trace follows:

java.lang.NullPointerException
	at org.udger.parser.UdgerParser.patchVersions(UdgerParser.java:667)
	at org.udger.parser.UdgerParser.clientDetector(UdgerParser.java:464)
	at org.udger.parser.UdgerParser.parseUa(UdgerParser.java:173)

Updating database leads to mis-categorisation of user agents

Summary: a database update put the Udger library into a state where a large proportion of desktop user agents were being mis-classified as mobile.

The database itself is fine: using it to parse user agents yields correct results. However, we saw a drastic change in behaviour which happened as the database was updated (using the https://github.com/udger/udger-updater-linux updater script), consistently across numerous hosts.

We have not been able to replicate the issue: the majority of updates did not cause any visible problems, and all our hosts manifested the issue on the same update at the same time.

We then saw this behaviour persist after further database updates, and stop happening when we restarted the application (using the same database).

Our suspicion is that running the update script under a running application somehow led to corrupt in-memory state which mis-attributed device types. Is this something you've seen evidence of before?

WordDetector cannot handle words starting with 2 underscores

We regularly find warning log messages like Index out of hashmap58 : __weibo__ in our logs.

This seems to be caused by the way the WordDetector class creates the index into the array. Shouldn't the word be stripped of all non-alpha characters before the index is determined?
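The suggestion above can be sketched as follows. This is a hypothetical illustration, not the actual WordDetector source: the class name WordIndexSketch, the 26 x 26 table size, and both methods are assumptions made for the example. It shows how an index derived from the first two characters goes negative for a word such as __weibo__, and how skipping non-letter characters avoids that.

```java
import java.util.Locale;

// Hypothetical sketch; the names and the 26 x 26 layout are assumptions,
// not taken from the udger-java source.
class WordIndexSketch {
    static final int DIMENSION = 'z' - 'a' + 1;      // 26 letters
    static final int SIZE = DIMENSION * DIMENSION;   // 676 buckets

    // Naive index: assumes the first two characters are in a-z.
    // Any other character produces an index outside 0..SIZE-1.
    static int naiveIndex(String word) {
        String s = word.toLowerCase(Locale.ROOT);
        return (s.charAt(0) - 'a') * DIMENSION + (s.charAt(1) - 'a');
    }

    // Index built from the first two ASCII letters, skipping everything else.
    // Returns -1 when the word contains fewer than two letters.
    static int letterIndex(String word) {
        int first = -1;
        for (int i = 0; i < word.length(); i++) {
            char c = Character.toLowerCase(word.charAt(i));
            if (c < 'a' || c > 'z') {
                continue; // ignore underscores, digits, pipes, ...
            }
            if (first < 0) {
                first = c - 'a';
            } else {
                return first * DIMENSION + (c - 'a');
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(naiveIndex("__weibo__"));   // negative, so out of range
        System.out.println(letterIndex("__weibo__"));  // within 0..SIZE-1
    }
}
```

If an approach like this were adopted, the lookup side would have to normalize words in the same way, otherwise a word would be stored in one bucket and searched for in another.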

Ctor overloading - ignoring inMemoryEnabled argument

The constructor

public UdgerParser(ParserDbData parserDbData, boolean inMemoryEnabled, int cacheCapacity) {
    this(parserDbData, cacheCapacity);
    this.inMemoryEnabled = true;
}

hard-codes this.inMemoryEnabled = true, which makes the inMemoryEnabled argument useless.
The argument should be taken into account so that the UdgerParser ctor behaves as its signature suggests.

Because of the hard-coded this.inMemoryEnabled = true, I have to use it like so:

UdgerParser udgerParser = inMemoryEnabled
                ? new UdgerParser(parserDbData, inMemoryEnabled, cacheCapacity)
                : new UdgerParser(parserDbData, cacheCapacity);

A single line would be better:

UdgerParser udgerParser = new UdgerParser(parserDbData, inMemoryEnabled, cacheCapacity);
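A minimal sketch of the requested behaviour, using stand-in classes so that it compiles on its own. ParserSketch and ParserDbDataStub are hypothetical names for this example; the real UdgerParser holds more state and may need further changes.

```java
// Stand-in types for illustration only; not the real udger-java classes.
class ParserDbDataStub { }

class ParserSketch {
    private final ParserDbDataStub dbData;
    private final int cacheCapacity;
    private boolean inMemoryEnabled = false;

    ParserSketch(ParserDbDataStub dbData, int cacheCapacity) {
        this.dbData = dbData;
        this.cacheCapacity = cacheCapacity;
    }

    ParserSketch(ParserDbDataStub dbData, boolean inMemoryEnabled, int cacheCapacity) {
        this(dbData, cacheCapacity);
        // Honor the caller's argument instead of hard-coding true.
        this.inMemoryEnabled = inMemoryEnabled;
    }

    boolean isInMemoryEnabled() {
        return inMemoryEnabled;
    }

    public static void main(String[] args) {
        ParserSketch onDisk = new ParserSketch(new ParserDbDataStub(), false, 1000);
        ParserSketch inMemory = new ParserSketch(new ParserDbDataStub(), true, 1000);
        System.out.println(onDisk.isInMemoryEnabled());
        System.out.println(inMemory.isInMemoryEnabled());
    }
}
```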

NPE at org.udger.parser.LRUCache$Node.access$202(LRUCache.java:14)

I'm getting many errors while using the client with the cache enabled. Relevant stack trace:

! java.lang.NullPointerException: null
! at org.udger.parser.LRUCache$Node.access$202(LRUCache.java:14)
! at org.udger.parser.LRUCache.get(LRUCache.java:57)
! at org.udger.parser.UdgerParser.parseUa(UdgerParser.java:144)
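The report does not identify a cause. One possibility, which is an assumption and not something the stack trace confirms, is that a single parser and its cache are shared between threads without synchronization. Under that assumption, a thread-safe LRU cache could be sketched with standard library classes as below; SyncLruSketch is a hypothetical name and is not part of udger-java.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; shown only to illustrate a synchronized LRU cache.
class SyncLruSketch {
    static <K, V> Map<K, V> create(final int capacity) {
        // accessOrder = true makes get() move the entry to the most-recent end,
        // which is a structural change and therefore needs synchronization.
        LinkedHashMap<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // evict the least recently used entry
            }
        };
        return Collections.synchronizedMap(lru);
    }

    public static void main(String[] args) {
        Map<String, String> cache = create(2);
        cache.put("ua-1", "result-1");
        cache.put("ua-2", "result-2");
        cache.get("ua-1");              // ua-1 is now the most recently used
        cache.put("ua-3", "result-3");  // evicts ua-2
        System.out.println(cache.keySet());
    }
}
```

An alternative under the same assumption is to give each thread its own parser instance, which avoids lock contention at the cost of one cache per thread.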

Parsing fails with ArrayIndexOutOfBoundsException on slightly malformed UA

Attempting to parse this UA, which appears to have a malformed browser version:

Mozilla/4.0 (Windows NT 6.1; rv:..; Gecko/20100403; Trident/4.0; Maxthon; SV1;  .NET CLR 1.1.4322; .NET CLR 3.0.04320; Firefox/..; FirePHP/3.8)

Results in

java.lang.ArrayIndexOutOfBoundsException: 0
	at org.udger.parser.UdgerParser.patchVersions(UdgerParser.java:669)
	at org.udger.parser.UdgerParser.clientDetector(UdgerParser.java:465)
	at org.udger.parser.UdgerParser.parseUa(UdgerParser.java:198)

I would expect this to result in an unknown/blank version.
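What line 669 of UdgerParser does is not shown in the report, so the following is only a guess at the failure mode. In Java, "..".split("\\.") returns an empty array because trailing empty strings are dropped, and indexing element 0 of that array throws ArrayIndexOutOfBoundsException: 0. A defensive helper that falls back to a blank version might look like this; VersionSketch and majorVersion are hypothetical names.

```java
// Hypothetical helper; not taken from the udger-java source.
class VersionSketch {
    // Returns the major version, or "" when the version has no usable segment.
    static String majorVersion(String version) {
        if (version == null) {
            return "";
        }
        String[] parts = version.split("\\.");
        // "..".split("\\.") yields a zero-length array, so check before indexing.
        return parts.length > 0 ? parts[0] : "";
    }

    public static void main(String[] args) {
        System.out.println("[" + majorVersion("3.8") + "]");
        System.out.println("[" + majorVersion("..") + "]");
    }
}
```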

Parsing differs from udger-python, leading to mismatches

I noticed that the Python version of udger uses the regex ^/?(.*)/([si]*)$ to extract regexps from the database, whereas the Java version uses ^/?(.*?)/si$. As a result, the Java version does not match entries such as /^Mozilla.*MSIE.*Windows.*SiteKiosk ([0-9\.]+)/s, so it will never use that regexp. My guess is that the Python code is correct and the Java code is wrong.

See https://github.com/udger/udger-python/blob/master/udger/base.py#L13
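The difference between the two patterns quoted above can be checked with java.util.regex. In this sketch the class name DbRegexSketch, the compile helper, the mapping of the s and i modifiers to Pattern.DOTALL and Pattern.CASE_INSENSITIVE, and the sample user agent string are all assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch comparing the two delimiter patterns from the issue.
class DbRegexSketch {
    // Pattern attributed to the Java parser: only accepts a trailing "/si".
    static final Pattern JAVA_STYLE = Pattern.compile("^/?(.*?)/si$");
    // Pattern attributed to udger-python: accepts any combination of s and i.
    static final Pattern PYTHON_STYLE = Pattern.compile("^/?(.*)/([si]*)$");

    // Turns a database entry such as "/expr/s" into a compiled Pattern.
    // Returns null when the entry does not have the /expr/modifiers shape.
    static Pattern compile(String dbRegex) {
        Matcher m = PYTHON_STYLE.matcher(dbRegex);
        if (!m.matches()) {
            return null;
        }
        int flags = 0;
        String modifiers = m.group(2);
        if (modifiers.indexOf('s') >= 0) {
            flags |= Pattern.DOTALL;            // assumed equivalent of "s"
        }
        if (modifiers.indexOf('i') >= 0) {
            flags |= Pattern.CASE_INSENSITIVE;  // assumed equivalent of "i"
        }
        return Pattern.compile(m.group(1), flags);
    }

    public static void main(String[] args) {
        String entry = "/^Mozilla.*MSIE.*Windows.*SiteKiosk ([0-9\\.]+)/s";
        System.out.println(JAVA_STYLE.matcher(entry).matches());   // entry is skipped
        System.out.println(PYTHON_STYLE.matcher(entry).matches()); // entry is accepted

        // Made-up user agent, used only to exercise the compiled pattern.
        String ua = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SiteKiosk 7.1 Build 267)";
        Matcher m = compile(entry).matcher(ua);
        if (m.find()) {
            System.out.println(m.group(1));
        }
    }
}
```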

Use Logger instead of System.out.println

Hi Team,

The library prints the below messages in our applications during startup:

Index out of hashmap335 : |gt
Index out of hashmap396 : |twister
Index out of hashmap330 : |crxt
Index out of hashmap463 : |kiano
Index out of hashmap93 : |hudl
Index out of hashmap173 : |one

I saw that the code relies on System.out.println internally. Could you please introduce a Logger and stop using println throughout the project? It would be more structured and professional, and would allow more control over what appears on the console and in logs. We would expect the above messages to be logged at DEBUG level.

~Thanks
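As a rough illustration of the request, the message could be routed through java.util.logging, whose FINE level is the closest match to DEBUG. The class and method names here are hypothetical, and a project might well prefer another logging facade; this only shows the shape of the change.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; not the actual udger-java code.
class WordIndexLogging {
    static final Logger LOG = Logger.getLogger(WordIndexLogging.class.getName());

    static void reportOutOfRange(int id, String word) {
        // FINE is suppressed under the default INFO configuration,
        // so the message no longer appears on the console at startup.
        LOG.log(Level.FINE, "Index out of hashmap {0} : {1}",
                new Object[] {Integer.toString(id), word});
    }

    public static void main(String[] args) {
        reportOutOfRange(335, "|gt"); // not printed unless FINE is enabled
    }
}
```

Applications that want to see these messages can enable FINE for this logger in their logging configuration.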

Udger throwing exception

Udger is throwing the following exception

java.sql.SQLException: [SQLITE_CORRUPT] The database disk image is malformed (database disk image is malformed)
at org.sqlite.core.DB.newSQLException(DB.java:890)
at org.sqlite.core.DB.newSQLException(DB.java:901)
at org.sqlite.core.DB.execute(DB.java:810)
at org.sqlite.jdbc3.JDBC3PreparedStatement.executeQuery(JDBC3PreparedStatement.java:68)
at tcorej.util.http.useragent.UdgerDataCache$UdgerParser.getFirstRow(UdgerDataCache.java:280)
at tcorej.util.http.useragent.UdgerDataCache$UdgerParser.parseUa(UdgerDataCache.java:151)
at tcorej.util.http.useragent.UdgerDataCache$1.load(UdgerDataCache.java:46)
at tcorej.util.http.useragent.UdgerDataCache$1.load(UdgerDataCache.java:40)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3542)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2323)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2286)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201)
at com.google.common.cache.LocalCache.get(LocalCache.java:3953)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3957)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4875)
