
uap-python's People

Contributors

adamchainz, bachp, bcaller, commenthol, elsigh, floydwch, georgevreilly, glogiotatidis, innerverse, ironholds, j1fig, jameswann, jdalton, jnozsc, junyer, kevinlondon, markdepalma, masklinn, mattrobenolt, mbarkhau, nolanwilson, ondras, pdelsante, public, rascalking, rudemateo, selwin, tmeryu, tobie


uap-python's Issues

parser device

How can I determine whether the device is a desktop, a smartphone, or a TV?

Unrecognised User-Agents

Windows Live Mail

string: Outlook-Express/7.0 (MSIE 7.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; TmstmpExt)
actual: {'family': 'IE', 'major': '7', 'minor': '0', 'patch': None}

string: Outlook-Express/7.0 (MSIE 7.0; Windows NT 5.1; Trident/4.0; AskTB5.6; TmstmpExt)
actual: {'family': 'IE', 'major': '7', 'minor': '0', 'patch': None}

string: Outlook-Express/7.0 (MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; Media Center PC 6.0; OfficeLiveConnector.1.4; OfficeLivePatch.1.3; InfoPath.3; FDM; TmstmpExt)
actual: {'family': 'IE', 'major': '7', 'minor': '0', 'patch': None}

string: Outlook-Express/7.0 (MSIE 6.0; Windows NT 5.1; SV1; GTB6.3; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; InfoPath.2; .NET CLR 3.0.04506.648; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; OfficeLiveConnector.1.3; OfficeLivePatch.0.0; TmstmpExt)
actual: {'family': 'IE', 'major': '6', 'minor': '0', 'patch': None}

string: Outlook-Express/7.0 (MSIE 8; Windows NT 5.1; Trident/4.0; GTB7.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; TmstmpExt)
actual: {'family': 'Other', 'major': None, 'minor': None, 'patch': None}

string: 'Outlook-Express/7.0 (MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.1; TmstmpExt)',
actual: {'family': 'IE', 'major': '8', 'minor': '0', 'patch': None}

string: Outlook-Express/7.0 (MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; HPDTDF; .NET4.0C; BRI/2; AskTbLOL/5.12.5.17640; TmstmpExt)
actual: {'family': 'IE', 'major': '9', 'minor': '0', 'patch': None}

expected: {'family': 'Windows Live Mail', 'major': None, 'minor': None, 'patch': None}
(for all user-agents above, not sure about a version)

Microsoft Outlook

string: Microsoft Office/12.0 (Windows NT 6.1; Microsoft Office Outlook 12.0.6739; Pro)
actual: {'family': 'Other', 'major': None, 'minor': None, 'patch': None}
expected: {'family': 'Outlook', 'major': '2007', 'minor': None, 'patch': None}

string: Microsoft Office/14.0 (Windows NT 6.1; Microsoft Outlook 14.0.5128; Pro)
actual: {'family': 'Other', 'major': None, 'minor': None, 'patch': None}
expected: {'family': 'Outlook', 'major': '2010', 'minor': None, 'patch': None}

string: Microsoft Office/16.0 (Microsoft Outlook Mail 16.0.6525; Pro)
actual: {'family': 'Other', 'major': None, 'minor': None, 'patch': None}
expected: {'family': 'Outlook', 'major': '2016', 'minor': None, 'patch': None}

string: Microsoft Office/16.0 (Windows NT 10.0; Microsoft Outlook 16.0.6326; Pro)
actual: {'family': 'Other', 'major': None, 'minor': None, 'patch': None}
expected: {'family': 'Outlook', 'major': '2016', 'minor': None, 'patch': None}

string: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 10.0; WOW64; Trident/8.0; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; Microsoft Outlook 16.0.6366; ms-office; MSOffice 16)
actual: {'family': 'IE', 'major': '7', 'minor': '0', 'patch': None}
expected: {'family': 'Outlook', 'major': '2016', 'minor': None, 'patch': None}
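
One way to picture the expected Outlook results above is a small pre-check that maps the Office major version to the marketing year. This is only a sketch: the pattern and the `parse_outlook` helper are hypothetical, not part of uap-core or ua-parser, and the year table contains only the three pairs listed in this issue.

```python
import re

# Office major version -> marketing year. Only the pairs given in the
# expected results above are listed; anything else is left unmapped.
OUTLOOK_YEARS = {"12": "2007", "14": "2010", "16": "2016"}

# Hypothetical pattern covering the spellings seen above:
# "Microsoft Office Outlook 12.0", "Microsoft Outlook 14.0",
# "Microsoft Outlook Mail 16.0".
OUTLOOK_RE = re.compile(r"Microsoft (?:Office )?Outlook(?: Mail)? (\d+)\.\d+")


def parse_outlook(ua_string):
    """Return an Outlook result dict, or None if the UA is not recognised."""
    match = OUTLOOK_RE.search(ua_string)
    if match is None:
        return None
    year = OUTLOOK_YEARS.get(match.group(1))
    if year is None:
        return None
    return {"family": "Outlook", "major": year, "minor": None, "patch": None}
```

Such a check would have to run before the generic MSIE rule, otherwise the last string above would still be reported as IE 7.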

Other

string: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.2pre) Gecko/2009031117 Spicebird/0.7.1
actual: {'family': 'Other', 'major': None, 'minor': None, 'patch': None}
expected: {'family': 'Spicebird', 'major': '0', 'minor': '7', 'patch': '1'}

string: Mozilla/5.0 (Windows NT 10.0; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 Lightning/4.0.5.1
actual: {'family': 'Lightning', 'major': '4', 'minor': '0', 'patch': '5'}
expected: {'family': 'Thunderbird', 'major': '38', 'minor': '5', 'patch': '1'}

string: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.9pre) Gecko/20100209 Shredder/3.0.2pre
actual: {'family': 'Other', 'major': None, 'minor': None, 'patch': None}
expected: {'family': 'Shredder', 'major': '3', 'minor': '0', 'patch': '2pre'}

string: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.9pre) Gecko/20100308 Lightning/1.0b1 Shredder/3.0.4pre
actual: {'family': 'Lightning', 'major': '1', 'minor': '0', 'patch': 'b1'}
expected: {'family': 'Shredder', 'major': '3', 'minor': '0', 'patch': '4pre'}
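
The Lightning cases above look like an ordering problem: the calendar extension token is matched before the mail client it is installed in. A minimal sketch of a first-match-wins list that tries the mail clients first follows. The patterns and the `parse_mail_client` helper are assumptions for illustration, not the actual uap-core rules.

```python
import re

# Ordered rules: the first pattern that matches wins, so the mail clients
# are listed before the Lightning extension token.
PATTERNS = [
    re.compile(r"(Thunderbird|Shredder|Spicebird)/(\d+)\.(\d+)\.?(\w+)?"),
    re.compile(r"(Lightning)/(\d+)\.(\d+)\.?(\w+)?"),
]


def parse_mail_client(ua_string):
    """Return family/major/minor/patch using the ordered rules above."""
    for pattern in PATTERNS:
        match = pattern.search(ua_string)
        if match:
            family, major, minor, patch = match.groups()
            return {"family": family, "major": major,
                    "minor": minor, "patch": patch}
    return {"family": "Other", "major": None, "minor": None, "patch": None}
```

With this ordering, a string that carries both `Lightning/...` and `Shredder/...` is reported as Shredder, as in the expected results.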

AWS Redshift UDF Installation

Hi Guys,

In November 2015, AWS Redshift added Python support for User-Defined Functions. I am new to Python and would like to add this library to Redshift, but could use some guidance on how. This could become an alternative setup script for others. According to the following link, AWS requires the source to be packaged in a zip file and uploaded to an S3 bucket.

http://docs.aws.amazon.com/redshift/latest/dg/udf-python-language-support.html

Once the zip is prepared, the library can be created in Redshift, and functions can then be created on top of it. As best I can tell, UAP has a core that requires pre-processing. Any suggestions on how to write an alternative setup script for the Python project that includes all of that in a zip?
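
As a starting point, the packaging step could be sketched with the standard library alone. This assumes the package has already been built, so that the pre-processed regex data sits inside the package directory. The `zip_package` helper and the paths are hypothetical, and the archive layout Redshift expects should be checked against the AWS page linked above.

```python
import os
import zipfile


def zip_package(package_dir, zip_path):
    """Zip a built package directory so it sits at the archive root.

    Byte-compiled .pyc files are skipped; everything else is stored
    under "<package name>/...".
    """
    parent = os.path.dirname(os.path.abspath(package_dir))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if name.endswith(".pyc"):
                    continue
                full_path = os.path.join(root, name)
                # Archive names are relative to the package's parent.
                archive.write(full_path, os.path.relpath(full_path, parent))
    return zip_path
```

Uploading the resulting file to S3 and registering it in Redshift are separate steps that this sketch does not cover.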

Pre-compile yaml into py file

For context, see: getsentry/sentry#1987

I notice this is being translated from yaml to json inside the sdist. Why not go one step further and generate the .py version instead?

tl;dr,

On my computer, importing the module takes 1.2 seconds because the yaml file is parsed and converted at import time; once the data is compiled to a py file, it takes 219 microseconds.

Before:

$ python -m timeit -s 'from sentry.utils.ua_parser import parser' 'reload(parser)'
10 loops, best of 3: 1.2 sec per loop

After:

$ python -m timeit -s 'from sentry.utils.ua_parser import parser' 'reload(parser)'
1000 loops, best of 3: 219 usec per loop

I can submit a pull request for this if you're on board.
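
To make the proposal concrete, here is a rough sketch of the generation step: take the already-parsed data and write it out as Python literals, so importing it needs no yaml or json parsing. The `render_regexes_module` helper, the `_regexes.py` file name, and the sample data are illustrative assumptions, not the project's build code.

```python
def render_regexes_module(regexes):
    """Render a dict of parsed regex lists as Python module source.

    Each top-level key becomes an upper-case module constant whose value
    is the repr() of the data, which is valid Python for plain dicts,
    lists, strings and None.
    """
    lines = ["# Generated file, do not edit by hand."]
    for name in sorted(regexes):
        lines.append("%s = %r" % (name.upper(), regexes[name]))
    return "\n".join(lines) + "\n"


# Tiny stand-in for the data loaded from the yaml file.
sample = {"user_agent_parsers": [{"regex": r"(Spicebird)/(\d+)\.(\d+)"}]}
source = render_regexes_module(sample)

# Executing the generated source gives back the same data.
namespace = {}
exec(compile(source, "_regexes.py", "exec"), namespace)
```

In a real build step, the generated source would be written to a file inside the package instead of being executed in place.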

Cut a 0.5.1 release

I need to use the changes from #28, but can't just pip install the tarball from GitHub because it doesn't include the submodule. I'd really appreciate a PyPI release.

uap-python does not parse current apple phone information from ua strings

z = user_agent_parser.ParseDevice('Mozilla/5.0 (iPhone; CPU iPhone OS 12_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/70.0.3538.75 Mobile/15E148 Safari/605.1')
print(z)
{'family': 'iPhone', 'brand': 'Apple', 'model': 'iPhone'}

Even though the information is there, the exact model is not picked up, unfortunately.

Thank you for the work you've put into this project btw :)

Problem of parsing Edge's user-agent

A user-agent of Edge is

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36 Edge/15.15063

The parsed browser version is {'major': 15, 'minor': 15063}. When I parsed the same UA on the whatismybrowser website, it returned Edge 40. I found that 15 is the layout engine version and 40 is the software version; returning the software version might be better.
Forgive my poor English, and thanks for your contributions.

Misleading result

I was confused by this user agent so tried the library...

>>> a='Mozilla/5.0 (Linux; Android 6.1; iPhone 7PLUS Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/30.0.0.0 Mobile Safari/537.36'
>>> a
'Mozilla/5.0 (Linux; Android 6.1; iPhone 7PLUS Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/30.0.0.0 Mobile Safari/537.36'
>>> from ua_parser import user_agent_parser
>>> import pprint
>>> pp = pprint.PrettyPrinter(indent=4)
>>> parsed_string = user_agent_parser.Parse(a)
>>> pp.pprint(parsed_string)
{   'device': {'brand': 'Apple', 'family': 'iPhone', 'model': 'iPhone'},
    'os': {   'family': 'Android',
              'major': '6',
              'minor': '1',
              'patch': None,
              'patch_minor': None},
    'string': 'Mozilla/5.0 (Linux; Android 6.1; iPhone 7PLUS Build/KOT49H) '
              'AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 '
              'Chrome/30.0.0.0 Mobile Safari/537.36',
    'user_agent': {   'family': 'Chrome Mobile',
                      'major': '30',
                      'minor': '0',
                      'patch': '0'}}
>>>

Apparently this is an iPhone running Android!

Feeding the same string into https://developers.whatismybrowser.com/useragents/parse/#parse-useragent tells me this is Chrome 30 on iOS.

Status of Project

Apologies for the non-issue issue.

We use ua-parser heavily in Sentry and I'm curious about the status of 0.4.0.

Right now we're at a point where we either will be forking the old ua-parser (and possibly vendoring it), or helping push this through.

Mostly the reason for this is we need to get updated support for things like Microsoft Edge.

Fails to build in virtualenv

Attempting to install ua-parser-0.5 on Python 3.5.0 on Mac OS X 10.11.1 and getting the following error:

(MyNewLeaf)MyNewLeaf|store-admin⚡ ⇒ pip install --upgrade ua-parser
Collecting ua-parser
  Using cached ua-parser-0.5.0.tar.gz
Requirement already up-to-date: pyyaml in ./lib/python3.5/site-packages (from ua-parser)
Building wheels for collected packages: ua-parser
  Running setup.py bdist_wheel for ua-parser
  Complete output from command /Users/coltonprovias/Development/MyNewLeaf/bin/python3.5 -c "import setuptools;__file__='/private/var/folders/zl/0vgdj8l96ss0_w20j9k0t7bm0000gn/T/pip-build-x3p3uu52/ua-parser/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d /var/folders/zl/0vgdj8l96ss0_w20j9k0t7bm0000gn/T/tmpe4yexrdlpip-wheel-:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib
  creating build/lib/ua_parser
  copying ./ua_parser/__init__.py -> build/lib/ua_parser
  copying ./ua_parser/user_agent_parser.py -> build/lib/ua_parser
  copying ./ua_parser/user_agent_parser_test.py -> build/lib/ua_parser
  running egg_info
  writing dependency_links to ua_parser.egg-info/dependency_links.txt
  writing top-level names to ua_parser.egg-info/top_level.txt
  writing requirements to ua_parser.egg-info/requires.txt
  writing ua_parser.egg-info/PKG-INFO
  warning: manifest_maker: standard file '-c' not found

  reading manifest file 'ua_parser.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  writing manifest file 'ua_parser.egg-info/SOURCES.txt'
  copying ./ua_parser/regexes.yaml -> build/lib/ua_parser
  copying ./ua_parser/regexes.json -> build/lib/ua_parser
  installing to build/bdist.macosx-10.11-x86_64/wheel
  running install
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/private/var/folders/zl/0vgdj8l96ss0_w20j9k0t7bm0000gn/T/pip-build-x3p3uu52/ua-parser/setup.py", line 83, in <module>
      'Programming Language :: Python :: Implementation :: PyPy',
    File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/distutils/core.py", line 148, in setup
      dist.run_commands()
    File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/distutils/dist.py", line 955, in run_commands
      self.run_command(cmd)
    File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/distutils/dist.py", line 974, in run_command
      cmd_obj.run()
    File "/Users/coltonprovias/Development/MyNewLeaf/lib/python3.5/site-packages/wheel/bdist_wheel.py", line 211, in run
      self.run_command('install')
    File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/distutils/cmd.py", line 313, in run_command
      self.distribution.run_command(command)
    File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/distutils/dist.py", line 974, in run_command
      cmd_obj.run()
    File "/private/var/folders/zl/0vgdj8l96ss0_w20j9k0t7bm0000gn/T/pip-build-x3p3uu52/ua-parser/setup.py", line 43, in run
      install_regexes()
    File "/private/var/folders/zl/0vgdj8l96ss0_w20j9k0t7bm0000gn/T/pip-build-x3p3uu52/ua-parser/setup.py", line 16, in install_regexes
      'Unable to find regexes.yaml, should be at %r' % yaml_src)
  RuntimeError: Unable to find regexes.yaml, should be at '/private/var/folders/zl/0vgdj8l96ss0_w20j9k0t7bm0000gn/T/pip-build-x3p3uu52/ua-parser/uap-core/regexes.yaml'
  Copying regexes.yaml to package directory...

  ----------------------------------------
  Failed building wheel for ua-parser
Failed to build ua-parser
Installing collected packages: ua-parser
  Found existing installation: ua-parser 0.4.1
    Uninstalling ua-parser-0.4.1:
      Successfully uninstalled ua-parser-0.4.1
  Running setup.py install for ua-parser
    Complete output from command /Users/coltonprovias/Development/MyNewLeaf/bin/python3.5 -c "import setuptools, tokenize;__file__='/private/var/folders/zl/0vgdj8l96ss0_w20j9k0t7bm0000gn/T/pip-build-x3p3uu52/ua-parser/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/zl/0vgdj8l96ss0_w20j9k0t7bm0000gn/T/pip-m56ggvi8-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/coltonprovias/Development/MyNewLeaf/bin/../include/site/python3.5/ua-parser:
    running install
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/zl/0vgdj8l96ss0_w20j9k0t7bm0000gn/T/pip-build-x3p3uu52/ua-parser/setup.py", line 83, in <module>
        'Programming Language :: Python :: Implementation :: PyPy',
      File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/distutils/dist.py", line 955, in run_commands
        self.run_command(cmd)
      File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/distutils/dist.py", line 974, in run_command
        cmd_obj.run()
      File "/private/var/folders/zl/0vgdj8l96ss0_w20j9k0t7bm0000gn/T/pip-build-x3p3uu52/ua-parser/setup.py", line 43, in run
        install_regexes()
      File "/private/var/folders/zl/0vgdj8l96ss0_w20j9k0t7bm0000gn/T/pip-build-x3p3uu52/ua-parser/setup.py", line 16, in install_regexes
        'Unable to find regexes.yaml, should be at %r' % yaml_src)
    RuntimeError: Unable to find regexes.yaml, should be at '/private/var/folders/zl/0vgdj8l96ss0_w20j9k0t7bm0000gn/T/pip-build-x3p3uu52/ua-parser/uap-core/regexes.yaml'
    Copying regexes.yaml to package directory...

    ----------------------------------------
  Rolling back uninstall of ua-parser
Command "/Users/coltonprovias/Development/MyNewLeaf/bin/python3.5 -c "import setuptools, tokenize;__file__='/private/var/folders/zl/0vgdj8l96ss0_w20j9k0t7bm0000gn/T/pip-build-x3p3uu52/ua-parser/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/zl/0vgdj8l96ss0_w20j9k0t7bm0000gn/T/pip-m56ggvi8-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/coltonprovias/Development/MyNewLeaf/bin/../include/site/python3.5/ua-parser" failed with error code 1 in /private/var/folders/zl/0vgdj8l96ss0_w20j9k0t7bm0000gn/T/pip-build-x3p3uu52/ua-parser

Using newest regexes.yaml yields an error when loading module

This is being done on a Windows machine in a PowerShell session. See repro and error below:

$env:UA_PARSER_YAML = "C:\\Temp\\regexes.yaml"
>>> from ua_parser import user_agent_parser
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\mark.depalma\Desktop\Projects\Intune\Scripts\DynamicRedirector\uatestenv\lib\site-packages\ua_parser\user_agent_parser.py", line 481, in <module>
    regexes = yaml.load(fp, Loader=SafeLoader)
  File "C:\Users\mark.depalma\Desktop\Projects\Intune\Scripts\DynamicRedirector\uatestenv\lib\site-packages\yaml\__init__.py", line 81, in load
    return loader.get_single_data()
  File "C:\Users\mark.depalma\Desktop\Projects\Intune\Scripts\DynamicRedirector\uatestenv\lib\site-packages\yaml\constructor.py", line 49, in get_single_data
    node = self.get_single_node()
  File "yaml\_yaml.pyx", line 673, in yaml._yaml.CParser.get_single_node
  File "yaml\_yaml.pyx", line 687, in yaml._yaml.CParser._compose_document
  File "yaml\_yaml.pyx", line 731, in yaml._yaml.CParser._compose_node
  File "yaml\_yaml.pyx", line 845, in yaml._yaml.CParser._compose_mapping_node
  File "yaml\_yaml.pyx", line 729, in yaml._yaml.CParser._compose_node
  File "yaml\_yaml.pyx", line 806, in yaml._yaml.CParser._compose_sequence_node
  File "yaml\_yaml.pyx", line 731, in yaml._yaml.CParser._compose_node
  File "yaml\_yaml.pyx", line 845, in yaml._yaml.CParser._compose_mapping_node
  File "yaml\_yaml.pyx", line 694, in yaml._yaml.CParser._compose_node
  File "yaml\_yaml.pyx", line 858, in yaml._yaml.CParser._parse_next_event
  File "yaml\_yaml.pyx", line 867, in yaml._yaml.input_handler
  File "C:\Program Files\Python39\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 461: character maps to <undefined>

When manually stepping through code I saw that the file was being opened using 'cp1252' encoding by default. Adjusting the code so that the open handle is created as binary resolves the issue.

Line 480 of user_agent_parser.py with fix:
with open(UA_PARSER_YAML, 'rb') as fp:
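
The failure mode can be reproduced without the library. The sketch below writes a small UTF-8 file containing a character whose encoding includes the byte 0x8d, then reads it back with cp1252 and with an explicit UTF-8 encoding. The file content and the `read_text` helper are made up for illustration. Passing `encoding="utf-8"` to `open()` is an alternative to the binary-mode fix above, assuming the yaml file is UTF-8.

```python
import os
import tempfile


def read_text(path, encoding):
    """Read a whole file as text using the given encoding."""
    with open(path, encoding=encoding) as fp:
        return fp.read()


# U+534E encodes to the bytes E5 8D 8E in UTF-8; 0x8d has no mapping
# in cp1252, which is what the traceback above complains about.
sample = "regex: '\u534e'\n"

with tempfile.NamedTemporaryFile("wb", suffix=".yaml", delete=False) as tmp:
    tmp.write(sample.encode("utf-8"))
    path = tmp.name

try:
    try:
        read_text(path, "cp1252")
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True
    utf8_text = read_text(path, "utf-8")
finally:
    os.remove(path)
```

Either approach avoids depending on the platform's default text encoding.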

uap-core pointing to 10 month old commit

Hello there.

We have a problem: new user agents are not parsed correctly in Python.
After investigating the regexes.yaml file, we see that the Python library points to a 10-month-old commit of uap-core, while the most recent version of uap-core is 10 days old.

Is there a reason the old uap-core is used, or can we use the new one?

Incorrect Parsing for Keyword HTTP

When parsing UA string of HTTPie/0.9.9, user_agent_parser reports it as a "Generic Feature Phone", even though it's obviously not a phone.

Parser throwing UnicodeDecodeError on UA with Chinese characters

Here is the User Agent header in question:

Mozilla/5.0 (Linux; U; Android 4.4.3; zh-cn; ่ถๅ˜9999 Build/KOT49H) AppleWebKit/533.1 (KHTML, like Gecko)Version/4.0 MQQBrowser/5.4 TBS/025469 Mobile Safari/533.1 MicroMessenger/6.2.2.54_rec1912d.581 NetType/3gnet Language/zh_CN

And here is the traceback:

File "/usr/lib/python2.7/site-packages/user_agents-1.0.1-py2.7.egg/user_agents/parsers.py", line 232, in parse
return UserAgent(user_agent_string)
File "/usr/lib/python2.7/site-packages/user_agents-1.0.1-py2.7.egg/user_agents/parsers.py", line 126, in __init__
ua_dict = user_agent_parser.Parse(user_agent_string)
File "/usr/lib/python2.7/site-packages/ua_parser/user_agent_parser.py", line 232, in Parse
'device': ParseDevice(user_agent_string, **jsParseBits),
File "/usr/lib/python2.7/site-packages/ua_parser/user_agent_parser.py", line 315, in ParseDevice
device, brand, model = deviceParser.Parse(user_agent_string)
File "/usr/lib/python2.7/site-packages/ua_parser/user_agent_parser.py", line 203, in Parse
model = self.MultiReplace(self.model_replacement, match)
File "/usr/lib/python2.7/site-packages/ua_parser/user_agent_parser.py", line 184, in MultiReplace
_string = re.sub(r'\$(\d)', _repl, string)
File "/usr/lib64/python2.7/re.py", line 151, in sub
return _compile(pattern, flags).sub(repl, string, count)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 0: ordinal not in range(128)

Below UserAgent String returns wrong device Name

Mozilla/5.0 (Linux; Android 6.0.1; ATH-AL00 Build/HONORATH-AL00; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/53.0.2785.124 Mobile Safari/537.36 JsKit/1.0 (Android) SohuNews/5.7.3 BuildCode/113

Output:

{   'device': {   'family': 'ATH-AL00 Build/HONORATH-AL00; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/53.0.2785.124 Mobile Safari/537.36 JsKit/1.0 (Android) SohuNews/5.7.3'},
    'os': {   'family': 'Android',
              'major': '6',
              'minor': '0',
              'patch': '1',
              'patch_minor': None},
    'string': 'Mozilla/5.0 (Linux; Android 6.0.1; ATH-AL00 Build/HONORATH-AL00; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/53.0.2785.124 Mobile Safari/537.36 JsKit/1.0 (Android) SohuNews/5.7.3 BuildCode/113',
    'user_agent': {   'family': u'Chrome Mobile',
                      'major': '53',
                      'minor': '0',
                      'patch': '2785'}}
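
The reported family ends just before " BuildCode/113", which is what a greedy capture followed by a bare " Build" would produce. The sketch below reproduces that with a hypothetical greedy pattern and contrasts it with a bounded one. Neither pattern is taken from uap-core, so this is a guess at the cause rather than a diagnosis.

```python
import re

UA = ("Mozilla/5.0 (Linux; Android 6.0.1; ATH-AL00 Build/HONORATH-AL00; wv) "
      "AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 "
      "Chrome/53.0.2785.124 Mobile Safari/537.36 JsKit/1.0 (Android) "
      "SohuNews/5.7.3 BuildCode/113")

# Greedy capture: ".+" backtracks only as far as the LAST " Build",
# which here is the one in " BuildCode/113".
GREEDY = re.compile(r"Android [\d.]+; (.+) Build")

# Bounded capture: the model may not contain ";" or ")", and the match
# must be followed by " Build/".
BOUNDED = re.compile(r"Android [\d.]+; ([^;)]+?) Build/")

greedy_model = GREEDY.search(UA).group(1)
bounded_model = BOUNDED.search(UA).group(1)
```

Here `greedy_model` equals the over-long family shown in the output above, while `bounded_model` is `ATH-AL00`.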

Released Versions?

Are there plans to add released tags to this repo? I'd like to add this repo as a subtree and/or submodule but there are no tags to point at, only commits on master.

Wrong Android Browser

Mozilla/5.0 (Linux; U; Android 2.3.5; zh-cn; HTC_IncredibleS_S710e Build/GRJ90) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1

Returns:
{'user_agent': {'family': 'Android', 'major': '2', 'minor': '3', 'patch': '5'}, 'device': {'family': 'HTC IncredibleS S710e', 'brand': 'HTC', 'model': 'IncredibleS S710e'}, 'os': {'family': 'Android', 'major': '2', 'patch_minor': None, 'minor': '3', 'patch': '5'}, 'string': 'Mozilla/5.0 (Linux; U; Android 2.3.5; zh-cn; HTC_IncredibleS_S710e Build/GRJ90) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1'}

http://www.useragentstring.com/index.php reports the browser version as "4.0", but here it is 2.3.5, which is the Android version, not the browser version.

Update to latest uap-core

@mattrobenolt - unfortunately I missed an issue in uap-core that left my fix for Pingdom bot detection incomplete. #53 includes that fix. I'd appreciate another patch release of uap-python.

Thanks!

ua-parser fails to parse recent Windows version numbers

It seems ua-parser 0.5.0 does not correctly parse out the version number in later versions of Windows.

$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:38)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from ua_parser import user_agent_parser
>>> import pprint
>>> pp = pprint.PrettyPrinter(indent=4)
>>> ua_string = 'Mozilla/5.0 (Windows NT 6.3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36'
>>> parsed_string = user_agent_parser.ParseOS(ua_string)
>>> pp.pprint(parsed_string)
{   'family': u'Windows 8.1',
    'major': None,
    'minor': None,
    'patch': None,
    'patch_minor': None}
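
The report does not say what output is expected. One possible reading is that the version should land in `major` and `minor` instead of being folded into `family`. Under that assumption, a caller-side workaround could look like the sketch below, where `split_windows_family` is a hypothetical helper and not part of ua-parser.

```python
import re

# Matches families such as "Windows 8.1" or "Windows 10"; names without
# a numeric version (for example "Windows XP") are left untouched.
WINDOWS_FAMILY_RE = re.compile(r"^Windows (\d+)(?:\.(\d+))?$")


def split_windows_family(os_result):
    """Return a copy of a ParseOS-style dict with the version split out."""
    updated = dict(os_result)
    match = WINDOWS_FAMILY_RE.match(os_result.get("family") or "")
    if match is not None:
        updated.update(family="Windows",
                       major=match.group(1),
                       minor=match.group(2))
    return updated
```

Applied to the result above, this would give family `Windows`, major `8` and minor `1`, with the remaining fields unchanged.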

firefox tablet parsing fails

  • UA parser: 0.8
{
    'user_agent': {
        'family': 'Firefox Mobile', 
        'major': '41', 
        'minor': '0', 
        'patch': None
    }, 
    'os': {
        'family': 'Android', 
        'major': '4', 
        'minor': '4', 
        'patch': None, 
        'patch_minor': None
        }, 
    'device': {
        'family': 'Generic Tablet', 
        'brand': 'Generic', 
        'model': 'Tablet'
        }, 
    'string': 'Mozilla/5.0 (Android 4.4; Tablet; rv:41.0) Gecko/41.0 Firefox/41.0'
    }
  • UA parser: 0.9
{
    'user_agent': {
        'family': 'Firefox Mobile', 
        'major': '41', 
        'minor': '0', 
        'patch': None
        }, 
    'os': {
        'family': 'Android', 
        'major': '4', 
        'minor': '4', 
        'patch': None, 
        'patch_minor': None
        }, 
    'device': {
        'family': 'rv:41.0', 
        'brand': 'Generic_Android', 
        'model': 'rv:41.0'
        }, 
    'string': 'Mozilla/5.0 (Android 4.4; Tablet; rv:41.0) Gecko/41.0 Firefox/41.0'
    }

The device part of the dictionary is off in 0.9.

I'll see if I can submit a pull request for this. :)
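
A plausible explanation for the 0.9 result is a generic Android rule that captures the last token inside the parentheses and runs before any Firefox-specific rule. The sketch below is a hypothetical reconstruction of that situation with the specific rule tried first. The patterns and the `parse_device` helper are assumptions, not the uap-core rules.

```python
import re

# Specific rule for Firefox on Android tablets: "(Android 4.4; Tablet; rv:".
TABLET_RULE = re.compile(r"\(Android[^;)]*; Tablet; rv:")

# Hypothetical generic fallback: take the last ";"-separated token before
# the closing parenthesis as the device model.
FALLBACK_RULE = re.compile(r"\(Android[^)]*; ([^;)]+)\)")


def parse_device(ua_string):
    """Try the specific tablet rule before the generic Android fallback."""
    if TABLET_RULE.search(ua_string):
        return {"family": "Generic Tablet", "brand": "Generic",
                "model": "Tablet"}
    match = FALLBACK_RULE.search(ua_string)
    if match:
        model = match.group(1)
        return {"family": model, "brand": "Generic_Android", "model": model}
    return {"family": "Other", "brand": None, "model": None}
```

On the string above, the fallback alone captures `rv:41.0`, matching the 0.9 output, while the ordered version returns the 0.8 result.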

Improve testing infrastructure

  • improve test matrix and configuration
    • add 3.9 and 3.10
    • add pypy
    • maybe add 3.11 alpha? Might be useful as a smoke test at least
    • reorganise in order to have a single CI for all stable implementations to avoid configuration churn (wrt branch protection), in the current state any addition or removal of a python target has to be impacted there
  • investigate ymyzk/tox-gh-actions (or possibly tox's homegrown but that's only for the yet-to-be-released tox 4), a gh action has been defined for CI but it duplicates the tox file (which I haven't updated and may be broken in part or whole), this duplication seems like a shame
  • test both regex and yaml implementations, otherwise yaml bitrots (cf #99)
  • add benches? needs a benchmarking dataset (cf #97), also not sure it's acceptable to run on GHA, it's technically possible (free plans allow 20 concurrent jobs and up to 6h per job) but it might quickly stray into abuse, look further into this
  • fuzzing? not sure there's really enough actual code execution for this to be useful, though it might be able to uncover redos issues in the base set('s interaction with Python's standard regex engine), the two tools I could find for coverage-guided fuzzing are pythonfuzz (by gitlab) and atheris (by google)
  • look at coverage, this may be useless for the same reasons as above
  • enable merge queues
    it's not super urgent, but from time to time I have two PRs I want to merge (generally because I extracted one from the other) and in that case github's auto-merge is just stupid: with "require branches to be up to date before merging" once the first PR has passed its checks and been merged the second will just wait forever with no notification sent to anyone

Not detecting Instagram app

This is a user agent for the Instagram iOS app:

Mozilla/5.0 (iPhone; CPU iPhone OS 12_0_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/16A404 Instagram 72.0.0.20.101 (iPad6,11; iOS 12_0_1; ru_RU; ru-RU; scale=2.00; gamut=normal; 750x1334; 131642248)

It contains the app name, but the result of user_agent_parser.Parse(ua_string) for it is:

{
   'family': 'Mobile Safari UI/WKWebView',
   'major': '12',
   'minor': '0',
   'patch': '1'
}

However, 'family': 'Instagram' is expected, because uap-core has a regex for the Instagram UA.

It seems that the uap-core submodule is outdated.
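
Until the submodule is updated, a caller could check for the app token before falling back to the library result. The pattern below is an illustrative assumption, not the regex from uap-core, and `parse_instagram` is a hypothetical helper.

```python
import re

# Hypothetical pattern for the "Instagram 72.0.0.20.101" token.
INSTAGRAM_RE = re.compile(r"Instagram (\d+)\.(\d+)\.(\d+)")


def parse_instagram(ua_string):
    """Return an Instagram user_agent dict, or None if the token is absent."""
    match = INSTAGRAM_RE.search(ua_string)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return {"family": "Instagram", "major": major,
            "minor": minor, "patch": patch}
```

A caller would use this result when it is not `None` and otherwise keep whatever `user_agent_parser.Parse` returned.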

ModuleNotFoundError: No module named 'ua_parser._regexes'

Hello,

I tried to install ua-parser using 'pip install ua-parser' and a manual install (python setup.py install); however, both produced the same error: ModuleNotFoundError: No module named 'ua_parser._regexes'. The following is the detailed error message after installing ua-parser with pip.

Python 3.6.1 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:25:24) [MSC v.
900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.

from ua_parser import user_agent_parser
Traceback (most recent call last):
File "", line 1, in
File "E:\ua\uap-python-master\uap-python-master\ua_parser\user_agent_parser.p
", line 552, in
from ._regexes import USER_AGENT_PARSERS, DEVICE_PARSERS, OS_PARSERS
ModuleNotFoundError: No module named 'ua_parser._regexes'

My OS is Windows 7 64-bit. Is there something I missed?
Thanks for your attention.

New PyPI release?

Thanks for a useful library!

I noticed that a lot has happened in this repo since the last PyPI release (April 2018). Would it be possible to cut a new PyPI release?

user_agent_parser.Parse hangs on certain malformed inputs

user_agent_parser.Parse seems to hang forever on certain malformed UA strings.
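
As a caller-side stopgap, the input can be normalised before it reaches the parser by collapsing whitespace runs and capping the length. This is only a sketch: `normalize_user_agent` and the 512-character limit are arbitrary assumptions. It does not fix the pattern that backtracks, and whether it avoids the hang depends on which regex is responsible.

```python
import re

_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_user_agent(ua_string, max_length=512):
    """Collapse whitespace runs to one space and cap the overall length."""
    collapsed = _WHITESPACE_RUN.sub(" ", ua_string).strip()
    return collapsed[:max_length]
```

The normalised string would then be passed to `user_agent_parser.Parse` in place of the raw header value.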

Repro:

from ua_parser import user_agent_parser
user_agent_parser.Parse("Mozilla/5.0 (Linux; Android 9; SM-G975U Build/PPR1.180610.011; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/77.0.3865.92 Mobile Safari/537.36                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            (Mobile;                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                         afma-sdk-a-v14300000.14300000.0)")

(Note the large amount of whitespace.)

Browser family error?

Take the following User Agent instance:
'Mozilla/5.0 (Linux; Android 5.0.2; SM-T535 Build/LRX22G; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/50.0.2661.86 Safari/537.36 [FB_IAB/FB4A;FBAV/77.0.0.20.66;]'

Place that in a variable ua, then:

user_agent_parser.ParseUserAgent(ua)

says that the browser family is 'Facebook'.

I have never heard of such a browser family.

Is this behavior correct?

Consistent and immutable versioning

Hello!

It would be nice if the python package version changed when the linked uap-core submodule was updated. uap-core has been updated several times in master since 0.8.0 has been released. Is there an intention to 'release' a new version of uap-python?

Maintain history and version number consistency

I just noticed that version 0.5.0 is available on PyPI, so I came here looking for a changelog. I cannot find one, and the code in the repository still seems to be at version 0.4.1, which is the version I currently use in my project.

I looked in uap-core and found version 0.5.0 under releases, but I was confused that there was no version 0.4.1 (only 0.4.0), and I still can't find a good description of what has changed. Of course I can read the commit messages, but they are not easy to follow.

All in all, I am hesitant to upgrade, because I am unsure of what has changed.

user_agent_parser.Parse can't get versions that are not in format {major}.{minor}.{patch}

Python version: Python 3.7.1
ua-parser version: ua-parser==0.8.0

REPL:

>>> from ua_parser import user_agent_parser as uap
>>> uap.Parse('Mozilla/5.0 (Linux; Android 9; SM-G960F Build/PPR1.180610.011; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/74.0.3729.136 Mobile Safari/537.36 PKeyAuth/1.0')

Output:

{'user_agent': {'family': 'Chrome Mobile WebView', 'major': '74', 'minor': '0', 'patch': '3729'}, 'os': {'family': 'Android', 'major': None, 'minor': None, 'patch': None, 'patch_minor': None}, 'device': {'family': 'Samsung SM-G960F', 'brand': 'Samsung', 'model': 'SM-G960F'}, 'string': 'Mozilla/5.0 (Linux; Android 9; SM-G960F Build/PPR1.180610.011; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/74.0.3729.136 Mobile Safari/537.36 PKeyAuth/1.0'}

Expected:

{'user_agent': {'family': 'Chrome Mobile WebView', 'major': '74', 'minor': '0', 'patch': '3729'}, 'os': {'family': 'Android', 'major': '9', 'minor': None, 'patch': None, 'patch_minor': None}, 'device': {'family': 'Samsung SM-G960F', 'brand': 'Samsung', 'model': 'SM-G960F'}, 'string': 'Mozilla/5.0 (Linux; Android 9; SM-G960F Build/PPR1.180610.011; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/74.0.3729.136 Mobile Safari/537.36 PKeyAuth/1.0'}

User agent strings generated by Android 9 devices don't follow the {major}.{minor}.{patch} format (e.g. Android 9.0); they give just Android 9, which leaves the parser unable to recognise the OS version.
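
To illustrate the behaviour being asked for, here is a minimal sketch of a version pattern whose minor and patch components are optional. The pattern and the helper name parse_android_version are hypothetical and written only for this example; they are not the actual uap-core rule.

```python
import re

# Hypothetical pattern for illustration only; not the rule shipped in uap-core.
# Minor and patch are wrapped in optional non-capturing groups so that
# "Android 9" matches as well as "Android 4.1.1".
ANDROID_VERSION = re.compile(r"Android[ \-/](\d+)(?:\.(\d+))?(?:\.(\d+))?")


def parse_android_version(ua):
    """Return (major, minor, patch); components that are absent are None."""
    match = ANDROID_VERSION.search(ua)
    if match is None:
        return (None, None, None)
    return match.groups()


print(parse_android_version("Mozilla/5.0 (Linux; Android 9; SM-G960F Build/PPR1.180610.011; wv)"))
print(parse_android_version("Mozilla/5.0 (Linux; U; Android 4.1.1; en-us; Build/JRO03C)"))
```

With optional groups, a missing component comes back as None instead of preventing the whole match.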

Process for updating uap-core?

I'm wondering what the process is for updating uap-python with the latest rules from uap-core. It looks like @mattrobenolt usually does this. Do you want update requests as an issue like this, or as a PR to merge?

The git submodule for uap-core currently points at a commit not associated with a tag in uap-core which suggests that we don't need to wait for a "release" from uap-core to update uap-python. Seems like it would be preferable to point at a release but makes sense that we'd want to avoid that level of coordination.

I'm asking because I recently submitted ua-parser/uap-core#216 that was merged to master over the weekend and would like to pull it into uap-python so I can use it.

Would be helpful to update the README with instructions on how to make these updates.

Thanks!

Repository lacks license

As per $SUBJ, neither the PyPI sdist tarball nor the repository contains a license file.

Even if you forked you still need to ship a license file (in this case I guess Apache-2.0) within the repo and the archive.

MS Edge support

Hey @elsigh!

So I got a report that python-user-agents is detecting MS Edge wrong and after some digging it looks like it's using uap-python. I believe MS Edge detection was in ua-parser but wanted to check. My python-fu is pretty weak so I'd need some assistance on that front.

Changelog?

Upgrading from 0.4.0 to 0.7.1 I was surprised to find everything coming back in bytes rather than str. I would have found this out more easily with a changelog to read 😄

Add Brave browser support

Sample Useragent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) brave/0.7.11 Chrome/47.0.2526.110 Brave/0.36.5 Safari/537.36

website: https://brave.com

Improve uap-python installation instructions

  1. On Mac OS X Yosemite
    make Makefile
    produces
    make: Nothing to be done for `Makefile'.
  2. Running
    python setup.py install
    seems to work, but importing the module fails with the error
>>> from ua_parser import user_agent_parser
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "ua_parser/user_agent_parser.py", line 460, in <module>
    yamlFile = open(yamlPath)
IOError: [Errno 2] No such file or directory: 'ua_parser/regexes.yaml'

uap-core should be kept up to date better.

The sub-module hash for uap-core comes from November.

  • Can we get an update to 'pip' with the latest regexes from 'uap-core'?

For those of you who are interested: if you have ua-parser and want the latest regexes, you will need to run the Makefile or 'python setup.py sdist'.

The reason is that you need the 'json' files as well as the 'yaml'.

git clone https://github.com/ua-parser/uap-python
cd uap-python
git submodule update --init
python setup.py sdist

Then check 'regexes.yaml' for an item that only exists in the latest copy. For me, that was 'Opera Coast', which was a driving reason why I wanted to update my ua-parser.

Case insensitive parsing

Thank you for this great utility! I wanted to share some feedback and offer to take on the contribution to work on it if you're open to that.

I did a quick search of prior issues and didn't see anything related. Please feel free to link one if this is a duplicate.

I recently found that lower case user agent strings led to issues where data was not extracted/parsed properly. The below illustrates this:

>>> from ua_parser import user_agent_parser
>>> import pprint
>>> ua_string = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.104 Safari/537.36'
>>> normal_case_parsed = user_agent_parser.Parse(ua_string)
>>> pprint.pprint(normal_case_parsed)
{'device': {'brand': 'Apple', 'family': 'Mac', 'model': 'Mac'},
 'os': {'family': 'Mac OS X',
        'major': '10',
        'minor': '9',
        'patch': '4',
        'patch_minor': None},
 'string': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 '
           '(KHTML, like Gecko) Chrome/41.0.2272.104 Safari/537.36',
 'user_agent': {'family': 'Chrome',
                'major': '41',
                'minor': '0',
                'patch': '2272'}}
>>> lower_case_parsed = user_agent_parser.Parse(ua_string.lower())
>>> pprint.pprint(lower_case_parsed)
{'device': {'brand': None, 'family': 'Other', 'model': None},
 'os': {'family': 'Other',
        'major': None,
        'minor': None,
        'patch': None,
        'patch_minor': None},
 'string': 'mozilla/5.0 (macintosh; intel mac os x 10_9_4) applewebkit/537.36 '
           '(khtml, like gecko) chrome/41.0.2272.104 safari/537.36',
 'user_agent': {'family': 'Other', 'major': None, 'minor': None, 'patch': None}}

Let me know of any other details to provide, otherwise I will take a look for where a patch could be applied and see if I can contribute one.
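
As a rough sketch of what case-insensitive matching would change, the example below applies one simplified pattern with and without re.IGNORECASE. The pattern and the helper parse_family are hypothetical stand-ins written for this illustration; the real rules live in uap-core's regexes.yaml and are far more numerous.

```python
import re

# Hypothetical, simplified pattern; not taken from uap-core.
PATTERN = r"(Chrome)/(\d+)\.(\d+)\.(\d+)"


def parse_family(ua, flags=0):
    """Return the matched family text, or 'Other' when nothing matches."""
    match = re.search(PATTERN, ua, flags)
    return match.group(1) if match else "Other"


ua = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/41.0.2272.104 Safari/537.36")

print(parse_family(ua))                         # case-sensitive, original string
print(parse_family(ua.lower()))                 # case-sensitive, lowered string
print(parse_family(ua.lower(), re.IGNORECASE))  # case-insensitive, lowered string
```

Note that with re.IGNORECASE the captured text keeps the input's casing ('chrome' here), so a patch along these lines would probably also need to normalise the family name it reports.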

Slowness in parsing user agent

Some user agent strings are causing the user agent parser to be really slow.

import time

from ua_parser import user_agent_parser

ua_str = "Mozilla/5.0 (iPhone; CPU iPhone OS 7_5 like Mac OS X) AppleWebKit/1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890 (KHTML, like Gecko) Version/5.1 Mobile/9334 Safari/7548.320"

s = time.time()
print(user_agent_parser.ParseUserAgent(ua_str))
print('took ', time.time() - s)
  • v0.7.3 took ~0.05s
  • v0.8.0 took ~7.0s
  • v0.8.0 took ~110s! with AppleWebKit/<400 digits>

The notable change was updating the uap-core git submodule from ce89c7637eeeb07b3464dbf40645bb3973723c0a to fc570f378e41063bad3bdf0532967743efc75b4b

After digging around, I've found that the issue is due to the regex addition in uap-core:

The latest regexes.yaml altered the regex, and looks to have fixed the slowness issue.

PR for #70 looks to use the new regex which helps the issue.

IndexError: no such group for Mozilla/5.0 (Linux; U; Android 4.1.1; en-us; Build/JRO03C) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Safari/534.30

This user agent string fails to parse:
user_agent_parser.Parse('Mozilla/5.0 (Linux; U; Android 4.1.1; en-us; Build/JRO03C) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Safari/534.30')
Traceback (most recent call last):
  File "", line 1, in
  File "C:\Python27\lib\site-packages\ua_parser-0.4.0-py2.7.egg\ua_parser\user_agent_parser.py", line 212, in Parse
    'device': ParseDevice(user_agent_string, **jsParseBits),
  File "C:\Python27\lib\site-packages\ua_parser-0.4.0-py2.7.egg\ua_parser\user_agent_parser.py", line 291, in ParseDevice
    device, brand, model = deviceParser.Parse(user_agent_string)
  File "C:\Python27\lib\site-packages\ua_parser-0.4.0-py2.7.egg\ua_parser\user_agent_parser.py", line 187, in Parse
    device = match.group(1)
IndexError: no such group
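
For context, Python's re module raises "IndexError: no such group" when match.group(1) is called on a match whose pattern defines no capture group at all. The sketch below shows one defensive way to handle that; the helper first_group_or_whole and both patterns are hypothetical and are not the library's code or its actual fix.

```python
import re


def first_group_or_whole(match):
    """Return group 1 when the pattern defines one, else the whole match."""
    # match.re.groups is the number of capture groups in the compiled pattern.
    if match.re.groups >= 1:
        return match.group(1)
    return match.group(0)


# A pattern with no capture group: match.group(1) would raise IndexError here.
no_group = re.search(r"Android", "Linux; U; Android 4.1.1; en-us; Build/JRO03C")
# A pattern with one capture group.
with_group = re.search(r"; (\S+) Build/", "Linux; Android 9; SM-G960F Build/PPR1")

print(first_group_or_whole(no_group))
print(first_group_or_whole(with_group))
```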

Update uap-core submodule to 0.6.10

A fix for detecting Edge (Chromium) as Chrome is in version 0.6.10. Would it be possible to update the submodule and cut a new release please?

ParseUserAgent, ParseOS and ParseDevice are not using the cache

Hi,

I was using this library for a project recently, and was really surprised that ParseUserAgent was significantly slower than Parse. After digging a bit into the code, I think this is because ParseUserAgent is not using the internal cache, whereas Parse is.

As both methods are documented in the README, I was expecting both to have similar performance, with ParseUserAgent being a bit faster because it has to parse less. Currently, both methods seem to be recommended for regular use, but ParseUserAgent seems significantly worse than Parse.

In my opinion it might be helpful to document this behavior in the README or change it, because it is currently faster to use Parse(ua_string)['user_agent'] instead of ParseUserAgent.
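
Until the behaviour changes, a caller-side cache is one possible workaround. The sketch below wraps an arbitrary parse function with functools.lru_cache; make_cached and slow_parse are hypothetical names used only for this illustration, and slow_parse is a stand-in rather than the real ParseUserAgent.

```python
from functools import lru_cache


def make_cached(parse_fn, maxsize=1024):
    """Wrap parse_fn so repeated user-agent strings skip re-parsing."""
    return lru_cache(maxsize=maxsize)(parse_fn)


calls = []


def slow_parse(ua):
    """Stand-in for an uncached parser; records every real invocation."""
    calls.append(ua)
    return ua.split("/", 1)[0]


cached_parse = make_cached(slow_parse)
cached_parse("Mozilla/5.0")
cached_parse("Mozilla/5.0")  # served from the cache, slow_parse is not called again
print(len(calls))
```

If the wrapped function returns a mutable dict, as the library's parse functions do, every caller receives the same cached object, so the result should be treated as read-only or copied before modification.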
