jakelever / pubrunner

A framework for keeping biomedical text mining results up-to-date

License: MIT License

Python 98.47% Shell 1.16% Dockerfile 0.37%
bionlp text-mining infrastructure python snakemake pubmed pubmed-central

pubrunner's Issues

bioc version

Hi Jake,

I think the bioc issue from kindred also affects pubrunner:

pip install pubrunner
Collecting pubrunner
  Using cached https://files.pythonhosted.org/packages/cb/2e/f4c4efccacea847f0f61eb1adcf871983a61185daaa4cb5e86972698dedd/pubrunner-0.5.1.tar.gz
Collecting six (from pubrunner)
  Using cached https://files.pythonhosted.org/packages/67/4b/141a581104b1f6397bfa78ac9d43d8ad29a7ca43ea90a2d863fe3056e86a/six-1.11.0-py2.py3-none-any.whl
Collecting gitpython (from pubrunner)
  Using cached https://files.pythonhosted.org/packages/fe/e5/fafe827507644c32d6dc553a1c435cdf882e0c28918a5bab29f7fbebfb70/GitPython-2.1.11-py2.py3-none-any.whl
Collecting pyyaml (from pubrunner)
  Using cached https://files.pythonhosted.org/packages/9e/a3/1d13970c3f36777c583f136c136f804d70f500168edc1edea6daa7200769/PyYAML-3.13.tar.gz
Collecting wget (from pubrunner)
  Using cached https://files.pythonhosted.org/packages/47/6a/62e288da7bcda82b935ff0c6cfe542970f04e29c756b0e147251b2fb251f/wget-3.2.zip
Collecting requests (from pubrunner)
  Using cached https://files.pythonhosted.org/packages/ff/17/5cbb026005115301a8fb2f9b0e3e8d32313142fe8b617070e7baad20554f/requests-2.20.1-py2.py3-none-any.whl
Collecting ftputil (from pubrunner)
  Using cached https://files.pythonhosted.org/packages/0d/ab/8e5cc3199b16c37d926b5d8091fbaf9b2734a7b2c5579ee96d063f319a2a/ftputil-3.4.tar.gz
Collecting bioc==1.2.2 (from pubrunner)
  Could not find a version that satisfies the requirement bioc==1.2.2 (from pubrunner) (from versions: 1.0.dev22, 1.0.dev23, 1.0.dev24, 1.0.dev25, 1.0.dev27, 1.0.dev28, 1.0.dev29, 1.0.dev30, 1.0.dev31, 1.0.dev32, 1.0, 1.1.dev1, 1.1.dev2, 1.1.dev3, 1.2.3, 1.2.4, 1.3, 1.3.1)
No matching distribution found for bioc==1.2.2 (from pubrunner)

Thanks,
Karyn

Unknown command pubrunner

Hi Jake,

As discussed in another thread, I am trying to run your biowordlists project to create the latest Term List. I was able to successfully install pubrunner via pip.

However, when I try to run it I get the following error -

fish: Unknown command pubrunner

FYI, I am using the fish shell, but I don't think that makes a difference: I also tried running it with bash and got the same error.

I am using Python 3.7. I thought it might be a PATH issue, so I tried to debug a bit, but everything looks fine.

I ran the following command to see where pubrunner is installed -

$ pip3 show pubrunner

Name: pubrunner
Version: 0.5.2
Summary: A framework to rerun text mining tools on the latest publications
Home-page: http://github.com/jakelever/pubrunner
Author: Jake Lever
Author-email: [email protected]
License: MIT
Location: /Users/chaitanyagupta/Library/Python/3.7/lib/python/site-packages
Requires: future, wget, six, gitpython, drmaa, bioc, snakemake, markdown2, pyfiglet, biopython, jsonlines, pymarc, ftputil, pyyaml, requests
Required-by:

I also checked my sys.path to see if it had the above location /Users/chaitanyagupta/Library/Python/3.7/lib/python/site-packages, and it is present -

$ python3

Python 3.7.3 (default, Mar 27 2019, 09:23:15)
[Clang 10.0.1 (clang-1001.0.46.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print(sys.path)
['', '/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python37.zip', '/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7', '/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload', '/Users/chaitanyagupta/Library/Python/3.7/lib/python/site-packages', '/usr/local/lib/python3.7/site-packages']

Can you help me with this? I am not sure what I can do to get it running.

Edit: Included link to the other thread
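
One thing worth checking here: `sys.path` only controls where Python looks for importable modules, while the shell finds the `pubrunner` command through the `PATH` environment variable. With a user-level install, pip typically places console scripts in a `bin` directory under the user base, and that directory is often not on `PATH`. A minimal sketch of that check follows. The helper name `is_on_path` is hypothetical, and the `bin` location is an assumption for macOS/Linux user installs, not something confirmed in this thread.

```python
import os
import shutil
import site


def is_on_path(directory, path_env=None):
    """Return True if `directory` is one of the entries in PATH."""
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    target = os.path.normpath(directory)
    entries = [os.path.normpath(p) for p in path_env.split(os.pathsep) if p]
    return target in entries


if __name__ == "__main__":
    # Assumption: user-level console scripts live in <user base>/bin
    scripts_dir = os.path.join(site.getuserbase(), "bin")
    print("Expected scripts directory:", scripts_dir)
    print("On PATH:", is_on_path(scripts_dir))
    # shutil.which returns None when the shell could not find the command either
    print("pubrunner resolves to:", shutil.which("pubrunner"))
```

If the directory turns out not to be on `PATH`, adding it there or invoking the script by its full path would be the next thing to try.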

urllib.error.URLError: <urlopen error [Errno 11002] getaddrinfo failed>

Hi, I am new to Python and your pubrunner tool. When I run some of the examples, I run into the following problem.

D:\cancermine\pubrunner\examples\SmallTextFinder>pubrunner --test .
       _____  _     _ ______   ______ _     _ __   _ __   _ _______  ______
      |_____] |     | |_____] |_____/ |     | | \  | | \  | |______ |_____/
      |       |_____| |_____] |    \_ |_____| |  \_| |  \_| |______ |    \_



d:\python37\lib\site-packages\pubrunner\globalsettings.py:13: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  yamlData = yaml.load(f)
Working directory: C:\Users\詹飞/pubrunner/workspace\SmallTextFinder\test
Traceback (most recent call last):
  File "d:\python37\lib\urllib\request.py", line 1317, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "d:\python37\lib\http\client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "d:\python37\lib\http\client.py", line 1275, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "d:\python37\lib\http\client.py", line 1224, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "d:\python37\lib\http\client.py", line 1016, in _send_output
    self.send(msg)
  File "d:\python37\lib\http\client.py", line 956, in send
    self.connect()
  File "d:\python37\lib\http\client.py", line 1384, in connect
    super().connect()
  File "d:\python37\lib\http\client.py", line 928, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "d:\python37\lib\socket.py", line 707, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "d:\python37\lib\socket.py", line 748, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11002] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "d:\python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Python37\Scripts\pubrunner.exe\__main__.py", line 9, in <module>
  File "d:\python37\lib\site-packages\pubrunner\command_line.py", line 66, in main
    pubrunner.pubrun(args.codebase,args.test,(not args.nogetresource),forceresource_dir=args.forceresource_dir,forceresource_format=args.forceresource_format,outputdir=args.outputdir)
  File "d:\python37\lib\site-packages\pubrunner\pubrun.py", line 349, in pubrun
    prepareConversionAndHashingRuns(toolSettings,mode,workingDirectory)
  File "d:\python37\lib\site-packages\pubrunner\pubrun.py", line 103, in prepareConversionAndHashingRuns
    eutilsToFile('pmc',pmcid,filename)
  File "d:\python37\lib\site-packages\pubrunner\pubrun.py", line 37, in eutilsToFile
    handle = Entrez.efetch(db=db, id=id, rettype="gb", retmode="xml")
  File "d:\python37\lib\site-packages\Bio\Entrez\__init__.py", line 195, in efetch
    return _open(cgi, variables, post=post)
  File "d:\python37\lib\site-packages\Bio\Entrez\__init__.py", line 555, in _open
    handle = _urlopen(cgi)
  File "d:\python37\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "d:\python37\lib\urllib\request.py", line 525, in open
    response = self._open(req, data)
  File "d:\python37\lib\urllib\request.py", line 543, in _open
    '_open', req)
  File "d:\python37\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "d:\python37\lib\urllib\request.py", line 1360, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "d:\python37\lib\urllib\request.py", line 1319, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 11002] getaddrinfo failed>
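
The final `URLError` wraps a `socket.gaierror`, which means the hostname lookup failed before any request was sent. That usually points at DNS, proxy, or connectivity settings on the machine rather than at pubrunner itself. A small sketch for checking name resolution in isolation follows. The helper `can_resolve` is hypothetical, and the host `eutils.ncbi.nlm.nih.gov` is assumed to be the server that Biopython's `Entrez.efetch` contacts.

```python
import socket


def can_resolve(host, port=443, resolver=socket.getaddrinfo):
    """Return True if `host` resolves to at least one address."""
    try:
        return len(resolver(host, port)) > 0
    except socket.gaierror:
        # Same failure class as at the bottom of the first traceback
        return False


if __name__ == "__main__":
    # Assumed host for NCBI E-utilities
    host = "eutils.ncbi.nlm.nih.gov"
    print(host, "resolves:", can_resolve(host))
```

If this reports that the host does not resolve, the same lookup will fail inside pubrunner, so the network configuration is the place to look first.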

Error handling

The error handling needs to be improved to give more useful feedback. Below are a couple of ideas:

  • Track which slurm file to look at when there is a failure on the cluster
  • Deal appropriately if "IN" or "OUT" is missing in a build step
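
The second idea could be sketched roughly as below. This is not pubrunner's code: `BuildStepError` and `check_build_step` are hypothetical names, and treating a build step as a plain command string that must contain the markers `IN` and `OUT` is an assumption made only for illustration.

```python
class BuildStepError(ValueError):
    """Raised when a build step lacks a required marker."""


def check_build_step(command, required=("IN", "OUT")):
    """Return `command` unchanged, or raise naming every missing marker.

    This is a plain substring check, so it is deliberately crude:
    a word such as INPUT would also satisfy the IN marker.
    """
    missing = [name for name in required if name not in command]
    if missing:
        raise BuildStepError(
            "Build step %r is missing: %s" % (command, ", ".join(missing))
        )
    return command


if __name__ == "__main__":
    check_build_step("tool --in IN --out OUT")  # passes silently
    try:
        check_build_step("tool --out OUT")
    except BuildStepError as err:
        print(err)
```

Failing early with the offending step in the message avoids a less informative error later in the run.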

About the keyword for searching from PubMed

Thanks for your contribution!
I have one question about PubRunner. I notice that your tool is good for downloading articles. However, when I want to download only articles on a specific concept, such as cancer, I don't know where to start. Do you have any suggestions?
Thank you very much. I really appreciate your reply.
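
For reference, a keyword search can be run directly against the NCBI E-utilities `esearch` endpoint, independently of pubrunner. The sketch below is not a pubrunner feature and says nothing about how pubrunner selects documents; the helper names `build_search_url` and `search_pubmed` are hypothetical, and it only returns PubMed IDs for a search term.

```python
import json
import urllib.parse
import urllib.request

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(term, retmax=20):
    """Build an esearch URL that asks for PubMed IDs matching `term` as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def search_pubmed(term, retmax=20):
    """Return a list of PMIDs (as strings) for `term`. Needs network access."""
    with urllib.request.urlopen(build_search_url(term, retmax)) as response:
        data = json.load(response)
    return data["esearchresult"]["idlist"]


if __name__ == "__main__":
    print(build_search_url("cancer", retmax=5))
```

The returned IDs could then be used to fetch only the matching records, instead of processing the whole corpus and filtering afterwards.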
