Miscellaneous code for querying FNA Semantic MediaWiki

License: Other

fna-query's Introduction



Query the Flora of North America Semantic MediaWiki

These scripts query the "ask" API module of http://beta.semanticfna.org/ from R or Python and write the results to a CSV file.

Getting started

Prepare your query

The Flora of North America Semantic MediaWiki can be queried using the Semantic MediaWiki semantic search syntax.

In brief, you must have a condition:

[[Authority::Linnaeus]]

You can optionally return properties of the taxa matching your condition:

?Distribution

Putting this all together using pipes, we have a query like this:

[[Authority::Linnaeus]]|?Distribution

Or with additional properties requested, like this:

[[Authority::Linnaeus]]|?Distribution|?Taxon family
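
The pattern above (conditions first, then pipe-separated properties) can be assembled programmatically. The following is a minimal sketch, not part of the repository; the helper name build_ask_query is hypothetical.

```python
def build_ask_query(conditions, properties=()):
    """Join conditions and requested properties into one ask query string."""
    # Each condition is a (property, value) pair rendered as [[Property::Value]].
    condition_part = "".join(f"[[{prop}::{value}]]" for prop, value in conditions)
    # Each requested property is prefixed with "?" and introduced by a pipe.
    printout_part = "".join(f"|?{prop}" for prop in properties)
    return condition_part + printout_part


query = build_ask_query([("Authority", "Linnaeus")], ["Distribution", "Taxon family"])
print(query)  # [[Authority::Linnaeus]]|?Distribution|?Taxon family
```

Passing no properties yields a condition-only query, which returns taxon names only.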

Sample queries can be found here:

Read more about Semantic MediaWiki query syntax:

Query size limitations

Semantic MediaWiki limits API queries to 5,000 results. If you expect your query to return more than 5,000 results, you should run your query in batches. (N.B.: There are ~20,000 treatments in the FNA Online.)

We recommend running your queries by 'published volume' by adding a volume condition to your query (e.g., "[[Volume::Volume 17]]"). Please see this page for a list of volumes that can be queried.
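
As a rough illustration of batching by volume, the sketch below prepends a volume condition to a base query, producing one query per volume. The helper name volume_queries is hypothetical, and the volume numbers are placeholders; check the list of queryable volumes before using them.

```python
def volume_queries(base_query, volume_numbers):
    """Yield (volume_name, query) pairs, one per published volume."""
    for number in volume_numbers:
        volume_name = f"Volume {number}"
        # Adding a second condition narrows the query to a single volume.
        yield volume_name, f"[[Volume::{volume_name}]]{base_query}"


# Placeholder volume numbers, for illustration only.
for name, batch_query in volume_queries("[[Distribution::Nunavut]]|?Taxon family", [17, 18]):
    print(name, "->", batch_query)
```

Each generated query can then be run separately with its own output file, and the resulting CSV files merged afterwards.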

Use R

This section assumes you are familiar with the R programming language.

Prerequisites

Open a terminal.

Type git clone https://github.com/jocelynpender/fna-query.git

Open an R console and type:

install.packages("WikipediR")
install.packages("tidyverse")

Run your query

  1. Open an R console
  2. Open the run_query.R script
  3. Run your query:

Option A: Return taxa names only (i.e., query does not include ? parameter)

E.g., [[Distribution::Nunavut]]

Use ask_query_titles. It returns only a list of Taxon names that match your query.

In the fna-query directory, run

source("R/src/query.R")
page_titles_vector <- ask_query_titles("[[Distribution::Nunavut]]", "output_file_name.csv")

Option B: Return taxa names and properties (i.e., query includes a ? parameter)

E.g., [[Distribution::Nunavut]]|?Taxon family

Use ask_query_titles_properties. It returns a list of Taxon names and the associated properties requested by your query.

In the fna-query directory, run

source("R/src/query.R")
properties_texts_data_frame <- ask_query_titles_properties("[[Distribution::Nunavut]]|?Taxon family", "output_file_name.csv")

Expected output

Option A: Return taxa names only (i.e., query does not include ? parameter)

E.g., [[Distribution::Nunavut]]

> page_titles_vector

[1] "Abietinella abietina"                     
[2] "Achillea millefolium"                     
[3] "Agrostis"                                 
[4] "Agrostis anadyrensis"        
 ...

See https://github.com/jocelynpender/fna-query/blob/master/R/demo_queries/distribution/nunavut_taxa.csv for a sample output file.

Option B: Return taxa names and properties (i.e., query includes a ? parameter)

E.g., [[Distribution::Nunavut]]|?Taxon family

> properties_texts_data_frame
                                            Taxon family
Abietinella abietina                         Thuidiaceae
Achillea millefolium                          Asteraceae
Agrostis                                         Poaceae
Agrostis anadyrensis                             Poaceae   
 ...

See https://github.com/jocelynpender/fna-query/blob/master/R/demo_queries/distribution/nunavut_taxa_family_name.csv for a sample output file.

Run a demo query

Don't know what to query? See the demo queries here: https://github.com/jocelynpender/fna-query/tree/master/R/demo_queries

Use Python

This section assumes you are familiar with Python programming.

Prerequisites

Create an account

You'll need to create an account to use the API with Python.

  1. Create your account at http://beta.floranorthamerica.org/Special:CreateAccount

  2. Find the file called local.py.example in the python/src folder. Rename it to local.py and add your credentials.

Dependencies

Option A. Use pip

requirements.txt has been generated with pip freeze > requirements.txt

Open a terminal.

cd fna-query
pip install -r requirements.txt

Option B. Use conda

The project was built within a conda environment. A conda YAML file has been generated with conda env export > fna-query.yml.

Open a terminal.

cd fna-query
conda env create -f fna-query.yml

Run your query

  1. Open a terminal.
  2. Prepare your query. E.g., [[Special status::Introduced]]
  3. Run your query with the commands below. If you are using conda, first run conda activate environment-name.

cd fna-query
cd python
python -m src.run_query --output_file_name "output_file_name.csv" --query_string "[[Query::here]]"

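The -m flag tells Python to run src.run_query as a module rather than as a file path, so that imports from the src package resolve correctly. This is why the command must be run from the python directory.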

Expected output

If your query results are extensive, the query will take some time to process. Please be patient.

Option A: Taxa names only (i.e., query does not include ? parameter)

E.g., [[Illustrator::+]][[Illustration::Present]]

python -m src.run_query --output_file_name "illustrated_taxa.csv" --query_string "[[Illustrator::+]][[Illustration::Present]][[Taxon family::Asteraceae]]"

See https://github.com/jocelynpender/fna-query/blob/master/python/demo_queries/distribution/nunavut_taxa.csv for a sample output file.

Option B: Taxa names and properties (i.e., query includes a ? parameter)

E.g., [[Illustrator::+]][[Illustration::Present]]|?Taxon rank

python -m src.run_query --output_file_name "illustrated_taxa_taxon_family.csv" --query_string "[[Illustrator::+]][[Illustration::Present]][[Taxon family::Asteraceae]]|?Taxon rank"

See https://github.com/jocelynpender/fna-query/blob/master/python/demo_queries/distribution/nunavut_taxa_family_name.csv for a sample output file.

Run a demo query

Don't know what to query? See the demo queries here: https://github.com/jocelynpender/fna-query/tree/master/python/demo_queries

Getting help

Contact [email protected] or [email protected] for support.

Bug reports

Please leave your bug reports here: https://github.com/jocelynpender/fna-query/issues

Resources

Dependency documentation

Merging multiple CSV files

Sometimes you'll need to batch the API return results. Here is an R script for merging multiple CSV files.
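
For Python users, a comparable merge can be sketched with the standard library alone. This is an illustrative alternative, not the repository's R script; the function name merge_csv_files and the file names in the usage comment are hypothetical.

```python
import csv


def merge_csv_files(input_paths, output_path):
    """Concatenate CSV files into one, using the union of their columns.

    Returns the number of data rows written.
    """
    rows, fieldnames = [], []
    for path in input_paths:
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            # Collect column names in first-seen order across all files.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)
    with open(output_path, "w", newline="", encoding="utf-8") as handle:
        # Columns missing from a given batch are written as empty strings.
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Example call (hypothetical file names):
# merge_csv_files(["volume_17.csv", "volume_18.csv"], "merged.csv")
```

The sketch does not remove duplicate rows, so a taxon that appears in more than one batch is written more than once.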

fna-query's Issues

Conda error ResolvePackageNotFound

When creating the conda environment using fna-query.yml I get a ResolvePackageNotFound error:

$ conda env create -f fna-query.yml 
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound: 
  - tk==8.6.8=ha441bb4_0
  - xz==5.2.4=h1de35cc_4
  - python==3.7.6=h359304d_2
  - openssl==1.1.1d=h1de35cc_3
  - libffi==3.2.1=h475c297_4
  - zlib==1.2.11=h1de35cc_3
  - cryptography==2.8=py37ha12b0ac_0
  - sqlite==3.30.1=ha441bb4_0
  - ncurses==6.1=h0a44026_1
  - numpy==1.18.1=py37h7241aed_0
  - libgfortran==3.0.1=h93005f0_2
  - cffi==1.13.2=py37hb5b8e2f_0
  - libcxx==4.0.1=hcfea43d_1
  - mkl-service==2.3.0=py37hfbe908c_0
  - numpy-base==1.18.1=py37h6575580_0
  - libcxxabi==4.0.1=hcfea43d_1
  - mkl_random==1.1.0=py37ha771720_0
  - libedit==3.1.20181209=hb402a30_0
  - intel-openmp==2019.4=233
  - pandas==0.25.3=py37h0a44026_0
  - readline==7.0=h1de35cc_5
  - mkl_fft==1.0.15=py37h5e564d8_0
  - mkl==2019.4=233

Python pip install error: No matching distribution found for mkl-fft

When installing the Python dependencies for the query I get an mkl-fft-related error:

$ pip install -r requirements.txt
Collecting asn1crypto==1.3.0 (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/e9/51/1db4a60049fb7390959be586b6eb743098e6cea3f6b2d3ed9e17fec62ba2/asn1crypto-1.3.0-py2.py3-none-any.whl (103kB)
    100% |████████████████████████████████| 112kB 1.9MB/s
Collecting blinker==1.4 (from -r requirements.txt (line 2))
  Downloading https://files.pythonhosted.org/packages/1b/51/e2a9f3b757eb802f61dc1f2b09c8c99f6eb01cf06416c0671253536517b6/blinker-1.4.tar.gz (111kB)
    100% |████████████████████████████████| 112kB 5.6MB/s
Collecting bs4==0.0.1 (from -r requirements.txt (line 3))
  Downloading https://files.pythonhosted.org/packages/10/ed/7e8b97591f6f456174139ec089c769f89a94a1a4025fe967691de971f314/bs4-0.0.1.tar.gz
Collecting certifi==2019.11.28 (from -r requirements.txt (line 4))
  Downloading https://files.pythonhosted.org/packages/b9/63/df50cac98ea0d5b006c55a399c3bf1db9da7b5a24de7890bc9cfd5dd9e99/certifi-2019.11.28-py2.py3-none-any.whl (156kB)
    100% |████████████████████████████████| 163kB 5.9MB/s
Collecting cffi==1.13.2 (from -r requirements.txt (line 5))
  Downloading https://files.pythonhosted.org/packages/49/72/0d42f94fe94afa8030350c26e9d787219f3f008ec9bf6b86c66532b29236/cffi-1.13.2-cp36-cp36m-manylinux1_x86_64.whl (397kB)
    100% |████████████████████████████████| 399kB 9.8MB/s
Requirement already satisfied: chardet==3.0.4 in /home/lujantorob/anaconda3/lib/python3.6/site-packages (from -r requirements.txt (line 6)) (3.0.4)
Collecting cryptography==2.8 (from -r requirements.txt (line 7))
  Downloading https://files.pythonhosted.org/packages/ca/9a/7cece52c46546e214e10811b36b2da52ce1ea7fa203203a629b8dfadad53/cryptography-2.8-cp34-abi3-manylinux2010_x86_64.whl (2.3MB)
    100% |████████████████████████████████| 2.3MB 4.8MB/s
Collecting idna==2.8 (from -r requirements.txt (line 8))
  Downloading https://files.pythonhosted.org/packages/14/2c/cd551d81dbe15200be1cf41cd03869a46fe7226e7450af7a6545bfc474c9/idna-2.8-py2.py3-none-any.whl (58kB)
    100% |████████████████████████████████| 61kB 23.5MB/s
Collecting mkl-fft==1.0.15 (from -r requirements.txt (line 9))
  Could not find a version that satisfies the requirement mkl-fft==1.0.15 (from -r requirements.txt (line 9)) (from versions: 1.0.0.17, 1.0.2, 1.0.6)
No matching distribution found for mkl-fft==1.0.15 (from -r requirements.txt (line 9))

(Using Ubuntu 18)

Use new scheme argument for specifying host with mwclient

DeprecationWarning
/Users/jocelynpender/miniconda3/envs/fna-query/lib/python3.7/site-packages/mwclient/client.py:377: DeprecationWarning: Specifying host as a tuple is deprecated as of mwclient 0.10.0. Please use the new scheme argument instead.
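
The warning refers to the older convention of passing host as a (scheme, hostname) tuple. A minimal sketch of the migration follows, assuming the project currently builds its site from such a tuple; the helper name site_kwargs is hypothetical, and the scheme and host shown are assumptions rather than values taken from the repository.

```python
def site_kwargs(host):
    """Translate a legacy (scheme, hostname) tuple into keyword arguments."""
    if isinstance(host, tuple):
        scheme, hostname = host
        # mwclient 0.10.0 and later accept the scheme as its own argument.
        return {"host": hostname, "scheme": scheme}
    # A plain string host needs no translation.
    return {"host": host}


kwargs = site_kwargs(("https", "beta.floranorthamerica.org"))
print(kwargs)  # {'host': 'beta.floranorthamerica.org', 'scheme': 'https'}

# With mwclient >= 0.10.0 installed, the site could then be created with:
#     site = mwclient.Site(**kwargs)
```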

If multiple pages in response, response is not parsed

If the answer is of type "page", and there's more than one answer, the "page" object is not parsed, and is simply returned as a string.

E.g.
python -m src.run_query --output_file_name "out.txt" --query_string "[[Taxon name::×leydeum dutillyanum]]|?Illustrator"

results in:
Taxon name,Illustrator
×leydeum dutillyanum,"[OrderedDict([('fulltext', 'Cindy Roché'), ('fullurl', '//beta.floranorthamerica.org/Cindy_Roch%C3%A9'), ('namespace', 0), ('exists', '1'), ('displaytitle', '')]), OrderedDict([('fulltext', 'Annaliese Miller'), ('fullurl', '//beta.floranorthamerica.org/Annaliese_Miller'), ('namespace', 0), ('exists', '1'), ('displaytitle', '')])]"

Compare with:
python -m src.run_query --output_file_name "out.txt" --query_string "[[Taxon name::×triticosecale]]|?Illustrator"
which results in:
Taxon name,Illustrator
×triticosecale,Cindy Roché
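
One possible way to handle the multi-page case is to reduce each page object to its fulltext field and join the results. This is a sketch under the assumption that page values arrive as dictionaries shaped like the output above; the function name flatten_property_value and the "; " separator are hypothetical choices, not the repository's fix.

```python
from collections import OrderedDict


def flatten_property_value(value, separator="; "):
    """Return plain text for a property value holding one or many pages."""
    if isinstance(value, dict):
        # A single page object: keep only its display text.
        return value.get("fulltext", "")
    if isinstance(value, (list, tuple)):
        # Several answers: flatten each one and join them into one cell.
        return separator.join(flatten_property_value(item, separator) for item in value)
    return str(value)


illustrators = [
    OrderedDict([("fulltext", "Cindy Roché"), ("namespace", 0), ("exists", "1")]),
    OrderedDict([("fulltext", "Annaliese Miller"), ("namespace", 0), ("exists", "1")]),
]
print(flatten_property_value(illustrators))  # Cindy Roché; Annaliese Miller
```

A value that is already a plain string passes through unchanged, so the single-answer case keeps its current output.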
