
crops's Introduction

petermr repositories

Many of these repos are widely used in collaborative projects and include:

  • code
  • data
  • projects

This special repo coordinates navigation and discussion.

discussion lists

The "Discussions" for this repo ( https://github.com/petermr/petermr/discussions ) include discussions for the other repos, which are indicated by their names. They may replace our (private) Slack for all public-facing material; private project management will remain on Slack.

active repos

active Python projects:

For context: we have 4 packages (if that's the right word). They are largely standalone but can also provide useful library routines. They all share a common data structure on disk (simply named directories). This means that state is less important and is often held on the filesystem. It also means that data can be further manipulated by Unix tools and other utilities. This is very fluid, as we are constantly adding new data substructures. (I developed much of this in Java, see https://github.com/petermr/ami3/blob/master/README.md ) The top directory is a CProject and its document children are called CTrees, as they are usefully split into many subdirectory trees.
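The directory layout described above can be sketched in a few lines of Python. This is only an illustration and not code from any of the packages: the helper `list_ctrees`, the project name and the CTree names are hypothetical, and the sketch assumes that every immediate subdirectory of a CProject is a CTree.

```python
from pathlib import Path
import tempfile


def list_ctrees(cproject_dir):
    """Return the immediate subdirectories of a CProject, sorted by name.

    Assumption: each subdirectory is one CTree; plain files are ignored.
    """
    root = Path(cproject_dir)
    return sorted(p for p in root.iterdir() if p.is_dir())


# Build a throwaway CProject so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    cproject = Path(tmp) / "my_cproject"            # hypothetical project name
    for name in ("PMC0000001", "PMC0000002"):       # hypothetical CTree names
        (cproject / name / "sections").mkdir(parents=True)
    (cproject / "summary.txt").write_text("not a CTree")

    ctree_names = [p.name for p in list_ctrees(cproject)]
    print(ctree_names)
```

Because the state lives in named directories, the same listing could equally be produced with `ls` or `find`, which is the point of keeping the structure on the filesystem.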

Each package has a maintainer. These are all volunteers, and their Python is all self-taught. There are also interns (a mixture of compsci, engineers and plant_sci) who have a 3-month stay. They test the tools, develop resources, and explore text-mining, NLP, image analysis, machine-learning, etc. They are encouraged to use the packages and link them into Python scripts or Notebooks, but don't have time for serious development. (They might add readers or exporters.)

  • pygetpapers. Ayush Garg. https://github.com/petermr/pygetpapers . Searches and downloads articles from repositories. Standalone, but the results may be used by docanalysis or possibly imageanalysis. Can be called from other tools.

  • docanalysis. Shweata Hegde. https://github.com/petermr/docanalysis . Ingests CProjects and carries out text-analysis of documents, including sectioning, NLP/text-mining and vocabulary generation. Uses NLTK and other Python tools for many operations, and spaCy and scispaCy for annotation of entities. Outputs summary data, correlations and word-dictionaries. Links entities to Wikidata.

  • pyamiimage. Anuv Chakroborty + PMR. https://github.com/petermr/pyamiimage . Ingests figures/images, applies many image processing techniques (erode-dilate, colour quantization, skeletons, etc.), extracts words (Tesseract), extracts lines and symbols (uses sknw/NetworkX) and recreates semantic diagrams (not finished).

  • py4ami. PMR. https://github.com/petermr/pyami . Translation of ami3 (Java) to Python. Processes CProjects to extract and combine primitives into semantic objects. Some functionality overlaps with docanalysis and imageanalysis. Includes libraries (e.g. for Wikimedia), a prototype GUI in tkinter, and a complex structure of word-dictionaries covering science and related disciplines. (Note: the project is called pyami locally, but there is already a PyAMI project on PyPI, so there it is called py4ami.)

All packages aim to have a common command-line approach, use config files, and generate and process CProjects (e.g. iterating over CTrees and applying filters, transformers, map/reduce, etc.). All 4 packages have been uploaded to PyPI.
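As a rough sketch of what "iterating over CTrees and applying filters, transformers, map/reduce" could look like, the following uses only the Python standard library. It is not the API of any of the 4 packages: `word_counts`, `process_cproject`, the directory names and the file name `fulltext.txt` are all hypothetical, chosen just to make the example runnable.

```python
from collections import Counter
from functools import reduce
from pathlib import Path
import tempfile


def word_counts(ctree):
    """Map step: count lowercase whitespace-separated words in one CTree's .txt files."""
    counts = Counter()
    for txt in sorted(ctree.rglob("*.txt")):
        counts.update(txt.read_text(encoding="utf-8").lower().split())
    return counts


def process_cproject(cproject, keep):
    """Filter the CTrees with `keep`, map each to a Counter, reduce by summing."""
    ctrees = sorted(p for p in Path(cproject).iterdir() if p.is_dir())
    selected = filter(keep, ctrees)
    mapped = map(word_counts, selected)
    return reduce(lambda a, b: a + b, mapped, Counter())


# Demo on a temporary, hypothetical CProject.
with tempfile.TemporaryDirectory() as tmp:
    cproject = Path(tmp)
    samples = {
        "PMC1": "terpene synthase",
        "PMC2": "terpene volatile",
        "notes": "terpene",          # not a paper; removed by the filter below
    }
    for name, text in samples.items():
        (cproject / name).mkdir()
        (cproject / name / "fulltext.txt").write_text(text, encoding="utf-8")

    total = process_cproject(cproject, keep=lambda p: p.name.startswith("PMC"))
    print(total["terpene"], total["synthase"])
```

The filter and the map function are passed in as plain callables, so a different selection rule or transformer can be swapped in without touching the iteration itself.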

basicTest

Checks that the Python environment works (independently of the applications) https://github.com/petermr/basicTest/blob/main/README.md

presentations

Some presentations about the software, many from collaborators/interns

pygetpapers

notebook

docanalysis

wikidata

crops's People

Contributors

anam04anjum, anjalishekhawat1997, ankit3699, anuvc, petermr, prachi0508, rageshwari000, sasujadhav1, shweatanhegde, utkarsha-05

crops's Issues

Py4ami installation Error

  • SYSTEM SPECIFICATION:
    • Windows edition: Windows 8.1 Single Language
    • System:
      • Processor: Intel(R) Core(TM) i5-4200U CPU @ 1.60 GHz
      • Installed RAM: 4.00 GB
      • System type: 64-bit operating system, x64-based processor
      • Git 2.33.0
      • Python 3.9.5
  • Command run: pip install py4ami
  • Error that appeared:
    • error: could not create 'build\lib\py4ami\resources\projects\liion4\PMC7048421\sections\0_front\1_article-meta\16_funding-group\0_award-group\0_funding-source\0_institution-wrap\0_institution-id.xml': No such file or directory
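One possible cause, which the report does not confirm, is the classic Windows path-length limit (MAX_PATH, 260 characters including the terminating null): the relative path in the error is already long, and pip builds under a temporary directory that adds its own prefix. The sketch below only checks lengths. The helper `exceeds_max_path` and both base directories are hypothetical; the real build directory on the reporter's machine is not known.

```python
WINDOWS_MAX_PATH = 260  # classic Win32 limit, counting the terminating null

# Relative path copied from the error message above.
RELATIVE = (
    r"build\lib\py4ami\resources\projects\liion4\PMC7048421\sections"
    r"\0_front\1_article-meta\16_funding-group\0_award-group"
    r"\0_funding-source\0_institution-wrap\0_institution-id.xml"
)


def exceeds_max_path(base_dir, relative_path, limit=WINDOWS_MAX_PATH):
    """Return True if base_dir joined with relative_path would not fit in `limit`."""
    full = base_dir.rstrip("\\") + "\\" + relative_path
    return len(full) + 1 > limit  # +1 for the terminating null


# Hypothetical base directories, for illustration only.
short_base = r"C:\src"
long_base = (
    r"C:\Users\someuser\AppData\Local\Temp\pip-install-abcdefgh"
    r"\py4ami_0123456789abcdef0123456789abcdef"
)

short_result = exceeds_max_path(short_base, RELATIVE)
long_result = exceeds_max_path(long_base, RELATIVE)
print(short_result, long_result)
```

If the length does turn out to be the problem, building from a shorter base directory would be one thing to try; this is a suggestion to test, not a verified fix.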

Testing dictionary

SYSTEM SPECIFICATION
Windows 10 Home Single Language
Version 20H2
64-bit operating system, x64-based processor

CREATING DICTIONARY
I used the following command to create a dictionary:
amidict -v --dictionary eo_Gene --directory gene --input gene.txt create --informat list --outformats xml
I created the corpus using this command:
pygetpapers -q "terpene synthase TPS plant volatile" -o TPSvolatile -x -p -k <number of papers>

Created dictionary:
https://github.com/petermr/crops/blob/main/Zea%20mays/eo_ZeaTPS.xml

TESTING DICTIONARY

We test the dictionary by running this command:
ami -p "TPSvolatile" search --dictionary eo_Gene.xml

This gives a full data table for the eo_Gene dictionary:

https://github.com/petermr/crops/blob/main/assets/Screenshot%20(95).png

DIFFICULTIES

  • We are facing difficulties with the full data table for the eo_Gene dictionary.
  • We are not able to get term counts for the dictionary terms, only word frequencies.
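To make the distinction concrete: a word frequency counts single tokens, while a term count counts occurrences of whole dictionary terms, which may span several words. The following is a minimal standard-library sketch of a term count. It is not how ami computes its tables, and the function `count_terms`, the sample text and the term list are hypothetical.

```python
import re
from collections import Counter


def count_terms(text, terms):
    """Count case-insensitive whole-word occurrences of each dictionary term."""
    counts = Counter()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Hypothetical sample text and dictionary terms.
text = (
    "Terpene synthase genes encode terpene synthase enzymes; "
    "TPS10 is one terpene synthase."
)
terms = ["terpene synthase", "TPS10"]

term_counts = count_terms(text, terms)
print(term_counts)
```

A plain word-frequency table of the same text would list "terpene" and "synthase" separately, which is why it cannot stand in for the per-term counts.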

Error while Testing Dictionary

SYSTEM USED

  • Windows 10 Home Single Language
  • 64-bit operating system, x64-based processor

ISSUE

While testing the dictionary with the following command:

ami -p "TPSvolatile" search --dictionary eo_Gene.xml

  • It throws the error java.lang.NullPointerException.

I would request that this error be resolved.
