lfoppiano / material-parsers

Material parsers and other tools and scripts, initially developed for Grobid Superconductors

License: Apache License 2.0

Python 68.69% JavaScript 27.77% HTML 0.39% Jupyter Notebook 2.82% Dockerfile 0.19% CSS 0.14%
superconductors materials physics text-mining

Material Parsers (and other tools)

This project was previously released as grobid-superconductors-tools. It was born as a sister project of grobid-superconductors and contains a web service that interfaces with Python libraries (e.g. spaCy).

The service provides the following functionalities:

  • Convert a material name to a formula (e.g. Lead -> Pb, Hydrogen -> H): /convert/name/formula
  • Decompose a formula into a structured dict of elements (e.g. La x Fe 1-x O7 -> {La: x, Fe: 1-x, O: 7}): /convert/formula/composition
  • Classify a material into classes (from the superconductors domain) using a rule-based table (e.g. "La Cu Fe" -> Cuprates): /classify/formula
  • Tc classification (Tc, not-Tc): /classify/tc (for more information, please open an issue)
  • Relation extraction given a sentence and two entities: /process/link (for more information, please open an issue)
  • Material processing using Deep Learning models and rule-based processing: /process/material

Usage

The service is deployed on Hugging Face Spaces and can be used right away. To install the service in your own environment, see below.

Convert material name to formula

curl --location 'https://lfoppiano-grobid-superconductors-tools.hf.space/convert/name/formula' \
--form 'input="Hydrogen"'

output:

{"composition": {"H": "1"}, "name": "Hydrogen", "formula": "H"}
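The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the service accepts a plain multipart form as in the curl call above. The helper names `build_form_request` and `call_service` are hypothetical and are not part of this project.

```python
import uuid
import urllib.request

# Base URL taken from the curl examples in this README.
BASE_URL = "https://lfoppiano-grobid-superconductors-tools.hf.space"


def build_form_request(path, fields, base_url=BASE_URL):
    """Build a multipart/form-data POST request (hypothetical helper)."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")
        lines.append(value)
    lines.append(f"--{boundary}--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(
        base_url + path, data=body, headers=headers, method="POST"
    )


def call_service(path, fields):
    """Send the request and return the raw response text (hypothetical helper)."""
    request = build_form_request(path, fields)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling `call_service("/convert/name/formula", {"input": "Hydrogen"})` would be the equivalent of the curl command, provided the hosted space is reachable. The response is returned as text because not every endpoint shown in this README prints strict JSON.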

Decompose formula into a structured dict of elements

Example:

curl --location 'https://lfoppiano-grobid-superconductors-tools.hf.space/convert/formula/composition' \
--form 'input="CaBr2-x"'

output:

{"composition": {"Ca": "1", "Br": "2-x"}}
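To illustrate what this decomposition means, here is a simplified local sketch. It is not the parser used by the service: the function `parse_simple_formula` is hypothetical, it handles only flat formulas without parentheses, and it assumes that a variable directly attached to a one-letter element (e.g. "Bx") is separated by a space.

```python
import re

# An element symbol (capital letter plus optional lowercase letter),
# followed by everything up to the next capital letter as its amount.
ELEMENT_PATTERN = re.compile(r"([A-Z][a-z]?)\s*([^A-Z]*)")


def parse_simple_formula(formula):
    """Split a flat formula into {element: amount} (hypothetical helper).

    Amounts are kept as strings so that variables such as "2-x" survive.
    A missing amount defaults to "1". If an element is repeated, the last
    occurrence wins in this sketch.
    """
    composition = {}
    for element, amount in ELEMENT_PATTERN.findall(formula):
        composition[element] = amount.strip() or "1"
    return composition


print(parse_simple_formula("CaBr2-x"))
print(parse_simple_formula("La x Fe 1-x O7"))
```

For the two inputs used in this README, the sketch produces the same element-to-amount mapping as the examples above.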

Classify materials into classes

Example:

curl --location 'https://lfoppiano-grobid-superconductors-tools.hf.space/classify/formula' \
--form 'input="(Mo 0.96 Zr 0.04 ) 0.85 B x "'

output:

['Alloys']

Process material

This endpoint first passes the material sequence through a deep learning (DL) model and then applies a combination of all the steps listed above.

Example:

curl --location 'https://lfoppiano-material-parsers.hf.space/process/material' \
--form 'text="(Mo 0.96 Zr 0.04 ) 0.85 B x "'

output:

[
    {
        "formula": {
            "rawValue": "(Mo 0.96 Zr 0.04 ) 0.85 B x"
        },
        "resolvedFormulas": [
            {
                "rawValue": "(Mo 0.96 Zr 0.04 ) 0.85 B x",
                "formulaComposition": {
                    "Mo": "0.816",
                    "Zr": "0.034",
                    "B": "x"
                }
            }
        ]
    }
]
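The values in "formulaComposition" are consistent with multiplying each amount inside the parentheses by the group multiplier 0.85. The sketch below reproduces that arithmetic; `distribute_group` is a hypothetical helper written for illustration and is not taken from the project's code.

```python
from decimal import Decimal


def distribute_group(group, multiplier):
    """Multiply every amount in a parenthesized group by its multiplier.

    Decimal is used instead of float to avoid binary rounding noise
    in the printed amounts.
    """
    factor = Decimal(multiplier)
    return {
        element: format((Decimal(amount) * factor).normalize(), "f")
        for element, amount in group.items()
    }


# (Mo 0.96 Zr 0.04) 0.85 B x
composition = distribute_group({"Mo": "0.96", "Zr": "0.04"}, "0.85")
composition["B"] = "x"  # the variable amount is kept as-is
print(composition)  # {'Mo': '0.816', 'Zr': '0.034', 'B': 'x'}
```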

Evaluation

The model uses DeLFT's BidLSTM_CRF architecture.

Evaluated on 23/12/25:

                  precision    recall  f1-score   support

        <doping>     0.6926    0.6377    0.6640       265
   <fabrication>     0.3333    0.0909    0.1429        44
       <formula>     0.8348    0.8459    0.8403      2569
          <name>     0.7346    0.7935    0.7629       949
         <shape>     0.9089    0.9608    0.9341       841
     <substrate>     0.5875    0.3176    0.4123       148
         <value>     0.8844    0.8920    0.8882       463
      <variable>     0.9645    0.9710    0.9677       448

all (micro avg.)     0.8321    0.8385    0.8353      5727
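The f1-score column is the harmonic mean of precision and recall. As a quick check, the sketch below recomputes it for a few rows of the table. Because the inputs are already rounded to four decimals, the recomputed value can differ from the reported one in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1) copied from the table above.
rows = {
    "<formula>": (0.8348, 0.8459, 0.8403),
    "<shape>": (0.9089, 0.9608, 0.9341),
    "all (micro avg.)": (0.8321, 0.8385, 0.8353),
}

for label, (precision, recall, reported) in rows.items():
    computed = f1_score(precision, recall)
    print(f"{label}: computed {computed:.4f}, reported {reported:.4f}")
```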

Installing in your environment

docker run -it lfoppiano/grobid-superconductors-tools:2.1

References

If you use our work, and write about it, please cite our paper:

@article{doi:10.1080/27660400.2022.2153633,
    author = {Luca Foppiano and Pedro Baptista Castro and Pedro Ortiz Suarez and Kensei Terashima and Yoshihiko Takano and Masashi Ishii},
    title = {Automatic extraction of materials and properties from superconductors scientific literature},
    journal = {Science and Technology of Advanced Materials: Methods},
    volume = {3},
    number = {1},
    pages = {2153633},
    year = {2023},
    publisher = {Taylor & Francis},
    doi = {10.1080/27660400.2022.2153633},
    URL = {https://doi.org/10.1080/27660400.2022.2153633},
    eprint = {https://doi.org/10.1080/27660400.2022.2153633}
}

Overview of the repository

  • Converters: conversion between TSV and Grobid XML files
  • Linking module: a rule-based Python algorithm to link entities
  • Commons libraries: contains common code shared between the various components. The Grobid client was borrowed from here, the tokenizer from there.

Developer's notes

Set up on Apple M1

conda install -c apple tensorflow-deps
pip install -r requirements.macos.txt 
conda install scikit-learn=1.0.1

Before installing DeLFT, remove tensorflow, h5py, and scikit-learn from the DeLFT dependencies in its setup.py:

pip install -e ../../delft 
pip install -r requirements.txt 

Finally, don't forget to install the spaCy model:

python -m spacy download en_core_web_sm

Release

bump-my-version bump patch|minor|major

material-parsers's Issues

Add summary table

Add a summary table showing all the extracted information.

Actions:

  • Clicking on the material or the row moves to the part of the PDF where the mention was extracted
  • The items can be edited by clicking on the row
  • Once the user is happy with the result, they can download it or copy it to the clipboard