hscells / transmute

PubMed/Medline query transpiler

Home Page: https://godoc.org/github.com/hscells/transmute

License: MIT License

Go 100.00%
pubmed medline systematic-reviews pubmed-parser medline-parser

transmute's Introduction

transmute

PubMed/Medline Query Transpiler

The goal of transmute is to transform PubMed/Medline search strategies from systematic reviews into queries suitable for other search engines. The result of the transformation is an intermediate representation which can be analysed with greater ease or transformed again to run on other search engines. This is why transmute is described as a transpiler. The intermediate representation allows trivial transformation to boolean queries accepted by search engines such as Elasticsearch.

Examples of a Medline query and a PubMed query, respectively:

1. MMSE*.ti,ab.
2. sMMSE.ti,ab.
3. Folstein*.ti,ab.
4. MiniMental.ti,ab.
5. "mini mental stat*".ti,ab.
6. or/1-5

("Contraceptive Agents, Female"[Mesh] OR "Contraceptive Devices, Female"[Mesh] OR contracept*[tiab]) AND ("Body Weight"[Mesh] OR weight[tiab] OR "Body Mass Index"[Mesh]) NOT (cancer*[ti] OR polycystic [ti] OR exercise [ti] OR physical activity[ti] OR postmenopaus*[ti])

Both are valid search strategies reported in real systematic reviews: the first is a Medline query and the second is a PubMed query. transmute can currently transform both. The next section shows example API usage: constructing a pipeline and executing it.
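The last line of the Medline strategy above, or/1-5, combines the results of lines 1 to 5. A minimal sketch of how such a line reference could be expanded is given here. It is illustrative only and is not taken from transmute's lexer; the helper name expandRange and the output format are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// rangeRef matches Medline line references such as "or/1-5" or "and/2-4".
var rangeRef = regexp.MustCompile(`^(or|and)/(\d+)-(\d+)$`)

// expandRange is a hypothetical helper (not part of transmute). It rewrites
// a reference like "or/1-5" into the explicit form "(1 OR 2 OR 3 OR 4 OR 5)".
func expandRange(line string) (string, error) {
	m := rangeRef.FindStringSubmatch(strings.ToLower(strings.TrimSpace(line)))
	if m == nil {
		return "", fmt.Errorf("not a range reference: %q", line)
	}
	from, err := strconv.Atoi(m[2])
	if err != nil {
		return "", err
	}
	to, err := strconv.Atoi(m[3])
	if err != nil {
		return "", err
	}
	if from > to {
		return "", fmt.Errorf("invalid range %d-%d", from, to)
	}
	refs := make([]string, 0, to-from+1)
	for i := from; i <= to; i++ {
		refs = append(refs, strconv.Itoa(i))
	}
	op := " " + strings.ToUpper(m[1]) + " "
	return "(" + strings.Join(refs, op) + ")", nil
}

func main() {
	out, err := expandRange("or/1-5")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out) // prints "(1 OR 2 OR 3 OR 4 OR 5)"
}
```

Each number in the expanded form would then be replaced by the sub-query on the corresponding line.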

API Usage

Here we construct a pipeline in Go:

query := `1. MMSE*.ti,ab.
2. sMMSE.ti,ab.
3. Folstein*.ti,ab.
4. MiniMental.ti,ab.
5. "mini mental stat*".ti,ab.
6. or/1-5`

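// Assumption: the pipeline, parser, and backend packages are imported from
// github.com/hscells/transmute/pipeline, .../parser, and .../backend.
p := pipeline.NewPipeline(parser.NewMedlineParser(),
                          backend.NewElasticsearchCompiler(),
                          pipeline.TransmutePipelineOptions{RequiresLexing: true})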
dsl, err := p.Execute(query)
if err != nil {
    panic(err)
}

println(dsl.StringPretty())

Which results in:

{
    "query": {
        "bool": {
            "disable_coord": true,
            "should": [
                {
                    "bool": {
                        "should": [
                            {
                                "wildcard": {
                                    "title": "MMSE*"
                                }
                            },
                            {
                                "wildcard": {
                                    "abstract": "MMSE*"
                                }
                            }
                        ]
                    }
                },
                {
                    "multi_match": {
                        "fields": [
                            "title",
                            "abstract"
                        ],
                        "query": "sMMSE"
                    }
                },
                {
                    "bool": {
                        "should": [
                            {
                                "wildcard": {
                                    "title": "Folstein*"
                                }
                            },
                            {
                                "wildcard": {
                                    "abstract": "Folstein*"
                                }
                            }
                        ]
                    }
                },
                {
                    "multi_match": {
                        "fields": [
                            "title",
                            "abstract"
                        ],
                        "query": "MiniMental"
                    }
                },
                {
                    "bool": {
                        "should": [
                            {
                                "wildcard": {
                                    "title": "\"mini mental stat*\""
                                }
                            },
                            {
                                "wildcard": {
                                    "abstract": "\"mini mental stat*\""
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }
}

Command Line Usage

As well as being a well-documented library, transmute can also be used on the command line. Since it is still in development, it must be built from source with the go tools:

go get -u github.com/hscells/transmute/cmd/transmute
transmute --help
transmute --input mmse.query --parser medline --backend elasticsearch

The command pretty-prints the same Elasticsearch query shown above.

Assumptions

The goal of transmute is to parse and transform PubMed/Medline queries into queries suitable for other search engines. However, the project makes some assumptions about the query:

  • The parser does not attempt to simplify boolean expressions, so badly written queries will remain inefficient.
  • A query cannot compile to Elasticsearch when it contains an adjacency operator with more than one field. This is due to a limitation of Elasticsearch.

Extending

If you would like to extend transmute and create a new backend for it, have a read of the documentation, which should lead you in the right direction. Writing a new backend requires transforming the intermediate representation into the target query language.
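To give a feel for what such a transformation involves, here is a minimal sketch that walks a boolean query tree and emits a simple field:term query string. The Node type and the compile function are hypothetical and do not reflect transmute's actual intermediate representation or backend interface; consult the documentation for the real types.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical intermediate representation (not transmute's own):
// either a keyword leaf (Operator == "") or a boolean operator over children.
type Node struct {
	Operator string   // "and", "or", "not", or "" for a leaf
	Keyword  string   // query term, used by leaves only
	Fields   []string // fields to search, used by leaves only
	Children []Node
}

// compile recursively transforms the tree into the target query language,
// here a plain "field:term" string with upper-case boolean operators.
func compile(n Node) string {
	if n.Operator == "" {
		if len(n.Fields) == 0 {
			return n.Keyword
		}
		parts := make([]string, len(n.Fields))
		for i, f := range n.Fields {
			parts[i] = f + ":" + n.Keyword
		}
		if len(parts) == 1 {
			return parts[0]
		}
		// A term searched in several fields matches if any field matches.
		return "(" + strings.Join(parts, " OR ") + ")"
	}
	parts := make([]string, len(n.Children))
	for i, c := range n.Children {
		parts[i] = compile(c)
	}
	return "(" + strings.Join(parts, " "+strings.ToUpper(n.Operator)+" ") + ")"
}

func main() {
	tree := Node{Operator: "or", Children: []Node{
		{Keyword: "MMSE*", Fields: []string{"title", "abstract"}},
		{Keyword: "sMMSE", Fields: []string{"title"}},
	}}
	fmt.Println(compile(tree))
}
```

A real backend would emit the target engine's own structures (for Elasticsearch, nested bool queries) rather than a flat string, but the recursive walk is the same idea.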

Citing

If you use this work in a scientific publication, please cite:

@inproceedings{scells2018framework,
 author = {Scells, Harrisen and Locke, Daniel and Zuccon, Guido},
 title = {An Information Retrieval Experiment Framework for Domain Specific Applications},
 booktitle = {The 41st International ACM SIGIR Conference on Research \& Development in Information Retrieval},
 series = {SIGIR '18},
 year = {2018},
} 

Logo

The Go gopher was created by Renee French and is licensed under the Creative Commons 3.0 Attribution license.

transmute's Issues

Parse queries without brackets and "exp"

For instance:

exp animals/ not humans.sh.

This line in a query is not being parsed correctly. I don't currently know what should happen in Elasticsearch if there is an exp expression. There are no brackets surrounding the query either which makes it even harder to parse.

Remove all the panics

Trying to use transmute as a library is almost impossible because it panics over syntax errors. This is stupid and needs to be resolved.
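Until the panics are replaced with returned errors, a caller can guard against them with recover. The sketch below shows the general Go pattern under stated assumptions: parseOrPanic is a hypothetical stand-in for any call that panics on bad input, not a transmute function.

```go
package main

import "fmt"

// parseOrPanic is a hypothetical stand-in for a library call that panics
// on a syntax error instead of returning an error.
func parseOrPanic(query string) string {
	if query == "" {
		panic("syntax error: empty query")
	}
	return "parsed(" + query + ")"
}

// safeParse converts any panic raised by parseOrPanic into an ordinary
// error, using a deferred recover and named return values.
func safeParse(query string) (result string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("parse failed: %v", r)
		}
	}()
	return parseOrPanic(query), nil
}

func main() {
	if _, err := safeParse(""); err != nil {
		fmt.Println("recovered:", err)
	}
	out, err := safeParse("MMSE*.ti,ab.")
	fmt.Println(out, err)
}
```

This is only a workaround for callers; the longer-term fix is for the lexer and parser themselves to return errors.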

Write test cases

Test cases should be written for each level of the pipeline:

  • lexer tests include preprocessing and the structure of the query tree
  • parser tests include ensuring fields are mapped, wildcards are added, and nested queries are transformed correctly
  • compiler tests include checking compiled queries against known gold-standard (possibly manually translated) queries

Custom field mapping

A field mapping should be able to be loaded as part of a pipeline and from the command line. Everybody will probably call their fields something different in their own index so this will assist with allowing transmute to be more flexible.
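One way this could look is a JSON file that maps Medline field codes to index field names. The sketch below is an assumption about a possible design, not an existing transmute feature; the FieldMapping type, the loadMapping and resolve names, and the JSON layout are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// FieldMapping is a hypothetical type mapping a Medline field code
// (for example "ti") to one or more field names in the user's index.
type FieldMapping map[string][]string

// loadMapping parses a JSON document such as {"ti": ["title"]}.
func loadMapping(data []byte) (FieldMapping, error) {
	var m FieldMapping
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, fmt.Errorf("invalid field mapping: %w", err)
	}
	return m, nil
}

// resolve expands a comma-separated field list such as "ti,ab" into the
// mapped index fields, failing on any code that has no mapping.
func (m FieldMapping) resolve(codes string) ([]string, error) {
	var fields []string
	for _, code := range strings.Split(codes, ",") {
		mapped, ok := m[strings.TrimSpace(code)]
		if !ok {
			return nil, fmt.Errorf("no mapping for field %q", code)
		}
		fields = append(fields, mapped...)
	}
	return fields, nil
}

func main() {
	m, err := loadMapping([]byte(`{"ti": ["title"], "ab": ["abstract"]}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fields, err := m.resolve("ti,ab")
	fmt.Println(fields, err)
}
```

On the command line, the same JSON could be read from a file passed as an option, with a built-in default used when none is given.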

limit keyword

Some Medline queries have a limit keyword. It is currently unclear how to incorporate this into the Elasticsearch DSL.

Example:

(tuberculosis or TB).tw
limit 1 to yr="2007 -Current"
Mycobacterium tuberculosis/
limit 2 to yr="2007 -Current"
Tuberculosis, Multidrug-Resistant/ or Tuberculosis/ or Tuberculosis, Pulmonary/
limit 3 to yr="2007 -Current"
1 or 2 or 3
(Xpert or GeneXpert or cepheid or( near* patient)). tw.
limit 4 to yr="2007 -Current"
4 and 5 
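One possible approach, offered only as a sketch, is to parse each year limit and attach an Elasticsearch range query to the referenced line. The YearLimit type, the parseLimit and rangeFilter names, and the publication_year field are assumptions; only the gte/lte range query shape comes from Elasticsearch itself. How the filter should be combined with the referenced line is left open here, as it is in the issue.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// limitYear matches lines such as: limit 1 to yr="2007 -Current"
var limitYear = regexp.MustCompile(`(?i)^limit\s+(\d+)\s+to\s+yr="(\d{4})\s*-\s*(\d{4}|current)"$`)

// YearLimit is a hypothetical parsed form of a year limit.
type YearLimit struct {
	Line int // line number the limit applies to
	From int // first year, inclusive
	To   int // last year, inclusive; 0 means open-ended ("Current")
}

// parseLimit extracts the line reference and year bounds from a limit line.
func parseLimit(line string) (YearLimit, error) {
	m := limitYear.FindStringSubmatch(strings.TrimSpace(line))
	if m == nil {
		return YearLimit{}, fmt.Errorf("unsupported limit: %q", line)
	}
	var l YearLimit
	var err error
	if l.Line, err = strconv.Atoi(m[1]); err != nil {
		return YearLimit{}, err
	}
	if l.From, err = strconv.Atoi(m[2]); err != nil {
		return YearLimit{}, err
	}
	if !strings.EqualFold(m[3], "current") {
		if l.To, err = strconv.Atoi(m[3]); err != nil {
			return YearLimit{}, err
		}
	}
	return l, nil
}

// rangeFilter builds an Elasticsearch range query on the given field.
func (l YearLimit) rangeFilter(field string) map[string]interface{} {
	bounds := map[string]interface{}{"gte": l.From}
	if l.To != 0 {
		bounds["lte"] = l.To
	}
	return map[string]interface{}{
		"range": map[string]interface{}{field: bounds},
	}
}

func main() {
	l, err := parseLimit(`limit 1 to yr="2007 -Current"`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// "publication_year" is an assumed index field name.
	b, _ := json.MarshalIndent(l.rangeFilter("publication_year"), "", "    ")
	fmt.Println(string(b))
}
```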
