harisont / concept-alignment

Syntax-based Concept Alignment for Machine Translation

License: BSD 2-Clause "Simplified" License

Languages: Haskell 0.34%, TeX 98.40%, Grammatical Framework 1.26%
Topics: grammatical-framework, machine-translation, universal-dependencies

concept-alignment's People

Contributors

aarneranta, harisont

Forkers

inariksit

concept-alignment's Issues

Experiment further with `gf-ud`'s patterns

See if it's possible to use them to replace divergence patterns expressed as functions to:

  • simplify the CA algorithms
  • simplify usage (no need to modify the actual code ever)

Make `evalign` modify the actual alignments

That is, read and write conllu format, so that judgements are useful for GG as well as for debugging. The role of the "linearized format", if any, is still to be decided.

  • reimplement evalign
  • update readme
  • use only + or only = alignment in CP/lexicon generation if the conllu files are annotated

[Feature request] Retain old comment in result file

I'm running extract-concepts on a conllu file that has sentences like the following, starting with a comment:

# <http://snomed.info/id/188462001>
1	Secondary	secondary	ADJ	<*>.A.ABS.<SS>	Degree=Pos	3	amod	_	_
2	malignant	malignant	ADJ	A.ABS	Degree=Pos	3	amod	_	_
3	neoplasm	neoplasm	NOUN	N.NOM.SG	Case=Nom|Number=Sing	0	root	_	_
4	of	of	ADP	PREP	AdpType=Prep	5	case	_	_
5	brain	brain	NOUN	<Count>.N.NOM.SG	Case=Nom|Number=Sing	3	nmod	_	_
6	and	and	CCONJ	CC.@CC	_	7	cc	_	_
7	spinal cord	spinal cord	NOUN	<Count>.N.NOM.SG	Case=Nom|Number=Sing	5	conj	_	_

In the result, the original comment has been replaced with new ones:

# sent_id = gfud1000002
# text = of brain and spinal cord
# reasons: {POS, UD} sentence IDs: {"missing sent_id"} correctness: _
1	of	of	ADP	PREP	AdpType=Prep	2	case	_	ADJUSTED=True
2	brain	brain	NOUN	<Count>.N.NOM.SG	Case=Nom|Number=Sing	0	root	_	ADJUSTED=True|ORIG_LABEL=nmod
3	and	and	CCONJ	CC.@CC	_	4	cc	_	ADJUSTED=True
4	spinal cord	spinal cord	NOUN	<Count>.N.NOM.SG	Case=Nom|Number=Sing	2	conj	_	ADJUSTED=True

I would like to know where the fragment came from, so I'd like the original comment to be retained in the output, something like the following. Placement doesn't matter, just that it is there somewhere.

# sent_id = gfud1000002
# text = of brain and spinal cord
# <http://snomed.info/id/188462001>
…
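
A minimal sketch of the requested behaviour, written in Python for illustration rather than in the project's own code. It assumes the original sentence and the extracted fragment are both available as text blocks; the function names `split_sentence` and `retain_comments` are hypothetical and are not part of concept-alignment or gf-ud. Comment lines from the original are appended after the fragment's own comments, and duplicates are skipped.

```python
from typing import List, Tuple


def split_sentence(block: str) -> Tuple[List[str], List[str]]:
    """Split one conllu sentence block into (comment lines, token lines)."""
    comments: List[str] = []
    tokens: List[str] = []
    for line in block.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        if line.startswith("#"):
            comments.append(line)
        else:
            tokens.append(line)
    return comments, tokens


def retain_comments(original: str, extracted: str) -> str:
    """Return the extracted block with the original's comments carried over.

    Hypothetical helper: the extracted block's own comments come first,
    then any comment of the original block that is not already present.
    """
    original_comments, _ = split_sentence(original)
    new_comments, new_tokens = split_sentence(extracted)
    kept = [c for c in original_comments if c not in new_comments]
    return "\n".join(new_comments + kept + new_tokens)


if __name__ == "__main__":
    # Shortened, made-up rows; only the comment handling matters here.
    original = (
        "# <http://snomed.info/id/188462001>\n"
        "1\tneoplasm\tneoplasm\tNOUN\t_\t_\t0\troot\t_\t_\n"
        "2\tof\tof\tADP\t_\t_\t3\tcase\t_\t_\n"
        "3\tbrain\tbrain\tNOUN\t_\t_\t1\tnmod\t_\t_"
    )
    extracted = (
        "# sent_id = gfud1000002\n"
        "# text = of brain\n"
        "1\tof\tof\tADP\t_\t_\t2\tcase\t_\t_\n"
        "2\tbrain\tbrain\tNOUN\t_\t_\t0\troot\t_\t_"
    )
    print(retain_comments(original, extracted))
```

Since placement does not matter for this request, appending after the generated comments keeps `# sent_id` and `# text` at the top, where tools reading the output are most likely to look for them.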
