Comments (10)
ooh, that's a very interesting idea.
I would do something like this:
https://runkit.com/spencermountain/65ca22696b87a50008841654
obviously that's not full APA style or anything, but it should get you started
cheers
from compromise.
@ItIsSeven & @spencermountain - I wrote something a while back for doing this. It's not battle-tested and needs LOTS more rules, and Spencer probably has a better way of doing things, with more knowledge of the Compromise API.
But I'll open something up shortly, and if either of you wanna pluck away at it, feel free.
from compromise.
this is a cool project, and let me know if you'd like to turn it into a plugin, once it gets working.
There are a bunch of edge-cases, i'm sure, that would need to get worked-out, but fun stuff.
from compromise.
@spencermountain - found this via an old issue: Cap Rule Set.
Could be useful (if you decide you wanna pick this up).
from compromise.
@spencermountain - can you possibly give me an idea of the proper usage for groups like this?
.match("[government|president] of [#Country]")
Not sure how to use | with groups to simplify rules. We could then go through ORG words etc., do something like this, and then apply some context rules:
{pattern:"[government|president] of [#Country]", matches:2}
from compromise.
hey - sure no prob.
the OR logic uses () brackets, like foo (bar|baz)
capture groups use [] brackets, with an optional name, like doc.match('foo [<two>bar]', 'two')
these two features can be combined, so that you grab either 'bar' or 'baz', like so:
doc.match('foo [(bar|baz)]', 0)
//or
doc.match('foo [<two>(bar|baz)]', 'two')
happy to help, if I can clear things up further
cheers
from compromise.
ps - yeah, that tagging file is really neat, isn't it?
NNP NN PREV2WD ESTATE seems like it would map to #ProperNoun #Noun estate in compromise jargon. I'm sure there's a lot we could learn from that dataset
from compromise.
@spencermountain - could you give me a better idea of how to do this?
import nlp from "https://esm.sh/compromise"
function CapitalizeWords(text){
  let doc = nlp(text)
  let finalText = null;
  function applyRule(rule){
    const groups = doc.match('[(government|president)] of [#ProperNoun]').groups()
    for(let item in groups){
      finalText = finalText.replace(groups[item].text(),nlp(groups[item].text()).toTitleCase().text())
    }
    return finalText
  }
  function goThroughRules(){
    const rules = ['[(government|president)] of [#ProperNoun]']
    finalText = doc.text()
    for(let item in rules){
      finalText = applyRule(rules[item])
      console.log(finalText)
    }
    return finalText
  }
  return goThroughRules()
}
console.log(CapitalizeWords("The government of canada is amazing")) //
so we can easily do this?
console.log(CapitalizeWords("The government of canada is amazing and so is the president of america")) //
When currently using:
const rules = ['[(government|president)] of [#ProperNoun]']
It doesn't work. (Assuming we have to write them as single rules?) If so, possible feature request for the matcher?
Then we could easily build something based off the current org words etc.
Plus, as said, this should help big time with NLP: title-case first, then check for common nouns etc., and have tags for title-case words only. Example: House of commons.
from compromise.
sure, i'd do something like this:
let rules=[
  {match:'house of [.]', group:0}
]
rules.forEach(obj=>{
  let m = doc.match(obj.match, obj.group)
  if(m.found){
    m.toTitleCase()
  }
})
cheers
from compromise.
@spencermountain my bad - this is what I was looking for. Keeping this here for reference for me, you and @ItIsSeven
function CapitalizeWords(text){
  let doc = nlp(text)
  let finalText = null;
  // Title-case every word captured by the rule's [] groups
  function applyRule(rule){
    const groups = doc.match(rule).groups()
    for(let item in groups){
      const words = groups[item].json()
      for(let i in words){
        const word = words[i].text
        // note: replace() with a string pattern only swaps the first occurrence of word
        finalText = finalText.replace(word,nlp(word).toTitleCase().text())
      }
    }
    return finalText
  }
  // Apply each rule in turn to the running text
  function goThroughRules(){
    const rules = ['[(government|president)] of [#ProperNoun]']
    finalText = doc.text()
    for(let item in rules){
      finalText = applyRule(rules[item])
      console.log(finalText)
    }
    return finalText
  }
  return goThroughRules()
}
console.log(CapitalizeWords("The government of canada is amazing and the president of canada but the president is not"))
// Outputs: "The Government of Canada is amazing and the President of Canada but the president is not"
Spencer - if you wanna have a go at this, I am down. Now that I've finally figured it out, I have been training an AI to build a rule set for this, to try and actually contribute something useful to this project instead of my sh*tty issues you probably wanna punch me in the face for lol!
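One caveat with the string-replace approach above: finalText.replace(word, ...) swaps the first occurrence of word in the text, which is not always the occurrence the rule matched. Here is a minimal, dependency-free sketch of a position-aware alternative. It assumes plain JavaScript RegExp as a stand-in for compromise's matcher, so a literal country list takes the place of #ProperNoun, and titleCase, rules, and capitalizeWords are hypothetical names, not part of compromise.

```javascript
// Hypothetical helper: upper-case the first letter of a single word.
function titleCase(word) {
  return word.charAt(0).toUpperCase() + word.slice(1)
}

// Assumption: each rule is a plain RegExp whose two capture groups mark
// the words to title-case. The country list stands in for #ProperNoun.
const rules = [
  /\b(government|president) of (canada|america)\b/g,
  /\b(house) of (commons)\b/g,
]

function capitalizeWords(text) {
  let out = text
  for (const rule of rules) {
    // The callback receives the whole match followed by its capture groups,
    // so only the matched span is rewritten, never an earlier look-alike word.
    out = out.replace(rule, (whole, first, second) =>
      `${titleCase(first)} of ${titleCase(second)}`)
  }
  return out
}

console.log(capitalizeWords('the president said the president of canada agrees'))
// → "the president said the President of Canada agrees"
```

The first "president" stays lower-case because it is not part of a match, whereas a first-occurrence string replace would have capitalized it instead of the matched one.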
from compromise.