
True Casing (compromise issue, closed)

ItIsSeven commented on September 22, 2024
True Casing

from compromise.

Comments (10)

spencermountain commented on September 22, 2024

ooh, that's a very interesting idea.
I would do something like this:
https://runkit.com/spencermountain/65ca22696b87a50008841654
obviously that's not full APA style or anything, but it should get you started
cheers


MarketingPip commented on September 22, 2024

@ItIsSeven & @spencermountain - I wrote something a while back for doing this. It's not battle-tested and needs LOTS more rules, and Spencer probably has a better way of doing things, given his knowledge of the Compromise API.

But I'll open something up shortly, and if either of you wanna pluck away at it, feel free.


spencermountain commented on September 22, 2024

this is a cool project, and let me know if you'd like to turn it into a plugin, once it gets working.
There are a bunch of edge-cases, i'm sure, that would need to get worked-out, but fun stuff.


MarketingPip commented on September 22, 2024

@spencermountain - found this via old issue Cap Rule Set.

Could be useful. (if you decide you wanna pick this up)


MarketingPip commented on September 22, 2024

@spencermountain - can you possibly give me an idea of proper usage for using groups like this?

.match("[government|president] of [#Country]")

I'm not sure how to use | with groups to simplify rules. We could then go through ORG words etc., do something like this, and then apply some context rules.

{pattern:"[government|president] of [#Country]", matches:2}


spencermountain commented on September 22, 2024

hey - sure no prob.
the OR logic uses () brackets, like foo (bar|baz)
capture groups use [] brackets, with an optional name, like doc.match('foo [<two>bar]', 'two')

these two features can be combined, so that you grab either 'bar' or 'baz', like so:

doc.match('foo [(bar|baz)]', 0)
//or
doc.match('foo [<two>(bar|baz)]', 'two')
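For readers who think in regular expressions, the two bracket types map roughly onto regex alternation and capture groups. The sketch below is plain JavaScript, not compromise code. The helper name `capture` is hypothetical, and the analogy is loose, since compromise matches tagged terms rather than characters.

```javascript
// Rough regex analogue of the compromise patterns above (an illustration, not compromise's API).
// compromise: 'foo [(bar|baz)]'       ~ regex: /\bfoo (bar|baz)\b/
// compromise: 'foo [<two>(bar|baz)]'  ~ regex: /\bfoo (?<two>bar|baz)\b/

// Hypothetical helper: return the captured group, or null when nothing matches.
function capture(text, regex, group) {
  const m = regex.exec(text)
  if (!m) return null
  // regex groups are 1-based, while the compromise examples above count from 0
  return typeof group === 'number' ? m[group + 1] : m.groups[group]
}

const byIndex = capture('foo baz', /\bfoo (bar|baz)\b/, 0)
const byName = capture('foo bar', /\bfoo (?<two>bar|baz)\b/, 'two')
const noMatch = capture('foo qux', /\bfoo (bar|baz)\b/, 0)

console.log(byIndex, byName, noMatch)
```

In both notations the alternation decides which words are allowed, and the capture decides which part of the match is handed back.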

happy to help, if I can clear things up further
cheers


spencermountain commented on September 22, 2024

ps - yeah, that tagging file is really neat, isn't it?

NNP NN PREV2WD ESTATE seems like it would map to #ProperNoun #Noun estate in compromise jargon. I'm sure there's a lot we could learn from that dataset


MarketingPip commented on September 22, 2024

@spencermountain - could you give me a better idea of how to do this?

import nlp from "https://esm.sh/compromise"

function CapitalizeWords(text){
  let doc = nlp(text)
  let finalText = null;

  function applyRule(rule){
    const groups = doc.match('[(government|president)] of [#ProperNoun]').groups()
    for(let item in groups){
      finalText = finalText.replace(groups[item].text(),nlp(groups[item].text()).toTitleCase().text())
    }
    return finalText
  }

  function goThroughRules(){
    const rules = ['[(government|president)] of [#ProperNoun]']
    finalText = doc.text()

    for(let item in rules){
      finalText = applyRule(rules[item])
      console.log(finalText)
    }
    return finalText
  }

  return goThroughRules()
}
console.log(CapitalizeWords("The government of canada is amazing")) //

so we can easily do this?

console.log(CapitalizeWords("The government of canada is amazing and so is the president of america")) //

When currently using:

    const rules = ['[(government|president)] of [#ProperNoun]']

It doesn't work. I assume we have to write them as single rules? If so, this could be a feature request for the matcher.

Then we could easily make something based off current org words etc.....

plus as said - this should help big time with NLP by Title Casing, then checking for common nouns etc... And having tags for title case words only... Example House of commons.


spencermountain commented on September 22, 2024

sure, i'd do something like this:

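// assumes doc = nlp(text), as in the function above
// each rule pairs a match pattern with the capture group to title-case
let rules = [
  { match: 'house of [.]', group: 0 }
]

rules.forEach(obj => {
  // passing the group as the second argument returns only the captured terms
  let m = doc.match(obj.match, obj.group)
  if (m.found) {
    m.toTitleCase()
  }
})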

cheers


MarketingPip commented on September 22, 2024

@spencermountain my bad - this is what I was looking for. Keeping this here for reference for me, you and @ItIsSeven

import nlp from "https://esm.sh/compromise"

function CapitalizeWords(text){
  const doc = nlp(text)
  let finalText = doc.text()

  // Title-case every word captured by one rule's groups
  function applyRule(rule){
    const groups = doc.match(rule).groups()
    for(const name in groups){
      // json() gives one entry per match, so each captured word is handled separately
      for(const entry of groups[name].json()){
        // a string pattern replaces only the first remaining occurrence of the word
        finalText = finalText.replace(entry.text, nlp(entry.text).toTitleCase().text())
      }
    }
  }

  const rules = ['[(government|president)] of [#ProperNoun]']
  for(const rule of rules){
    applyRule(rule)
  }
  return finalText
}

console.log(CapitalizeWords("The government of canada is amazing and the president of canada but the president is not"))
// Outputs: "The Government of Canada is amazing and the President of Canada but the president is not"
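
One caveat with the function above: `String.prototype.replace` with a string pattern changes only the first occurrence of each word, so the result depends on word order. Below is a dependency-free sketch of the same rule-table idea. It assumes regular expressions in place of compromise patterns and a hypothetical hard-coded `COUNTRIES` list in place of the `#ProperNoun` tag, and every name in it is illustrative.

```javascript
// Hypothetical stand-in for compromise's #ProperNoun tag: a tiny hard-coded list.
const COUNTRIES = ['canada', 'america', 'france']

// Each rule is a global, case-insensitive regex with two capture groups: role and country.
const rules = [
  new RegExp(`\\b(government|president) of (${COUNTRIES.join('|')})\\b`, 'gi'),
]

// Upper-case the first letter of a single word.
const titleCase = (word) => word.charAt(0).toUpperCase() + word.slice(1)

function capitalizeWords(text) {
  return rules.reduce(
    (out, rule) =>
      // the replacer receives the full match followed by each captured group,
      // and runs once per match, so every occurrence is handled in place
      out.replace(rule, (_match, role, country) => `${titleCase(role)} of ${titleCase(country)}`),
    text
  )
}

console.log(capitalizeWords('The government of canada is amazing and the president of canada but the president is not'))
```

Because the replacer only sees text that matched a rule, a bare "president" with no country after it is left alone.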

Spencer - if you wanna go at this, I am down. Now that I've finally figured it out, I have been training an AI to generate a rule set for this, to try and actually contribute something useful to this project instead of my sh*tty issues you probably wanna punch me in the face for lol!

