kanishkamisra / footrulr


Compare sentences using Machine Translation and Text Summarization evaluation metrics

License: Other

R 100.00%


footrulr's Issues

Include sample corpora for intuitive examples

Adding data for the source, target, and generated candidate sets, as well as human-annotated reference texts, would make it possible to write vignettes and to demonstrate how these metrics can be computed over entire corpora.

Please suggest the corpora that can be included in this issue :)

Website

TODO:

Set up your deployment keys (see ?pkgdown::deploy_site_github)

... you will need to set up your deployment keys. The easiest way is to call travis::use_travis_deploy(). This will generate and push the necessary keys to your GitHub and Travis accounts. See the travis package website for more details.

Add link to GitHub repo https://kanishkamisra.github.io/footrulr


Badges

Some badges to consider:

use_cran_badge()
use_lifecycle_badge("experimental")

These functions also add badges:

use_coverage()
use_travis()

Code coverage

TODO:

usethis::use_coverage()

Checklist for release

(Adapted from forestgeo/learn#182)

Diagnosis from running devtools::release()

update.packages()
spell_check_package()

-- R CMD check results -------------------------------- footrulr 0.0.0.9000 ----
Duration: 24.5s

> checking examples ... ERROR
  Running examples in 'footrulr-Ex.R' failed
  The error most likely occurred in:
  
  > base::assign(".ptime", proc.time(), pos = "CheckExEnv")
  > ### Name: bleu
  > ### Title: BLEU (Bilingual Evaluation Understudy)
  > ### Aliases: bleu
  > 
  > ### ** Examples
  > 
  > # Our candidate and reference data
  > sample_data <- list(list(candidate = "The cat the cat on the mat", references = c("The cat is on the mat", "There is a cat on the mat")),
  + list(candidate = "The cat the cat on the map", references = c("The cat is on the ccccccat", "There is a cat on the mat")))
  > 
  > 
  > 
  > # Run function
  > bleu(sample_data)
  Error: `map_df()` requires dplyr
  Execution halted

> checking tests ...
  See below...

> checking dependencies in R code ... NOTE
  Unexported object imported by a ':::' call: 'purrr:::probe'
    See the note in ?`:::` about the use of this operator.

> checking R code for possible problems ... NOTE
  footrulr: no visible global function definition for 'map2'
  Undefined global functions or variables:
    map2

> checking Rd line widths ... NOTE
  Rd file 'bleu.Rd':
    \examples lines wider than 100 characters:
       sample_data <- list(list(candidate = "The cat the cat on the mat", references = c("The cat is on the mat", "There is a cat on the mat") ... [TRUNCATED]
       list(candidate = "The cat the cat on the map", references = c("The cat is on the ccccccat", "There is a cat on the mat")))
  
  These lines will be truncated in the PDF manual.

-- Test failures ------------------------------------------------- testthat ----

> library(testthat)
> library(footrulr)
> 
> test_check("footrulr")
-- 1. Error: bleu works (@test-bleu.R#9)  --------
`map_df()` requires dplyr
1: bleu(sample_data, 1) at testthat/test-bleu.R:9
2: map_df(.data, function(item) {
       cand <- item$candidate
       ref <- item$references
       scores <- map_through_ngram(item)
       tibble(candidate = cand, references = list(ref), 
           scores)
   })
3: abort("`map_df()` requires dplyr")

== testthat results  =============================
OK: 2 SKIPPED: 0 FAILED: 1
1. Error: bleu works (@test-bleu.R#9) 

Error: testthat unit tests failed
Execution halted

2 errors x | 0 warnings v | 3 notes x
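
The error and two of the notes above point at undeclared dependencies and over-long example lines. The following is a minimal sketch of one way to address them, not the package's actual fix: it assumes bleu() is documented with roxygen2, that usethis is available, and that dplyr is simply missing from the declared dependencies. The purrr:::probe note is left out because it depends on how the package uses that internal function.

```r
# Sketch only; the roxygen tag below is assumed to sit above bleu() in the package source.

# 1. Declare runtime dependencies in DESCRIPTION (run once from the package root):
# usethis::use_package("purrr")
# usethis::use_package("dplyr")  # purrr::map_df() row-binds its results with dplyr

# 2. Import the purrr functions by name so that map2() is visible to R CMD check:
#' @importFrom purrr map_df map2

# 3. Wrap the example data so that every line stays under 100 characters:
sample_data <- list(
  list(
    candidate = "The cat the cat on the mat",
    references = c("The cat is on the mat", "There is a cat on the mat")
  ),
  list(
    candidate = "The cat the cat on the map",
    references = c("The cat is on the ccccccat", "There is a cat on the mat")
  )
)
```

After changing the roxygen tags, devtools::document() regenerates NAMESPACE and the Rd files before the next devtools::check().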

devtools::check()
Remember to remove dev version.

WARNING: version (0.0.0.9000) should have exactly three components

After fixing local R CMD check, remember to run rhub::check_for_cran()
check_win_devel()
After first release remember to use_news_md()
Remember to update DESCRIPTION. In particular, ensure there are no typos in Title: and Description:, and check the author details.
If submitting to CRAN, remember to use_cran_comments()

Full implementation of the BLEU Metric

The current BLEU implementation is naive: it does not penalize short translations. A brevity penalty would address this.

Shamelessly stealing an example from Rachael Tatman's blog post:

Consider the sentence "J’ai mangé trois filberts" with these reference translations:

I have eaten three hazelnuts.
I ate three filberts.

And some of the candidate translations:

I ate three hazelnuts
I ate

Both of these get a BLEU-2 score of 1, since every bigram in each candidate appears in one of the reference translations (the only bigram in "I ate" comes from reference translation #2).

A brevity penalty lowers the score of such short translations. Without it, a mediocre translation system that produces short outputs can still earn high BLEU scores and mislead the analysis.

A reference for the computation can be found here: https://github.com/vikasnar/Bleu/blob/master/calculatebleu.py
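
To make the effect concrete, here is a minimal base-R sketch of BLEU with the standard brevity penalty, which is 1 when the candidate length c exceeds the closest reference length r and exp(1 - r / c) otherwise. None of the function names below (tokenize, ngrams, modified_precision, brevity_penalty, bleu_with_bp) come from footrulr; they are hypothetical helpers, and the whitespace tokenizer is a deliberate simplification.

```r
# Hypothetical helpers for illustration; not part of footrulr's API.

# Lower-case, strip punctuation, split on whitespace (a simplified tokenizer).
tokenize <- function(x) {
  x <- tolower(gsub("[[:punct:]]", "", x))
  strsplit(trimws(x), "\\s+")[[1]]
}

# All n-grams of a token vector, each joined into one string.
ngrams <- function(tokens, n) {
  if (length(tokens) < n) return(character(0))
  vapply(seq_len(length(tokens) - n + 1),
         function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Clipped n-gram precision: each candidate n-gram counts at most as often
# as it occurs in the single reference where it is most frequent.
modified_precision <- function(candidate, references, n) {
  cand <- table(ngrams(tokenize(candidate), n))
  if (length(cand) == 0) return(0)
  max_ref <- vapply(names(cand), function(g) {
    max(vapply(references, function(r) sum(ngrams(tokenize(r), n) == g), numeric(1)))
  }, numeric(1))
  sum(pmin(as.integer(cand), max_ref)) / sum(cand)
}

# Brevity penalty against the closest reference length (ties go to the shorter one).
brevity_penalty <- function(candidate, references) {
  c_len <- length(tokenize(candidate))
  ref_lens <- vapply(references, function(r) length(tokenize(r)), integer(1))
  diffs <- abs(ref_lens - c_len)
  r_len <- min(ref_lens[diffs == min(diffs)])
  if (c_len > r_len) 1 else exp(1 - r_len / c_len)
}

# BLEU-n with uniform weights: geometric mean of the precisions times the penalty.
bleu_with_bp <- function(candidate, references, n = 2) {
  p <- vapply(seq_len(n), function(k) modified_precision(candidate, references, k), numeric(1))
  if (any(p == 0)) return(0)
  brevity_penalty(candidate, references) * exp(mean(log(p)))
}

refs <- c("I have eaten three hazelnuts.", "I ate three filberts.")
bleu_with_bp("I ate three hazelnuts", refs)  # 1
bleu_with_bp("I ate", refs)                  # exp(-1), about 0.368
```

With the penalty applied, the two-word candidate no longer ties with the full translation, even though both have perfect unigram and bigram precision.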

Vignettes to demonstrate how the metrics work

Once we have sample texts and more metric implementations, it would be nice to have vignettes that show how the metrics work and how this package helps compute them.
