
footrulr's People

Contributors

goldbergdata, kanishkamisra, katherinesimeon, maurolepore, willdebras

footrulr's Issues

Include sample corpora for intuitive examples

Adding data for the source, target, and generated candidate sets, as well as human-annotated reference texts, would be wonderful for producing vignettes and demonstrating how these metrics can be computed over entire corpora.

Please suggest the corpora that can be included in this issue :)

Website

TODO:

  • Set up your deployment keys (see ?pkgdown::deploy_site_github)

... you will need to set up your deployment keys. The easiest way is to call travis::use_travis_deploy(). This will generate and push the necessary keys to your GitHub and Travis accounts. See the travis package website for more details.

Example:

image

Badges

Some badges to consider:

use_cran_badge()
use_lifecycle_badge("experimental")

These functions also add badges:

use_coverage()
use_travis()

Checklist for release

(Adapted from forestgeo/learn#182)

Diagnosis from running devtools::release()

  • update.packages()
  • spell_check_package()
-- R CMD check results -------------------------------- footrulr 0.0.0.9000 ----
Duration: 24.5s

> checking examples ... ERROR
  Running examples in 'footrulr-Ex.R' failed
  The error most likely occurred in:
  
  > base::assign(".ptime", proc.time(), pos = "CheckExEnv")
  > ### Name: bleu
  > ### Title: BLEU (Bilingual Evaluation Understudy)
  > ### Aliases: bleu
  > 
  > ### ** Examples
  > 
  > # Our candidate and reference data
  > sample_data <- list(list(candidate = "The cat the cat on the mat", references = c("The cat is on the mat", "There is a cat on the mat")),
  + list(candidate = "The cat the cat on the map", references = c("The cat is on the ccccccat", "There is a cat on the mat")))
  > 
  > 
  > 
  > # Run function
  > bleu(sample_data)
  Error: `map_df()` requires dplyr
  Execution halted

> checking tests ...
  See below...

> checking dependencies in R code ... NOTE
  Unexported object imported by a ':::' call: 'purrr:::probe'
    See the note in ?`:::` about the use of this operator.

> checking R code for possible problems ... NOTE
  footrulr: no visible global function definition for 'map2'
  Undefined global functions or variables:
    map2

> checking Rd line widths ... NOTE
  Rd file 'bleu.Rd':
    \examples lines wider than 100 characters:
       sample_data <- list(list(candidate = "The cat the cat on the mat", references = c("The cat is on the mat", "There is a cat on the mat") ... [TRUNCATED]
       list(candidate = "The cat the cat on the map", references = c("The cat is on the ccccccat", "There is a cat on the mat")))
  
  These lines will be truncated in the PDF manual.

-- Test failures ------------------------------------------------- testthat ----

> library(testthat)
> library(footrulr)
> 
> test_check("footrulr")
-- 1. Error: bleu works (@test-bleu.R#9)  --------
`map_df()` requires dplyr
1: bleu(sample_data, 1) at testthat/test-bleu.R:9
2: map_df(.data, function(item) {
       cand <- item$candidate
       ref <- item$references
       scores <- map_through_ngram(item)
       tibble(candidate = cand, references = list(ref), 
           scores)
   })
3: abort("`map_df()` requires dplyr")

== testthat results  =============================
OK: 2 SKIPPED: 0 FAILED: 1
1. Error: bleu works (@test-bleu.R#9) 

Error: testthat unit tests failed
Execution halted

2 errors x | 0 warnings v | 3 notes x
  • devtools::check()

  • Remember to remove dev version.

WARNING: version (0.0.0.9000) should have exactly three components
  • After fixing local R CMD check, remember to run rhub::check_for_cran()

  • check_win_devel()

  • After first release remember to use_news_md()

  • Remember to update DESCRIPTION. Particularly, ensure there is no typo on Title: and Description, and check author details.

  • If submitting to CRAN, remember to use_cran_comments()
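
The `map_df()` error and the `map2` NOTE in the check output above both point at missing dependency declarations. A minimal sketch of one way to address them, assuming the package uses roxygen2 and that bleu() lives in R/bleu.R (the file name is a guess, not taken from the repository):

```r
# R/bleu.R (file name assumed, not taken from the repository)
#
# DESCRIPTION would also need to list both packages, for example:
#   Imports:
#       dplyr,
#       purrr
# purrr::map_df() row-binds its results with dplyr, so dplyr must be
# installed wherever the examples and tests run.

# Import the purrr functions that the package code calls unqualified.
#' @importFrom purrr map2 map_df
NULL
```

After editing, devtools::document() regenerates NAMESPACE, and devtools::check() can be re-run to see whether the error and the NOTE are gone.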

Full implementation of the BLEU Metric

Currently I have a naive implementation of BLEU that does not penalize short translations. The standard remedy for this is a brevity penalty.

Shamelessly stealing an example from Rachael Tatman's blog post:

Consider the sentence "J’ai mangé trois filberts" with these reference translations:

  1. I have eaten three hazelnuts.
  2. I ate three filberts.

And some of the candidate translations:

  1. I ate three hazelnuts
  2. I ate

Both of these get a BLEU-2 score of 1, since every bigram in each candidate appears in one of the reference translations (for example, "I ate" is in reference translation #2).
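
The bigram claim can be checked with a small base R sketch of clipped (modified) n-gram precision. It assumes the first candidate reads "I ate three hazelnuts", and it uses a deliberately simple tokenizer (lowercase, punctuation stripped). The helper names tokenize, ngrams, count_ngrams and clipped_precision are made up for this illustration and are not part of footrulr.

```r
# Split a sentence into lowercase word tokens, dropping punctuation
# (a simplifying assumption; real tokenizers differ).
tokenize <- function(x) {
  x <- tolower(gsub("[[:punct:]]", "", x))
  strsplit(trimws(x), "\\s+")[[1]]
}

# All n-grams of a token vector, each pasted into a single string.
ngrams <- function(tokens, n) {
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts, function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Named integer vector of n-gram counts for one sentence.
count_ngrams <- function(sentence, n) {
  tab <- table(ngrams(tokenize(sentence), n))
  stats::setNames(as.integer(tab), names(tab))
}

# Clipped n-gram precision: each candidate n-gram is credited at most as
# many times as it occurs in any single reference.
clipped_precision <- function(candidate, references, n) {
  cand <- count_ngrams(candidate, n)
  if (length(cand) == 0) return(0)
  refs <- lapply(references, count_ngrams, n = n)
  clipped <- vapply(names(cand), function(g) {
    max_ref <- max(vapply(refs, function(rc) {
      if (g %in% names(rc)) rc[[g]] else 0L
    }, integer(1)))
    min(cand[[g]], max_ref)
  }, integer(1))
  sum(clipped) / sum(cand)
}

references <- c("I have eaten three hazelnuts.", "I ate three filberts.")
clipped_precision("I ate three hazelnuts", references, n = 2)  # 1
clipped_precision("I ate", references, n = 2)                  # 1
```

The two-word candidate scores as well as the full sentence because precision only asks whether the candidate's n-grams occur in a reference, not how much of the reference is covered.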

A brevity penalty penalizes such short translations. Without it, a mediocre translation system that produces short sentences can still earn high BLEU scores and mislead the analysis.

A reference for the computation can be found here: https://github.com/vikasnar/Bleu/blob/master/calculatebleu.py
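
For orientation, here is a minimal base R sketch of the brevity penalty as usually defined for BLEU: BP = 1 when the candidate is longer than the reference length r, and exp(1 - r / c) otherwise, where c is the candidate length. It assumes r is the reference length closest to c (ties going to the shorter reference) and a simple whitespace word count. The names count_words and brevity_penalty are hypothetical and are not taken from footrulr or from the linked script.

```r
# Number of whitespace-separated words after stripping punctuation
# (a simplifying assumption about tokenization).
count_words <- function(x) {
  length(strsplit(trimws(gsub("[[:punct:]]", "", x)), "\\s+")[[1]])
}

# Brevity penalty for one candidate against one or more references.
brevity_penalty <- function(candidate, references) {
  c_len <- count_words(candidate)
  if (c_len == 0) return(0)
  ref_lens <- vapply(references, count_words, integer(1))
  # Effective reference length: closest to the candidate length,
  # preferring the shorter reference on ties.
  r_len <- ref_lens[order(abs(ref_lens - c_len), ref_lens)][[1]]
  if (c_len > r_len) 1 else exp(1 - r_len / c_len)
}

references <- c("I have eaten three hazelnuts.", "I ate three filberts.")
brevity_penalty("I ate three hazelnuts", references)  # 1
brevity_penalty("I ate", references)                  # exp(-1), about 0.368
```

Multiplying the n-gram precision score by this factor pulls the two-word candidate "I ate" well below the full-length one, which is the behaviour the naive implementation lacks.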

Vignettes to demonstrate how the metrics work

Once we have sample texts and more metric implementations, it would be nice to have vignettes that show how the metrics work and how this package helps in computing them.
