r-lib / styler Goto Github PK

View Code? Open in Web Editor NEW

701.0 13.0 71.0 15.59 MB

Non-invasive pretty printing of R code

Home Page: https://styler.r-lib.org

License: Other

R 98.95% Rebol 0.03% C++ 0.80% Python 0.01% Rez 0.21%

r pretty-print

styler's People

Contributors

Stargazers

Watchers

Forkers

rlugojr lorenzwalthert jonmcalder jimhester jarodmeng yonicd nemochina2008 applied-statistic-using-r lizl90 devsjr smernagh aespar21 yutannihilation mkhoin yiluheihei nakarumanchi guhjy mhamine krivit beansrowning arekbee miraisolutions pridiltal davidklaing robinlovelace rjcc dshelldhillon mbattagl batpigandme michaelquinn32 lwjohnst86 jhaltiga pratishrai jb388 kassambara richardjacton nikolase90 renkun-ken trichelab cstepper brianlang kevinushey swsoyee jarvisc1 michaelchirico tubbz-alt singha53-zz oranwutang christopherkenny johannesne jonkeane isabella232 zee2theodd bms63 carolinacabl owain-s fkohrt mustberuss polkas dpprdan olivroy 0umfhxcvx5j7joaohfss5mncnistjj6q bahadzie averissimo etiennebacher doutianbao rpolars abson-dev

styler's Issues

Styling for multiple expressions

currently doesn't work. parse() returns a list of expressions, maybe we're looking only at the first element? Don't remember.

Think about grouping transformers

Only add/remove space
Only add/remove space or newlines
Also edit text

indention with multiple brackets

Currently, styler adds indention for every opening bracket, which is the first problem.

library(magrittr)

test <- c("(((", "1", ")))")

styler::style_text(test) %>% cat(sep = "\n")
#> (((
#>       1
#>     )))

Instead of just

#> (((
#>   1
#>     )))

The tree looks as follows:

styler:::compute_parse_data_nested(test) %>% styler:::visit(c(styler:::create_filler)) %>% 
  styler:::create_node_from_nested_root()
#>                                 levelName
#> 1  xxx                                   
#> 2   °--expr:  [0/0]                      
#> 3       ¦--'(': ( [0/0]                  
#> 4       ¦--expr:  [0/0]                  
#> 5       ¦   ¦--'(': ( [0/0]              
#> 6       ¦   ¦--expr:  [0/0]              
#> 7       ¦   ¦   ¦--'(': ( [1/0]          
#> 8       ¦   ¦   ¦--expr:  [1/0]          
#> 9       ¦   ¦   ¦   °--NUM_CONST: 1 [0/0]
#> 10      ¦   ¦   °--')': ) [0/0]          
#> 11      ¦   °--')': ) [0/0]              
#> 12      °--')': ) [0/0]

Second, it cannot "undo" indention when we put multiple closing brackets on one line, i.e. the last three brackets should not be indented, but this depends on what comes after the first bracket, and styler does not "look into the future" when serialising. That's why it gets it right when the brackets are on different lines.

(((
      1
    )
  )
)

Rstudio only adds one level of indention per line, no matter how many opening brackets are there.

((((((((
  1
))))))))

If the closing parenthesis are not on the same line, it un-indents all of them anyways:

(((((
  1
)
)
)
)
)

I think the way to go is to remove indention whenever there is a child that also has indention and starts on the same line. I had it the other way round just now (see here) which means only the first parenthesis has indention, all the subsequent "inner" parenthesis' indention was removed, but then the problem remains that the closing brackets are still indented by one level.
The solution I suggest would involve setting the indention after the visitor, having to go through all levels of nesting again. Alternatively, we could try to integrate it with the visitor we already have but I think that's a bad idea.

Function for applying all nested transformations

like style_text() et al.

add a progress bar when top-level functions are called

create_filler on flat structure

As found in #79 (comment), it may be possible to call create_filler() just once before nesting, which may improve performance.

T and TRUE / F and FALSE

The Introduction to R states in section 2.4 that you should not use the abbreviations T and F for TRUE and FALSE since T and F are just variables that are set to TRUE and FALSE by default but can be overwritten by the user.

The tidyverse style guide, does not explicitly discourage the use of T and F, but always uses TRUE and FALSE as the logical values.

As of 00f0d96, styler does not convert the values T and F to TRUE and FALSE. Since T and F might have been assigned other values than the default, such a conversion would be highly problematic.
On the other hand, currently, T and F are just left as is, without message to the user, which is probably also not the thing we would expect from styler.

To conclude, checking for the usage of T and F and return a warning might be a reasonable way to go, so the user can resolve the problem manually.

Single-line pipes?

tidyverse/style#28

refine testing

Consider using

files like in.R and out.R
a function that runs through all files (to avoid code duplication)

This issue originates in #21

Extensibility: Third-party style guide support

At it’s core, a style guide is a collection of transformer functions that fulfill certain formal requirements.
It would be nice if styler could support style guides defined by third parties.
In the following, I lay out a potential mechanism for how such a third-party support could look like. This were my first thoughts on the topic. Maybe there are better approaches to it.

Styler could provide a function to import style guides and save them in the local package directory of styler. People who want to create style guides could provide the transformer functions for example in a GitHub repository. The style guide that should be used to format a package’s source code (and potentially further options) could then be specified in a file in the root directory of that package (e.g. in styler.yml). Styler would then use the the information from this .yml file, source the corresponding .R files from the styler package directory and apply them (and potentially further options) to the source code to be formatted. The formal requirements of style guides had to be defined (i.e. before it can be imported). Potential requirements: Style guide must not yield an error when applied to a test source code. Transformer functions must have follow certain naming conventions, must be documented ect. Provide detailed instructions how to create / test style guides etc.

creating a development branch

Since we are now about to make changes to the top-level functions and we have already a release out there, I suggest to submit all PRs against devel, so we can have a working branch. @krlmlr what do you think?

support maximum number of characters per line

Something similar to formatR where the desired line length can be user-specified.

styling code with comments fails

styler does not correctly style code that contains comments:

styler::style_text(c("#' comment", " 1+ 1"))
#> [1] "1 + 1"

releases are not "real" releases

It may have to to with tic, but I think our releases are not "real" releases, but rather just annotated tags with NEWS as the description, at least they don't look the same as releases from other repos (like dplyr). You can also not access any release information via gh::gh("/repos/krlmlr/releases") via the gh package for R.

Formatter for nested parse data

Use data.tree package.

Each node should contain token, short, spaces and newlines.

#' @import data.tree
create_node_from_nested_root <- function(pd_nested) {
  n <- Node$new("xxx")
  create_node_from_nested(pd_nested, n)
  n
}

#' @importFrom purrr map2
create_node_from_nested <- function(pd_nested, parent) {
  if (is.null(pd_nested))
    return()

  node_info <-
    pd_nested %>%
    transmute(formatted = paste0(token, ": ", short, " [", newlines, "/", spaces, "]")) %>%
    .[["formatted"]]

  child_nodes <-
    node_info %>%
    map(parent$AddChild)

  map2(pd_nested$child, child_nodes, create_node_from_nested)
  return()
}

modify line break information

Line breaks should be forced by styler

After {
After } (except if followed by else)
after closing ) when the expression spans over multiple lines (see here)
etc.

Renaming

Naming of data

At the moment, for the parse table returned by utils::getParseData and
processed through various functions, we use many different names in different
functions for essentially this (flat) table:

pd (all rules in rules.R)
parse_data (serialize_parse_data)
parse_data_with_ws (serialize_parse_data_flat)
data (create_filler)

I think it would simplify things to just use one variable name throughout. I
suggest parse_data. We could use parse_data_flt for the flat version and and
parse_data_nstd for the nested version for situations where the processing
differs.

Naming of functions

As far as I understand, the terminology around ws is more of a legacy thing, i.e. in the functions compute_parse_data_flat_with_ws or add_ws_to_parse_data since it is not really just white space information that is affected by these functions, but rather a bunch of preprocessing Hence, I would suggest to rename the latter to pre_process_parse_data and the former to compute_parse_data_flt for now.

Tests should write structure of nested parse tree

to a file. This could make it easier to see what's going on internally, for cases where the formatted output is different from what we expect.

use styler to corrupt code

This might sound evil but I think it could be funny to have a function that - instead of making the code clean - corrupts it, i.e. to randomly permute the positions of the tokens before serialising the parse data.

support for single-file-styling

Currently, style_src() only supports formatting all .R files in a directory. However, I think it is reasonable to support styling single files, i.e. with a function that takes a character vectors of file paths as input.
What do you @krlmlr think? If we want to implement that, should it be a separate function? I think then it would be appropriate to change the name of style_src() to style_dir() and create a function style_file() or something like that.

Indention dependent on token-length

styer indents two spaces for each level:

call(
  call(
    call(3
    )
  )
)

However, we generally don't want two spaces. If the first argument of the indenting call follows on the same line, we want to indent depending on the length of the indenting token (at least RStudio does that):

very_long_call_here(1  
                    second_argument,
                    third_argument
)

work with text instead of token

So far, all rules in R/rules.R operate on tokens, i.e. on this vector that contains tokens.

op_token <- c(
  "'+'", "'-'", "'*'", "'/'", "'^'", "AND", "AND2", "EQ", "EQ_ASSIGN",
  "GE", "GT", "LE", "LEFT_ASSIGN", "LT", "NE", "OR", "OR2", "RIGHT_ASSIGN",
  "SPECIAL", "EQ_SUB", "ELSE")

For my PR #35, I was using the argument op_token in the function indent_op() in order to be consistent with previous work. It allows specifying all tokens that trigger indention. However, I think we should rather use text than token for two reasons:

If people want to customise styler, they rather want to specify a text of a token (like >=) than spending time on figuring out what "LE", "GT" etc. stand for.
More importantly, the token "SPECIAL" does not uniquely map to a text element, i.e. I suppose a case that is not covered by the tokens in op_token is declared as "SPECIAL". Hence, we can only define indention for all text elements corresponding to "SPECIAL".

How should we proceed?

make styler faster

The following could make styler run faster:

dropping unnecessary select() and arrange() statements.
early termination (e.g. rules?)
consider replacing some tidyverse calls with base R if substitution
does not make the code "uggly"? What do you think @krlmlr ?
which statements related to needs_indention()

microbenchmark::microbenchmark(dplyr = dplyr::mutate(mtcars, driver = 3), 
                               baseR = mtcars$driver <- 3)
#> Unit: microseconds
#>   expr      min       lq      mean   median        uq        max neval cld
#>  dplyr 1969.341 2009.056 3740.7995 2037.379 2091.4205 152117.420   100   b
#>  baseR   11.663   13.735   24.1628   20.456   22.9275    433.822   100  a

microbenchmark::microbenchmark(dplyr = dplyr::slice(mtcars, 1:3), 
                               baseR = mtcars[1:3, ])
#> Unit: microseconds
#>   expr      min       lq      mean    median       uq      max neval cld
#>  dplyr 2094.290 2148.480 2446.0150 2191.8545 2271.761 8700.220   100   b
#>  baseR  138.072  155.203  176.9532  165.3705  189.216  431.295   100  a

# pipe does probably not slow down things much.
library(magrittr)
microbenchmark::microbenchmark(dplyr = mtcars %>% summary(), 
                               baseR = summary(mtcars))
#> Unit: milliseconds
#>   expr      min       lq     mean   median       uq      max neval cld
#>  dplyr 4.550347 4.619500 5.089398 4.695070 4.833018 9.004815   100   a
#>  baseR 4.390407 4.497096 4.924098 4.548395 4.833101 8.284952   100   a

These are just some thoughts, profiling of our code is probably a better place to start anyways...

space before comma

Styler does not generally remove spaces before commas because there is no rule that does this:

library("styler")
library("magrittr")
style_text("c(    1,       16    , 333 , 33 ,  1)") %>% 
  cat(sep = "\n")
#> c(1, 16    , 333 , 33 , 1)

spacing around (more) operators

in, like in for the fixme in tests/testthat/transformer_grouping/scope_none-out.R.

comments and line-breaks

Comments will currently not be moved to the end of a line if we change line breaks.

a <- function(x) # 
{
  x + 1
}

becomes

a <- function(x) # {
  x + 1
}

Which is really not what we want 😩. I think we talked about this case @krlmlr, but I just forgot about it.

mutate instead of sub assignment

# Good
pd <- pd %>%
  indent_round(pd, indent_by = 2) %>%
  mutate(child = map(child, indent_round_nested)

# Bad
pd <- indent_round(pd, indent_by = 2)
pd$child <- map(pd$child, indent_round_nested)

Maybe this can even be added to the tidyverse style guide

Extract parse_transform_serialize()

Only three steps, but used frequently.

Reference: #79 (comment)

style R Markdown (code chunks)

It would be nice if styler could not only format plain .R files, but also .Rmd files.

Code Chunks

This could probably by implemented by changing transform_files, i.e. adding a function transform_lines_rmd_chunk. Main steps are:

first read read_lines_enc
look for regex ^´´´\\{r.*\\} *$ and ^``` *$
transforming lines between these expressions while leaving everything else unchanged
write back with write_lines_enc

@krlmlr maybe this functionality fits even in the utf8 package?

Plain Text

Line breaks if text is longer than 80 characters
punctuation (i.e. space after . , ? ! ect.)

The second point might be out of the scope of the package though...

distinguish operators

I am not sure how we can distinguish signs of numbers (e.g. - in -2) from addition and subtraction.
Take this example:
42 + -2 / 78.
The tree structure for it looks as follows:

library("magrittr")
styler:::compute_parse_data_nested("42 + -2 / 78") %>% styler:::create_filler_nested() %>% 
  styler:::create_node_from_nested_root()
#>                                 levelName
#> 1  xxx                                   
#> 2   °--expr: 42 +  [0/0]                 
#> 3       ¦--expr: 42 [0/1]                
#> 4       ¦   °--NUM_CONST: 42 [0/0]       
#> 5       ¦--'+': + [0/1]                  
#> 6       °--expr: -2 /  [0/0]             
#> 7           ¦--expr: -2 [0/1]            
#> 8           ¦   ¦--'-': - [0/0]          
#> 9           ¦   °--expr: 2 [0/0]         
#> 10          ¦       °--NUM_CONST: 2 [0/0]
#> 11          ¦--'/': / [0/1]              
#> 12          °--expr: 78 [0/0]            
#> 13              °--NUM_CONST: 78 [0/0]

It looks as if +/- are located at the top of a parse table, they are signs, otherwise, they are operators for subtraction / addition.
Maybe we can look at a more complicated example.

library("magrittr")
styler:::compute_parse_data_nested("42 + (-2 - -3 *( 10 +1)/ 2) / -78") %>% 
  styler:::create_filler_nested() %>% styler:::create_node_from_nested_root()
#>                                                      levelName
#> 1  xxx                                                        
#> 2   °--expr: 42 +  [0/0]                                      
#> 3       ¦--expr: 42 [0/1]                                     
#> 4       ¦   °--NUM_CONST: 42 [0/0]                            
#> 5       ¦--'+': + [0/1]                                       
#> 6       °--expr: (-2 - [0/0]                                  
#> 7           ¦--expr: (-2 - [0/1]                              
#> 8           ¦   ¦--'(': ( [0/0]                               
#> 9           ¦   ¦--expr: -2 -  [0/0]                          
#> 10          ¦   ¦   ¦--expr: -2 [0/1]                         
#> 11          ¦   ¦   ¦   ¦--'-': - [0/0]                       
#> 12          ¦   ¦   ¦   °--expr: 2 [0/0]                      
#> 13          ¦   ¦   ¦       °--NUM_CONST: 2 [0/0]             
#> 14          ¦   ¦   ¦--'-': - [0/1]                           
#> 15          ¦   ¦   °--expr: -3 *( [0/0]                      
#> 16          ¦   ¦       ¦--expr: -3 *( [0/0]                  
#> 17          ¦   ¦       ¦   ¦--expr: -3 [0/1]                 
#> 18          ¦   ¦       ¦   ¦   ¦--'-': - [0/0]               
#> 19          ¦   ¦       ¦   ¦   °--expr: 3 [0/0]              
#> 20          ¦   ¦       ¦   ¦       °--NUM_CONST: 3 [0/0]     
#> 21          ¦   ¦       ¦   ¦--'*': * [0/0]                   
#> 22          ¦   ¦       ¦   °--expr: ( 10  [0/0]              
#> 23          ¦   ¦       ¦       ¦--'(': ( [0/1]               
#> 24          ¦   ¦       ¦       ¦--expr: 10 +1 [0/0]          
#> 25          ¦   ¦       ¦       ¦   ¦--expr: 10 [0/1]         
#> 26          ¦   ¦       ¦       ¦   ¦   °--NUM_CONST: 10 [0/0]
#> 27          ¦   ¦       ¦       ¦   ¦--'+': + [0/0]           
#> 28          ¦   ¦       ¦       ¦   °--expr: 1 [0/0]          
#> 29          ¦   ¦       ¦       ¦       °--NUM_CONST: 1 [0/0] 
#> 30          ¦   ¦       ¦       °--')': ) [0/0]               
#> 31          ¦   ¦       ¦--'/': / [0/1]                       
#> 32          ¦   ¦       °--expr: 2 [0/0]                      
#> 33          ¦   ¦           °--NUM_CONST: 2 [0/0]             
#> 34          ¦   °--')': ) [0/0]                               
#> 35          ¦--'/': / [0/1]                                   
#> 36          °--expr: -78 [0/0]                                
#> 37              ¦--'-': - [0/0]                               
#> 38              °--expr: 78 [0/0]                             
#> 39                  °--NUM_CONST: 78 [0/0]

Indeed it seems we could distinguish signs from operators that way

Indention of function calls

See existing examples of multiline function calls from in.R
Add a new "indent" column to the nested parse data
Use the "indent" column for serialization

add start token in nested structure

Simplify needs_indention()

Is it enough to check if there's a newline after opening in needs_indention()?

needs_indention <- function(pd, token = "'('") {
  opening <- which(pd$token %in% token)
  length(opening) > 0 && pd$newlines[opening + 1L]
}

Goal should be to avoid using line1 and line2 in this computation, see #75.

Think about removing child_indents() and perhaps unindent_child().

simplify visitor calls

simplify visitor calls in apply_transformers()

support styling of active editor file

build an RStudio Add-in using rstudioapi to style the active .R file, maybe combine it with the current Add-in.

Use `dpylr::pull()` instead of `.[[1]]`

As introduced in the release notes of dplyr 0.7.0

lintr compatability

This looks like a great project, and one that I'm watching with a view to ensuring it works well with Emacs and ESS.

Just wondering, are you aiming to be consistent with linters, e.g. https://github.com/jimhester/lintr

cc: @jimhester

CI builds keep failing

Since I added pkgload, all builds keep failing. Also see #43. You can see the history here. Travis builds don't fail. I was wondering why. The error is also not always the same. It is either
after trying to install r-pkgs/callr (which should actually be r-lib/callr) or r-pkgs/pkgbuild. Maybe it is due to the wrongly specified dependencies in pkgload with r-ib / r-pkgs, so I created two PRs for
pkgbuild and pkgload to fix the DESCRIPTION file. It might be unrelated though since Travis does not fail.

Verify lossless parse + serialize (without transforms)

A run without any transformers enabled should yield exactly the same results as the input, minus spaces at EOL and conversion of tabs to spaces. This should be tested separately on all input files, for easier isolation of parse/serialize problems from transformer problems during later refactorings.

Reference: #79 (comment)

differentiate between math operators

Allow the user to set spacing separately for

+ and -
/ and *
^

etc.

Add ws info to nested tibble

using the following pattern:

create_filler_nested <- function(x) {
  if (is.null(x)) return()
  x <- x %>% create_filler
  x$child <- Map(create_filler_nested, x$child)
  x %>% select(newlines, spaces, everything())
}

Create new "multiline" attribute

with a "post" visitor: Indicates if the current row (terminal or non-terminal) contains a newline. Compute this flag after adding/removing newlines, use it instead of line1 and line2.

Note that this is only needed if we still need to use line1 and line2.

allow for zero-indention of comments

For breaking code into sections (either manually or with a tool such as strcode), it is desirable that these are not indented. This is not currently supported since all tokens (including comments) will be indented as the current level requires. One way to deal with that would be to set indention of these token to -Inf and using max(0, indention) in serialize_parse_data_nested() although this to be not particularly elegant.

Avoid checking for hard-coded "." in token text

https://github.com/krlmlr/styler/pull/80/files#diff-abef16e00aa68f1dab0e93325dd84b74R3

styler:::create_tree("a %>% b")
#> 1 ROOT (token: short_text [lag_newlines/spaces] {id})
#> 2  °--expr:  [0/0] {7}                               
#> 3      ¦--expr:  [0/1] {3}                           
#> 4      ¦   °--SYMBOL: a [0/0] {1}                    
#> 5      ¦--SPECIAL-PIPE: %>% [0/1] {2}                
#> 6      °--expr:  [0/0] {6}                           
#> 7          °--SYMBOL: b [0/0] {4}
styler:::create_tree("a %>% b()")
#>                                              levelName
#> 1  ROOT (token: short_text [lag_newlines/spaces] {id})
#> 2   °--expr:  [0/0] {11}                              
#> 3       ¦--expr:  [0/1] {3}                           
#> 4       ¦   °--SYMBOL: a [0/0] {1}                    
#> 5       ¦--SPECIAL-PIPE: %>% [0/1] {2}                
#> 6       °--expr:  [0/0] {9}                           
#> 7           ¦--expr:  [0/0] {6}                       
#> 8           ¦   °--SYMBOL_FUNCTION_CALL: b [0/0] {4}  
#> 9           ¦--'(': ( [0/0] {5}                       
#> 10          °--')': ) [0/0] {7}

The first needs to be transformed to the second example, can be detected by working on the level of the nested parse data that contains the SPECIAL-PIPE symbol and looking into the children (one level deep).

comments

There is still one case in which comments are not styled correctly, namely if they appear not on the first position of the line.

library("magrittr")
library("styler")
style_text(c("a <- {", 
  "q        #ijaf", 
  "}       #jkdaf", 
  "          #2")) %>% 
  cat(sep = "\n")
#> a <- {
#>   q        #ijaf
#> }       #jkdaf
#> #2

Also, there should be a space after #, #', or ## and so on.

add tests for top-level styler (style_src() etc.) with various options

tidyeval

I was wondering how we aim to replace standard evaluation functions like select_(~xyz) with the upcoming release of dplyr 0.6.0 and associated. Is it via rlang .data$xzy, i.e. select(.data$xyz)? I think we should aim for consitency and change it in a single commit or PR.
In addition, I think we need to consider that not all packages from the tidyverse support tidyeval yet, e.g. see this PR for tidyr. I could not find out when that will be the case. Hence, the question is when the ideal point of transition is.

@krlmlr maybe you can advise on that.

Prefer dplyr::between() over x %in% start:stop

faster
more robust

Order of transformers

Add/remove newlines
Indention
Add/remove spaces around tokens

Needs #76.