r-lib / styler Goto Github PK
View Code? Open in Web Editor NEWNon-invasive pretty printing of R code
Home Page: https://styler.r-lib.org
License: Other
Non-invasive pretty printing of R code
Home Page: https://styler.r-lib.org
License: Other
currently doesn't work. parse()
returns a list of expressions, maybe we're looking only at the first element? Don't remember.
Currently, styler adds indention for every opening bracket, which is the first problem.
library(magrittr)
test <- c("(((", "1", ")))")
styler::style_text(test) %>% cat(sep = "\n")
#> (((
#> 1
#> )))
Instead of just
#> (((
#> 1
#> )))
The tree looks as follows:
styler:::compute_parse_data_nested(test) %>% styler:::visit(c(styler:::create_filler)) %>%
styler:::create_node_from_nested_root()
#> levelName
#> 1 xxx
#> 2 °--expr: [0/0]
#> 3 ¦--'(': ( [0/0]
#> 4 ¦--expr: [0/0]
#> 5 ¦ ¦--'(': ( [0/0]
#> 6 ¦ ¦--expr: [0/0]
#> 7 ¦ ¦ ¦--'(': ( [1/0]
#> 8 ¦ ¦ ¦--expr: [1/0]
#> 9 ¦ ¦ ¦ °--NUM_CONST: 1 [0/0]
#> 10 ¦ ¦ °--')': ) [0/0]
#> 11 ¦ °--')': ) [0/0]
#> 12 °--')': ) [0/0]
Second, it cannot "undo" indention when we put multiple closing brackets on one line, i.e. the last three brackets should not be indented, but this depends on what comes after the first bracket, and styler does not "look into the future" when serialising. That's why it gets it right when the brackets are on different lines.
(((
1
)
)
)
Rstudio only adds one level of indention per line, no matter how many opening brackets are there.
((((((((
1
))))))))
If the closing parenthesis are not on the same line, it un-indents all of them anyways:
(((((
1
)
)
)
)
)
I think the way to go is to remove indention whenever there is a child that also has indention and starts on the same line. I had it the other way round just now (see here) which means only the first parenthesis has indention, all the subsequent "inner" parenthesis' indention was removed, but then the problem remains that the closing brackets are still indented by one level.
The solution I suggest would involve setting the indention after the visitor, having to go through all levels of nesting again. Alternatively, we could try to integrate it with the visitor we already have but I think that's a bad idea.
like style_text()
et al.
As found in #79 (comment), it may be possible to call create_filler()
just once before nesting, which may improve performance.
The Introduction to R states in section 2.4 that you should not use the abbreviations T
and F
for TRUE
and FALSE
since T
and F
are just variables that are set to TRUE
and FALSE
by default but can be overwritten by the user.
The tidyverse style guide, does not explicitly discourage the use of T
and F
, but always uses TRUE
and FALSE
as the logical values.
As of 00f0d96, styler does not convert the values T
and F
to TRUE
and FALSE
. Since T
and F
might have been assigned other values than the default, such a conversion would be highly problematic.
On the other hand, currently, T
and F
are just left as is, without message to the user, which is probably also not the thing we would expect from styler.
To conclude, checking for the usage of T
and F
and return a warning might be a reasonable way to go, so the user can resolve the problem manually.
Consider using
in.R
and out.R
This issue originates in #21
At it’s core, a style guide is a collection of transformer functions that fulfill certain formal requirements.
It would be nice if styler could support style guides defined by third parties.
In the following, I lay out a potential mechanism for how such a third-party support could look like. This were my first thoughts on the topic. Maybe there are better approaches to it.
Styler could provide a function to import style guides and save them in the local package directory of styler. People who want to create style guides could provide the transformer functions for example in a GitHub repository. The style guide that should be used to format a package’s source code (and potentially further options) could then be specified in a file in the root directory of that package (e.g. in styler.yml). Styler would then use the the information from this .yml file, source the corresponding .R files from the styler package directory and apply them (and potentially further options) to the source code to be formatted. The formal requirements of style guides had to be defined (i.e. before it can be imported). Potential requirements: Style guide must not yield an error when applied to a test source code. Transformer functions must have follow certain naming conventions, must be documented ect. Provide detailed instructions how to create / test style guides etc.
Since we are now about to make changes to the top-level functions and we have already a release out there, I suggest to submit all PRs against devel, so we can have a working branch. @krlmlr what do you think?
Something similar to formatR where the desired line length can be user-specified.
styler does not correctly style code that contains comments:
styler::style_text(c("#' comment", " 1+ 1"))
#> [1] "1 + 1"
It may have to to with tic, but I think our releases are not "real" releases, but rather just annotated tags with NEWS as the description, at least they don't look the same as releases from other repos (like dplyr). You can also not access any release information via gh::gh("/repos/krlmlr/releases")
via the gh package for R.
Use data.tree package.
Each node should contain token
, short
, spaces
and newlines
.
#' @import data.tree
create_node_from_nested_root <- function(pd_nested) {
n <- Node$new("xxx")
create_node_from_nested(pd_nested, n)
n
}
#' @importFrom purrr map2
create_node_from_nested <- function(pd_nested, parent) {
if (is.null(pd_nested))
return()
node_info <-
pd_nested %>%
transmute(formatted = paste0(token, ": ", short, " [", newlines, "/", spaces, "]")) %>%
.[["formatted"]]
child_nodes <-
node_info %>%
map(parent$AddChild)
map2(pd_nested$child, child_nodes, create_node_from_nested)
return()
}
Line breaks should be forced by styler
{
}
(except if followed by else
))
when the expression spans over multiple lines (see here)At the moment, for the parse table returned by utils::getParseData and
processed through various functions, we use many different names in different
functions for essentially this (flat) table:
I think it would simplify things to just use one variable name throughout. I
suggest parse_data. We could use parse_data_flt
for the flat version and and
parse_data_nstd
for the nested version for situations where the processing
differs.
As far as I understand, the terminology around ws
is more of a legacy thing, i.e. in the functions compute_parse_data_flat_with_ws
or add_ws_to_parse_data
since it is not really just white space information that is affected by these functions, but rather a bunch of preprocessing Hence, I would suggest to rename the latter to pre_process_parse_data
and the former to compute_parse_data_flt
for now.
to a file. This could make it easier to see what's going on internally, for cases where the formatted output is different from what we expect.
This might sound evil but I think it could be funny to have a function that - instead of making the code clean - corrupts it, i.e. to randomly permute the positions of the tokens before serialising the parse data.
Currently, style_src()
only supports formatting all .R files in a directory. However, I think it is reasonable to support styling single files, i.e. with a function that takes a character vectors of file paths as input.
What do you @krlmlr think? If we want to implement that, should it be a separate function? I think then it would be appropriate to change the name of style_src()
to style_dir()
and create a function style_file()
or something like that.
styer indents two spaces for each level:
call(
call(
call(3
)
)
)
However, we generally don't want two spaces. If the first argument of the indenting call follows on the same line, we want to indent depending on the length of the indenting token (at least RStudio does that):
very_long_call_here(1
second_argument,
third_argument
)
So far, all rules in R/rules.R operate on tokens, i.e. on this vector that contains tokens.
op_token <- c(
"'+'", "'-'", "'*'", "'/'", "'^'", "AND", "AND2", "EQ", "EQ_ASSIGN",
"GE", "GT", "LE", "LEFT_ASSIGN", "LT", "NE", "OR", "OR2", "RIGHT_ASSIGN",
"SPECIAL", "EQ_SUB", "ELSE")
For my PR #35, I was using the argument op_token
in the function indent_op()
in order to be consistent with previous work. It allows specifying all tokens that trigger indention. However, I think we should rather use text than token for two reasons:
>=
) than spending time on figuring out what "LE", "GT" etc. stand for.op_token
is declared as "SPECIAL". Hence, we can only define indention for all text elements corresponding to "SPECIAL".How should we proceed?
The following could make styler run faster:
select()
and arrange()
statements.needs_indention()
microbenchmark::microbenchmark(dplyr = dplyr::mutate(mtcars, driver = 3),
baseR = mtcars$driver <- 3)
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> dplyr 1969.341 2009.056 3740.7995 2037.379 2091.4205 152117.420 100 b
#> baseR 11.663 13.735 24.1628 20.456 22.9275 433.822 100 a
microbenchmark::microbenchmark(dplyr = dplyr::slice(mtcars, 1:3),
baseR = mtcars[1:3, ])
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> dplyr 2094.290 2148.480 2446.0150 2191.8545 2271.761 8700.220 100 b
#> baseR 138.072 155.203 176.9532 165.3705 189.216 431.295 100 a
# pipe does probably not slow down things much.
library(magrittr)
microbenchmark::microbenchmark(dplyr = mtcars %>% summary(),
baseR = summary(mtcars))
#> Unit: milliseconds
#> expr min lq mean median uq max neval cld
#> dplyr 4.550347 4.619500 5.089398 4.695070 4.833018 9.004815 100 a
#> baseR 4.390407 4.497096 4.924098 4.548395 4.833101 8.284952 100 a
These are just some thoughts, profiling of our code is probably a better place to start anyways...
Styler does not generally remove spaces before commas because there is no rule that does this:
library("styler")
library("magrittr")
style_text("c( 1, 16 , 333 , 33 , 1)") %>%
cat(sep = "\n")
#> c(1, 16 , 333 , 33 , 1)
in
, like in for the fixme in tests/testthat/transformer_grouping/scope_none-out.R.Comments will currently not be moved to the end of a line if we change line breaks.
a <- function(x) #
{
x + 1
}
becomes
a <- function(x) # {
x + 1
}
Which is really not what we want 😩. I think we talked about this case @krlmlr, but I just forgot about it.
# Good
pd <- pd %>%
indent_round(pd, indent_by = 2) %>%
mutate(child = map(child, indent_round_nested)
# Bad
pd <- indent_round(pd, indent_by = 2)
pd$child <- map(pd$child, indent_round_nested)
Maybe this can even be added to the tidyverse style guide
Only three steps, but used frequently.
Reference: #79 (comment)
It would be nice if styler could not only format plain .R files, but also .Rmd files.
This could probably by implemented by changing transform_files
, i.e. adding a function transform_lines_rmd_chunk
. Main steps are:
read read_lines_enc
^´´´\\{r.*\\} *$
and ^``` *$
write_lines_enc
@krlmlr maybe this functionality fits even in the utf8
package?
. , ? !
ect.)The second point might be out of the scope of the package though...
I am not sure how we can distinguish signs of numbers (e.g. -
in -2
) from addition and subtraction.
Take this example:
42 + -2 / 78
.
The tree structure for it looks as follows:
library("magrittr")
styler:::compute_parse_data_nested("42 + -2 / 78") %>% styler:::create_filler_nested() %>%
styler:::create_node_from_nested_root()
#> levelName
#> 1 xxx
#> 2 °--expr: 42 + [0/0]
#> 3 ¦--expr: 42 [0/1]
#> 4 ¦ °--NUM_CONST: 42 [0/0]
#> 5 ¦--'+': + [0/1]
#> 6 °--expr: -2 / [0/0]
#> 7 ¦--expr: -2 [0/1]
#> 8 ¦ ¦--'-': - [0/0]
#> 9 ¦ °--expr: 2 [0/0]
#> 10 ¦ °--NUM_CONST: 2 [0/0]
#> 11 ¦--'/': / [0/1]
#> 12 °--expr: 78 [0/0]
#> 13 °--NUM_CONST: 78 [0/0]
It looks as if +/- are located at the top of a parse table, they are signs, otherwise, they are operators for subtraction / addition.
Maybe we can look at a more complicated example.
library("magrittr")
styler:::compute_parse_data_nested("42 + (-2 - -3 *( 10 +1)/ 2) / -78") %>%
styler:::create_filler_nested() %>% styler:::create_node_from_nested_root()
#> levelName
#> 1 xxx
#> 2 °--expr: 42 + [0/0]
#> 3 ¦--expr: 42 [0/1]
#> 4 ¦ °--NUM_CONST: 42 [0/0]
#> 5 ¦--'+': + [0/1]
#> 6 °--expr: (-2 - [0/0]
#> 7 ¦--expr: (-2 - [0/1]
#> 8 ¦ ¦--'(': ( [0/0]
#> 9 ¦ ¦--expr: -2 - [0/0]
#> 10 ¦ ¦ ¦--expr: -2 [0/1]
#> 11 ¦ ¦ ¦ ¦--'-': - [0/0]
#> 12 ¦ ¦ ¦ °--expr: 2 [0/0]
#> 13 ¦ ¦ ¦ °--NUM_CONST: 2 [0/0]
#> 14 ¦ ¦ ¦--'-': - [0/1]
#> 15 ¦ ¦ °--expr: -3 *( [0/0]
#> 16 ¦ ¦ ¦--expr: -3 *( [0/0]
#> 17 ¦ ¦ ¦ ¦--expr: -3 [0/1]
#> 18 ¦ ¦ ¦ ¦ ¦--'-': - [0/0]
#> 19 ¦ ¦ ¦ ¦ °--expr: 3 [0/0]
#> 20 ¦ ¦ ¦ ¦ °--NUM_CONST: 3 [0/0]
#> 21 ¦ ¦ ¦ ¦--'*': * [0/0]
#> 22 ¦ ¦ ¦ °--expr: ( 10 [0/0]
#> 23 ¦ ¦ ¦ ¦--'(': ( [0/1]
#> 24 ¦ ¦ ¦ ¦--expr: 10 +1 [0/0]
#> 25 ¦ ¦ ¦ ¦ ¦--expr: 10 [0/1]
#> 26 ¦ ¦ ¦ ¦ ¦ °--NUM_CONST: 10 [0/0]
#> 27 ¦ ¦ ¦ ¦ ¦--'+': + [0/0]
#> 28 ¦ ¦ ¦ ¦ °--expr: 1 [0/0]
#> 29 ¦ ¦ ¦ ¦ °--NUM_CONST: 1 [0/0]
#> 30 ¦ ¦ ¦ °--')': ) [0/0]
#> 31 ¦ ¦ ¦--'/': / [0/1]
#> 32 ¦ ¦ °--expr: 2 [0/0]
#> 33 ¦ ¦ °--NUM_CONST: 2 [0/0]
#> 34 ¦ °--')': ) [0/0]
#> 35 ¦--'/': / [0/1]
#> 36 °--expr: -78 [0/0]
#> 37 ¦--'-': - [0/0]
#> 38 °--expr: 78 [0/0]
#> 39 °--NUM_CONST: 78 [0/0]
Indeed it seems we could distinguish signs from operators that way
in.R
"indent"
column to the nested parse data"indent"
column for serializationIs it enough to check if there's a newline after opening
in needs_indention()
?
needs_indention <- function(pd, token = "'('") {
opening <- which(pd$token %in% token)
length(opening) > 0 && pd$newlines[opening + 1L]
}
Goal should be to avoid using line1
and line2
in this computation, see #75.
Think about removing child_indents()
and perhaps unindent_child()
.
apply_transformers()
build an RStudio Add-in using rstudioapi
to style the active .R file, maybe combine it with the current Add-in.
As introduced in the release notes of dplyr 0.7.0
This looks like a great project, and one that I'm watching with a view to ensuring it works well with Emacs and ESS.
Just wondering, are you aiming to be consistent with linters, e.g. https://github.com/jimhester/lintr
cc: @jimhester
Since I added pkgload, all builds keep failing. Also see #43. You can see the history here. Travis builds don't fail. I was wondering why. The error is also not always the same. It is either
after trying to install r-pkgs/callr
(which should actually be r-lib/callr
) or r-pkgs/pkgbuild
. Maybe it is due to the wrongly specified dependencies in pkgload with r-ib / r-pkgs, so I created two PRs for
pkgbuild and pkgload to fix the DESCRIPTION file. It might be unrelated though since Travis does not fail.
A run without any transformers enabled should yield exactly the same results as the input, minus spaces at EOL and conversion of tabs to spaces. This should be tested separately on all input files, for easier isolation of parse/serialize problems from transformer problems during later refactorings.
Reference: #79 (comment)
Allow the user to set spacing separately for
etc.
using the following pattern:
create_filler_nested <- function(x) {
if (is.null(x)) return()
x <- x %>% create_filler
x$child <- Map(create_filler_nested, x$child)
x %>% select(newlines, spaces, everything())
}
with a "post" visitor: Indicates if the current row (terminal or non-terminal) contains a newline. Compute this flag after adding/removing newlines, use it instead of line1
and line2
.
Note that this is only needed if we still need to use line1
and line2
.
For breaking code into sections (either manually or with a tool such as strcode), it is desirable that these are not indented. This is not currently supported since all tokens (including comments) will be indented as the current level requires. One way to deal with that would be to set indention of these token to -Inf
and using max(0, indention)
in serialize_parse_data_nested()
although this to be not particularly elegant.
https://github.com/krlmlr/styler/pull/80/files#diff-abef16e00aa68f1dab0e93325dd84b74R3
styler:::create_tree("a %>% b")
#> 1 ROOT (token: short_text [lag_newlines/spaces] {id})
#> 2 °--expr: [0/0] {7}
#> 3 ¦--expr: [0/1] {3}
#> 4 ¦ °--SYMBOL: a [0/0] {1}
#> 5 ¦--SPECIAL-PIPE: %>% [0/1] {2}
#> 6 °--expr: [0/0] {6}
#> 7 °--SYMBOL: b [0/0] {4}
styler:::create_tree("a %>% b()")
#> levelName
#> 1 ROOT (token: short_text [lag_newlines/spaces] {id})
#> 2 °--expr: [0/0] {11}
#> 3 ¦--expr: [0/1] {3}
#> 4 ¦ °--SYMBOL: a [0/0] {1}
#> 5 ¦--SPECIAL-PIPE: %>% [0/1] {2}
#> 6 °--expr: [0/0] {9}
#> 7 ¦--expr: [0/0] {6}
#> 8 ¦ °--SYMBOL_FUNCTION_CALL: b [0/0] {4}
#> 9 ¦--'(': ( [0/0] {5}
#> 10 °--')': ) [0/0] {7}
The first needs to be transformed to the second example, can be detected by working on the level of the nested parse data that contains the SPECIAL-PIPE
symbol and looking into the children (one level deep).
There is still one case in which comments are not styled correctly, namely if they appear not on the first position of the line.
library("magrittr")
library("styler")
style_text(c("a <- {",
"q #ijaf",
"} #jkdaf",
" #2")) %>%
cat(sep = "\n")
#> a <- {
#> q #ijaf
#> } #jkdaf
#> #2
Also, there should be a space after #
, #'
, or ##
and so on.
select_(~xyz)
with the upcoming release of dplyr 0.6.0 and associated. Is it via rlang .data$xzy
, i.e. select(.data$xyz)
? I think we should aim for consitency and change it in a single commit or PR.@krlmlr maybe you can advise on that.
Needs #76.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.