Comments (58)
I really like your suggestion about %!in% instead of %out% - as you say, it's more generalizable. At least we can probably consider that as decided :)
The package name - I am not too sure about. Personally, for me it doesn't make much difference. When I am using a package I just remember its name. But if we want to have more users then something "catchy" might be favourable.
A list of my suggested names so far:
%in{}% %!in{}%
%in[]% %!in[]%
%in()% %!in()%
%in(]% %!in(]%
%in[)% %!in[)%
The names I am less sure about:
%#in{}% %#!in{}%
# and all the rest with %#in% for working on values that occur some number of times.
The names I am even less sure about:
%in{.}%
# for extracting the value itself, I think you used %vin{}% for that?
NOTE: I am not sure if overwriting the default %in% would be a good idea - users might find that after loading the package their older code breaks. So my proposal would be to use %in{}% for expanding on %in% ({} brackets denoting "set" as in math).
As always - just an opinion and comments welcome.
from inops.
I'm all for a better package name, I just have no better idea.
%in{}% is a bit confusing as a name: {} usually describes sets as you say, and in that case it kind of means "apply".
Also iris %in{}% "setosa" is basically map_dfr(iris, `%in%`, "setosa") so it seems to me it really makes sense only if it's used a lot. Could you describe some use cases and the added value it has for you?
It also raises the question of why we wouldn't have element-wise operators for all our other operators, to be consistent. That is a genuine possibility, but we need clear naming conventions, like maybe:
- %in_apply%, %in[]_apply%, ...
- %apply_in%, %apply_in[]%, ...
- ...
I don't think we should overload %in% either, did you think I implied we should?
The package so far is only overloading <<- so foo < bar <- value works, but keeps its original binary use working as before.
re :
%in{.}% for extracting the value itself, I think you used %vin{}% for that?
Yes I did on reddit, but in this package I used a %subset***% form, so you can have %subset{}% but also %subset>=%.
In this case the {} could make sense as it's really about sets, but we have a lot of special characters already. The possibilities would be :
%{in[]}% / %{!in[]}%
%in[]{}% / %!in[]{}%
%{[]}% / %{![]}%
%{}[]% / %!{}[]%
These may not be very readable nor easy to type, and there's the question of where the ! would fit, which might lead to frustrations, though I think the last one is not that bad and is generalizable to %{}>=% etc.
We might need to keep ideas coming and sleep on them a few times.
I'm fine with a set of # operators. Is the main use case you're thinking about to aggregate rare values before modelling?
Some alternatives starting with your proposal :
%#in[]%
%#[]%
%count[]%
%n[]%
from inops.
About %in{}%: I think I didn't convey the purpose of it...
In my view %in{}% would be the same as %in%, except it would handle all the special cases we add. I am thinking about it this way (left side is notation in math):
x in {"a", "b"} # %in{}%
x in [a:b] # %in[]%
x in (a:b) # %in()%
So x %in{}% A would be "is element x in set A" and would be the same as %in%.
In other words - {} is a variant of interval notation, not a new notation for additional operators. The main purpose is: 1. to consistently specify the type of interval after %in and 2. to not overload %in%.
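To make the notation concrete, here is a minimal sketch of how the set and interval operators could be defined. It is only an illustration: the argument names `x`, `set` and `range` are assumptions, and the real package may validate inputs differently.

```r
# Hypothetical definitions, not the package's actual code.
`%in{}%` <- function(x, set) x %in% set                       # set membership, same as %in%
`%in[]%` <- function(x, range) x >= range[1] & x <= range[2]  # closed interval
`%in()%` <- function(x, range) x > range[1] & x < range[2]    # open interval
`%in(]%` <- function(x, range) x > range[1] & x <= range[2]   # open left, closed right
`%in[)%` <- function(x, range) x >= range[1] & x < range[2]   # closed left, open right

x <- 1:5
x %in{}% c(2, 4)
#> [1] FALSE  TRUE FALSE  TRUE FALSE
x %in[]% c(2, 4)
#> [1] FALSE  TRUE  TRUE  TRUE FALSE
x %in()% c(2, 4)
#> [1] FALSE FALSE  TRUE FALSE FALSE
```

The base %in% stays untouched, which is the point of giving the set version its own {} suffix.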
from inops.
Regarding the subset names, I think I agree with you about having a separate verb to distinguish those. But %subset% seems a bit long to type.
How do you feel about these:
letters %get[]% c("a", "c")
letters %sub[]% c("a", "c")
?
from inops.
subset might be long, but base::sub() is used to replace and get doesn't convey the right meaning in my opinion.
About %in{}%: I think I get it better now, thanks.
In your current package infixer, iris %in{}% "setosa" returns something different than iris %in% "setosa" and iris %in[]% "setosa".
If this is one of the additional cases that you mention, then it is so far inconsistent with the other functions, so if we keep this behavior I believe all in functions should return a list/data.frame when applied on a list/data.frame.
from inops.
infixer will be deleted soon I think. I never got around to polishing it.
If some functions there do not work properly on matrices - then that's an oversight. The intention was for all those operators to return a matrix output when the input is a matrix... Like %in{}% did.
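A rough sketch of the behaviour being discussed, assuming the operator should test every cell of a data frame and hand back a logical matrix of the same shape. The argument names and the choice of a matrix (rather than a data frame) as the return type are assumptions for illustration only.

```r
# Hypothetical sketch: element-wise set membership that keeps the input's shape.
`%in{}%` <- function(x, set) {
  if (!is.data.frame(x)) return(x %in% set)
  # Apply %in% to each column, then arrange the results as a logical matrix.
  res <- vapply(x, function(col) col %in% set, logical(nrow(x)))
  matrix(res, nrow = nrow(x), dimnames = list(NULL, names(x)))
}

df <- data.frame(a = c("x", "y"), b = c("y", "z"))
res <- df %in{}% "y"
dim(res)
#> [1] 2 2
```

By contrast, base df %in% "y" compares whole columns against the table and returns one value per column, which is the inconsistency raised above.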
from inops.
got it!
from inops.
Inquiry: how do you feel about removing the "in" part from the function names?
%{}%, %[]%, %()%
%!{}%, %![]%, %!()%
?
from inops.
I am not sure about this myself. I think if we will later add some more functionality that still has the interval symbols, but replaces the "in" word - then in should be left in. But otherwise not sure...
from inops.
Also, can we find an alternative name for %like%?
%~% and %!~%? Or would that be too cryptic?
from inops.
I had missed this batch of messages!
I like these short names but I am afraid that removing the in part might make it confusing, as it would be similar but different to what the functions do in the package you got inspiration from. Was this package prominent?
I would also have liked these names for the subsetting versions, except then you cannot generalize them for comparison operators.
About renaming %like%: it does need a new name, as I mentioned in comments to your commit, because to be consistent we need to make it different from data.table::`%like%`. I think %like{}% would be nice, and would be to data.table::`%like%` what %in{}% is to %in%.
%~% does look good and is clever but it's probably been already used, possibly by a prominent package, and I see it as more potentially confusing than something more explicit.
About the necessity of subsetting variants, we can indeed wait to see if we really miss them. I know that I would miss them for the like variant, but for intervals I'll leave it to you because I'm not likely to use those that much tbh :).
from inops.
Hmm thinking thinking.
How about %~in{}%? ~ would specify that we are doing the same as %in{}% except with regex matching. The syntax of using ~ I think is quite universal. If I remember correctly - it is used in perl. This would also allow us to include similar notation for selecting elements that occur a specified number of times: %#in{}% and %#in[]%.
What do you think?
from inops.
I think I like it. I didn't know about ~ being used for regex, if it is then it makes a lot of sense.
from inops.
Well at least I remember it from my perl days :)
https://perldoc.perl.org/perlretut.html#Part-1%3a-The-basics
We can use that syntax if you like it. I think it would make sense. But maybe a bit cumbersome to write down compared to %like%.
from inops.
What do you think about %[...]% syntax for subsetting?
So for example: %[in{}]%. It seems similar to the long form:
x[x %in{}% c("a", "b")]
x %[in{}]% c("a", "b")
from inops.
It makes sense, if we consider that this is readable enough:
x %[in[]]% c("a", "b")
x %[in(]]% c("a", "b")
x %[in[)]% c("a", "b")
x %[==]% c("a", "b")
x %[>]% c("a", "b")
I like that it's straightforward to generalize and that it looks good with comparison ops.
some alternative i have or had thought about :
x %subset[]% c("a", "b")
x %subset(]% c("a", "b")
x %subset==% c("a", "b")
x %subset>% c("a", "b")
or
x %vin[]% c("a", "b")
x %vin(]% c("a", "b")
x %vin[)% c("a", "b")
x %v==% c("a", "b")
x %v>% c("a", "b")
or
x %value[]% c("a", "b")
x %value(]% c("a", "b")
x %value[)% c("a", "b")
x %value==% c("a", "b")
x %value>% c("a", "b")
or
x %val[]% c("a", "b")
x %val(]% c("a", "b")
x %val[)% c("a", "b")
x %val==% c("a", "b")
x %val>% c("a", "b")
from inops.
Suggestion: use %in~% instead of %~in{}% for %like%. I think it's more consistent. The argument after "in" specifies the type of operation, while the argument before "in" should specify the transformation (if any) before doing the operation (like %#in{}% - table before doing %in{}%).
A lot easier to write as well.
from inops.
We also have %in''% and %in""% that could be used for something (maybe even regex?)
from inops.
How about %in""% for gsub() and %in''% for gsub(fixed=TRUE)?
from inops.
Still thinking about subsetting... If we use %#in{}% for tables then I would not use %vin% - as the symbol before the in part would have two distinct meanings.
EDIT: I take that back. %!in{}% is already a second meaning. So maybe we can find an alternative for # instead.
from inops.
How would we implement gsub? I thought we were only wrapping grepl.
I also don't understand the part about second meanings. I think %#!in{}% is a good name.
I like %in~%, its friends would be %!in~%, %[in~]% and %[!in~]%, which all seem fairly readable to me.
from inops.
Sorry, I meant grep(), not gsub().
%in~% seems nice to me, so agreed. Though in some cases it might be convenient to have a case-insensitive variant, or a fixed=TRUE variant, don't you think?
Regarding the meanings rigmarole - ignore that for now, still thinking how it all adds up. But the main idea is to define "operators" on the right side of in and modifications of those operators on the left, and be consistent with it. Then if we use v for subset - we would use two modifications in the case of %#vin{}% which might be confusing.
from inops.
To elaborate more on this: we have 3 placeholders:
- lhs: like #
- word: like in
- rhs: like []
It would be nice if everything we add here could fit in these 3, without using two lhs modifiers at the same time. Though we can probably consider !in a word and get away with it. Though if we later add something like %which{}% or %length{}%, would ! still be convenient: %!length{}%?
from inops.
This will be a nice use case for %[<]%: detecting potential categorical variables in your dataset:
map_dbl(data, n_distinct) %[<=]% 20
rather than :
counts <- map_dbl(data, n_distinct)
counts[counts <= 20]
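For anyone who wants to try the idea without purrr or dplyr, here is a base R sketch. The operator definition, the toy data frame and the threshold of 2 are all made up for illustration; only the %[<=]% name comes from the discussion.

```r
# Hypothetical subsetting operator: keep the elements of e1 that satisfy e1 <= e2.
`%[<=]%` <- function(e1, e2) e1[e1 <= e2]

# Toy data: one identifier column and two low-cardinality columns.
data <- data.frame(
  id    = 1:6,
  group = c("a", "b", "a", "b", "a", "b"),
  flag  = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)
)

# Number of distinct values per column (base R stand-in for n_distinct).
n_unique <- vapply(data, function(col) length(unique(col)), integer(1))

# Keeps the entries for group and flag, which have 2 distinct values each.
n_unique %[<=]% 2
```

Because the result keeps its names, it reads directly as "these columns look categorical".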
from inops.
I've also been thinking about %startsWith%, which is like startsWith() but consistent with our other functions when applied on data frames (i.e. returning a matrix of logicals). It would come along with %startsWith%<-, %[startsWith]%, %#startsWith%. Same for %endsWith%.
from inops.
About variants of %in~%, maybe %in~f% for fixed = TRUE, and %in~p% for perl = TRUE?
data.table has (in dev version) %plike%, %flike%, and %ilike% to return numeric indices, but f and p have orthogonal uses while i makes sense with any. So data.table would need confusing %iplike% or %pilike% etc to be general, while our approach would be unambiguous and more readable because we'd have %#in~% (wrapped around default grepl), %#in~f% (fixed = TRUE), and %#in~p% (perl = TRUE).
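A minimal sketch of the three logical regex variants, assuming they are thin wrappers around grepl() with the corresponding flag. The argument names are assumptions and the package may handle data frames and NA differently.

```r
# Hypothetical wrappers around grepl(); note the operator puts x on the left.
`%in~%`  <- function(x, pattern) grepl(pattern, x)                # default regex
`%in~f%` <- function(x, pattern) grepl(pattern, x, fixed = TRUE)  # literal match
`%in~p%` <- function(x, pattern) grepl(pattern, x, perl = TRUE)   # perl regex

x <- c("a.b", "axb", "b.a")
x %in~% "^a.b"          # "." matches any character
#> [1]  TRUE  TRUE FALSE
x %in~f% "a.b"          # "." is taken literally
#> [1]  TRUE FALSE FALSE
x %in~p% "(?<=b)\\.a"   # lookbehind needs perl = TRUE
#> [1] FALSE FALSE  TRUE
```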
from inops.
Ah no, I think you want # for counts, not for which().
How much do you need # for counts? I feel that it's more useful to have a shortcut for which (to get numerical indices) than a shortcut for sum (to count), though maybe we can be creative enough to have both.
from inops.
We can probably have both.
Regarding this:
This will be a nice use case for %[<]% : detect potential categorical variables in your dataset :
map_dbl(data, n_distinct) %[<=]% 20
This is actually a potential scenario for #:
data %#<=% 20
But probably will have to be done column-by-column in case of data.frames.
Regarding which - I almost never use it. But I think we can find a way to incorporate it if we want. Just have to think about the syntax. It feels to me like this package is quite simple in terms of functionality, so a clever and convenient naming scheme is paramount. Worth spending some time thinking about how to name stuff.
With regards to in~ being grep - if we agree on this I can send a pull request changing %like% to %in~%. Maybe startsWith() and endsWith() can also have a more convenient form? %in^~%, %in$~% or something of that sort?
Also with regards to %[in]% - will we use this for subsetting? Or do you think we can find a nicer alternative?
One additional thought that I want to run by you is - if we use %[in]% to subset, we can probably also add %|in|% for the number of elements satisfying the match, i.e.:
if((x %|<|% 0) == 0)
print("all elements are positive")
Though maybe it's a bit excessive.
from inops.
I will think more about startsWith but let's forget it for now, probably I spoke too fast and it's not needed as we could just use "^foo" on the rhs with %in~% (the nuance is that startsWith is fixed, but that might not add that much value).
I was still confused by what you meant by %#in%, now I think I get it, and I wonder if it shouldn't be just %#%, see end of post.
I also wonder if %in~% shouldn't be just %~%, actually Romain François has it implemented in his package operators (though it doesn't comply with our standards regarding data frames):
https://cran.r-project.org/web/packages/operators/operators.pdf
Romain François has good ideas too, for instance he uses a * suffix to wrap in all, and a + option to wrap in any.
One additional thought that I want to run by you is - if we use %[in]% to subset, we can probably also add %|in|% for number of elements satisfying the match
I like it, and it would make sense if we keep the %[foo]% subsetting.
A spontaneous idea though: would it make sense to have only the opening bracket, %[in%, or does it look weird? We'd be consistent with a [output_type][negation][operation_type][option] format, and it wouldn't overcharge the right side as in %[in[]]%.
- output_type : [ for subsetting, @ for numeric indices, + for counts ?, *none* for logical indices
- negation : ! or *none*
- operation_type : in for interval and sets, ~ for regex, ==, > etc for comparison, # to filter on number of occurrences
- option : [], [) etc for in, f or p for ~, *none*
11:13 %[!=% 12
#> [1] 11 13
11:13 %@!=% 12
#> [1] 1 3
11:13 %+==% 12
#> 1
c(11, 11, 12, 13, 13, 13) %#% 2
#> [1] TRUE TRUE FALSE TRUE TRUE TRUE
c(11, 11, 12, 13, 13, 13) %[#% 2
#> [1] 11 11 13 13 13
These would be paired consistently with [negation][operation_type][option]<- operators.
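The examples above can be reproduced with a few one-liners. This is only a sketch of the naming grammar: the helper n_occurrences() is hypothetical, and reading `# n` as "occurs at least n times" is an assumption made to match the outputs shown.

```r
# [output_type][negation][operation_type] applied to comparison operators.
`%[!=%` <- function(e1, e2) e1[e1 != e2]     # "[" : subset
`%@!=%` <- function(e1, e2) which(e1 != e2)  # "@" : numeric indices
`%+==%` <- function(e1, e2) sum(e1 == e2)    # "+" : count

# Hypothetical helper: how many times each element occurs in x.
n_occurrences <- function(x) {
  pos <- match(x, x)   # position of the first occurrence of each value
  tabulate(pos)[pos]   # frequency of that value, repeated per element
}

# Assumption: "# n" keeps values occurring at least n times.
`%#%`  <- function(e1, e2) n_occurrences(e1) >= e2
`%[#%` <- function(e1, e2) e1[n_occurrences(e1) >= e2]

11:13 %[!=% 12
#> [1] 11 13
11:13 %@!=% 12
#> [1] 1 3
c(11, 11, 12, 13, 13, 13) %[#% 2
#> [1] 11 11 13 13 13
```

Note that a # inside %...% is part of the operator name and does not start a comment.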
Additional possible output types:
- / for ratio ? (Romain François uses it for split, which is nice too)
- * for all ? (borrowed from Romain)
- ? for any ? (if we think + is better suited for a sum)
- [[ for unique subsetting ?
11:13 %/>% 12
#> [1] 0.3333333
11:13 %?>% 12
#> [1] TRUE
11:13 %*>% 12
#> [1] FALSE
c(11, 11, 12, 13, 13, 13) %[[#% 2
#> [1] 11 13
from inops.
I would def agree with %~%. I suggested the same 20 days ago, but at the time we agreed that it might overshadow some defined operator in a popular package and introduce conflicts.
I like the @ for indices. I think it's quite logical and makes sense.
But the problem with %#% is that in this case we wouldn't be able to do something like "select all factors that occur exactly 3 times". It would always be "less than"...
from inops.
Overall, I like almost all of your suggestions actually. Maybe not [[ - because it's quite non-intuitive. However, with the pattern you are proposing, I think it would make sense to even drop in, because I can see little justification for only using it with ranges.
from inops.
Overall I think we can go in several different directions. Two of which I see as the following:
Using the words to specify operations.
- %in()% = get logical indices
- %get()% for subsetting
- Something like %ind<% or %which<% to get numeric indices
- %any<% and %all<% for existence/every checks.
- %tab<% or %table<% - a possible replacement for #.
etc. Then we should always keep the word (even in %in~% - to be consistent).
Using only symbols to specify operations.
- %{}%, %[)%, %!<% etc. for logical indices.
- %[<% for subsetting. Though %[[]% would look a bit weird.
- %@{}% etc. for numeric indices.
- %?{}% and %*{}% for existence/every.
- %#<% for tables.
from inops.
One correction to the above: just now remembered that %tab% is a different beast. We would need to provide all the others like %get%, %ind%, etc for %tab% as well.
But I only have %#in()% and %in#()% for now.
from inops.
I would def agree with %~%. I suggested the same 20 days ago, but at the time you we agreed that it might overshadow some defined operator in a popular package and introduce conflicts.
oops :). Yes I still think it's a bit annoying, but they do the same most of the time and our package can replace it completely so I think it might be not that bad. %{}% etc are a bit more confusing because they don't do the same, but if our package covers all related functionalities users shouldn't attach them at the same time anyway I guess...
[[ doesn't look good indeed, I was thinking that something for unique would be nice and just picked whatever in the moment.
I think %get()% is too close to get() which is very different, in that case %subset()% would be worth the two additional characters in my opinion, but it seems you don't like this one much :).
You're right about the 2 possible directions I think. I'll need to sleep on it a few times as I have no strong opinion for now. We could also have aliases while we test the package and see what feels right.
from inops.
I'll need to sleep on it a few times as I have no strong opinion for now.
Yup please do! I am leaning a bit more towards the words now, for some reason. I played a bit with both variants and words seem easier to understand/remember and actually to write down (as I type letters faster than symbols). But I am not dead-set on it too much yet.
The %get% can be renamed of course. I would like %subset% but would prefer for it to be shorter if possible. Like %val% (as in "value") or %sub% (as in "subset" - however this would also coincide with sub()).
from inops.
output type | words | symbols | description |
---|---|---|---|
logical | == / != | == / != | equality / inequality |
logical | > / >= / < / <= | > / >= / < / <= | comparison |
logical | %in()% / %!in()% | %()% / %!()% | open interval |
logical | %in(]% / %!in(]% | %(]% / %!(]% | open left closed right |
logical | %in[)% / %!in[)% | %[)% / %![)% | open right closed left |
logical | %in[]% / %!in[]% | %[]% / %![]% | closed interval |
logical | %in{}% / %!in{}% | %{}% / %!{}% | generalized %in% |
logical | %in~% / %!in~% / %in~f% / %!in~f% / %in~p% / %!in~p% | %~% / %!~% / %~f% / %!~f% / %~p% / %!~p% | regex |
subset | %subset==% / %subset!=% | %[==% / %[!=% | equality / inequality |
subset | %subset>% / %subset>=% / %subset<% / %subset<=% | %[>% / %[>=% / %[<% / %[<=% | comparison |
subset | %subset()% / %subset!()% | %[()% / %[!()% | open interval |
subset | %subset(]% / %subset!(]% | %[(]% / %[!(]% | open left closed right |
subset | %subset[)% / %subset![)% | %[[)% / %[![)% | open right closed left |
subset | %subset[]% / %subset![]% | %[[]% / %[![]% | closed interval |
subset | %subset{}% / %subset!{}% | %[{}% / %[!{}% | generalized %in% |
subset | %subset~% / %!subset~% / %subset~f% / %!subset~f% / %subset~p% / %!subset~p% | %[~% / %[!~% / %[~f% / %[!~f% / %[~p% / %[!~p% | regex |
numeric indices | %which==% / %which!=% | %@==% / %@!=% | equality / inequality |
numeric indices | %which>% / %which>=% / %which<% / %which<=% | %@>% / %@>=% / %@<% / %@<=% | comparison |
numeric indices | %which()% / %which!()% | %@()% / %@!()% | open interval |
numeric indices | %which(]% / %which!(]% | %@(]% / %@!(]% | open left closed right |
numeric indices | %which[)% / %which![)% | %@[)% / %@![)% | open right closed left |
numeric indices | %which[]% / %which![]% | %@[]% / %@![]% | closed interval |
numeric indices | %which{}% / %which!{}% | %@{}% / %@!{}% | generalized %in% |
numeric indices | %which~% / %!which~% / %which~f% / %!which~f% / %which~p% / %!which~p% | %@~% / %@!~% / %@~f% / %@!~f% / %@~p% / %@!~p% | regex |
every | %all==% / %all!=% | %*==% / %*!=% | equality / inequality |
every | %all>% / %all>=% / %all<% / %all<=% | %*>% / %*>=% / %*<% / %*<=% | comparison |
every | %all()% / %all!()% | %*()% / %*!()% | open interval |
every | %all(]% / %all!(]% | %*(]% / %*!(]% | open left closed right |
every | %all[)% / %all![)% | %*[)% / %*![)% | open right closed left |
every | %all[]% / %all![]% | %*[]% / %*![]% | closed interval |
every | %all{}% / %all!{}% | %*{}% / %*!{}% | generalized %in% |
every | %all~% / %!all~% / %all~f% / %!all~f% / %all~p% / %!all~p% | %*~% / %*!~% / %*~f% / %*!~f% / %*~p% / %*!~p% | regex |
any | %any==% / %any!=% | %?==% / %?!=% | equality / inequality |
any | %any>% / %any>=% / %any<% / %any<=% | %?>% / %?>=% / %?<% / %?<=% | comparison |
any | %any()% / %!any()% | %?()% / %?!()% | open interval |
any | %any(]% / %!any(]% | %?(]% / %?!(]% | open left closed right |
any | %any[)% / %!any[)% | %?[)% / %?![)% | open right closed left |
any | %any[]% / %!any[]% | %?[]% / %?![]% | closed interval |
any | %any{}% / %!any{}% | %?{}% / %?!{}% | generalized %in% |
any | %any~% / %!any~% / %any~f% / %!any~f% / %any~p% / %!any~p% | %?~% / %?!~% / %?~f% / %?!~f% / %?~p% / %?!~p% | regex |
words :
- words can be extended
- easier to type
- more readable
symbols :
- shorter and more consistent lengths
- some of them are really neat and readable, such as %[<% (the least readable ones are not the most used ones)
Ambiguity in word versions due to positioning of ! (before in, after subset) can be mitigated by :
- using symbol versions for logical only, and word versions for the rest
- using %in!()% etc (repositioning ! to be consistent)
- bringing back %out%
We can define aliases so we could have both versions.
Maybe all and any versions should be put in the fridge and we'll see if we need them.
My current preference is :
- only symbols for logical
- words for the rest OR words AND symbol aliases
- skip all and any for now
In any case that's a lot of functions and we might need function factories to build them to avoid a huge amount of copy / paste (will need its own issue).
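To give an idea of what such a factory could look like, here is a small sketch. The names make_variants() and in_closed() are hypothetical, and the e1 / e2 argument names are just one of the options floated in this thread.

```r
# Hypothetical factory: derive the logical, subset and which variants
# from a single predicate, so each operator family is written once.
make_variants <- function(predicate) {
  force(predicate)
  list(
    logical = predicate,
    subset  = function(e1, e2) e1[predicate(e1, e2)],
    which   = function(e1, e2) which(predicate(e1, e2))
  )
}

# Predicate for a closed interval; e2 holds the two bounds.
in_closed <- function(e1, e2) e1 >= e2[1] & e1 <= e2[2]

ops <- make_variants(in_closed)
`%in[]%`     <- ops$logical
`%subset[]%` <- ops$subset
`%which[]%`  <- ops$which

x <- c(3, 8, 12, 15)
x %in[]% c(5, 12)
#> [1] FALSE  TRUE  TRUE FALSE
x %subset[]% c(5, 12)
#> [1]  8 12
x %which[]% c(5, 12)
#> [1] 2 3
```

Adding a new output type then means adding one entry to the list rather than editing every operator.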
from inops.
Agree that we might need a function factory. Though we can probably also "export" main functionality to functions non-accessible for the user.
As for the words - the long versions of words do not bother you? %subset% for example seems quite a long name. None of the shorter versions of this were appealing?
We can leave the "all" and "any" out for now. Though personally I would use these more compared to "which". I almost never use indices for subsetting, I try to stay with logical.
Nice table by the way. The only thing I would change is - make the placement of ! consistent. I don't mind having it always at the front actually. Or always at the back (after the word).
from inops.
I don't care that much for which either, it's just that I think they'll be straightforward while all and any might entail more discussions because !all[] is not like all![], so which do we need? We could have !all![] etc, and I think they won't be used that much, so by lack of familiarity we'll end up wrapping in all anyway (just as we might do with any and which).
We can skip which as well, to move forward.
I don't like having all ! in the front because negating a subset doesn't make sense to me. This issue is made clear in the case of all as in my 1st paragraph above: !all[] and all![] mean different things and !subset[] means nothing.
I propose we go with words with ! at the back (incl %in![]% etc), and we go with %val**% to subset. Meanwhile we keep short aliases at least until the end of our test run, then we decide what we choose or if we keep aliases.
I propose we stick with the logical, subset, and replace variants, the latter being assignment versions of logical ones.
I believe the # / table versions were important to you. I didn't include them here because the expected outputs are still not clear to me, so I propose to let you experiment with them and code them as you see fit if you want, PR them and we can discuss them later.
from inops.
I agree with everything you wrote here. Let's keep which and any and all out for now. I like val better than subset. And I agree on placing ! at the end after the word.
One thing I would like to hear your opinion on: would it not be more logical to have the replace operator work on ind (subset) instead of in? Somehow this would make more sense to me - as the operator is extracting elements, so it can also replace them. I am thinking of names(x) and names(x)<- or x[2] and x[2] <- 0, i.e. replacement works on syntax that returns elements otherwise.
from inops.
The thing is, if we're trying to be consistent with ==, and we want to have x == 3 <- 4, this needs to be done on logical functions.
I see it as simpler too, because x %in{}% foo <- value is a shortcut for x[x %in{}% foo] <- value
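For readers unfamiliar with the mechanism: R turns f(x, y) <- value into x <- `f<-`(x, y, value = value), and the same holds for infix calls. A minimal sketch under that rule follows; the argument names are assumptions and the package's real implementations may differ.

```r
# Replacement form of the set operator: called for `x %in{}% table <- value`.
`%in{}%<-` <- function(x, table, value) {
  x[x %in% table] <- value
  x
}

# Replacement form of ==: called for `e1 == e2 <- value`.
`==<-` <- function(e1, e2, value) {
  e1[e1 == e2] <- value
  e1
}

x <- c("a", "b", "c", "a")
x %in{}% c("a", "b") <- "z"
x
#> [1] "z" "z" "c" "z"

y <- c(1, 3, 5, 3)
y == 3 <- 4
y
#> [1] 1 4 5 4
```

The last argument has to be called value, since that is the name R uses when it dispatches to a replacement function.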
from inops.
While we're on names I wanted to touch on argument names too: as we're moving to more consistency with ==, it might make sense to switch back to e1 and e2 (=='s arguments).
x and table are weird arguments. I know I was the one to push for them, but I wanted to be consistent with %in% and I don't think it's the right choice anymore :).
from inops.
Here is one more syntax format we might consider (if bringing back out):
output type | mixed | description |
---|---|---|
logical | == / != | equality / inequality |
logical | > / >= / < / <= | comparison |
logical | %in()% / %out()% | open interval |
logical | %in(]% / %out(]% | open left closed right |
logical | %in[)% / %out[)% | open right closed left |
logical | %in[]% / %out[]% | closed interval |
logical | %in{}% / %out{}% | generalized %in% |
logical | %in~% / %out~% / %in~f% / %out~f% / %in~p% / %out~p% | regex |
subset | %[==% / %[!=% | equality / inequality |
subset | %[>% / %[>=% / %[<% / %[<=% | comparison |
subset | %[in()% / %[out()% | open interval |
subset | %[in(]% / %[out(]% | open left closed right |
subset | %[in[)% / %[out[)% | open right closed left |
subset | %[in[]% / %[out[]% | closed interval |
subset | %[in{}% / %[out{}% | generalized %in% |
subset | %[in~% / %[out~% / %[in~f% / %[out~f% / %[in~p% / %[out~p% | regex |
numeric indices | %@==% / %@!=% | equality / inequality |
numeric indices | %@>% / %@>=% / %@<% / %@<=% | comparison |
numeric indices | %@in()% / %@out()% | open interval |
numeric indices | %@in(]% / %@out(]% | open left closed right |
numeric indices | %@in[)% / %@out[)% | open right closed left |
numeric indices | %@in[]% / %@out[]% | closed interval |
numeric indices | %@in{}% / %@out{}% | generalized %in% |
numeric indices | %@in~% / %@out~% / %@in~f% / %@out~f% / %@in~p% / %@out~p% | regex |
every | %*==% / %*!=% | equality / inequality |
every | %*>% / %*>=% / %*<% / %*<=% | comparison |
every | %*in()% / %*out()% | open interval |
every | %*in(]% / %*out(]% | open left closed right |
every | %*in[)% / %*out[)% | open right closed left |
every | %*in[]% / %*out[]% | closed interval |
every | %*in{}% / %*out{}% | generalized %in% |
every | %*in~% / %*out~% / %*in~f% / %*out~f% / %*in~p% / %*out~p% | regex |
any | %?==% / %?!=% | equality / inequality |
any | %?>% / %?>=% / %?<% / %?<=% | comparison |
any | %?in()% / %?out()% | open interval |
any | %?in(]% / %?out(]% | open left closed right |
any | %?in[)% / %?out[)% | open right closed left |
any | %?in[]% / %?out[]% | closed interval |
any | %?in{}% / %?out{}% | generalized %in% |
any | %?in~% / %?out~% / %?in~f% / %?out~f% / %?in~p% / %?out~p% | regex |
from inops.
Also - do you think we will try to extend this to include things like %cut%? Knowing this beforehand might help in choosing the appropriate naming conventions.
from inops.
What would %cut% do? I think I'm comfortable using functions for cut as there are so many ways to do it, I even wrote a package just for that :). https://github.com/moodymudskipper/cutr
The syntax with %out% looks good here, its only issue is that it can't be easily extended as every functionality is linked to a symbol, but we're most probably fine with what we have here anyway, and I'm ok with moving on with it.
from inops.
Yup I am aware of the package. Already complimented it when you showed it on reddit :) Why are you not putting your packages on CRAN? Do you think they are not ready?
As for the suggestion/question: cut could basically apply the replace operators multiple times. Like:
x %in[)% c(1,10) <- "low"
x %in[)% c(10,20) <- "medium"
x %in[)% c(20,30) <- "high"
Would be something like:
x %cut[)% list(low=c(1,10), medium=c(10,20), high=c(20,30))
The nice thing is that it would play nicely with all the ranges that we have (even with in~, etc). And it would provide a consistent way between subsetting, checking, and "cutting" into groups.
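A rough sketch of what the proposed operator might do, purely to make the idea testable. Nothing here is agreed on: the loop-based implementation, the use of list names as labels, and returning NA for values outside every range are all assumptions.

```r
# Hypothetical %cut[)%: label each element by the first-listed half-open
# range [lower, upper) it falls into; later ranges overwrite earlier ones.
`%cut[)%` <- function(x, ranges) {
  out <- rep(NA_character_, length(x))   # assumption: unmatched values stay NA
  for (label in names(ranges)) {
    r <- ranges[[label]]
    out[which(x >= r[1] & x < r[2])] <- label
  }
  out
}

x <- c(0, 5, 10, 25, 30)
x %cut[)% list(low = c(1, 10), medium = c(10, 20), high = c(20, 30))
#> [1] NA       "low"    "medium" "high"   NA
```

Writing into a separate character vector sidesteps the problem that replacing values in x one range at a time would turn x into character after the first step.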
from inops.
As mentioned in #4 - we can probably drop %@in%, %?in%, %*in%, etc. In the code they would be more confusing than simply using a proper function, i.e.:
if(any(x %in()% c(0, 10))) { ... }
vs
if(x %?in{}% c(0, 10)) { ... }
I would probably choose the first one for clarity.
If that is the case - we are only left with two naming problems:
- do we use ! or out
- how to name the subset operator?
Some possibilities:
%[in()%
%[in()]%
%val()% # in this case - how to negate?
%IN()%
%subset()%
from inops.
I have dotdot on CRAN (a simple 5 lines function to grow variables without repetition) and will put unglue (text extraction) there as soon as I correct a couple bugs as it had some unexpected twitter success.
My other CRAN candidates are :
- cutr and safejoin (a wrapper on dplyr joins) for which I sometimes feel I've bitten a bit more than I could chew and still have existential doubts on some aspects of the interface.
- pipes (a magrittr fork) which is ready to go but has redundant functionalities with tags which I prefer to use, and I feel like it's a bit weird to advertise a package I don't use that much
- tag and tags (adverb factories with a special syntax) which I feel are my best work but need more unit tests and was a big flop on reddit, and failed to attract much attention whenever I suggested it on twitter, so I have to find an angle to explain it simply and advertise its value... but I will definitely put them, I use those all the time.
- doubt (pseudo operators with ?'s precedence) which still needs a lot of work but I think people will like it, or be curious at least.
- maybe withDT (data table for shy users), pbfor (use progress bars with regular for) and mmdb (assign and query tables of a database as if it was an environment) but I'm not sure if users will care that much.
The problem is that I always start new stuff and then get overwhelmed, take a break, and come back with a new idea. And I don't have sparring partners, it's the first time here :).
from inops.
Isn't this redundant ?
x %cut[)% list(low=c(1,10), medium=c(10,20), high=c(20,30))
What would the following do, and what do we have below 1 and above 30, NAs ?
x %cut[)% list(low=c(1,5), medium=c(10,20), high=c(20,30))
from inops.
Regarding cut:
I do not think this is redundant, because doing it line by line might not be possible. I.e. the first replacement, x %in(]% c(1, 10) <- "low", would transform x into character. Unless we expand the ranges to work on multiple ranges at once, in which case cut would probably be redundant.
Regarding the left out ranges:
yup NA would be proper value for this case
from inops.
Regarding packages:
Yup I see, you have quite a few. In your case I would probably zoom in on the ones I would like to work on and support, and drop the rest, or leave them on GitHub. I've seen tags but have to admit - I did not quite get the purpose of it :/ Maybe I spent too little time reading the docs. Out of all the listed ones - I like cutr most. Maybe because that's the one I see myself using.
from inops.
Re :
%val()% # in this case - how to negate?
We'd need %val!()%
, in that case, I don't see another way.
But as far as I'm concerned I say we go with your latest "mixed" proposal, my objections to %out%
are not valid anymore because :
link with %in% is explicit => we don't rely on
%in%
much in the end
analogy with!=
=> valid only if we switch position to the right :%in!()%
which doesn't read as well
generalizable, e.g. %!subset[]% => makes no sense as I've argued, with%!all()%
/%all!()%
example, and the[output_type][negation][operation_type][option]
format, which makes the most sense, wouldn't be respected.
And I'd like to keep pure symbol aliases for the test run, as I feel they're growing on me and I might end up liking those better.
from inops.
I'm not convinced by the cut variant. I think if we can afford to type list and a couple of parentheses, we can call the function directly. Getting around the fact that infix operators are binary by feeding them a rhs in a list seems wrong to me. You can still code it in there and we'll see where it goes, but I don't think it's good design.
from inops.
About renaming arguments to e1 and e2, you're ok ? I'm ok with more explicit names too, for example :
- x and set for %in{}%
- x and range for %in[]%
- x and pattern for %in~%
but e1 / e2 is easy and consistent with == etc, it doesn't change much anyway
from inops.
You're right about packages, it's my goal to finish more stuff and start less :), thanks for the advice! now marking all of these as off topic! ;)
from inops.
1- RE negation names: OK let's keep in and out, I quite like this. You can try pure symbol aliases as well of course, if we simplify everything they might be very convenient. But how would you do subset with only symbols? %[()% ?
2- RE e1 and e2: I am fine with e1 and e2 but in this case we would probably have to drop range() from %in()% ?
3- RE cut: I think it would be a good idea. For example - right now we cannot do overlaps with several ranges x %in()% list(c(-Inf,10), c(14,16), c(20,Inf)). If we allow this, then replacement for this type of construct is just the next step. And with this we would have all we need. So basically there would be no cut variant, but simple cut operations would be performed by replacing. It doesn't have to be a list, for example the DescTools package puts multiple ranges in rows of matrices, so it does rbind(). But I think a list is more convenient. In any case, if we consider this - it would be after the first CRAN release I think.
from inops.
1- yes
2- oops it's embarrassing but I was under the impression that every operator had x and table arguments. I was not thinking straight, I still think that table is a strange argument name but overall the current situation is fine with me.
3- x %in()% list(c(-Inf,10), c(14,16), c(20,Inf)) could make sense, it would be a breaking change though, unless we make it fail for now for lists that can't be coerced to numeric (now range() will unlist and take the min and max). I stay open minded about your cut proposal :), just not convinced yet!
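A small base R sketch of why this would be a breaking change, assuming the current behaviour is equivalent to range(unlist(...)) as described above. The finite bounds are made up for illustration.

```r
ranges <- list(c(1, 10), c(14, 16), c(20, 30))
x <- c(5, 12, 15, 25)

# Behaviour as described in the thread: the list is flattened
# and only the overall min and max are kept.
collapsed <- range(unlist(ranges))
in_collapsed <- x > collapsed[1] & x < collapsed[2]

# Per-range reading: a value must fall inside one of the ranges.
in_each <- Reduce(`|`, lapply(ranges, function(r) x > r[1] & x < r[2]))

# 12 sits in the gap between 10 and 14, so the two readings disagree on it.
```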
from inops.
Regarding 2: %in% uses table as an argument name, so I got it from there...
Regarding 3: making it fail on lists and maybe extending the functionality later might be the correct approach I think. This is something that we should definitely leave for after the first submission to CRAN. But I just think it provides quite a convenient way to "cut" data into parts without actually thinking about this as "cutting". And we could use the same style with x %in~% list(pattern1, pattern2) <- c("a", "b"), which is natural with this syntax, but out of scope for cut.
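For what it's worth, the pattern variant could be sketched in base R roughly like this. The helper replace_by_patterns() is hypothetical and the patterns are made up; matching is done against the original values so that an earlier replacement cannot trigger a later pattern.

```r
# Hypothetical helper mimicking `x %in~% list(p1, p2) <- c("a", "b")`.
replace_by_patterns <- function(x, patterns, values) {
  stopifnot(length(patterns) == length(values))
  out <- x
  for (i in seq_along(patterns)) {
    # match on the original x, write into the copy
    out[grepl(patterns[[i]], x)] <- values[[i]]
  }
  out
}

x <- c("apple", "banana", "cherry")
res <- replace_by_patterns(x, list("^ap", "an"), c("a", "b"))
```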
from inops.
We go with the naming conventions of the table above, mixed form is used, symbols-only form is kept for aliases.
closing and pinning, reboot in new thread after 1st release if necessary.
from inops.