names about inops HOT 58 CLOSED

moodymudskipper avatar moodymudskipper commented on June 1, 2024
names

from inops.

Comments (58)

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

I really like your suggestions about %!in% instead of %out% - as you say it's more generalizable. At least we can probably consider that as decided :)

The package name - I am not too sure about. Personally, it doesn't make much difference to me. When I am using a package I just remember its name. But if we want to have more users then something "catchy" might be favourable.

A list of my suggested names so far:

%in{}%  %!in{}%
%in[]%  %!in[]%
%in()%  %!in()%
%in(]%  %!in(]%
%in[)%  %!in[)%

The names I am less sure about:

%#in{}% %#!in{}%
# and all the rest with %#in% for working on values that occur some number of times.

The names I am even less sure about:

%in{.}%
# for extracting the value itself, I think you used %vin{}% for that?

NOTE: I am not sure if overwriting the default %in% would be a good idea - users might find that their older code breaks after loading the package. So my proposal would be to use %in{}% for expanding on %in% ( {} brackets denoting "set" as in math).

As always - just an opinion and comments welcome.

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

I'm all for a better package name, I just have no better idea.


%in{}% is a bit confusing as a name, {} usually describes sets as you say, and in that case it kind of means "apply".

Also iris %in{}% "setosa" is basically map_dfr(iris, `%in%`, "setosa") so it seems to me it really makes sense only if it's used a lot. Could you describe some use cases and the added value it has for you?

It also begs the question why we wouldn't have element wise operators for all our other operators, to be consistent, which is a genuine possibility but we need clear naming conventions, like maybe :

  • %in_apply%, %in[]_apply%, ...
  • %apply_in%, %apply_in[]%, ...
  • ...

I don't think we should overload %in% either, did you think I imply we should ?

The package so far is only overloading <<- so foo < bar <- value works, but keeps its original binary use working as before.


re :

%in{.}% for extracting the value itself, I think you used %vin{}% for that?

Yes I did on reddit, but in this package I used a %subset***% form, so you can have %subset{}% but also %subset>=%.

In this case the {} could make sense as it's really about sets, but we have a lot of special characters already, the possibilities would be :

  • %{in[]}% / %{!in[]}%
  • %in[]{}% / %!in[]{}%
  • %{[]}% / %{![]}%
  • %{}[]% / %!{}[]%

These may not be very readable nor easy to type, and there's the question of where the ! would fit that might lead to frustrations, though I think the last one is not that bad and is generalizable to %{}>=% etc.

We might need to keep ideas coming and sleep on them a few times.


I'm fine with a set of # operators, is the main use case you're thinking about to aggregate rare values before modelling ?

Some alternatives starting with your proposal :

  • %#in[]%
  • %#[]%
  • %count[]%
  • %n[]%

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

About %in{}% I think I didn't convey the purpose of it...

In my view %in{}% would be the same as %in%, except would handle all the special cases we add. I am thinking about it this way (left side is notation in math):

x in {"a", "b"}    # %in{}%
x in [a:b]         # %in[]%
x in (a:b)         # %in()%

So x %in{}% A would be "is element x in set A" and would be the same as %in%.

In other words - {} is a variant of interval notation, not a new notation for additional operators. The main purpose is: 1. to consistently specify the type of interval after %in and 2. to not overload %in%.
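A minimal sketch of how the three notations above could map to R operators. The definitions and argument names are assumptions for illustration, not the package's actual code:

```r
# Hypothetical definitions, for illustration only.
`%in{}%` <- function(x, set) x %in% set                                 # x in {a, b}
`%in[]%` <- function(x, interval) x >= interval[1] & x <= interval[2]   # x in [a:b]
`%in()%` <- function(x, interval) x > interval[1] & x < interval[2]     # x in (a:b)

x <- c(1, 5, 10)
in_set    <- x %in{}% c(1, 10)   # membership in the set {1, 10}
in_closed <- x %in[]% c(1, 10)   # endpoints included
in_open   <- x %in()% c(1, 10)   # endpoints excluded
```

Under these assumptions only the bracket pair changes between operators, and base %in% is left untouched.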

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

Regarding the subset names, I think I agree with you about having a separate verb to separate those. But %subset% seems a bit long to type.

How do you feel about these:

letters %get[]% c("a", "c")
letters %sub[]% c("a", "c")

?

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

subset might be long, but base::sub() is used to replace and get doesn't convey the right meaning in my opinion.

about %in{}% I think I get it better now thanks.

In your current package infixer iris %in{}% "setosa" returns something different than iris %in% "setosa", and iris %in[]% "setosa".

If this is one of the additional cases that you mention, then it is so far inconsistent with the other functions, so if we keep this behavior I believe all in functions should return a list/data.frame when applied on a list/data.frame.

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

infixer will be deleted soon I think. I never got around to polishing it.

If some functions there do not work properly on matrices - then that's an oversight. The intention was for all those operators to return a matrix output, when the input is a matrix... Like %in{}% did.
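One way the "same shape out as in" behaviour could look, as a sketch only. The handling of data frames (returning a logical matrix with one column per variable) is an assumption here, since the thread has not settled on it:

```r
# Hypothetical shape-preserving variant of %in%; not the package's code.
`%in{}%` <- function(x, table) {
  if (is.data.frame(x)) {
    # one logical column per data frame column
    return(do.call(cbind, lapply(x, function(col) col %in% table)))
  }
  out <- x %in% table                        # base %in% drops dimensions
  if (!is.null(dim(x))) dim(out) <- dim(x)   # restore them for matrices
  out
}

m <- matrix(c("a", "b", "c", "a"), nrow = 2)
res_m <- m %in{}% "a"     # 2 x 2 logical matrix

df <- data.frame(u = c("a", "b"), v = c("b", "b"))
res_df <- df %in{}% "b"   # logical matrix with columns u and v
```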

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

got it!

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

Inquiry: how do you feel about removing the "in" part from the function names?

%{}%, %[]%, %()%

%!{}%, %![]%, %!()%

?

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

I am not sure about this myself. I think if we will later add some more functionality that still has the interval symbols, but replaces the "in" word - then in should be left in. But otherwise not sure...

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

Also, can we find an alternative name for %like%?

%~% and %!~%? Or would that be too cryptic?

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

I had missed this batch of messages!

I like these short names but I am afraid that removing the in part might make it confusing as it would be similar but different to what the functions do in the package you got inspiration from, was this package prominent ?

I would also have liked these names for the subsetting versions, except then you cannot generalize them for comparison operators.

about renaming %like%, it does need a new name as I mentioned in comments to your commit as to be consistent we need to make it different from data.table::`%like%` , I think %like{}% would be nice, and would be to data.table::`%like%` what %in{}% is to %in%.

%~% does look good and is clever but it's probably been already used, possibly by a prominent package, and I see it as more potentially confusing than something more explicit.

About the necessity of subsetting variants, we can indeed wait to see if we really miss them, I know that I would miss them for the like variant, but for intervals I'll leave it to you because I'm not likely to use those that much tbh :).

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

Hmm thinking thinking.

How about %~in{}% ? ~ would specify that we are doing same as %in{}% except with regex matching. The syntax of using ~ I think is quite universal. If I remember correctly - it is used in perl. This would also allow us to include similar notation for selecting elements that occur specified number of times: %#in{}% and %#in[]%.

What do you think?

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

I think I like it. I didn't know about ~ being used for regex, if it is then it makes a lot of sense.

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

Well at least I remember it from my perl days :)

https://perldoc.perl.org/perlretut.html#Part-1%3a-The-basics

We can use that syntax if you like it. I think it would make sense. But maybe a bit cumbersome to write down compared to %like%.

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

What do you think about %[...]% syntax for subsetting?

So for example: %[in{}]%. Seems similar to long form:

x[x %in{}% c("a", "b")]
x %[in{}]% c("a", "b")

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

It makes sense, if we consider that this is readable enough:

x %[in[]]% c("a", "b")
x %[in(]]% c("a", "b")
x %[in[)]% c("a", "b")
x %[==]% c("a", "b")
x %[>]% c("a", "b")

I like that it's straightforward to generalize and that it looks good with comparison ops.

some alternative i have or had thought about :

x %subset[]% c("a", "b")
x %subset(]% c("a", "b")
x %subset==% c("a", "b")
x %subset>% c("a", "b")

or

x %vin[]% c("a", "b")
x %vin(]% c("a", "b")
x %vin[)% c("a", "b")
x %v==% c("a", "b")
x %v>% c("a", "b")

or

x %value[]% c("a", "b")
x %value(]% c("a", "b")
x %value[)% c("a", "b")
x %value==% c("a", "b")
x %value>% c("a", "b")

or

x %val[]% c("a", "b")
x %val(]% c("a", "b")
x %val[)% c("a", "b")
x %val==% c("a", "b")
x %val>% c("a", "b")

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

Suggestion: use %in~% instead of %~in{}% for %like%. I think it's more consistent. The argument after "in" specifies the type of operation, while the argument before "in" should specify the transformation (if any) before doing the operation (like %#in{}% - table before doing %in{}%).

A lot easier to write as well.

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

We also have %in''% and %in""% that could be used for something (maybe even regex?)

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

How about %in""% for gsub() and %in''% for gsub(fixed=TRUE) ?

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

Still thinking about subsetting... If we will use %#in{}% for tables then I would not use %vin% - as the symbol before the in part would have two distinct meanings.

EDIT: I take that back. %!in{}% is already a second meaning. So maybe we can find alternative for # instead.

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

How would we implement gsub ? I thought we were only wrapping grepl.

I also don't understand the part about second meanings. I think %#!in{}% is a good name.

I like %in~% , its friends would be %!in~%, %[in~]% and %[!in~]%, which all seem fairly readable to me.

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

Sorry, I meant grep(), not gsub().

%in~% seems nice to me, so agree. Though in some cases it might be convenient to have a case-insensitive variant, or a fixed=TRUE variant, don't you think?

Regarding the meanings rigmarole - ignore that for now, still thinking how it all adds up. But the main idea is to define "operators" on the right side of in and modifications of those operators on the left, and be consistent with it. Then if we use v for subset - we would use two modifications in the case of %#vin{}% which might be confusing.

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

To elaborate more on this: we have 3 placeholders:

  • lhs: like #.
  • word: like in
  • rhs: like []

It would be nice if everything we add here could fit in these 3, without using two lhs at the same time. Though we can probably consider !in a word and get away with it. Though if we later add something like %which{}% or %length{}% would ! still be convenient: %!length{}%?

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

This will be a nice use case for %[<]% : detect potential categorical variables in your dataset :

map_dbl(data, n_distinct) %[<=]% 20

rather than :

counts <- map_dbl(data, n_distinct) 
counts[counts <= 20]
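
For reference, a base R sketch of this use case. The operator definition and the example data frame are assumptions, and sapply() with length(unique()) stands in for map_dbl() and n_distinct():

```r
# Hypothetical subsetting operator: keep the elements of e1 that satisfy e1 <= e2.
`%[<=]%` <- function(e1, e2) e1[e1 <= e2]

# Made-up example data, not from the thread.
data <- data.frame(
  id    = 1:30,
  group = rep(c("a", "b", "c"), 10),
  flag  = rep(c(TRUE, FALSE), 15)
)

counts <- sapply(data, function(col) length(unique(col)))
few <- counts %[<=]% 20   # named vector of columns with at most 20 distinct values
```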

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

I've been also thinking about %startsWith% which is like startsWith() but consistent with our other functions when applied on data frames (i.e. returning a matrix of logical). Would come along with %startsWith%<-, %[startsWith]%, %#startsWith%. Same for %endsWith%.

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

About variants of %in~%, maybe %in~f% for fixed = TRUE, and %in~p% for perl=TRUE ?

data.table has (in dev version) %plike%, %flike%, and %ilike% to return numeric indices, but f and p have orthogonal uses while i makes sense with any. So data.table would need confusing %iplike% or %pilike% etc to be general, while our approach would be unambiguous and more readable because we'd have %#in~% (wrapped around default grepl), %#in~f% (fixed = TRUE), and %#in~p% (perl = TRUE).
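
A sketch of what the logical variants could look like as thin wrappers around grepl(). The argument names and the example strings are assumptions:

```r
# Hypothetical wrappers around grepl(); not the package's code.
`%in~%`  <- function(x, pattern) grepl(pattern, x)                 # default regex
`%in~f%` <- function(x, pattern) grepl(pattern, x, fixed = TRUE)   # literal match
`%in~p%` <- function(x, pattern) grepl(pattern, x, perl = TRUE)    # Perl-style regex

x <- c("a.b", "axb")
res_regex <- x %in~% "a.b"        # "." matches any character
res_fixed <- x %in~f% "a.b"       # "." must be a literal dot
res_perl  <- x %in~p% "a(?=\\.)"  # lookahead, which needs perl = TRUE
```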

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

Ah no I think you want # for counts, not for which().

How much do you need # for counts ? I feel that it's more useful to have a shortcut for which (to get numerical indices), than a shortcut for sum (to count), though maybe we can be creative enough to have both.

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

We can probably have both.

Regarding this:

This will be a nice use case for %[<]% : detect potential categorical variables in your dataset :

map_dbl(data, n_distinct) %[<=]% 20

This is actually a potential scenario for #:

data %#<=% 20

But probably will have to be done column-by-column in case of data.frames.

Regarding which - I almost never use it. But I think we can find a way to incorporate it if we want. Just have to think about the syntax. It feels to me like this package is quite simple in terms of functionality, so a clever and convenient naming scheme is paramount. Worth spending some time thinking about how to name stuff.

With regards to in~ being grep - if we agree with this I can send a pull req changing the %like% to %in~%. Maybe startsWith() and endsWith() can also have a more convenient form? %in^~%, %in$~% or something of that sort?

Also with regards to %[in]% - will we use this for subsetting? Or do you think we can find a nicer alternative.

One additional thought that I want to run by you is - if we use %[in]% to subset, we can probably also add %|in|% for number of elements satisfying the match. i.e.:

if((x %|<|% 0) == 0)
  print("all elements are positive")

Though maybe it's a bit excessive.

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

I will think more about startsWith but let's forget it for now, probably I spoke too fast and it's not needed as we could just use "^foo" on the rhs with %in~% (the nuance is that startsWith is fixed, but that might not add that much value).


I was still confused by what you meant by %#in%, now I think I get it, and I wonder if it shouldn't be just %#%, see end of post.

I also wonder if %in~% shouldn't be just %~%, actually Romain François has it implemented in his package operators (though it doesn't comply to our standards regarding data frames) :

https://cran.r-project.org/web/packages/operators/operators.pdf

Romain François has good ideas too, for instance he uses a * suffix to wrap in all, and a + option to wrap in any.

One additional thought that I want to run by you is - if we use %[in]% to subset, we can probably also add %|in|% for number of elements satisfying the match

I like it, and it would make sense if we keep the %[foo]% subsetting.

A spontaneous idea though, would it make sense to have only the right side bracket : %[in% or does it look weird ? we'd be consistent with a [output_type][negation][operation_type][option] , and it wouldn't overcharge the right side as in %[in[]]%.

  • output_type : [ for subsetting, @ for numeric indices, + for counts ?, *none* for logical indices
  • negation : ! or *none*
  • operation_type : in for interval and sets, ~ for regex, ==, > etc for comparison , # to filter on number of occurrences
  • option : , [], [) etc for in, f or p for ~, *none*

11:13 %[!=% 12
#> [1] 11 13

11:13 %@!=% 12
#> [1] 1 3

11:13 %+==% 12
#> [1] 1

c(11, 11, 12, 13, 13, 13) %#% 2
#> [1] TRUE TRUE FALSE TRUE TRUE TRUE

c(11, 11, 12, 13, 13, 13) %[#% 2
#> [1] 11 11 13 13 13

These would be paired consistently with [negation][operation_type][option]<- operators.


Additional possible output types:

  • / for ratio ? (Romain François uses it for split, which is nice too)
  • * for all ? (borrowed from Romain)
  • ? for any ? (if we think + is better suited for a sum)
  • [[ for unique subsetting ?

11:13 %/>% 12
#> [1] 0.3333333

11:13 %?>% 12
#> [1] TRUE

11:13 %*>% 12
#> [1] FALSE

c(11, 11, 12, 13, 13, 13) %[[#% 2
#> [1] 11 13
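
A sketch of three of the output-type prefixes from this grammar ([ for subsetting, @ for numeric indices, + for counts). The definitions are assumptions for illustration:

```r
# Hypothetical definitions following the [output_type][negation][operation_type] scheme.
`%[!=%` <- function(e1, e2) e1[e1 != e2]      # subset
`%@!=%` <- function(e1, e2) which(e1 != e2)   # numeric indices
`%+==%` <- function(e1, e2) sum(e1 == e2)     # count of matches

kept <- 11:13 %[!=% 12
idx  <- 11:13 %@!=% 12
n    <- 11:13 %+==% 12
```

Since `:` binds tighter than any %op% in R, 11:13 is evaluated before the operator is applied.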

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

I would def agree with %~%. I suggested the same 20 days ago, but at the time we agreed that it might overshadow some defined operator in a popular package and introduce conflicts.

I like the @ for indices. I think it's quite logical and makes sense.

But the problem with %#% is that in this case we wouldn't be able to do something like "select all factors that occur exactly 3 times". It would always be "less than"...

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

Overall, I like almost all of your suggestions actually. Maybe not [[ - because it's quite non-intuitive. However with the pattern you are proposing, I think it would make sense even to drop in. Because I can see little justification for only using it with ranges.

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

Overall I think we can go in several different directions. Two of which I see as the following:

Using the words to specify operations.

  1. %in()% = get logical indices
  2. %get()% for subsetting
  3. Something like %ind<% or %which<% to get numeric indices
  4. %any<% and %all<% for existence/every checks.
  5. %tab<% or %table<% - a possible replacement for #.

etc. Then we should always keep the word (even in %in~% - to be consistent).

Using only symbols to specify operations.

  1. %{}%, %[)%, %!<% etc. for logical indices.
  2. %[<% for subsetting. Though %[[]% would look a bit weird.
  3. %@{}% etc. for numeric indices.
  4. %?{}% and %*{}% for existence/every.
  5. %#<% for tables.

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

One correction to the above: just now remembered that %tab% is a different beast. We would need to provide all the others like %get%, %ind%, etc for %tab% as well.

But I only have %#in()% and %in#()% for now.

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

I would def agree with %~%. I suggested the same 20 days ago, but at the time we agreed that it might overshadow some defined operator in a popular package and introduce conflicts.

oops :). Yes I still think it's a bit annoying, but they do the same most of the time and our package can replace it completely so I think it might be not that bad. %{}% etc are a bit more confusing because they don't do the same, but if our package covers all related functionalities users shouldn't attach them at the same time anyway I guess...

[[ doesn't look good indeed, I was thinking that something for unique would be nice and just picked whatever in the moment.

I think %get()% is too close to get() which is very different, in that case %subset()% would be worth the two additional characters in my opinion, but it seems you don't like this one much :).

You're right about the 2 possible directions I think, I'll need to sleep on it a few times as I have no strong opinion for now, we could also have aliases while we test the package and see what feels right.

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

I'll need to sleep on it a few times as I have no strong opinion for now.

Yup please do! I am leaning a bit more towards the words now, for some reason. I played a bit with both variants and words seem easier to understand/remember and actually to write down (as I type letters faster than symbols). But I am not dead-set on it too much yet.

The %get% can be renamed of course. I would like %subset% but would prefer for it to be shorter if possible. Like %val% (as in "value") or %sub% (as in "subset" - however this would also coincide with sub())

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

| output type | words | symbols | description |
|---|---|---|---|
| logical | == / != | | equality / inequality |
| logical | > / >= / < / <= | | comparison |
| logical | %in()% / %!in()% | %()% / %!()% | open interval |
| logical | %in(]% / %!in(]% | %(]% / %!(]% | open left closed right |
| logical | %in[)% / %!in[)% | %[)% / %![)% | open right closed left |
| logical | %in[]% / %!in[]% | %[]% / %![]% | closed interval |
| logical | %in{}% / %!in{}% | %{}% / %!{}% | generalized %in% |
| logical | %in~% / %!in~% / %in~f% / %!in~f% / %in~p% / %!in~p% | %~% / %!~% / %~f% / %!~f% / %~p% / %!~p% | regex |
| subset | %subset==% / %subset!=% | %[==% / %[!=% | equality / inequality |
| subset | %subset>% / %subset>=% / %subset<% / %subset<=% | %[>% / %[>=% / %[<% / %[<=% | comparison |
| subset | %subset()% / %subset!()% | %[()% / %[!()% | open interval |
| subset | %subset(]% / %subset!(]% | %[(]% / %[!(]% | open left closed right |
| subset | %subset[)% / %subset![)% | %[[)% / %[![)% | open right closed left |
| subset | %subset[]% / %subset![]% | %[[]% / %[![]% | closed interval |
| subset | %subset{}% / %subset!{}% | %[{}% / %[!{}% | generalized %in% |
| subset | %subset~% / %!subset~% / %subset~f% / %!subset~f% / %subset~p% / %!subset~p% | %[~% / %[!~% / %[~f% / %[!~f% / %[~p% / %[!~p% | regex |
| numeric indices | %which==% / %which!=% | %@==% / %@!=% | equality / inequality |
| numeric indices | %which>% / %which>=% / %which<% / %which<=% | %@>% / %@>=% / %@<% / %@<=% | comparison |
| numeric indices | %which()% / %which!()% | %@()% / %@!()% | open interval |
| numeric indices | %which(]% / %which!(]% | %@(]% / %@!(]% | open left closed right |
| numeric indices | %which[)% / %which![)% | %@[)% / %@![)% | open right closed left |
| numeric indices | %which[]% / %which![]% | %@[]% / %@![]% | closed interval |
| numeric indices | %which{}% / %which!{}% | %@{}% / %@!{}% | generalized %in% |
| numeric indices | %which~% / %!which~% / %which~f% / %!which~f% / %which~p% / %!which~p% | %@~% / %@!~% / %@~f% / %@!~f% / %@~p% / %@!~p% | regex |
| every | %all==% / %all!=% | %*==% / %*!=% | equality / inequality |
| every | %all>% / %all>=% / %all<% / %all<=% | %*>% / %*>=% / %*<% / %*<=% | comparison |
| every | %all()% / %all!()% | %*()% / %*!()% | open interval |
| every | %all(]% / %all!(]% | %*(]% / %*!(]% | open left closed right |
| every | %all[)% / %all![)% | %*[)% / %*![)% | open right closed left |
| every | %all[]% / %all![]% | %*[]% / %*![]% | closed interval |
| every | %all{}% / %all!{}% | %*{}% / %*!{}% | generalized %in% |
| every | %all~% / %!all~% / %all~f% / %!all~f% / %all~p% / %!all~p% | %*~% / %*!~% / %*~f% / %*!~f% / %*~p% / %*!~p% | regex |
| any | %any==% / %any!=% | %?==% / %?!=% | equality / inequality |
| any | %any>% / %any>=% / %any<% / %any<=% | %?>% / %?>=% / %?<% / %?<=% | comparison |
| any | %any()% / %!any()% | %?()% / %?!()% | open interval |
| any | %any(]% / %!any(]% | %?(]% / %?!(]% | open left closed right |
| any | %any[)% / %!any[)% | %?[)% / %?![)% | open right closed left |
| any | %any[]% / %!any[]% | %?[]% / %?![]% | closed interval |
| any | %any{}% / %!any{}% | %?{}% / %?!{}% | generalized %in% |
| any | %any~% / %!any~% / %any~f% / %!any~f% / %any~p% / %!any~p% | %?~% / %?!~% / %?~f% / %?!~f% / %?~p% / %?!~p% | regex |

words :

  • words can be extended
  • easier to type
  • more readable

symbols :

  • shorter and more consistent lengths
  • some of them are really neat and readable, such as %[<% (the least readable ones are not the most used ones)

Ambiguity in word versions due to positioning of ! (before in, after subset), can be mitigated by :

  • using symbol versions for logical only, and word versions for the rest
  • using %in!()% etc (repositioning ! to be consistent)
  • bringing back %out%

We can define aliases so we could have both versions

Maybe all and any versions should be put in the fridge and we'll see if we need them.

My current preference is :

  • only symbols for logical
  • words for the rest OR words AND symbol aliases
  • skip all and any for now

In any case that's a lot of functions and we might need function factories to build them, to avoid a huge amount of copy / paste (will need its own issue)
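
A minimal sketch of such a function factory, assuming each operator is one comparison combined with one output type. The names make_op and in_closed are hypothetical, not part of the package:

```r
# Hypothetical factory: builds one operator from a comparison and an output type.
make_op <- function(compare, output = c("logical", "subset", "which")) {
  force(compare)
  output <- match.arg(output)
  function(e1, e2) {
    hit <- compare(e1, e2)
    switch(output,
      logical = hit,         # logical vector
      subset  = e1[hit],     # matching values
      which   = which(hit)   # numeric indices
    )
  }
}

# Closed interval test, shared by all three variants.
in_closed <- function(x, interval) x >= interval[1] & x <= interval[2]

`%in[]%`     <- make_op(in_closed)
`%subset[]%` <- make_op(in_closed, "subset")
`%which[]%`  <- make_op(in_closed, "which")

x <- c(1, 5, 10, 15)
res_logical <- x %in[]% c(5, 10)
res_subset  <- x %subset[]% c(5, 10)
res_which   <- x %which[]% c(5, 10)
```

With this layout each interval or comparison is written once, and the table of names becomes a loop over comparisons and output types.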

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

Agree that we might need a function factory. Though we can probably also "export" main functionality to functions non-accessible for the user.

As for the words - the long versions of words do not bother you? %subset% for example seems quite a long name. None of the shorter versions of this were appealing?

We can leave the "all" and "any" out for now. Though, personally, I would use these more than "which". I almost never use indices for subsetting and try to stay with logical.

Nice table by the way. The only thing I would change is - make the placement of ! consistent. I don't mind having it always at the front actually. Or always at the back (after the word).

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

I don't care that much for which either, it's just that I think they'll be straightforward while all and any might entail more discussions because !all[] is not like all![], so which do we need, we could have !all![] etc, and I think they won't be used that much so by lack of familiarity we'll end up wrapping in all anyway (just as we might do with any and which).

We can skip which as well, to move forward.

I don't like having all ! in the front because negating a subset doesn't make sense to me. This issue is made clear in the case of all as in my 1st paragraph above, !all[] and all![] mean different things and !subset[] means nothing.

I propose we go with words with ! at the back (incl %in![]% etc), and we go with %val**% to subset. Meanwhile we keep short aliases at least until the end of our test run, then we decide what we choose or if we keep aliases.

I propose we stick with the logical, subset, and replace variants, the latter being assignment versions of logical ones.

I believe the # / table versions were important to you, I didn't include them here because the expected outputs are still not clear to me, so I propose to let you experiment with them and code them as you see fit if you want, PR them and we can discuss them later.

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

I agree with everything you wrote here. Let's keep which and any and all out for now. I like val better than subset. And I agree on placing ! at the end after the word.

One thing I would like to hear your opinion on: would it not be more logical to have the replace operator work on ind (subset) instead of in? Somehow this would make more sense to me - as the operator is extracting elements, so it can also replace them. I am thinking names(x) and names(x)<- or x[2] and x[2] <- 0. i.e. replacement works on syntax that returns elements otherwise.

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

The thing is if we're trying to be consistent with ==, and we want to have x == 3 <- 4 , this needs to be done on logical functions.

I see it as simpler too, because x %in{}% foo <- value is a shortcut for x[x %in{}% foo] <- value
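
A sketch of how this relies on R's replacement-function mechanism: `f(x, y) <- value` calls a function named `f<-` with a final argument called value. The definitions below are assumptions for illustration:

```r
# Hypothetical replacement forms; the last argument must be named `value`.
`%in{}%<-` <- function(x, table, value) {
  x[x %in% table] <- value
  x
}
`==<-` <- function(e1, e2, value) {
  e1[e1 == e2] <- value
  e1
}

x <- c("a", "b", "c")
x %in{}% c("a", "b") <- "z"   # same effect as x[x %in% c("a", "b")] <- "z"

y <- c(1, 3, 3)
y == 3 <- 4                   # same effect as y[y == 3] <- 4
```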

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

while we're on names I wanted to touch a point on argument names too, as we're moving to more consistency with ==, it might make sense to switch back to e1 and e2 (=='s arguments).

x and table are weird arguments, I know I was the one to push for them, but I wanted to be consistent with %in% and I don't think it's the right choice anymore :).

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

Here is one more syntax format we might consider (if bringing back out):

| output type | mixed | description |
|---|---|---|
| logical | | equality / inequality |
| logical | | comparison |
| logical | %in()% / %out()% | open interval |
| logical | %in(]% / %out(]% | open left closed right |
| logical | %in[)% / %out[)% | open right closed left |
| logical | %in[]% / %out[]% | closed interval |
| logical | %in{}% / %out{}% | generalized %in% |
| logical | %in~% / %out~% / %in~f% / %out~f% / %in~p% / %out~p% | regex |
| subset | %[==% / %[!=% | equality / inequality |
| subset | %[>% / %[>=% / %[<% / %[<=% | comparison |
| subset | %[in()% / %[out()% | open interval |
| subset | %[in(]% / %[out(]% | open left closed right |
| subset | %[in[)% / %[out[)% | open right closed left |
| subset | %[in[]% / %[out[]% | closed interval |
| subset | %[in{}% / %[out{}% | generalized %in% |
| subset | %[in~% / %[out~% / %[in~f% / %[out~f% / %[in~p% / %[out~p% | regex |
| numeric indices | %@in==% / @out=% | equality / inequality |
| numeric indices | %@in>% / %@in>=% / %@in<% / %@in<=% | comparison |
| numeric indices | %@in()% / %@out()% | open interval |
| numeric indices | %@in(]% / %@out(]% | open left closed right |
| numeric indices | %@in[)% / %@out[)% | open right closed left |
| numeric indices | %@in[]% / %@out[]% | closed interval |
| numeric indices | %@in{}% / %@out{}% | generalized %in% |
| numeric indices | %@in~% / %@out~% / %@in~f% / %@out~f% / %@in~p% / %@out~p% | regex |
| every | %*in==% / *out=% | equality / inequality |
| every | %*in>% / %*in>=% / %*in<% / %*in<=% | comparison |
| every | %*in()% / %*out()% | open interval |
| every | %*in(]% / %*out(]% | open left closed right |
| every | %*in[)% / %*out[)% | open right closed left |
| every | %*in[]% / %*out[]% | closed interval |
| every | %*in{}% / %*out{}% | generalized %in% |
| every | %*in~% / %*out~% / %*in~f% / %*out~f% / %*in~p% / %*out~p% | regex |
| any | %?in==% / ?out=% | equality / inequality |
| any | %?in>% / %?in>=% / %?in<% / %?in<=% | comparison |
| any | %?in()% / %?out()% | open interval |
| any | %?in(]% / %?out(]% | open left closed right |
| any | %?in[)% / %?out[)% | open right closed left |
| any | %?in[]% / %?out[]% | closed interval |
| any | %?in{}% / %?out{}% | generalized %in% |
| any | %?in~% / %?out~% / %?in~f% / %?out~f% / %?in~p% / %?out~p% | regex |

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

Also - do you think we will try to extend this to include things like %cut%? Knowing this beforehand might help in choosing the appropriate naming conventions.

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

What would %cut% do ? I think I'm comfortable using functions for cut as there are so many ways to do it, I even wrote a package just for that :). https://github.com/moodymudskipper/cutr

The syntax with %out% looks good here, its only issue is that it can't be easily extended as every functionality is linked to a symbol, but we're most probably fine with what we have here anyway, and I'm ok with moving on with it.

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

Yup I am aware of the package. Already complimented it when you showed it on reddit :) Why are you not putting your packages on CRAN? Do you think they are not ready?

As for the suggestion/question cut could basically apply the replace operators multiple times. Like:

x %in[)% c(1,10) <- "low"
x %in[)% c(10,20) <- "medium"
x %in[)% c(20,30) <- "high"

Would be something like:

x %cut[)% list(low=c(1,10), medium=c(10,20), high=c(20,30))

The nice thing is that it would play nicely with all the ranges that we have (even with in~, etc). And would provide a consistent way between subsetting, checking, and "cutting" into groups.
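
A sketch of what such an operator might do. How values outside every interval should be treated is not settled here, so returning NA for them is an assumption, as is the whole definition:

```r
# Hypothetical %cut[)%: label each value by the first left-closed, right-open
# interval that contains it; values in no interval stay NA (an assumption).
`%cut[)%` <- function(x, intervals) {
  out <- rep(NA_character_, length(x))
  for (label in names(intervals)) {
    rng <- intervals[[label]]
    hit <- is.na(out) & x >= rng[1] & x < rng[2]
    out[hit] <- label
  }
  out
}

x <- c(0, 1, 9.9, 10, 25, 30)
groups <- x %cut[)% list(low = c(1, 10), medium = c(10, 20), high = c(20, 30))
```

Writing the labels into a separate vector keeps x numeric. Chaining the three replacement calls on x itself would turn x into a character vector after the first assignment, so the later comparisons would be string comparisons.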

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

As mentioned in #4 - we can probably drop %@in%, %?in%, %*in%, etc. In the code they would be more confusing than simply using a proper function. i.e.:

if(any(x %in()% c(0, 10))) { ... }

vs

if(x %?in()% c(0, 10)) { ... }

I would probably choose the first one for clarity.

If that is the case - we are only left with two naming problems:

  1. do we use ! or out

  2. how to name the subset operator?

Some possibilities:

%[in()%
%[in()]%
%val()% # in this case - how to negate?
%IN()%
%subset()%

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

I have dotdot on CRAN (a simple 5-line function to grow variables without repetition) and will put unglue (text extraction) there as soon as I fix a couple of bugs, as it had some unexpected Twitter success.

My other CRAN candidates are :

  • cutr and safejoin (a wrapper on dplyr joins), for which I sometimes feel I've bitten off a bit more than I could chew, and I still have existential doubts about some aspects of the interface.
  • pipes (a magrittr fork), which is ready to go but has functionality redundant with tags, which I prefer to use, and I feel like it's a bit weird to advertise a package I don't use that much.
  • tag and tags (adverb factories with a special syntax), which I feel are my best work but need more unit tests. They were a big flop on Reddit and failed to attract much attention whenever I suggested them on Twitter, so I have to find an angle to explain them simply and advertise their value... but I will definitely submit them, I use those all the time.
  • doubt (pseudo operators with ?'s precedence) which still needs a lot of work but I think people will like it, or be curious at least.
  • maybe withDT (data table for shy users), pbfor (use progress bars with regular for) and mmdb (assign and query tables of a database as if it was an environment) but I'm not sure if users will care that much.

The problem is that I always start new stuff and then get overwhelmed, take a break, and come back with a new idea. And I don't have sparring partners, it's the first time here :).

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

Isn't this redundant ?

x %cut[)% list(low=c(1,10), medium=c(10,20), high=c(20,30))

What would the following do, and what do we have below 1 and above 30, NAs ?

x %cut[)% list(low=c(1,5), medium=c(10,20), high=c(20,30))

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

Regarding cut:

I do not think this is redundant, because doing it line by line might not be possible. I.e. the first replacement, x %in(]% c(1, 10) <- "low", would transform x into character. Unless we expand the ranges to work on multiple ranges at once, in which case cut would probably be redundant.

Regarding the left out ranges:

yup NA would be proper value for this case
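The coercion problem can be reproduced with a small sketch. The replacement function below is a hypothetical stand-in for a `%in[)%<-` operator, assuming half-open intervals; it is not the package's code:

```r
# Hypothetical replacement form: assign `value` where lower <= x < upper.
`%in[)%<-` <- function(x, interval, value) {
  x[x >= interval[1] & x < interval[2]] <- value
  x
}

x <- c(2, 12, 25)
x %in[)% c(1, 10) <- "low"
x         # "low" "12" "25"
class(x)  # "character": later range tests would compare strings, not numbers

# Computing labels from the untouched numeric vector avoids the problem.
y <- c(2, 12, 25)
labels <- rep(NA_character_, length(y))
labels[y >= 1 & y < 10] <- "low"
labels[y >= 10 & y < 20] <- "medium"
labels[y >= 20 & y < 30] <- "high"
labels    # "low" "medium" "high"
```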

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

Regarding packages:

Yup, I see, you have quite a few. In your case I would probably zoom in on the ones I would like to work on and support, and drop the rest, or leave them on GitHub. I've seen tags but have to admit - I did not quite get the purpose of it :/ Maybe I spent too little time reading the docs. Out of all the listed ones - I like cutr the most. Maybe because that's the one I can see myself using.

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

Re :

%val()% # in this case - how to negate?

We'd need %val!()%, in that case, I don't see another way.

But as far as I'm concerned, I say we go with your latest "mixed" proposal. My objections to %out% are not valid anymore, because:

  • link with %in% is explicit => we don't rely on %in% much in the end
  • analogy with != => valid only if we switch its position to the right: %in!()%, which doesn't read as well
  • generalizable, e.g. %!subset[]% => makes no sense, as I've argued with the %!all()% / %all!()% example, and the [output_type][negation][operation_type][option] format, which makes the most sense, wouldn't be respected.

And I'd like to keep pure symbol aliases for the test run, as I feel they're growing on me and I might end up liking those better.

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

I'm not convinced by the cut variant. I think if we can afford to type list and a couple of parentheses, we can call the function directly. Getting around the fact that infix operators are binary by feeding them a rhs in a list seems wrong to me. You can still code it in there and we'll see where it goes, but I don't think it's good design.

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

Are you OK with renaming the arguments to e1 and e2? I'm OK with more explicit names too, for example:

  • x and set for %in{}%
  • x and range for %in[]%
  • x and pattern for %in~%

but e1 / e2 is easy and consistent with == etc, it doesn't change much anyway

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

You're right about packages, it's my goal to finish more stuff and start less :), thanks for the advice! now marking all of these as off topic! ;)

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024
  • 1- RE negation names: OK let's keep in and out, I quite like this. You can try pure symbol aliases as well of course, if we simplify everything they might be very convenient. But how would you do subset with only symbols? %[()%?

  • 2- RE e1 and e2: I am fine with e1 and e2 but in this case we would probably have to drop range() from %in()% ?

  • 3- RE cut: I think it would be a good idea. For example - right now we cannot do overlaps with several ranges: x %in()% list(c(-Inf,10), c(14,16), c(20,Inf)). If we allow this, then replacement for this type of construct is just the next step. And with this we would have all we need. So basically there would be no cut variant, but simple cut operations would be performed by replacing. It doesn't have to be a list; for example, the DescTools package puts multiple ranges in rows of matrices, so it uses rbind(). But I think a list is more convenient. In any case, if we consider this - it would be after the first CRAN release, I think.

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024
  • 1 - yes

  • 2- oops, it's embarrassing, but I was under the impression that every operator had x and table arguments. I was not thinking straight. I still think that table is a strange argument name, but overall the current situation is fine with me.

  • 3- x %in()% list(c(-Inf,10), c(14,16), c(20,Inf)) could make sense, it would be a breaking change though, unless we make it fail for now for lists that can't be coerced to numeric (now range() will unlist and take the min and max). I stay open minded about your cut proposal :), just not convinced yet!

from inops.

karoliskoncevicius avatar karoliskoncevicius commented on June 1, 2024

Regarding 2: %in% uses table as an argument name, so I got it from there...

Regarding 3: making it fail on lists and maybe extending the functionality later might be the correct approach, I think. This is something that we should definitely leave until after the first submission to CRAN. But I just think it provides quite a convenient way to "cut" data into parts without actually thinking about this as "cutting". And we could use the same style with x %in~% list(pattern1, pattern2) <- c("a", "b"), which is natural with this syntax, but out of scope for cut.
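A rough sketch of what membership in several ranges at once could look like, assuming open intervals and a list of length-2 numeric vectors. The helper name `in_any_open` is hypothetical and chosen so that it does not suggest an existing operator:

```r
# Hypothetical helper: TRUE where x falls inside at least one open range.
in_any_open <- function(x, ranges) {
  # accept a single c(lower, upper) vector as well as a list of them
  if (!is.list(ranges)) ranges <- list(ranges)
  hits <- lapply(ranges, function(r) x > r[1] & x < r[2])
  Reduce(`|`, hits)
}

x <- c(5, 12, 15, 18, 25)
in_any_open(x, list(c(-Inf, 10), c(14, 16), c(20, Inf)))
# TRUE FALSE TRUE FALSE TRUE
```

Treating each list element as its own range is what makes this differ from collapsing the whole list to a single minimum and maximum.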

from inops.

moodymudskipper avatar moodymudskipper commented on June 1, 2024

We go with the naming conventions of the table above, mixed form is used, symbols-only form is kept for aliases.

closing and pinning, reboot in new thread after 1st release if necessary.

from inops.
