Comments (5)

davidanthoff commented on June 24, 2024

The whole point of having a data type for missing values is to have something that is not in base, so that we can iterate faster. But if this data type is not in base, a filter in a generator can't special-case it, so there is no way to get filter expressions in a generator to drop NA rows. We would already end up with a discrepancy between generators and queries, which I want to avoid at all costs.

I also just generally think that you make a system a LOT more complicated the moment you introduce 3VL. That aspect of SQL is not a strength, it makes things much more complicated.

I'd also like to point out that C# had this behavior for over a decade, and there is almost no complaining about it (unlike the SQL story).

Finally, I think there are a gazillion examples of people writing functions that don't properly deal with all the input combinations. This is one more case. I don't see why it is any different from, say, a function that forgot to deal with negative numbers.

from datavalues.jl.

davidanthoff commented on June 24, 2024

@ti-s My current plan is this: have == and all the other comparison operators return Bool. And then add lifting support via the dot syntax, so that .== returns a Nullable{Bool}.
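
The two behaviors in this plan can be sketched roughly as follows. This is only an illustration under stated assumptions: `Opt`, `hasvalue`, `eq_bool`, and `eq_lifted` are hypothetical names, not the DataValues.jl API, and `nothing` stands in for an empty `Nullable{Bool}`. In the plan itself, the two roles would be played by `==` and `.==`.

```julia
# Hypothetical stand-in for a value that may be missing (not the DataValues.jl type).
struct Opt
    value::Union{Int,Nothing}
end

const NA = Opt(nothing)
hasvalue(x::Opt) = x.value !== nothing

# Plain comparison: always a Bool, falling back to false when either side is missing.
eq_bool(a::Opt, b::Opt) = hasvalue(a) && hasvalue(b) && a.value == b.value

# Lifted comparison: `nothing` (standing in for an empty Nullable{Bool})
# when either side is missing, otherwise the ordinary Bool result.
eq_lifted(a::Opt, b::Opt) = hasvalue(a) && hasvalue(b) ? a.value == b.value : nothing

eq_bool(NA, Opt(1))      # false
eq_lifted(NA, Opt(1))    # nothing
```

The plain form can be used directly in `if` and `filter`, while the lifted form keeps the information that the answer is unknown.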

ti-s commented on June 24, 2024

I have never used C# but one difference could be that in Julia, it is common to type function signatures as loosely as possible and rely on duck typing to catch errors. If that is not the case in C#, it might happen more often in Julia that one gets silently wrong results when passing a DataValue to a function from a completely unrelated package that was written without missing values in mind.

Here is an example from base that gives unexpected results with DataValues (it uses x!=x to test for NaN but that does not work for NA):

julia> const NA = DataValue()
DataValue{Union{}}()

julia> findmax([NA, 1, 2, 3])
(DataValue{Int64}(), 1)

julia> findmax([1, 2, 3, NA])
(DataValue{Int64}(3), 3)
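
The mechanism behind these results can be reproduced with a small self-contained sketch. Everything here is an assumption made for illustration: `MaybeInt` is a hypothetical stand-in for a DataValue, its comparisons are assumed to return `Bool` (two missing values compare equal, and `<` falls back to `false` when either side is missing), and `naive_findmax` is a simplified loop in the spirit of a `findmax` that relies on `x != x`, not the actual Base source.

```julia
# Hypothetical stand-in for a DataValue holding an Int (not the DataValues.jl type).
struct MaybeInt
    hasvalue::Bool
    value::Int
end
MaybeInt() = MaybeInt(false, 0)        # missing value
MaybeInt(v::Int) = MaybeInt(true, v)   # present value

# Assumed Bool-returning comparisons: two missing values are equal,
# and `<` is false whenever either side is missing.
function Base.:(==)(a::MaybeInt, b::MaybeInt)
    if a.hasvalue && b.hasvalue
        return a.value == b.value
    end
    return a.hasvalue == b.hasvalue
end
Base.:(<)(a::MaybeInt, b::MaybeInt) = a.hasvalue && b.hasvalue && a.value < b.value

# Simplified maximum search that uses `m != m` to detect a NaN running maximum.
function naive_findmax(xs)
    m, mi = xs[1], 1
    for i in 2:length(xs)
        x = xs[i]
        if m != m || x > m
            m, mi = x, i
        end
    end
    return m, mi
end

naive_findmax([NaN, 1.0, 2.0])                                      # (2.0, 3): the NaN is replaced
naive_findmax([MaybeInt(), MaybeInt(1), MaybeInt(2), MaybeInt(3)])  # missing value kept, index 1
naive_findmax([MaybeInt(1), MaybeInt(2), MaybeInt(3), MaybeInt()])  # MaybeInt(3), index 3
```

Under these assumptions, a leading missing value is never replaced, because `m != m` is `false` for it and every `x > m` is `false` as well. A trailing missing value is simply never selected.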

It would be safer to have comparisons return a Nullable{Bool} (or DataValue{Bool}) and dotted operators return a Bool. Then, conditionals would not silently give wrong results if the author didn't anticipate the use of DataValues. Also, it would need only one additional dot to fix the cases where the fallback to false is correct.
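
As a rough illustration of this safety argument, the sketch below uses Julia's built-in `missing` (Julia 1.0 and later). That is an assumption for the sake of a runnable example: `missing` is not the DataValue type discussed here, but its `==` propagates missingness, and a non-`Bool` condition raises an error instead of silently taking a branch. The helpers `is_one` and `is_one_lenient` are hypothetical, and `coalesce` plays the role that the extra dot would play in the proposal.

```julia
# A function written without missing values in mind.
is_one(x) = x == 1 ? "one" : "not one"

# With a propagating `==`, the condition is `missing`, so the conditional
# throws a TypeError rather than silently picking the "not one" branch.
outcome = try
    is_one(missing)
catch err
    err isa TypeError ? "error raised" : rethrow()
end

# Where the fallback to false is the intended behavior, opting in is a
# small, explicit change.
is_one_lenient(x) = coalesce(x == 1, false) ? "one" : "not one"

is_one_lenient(missing)   # "not one"
is_one_lenient(1)         # "one"
```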

This is also different to SQL, which uses 3VL but silently interprets Unknown as false in conditionals.

davidanthoff commented on June 24, 2024

Yeah, things can go wrong with this. But I almost feel that in this example findmax should be changed, it seems to rely on a pretty specific implementation detail of floating point numbers in an algorithm that is supposed to work with all sorts of types.

The core problem is that the convention in base now is that the lifted version of a function that you get with a . will return a datatype that can represent missingness (i.e. Nullable in base). To change that convention for one operator in this package here seems really confusing and inconsistent.

> This is also different to SQL, which uses 3VL but silently interprets Unknown as false in conditionals.

I know, but I don't like that one bit; I think it makes things way too difficult. I also don't see any chance that base will adopt this convention for things like if, etc.

ti-s commented on June 24, 2024

I agree that findmax needs to change. But my point is that this kind of error might be more frequent in Julia than in C#, because Julia functions often rely on duck typing (I don't know if that is common in C#).

To clarify, I don't propose to copy SQL. I mentioned the difference to SQL as a potential advantage of my proposal.

An alternative could be to have comparisons with NA return DataValue(false) instead of NA or false, thereby avoiding 3VL and still having the safety of error messages in boolean contexts. I would like to see an example where this would be simpler than 3VL.

Not being able to use dotted operators because of the inconsistency with Nullable is unfortunate. The need to wrap every expression with potential NAs into a function in boolean contexts could be a bit annoying. But personally I would prefer the increased safety over the loss of convenience.
