Comments (5)
The whole point of having a data type for missing values is to have something that is not in base, so that we can iterate faster. But if this data type is not in base, then a filter in a generator can't special-case it, so there is no way to get filter expressions in a generator to drop NA rows. So we would already end up with a discrepancy between generators and queries, which I want to avoid at all costs.
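To make the generator point concrete, here is a minimal sketch. It assumes a hypothetical `NAType` marker and two hypothetical comparison helpers (`gt_bool`, `gt_3vl`); none of these names come from DataValues.jl or base. The only base behavior it relies on is that the `if` clause of a generator accepts nothing but a `Bool`.

```julia
# Hypothetical missing-value marker defined outside base (not the DataValues.jl type).
struct NAType end
const NA = NAType()

# Comparison that always returns a Bool: false when either side is missing.
gt_bool(x, y) = (x isa NAType || y isa NAType) ? false : x > y

# 3VL-style comparison: missingness propagates into the result.
gt_3vl(x, y) = (x isa NAType || y isa NAType) ? NA : x > y

data = Any[1, NA, 3]

# With a Bool-returning comparison the filter simply drops the NA row.
kept = [x for x in data if gt_bool(x, 1)]

# With a 3VL result the filter has a non-Bool in a boolean context and throws,
# because the generator cannot special-case a type it does not know about.
threw = try
    [x for x in data if gt_3vl(x, 1)]
    false
catch err
    err isa TypeError
end
```

Under these assumptions `kept` contains only `3`, and `threw` is `true`.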
I also just generally think that you make a system a LOT more complicated the moment you introduce 3VL. That aspect of SQL is not a strength, it makes things much more complicated.
I'd also like to point out that C# has had this behavior for over a decade, and there is almost no complaining about it (unlike the SQL story).
Finally, I think there are a gazillion examples of how people can write functions where they didn't properly deal with all the input combinations. This is one more case; I don't see why this is any different from, say, a function that forgot to deal with negative numbers or something like that.
from datavalues.jl.
@ti-s My current plan is this: have `==` and all the other comparison operators return `Bool`, and then add lifting support via the dot syntax, so that `.==` returns a `Nullable{Bool}`.
I have never used C#, but one difference could be that in Julia it is common to type function signatures as loosely as possible and rely on duck typing to catch errors. If that is not the case in C#, it might happen more often in Julia that one gets silently wrong results when passing a `DataValue` to a function from a completely unrelated package that was written without missing values in mind.
Here is an example from base that gives unexpected results with `DataValue`s (it uses `x != x` to test for `NaN`, but that does not work for `NA`):
```julia
julia> const NA = DataValue()
DataValue{Union{}}()

julia> findmax([NA, 1, 2, 3])
(DataValue{Int64}(), 1)

julia> findmax([1, 2, 3, NA])
(DataValue{Int64}(3), 3)
```
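The mechanism can be sketched without the package. The code below is not base's `findmax`; it is a simplified scan that uses the same `x != x` idiom, and it assumes (purely for illustration) a hypothetical `NAType` marker for which every comparison returns `false`. The helpers `neq`, `lt` and `naive_findmax` are made-up names.

```julia
# Hypothetical missing-value marker; not the DataValues.jl type.
struct NAType end
const NA = NAType()

# Assumed semantics for this sketch: any comparison involving NA returns false.
neq(x, y) = (x isa NAType || y isa NAType) ? false : x != y
lt(x, y)  = (x isa NAType || y isa NAType) ? false : isless(x, y)

# Simplified maximum search that detects NaN via `x != x`.
function naive_findmax(a)
    m, mi = a[1], 1
    for i in 2:length(a)
        neq(m, m) && break              # meant to stop once m is NaN
        ai = a[i]
        if neq(ai, ai) || lt(m, ai)     # NaN wins, otherwise the larger value wins
            m, mi = ai, i
        end
    end
    return m, mi
end

r_front = naive_findmax(Any[NA, 1, 2, 3])   # NA is never replaced
r_back  = naive_findmax(Any[1, 2, 3, NA])   # NA is never picked up
r_nan   = naive_findmax([1.0, NaN, 3.0])    # the NaN check works as intended
```

With these assumed semantics the result depends on where `NA` sits: `r_front` is `(NA, 1)`, `r_back` is `(3, 3)`, while the `NaN` case stops at index 2 as the idiom intends.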
It would be safer to have comparisons return a `Nullable{Bool}` (or `DataValue{Bool}`) and dotted operators return a `Bool`. Then conditionals would not silently give wrong results if the author didn't anticipate the use of `DataValue`s. Also, it would take only one additional dot to fix the cases where the fallback to `false` is correct.
This is also different to SQL, which uses 3VL but silently interprets `Unknown` as `false` in conditionals.
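A rough sketch of the safety argument, under stated assumptions: `MaybeBool` is a hypothetical stand-in for `DataValue{Bool}`, and `lt_bool`, `lt_lifted`, `unwrap_or` and `sign_label` are made-up names, not part of any package. It relies only on the fact that Julia raises a `TypeError` when a non-`Bool` value is used in a conditional.

```julia
struct NAType end                 # hypothetical missing-value marker
const NA = NAType()

struct MaybeBool                  # hypothetical stand-in for DataValue{Bool}
    hasvalue::Bool
    value::Bool
end

# Variant 1: the comparison falls back to a plain `false` when a value is missing.
lt_bool(x, y) = (x isa NAType || y isa NAType) ? false : x < y

# Variant 2: the comparison returns a wrapper that can represent missingness.
lt_lifted(x, y) = (x isa NAType || y isa NAType) ? MaybeBool(false, false) : MaybeBool(true, x < y)

# Explicit opt-in to the fallback, playing the role of the "one additional dot".
unwrap_or(b::MaybeBool, default::Bool) = b.hasvalue ? b.value : default

# A function written without missing values in mind; `lt` is passed in for the demo.
sign_label(x, lt) = lt(x, 0) ? "negative" : "non-negative"

silent = sign_label(NA, lt_bool)  # runs, but the label says nothing true about NA

loud = try
    sign_label(NA, lt_lifted)
    false
catch err
    err isa TypeError             # the conditional rejects the non-Bool wrapper
end

opted_in = sign_label(NA, (x, y) -> unwrap_or(lt_lifted(x, y), false))
```

Here `silent` is `"non-negative"` without any warning, `loud` is `true` because the conditional throws, and `opted_in` reproduces the fallback only where the caller asked for it. The cost is that the wrapped variant also throws for non-missing inputs until the caller unwraps it.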
Yeah, things can go wrong with this. But I almost feel that in this example `findmax` should be changed: it seems to rely on a pretty specific implementation detail of floating point numbers in an algorithm that is supposed to work with all sorts of types.
The core problem is that the convention in base now is that the lifted version of a function that you get with a `.` will return a data type that can represent missingness (i.e. `Nullable` in base). To change that convention for one operator in this package seems really confusing and inconsistent.
> This is also different to SQL, which uses 3VL but silently interprets `Unknown` as `false` in conditionals.
I know, but I don't like that one bit; I think it makes things way too difficult. I also don't see any chance that base will adopt this convention for things like `if` etc.
I agree that `findmax` needs to change. But my point is that these kinds of errors might be more frequent in Julia than in C#, because in Julia functions often rely on duck typing (I don't know if that is common in C#).
To clarify, I don't propose to copy SQL. I mentioned the difference to SQL as a potential advantage of my proposal.
An alternative could be to have comparisons with `NA` return `DataValue(false)` instead of `NA` or `false`, thereby avoiding 3VL and still having the safety of error messages in boolean contexts. I would like to see an example where this would be simpler than 3VL.
Not being able to use dotted operators because of the inconsistency with `Nullable` is unfortunate. The need to wrap every expression with potential `NA`s into a function in boolean contexts could be a bit annoying. But personally I would prefer the increased safety over the loss of convenience.