fuersten / csvsqldb
A read-only SQL database that gets its data from supplied CSV files.
License: Other
Add TIMEZONE support to time and timestamp.
While executing, an operator node cannot be interrupted or cancelled. Currently, a node always has to finish its complete processing before terminating. To be more responsive in situations where processing is already done, such as a limit operator having reached its limit, and to be able to cancel a complete query, we have to be able to cancel the running execution of a node. The cancellation then has to be passed down to all dependent child nodes.
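One way the cancellation described above could look is sketched below. The class names (OperatorNode, CountingScan, LimitNode) and their methods are hypothetical and are not taken from the csvsqldb sources; the sketch only assumes that a node can poll an atomic flag between rows and that a parent forwards the request to its children.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical operator node; not csvsqldb's actual class.
class OperatorNode {
public:
    virtual ~OperatorNode() = default;

    void addChild(std::shared_ptr<OperatorNode> child) {
        assert(child != nullptr);
        _children.push_back(std::move(child));
    }

    // Request cancellation and pass it down to all dependent child nodes.
    void cancel() {
        _cancelled.store(true, std::memory_order_relaxed);
        for (auto& child : _children) {
            child->cancel();
        }
    }

    bool isCancelled() const {
        return _cancelled.load(std::memory_order_relaxed);
    }

private:
    std::atomic<bool> _cancelled{false};
    std::vector<std::shared_ptr<OperatorNode>> _children;
};

// A stand-in for a scan: it checks the flag between rows instead of
// always running to completion.
class CountingScan : public OperatorNode {
public:
    explicit CountingScan(std::size_t totalRows) : _totalRows(totalRows) {}

    // Produces at most maxRows rows and returns how many were produced.
    std::size_t process(std::size_t maxRows) {
        std::size_t produced = 0;
        while (produced < maxRows && _position < _totalRows && !isCancelled()) {
            ++_position;
            ++produced;
        }
        return produced;
    }

private:
    std::size_t _totalRows;
    std::size_t _position{0};
};

// A limit operator that cancels its input once it needs no more rows.
class LimitNode : public OperatorNode {
public:
    LimitNode(std::shared_ptr<CountingScan> input, std::size_t limit)
    : _input(input), _limit(limit) {
        addChild(input);
    }

    std::size_t run() {
        std::size_t seen = 0;
        while (seen < _limit) {
            const std::size_t got = _input->process(_limit - seen);
            if (got == 0) {
                break;  // input exhausted or cancelled
            }
            seen += got;
        }
        cancel();  // limit reached or input exhausted: stop the subtree
        return seen;
    }

private:
    std::shared_ptr<CountingScan> _input;
    std::size_t _limit;
};
```

Polling a flag keeps the nodes free of locks, at the cost that a node only reacts at its next check. A node blocked in I/O would need an additional wake-up mechanism, which this sketch does not cover.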
Replace own regex implementation with reflex.
The current ARBITRARY aggregation function is basically ANY_VALUE. Rename it, evaluate whether more testing is necessary, and verify that the function behaves correctly.
Currently, when creating a table with a VARCHAR column, an explicit length has to be specified. For convenience, a VARCHAR column definition without an explicit length shall have a default length of 32 characters. The length is currently not used in csvsqldb anyway.
There are no direct operator nodes tests. Add unit tests for each operator node.
Currently, quoted identifiers are only allowed in very few places. This leads to awkward workarounds.
We have to check the SQL standard and add support for quoted identifiers where necessary.
Implementation hint: add a class for identifiers that can distinguish and abstract between quoted and unquoted identifiers.
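A minimal sketch of such an identifier class follows. The class name and interface are hypothetical. It assumes the SQL standard's rule that unquoted identifiers are folded to upper case while quoted identifiers keep their exact spelling, and it does not handle doubled quotes ("") inside a quoted identifier.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical identifier abstraction; not part of the csvsqldb sources.
class Identifier {
public:
    // Accepts the raw SQL text, e.g. last_name or "max birthdate".
    explicit Identifier(const std::string& raw) {
        assert(!raw.empty());
        if (raw.size() >= 2 && raw.front() == '"' && raw.back() == '"') {
            // Quoted: strip the quotes, keep the spelling as written.
            _quoted = true;
            _name = raw.substr(1, raw.size() - 2);
        } else {
            // Unquoted: fold to upper case so comparison ignores case.
            _name = raw;
            std::transform(_name.begin(), _name.end(), _name.begin(), [](unsigned char c) {
                return static_cast<char>(std::toupper(c));
            });
        }
    }

    bool isQuoted() const { return _quoted; }
    const std::string& name() const { return _name; }

    // Two identifiers are equal if their normalized names are equal.
    bool operator==(const Identifier& other) const { return _name == other._name; }

private:
    std::string _name;
    bool _quoted{false};
};
```

With this normalization, last_name and LAST_NAME compare equal, while "Count" and count do not, because the quoted form keeps its mixed case.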
Currently the test suite does not execute cleanly when using "deviating" time zones. I see
% TZ='' ./bin/Linux_x86_64/csvsqldbtest
..
ApplicationTestSuite:..............................
Fixture StringHelperTestCase::timeFormatTest caught an assertion: Assertion caught: "1970-09-23T08:00:00" != csvsqldb::callTimeStream(tp) (expected: 1970-09-23T08:00:00, actual: 1970-09-23T07:00:00) [../test/stringutil_test.cpp 181]
E...
Fixture TimeHelperTestCase::timeConversionTest caught an assertion: Assertion caught: "1970-09-23T08:00:00" != csvsqldb::callTimeStream(tp) (expected: 1970-09-23T08:00:00, actual: 1970-09-23T07:00:00) [../test/time_helper_test.cpp 75]
E.... (25.4ms)
BaseValuesTestSuite:
Fixture DateTestCase::constructionTest caught an assertion: Assertion caught: 12 != d3.month() (expected: 12, actual: 1) [../test/date_test.cpp 73]
E................................ (3.8ms)
..
and
% TZ='Europe/London' ./bin/Linux_x86_64/csvsqldbtest
..
Fixture DateTestCase::constructionTest caught an assertion: Assertion caught: 12 != d3.month() (expected: 12, actual: 1) [../test/date_test.cpp 73]
E................................ (2.2ms)
With TZ=Europe/Berlin, though, all tests pass without problems.
Improve csv parser performance.
The csvsqldb main.cpp contains lots of abstractions that should be extracted and unit tested.
Grouping aggregations with expressions are currently not allowed, and you have to use sub-selects to work around it.
Example that does not work
select name,avg(cast(system as int)) from system_tables group by name order by name
and workaround
select name,avg(sys) from (select name,cast(system as int) as sys from system_tables)
Currently, it is not possible to calculate, group, or order with aggregation results.
This is currently not possible
SELECT count(*) as "count",max(birth_date) as "max birthdate",min(hire_date) as "min hire" FROM employees group by last_name order by "max birthdate"
Currently, the CSV import only allows ISO format for dates and times. To make the import more useful, it shall be possible to specify custom date and time formats.
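As a rough illustration of format-driven parsing, the sketch below uses std::get_time from the standard library with a caller-supplied format string. The ParsedDate struct and the parseDate function are hypothetical names, not csvsqldb API, and the sketch neither rejects trailing characters nor validates calendar rules such as the number of days in a month.

```cpp
#include <cassert>
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical result type for the sketch.
struct ParsedDate {
    int year;
    int month;
    int day;
};

// Parses text according to a strftime-style format such as "%d.%m.%Y".
// Returns false and leaves out untouched if the text does not match.
bool parseDate(const std::string& text, const std::string& format, ParsedDate& out) {
    assert(!format.empty());
    std::tm tm{};
    std::istringstream in(text);
    in >> std::get_time(&tm, format.c_str());
    if (in.fail()) {
        return false;
    }
    // std::tm counts years from 1900 and months from 0.
    out = ParsedDate{tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday};
    return true;
}
```

A format string per column would let the same routine cover inputs such as 23.09.1970 ("%d.%m.%Y") and 09/23/1970 ("%m/%d/%Y").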
Add millisecond support to time and timestamp. Currently, the timestamp can already calculate with milliseconds, but milliseconds will not be displayed.
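Displaying the millisecond part could be sketched as follows with std::chrono. The function name formatTimeOfDay and the HH:MM:SS.mmm output layout are assumptions made for illustration; the issue does not specify the display format.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <string>

// Formats a time of day, given as milliseconds since midnight,
// as HH:MM:SS.mmm (layout is an assumption of this sketch).
std::string formatTimeOfDay(std::chrono::milliseconds sinceMidnight) {
    using namespace std::chrono;
    assert(sinceMidnight.count() >= 0);

    // Split the duration into its components, largest unit first.
    const auto h = duration_cast<hours>(sinceMidnight);
    const auto m = duration_cast<minutes>(sinceMidnight - h);
    const auto s = duration_cast<seconds>(sinceMidnight - h - m);
    const milliseconds ms = sinceMidnight - h - m - s;

    char buffer[32];
    std::snprintf(buffer, sizeof(buffer), "%02d:%02d:%02d.%03d",
                  static_cast<int>(h.count()), static_cast<int>(m.count()),
                  static_cast<int>(s.count()), static_cast<int>(ms.count()));
    return buffer;
}
```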
Add the https://github.com/fmtlib/fmt lib to the project. Replace streaming output with fmt. This also includes the log macros.
Currently, globbing only works for linux and mac. We have to analyze how we can support this on the windows platform.
Replace own date, time, and timestamp implementation with the C++20 std::chrono calendar types. As we don't support C++20 yet, we will use https://github.com/HowardHinnant/date.
Currently, the execution time statistics are only supported on linux and mac. We should add reasonable timing statistics for windows.
Built-in functions currently must have a single type signature. This is not very convenient, as there are a couple of SQL functions that can actually cope with different types. Mainly, these are date and time related functions such as extract or date_trunc. In order to support these functions for different types, the function registration and retrieval has to cope with type signature sets. Currently, the extract function looks like
ExtractFunction::ExtractFunction()
: Function("EXTRACT", INT, Types({INT, TIMESTAMP}))
{
}
It should be changed to
ExtractFunction::ExtractFunction()
: Function("EXTRACT", {INT,INT,INT}, Types({INT, DATE},{INT, TIME},{INT, TIMESTAMP}))
{
}
Here we have exactly the same number of return types as signature sets. Each signature set corresponds to one overload. The stack machine now has to evaluate all overloads and call the overload with the best matching signature. If no signature matches perfectly, type casting has to be used to find a match.
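The lookup described above might be sketched as follows. All names here (Type, Overload, Function::resolve, canCast, makeExtract) are hypothetical and differ from the constructor shown in the issue. The cast rule is a placeholder assumption (a STRING may be cast to any type), and "best match" is interpreted as the overload that needs the fewest casts, with ties going to the overload registered first.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <utility>
#include <vector>

enum class Type { INT, STRING, DATE, TIME, TIMESTAMP };

// One signature set together with its return type.
struct Overload {
    Type returnType;
    std::vector<Type> parameters;
};

// Placeholder cast rule (an assumption): a STRING can be cast to any type.
inline bool canCast(Type from, Type to) {
    return from == to || from == Type::STRING;
}

class Function {
public:
    Function(std::string name, std::vector<Overload> overloads)
    : _name(std::move(name)), _overloads(std::move(overloads)) {
        assert(!_overloads.empty());
    }

    const std::string& name() const { return _name; }

    // Returns the overload needing the fewest casts, or nullptr if no
    // overload accepts the arguments. An exact match needs zero casts.
    const Overload* resolve(const std::vector<Type>& arguments) const {
        const Overload* best = nullptr;
        std::size_t bestCasts = std::numeric_limits<std::size_t>::max();
        for (const auto& overload : _overloads) {
            if (overload.parameters.size() != arguments.size()) {
                continue;
            }
            std::size_t casts = 0;
            bool viable = true;
            for (std::size_t i = 0; i < arguments.size(); ++i) {
                if (arguments[i] == overload.parameters[i]) {
                    continue;
                }
                if (!canCast(arguments[i], overload.parameters[i])) {
                    viable = false;
                    break;
                }
                ++casts;
            }
            if (viable && casts < bestCasts) {
                best = &overload;
                bestCasts = casts;
            }
        }
        return best;
    }

private:
    std::string _name;
    std::vector<Overload> _overloads;
};

// EXTRACT registered with three signature sets, one per overload.
inline Function makeExtract() {
    return Function("EXTRACT", {
        {Type::INT, {Type::INT, Type::DATE}},
        {Type::INT, {Type::INT, Type::TIME}},
        {Type::INT, {Type::INT, Type::TIMESTAMP}},
    });
}
```

Pairing each return type with its parameter list in one struct avoids keeping two parallel lists in sync, which is a design choice of this sketch rather than something the issue prescribes.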