Comments (10)
Interestingly, using a WITH
clause helps:
with t
as (SELECT word, split(word, '') split
FROM (
select *
FROM ( select UNNEST(['foo', 'bar', 'baz', 'giraffe', 'llama']) word)
)
)
select word, list_transform( split , x -> unicode(x)) as code_points from t;
/* output
┌─────────┬────────────────────────────────────┐
│ word │ code_points │
├─────────┼────────────────────────────────────┤
│ foo │ [102, 111, 111] │
├─────────┼────────────────────────────────────┤
│ bar │ [98, 97, 114] │
├─────────┼────────────────────────────────────┤
│ baz │ [98, 97, 122] │
├─────────┼────────────────────────────────────┤
│ giraffe │ [103, 105, 114, 97, 102, 102, 101] │
├─────────┼────────────────────────────────────┤
│ llama │ [108, 108, 97, 109, 97] │
└─────────┴────────────────────────────────────┘
*/
from duckdb.
Better pronounced:
-- 1 record only, why?!
select list_transform(
( SELECT split(word, '') split
FROM (
select *
FROM ( select UNNEST(['foo', 'bar', 'baz', 'giraffe', 'llama']) word)
)
)
, x -> unicode(x)
) as code_points;
-- 5 records as expected
with t
as ( SELECT split(word, '') split
FROM (
select *
FROM ( select UNNEST(['foo', 'bar', 'baz', 'giraffe', 'llama']) word)
)
)
select list_transform( split , x -> unicode(x)) as code_points from t ;
In my opinion, both queries mean the same and should produce exactly the same result set.
from duckdb.
Is it a case that it is parsed as a Scalar Subquery?
duckdb.sql("""
select unnest(['foo', 'bar', 'baz']).split('')
""")
# ┌─────────────────────────────────────────────────────────┐
# │ split(unnest(main.list_value('foo', 'bar', 'baz')), '') │
# │ varchar[] │
# ├─────────────────────────────────────────────────────────┤
# │ [f, o, o] │
# │ [b, a, r] │
# │ [b, a, z] │
# └─────────────────────────────────────────────────────────┘
Which essentially has an implicit LIMIT 1
?
duckdb.sql("""
select (select unnest(['foo', 'bar', 'baz']).split(''))
""")
# ┌──────────────────────────────────────────────────────────────────┐
# │ (SELECT split(unnest(main.list_value('foo', 'bar', 'baz')), '')) │
# │ varchar[] │
# ├──────────────────────────────────────────────────────────────────┤
# │ [f, o, o] │
# └──────────────────────────────────────────────────────────────────┘
from duckdb.
In general, you'd make the developer's life easier by writing truly MWE. This can almost always be achieved by iteratively removing bits from your examples that are unlikely to be part of the problem.
In your example, you almost surely don't need five records instead of two, the split
is almost surely not needed, the unnest
is unlikely to be needed, the unicode
is almost surely not needed, the multiple layers of selects
are likely not needed, the renames are likely not needed, the list_transform
is likely not different from any more simple function, and indeed if you try all these simplifications you end up with a much more concise example of what you are surprised by:
select 0 + (select 0 union select 1)
-- Actual result
┌─────────────────────────────────────┐
│ (0 + ((SELECT 0) UNION (SELECT 1))) │
│ int32 │
├─────────────────────────────────────┤
│ 0 │
└─────────────────────────────────────┘
-- Expected result
┌─────────────────────────────────────┐
│ (1 + ((SELECT 0) UNION (SELECT 1))) │
│ int32 │
├─────────────────────────────────────┤
│ 0 │
├─────────────────────────────────────┤
│ 1 │
└─────────────────────────────────────┘
Now that you have a truly MWE, you can try it at https://www.db-fiddle.com/ with PostgreSQL to see what a very related dialect does and it will tell you
Query Error: error: more than one row returned by a subquery used as an expression
FWIW, I think duckdb should also throw this error, but the docs say it doesn't and instead returns a random single value (which I personally consider dangerous).
I definitely wouldn't expect it to treat the subquery as an array or list because (a) that would prevent scalar subqueries to be used (b) a query results set is not the same as an array or list.
In any case, this isn't really related to your problem. Even if the subquery returned a list, you'd be applying list_transform
to a list of lists, instead of applying list_transform
to each entry of a list of lists.
from duckdb.
After all the lecturing, here is a solution to your problem without a CTE.
SELECT
word,
list_transform(
split(word, ''),
x -> unicode(x)
) as code_points
FROM (
SELECT unnest(['foo', 'bar', 'baz', 'giraffe', 'llama']) word
)
┌─────────┬────────────────────────────────────┐
│ word │ code_points │
│ varchar │ int32[] │
├─────────┼────────────────────────────────────┤
│ foo │ [102, 111, 111] │
│ bar │ [98, 97, 114] │
│ baz │ [98, 97, 122] │
│ giraffe │ [103, 105, 114, 97, 102, 102, 101] │
│ llama │ [108, 108, 97, 109, 97] │
└─────────┴────────────────────────────────────┘
from duckdb.
Thank you for the lecturing and for the work around. I found a similar work around too, using a WITH
clause.
However, I am lost WHY this makes any difference.
To me the hint regarding Scalar Subqueries was most useful, however I still don't see the difference and I think it should be somehow predictable when this "scalar subquery" applies.
Last but not least, keep in mind that not everyone is an experienced C developer. There are people, how are just simple accountants, engineers, risk managers or plumbers.
from duckdb.
In Python pseudocode, your goal was to do
[do_something_to_a_foo(single_foo) for single_foo in multiple_foos]
but instead your original attempt was
do_something_to_a_foo(multiple_foos)
What you should have been surprised by was that this didn't just throw, since you're passing a parameter of the wrong type: a list of foos is not a foo!
The reason it didn't throw is that SQL has the concept of "scalar subqueries" which allow you to use a "inner query"/subquery anywhere a single value is expected. For this to make sense, duckdb expects the subquery to return a single row with a single column so that it can use the single value in that subquery result as the required value (and confusingly, it throws if your subquery returns multiple columns, but it silently discards multiple rows). Hence, subqueries cannot possibly solve your conundrum.
Your and my solution (they are really the same, I just wanted to point out that it's not the CTE/WITH
clause that makes the differences) solve this by actually passing single foos:
SELECT do_something_to_a_foo(single_foo) FROM something_that_produces_multiple_lists
PS: I'm not a duckdb dev and have never written a serious line of C or C++. I'm just lurking around here in attempt to lessen their support burden so they can focus on making this great library even greater, and I'm happy for everyone that shares my fandom, accountant, engineer, risk manager, or plumber.
from duckdb.
Thank you very much for this good explanation. I actually do understand your example:
do_something_to_a_foo(multiple_foos)
The surprise is, that do_something_to_a_foo(multiple_foos)
expects/works with single_foo
only and so just makes my multiple_foos
a single_foo
by trimming silently.
Am I really alone with the assumption that the example below should either FAIL or produce the same outcome:
select unnest([1,2,3])
select ( select 1 union select 2 union select 3)
from duckdb.
You're not alone! As I have stated repeatedely, I find it confusing and dangerous too (note that it will randomly print 1 or 2 or 3) and would prefer that it throws the same error that PostgreSQL throws when the subquery returns multiple rows. If you open a new issue that just makes that single request, I'll keep out of it and let the devs weigh in.
from duckdb.
Ok, thanks again for explanation and patience! I do appreciate.
I am closing this and reopen with a shortened version. (I really though at start it was related to the Lambda function, sorry!).
from duckdb.
Related Issues (20)
- 0.11 Snapshot: Cannot create a list of types STRUCT(t VARCHAR, len INTEGER) and STRUCT(VARCHAR, INTEGER) HOT 2
- Named Function Parameters are Case Sensitive
- Julia bindings: Appender does not accept missing/nothing values HOT 3
- BigInt mod Double returns wrong result in case of a large BigInt HOT 4
- created ARRAYS are not ordered HOT 7
- 0.10.1 breaks build on FreeBSD 13 HOT 8
- Issue found on page 'Nested Functions': `ARRAY_REVERSE` (not working) vs. `ARRAY_REVERSE_SORT` (working) HOT 9
- LIST_EXTRACT not working with composed type containing a fixed length array HOT 1
- json_merge_patch removes nulls when they should be left alone during merge
- Appender assert error on very big maps and large number of rows HOT 2
- `JSON_GROUP_ARRAY(DISTINCT key)` returns invalid input
- ODBC to named file does not work
- call load_aws_credentials() in a docker container doesn't load credentials
- We are missing `bool Time::TryFromTime(hh, mm, ss, us, off, result)` HOT 3
- Setting to get errors on duplicated column names
- Setting to get error on scalar subquery with more than one row
- Test issue without code formatting HOT 1
- Save the date: DuckCon #5 in Seattle on 2024-08-15
- json_valid() only short-circuits sometimes HOT 1
- Unable to download Arrow extension on Windows HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from duckdb.