Coder Social home page Coder Social logo

Comments (10)

manticore-projects avatar manticore-projects commented on June 16, 2024

Interestingly, using a WITH clause helps:

with t
as (SELECT word, split(word, '') split
      FROM (
        select * 
        FROM ( select UNNEST(['foo', 'bar', 'baz', 'giraffe', 'llama']) word)
        )
    )
select word, list_transform( split , x -> unicode(x)) as code_points from t;

/* output
┌─────────┬────────────────────────────────────┐
│ word    │ code_points                        │
├─────────┼────────────────────────────────────┤
│ foo     │ [102, 111, 111]                    │
├─────────┼────────────────────────────────────┤
│ bar     │ [98, 97, 114]                      │
├─────────┼────────────────────────────────────┤
│ baz     │ [98, 97, 122]                      │
├─────────┼────────────────────────────────────┤
│ giraffe │ [103, 105, 114, 97, 102, 102, 101] │
├─────────┼────────────────────────────────────┤
│ llama   │ [108, 108, 97, 109, 97]            │
└─────────┴────────────────────────────────────┘
*/

from duckdb.

manticore-projects avatar manticore-projects commented on June 16, 2024

Better pronounced:

-- 1 record only, why?!
select list_transform( 
    ( SELECT split(word, '') split
      FROM (
        select * 
        FROM ( select UNNEST(['foo', 'bar', 'baz', 'giraffe', 'llama']) word)
        )
    )
    , x -> unicode(x) 
) as code_points;


-- 5 records as expected
with t 
as ( SELECT split(word, '') split
      FROM (
        select * 
        FROM ( select UNNEST(['foo', 'bar', 'baz', 'giraffe', 'llama']) word)
        )
    )
select list_transform( split , x -> unicode(x)) as code_points from t ;

In my opinion, both queries mean the same and should produce exactly the same result set.

from duckdb.

cmdlineluser avatar cmdlineluser commented on June 16, 2024

Is it a case that it is parsed as a Scalar Subquery?

duckdb.sql("""
select unnest(['foo', 'bar', 'baz']).split('')
""")
# ┌─────────────────────────────────────────────────────────┐
# │ split(unnest(main.list_value('foo', 'bar', 'baz')), '') │
# │                        varchar[]                        │
# ├─────────────────────────────────────────────────────────┤
# │ [f, o, o]                                               │
# │ [b, a, r]                                               │
# │ [b, a, z]                                               │
# └─────────────────────────────────────────────────────────┘

Which essentially has an implicit LIMIT 1?

duckdb.sql("""
select (select unnest(['foo', 'bar', 'baz']).split(''))
""")
# ┌──────────────────────────────────────────────────────────────────┐
# │ (SELECT split(unnest(main.list_value('foo', 'bar', 'baz')), '')) │
# │                            varchar[]                             │
# ├──────────────────────────────────────────────────────────────────┤
# │ [f, o, o]                                                        │
# └──────────────────────────────────────────────────────────────────┘

from duckdb.

soerenwolfers avatar soerenwolfers commented on June 16, 2024

In general, you'd make the developer's life easier by writing truly MWE. This can almost always be achieved by iteratively removing bits from your examples that are unlikely to be part of the problem.

In your example, you almost surely don't need five records instead of two, the split is almost surely not needed, the unnest is unlikely to be needed, the unicode is almost surely not needed, the multiple layers of selects are likely not needed, the renames are likely not needed, the list_transform is likely not different from any more simple function, and indeed if you try all these simplifications you end up with a much more concise example of what you are surprised by:

select 0 + (select 0 union select 1)
-- Actual result
┌─────────────────────────────────────┐
│ (0 + ((SELECT 0) UNION (SELECT 1))) │
│                int32                │
├─────────────────────────────────────┤
│                                   0 │
└─────────────────────────────────────┘
-- Expected result
┌─────────────────────────────────────┐
│ (1 + ((SELECT 0) UNION (SELECT 1))) │
│                int32                │
├─────────────────────────────────────┤
│                                   0 │
├─────────────────────────────────────┤
│                                   1 │
└─────────────────────────────────────┘

Now that you have a truly MWE, you can try it at https://www.db-fiddle.com/ with PostgreSQL to see what a very related dialect does and it will tell you

Query Error: error: more than one row returned by a subquery used as an expression

FWIW, I think duckdb should also throw this error, but the docs say it doesn't and instead returns a random single value (which I personally consider dangerous).
I definitely wouldn't expect it to treat the subquery as an array or list because (a) that would prevent scalar subqueries to be used (b) a query results set is not the same as an array or list.

In any case, this isn't really related to your problem. Even if the subquery returned a list, you'd be applying list_transform to a list of lists, instead of applying list_transform to each entry of a list of lists.

from duckdb.

soerenwolfers avatar soerenwolfers commented on June 16, 2024

After all the lecturing, here is a solution to your problem without a CTE.

SELECT 
    word,
    list_transform( 
        split(word, ''),
        x -> unicode(x) 
    ) as code_points 
FROM (
    SELECT unnest(['foo', 'bar', 'baz', 'giraffe', 'llama']) word
)
┌─────────┬────────────────────────────────────┐
│  word   │            code_points             │
│ varchar │              int32[]               │
├─────────┼────────────────────────────────────┤
│ foo     │ [102, 111, 111]                    │
│ bar     │ [98, 97, 114]                      │
│ baz     │ [98, 97, 122]                      │
│ giraffe │ [103, 105, 114, 97, 102, 102, 101] │
│ llama   │ [108, 108, 97, 109, 97]            │
└─────────┴────────────────────────────────────┘

from duckdb.

manticore-projects avatar manticore-projects commented on June 16, 2024

Thank you for the lecturing and for the work around. I found a similar work around too, using a WITH clause.
However, I am lost WHY this makes any difference.

To me the hint regarding Scalar Subqueries was most useful, however I still don't see the difference and I think it should be somehow predictable when this "scalar subquery" applies.

Last but not least, keep in mind that not everyone is an experienced C developer. There are people, how are just simple accountants, engineers, risk managers or plumbers.

from duckdb.

soerenwolfers avatar soerenwolfers commented on June 16, 2024

In Python pseudocode, your goal was to do

[do_something_to_a_foo(single_foo) for single_foo in multiple_foos]

but instead your original attempt was

do_something_to_a_foo(multiple_foos)

What you should have been surprised by was that this didn't just throw, since you're passing a parameter of the wrong type: a list of foos is not a foo!

The reason it didn't throw is that SQL has the concept of "scalar subqueries" which allow you to use a "inner query"/subquery anywhere a single value is expected. For this to make sense, duckdb expects the subquery to return a single row with a single column so that it can use the single value in that subquery result as the required value (and confusingly, it throws if your subquery returns multiple columns, but it silently discards multiple rows). Hence, subqueries cannot possibly solve your conundrum.

Your and my solution (they are really the same, I just wanted to point out that it's not the CTE/WITH clause that makes the differences) solve this by actually passing single foos:

SELECT do_something_to_a_foo(single_foo) FROM something_that_produces_multiple_lists

PS: I'm not a duckdb dev and have never written a serious line of C or C++. I'm just lurking around here in attempt to lessen their support burden so they can focus on making this great library even greater, and I'm happy for everyone that shares my fandom, accountant, engineer, risk manager, or plumber.

from duckdb.

manticore-projects avatar manticore-projects commented on June 16, 2024

Thank you very much for this good explanation. I actually do understand your example:

do_something_to_a_foo(multiple_foos)

The surprise is, that do_something_to_a_foo(multiple_foos) expects/works with single_foo only and so just makes my multiple_foos a single_foo by trimming silently.

Am I really alone with the assumption that the example below should either FAIL or produce the same outcome:

select unnest([1,2,3])

select ( select 1 union select 2 union select 3)

from duckdb.

soerenwolfers avatar soerenwolfers commented on June 16, 2024

You're not alone! As I have stated repeatedely, I find it confusing and dangerous too (note that it will randomly print 1 or 2 or 3) and would prefer that it throws the same error that PostgreSQL throws when the subquery returns multiple rows. If you open a new issue that just makes that single request, I'll keep out of it and let the devs weigh in.

from duckdb.

manticore-projects avatar manticore-projects commented on June 16, 2024

Ok, thanks again for explanation and patience! I do appreciate.
I am closing this and reopen with a shortened version. (I really though at start it was related to the Lambda function, sorry!).

from duckdb.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.