Comments (5)
The above was created with dbplyr v2.3.0, with v2.2.0 we get the expected behavior, i.e. a 2-row tibble without the need for compute():
suppressPackageStartupMessages({
library(dplyr)
library(tibble)
library(duckdb)
})
packageVersion("dbplyr")
#> [1] '2.2.0'
lcl_tbl <- tibble(
a = as.integer(c(1, 2, 1)),
e = c("c", "b", "c"),
f = c(TRUE, FALSE, TRUE)
)
src_duckdb <- dbplyr::src_dbi(DBI::dbConnect(duckdb::duckdb()), auto_disconnect = TRUE)
rmt_tbl <- copy_to(src_duckdb, lcl_tbl)
rmt_tbl %>%
distinct() %>%
mutate(aef = row_number(a))
#> # Source: SQL [2 x 4]
#> # Database: DuckDB 0.5.0 [root@Darwin 22.2.0:R 4.2.2/:memory:]
#> a e f aef
#> <int> <chr> <lgl> <dbl>
#> 1 1 c TRUE 1
#> 2 2 b FALSE 2
Created on 2023-02-14 with reprex v2.0.2
This explains why we see the unexpected behavior on all remote sources. The duckdb version has apparently nothing to do with our problem.
from dm.
Problem seems to lie in the query created by dbplyr v2.3.0:
suppressPackageStartupMessages({
library(dplyr)
library(tibble)
library(duckdb)
})
packageVersion("dbplyr")
#> [1] '2.3.0'
lcl_tbl <- tibble(
a = as.integer(c(1, 2, 1)),
e = c("c", "b", "c"),
f = c(TRUE, FALSE, TRUE)
)
src_duckdb <- dbplyr::src_dbi(DBI::dbConnect(duckdb::duckdb()), auto_disconnect = TRUE)
rmt_tbl <- copy_to(src_duckdb, name = "rmt_tbl", lcl_tbl)
rmt_tbl %>%
distinct() %>%
show_query()
#> <SQL>
#> SELECT DISTINCT *
#> FROM rmt_tbl
rmt_tbl %>%
distinct() %>%
mutate(aef = row_number(a)) %>%
show_query()
#> <SQL>
#> SELECT DISTINCT
#> *,
#> CASE
#> WHEN (NOT((a IS NULL))) THEN ROW_NUMBER() OVER (PARTITION BY (CASE WHEN ((a IS NULL)) THEN 1 ELSE 0 END) ORDER BY a)
#> END AS aef
#> FROM rmt_tbl
Created on 2023-02-15 with reprex v2.0.2
whereas for dbplyr c2.2.0 it was:
# [...] (same as above)
rmt_tbl %>%
distinct() %>%
mutate(aef = row_number(a)) %>%
show_query()
#> <SQL>
#> SELECT *, ROW_NUMBER() OVER (ORDER BY a) AS aef
#> FROM (
#> SELECT DISTINCT *
#> FROM rmt_tbl
#> ) q01
Created on 2023-02-15 with reprex v2.0.2
It looks like the order of distinct() and mutate() is switched by the translation to SQL code for dbplyr v2.3.0.
from dm.
As to the problem with flattening:
suppressPackageStartupMessages({
library(dplyr)
library(tibble)
library(duckdb)
})
packageVersion("dbplyr")
#> [1] '2.3.0'
lcl_tbl_1 <- tibble(
a = 1:2,
e = letters[1:2],
)
lcl_tbl_2 <- tibble(
a = 1:2,
e = letters[3:4],
)
lcl_tbl_3 <- tibble(
a = 1:2,
e = letters[5:6],
)
lcl_tbl_4 <- tibble(
a = 1:2,
e = letters[7:8],
)
lcl_tbl_1 %>%
left_join(rename(lcl_tbl_2, e_2 = e), by = "a") %>%
left_join(rename(lcl_tbl_3, e_3 = e), by = "a") %>%
left_join(rename(lcl_tbl_4, e_4 = e), by = "a")
#> # A tibble: 2 × 5
#> a e e_2 e_3 e_4
#> <int> <chr> <chr> <chr> <chr>
#> 1 1 a c e g
#> 2 2 b d f h
src_duckdb <- dbplyr::src_dbi(DBI::dbConnect(duckdb::duckdb()), auto_disconnect = TRUE)
rmt_tbl_1 <- copy_to(src_duckdb, name = "rmt_tbl_1", lcl_tbl_1)
rmt_tbl_2 <- copy_to(src_duckdb, name = "rmt_tbl_2", lcl_tbl_2)
rmt_tbl_3 <- copy_to(src_duckdb, name = "rmt_tbl_3", lcl_tbl_3)
rmt_tbl_4 <- copy_to(src_duckdb, name = "rmt_tbl_4", lcl_tbl_4)
rmt_tbl_1 %>%
left_join(rename(rmt_tbl_2, e_2 = e), by = "a") %>%
left_join(rename(rmt_tbl_3, e_3 = e), by = "a") %>%
left_join(rename(rmt_tbl_4, e_4 = e), by = "a")
#> # Source: SQL [2 x 5]
#> # Database: DuckDB 0.6.2-dev1166 [root@Darwin 22.2.0:R 4.2.2/:memory:]
#> a e e_2 e_3 e_4
#> <int> <chr> <chr> <chr> <chr>
#> 1 1 a c e e
#> 2 2 b d f f
rmt_tbl_1 %>%
left_join(rename(rmt_tbl_2, e_2 = e), by = "a") %>%
left_join(rename(rmt_tbl_3, e_3 = e), by = "a") %>%
left_join(rename(rmt_tbl_4, e_4 = e), by = "a") %>%
dbplyr::sql_render()
#> <SQL> SELECT rmt_tbl_1.*, rmt_tbl_2.e AS e_2, rmt_tbl_3.e AS e_3, rmt_tbl_3.e AS e_4
#> FROM rmt_tbl_1
#> LEFT JOIN rmt_tbl_2
#> ON (rmt_tbl_1.a = rmt_tbl_2.a)
#> LEFT JOIN rmt_tbl_3
#> ON (rmt_tbl_1.a = rmt_tbl_3.a)
#> LEFT JOIN rmt_tbl_4
#> ON (rmt_tbl_1.a = rmt_tbl_4.a)
Created on 2023-02-15 with reprex v2.0.2
In the SELECT part of the query rmt_tbl_3.e
is wrongly selected twice, once as e_3 (correct) and once as e_4 (wrong).
This behavior leads to the differences in the flatten-snapshot.
from dm.
see tidyverse/dbplyr#1158 and tidyverse/dbplyr#1159
from dm.
Both of these problems will be fixed on GHA after the next patch-release of dbplyr.
from dm.
Related Issues (20)
- Saving & restoring the DM to new DB connection HOT 1
- dm function with error : Error in list_c(.) : could not find function "list_c" HOT 2
- dm_draw() doesn't show unique keys HOT 1
- sql_json_nest.PqConnection defined twice HOT 1
- dm_examine_constraints() has issues with FK constraints on Oracle
- How to view schema specific ERD from Postgres database HOT 1
- Add comments to tables when drawing HOT 4
- Learning of FK-constraint on Postgres fails for non-owner of the relation HOT 6
- dm:::pull_tbl.dm should use as_name rather than as_string
- UKs not being set to DB HOT 3
- Frutiger font no longer shown on documentation page HOT 2
- Review dbplyr > 2.3.2 compatibility HOT 1
- autoincrement attribute out of sync: dm_get_all_pks_impl vs CREATE TABLE sql HOT 4
- recent main branch fails on test-learn.R
- recent main branch fails on test-learn.R HOT 2
- Wrong behaviour of the message "use `collect(pull_tbl())`" HOT 1
- does dm support sparklyr in some way?
- Naming pattern from `.names` ignored in `dm_from_con()`
- Error when supplying `table_names` in `dm_from_con()`
- Use tidyjson?
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from dm.