cynkra / dm Goto Github PK
View Code? Open in Web Editor NEWWorking with relational data models in R
Home Page: https://dm.cynkra.com
License: Other
Working with relational data models in R
Home Page: https://dm.cynkra.com
License: Other
If trying to remove a key column (pk/fk), a message should be given similar to:
library(tidyverse)
iris %>%
group_by(Species) %>%
select(-Species)
#> Adding missing grouping variables: `Species`
#> # A tibble: 150 x 5
#> # Groups: Species [3]
#> Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa 5.1 3.5 1.4 0.2
#> 2 setosa 4.9 3 1.4 0.2
#> 3 setosa 4.7 3.2 1.3 0.2
#> 4 setosa 4.6 3.1 1.5 0.2
#> 5 setosa 5 3.6 1.4 0.2
#> 6 setosa 5.4 3.9 1.7 0.4
#> 7 setosa 4.6 3.4 1.4 0.3
#> 8 setosa 5 3.4 1.5 0.2
#> 9 setosa 4.4 2.9 1.4 0.2
#> 10 setosa 4.9 3.1 1.5 0.1
#> # … with 140 more rows
Created on 2019-09-06 by the reprex package (v0.3.0)
Also consider prune
option to cdm_select()
(later!) that removes all related tables if a key is removed (with #44).
With #57 and part of #54 we're now ready to start looking into an API that extracts a table from a dm but keeps the context, so that it can be put back.
dm %>%
activate(flights) %>%
filter(hour < 6) %>%
...
We temporarily remove a piece of a mosaic and reinsert it or perhaps insert a replica. Is reinsertion different from adding a new version? Maybe zoom_to_tbl()
, update_tbl()
, insert_tbl()
?
zoom_to_tbl()
and update_tbl()
, with print()
and format()
methodselect.dm()
and rename.dm()
, with a nice error message if not activatedfilter.dm()
mutate.dm()
and transmute.dm()
summarise.dm()
group_by.dm()
, ungroup.dm()
Independently:
cdm_add_tbl()
(useful for #77)insert_tbl()
I came across this one here, which I like:
https://www.guru99.com/relational-data-model-dbms.html
(We could also create another short vignette "Intro to relational data models"; but I don't think that should be necessary?)
Reproducible code (R and all CRAN packages up to date, dm up to date)
dm0 <- structure(list(src = structure(list(tbl_f = function (data)
{
as_tibble(data, .name_repair = "check_unique")
}, name = "<environment: 0x000001b8d0e65a70>", env = <environment>), class = c("src_local",
"src")), tables = list(exceptions = structure(list(site = character(0),
onprem_id = numeric(0), exception_message = character(0)), row.names = integer(0), class = c("tbl_df",
"tbl", "data.frame")), sam1_flight = structure(list(onprem_id = numeric(0),
station = character(0), received_at = structure(numeric(0), tzone = "", class = c("POSIXct",
"POSIXt")), data = character(0)), row.names = integer(0), class = c("tbl_df",
"tbl", "data.frame")), sam26_event = structure(list(onprem_id = numeric(0),
station = character(0), received_at = structure(numeric(0), tzone = "", class = c("POSIXct",
"POSIXt")), data = character(0)), row.names = integer(0), class = c("tbl_df",
"tbl", "data.frame"))), data_model_tables = structure(list(table = c("exceptions",
"sam1_flight", "sam26_event"), segment = c(NA, NA, NA), display = c("accent4",
"accent2", "accent2")), row.names = c(3L, 12L, 13L), class = "data.frame"),
data_model_pks = structure(list(column = c("onprem_id", "onprem_id"
), table = c("sam1_flight", "sam26_event")), row.names = c(NA,
-2L), class = c("tbl_df", "tbl", "data.frame")), data_model_fks = structure(list(
table = c("exceptions", "exceptions"), column = c("onprem_id",
"onprem_id"), ref = c("sam26_event", "sam1_flight"),
ref_col = c("onprem_id", "onprem_id")), row.names = c(NA,
-2L), class = c("tbl_df", "tbl", "data.frame")), filter = NULL), class = "dm")
###############
cdm_draw(dm0[c("exceptions","sam1_flight")],view_type = "all")
cdm_draw(dm0,view_type = "all")
The first cdm_draw
call prints what I expect :
With the second call the column onprem_id
is duplicated:
Should work when the user provides a custom key that is unique. Currently:
library(dplyr)
library(dm)
nycflights13::planes %>%
add_count(model, engines, name = "n_eng") %>%
add_count(model, seats, name = "n_seats") %>%
add_count(model, manufacturer, name = "n_manufacturer") %>%
mutate(model_id = paste0(
model,
if_else(n_eng > 1, paste0("-", engines), ""),
if_else(n_seats > 1, paste0("-", seats), ""),
if_else(n_manufacturer > 1, paste0("-", abbreviate(manufacturer)), "")
)) %>%
decompose_table(model_id, model, manufacturer, type, engines, seats, manufacturer)
#> `new_id_column` can not have an identical name as one of the columns of
#> `.`.
Created on 2019-09-24 by the reprex package (v0.3.0)
Thanks for the fantastic package.
I think the following error could be improved :
library(dm)
library(dplyr,warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 3.6.1
dm0 <- as_dm(lst(iris,cars))
dm0[[1]]
#> Can't convert a double vector to a string
Created on 2019-07-31 by the reprex package (v0.3.0)
dm:::`[[.dm`
#> function (x, name)
#> {
#> table <- as_string(name)
#> tbl(x, table)
#> }
My understanding is that dm
objects are lists rather than environments because you want to pass them by copy, but at the same time you'd rather not allow numeric indices because order doesn't make sense in a DB.
In that case I would suggest something like "Can't subset a dm object by position, either subset by name or use cdm_get_tables()
to convert to a regular list first."
Note that the choice not to allow numeric indices make the operation : map(dm0, nrow)
fail, which might be unexpected by a user who had previously success running nrow(dm0[["iris"]])
and nrow(dm0[["cars"]])
.
if no tables are given.
In my use case I would like to define only my primary keys manually, then run a command on my data model and have all obvious foreign keys set up appropriately.
I believe many models are fairly regular so having a feature to guess the keys can be pretty awesome, for me that would spare a fair amount of lines of code and would allow me not to update my code much when adding a table.
I believe this ould be achieved by alowing NULL
arguments in cdm_add_fk()
Standard use :
cdm_add_fk(my_dm, my_table, my_column, my_ref_table, check = TRUE)
Look for my_column
in all tables from my_dm
, and set them as foreign keys linked to my_ref_table
's primary key :
cdm_add_fk(my_dm, NULL, my_column, my_ref_table, check = TRUE)
Look for any column in my_table
named as the primary keys of my_ref_table
, and make them foreign keys :
cdm_add_fk(my_dm, my_table, NULL, my_ref_table, check = TRUE)
Look for any column in any table of my_dm
named as the primary keys of my_ref_table
, and make them foreign keys :
cdm_add_fk(my_dm, NULL, NULL, my_ref_table, check = TRUE)
Look for a primary key on my_column
in any table of my_dm
and build the appropriate foreign key.
cdm_add_fk(my_dm, my_table, my_column, NULL, check = TRUE)
Look for a primary key on my_column
in any table of my_dm
and build the appropriate foreign keys on all columns of other tables containing this column.
cdm_add_fk(my_dm, NULL, my_column, NULL, check = TRUE)
Look for all primary keys, see if any is matching by name the columns of my_table
and build the appropriate foreign keys.
cdm_add_fk(my_dm, my_table, NULL, NULL, check = TRUE)
Look for all primary keys, see if any is matching by name the columns of all other tables and build the appropriate foreign keys.
cdm_add_fk(my_dm, NULL, NULL, NULL, check = TRUE)
check = TRUE
could either trigger failure in case of incompatible subset of value, or issue an explicit warning not define ineligible foreign keys. (check = NA
might be used to choose between these).
Project looks neat. I hoping you could add to your related work links a link to https://github.com/WinVector/rquery/blob/master/extras/DependencySorting.md which was a stab at just the joining many tables issue (I know dm does that and many more things). Anyway thanks and thank for your work on this package and so many others.
To make it easy to maintain internal consistency, I think the internal data structure needs to be a tibble with the following columns:
With this, it seems that we only need two "restore" functions:
These "restore" functions accept a tidyselect-like recipe that describes how to handle renames. These can be passed on to dplyr::recode()
to implement the actual recoding.
Is it possible, perhaps outside of the dm ecosystem, to model relationships between tables that aren't based on FK-PK relationships. Think of a situation where there is a flights and an airports table, where some airport codes may not exist in the flights table (or the other way around). These would definitely be messy data relationship, but do occur fairly frequently in the data I work with. My current approach is to write lots of dplyr join statements (and try to remember the relationships) or to build views in a lightweight system like sqlite and query the views. It would be nice to be able to codify these relationships directly within R.
Thanks for the time and the package!
The function itself should not think too much by itself, it should apply the same logic for all join-methods. Furthermore, only tables that are (direct) parents will be allowed for this function (no grandparents etc.). If the user passes tables in the ellipsis, the joins will happen in this order.
left_join()
, inner_join()
, full_join()
: if no tables given in the ellipsis, the result will depend on the order of the tables in the (relevant part of the) dm
, but it will only affect the column order.semi_join()
: the result will be exactly those rows, that are referenced by ALL parent tables.anti_join()
: the result will be those rows, that are NOT referenced by ANY parent table (i.e. referenced by NO parent table). It might be more interesting for the user to have a function telling them, which rows are not referenced by at least one parent table, but that needs to be another function.nest_join()
: the created columns should have the names of the RHS tables of the join. Again, if ellipsis empty, the order of those tables in the dm
plays a role for the resulting order of the columns.right_join()
: if the ellipsis is empty, the number of rows might depend on the table that comes last in the dm
table list (if no referential integrity is given). In this case, a warning is issued.There will be a further function cdm_flatten_deep_to_tbl()
, which works only for left_join()
, inner_join()
and full_join()
, which also allows for deeper table hierarchies (grandparents etc.)
While working through the README and vignettes to learn this package (it's fantastic by the way, thank you!) I ran into this error:
dm_nycflights <- cdm_nycflights13(cycle = TRUE)
# ... testing and exploration ...
dm_nycflights %>%
cdm_join_to_tbl(airports, flights, join = semi_join)
## Error: `table` must be a string.
## Call `rlang::last_error()` to see a backtrace
It took me a while to figure out that the issue was cycle = TRUE
and not the format of the table arguments. Finally, I realized the lines I was supposed to run needed cycle = FALSE
.
Maybe cdm_join_to_tbl()
could provide a better hint in this case?
As a related suggestion, the README mentions that cdm_join_to_tbl()
requires cycle-free relationships, but it might be helpful to explain why this is the case.
Are there any plans/thoughts on supporting compound keys? I frequently have a table where the primary key is a combination of fields. It looks like putting a PK constraint on something like that in dm
requires creating a new single column that is a mutation of the two non-unique components.
Thanks for the package!
We now have a much better internal structure available through cdm_get_def()
etc.. Remove things related to new_dm()
and new_dm2()
, prefer new_dm3()
over new_dm2()
or even new_dm()
.
cdm_restore()
)Mention which values are not contained in the referred primary key. In #17 I already added printing of duplicate values for primary key candidates.
Required for copying to and learning from a database.
Options:
The first option is simpler, but it looks like we need to take care of the schema identifier ourselves.
Let's add a schema
argument which takes a DBI::Id
object.
Ideally, cdm_learn_from_db()
would be part of the dm()
constructor -- later.
Currently ignores filter conditions. When using cdm_get_tables()
while filter conditions are set we should do one of:
In the third case (which I think is the approach we should probably chose) we would need to have a new function that's implemented like cdm_get_tables()
currently is, and use it wherever possible, because in many places we just need for example the column names and we don't care about filters being applied or not.
Hi all,
I think that {dm}
offers great prospects for those handling complex datasets. In my specific case, I would love to be able to use {dm}
to handle a dozen of locally stored interrelated tables, instead of having to perform endless joining of tables and duplicating the information multiple times. It is not so much to save memory but to make manipulation more natural and safer.
Yet, not coming from the relational database side of things, but having instead always used data frames in R, I see one critical obstacle preventing me from using your otherwise great package: the impossibility to perform mutation (sensu dplyr::mutate
, dplyr::transmute
) even for locally stored data (if I missed it, very sorry).
I would like to share some thoughts about that in case you would like to consider the idea.
dplyr::mutate
-like function. The issue is that ideally a mutation could depend on variables located in different tables (with the usual restriction of source vectors being the same length or of length one). In this context, it is not clear how to create a slick syntax for that, but perhaps a dedicated operator could help.Here is a made up example (probably stupid from an aviation perspective) of what it could look like:
fuel_per_mile_per_engine <- 2.5 # in gallons
cdm_mutate(
data = dm::cdm_nycflights13(),
table = flights,
fuel = distance %from% flights * engines %from% planes * fuel_per_mile_per_engine
)
table
would refer to where to insert the new column, fuel = xxx
would feed the ...
argument and the %from%
(or %$%
?) would quote both the lhs and rhs to evaluate them in the right context. If the the columns name are unique across tables, we could even imagine that the %from% tbl
part could be guessed by cdm_mutate()
.
cdm_update_tbl()
which would be similar to cdm_add_tbl()
but that would allow to update an existing table. The idea is that one would use tbl()
to extract the required tables, do the mutation steps (and joins if required) using {dplyr}
directly and then put the table back into the dm
object using the cdm_update_tbl()
function. Contrary to cdm_rm_tbl()
+ cdm_add_tbl()
, the cdm_update_tbl()
would present the benefit of not loosing the definition of the primary keys and foreign keys in the process.I know this represent some work and that you must set priorities but those would be very welcome feature additions.
Feel free to ignore that and close the issue if you think I am missing the point.
++
Alex
need to consider case where
Required for #40. Same syntax as cdm_filter()
.
cdm_select()
is similar but has the additional complexity that removed columns need to be taken care of. Later.
The default parameters setting for cdm_copy_to()
should produce temporary tables with unique_table_names
.
Idea is to have the default unique_table_names = NULL
, which becomes TRUE
when temporary = TRUE
and FALSE
otherwise.
Could either be fixed in #79 or in a future PR.
The tibbles produced by the functions cdm_enum_pk_candidates()
and cdm_enum_fk_candidates()
could include one more column in which it is explained (by a code or verbose), why the column was rejected as a key candidate.
Requires a more powerful cdm_flatten_to_tbl()
.
view_type
Can be "all" (by default), "keys_only" or "title_only". It defines the level of details for the table rendering (all columns, only primary and foreign keys or no columns)
In fact the default value at the moment is "keys_only"
:
formals(cdm_draw)$view_type
#> [1] "keys_only"
with cdm_join_tbl()
we are able to join two tables of a dm
which share a direct link (FK).
what if we want to join a further table from the same dm
in a second step etc.?
A solution would be a new function, which wraps cdm_join_tbl()
and then returns the original dm
, but with the result of the join as an additional table. This way, there would be the possibility of adding a secondary join to this table in the pipeline.
Questions would be:
Otherwise the dm isn't shown in RStudio's object pane (this messes up rlang::last_error()
too).
In the documentation in ?cdm_filter
and vignette("dm-filtering")
.
The "basic opreation" article does this, but there's room for a simpler introduction, as in "How to Build a Relational Data Model".
CC @cutterkom.
dm()
should be replaced by logic to extract pk/fk information from a datamodelr objectnew_dm()
becomes dm()
, no longer accepts datamodelr objectnew_dm2()
becomes new_dm()
-- a thin wrapper around tibble()
after #57. Need to revisit interface carefully. Maybe extract patch_dm()
to update parts of an existing dm?It's not really needed. We only need to check same_src()
for all input tables.
Interesting package.
@krlmlr and I had a chat at useR!2019 about linking the dm
package with the git2rdata
package(https://ropensci.github.io/git2rdata/).
I would suggest that I send a PR that uses git2rdata
to read and write a dm
object to plain text files suitable for storage under version control. A bit similar to cdm_copy_to()
.
Here are a few ideas.
cdm_copy_to()
into an S3 generic with methods for src_dbi
, git_repository
and character
. The method for src_dbi
is the current implementation. git_repository
writes tables to a path in the root of git repo. character
assumes that the value is the path to the root of a project folder.git2rdata
writes the files into a subfolder of a repository. The user can specify this subfolder. Equivalent to selecting the database.dm
class also accept a path or url as source?Goal: For a key column (primary/foreign), remove all references and all tables that become disconnected if the key column is removed.
Like cdm_get_src()
but returns a database connection (or an error if a data frame or other object).
After b23b969, we need an easy way to check integrity of all pk and fk constraints, with an option to show where integrity is violated:
What's a good name?
doesn't work yet:
library(tidyverse)
library(dbplyr)
#>
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#>
#> ident, sql
library(dm)
as_dm(list(a = memdb_frame(a = 1)))
#> Error in UseMethod("as.data_model"): no applicable method for 'as.data_model' applied to an object of class "NULL"
Created on 2019-08-13 by the reprex package (v0.3.0)
in cdm_select_tbls()
there is currently the parameter all_connected
available and even set by default to TRUE
. The current implementation is not covering all general use cases and therefore the parameter should be removed until further notice.
This function would remove either all filters, or better, it let's you remove the n-th filter, cause you made a typo there, and you don't want to remove all filters at once because the rest seems fine. Maybe two different functions for those two scenarios?
Use a random "rich gets richer" graph.
{dm} should aim at using the {tidyselect}-standard (e.g. rename(table, new_colname = old_colname)
) for our functions cdm_rename_table()
and cdm_select_tbl()
.
Each function/operation should have several unit tests and one or two integration tests.
Each R/*.R
file should have a corresponding tests/testthat/test-*.R
file. The tests in that file should touch (almost) all code in that *.R
file. (How to verify? Can we easily collect this from covr? Perhaps by running repeatedly covr::package_coverage(type = "none", code = 'devtools::test(pattern = "^...$")')
.)
Implement for new tests, close this issue when all existing tests have been adapted.
With one filter, everything seems fine. When a second filter is added, the first filter is applied redundantly (via joins) for every table in the data model. This may lead to an overload of the query planner.
Potential solutions:
library(tidyverse)
library(dm)
dm <-
cdm_nycflights13() %>%
cdm_select_tbl(-weather) %>%
cdm_filter(flights, month == 1, day == 1) %>%
cdm_copy_to(dbplyr::src_memdb(), .)
dm %>%
cdm_filter(flights, year == 1) %>%
cdm_get_tables() %>%
map(dbplyr::sql_render)
#> $airlines
#> <SQL> SELECT * FROM `airlines` AS `LHS`
#> WHERE EXISTS (
#> SELECT 1 FROM (SELECT *
#> FROM `flights`
#> WHERE (`year` = 1.0)) AS `RHS`
#> WHERE (`LHS`.`carrier` = `RHS`.`carrier`)
#> )
#>
#> $airports
#> <SQL> SELECT * FROM `airports` AS `LHS`
#> WHERE EXISTS (
#> SELECT 1 FROM (SELECT *
#> FROM `flights`
#> WHERE (`year` = 1.0)) AS `RHS`
#> WHERE (`LHS`.`faa` = `RHS`.`origin`)
#> )
#>
#> $flights
#> <SQL> SELECT *
#> FROM `flights`
#> WHERE (`year` = 1.0)
#>
#> $planes
#> <SQL> SELECT * FROM `planes` AS `LHS`
#> WHERE EXISTS (
#> SELECT 1 FROM (SELECT *
#> FROM `flights`
#> WHERE (`year` = 1.0)) AS `RHS`
#> WHERE (`LHS`.`tailnum` = `RHS`.`tailnum`)
#> )
dm %>%
cdm_filter(flights, year == 1) %>%
cdm_filter(airlines, name == "xyz") %>%
cdm_get_tables() %>%
map(dbplyr::sql_render)
#> $airlines
#> <SQL> SELECT *
#> FROM (SELECT * FROM `airlines` AS `LHS`
#> WHERE EXISTS (
#> SELECT 1 FROM (SELECT *
#> FROM `flights`
#> WHERE (`year` = 1.0)) AS `RHS`
#> WHERE (`LHS`.`carrier` = `RHS`.`carrier`)
#> ))
#> WHERE (`name` = 'xyz')
#>
#> $airports
#> <SQL> SELECT * FROM (SELECT * FROM `airports` AS `LHS`
#> WHERE EXISTS (
#> SELECT 1 FROM (SELECT *
#> FROM `flights`
#> WHERE (`year` = 1.0)) AS `RHS`
#> WHERE (`LHS`.`faa` = `RHS`.`origin`)
#> )) AS `LHS`
#> WHERE EXISTS (
#> SELECT 1 FROM (SELECT * FROM (SELECT *
#> FROM `flights`
#> WHERE (`year` = 1.0)) AS `LHS`
#> WHERE EXISTS (
#> SELECT 1 FROM (SELECT *
#> FROM (SELECT * FROM `airlines` AS `LHS`
#> WHERE EXISTS (
#> SELECT 1 FROM (SELECT *
#> FROM `flights`
#> WHERE (`year` = 1.0)) AS `RHS`
#> WHERE (`LHS`.`carrier` = `RHS`.`carrier`)
#> ))
#> WHERE (`name` = 'xyz')) AS `RHS`
#> WHERE (`LHS`.`carrier` = `RHS`.`carrier`)
#> )) AS `RHS`
#> WHERE (`LHS`.`faa` = `RHS`.`origin`)
#> )
#>
#> $flights
#> <SQL> SELECT * FROM (SELECT *
#> FROM `flights`
#> WHERE (`year` = 1.0)) AS `LHS`
#> WHERE EXISTS (
#> SELECT 1 FROM (SELECT *
#> FROM (SELECT * FROM `airlines` AS `LHS`
#> WHERE EXISTS (
#> SELECT 1 FROM (SELECT *
#> FROM `flights`
#> WHERE (`year` = 1.0)) AS `RHS`
#> WHERE (`LHS`.`carrier` = `RHS`.`carrier`)
#> ))
#> WHERE (`name` = 'xyz')) AS `RHS`
#> WHERE (`LHS`.`carrier` = `RHS`.`carrier`)
#> )
#>
#> $planes
#> <SQL> SELECT * FROM (SELECT * FROM `planes` AS `LHS`
#> WHERE EXISTS (
#> SELECT 1 FROM (SELECT *
#> FROM `flights`
#> WHERE (`year` = 1.0)) AS `RHS`
#> WHERE (`LHS`.`tailnum` = `RHS`.`tailnum`)
#> )) AS `LHS`
#> WHERE EXISTS (
#> SELECT 1 FROM (SELECT * FROM (SELECT *
#> FROM `flights`
#> WHERE (`year` = 1.0)) AS `LHS`
#> WHERE EXISTS (
#> SELECT 1 FROM (SELECT *
#> FROM (SELECT * FROM `airlines` AS `LHS`
#> WHERE EXISTS (
#> SELECT 1 FROM (SELECT *
#> FROM `flights`
#> WHERE (`year` = 1.0)) AS `RHS`
#> WHERE (`LHS`.`carrier` = `RHS`.`carrier`)
#> ))
#> WHERE (`name` = 'xyz')) AS `RHS`
#> WHERE (`LHS`.`carrier` = `RHS`.`carrier`)
#> )) AS `RHS`
#> WHERE (`LHS`.`tailnum` = `RHS`.`tailnum`)
#> )
Created on 2019-08-14 by the reprex package (v0.3.0)
Goal: rename all columns that exist in more than one table. Prefixed by the table name with a separator, .
as default.
With optional verbose output (quiet = FALSE
).
Should also take care of updating pk and fk.
For adding, use def[0, ]
as boilerplate for the new row in the def
. Use vctrs::vec_rbind()
for row-binding tibbles.
Removing is implemented for symmetry, the documentation should point to cdm_select_tbl()
.
Use @rdname
to document in the same .Rd
file.
Rewrite all examples and documentation to use this new idiom.
IMHO cdm_flatten_to_tbl()
should feature:
drop_key_cols
, since we often do not want to keep the code-columns used as keys, but only the real info (then reunite_parent_child()
might come in handy)start
-tableCurrently, we always call new_dm()
with a fully constructed datamodelr
object. The purpose of #9 was to introduce a simpler internal data structure which now can be used for updating a dm.
We should offer a way to construct a dm
object from internal components only, essentially extracting an internal constructor from new_dm()
. Also, we should provide accessor functions for the new components. Then, all code can work on the internal data structure. This should simplify the codebase a lot.
https://github.com/krlmlr/dm/pull/9/files#r306247485
As a result, calls to cdm_get_data_model()
should become very rare, needed mostly for cdm_draw()
and friends.
and replace everywhere:
expect_cdm_error <- function(expr, class) {
expect_error(expr, cdm_error(class))
}
If a column is a foreign key more than once, it also appears in the visualisation more than once. It should only occur once.
library(dm)
library(tibble)
a <- tibble(a = 1:3)
b <- tibble(b = 1:5)
c <- tibble(c = 1:4)
as_dm(list(a = a, b = b, c = c)) %>%
cdm_add_pk(b, b) %>%
cdm_add_pk(c, c) %>%
cdm_add_fk(a, a, b) %>%
cdm_add_fk(a, a, c) %>% cdm_draw()
Created on 2019-08-27 by the reprex package (v0.3.0)
dm in, dm, out.
Starting from a table, create copies for all tables that can be reached through more than one way in the relation graph.
Example: flights -> airports has two paths, would create two identical copies origin.airports
and dest.airports
(or origin
and dest
) in the resulting dm, breaking the cycle.
Need to think about how this can work with deep hierarchies.
Implementation:
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.