library(dplyr)
library(tidyr)
billboard <- read.csv("vignettes/billboard.csv") %>%
  as_tibble() %>%
  gather(week, rank, starts_with("wk"), na.rm = TRUE) %>%
  mutate(week = readr::parse_number(week)) %>%
  arrange(artist, track, week)
song <- billboard %>%
  select(year:time) %>%
  distinct() %>%
  mutate(song_id = 1:n())
rank <- billboard %>%
  left_join(song) %>%
  select(song_id, week, rank) %>%
  distinct()
A possible `unjoin()` call to replace the manual steps above:
unjoin(billboard,
  song = year:date.entered,
  rank = c(week, rank)
)
I think `unjoin()` will need to split into two tables each time (just as a join combines two tables). That means you only need to specify either the x or the y variables.
Here's another implementation idea:
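xvars <- c("year", "artist", "track", "time", "date.entered")
# One integer id per distinct combination of xvars
id <- billboard %>%
  group_by(across(all_of(xvars))) %>%
  group_indices()
# Group-level table: the first row for each id
song <- billboard[!duplicated(id), xvars]
song$song_id <- id[!duplicated(id)]
# Row-level table: the remaining columns plus the id linking back to song
rank <- billboard[, setdiff(names(billboard), xvars)]
rank$song_id <- id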
@lionel- any thoughts?
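For readers without dplyr or the billboard data, here is a base-R sketch of the same split on a toy data frame. `unjoin_base()`, its `id_col` argument and the data are hypothetical illustrations, not tidyr API. It swaps dplyr's group indices for `match()` on pasted row keys, so ids follow the order of first appearance rather than sorted group order, and it assumes the separator `"\r"` never occurs inside the key columns.

```r
# Hypothetical base-R helper (not a tidyr function): split `df` into a table of
# the distinct combinations of `xvars` plus a new integer id, and a table of
# the remaining columns that refers back to it through that id.
unjoin_base <- function(df, xvars, id_col = "id") {
  # One key string per row; rows with identical `xvars` values share a key.
  # Assumes "\r" does not appear inside the values.
  key <- do.call(paste, c(unname(as.list(df[xvars])), sep = "\r"))
  # Number the distinct keys in order of first appearance.
  id <- match(key, unique(key))
  first <- !duplicated(id)

  # Group-level table: one row per id.
  x <- df[first, xvars, drop = FALSE]
  x[[id_col]] <- id[first]
  rownames(x) <- NULL

  # Row-level table: every other column, plus the linking id.
  y <- df[, setdiff(names(df), xvars), drop = FALSE]
  y[[id_col]] <- id

  list(x = x, y = y)
}

df <- data.frame(
  artist = c("A", "A", "A", "B", "B"),
  track  = c("t1", "t1", "t2", "t3", "t3"),
  week   = c(1, 2, 1, 1, 2),
  rank   = c(10, 8, 50, 3, 1)
)
parts <- unjoin_base(df, c("artist", "track"), id_col = "song_id")
parts$x  # one row per song
parts$y  # week and rank for every row, plus song_id
```

Only the x variables are passed; the y side is simply every column not listed, which matches the point that one side is enough to describe the split.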
unjoin(billboard,
  song = year:date.entered,
  rank = c(week, rank)
)
This would yield a list of two data frames, right?
Would the following work to separate group-level data?
billboard %>% unjoin(year:date.entered)
And would it be an option to have it return one data frame instead of a list in this case?
@lionel- I think it always has to return two tables; otherwise, how could you hook them back together after the fact? (i.e. `unjoin()` has to create a new id column)
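To see why the generated id matters, here is a small sketch with hand-made stand-ins for the two tables (the data are invented for illustration): a base-R `merge()` on `song_id` is enough to rebuild the long table, whereas without that column there would be nothing to match rows on.

```r
# Toy stand-ins for the two tables unjoin() would return: group-level columns
# in `song`, row-level columns in `rank`, linked only by `song_id`.
song <- data.frame(
  song_id = 1:2,
  artist  = c("A", "B"),
  track   = c("t1", "t2")
)
rank <- data.frame(
  song_id = c(1L, 1L, 2L),
  week    = c(1, 2, 1),
  rank    = c(10, 8, 3)
)

# Joining on the generated id reattaches the song columns to every row.
full <- merge(rank, song, by = "song_id")
full <- full[order(full$song_id, full$week), ]
full
```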
This could be handled with an optional `.id` argument, e.g.:
library(dplyr)
unjoin <- function(.data, ..., .id = NULL) {
  if (is.null(.id)) {
    # Code for multiple data frames (yields a list) is not written yet.
    stop("unjoin() without `.id` is not implemented yet", call. = FALSE)
  }
  # Keep the selected columns plus the group-level id column.
  .data <- select(.data, ..., all_of(.id))
  # Count distinct values of every other column within each id group.
  n_unique <- .data %>%
    group_by(across(all_of(.id))) %>%
    summarise(across(everything(), n_distinct), .groups = "drop") %>%
    select(-all_of(.id))
  if (!all(as.matrix(n_unique) == 1)) {
    stop("values are not unique within groups", call. = FALSE)
  }
  # One row per id.
  distinct(.data)
}
Then, with a table that contains the group-level id column:
billboard_full <- billboard %>% left_join(song)
billboard_full %>% unjoin(year:time, .id = "song_id")
## Source: local data frame [317 x 5]
## year artist track time song_id
## 1 2000 2 Pac Baby Don't Cry (Keep... 4:22 1
## 2 2000 2Ge+her The Hardest Part Of ... 3:15 2
## 3 2000 3 Doors Down Kryptonite 3:53 3
## 4 2000 3 Doors Down Loser 4:24 4
## 5 2000 504 Boyz Wobble Wobble 3:35 5
## 6 2000 98^0 Give Me Just One Nig... 3:24 6
## 7 2000 A*Teens Dancing Queen 3:44 7
## 8 2000 Aaliyah I Don't Wanna 4:15 8
## 9 2000 Aaliyah Try Again 4:03 9
## 10 2000 Adams, Yolanda Open My Heart 5:30 10
## .. ... ... ... ... ...
I need a function that directly gives me the group-level data frame, but I understand if you find this an unnecessary complication for `unjoin()`, since it's not very difficult to subset the correct table right after separation.
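The validity check in the `.id` version can also be sketched without dplyr. `group_level()` and the toy data are hypothetical; the helper assumes a single id column, mirroring the single string passed to `.id`, and returns one row per id after confirming that each selected column is constant within its group.

```r
# Hypothetical base-R helper mirroring the `.id` idea: keep `cols` and the id
# column, require one value per id for every column in `cols`, and return one
# row per id.
group_level <- function(df, cols, id) {
  sub <- df[c(cols, id)]
  # For each column, count distinct values within every id group.
  constant <- vapply(cols, function(col) {
    counts <- tapply(sub[[col]], sub[[id]], function(v) length(unique(v)))
    all(counts == 1)
  }, logical(1))
  if (!all(constant)) {
    stop("values are not unique within groups", call. = FALSE)
  }
  out <- sub[!duplicated(sub[[id]]), , drop = FALSE]
  rownames(out) <- NULL
  out
}

full <- data.frame(
  song_id = c(1L, 1L, 2L),
  artist  = c("A", "A", "B"),
  week    = c(1, 2, 1)
)
group_level(full, "artist", "song_id")     # one row per song
try(group_level(full, "week", "song_id"))  # week varies within song 1, so this errors
```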
I'm going to close this for now, since I've never actually had a use for it.