Comments (25)

crew102 commented on June 9, 2024

Yeah, I've been meaning to get to it. I'll probably push something in the next 1-2 weeks.

from ggextra.

daattali commented on June 9, 2024

@kassambara thank you for your input

@crew102 and I are discussing this, and it seems like the likely API will indeed be without a list.

A few more items we agreed on:

  • alpha will default to 1, since every ggMarginal call already has an implicit alpha. You can pass alpha through the ... argument, and this should also work for grouped data. The documentation for this feature should make clear that users may find it useful to set alpha explicitly to a smaller fraction, but it does not need to be an explicit argument
  • since this feature will allow grouped data to have a "fill" colour for density plots, we should also add support for "fill" in non-grouped data (currently, "fill" is not supported in density plots)

We did not settle on whether the colourGroup/fillGroup will be boolean flags or the name of a variable, though leaning towards the former. Need to ensure whatever we choose is not too restrictive and will support these scenarios:

  • ggplot(mtcars) + geom_point(aes(x = mpg, y = wt, colour = gear)) with fillGroup based on gear (even though original plot doesn't have a fill)
  • ggplot(mtcars) + geom_point(aes(x = mpg, y = wt, colour = gear, fill = carb)) with both fillGroup and colourGroup based on gear (even though original plot has a different fill from colour)

nmasto commented on June 9, 2024

I got it to work by adding the argument habillage to fviz_pca_ind(). Apologies for any inconvenience. Man, do those plots look good. Great work with ggMarginal.

daattali commented on June 9, 2024

You are right, this would be useful to others as well. I unfortunately will probably not have time to look into this feature myself, but I would be happy to accept a pull request if someone wants to take the lead on this feature.

crew102 commented on June 9, 2024

This would be doable. It would be a little awkward, given that we currently use geom_line to create the density plots; we would have to move to geom_density so that we could fill the distributions with color (i.e., specify a fill param).
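
As a rough illustration of the difference described above (a sketch using only ggplot2, not ggExtra's internals): a density curve drawn with geom_line(stat = "density") has no fill aesthetic to set, while geom_density draws an area and accepts fill.

```r
library(ggplot2)

# Density drawn as a line: outline only, since geom_line has no fill aesthetic
p_line <- ggplot(mtcars) +
  geom_line(aes(x = mpg), stat = "density")

# Density drawn as an area: fill and alpha can be set
p_area <- ggplot(mtcars) +
  geom_density(aes(x = mpg), fill = "steelblue", alpha = 0.5)

# Compare the aesthetics each geom understands
line_has_fill <- "fill" %in% GeomLine$aesthetics()
area_has_fill <- "fill" %in% GeomDensity$aesthetics()
```

Printing p_line and p_area side by side shows the same curve, with only the second one shaded.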

I actually think the API suggested by @kassambara is good. I.e., the call would look something like:

p <- ggplot(data = mtcars) + geom_point(aes(x = mpg, y = wt, colour = gear))
ggMarginal(p = p, margMapping = aes(colour = gear))

I think we'll want to require that the user specifies a color or fill mapping for the scatterplot if they also specify one for the marginal plots. We could rely on the xParams and yParams arguments for passing in alpha values of the filled marginal plots, too.

I'll take a stab at it sometime next week. @daattali , we should think about submitting a new version to CRAN after this as well, no?

daattali commented on June 9, 2024

Yep, I already emailed the authors of packages using ggExtra and told them about an upcoming CRAN release and to check the package for any regression bugs. We're good to go for CRAN. If you're thinking to have a go at this within the next few weeks then the CRAN release can wait for that.

API: is the idea that the user can also specify a different mapping than the one in the plot? And would using the aes() function be required? In ggplot2, aes() is needed because without it the value is taken literally rather than as a mapping. Would that be needed here as well?

crew102 commented on June 9, 2024

> If you're thinking to have a go at this within the next few weeks then the cran release can wait for that.

Yeah, let's wait until I take a stab at implementing this feature

> API: is the idea that the user can also specify a different mapping than the one in the plot?

Technically, yes, but the mapping should use the same variable. For example, this would be OK:

p <- ggplot(data = mtcars) + geom_point(aes(x = mpg, y = wt, colour = gear))
ggMarginal(p = p, margMapping = aes(fill = gear))

But we would not be supporting this:

p <- ggplot(data = mtcars) + geom_point(aes(x = mpg, y = wt, colour = gear))
ggMarginal(p = p, margMapping = aes(colour = cyl))

> And would using the aes() function be required?

We wouldn't have to use aes(). I was planning on parsing the aes call and going from there, instead of using it directly (so it would actually be easier to not use it). Do you think using aes would be confusing, given that we won't actually be making a call to it? I was actually originally thinking we should do something like this:

ggMarginal(p = p, margMapping = list(colour = cyl))

But I came around on the use of aes because we are doing something conceptually similar to using aes directly.

daattali commented on June 9, 2024

I think if we're not actually using aes() then we shouldn't require the user to use it because they might assume that they can write anything that works for aes() in there. Just like ggMarginal() already has x and y params that accept a variable name, and it's not wrapped in aes().

Would there be a technical limitation or any extra code to make something like

p <- ggplot(data = mtcars) + geom_point(aes(x = mpg, y = wt, colour = gear))
ggMarginal(p = p, margMapping = list(colour = cyl))

work? From an implementation point of view, does it matter that the grouping in the plot and in the margin is not the same?

crew102 commented on June 9, 2024

There would be two things that would make it awkward/more difficult if we tried to allow that:

  1. We would have to find a place to put the second legend
  2. We would have to add another param for manually mapping the colors of cyl to whatever the user wants to use. If we just have colour = gear in the call to ggMarginal, ggplot figures out the colors from any potential call to, say, scale_color_manual and puts them in the dataframe that we are using.
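
To illustrate point 2, the colours that ggplot resolves from a manual scale can be read back from the built layer data. This is only a sketch of the idea using ggplot2's layer_data(); it is not necessarily how ggExtra retrieves them, and the colour values are made up.

```r
library(ggplot2)

df <- transform(mtcars, gear = factor(gear))

p <- ggplot(df) +
  geom_point(aes(x = mpg, y = wt, colour = gear)) +
  scale_colour_manual(values = c("3" = "red", "4" = "blue", "5" = "darkgreen"))

# layer_data() builds the plot and returns the computed data for one layer;
# its colour column holds the colours chosen by the manual scale
built <- layer_data(p, 1)
resolved <- unique(built$colour)
```

Because the colours already sit next to each point, reusing the scatter plot's variable avoids asking the user for a second colour mapping.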

daattali commented on June 9, 2024

Good point re: legend.

Would the only allowed values be "colour" and "fill", or would it allow any kind of mapping? And what exactly would the enforcement on the variables be: would it only allow variables that already have some mapping in the original plot?

crew102 commented on June 9, 2024

I think the only relevant values for this would be colour or fill. Can you think of any others? The enforcement would basically just check that the variable specified in margMapping is mapped to either color or fill in the scatter plot. Also note that we wouldn't be supporting the data param for this feature (i.e., if the user wants to use margMapping, they have to pass in p instead of passing in data, x, and y).
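
The enforcement described here could be sketched roughly as follows. The helper names mapped_group_vars() and check_marg_var() are hypothetical and not part of ggExtra; the sketch assumes the mappings are plain variable names and that the plot object exposes $mapping and $layers, as ggplot2 objects have traditionally done.

```r
library(ggplot2)

# Hypothetical helper: names of variables mapped to colour or fill,
# either in ggplot() itself or in any layer
mapped_group_vars <- function(p) {
  mappings <- c(list(p$mapping), lapply(p$layers, function(l) l$mapping))
  vars <- unlist(lapply(mappings, function(m) {
    m <- m[names(m) %in% c("colour", "fill")]
    vapply(m, rlang::as_label, character(1))
  }))
  unique(unname(vars))
}

# Hypothetical check: stop unless `var` is mapped to colour or fill in `p`
check_marg_var <- function(p, var) {
  if (!var %in% mapped_group_vars(p)) {
    stop("'", var, "' is not mapped to colour or fill in the scatter plot")
  }
  invisible(TRUE)
}

p <- ggplot(mtcars) + geom_point(aes(x = mpg, y = wt, colour = gear))
check_marg_var(p, "gear")   # passes silently
# check_marg_var(p, "cyl")  # would stop with an error
```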

daattali commented on June 9, 2024

If it's just colour and fill, then it feels wrong to me to have a parameter that claims to take a list of mappings when there are only 2 allowed elements.

What do you think of one of these two options instead? Which would be best for end users?

  • Having two params like colourGroup and fillGroup (these might be terrible names - maybe colourVar and fillVar? I'm bad with naming things)
  • Simply having a single boolean param such as marginalGroup = TRUE/FALSE with FALSE as default. When TRUE, the colour and fill mappings that exist in the original plot get copied over to the marginal plot. This sounds like it could be simpler code and simpler for users perhaps?
  • (option 3: what you were suggesting above)

Let me know your thoughts.

crew102 commented on June 9, 2024

My first instinct was to do a combo of choices 1 and 2, so something like:

ggMarginal(p = p, marginalGroup = list(colourGroup = TRUE, colourAlpha = .4, fillGroup = FALSE, fillAlpha = NA))

The reason being that I think people will want to use different values for the alpha of the points vs the fill of the distributions. I don't have any strong feelings about whether we have just one marginalGroup argument (which would be a list with 4 elements) or two arguments (colourGroup and fillGroup, each with 2 elements). I think it's going to be awkward any way we do it, to be honest. What do you think is the most intuitive for users?

crew102 commented on June 9, 2024

Nvm, I forgot what I was planning to do for alpha, which was to just suggest that users specify it in the xParams or yParams argument...So I guess your option 2 would also work....I think I actually like that option the most, come to think of it!

But we should separate colour and fill. So either a single marginalGroup argument that takes a list of two bools, or two arguments (colourGroup and fillGroup, each taking a single bool).

daattali commented on June 9, 2024

I don't follow the whole alpha thing. Why is alpha needed for the marginal plots? I think alpha should always be 1 for the marginal density/histogram.

In the marginal plot, would it make sense to have mappings for both colour and fill into different variables? I don't even know what that would look like

crew102 commented on June 9, 2024

Alpha is needed (at least for fill) for the marginal plots because alpha = 1 will result in you not being able to see the distributions when they overlap. For example, in the example that kassambara posted, you get to see what the distributions look like across their entire support, even when there is another distribution that is overlapping. So we would want to set a default value for alpha somewhere around .5, I think.

I think we should just allow one variable to be mapped to fill or colour (or potentially both)...Using two different variables in the marginal map would bring up the two issues I mentioned above (e.g., adding an extra legend).

daattali commented on June 9, 2024

Right, alpha <1 definitely needed. But let's just fix it at a value, doesn't need to be customized. You're right.

My second question was: would both colour AND fill be able to get a mapping? What would it look like when they both are used?

crew102 commented on June 9, 2024

I think we should allow users to specify the alpha level, given that it will be difficult to choose a default that looks good for all different scenarios (i.e., many vs few groups, lighter vs darker cols, etc.).

Regarding your second question, that's what I thought you meant...We could potentially map a single variable to both fill and colour (but again, there would be no support for two different variables mapped to fill and colour). When you specify a fill param but no colour, the distribution(s) is outlined in black:

library(ggplot2)
mtcars$gear <- as.factor(mtcars$gear)
ggplot(data = mtcars) + 
  geom_density(aes(x = mpg, fill = gear), alpha = .3)

When you specify colour as well, the outline shares the same colour as the fill, and you only get one legend (at least for the current version of ggplot2 that I'm at):

ggplot(data = mtcars) + 
  geom_density(aes(x = mpg, fill = gear, colour = gear), alpha = .3)

I just checked out the case for histograms, and fill looks pretty bad. It's too difficult to tell which bins refer to which groups:

ggplot(data = mtcars) + 
  geom_histogram(aes(x = mpg, fill = gear), alpha = .3, 
                 position = position_identity(), bins = 10)

The case for boxplot looks reasonable, though:

ggplot(data = mtcars) + 
  geom_boxplot(aes(x = mpg, y = mpg, fill = gear, colour = gear), alpha = .3, 
                 position = position_identity())

I think we should support all three but just suggest that the user choose type to be either density or boxplot when they want to specify a marginal mapping, given how hard the grouped histogram is to read.

kassambara commented on June 9, 2024

I think that fixing the default alpha = 0.5 is a good option. Having the possibility to use colourGroup = TRUE and/or fillGroup = TRUE would also be appreciated.

You might have also noted that, when type = "boxplot", the color/fill variable should be used as the x axis variable in the marginal box plot.

Thank you :-)!

kassambara commented on June 9, 2024

I'm wondering whether it wouldn't be better if the final format of ggMarginal looked like this:

# Basic usage
ggMarginal(p)

# Grouped data
# (Only) color by groups
ggMarginal(p, colourGroup = TRUE)

# or 
# (Only) fill by groups
ggMarginal(p, fillGroup = TRUE, alpha = 0.5)

# or
# color and fill by groups
ggMarginal(p, colourGroup = TRUE, fillGroup = TRUE, alpha = 0.5)

Instead of this (more typing):

# Basic usage
ggMarginal(p)

# Grouped data
ggMarginal(p, margMapping = list(colourGroup = TRUE))
# or 
ggMarginal(p, margMapping = list(fillGroup = TRUE, alpha = 0.5))
# or
ggMarginal(p, margMapping = list(colourGroup = TRUE, fillGroup = TRUE, alpha = 0.5))

daattali commented on June 9, 2024

@crew102 I think we left this unresolved - do you have time/would like to come back to this?

crew102 commented on June 9, 2024

Closed?

daattali commented on June 9, 2024

Indeed! @kassambara this exists now
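
For readers landing here: the feature appears to have shipped as two boolean arguments, groupColour and groupFill (the latter is the name used in nmasto's code in this thread). A minimal sketch, assuming a ggExtra version that includes both arguments:

```r
library(ggplot2)
library(ggExtra)

# The grouping variable should be a factor (or character) column
df <- transform(mtcars, gear = factor(gear))

p <- ggplot(df, aes(x = mpg, y = wt, colour = gear)) +
  geom_point()

# Marginal densities grouped by the variable mapped to colour in the scatter plot
marg <- ggMarginal(p, type = "density", groupColour = TRUE, groupFill = TRUE)
```

Printing marg draws the scatter plot with one marginal density per gear level on each axis.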

nmasto commented on June 9, 2024

Way late to this, but perhaps worthwhile: I cannot figure out how to combine the functionality of fviz_pca_ind with ggMarginal. Even after adding a grouping variable outside of the fviz_pca_ind() call using geom_point, ggMarginal doesn't appear to recognize the grouping variable. See code below (kind of ugly). Is this a communication breakdown between fviz_pca_ind, ggplot, and ggMarginal? It recognizes that there are 3 groups but not the color or fill.

state <- fviz_pca_ind(
  move_pca,
  # Individuals
  fill.ind = dat$state,
  # col.ind = "black",
  # pointshape = 21,
  # col = "black",
  # fill = movevars1$state,
  # pointsize = 2,
  # labelsize = 5,
  alpha = 0.5,
  palette = cols,
  addEllipses = TRUE,
  ellipse.type = "confidence",
  ellipse.level = 0.95,
  mean.point = FALSE,
  label = "var",
  col.var = "black",
  repel = TRUE,
  legend.title = "",
  ggtheme = theme_minimal(base_size = 16)
) + # Close fviz_pca_ind
  labs(
    title = "",
    x = "Time (PC1)",
    y = "Energy (PC2)"
  ) +
  geom_point(aes(dat$pc1, dat$pc2, fill = dat$state), color = "black", size = 2, shape = 21) + # rewrite points
  scale_fill_manual(values = cols) + # rewrite colors
  theme_bw(base_size = 16) +
  theme(
    aspect.ratio = 1,
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.text = element_text(color = "black", size = 14),
    legend.position = c(.35, .95),
    legend.justification = c("right", "top"),
    legend.box.background = element_blank()
    # axis.title.y = element_text(color = "black", size = 20),
    # axis.title.x = element_text(color = "black", size = 20)
  )

state1 <- ggMarginal(state, type = "density", col = "black", groupFill = TRUE)

nmasto commented on June 9, 2024

Sorry, for context: even when I try to specify the data, it says the row counts are misaligned, even though the ggplot object stores all the data, including a fill variable. So even specifying my own data, x/y coordinates, and fill variable throws an error, and I'm not sure why:

tail(state$data) # 143 observations with x, y, and fill variables

    name            x           y       coord        cos2      contrib     Fill.
138  138  0.029920576 -0.04905992 0.003302116 0.002063812 0.0005767548 Tennessee
139  139  0.469097338 -0.15273796 0.243381196 0.257227431 0.0425094847 Tennessee
140  140  0.384657694  0.19810512 0.187207182 0.319287372 0.0326980103 Tennessee
141  141 -2.198499773 -0.72048493 5.352499779 0.620500493 0.9348791579 Tennessee
142  142 -0.926122738  0.44988083 1.060096089 0.806648433 0.1851586697 Tennessee
143  143 -0.009075775 -0.56402287 0.318204172 0.205827671 0.0555782271 Tennessee

state1 <- ggMarginal(data = state$data, x = state$data$x, y = state$data$y, fill = state$data$Fill., type = "density")

Error in `ggplot2::geom_density()`:
! Problem while setting up geom aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (512)
✖ Fix the following mappings: `fill`
Backtrace:
  1. ggExtra::ggMarginal(...)
  5. ggExtra:::addTopMargPlot(pGrob, top, size)
  6. ggExtra:::getMargGrob(top)
  7. ggplot2::ggplotGrob(margPlot)
 12. ggplot2:::ggplot_build.ggplot(x)
     ...
 21. l$compute_geom_2(d)
 22. ggplot2 (local) compute_geom_2(..., self = self)
 23. self$geom$use_defaults(data, self$aes_params, modifiers)
 24. ggplot2 (local) use_defaults(..., self = self)
 25. ggplot2:::check_aesthetics(params[aes_params], nrow(data))

Not sure where ggMarginal is pulling the data to get 512 for geom_density() when the data is clearly only 143 observations long. Thanks for any advice if/when convenient.
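
One possible explanation for the 512, offered as a guess rather than a diagnosis: density estimates are evaluated on a grid of 512 points by default, both in stats::density() and in ggplot2's density stat, so the data that geom_density draws has 512 rows per group regardless of the input size. A fill vector of length 143 passed as a fixed parameter would then fail the length check. The sketch below only demonstrates the grid size; the data frame is made up.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = rnorm(143))

# Base R evaluates the density on 512 points by default
n_base <- length(density(df$x)$x)

# ggplot2's density layer does the same: 143 input rows become 512 computed rows
n_layer <- nrow(layer_data(ggplot(df) + geom_density(aes(x = x)), 1))
```

If that is the cause, it may be worth mapping fill inside aes() on the scatter plot and calling ggMarginal(p, groupFill = TRUE), or, when using the data argument, passing x and y as column names (strings) rather than vectors.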
