
grosssbm / missSBM

12 stars, 1 watcher, 2 forks, 85.89 MB

An R package for fitting Stochastic Block Models to network data sampled under various missing-data conditions

Home Page: http://grosssbm.github.io/missSBM

License: GNU General Public License v3.0

Languages: R 81.03%, C++ 18.14%, TeX 0.83%
Topics: network-analysis, missing-data, stochastic-block-model, network-dataset, nas

missSBM's People


Forkers

mproeling astamm

missSBM's Issues

Go parallel

inferSBM should exploit multicore computing if possible (either with parallel or future).
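A minimal sketch of the parallel route, assuming the fits for different numbers of blocks are independent of each other (the forward/backward smoothing may break that assumption). `fit_over_blocks` and `fit_one` are hypothetical names, not missSBM functions; `fit_one` stands for whatever inferSBM does for a single number of blocks. The sketch uses `parallel::mclapply`, which relies on forking and therefore falls back to one core on Windows; the future route would swap it for `future.apply::future_lapply` under a chosen `plan()`.

```r
library(parallel)

# Hypothetical helper: run one fit per candidate number of blocks.
# `fit_one` is a stand-in for the single-Q fit performed inside inferSBM.
fit_over_blocks <- function(vBlocks, fit_one, cores = 2L) {
  # Forking (mclapply) is not available on Windows: run sequentially there
  if (.Platform$OS.type == "windows") cores <- 1L
  mclapply(vBlocks, fit_one, mc.cores = cores)
}

# Toy usage: a cheap function in place of a real SBM fit
res <- fit_over_blocks(1:4, function(Q) Q^2)
```

If the fits use random initializations, calling `RNGkind("L'Ecuyer-CMRG")` and `set.seed()` beforehand keeps the `mclapply` results reproducible.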

TASK: for testing network samplings, we need the theoretical expected sampling rate

Consider the following piece of code for sampling in a 300-node SBM with various sampling schemes.

library(missSBM)

## SBM parameters
N <- 300
Q <- 3
alpha <- rep(1, Q) / Q                  # mixture parameter
pi <- diag(.45, Q) + .05                # connectivity matrix
directed <- FALSE
# Draw an SBM (Bernoulli, undirected)
mySBM <- simulateSBM(N, alpha, pi, directed)
A <- mySBM$adjacencyMatrix

## network samplings
dyad   <- samplingSBM(A, "dyad", parameters = .1)
node   <- samplingSBM(A, "node", parameters = .1)
block  <- samplingSBM(A, "block", parameters = c(.1, .2, .7), clusters = mySBM$memberships)
double_standard <- samplingSBM(A, "double_standard", parameters = c(0.1, 0.5))
degree   <- samplingSBM(A, "degree", parameters = c(0.01, 0.01))
snowball <- samplingSBM(A, "snowball", parameters = .3)

To test whether each sampling design yields a network whose proportion of NAs matches what is expected, I need the theoretical sampling rate of each design.

For instance, for dyad sampling the rate is simply the value of the parameter (0.1).

I need the theoretical expectation for the other ones as a function of the vector of parameters.
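A sketch of the expected proportion of observed dyads (undirected network, dyads i < j) for the first designs, under our reading of them: node sampling observes every dyad that involves a sampled node; block sampling does the same with a per-block probability; double-standard sampling observes 0s and 1s with two different probabilities, assumed here to be ordered as c(P(observe a 0), P(observe a 1)). The function names are hypothetical, and the formulas should be checked against samplingSBM before being used in tests. Degree and snowball sampling depend on the realized degrees and neighbourhoods, so they are not covered.

```r
# Expected proportion of observed dyads (i < j), undirected network.
# These formulas encode an assumed reading of each design.

# dyad: each dyad observed independently with probability p
rate_dyad <- function(p) p

# node: each node sampled with probability p; a dyad is observed as soon
# as at least one of its two endpoints is sampled
rate_node <- function(p) 1 - (1 - p)^2

# block: a node of block q is sampled with probability p[q];
# average over all pairs i < j given the memberships
rate_block <- function(p, clusters) {
  unsampled <- 1 - p[clusters]                    # P(node i not sampled)
  both_unsampled <- outer(unsampled, unsampled)   # dyad missing iff both unsampled
  1 - mean(both_unsampled[upper.tri(both_unsampled)])
}

# double_standard: p[1] = P(observe a 0), p[2] = P(observe a 1) (assumed order);
# `density` is the proportion of 1s among the dyads
rate_double_standard <- function(p, density) p[1] * (1 - density) + p[2] * density
```

With the parameters of the example above, `rate_node(.1)` gives 0.19, and `rate_block(c(.1, .2, .7), mySBM$memberships)` would give the target for block sampling.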

Bug: smooth()

Error: netSampling %in% available_samplings is not TRUE

Occurs when compiling the JSS paper with knitr.

Handle covariates

The networkSampling_fit class should be split into two subclasses, with and without covariates.

The same goes for the class networkSampling_sampler.

See the structure of SBM_fit for a reference.

Bug: warnings during smoothing

Recurrent warnings appear during the smoothing of the ICL curve with degree sampling: "In log(1 - prob) : NaNs produced"
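`log(1 - prob)` only returns NaN (with that warning) when `1 - prob` is negative, i.e. when `prob` exceeds 1. We have not traced where this happens in the degree-sampling code; if it comes from rounding in the logistic link, clamping the probability before taking the log would remove the warning. `safe_log1m` is a hypothetical helper name, shown as a sketch of that guard.

```r
# Guarded version of log(1 - prob): clamp prob to [eps, 1 - eps] so the
# argument stays positive, and use log1p() for accuracy near prob = 0.
safe_log1m <- function(prob, eps = 1e-12) {
  prob <- pmin(pmax(prob, eps), 1 - eps)
  log1p(-prob)
}

prob  <- c(0.2, 1, 1 + 1e-15)              # last value mimics a rounding error
naive <- suppressWarnings(log(1 - prob))   # -Inf at 1, NaN just above 1
safe  <- safe_log1m(prob)                  # finite everywhere
```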

Use a formula to specify our model

Once covariates are available, we could perhaps use network ~ covariates to specify our model.

Also think about how to specify the sampling model in a smart way.
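One possible front end for the `network ~ covariates` idea, as a sketch only: the left-hand side is taken as the adjacency matrix and each variable on the right-hand side as a covariate matrix, both looked up in the environment where the formula was created. `parse_sbm_formula` is a hypothetical name and is not part of missSBM; the sampling model would still have to be passed separately (for instance as a `sampling` argument).

```r
# Hypothetical parser for `network ~ cov1 + cov2`: returns the adjacency
# matrix (left-hand side) and a named list of covariate matrices (right-hand side).
parse_sbm_formula <- function(formula, env = environment(formula)) {
  stopifnot(inherits(formula, "formula"), length(formula) == 3L)  # two-sided only
  adjacency  <- eval(formula[[2L]], env)
  cov_names  <- all.vars(formula[[3L]])            # character(0) for `A ~ 1`
  covariates <- lapply(cov_names, get, envir = env)
  names(covariates) <- cov_names
  list(adjacency = adjacency, covariates = covariates)
}

A  <- matrix(rbinom(25, 1, 0.3), 5, 5)
X1 <- matrix(rnorm(25), 5, 5)
spec <- parse_sbm_formula(A ~ X1)
```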

Documentation

All functions (even internal ones) should have basic documentation to help other developers understand what's going on...

Vignette

We should start to write a basic vignette.

ICL comparisons

See the file inst/lostinICL.R for a reproducible example.

The ICL computed in the class SBMfit is inconsistent with a direct computation.
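As an independent reference for such comparisons, a sketch of the ICL of a binary undirected SBM for a hard clustering z, with plug-in estimates and the penalty of Daudin, Picard and Robin (2008): ICL = log L(A, z) - (Q - 1)/2 log n - Q(Q + 1)/4 log(n(n - 1)/2). It is an assumption that missSBM follows this convention; a different sign, a factor -2, variational instead of hard memberships, or extra terms for the sampling parameters would by themselves produce apparent inconsistencies. `icl_sbm` is a hypothetical name.

```r
# ICL of a binary, undirected SBM for a hard clustering z (values 1..Q),
# with plug-in estimates of alpha and pi. A: symmetric 0/1 matrix.
icl_sbm <- function(A, z, eps = 1e-10) {
  n <- nrow(A)
  Q <- max(z)
  alpha <- tabulate(z, Q) / n                    # block proportions
  ut <- which(upper.tri(A), arr.ind = TRUE)      # dyads i < j
  a  <- as.numeric(A[ut])
  pair <- paste(pmin(z[ut[, 1]], z[ut[, 2]]), pmax(z[ut[, 1]], z[ut[, 2]]))
  p  <- ave(a, pair)                             # pi_hat of each dyad's block pair
  p  <- pmin(pmax(p, eps), 1 - eps)              # avoid log(0) for empty/full pairs
  loglik  <- sum(a * log(p) + (1 - a) * log(1 - p)) + sum(log(alpha[z]))
  penalty <- (Q - 1) / 2 * log(n) + Q * (Q + 1) / 4 * log(n * (n - 1) / 2)
  loglik - penalty
}

# Two disjoint 5-cliques: the true 2-block clustering should beat Q = 1
blk <- rep(1:2, each = 5)
A <- outer(blk, blk, "==") * 1
diag(A) <- 0
icl_true <- icl_sbm(A, blk)
icl_one  <- icl_sbm(A, rep(1, 10))
```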

Initial imputation in missSBM-fit is probably not adapted to cases with covariates

Possibly related to #21: the following code for the first estimate of pi in missSBM-fit is relevant for problems without covariates (where private$pi really is the matrix of connectivity between blocks):

https://github.com/jchiquet/missSBM/blob/18c3959d60ae4cf12039e37492987e89a8702253/R/missingSBM_fit.R#L32-L37

However, when private$pi represents gamma, as is the case for the model with covariates, we should adapt this first initialization and imputation. This has been shown to be crucial for properly reproducing the results found with @TabouyT's implementation.

So @TabouyT, how do you initialize pi/gamma in the models with covariates?
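To make the ambiguity concrete, a sketch of an imputation step that reads the same parameter differently with and without covariates. It is not the code at the linked lines. With covariates it assumes a logistic model, logit P(A_ij = 1) = gamma[z_i, z_j] + offset_ij, where `offset` holds the covariate effect of each dyad; `impute_missing` and `offset` are hypothetical names, and hard memberships `z` are used for simplicity.

```r
# Hypothetical imputation: replace NA dyads by their expected value given z.
# Without covariates, theta is the connectivity matrix pi (probability scale).
# With covariates, theta is read as gamma (logit scale) and `offset` carries
# the covariate effect of each dyad (assumed model, see lead-in).
impute_missing <- function(A, z, theta, offset = NULL) {
  miss <- which(is.na(A), arr.ind = TRUE)
  block_pairs <- cbind(z[miss[, 1]], z[miss[, 2]])
  if (is.null(offset)) {
    p <- theta[block_pairs]                          # theta = pi
  } else {
    p <- plogis(theta[block_pairs] + offset[miss])   # theta = gamma
  }
  A[miss] <- p
  A
}

A <- matrix(c(0, 1, NA, 1, 0, 0, NA, 0, 0), 3, 3)   # dyad (1, 3) is missing
z <- c(1, 1, 2)
pi_hat <- matrix(c(.8, .1, .1, .6), 2, 2)
B <- impute_missing(A, z, pi_hat)                                # no covariates
G <- impute_missing(A, z, qlogis(pi_hat), offset = matrix(0, 3, 3))  # zero covariate effect
```

With a zero covariate effect, both readings give the same imputed values; any nonzero `offset` makes them diverge, which is why the initialization has to know which parameter private$pi holds.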

MAR case: perform optimization only on observed values

At this stage of the development, we perform imputation even in the MAR case to keep the same framework, whatever the underlying sampling process (MAR or NMAR).

Not only would it save time to perform the inference only on the observed part of the surrogate log-likelihood, but it would also be more correct in the MAR case with covariates.

I will create a branch for that, as it slightly changes the interface with the C++ code as well as the structure of the R6 object. An elegant solution would be to handle NAs in the C++ code, by looping only over the non-NA values of the network adjacency matrix.
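The same idea can be sketched in vectorized R, as a reference for the C++ version: in a variational M-step, the connectivity update sums only over observed dyads by multiplying with an observation mask, so no imputation is needed. `update_pi_observed` is a hypothetical name; `tau` is assumed to be the n x Q matrix of membership probabilities, and a block pair with no observed dyad would give NaN.

```r
# Hypothetical MAR M-step for pi using observed dyads only.
# pi[q, l] = sum_{i != j, observed} tau_iq tau_jl A_ij / sum_{i != j, observed} tau_iq tau_jl
update_pi_observed <- function(A, tau) {
  R <- 1 * !is.na(A)          # observation mask: 1 if dyad observed
  diag(R) <- 0                # self-loops are not modelled
  A0 <- A
  A0[is.na(A0)] <- 0          # NAs contribute nothing once masked
  num <- t(tau) %*% (A0 * R) %*% tau   # expected observed edges per block pair
  den <- t(tau) %*% R %*% tau          # expected observed dyads per block pair
  num / den
}

# Two 3-cliques, one within-block dyad missing, hard memberships
blk <- rep(1:2, each = 3)
A <- outer(blk, blk, "==") * 1
diag(A) <- 0
A[1, 2] <- A[2, 1] <- NA
tau <- outer(blk, 1:2, "==") * 1
P <- update_pi_observed(A, tau)
```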

Bug: smooth()

Error: netSampling %in% available_samplings is not TRUE

Same error when sampling = "block-node", with the same code.

Issues with missSBM::smooth

The smooth function does not always improve the estimation; sometimes it even makes it worse, which should not happen. See the example with the R code of the JSS paper (.Rnw file) and the ggplot attached to this issue.

Attachment: Rplot01.pdf

How to fit blockmodels on a fixed (given) number of clusters?

This would save time in the tests; so far I only manage to fit "up to" a required number of clusters 👍

BM <- blockmodels::BM_bernoulli_covariates("SBM_sym", A, covariates_BM, verbosity = 0, explore_max = Q, plotting = "", ncores = 1)

Bug when defining an SBM object

The following code does not work properly:

A <- matrix(rbinom(100, 1, .2), 10, 10)
type <- "simple"
mySBM <- SimpleSBM_fit$new(A, "poisson", directed = FALSE)

Add unit tests for everything

This is important and should avoid wasting a lot of time when code has not been checked for a while. It is also important for:

  • targeting a release on CRAN
  • preparing Timothée's end of PhD

Issues with next version of ggplot2

Hi

We are preparing the next version of ggplot2, and our reverse dependency tests show an issue with missSBM. The issue revolves around tighter checks of theme settings in facet rendering: free scales in facets will now error if the theme specifies an aspect ratio. This change results in an error when running the examples in the estimateMissSBM documentation.

The next release is available in the v3.3.4-rc branch if you need to test against it. We plan on releasing in the next week.

best
Thomas

setModel does not work for the model with the greatest index

Hi Julien!

Using the SBM and LBM functions, we would like to explore storedModels with the setModel method. However, setModel does not allow exploring the last model, the one with the highest index value. Could this be corrected?

Also, a question about storedModels: are all the fitted models listed there, or are more models stored that are not accessible through storedModels?

Virginie and Benoit

Export less

Most of the functions and classes should not be called by the user.

At this stage of the package, many classes are exported that should not be.

Once they are no longer exported, they remain reachable through missSBM::: for testing.

Add snowball sampling

This is a popular sampling design, so we should add the possibility to sample with it, using one or several waves.
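A sketch of a multi-wave snowball sampler, under one common definition that may differ from what missSBM eventually adopts: wave 0 draws seed nodes independently with probability `p_seed`, each further wave adds every neighbour of the nodes sampled so far, and a dyad is observed as soon as one of its endpoints is sampled. `snowball_sample`, `p_seed` and `n_waves` are hypothetical names.

```r
# Hypothetical snowball sampler with several waves.
snowball_sample <- function(A, p_seed, n_waves = 1L) {
  n <- nrow(A)
  sampled <- runif(n) < p_seed                  # wave 0: random seed nodes
  for (w in seq_len(n_waves)) {
    # neighbours of currently sampled nodes (out-neighbours if A is directed)
    reached <- colSums(A[sampled, , drop = FALSE]) > 0
    sampled <- sampled | reached
  }
  observed <- outer(sampled, sampled, "|")      # dyad observed if an endpoint is sampled
  A[!observed] <- NA
  list(adjacency = A, sampled_nodes = which(sampled))
}

set.seed(42)
A <- matrix(rbinom(100, 1, 0.2), 10, 10)
A[lower.tri(A)] <- t(A)[lower.tri(A)]           # make it undirected
diag(A) <- 0
set.seed(1); s1 <- snowball_sample(A, 0.2, n_waves = 1)
set.seed(1); s2 <- snowball_sample(A, 0.2, n_waves = 2)
```

Reusing the same seed shows that a second wave can only enlarge the set of sampled nodes.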

Bug: Forward smoothing

Recurrent error: "Error: number of cluster centres must lie between 1 and nrow(x)".
Probably due to the k-means algorithm when splitting a class into two while only one node is in the class.
Solution: prevent these cases with an "if".
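A sketch of that guard around `stats::kmeans`. As far as we can tell, with the default Hartigan-Wong algorithm this exact message is raised when the number of centres is not strictly smaller than `nrow(x)`, so a class with exactly two nodes may trigger it as well; a single node fails earlier with "more cluster centers than distinct data points". The helper name `split_in_two` is hypothetical, and `x` stands for whatever per-node coordinates the forward smoothing splits on.

```r
# Split the rows of x into two groups, or return NULL if the class cannot be split.
split_in_two <- function(x) {
  x <- as.matrix(x)
  if (nrow(unique(x)) < 2L) return(NULL)   # singleton or identical points
  if (nrow(x) == 2L) return(c(1L, 2L))     # trivial split; Hartigan-Wong needs nrow(x) > 2
  kmeans(x, centers = 2L)$cluster
}

set.seed(1)
cl <- split_in_two(c(0, 0.1, 0.2, 10, 10.1, 10.2))
```

Returning NULL lets the caller simply skip that class when proposing split candidates.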

Smarter smoothing process

We explore all possible models at each iteration of forward/backward smoothing. We can do better than that.

At some point, the smoothing process should even be integrated in the standard fitting procedure.
