Comments (7)
Is there an efficient way to sample the latent space? I'm currently just sampling uniformly at random between the min and max of each feature and having the decoder produce the representation for that latent feature set, but I'm not sure that's the best route.
We haven't done extensive sampling, but to me, the approach outlined above sounds like the way to go!
Is the goal to see which latent space features impact your genes of interest? If so, you may be able to short-circuit the sampling step and look directly at the genes' connectivity in the decoder weight matrix. I can also imagine a scenario in which you are interested in what the distribution of samples looks like when a specific gene's expression is below a certain threshold. In that case, randomly sampling the latent space and rejecting all output where the gene of interest violates this threshold could be an interesting experiment.
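A minimal sketch of that rejection-sampling idea. The decoder here is a stand-in (a fixed random linear map), and the latent size, gene count, and threshold are all illustrative, not Tybalt's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for the trained decoder: a fixed linear map
# from a 5-dim latent space to 10 "genes" (not Tybalt's real decoder).
LATENT_DIM, N_GENES = 5, 10
W = rng.normal(size=(LATENT_DIM, N_GENES))

def decode(z):
    return z @ W

def rejection_sample(gene_idx, threshold, n_keep=100, batch_size=1000):
    """Sample latent vectors uniformly, decode them, and keep only
    decoded samples whose expression for gene_idx is below threshold."""
    kept = []
    while len(kept) < n_keep:
        z = rng.uniform(-1, 1, size=(batch_size, LATENT_DIM))
        x = decode(z)
        kept.extend(x[x[:, gene_idx] < threshold])  # reject the rest
    return np.array(kept[:n_keep])

low_gene0 = rejection_sample(gene_idx=0, threshold=0.0)
print(low_gene0.shape)  # (100, 10)
```

With a real decoder you would swap `decode` for `decoder.predict_on_batch` and sample `z` over the empirical range of the encoded data.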
It may also be worthwhile to check out some of our recent work here: https://github.com/greenelab/BioBombe
Depending on goals and what your classifier is aiming to separate, the size of the latent space may be an important consideration!
from tybalt.
Is the goal to see which latent space features impact your genes of interest?
Perhaps, though this may be looking at things a little differently than I was originally thinking. As you mentioned, we'd like to see what happens when you 'lower' the expression of a specific gene from one level to another, and how the other genes 'respond'. As you're aware, we can't simply lower the gene manually without also adjusting the expression of the other genes in the vector, so we figured that randomly sampling the latent space and keeping only samples where our gene falls within a specific range would be a good approximation.
I will definitely look at the BioBombe project--looks fantastic!
I'm attempting to sample the latent space with the following code:
from random import uniform

import pandas as pd

NUM_SAMPLED_VECTORS = 1000

# Upper bound for each latent dimension, taken from the encoded data
maxs = encoded_rnaseq_df.max(axis=0).values.tolist()

# Sample 1000 latent vectors, each dimension uniform between 0 and its max
sampled_latent = []
for _ in range(NUM_SAMPLED_VECTORS):
    sampled_latent.append([uniform(0, z) for z in maxs])

sampled_latent_df = pd.DataFrame(sampled_latent)

# Decode the sampled latent vectors back into gene expression space
sampled_pred_vectors = pd.DataFrame(
    decoder.predict_on_batch(sampled_latent_df.values)
)
However, I'm noticing that the decoded output values are very, very small (min values ~10^-32, max values ~10^-22). Any ideas why?
Hi--I don't mean to be a bother, but I just wanted to ping this and see if you have any ideas on my comment above. When I randomly sample the latent space for each node, the outputs (~10^-30) are many orders of magnitude smaller than I would expect (roughly ~10^-1).
We haven't done extensive analyses on the generative aspects of Tybalt, so I don't know for sure. But if I had to guess, I'd say the model doesn't do well when all latent space features are sampled. I'd recommend isolating a few (maybe even one) and sampling this while fixing the others to represent a true sample. Sorry I don't have a more satisfying answer!
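A sketch of that suggestion: hold every latent feature at a real sample's encoded values and sweep a single feature. The base vector and linear decoder below are illustrative stand-ins (in practice, use a row of `encoded_rnaseq_df` and the trained decoder):

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative stand-ins for an encoded real sample and the decoder
# (replace with a row of encoded_rnaseq_df and Tybalt's decoder).
LATENT_DIM, N_GENES = 100, 50
base = rng.uniform(0, 1, size=LATENT_DIM)   # one "true" encoded sample
W = rng.normal(size=(LATENT_DIM, N_GENES))

def decode(z):
    return z @ W

def sweep_feature(feature, values):
    """Hold all latent features at the base sample's values and vary
    only `feature` across `values`, decoding each perturbed vector."""
    zs = np.tile(base, (len(values), 1))
    zs[:, feature] = values
    return decode(zs)

out = sweep_feature(feature=3, values=np.linspace(0, 2, 11))
print(out.shape)  # (11, 50)
```

Because only one coordinate changes, every output row stays close to a real decoded sample, avoiding the far-from-data latent regions that uniform sampling of all dimensions tends to hit.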
@gwaygenomics Thanks--and sorry for pestering so much! I hadn't thought to 'fix' all but one latent variable--I figured the latent variables were related in a way that would rule that out. I'll try that from now on, and thanks for all the fantastic work here.
There should be a relationship between the latent variables to an extent. I also have a similar trepidation about that approach, but I'd love to hear how it works out. If the representation is sufficiently disentangled, perhaps it is ok 🤞 !