
benfulcher / allensdk


Workflow for retrieving spatial gene-expression data from the Allen Institute's Mouse Brain Atlas

License: GNU General Public License v3.0



AllenSDK


This repository contains code for:

  1. Retrieving gene-expression data using the AllenSDK; and
  2. processing it into convenient data structures for further analysis in Matlab.

Requires Matlab and Python. The AllenSDK Python package must be installed (for example, via pip install allensdk).

If anything is unclear or needs improvement, please send questions by raising an Issue or sending me an email.

This pipeline is based on code developed for Fulcher and Fornito, PNAS (2016), and used for Fulcher et al., PNAS (2019). If you find this code useful, consider citing these papers if relevant to your work, or you can cite this code directly using its DOI.

Constructing a brain region x gene matrix

Retrieve full gene information

First, obtain a full list of genes by running AllGenes.py.

This outputs general information about the genes:

  • sectionDatasetInfo.csv (all section data)
  • geneInfo.csv (gene information: acronym, entrez_id, gene_id, name)
  • geneEntrezID.csv (just the list of EntrezIDs)
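Later steps often need to map Entrez IDs back to gene acronyms, for example to label matrix columns. The following is a minimal sketch of loading geneInfo.csv into a lookup table. It assumes the file has a header row with the columns listed above (acronym, entrez_id, gene_id, name). The helper `load_gene_info` is hypothetical and is not part of this repository.

```python
import csv
import io
from typing import Dict, TextIO


def load_gene_info(handle: TextIO) -> Dict[int, dict]:
    """Map Entrez ID -> gene record, skipping rows with a blank entrez_id.

    Assumes a header row containing acronym, entrez_id, gene_id, name
    (a hypothetical layout based on the column list in this README).
    """
    genes: Dict[int, dict] = {}
    for row in csv.DictReader(handle):
        entrez = (row.get("entrez_id") or "").strip()
        if not entrez:
            continue  # no Entrez ID recorded for this gene
        genes[int(float(entrez))] = row  # tolerate values written like "101.0"
    return genes


if __name__ == "__main__":
    # Made-up example rows, not real Allen data.
    sample = io.StringIO(
        "acronym,entrez_id,gene_id,name\n"
        "GeneA,101,1,example gene A\n"
        "GeneB,,2,example gene B\n"
    )
    print(load_gene_info(sample)[101]["acronym"])
```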

Preparing inputs for a specific region x gene matrix

1. Retrieve IDs for all brain regions of interest (structIDs) and their structure information (structInfo)

Retrieve all structure IDs of interest directly by adapting WriteStructureInfo.py to retrieve a custom set of structures.

If you already have structure IDs in Matlab, you can alternatively do this step using WriteStructureIDs.m, which outputs structIDs_Oh.csv and structInfo_Oh.csv.
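The sketch below shows roughly what "adapting to a custom set of structures" involves: select structures by acronym and write one ID file and one information file. It makes several assumptions. Structure records are dicts with id, acronym, and name keys. The output layouts (one ID per line; an id,acronym,name table with a header) are guesses. The helpers `select_structures` and `write_structure_files` are hypothetical. This sketch does not show how WriteStructureInfo.py actually obtains the full structure list.

```python
import csv
import io
from typing import List, Sequence, TextIO


def select_structures(all_structures: Sequence[dict], acronyms: Sequence[str]) -> List[dict]:
    """Return the structures matching the requested acronyms, in the requested order."""
    by_acronym = {s["acronym"]: s for s in all_structures}
    missing = [a for a in acronyms if a not in by_acronym]
    if missing:
        raise KeyError(f"Unknown structure acronyms: {missing}")
    return [by_acronym[a] for a in acronyms]


def write_structure_files(structures: Sequence[dict], ids_handle: TextIO, info_handle: TextIO) -> None:
    """Write one structure ID per line, plus an id/acronym/name table with a header."""
    id_writer = csv.writer(ids_handle)
    for s in structures:
        id_writer.writerow([s["id"]])
    info_writer = csv.DictWriter(
        info_handle, fieldnames=["id", "acronym", "name"], extrasaction="ignore"
    )
    info_writer.writeheader()
    info_writer.writerows(structures)


if __name__ == "__main__":
    # Hypothetical records; real ones would come from the Allen API.
    structures = [
        {"id": 1, "acronym": "A", "name": "Region A"},
        {"id": 2, "acronym": "B", "name": "Region B"},
    ]
    ids_buf, info_buf = io.StringIO(), io.StringIO()
    write_structure_files(select_structures(structures, ["B"]), ids_buf, info_buf)
    print(info_buf.getvalue())
```

Raising on unknown acronyms, rather than skipping them silently, means a typo cannot quietly drop a region from the final matrix.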

2. Retrieve gene entrez IDs

Save a list of gene entrez IDs for the genes you're interested in. For all genes, you can use the geneEntrezID.csv file produced by AllGenes.py above. For a subset of genes, you can adapt a script such as subsetGenes.py.
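The following is a hypothetical sketch of what a gene-subsetting step might do. It is not the contents of subsetGenes.py. The sketch makes two assumptions: geneInfo.csv has the columns acronym, entrez_id, gene_id, and name; and the retrieval step expects one Entrez ID per line. It also reports requested acronyms that have no Entrez ID, so that they are not silently dropped.

```python
import csv
import io
from typing import Iterable, List, Sequence, TextIO, Tuple


def subset_entrez_ids(gene_rows: Iterable[dict], wanted_acronyms: Sequence[str]) -> Tuple[List[int], List[str]]:
    """Return Entrez IDs for the wanted acronyms (in order) and the acronyms not found."""
    lookup = {}
    for row in gene_rows:
        entrez = (row.get("entrez_id") or "").strip()
        if entrez:
            lookup.setdefault(row["acronym"], int(float(entrez)))
    found = [lookup[a] for a in wanted_acronyms if a in lookup]
    missing = [a for a in wanted_acronyms if a not in lookup]
    return found, missing


def write_entrez_ids(ids: Iterable[int], handle: TextIO) -> None:
    """Write one Entrez ID per line (assumed input format for the retrieval step)."""
    writer = csv.writer(handle)
    for entrez in ids:
        writer.writerow([entrez])


if __name__ == "__main__":
    # Made-up rows standing in for geneInfo.csv.
    text = "acronym,entrez_id,gene_id,name\nGeneA,101,1,alpha\nGeneB,102,2,beta\n"
    ids, missing = subset_entrez_ids(csv.DictReader(io.StringIO(text)), ["GeneB", "GeneZ"])
    print(ids, missing)
```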

3. Retrieve the expression data from the Allen API

Now that you have defined the structures and genes you're interested in, you can run the queries to get expression data for all combinations of brain regions and genes. This is done using RetrieveGene.py.
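RetrieveGene.py performs these queries itself. The sketch below only illustrates the shape of a loop over all gene and structure combinations. Both `retrieve_all` and the injected `fetch_unionizes` callable are hypothetical. The record field names (structure_id, section_data_set_id, expression_energy, expression_density) are assumed to follow the Allen API's StructureUnionize model; check them against the API documentation before relying on them.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical signature: (entrez_id, structure_ids) -> list of record dicts.
FetchFn = Callable[[int, List[int]], List[dict]]


def retrieve_all(structure_ids: Sequence[int], entrez_ids: Iterable[int],
                 fetch_unionizes: FetchFn) -> Tuple[List[dict], List[int]]:
    """Collect expression records for every (gene, structure) combination.

    Returns (records, failed_entrez_ids). Records for structures that were
    not requested are discarded; genes whose query raises OSError (e.g. a
    network error) are recorded as failed so they can be retried later.
    """
    wanted = set(structure_ids)
    records: List[dict] = []
    failed: List[int] = []
    for entrez in entrez_ids:
        try:
            rows = fetch_unionizes(entrez, list(structure_ids))
        except OSError:
            failed.append(entrez)
            continue
        records.extend(r for r in rows if r["structure_id"] in wanted)
    return records, failed
```

Passing the fetch function in as a parameter keeps the looping logic testable without network access.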

Note that several variables need to be set in RetrieveGene.py.

First, the input filenames must match the ID files saved in Steps 1 and 2 above.

Input files

  • structIDSource: name of the .csv file of Allen structure IDs
  • entrezSource: name of the .csv file of gene entrez IDs to retrieve

Output filenames

To set:

  • structInfoFilename: saves retrieved information for the structure IDs specified.
  • allDataFilename: saves detailed expression information out to this file.
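As an illustration, the variables listed above might be set as follows. This is a sketch under assumptions. The input names reuse files mentioned in this README (structIDs_Oh.csv from WriteStructureIDs.m and geneEntrezID.csv from AllGenes.py), but the output names are hypothetical placeholders. The script may not define these as plain module-level strings. The `missing_inputs` check is an added convenience, not part of RetrieveGene.py.

```python
from pathlib import Path
from typing import List

# Inputs: must match the files written in Steps 1 and 2.
structIDSource = "structIDs_Oh.csv"
entrezSource = "geneEntrezID.csv"

# Outputs: hypothetical placeholder names; choose your own.
structInfoFilename = "structInfo_retrieved.csv"
allDataFilename = "allData_retrieved.csv"


def missing_inputs(*paths: str) -> List[str]:
    """Return the given paths that are not existing files, to catch typos before querying."""
    return [p for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    absent = missing_inputs(structIDSource, entrezSource)
    if absent:
        print("Missing input files:", absent)
```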

Generated:

  • expression_energy_AxB: expression energy values for the A structures and B section datasets
  • expression_density_AxB: expression density values for the A structures and B section datasets
  • dataSetIDs_Columns.csv: dataset IDs representing each column in the above matrices
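The three generated files fit together as a structure × dataset matrix whose column labels are stored separately. The minimal sketch below pivots long-format records (one per structure/dataset pair) into that layout. Several choices are assumptions rather than necessarily what RetrieveGene.py does: combinations with no record become NaN, columns are sorted by dataset ID, and the field names follow the StructureUnionize naming used above. `build_matrices` and `write_matrix` are hypothetical helpers.

```python
import csv
import io
import math
from typing import List, Sequence, TextIO, Tuple

Matrix = List[List[float]]


def build_matrices(records: Sequence[dict], structure_ids: Sequence[int]) -> Tuple[Matrix, Matrix, List[int]]:
    """Pivot records into (energy, density, dataset_ids).

    energy[i][j] holds the value for structure_ids[i] and dataset_ids[j];
    missing combinations are NaN. Records for unlisted structures are ignored.
    """
    dataset_ids = sorted({r["section_data_set_id"] for r in records})
    row_of = {s: i for i, s in enumerate(structure_ids)}
    col_of = {d: j for j, d in enumerate(dataset_ids)}
    nan = float("nan")
    energy = [[nan] * len(dataset_ids) for _ in structure_ids]
    density = [[nan] * len(dataset_ids) for _ in structure_ids]
    for r in records:
        i = row_of.get(r["structure_id"])
        if i is None:
            continue  # structure was not requested
        j = col_of[r["section_data_set_id"]]
        energy[i][j] = r["expression_energy"]
        density[i][j] = r["expression_density"]
    return energy, density, dataset_ids


def write_matrix(matrix: Matrix, handle: TextIO) -> None:
    """Write rows as comma-separated values, with missing entries as 'NaN'."""
    writer = csv.writer(handle)
    for row in matrix:
        writer.writerow(["NaN" if math.isnan(v) else v for v in row])
```

Writing dataset_ids alongside the matrices, in the role played by dataSetIDs_Columns.csv, is what allows each column to be traced back to its section dataset later.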

Importing data into Matlab

Then you can import the resulting data into Matlab as:

[GeneExpData,sectionDatasetInfo,geneInfo,structInfo] = ImportAllenToMatlab();

In this function, you must specify the filenames to read in:

  • fileNames.struct: the structure info file specified above (structInfoFilename)
  • fileNames.sectionDatasets: full information about all datasets retrieved (allDataFilename)
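  • fileNames.geneInfo: the gene information file (geneInfo.csv, from AllGenes.py)
  • fileNames.energy: the expression energy matrix (expression_energy_AxB)
  • fileNames.density: the expression density matrix (expression_density_AxB)
  • fileNames.columns: the dataset IDs for each matrix column (dataSetIDs_Columns.csv)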

Outputs a processed .mat file: AllenGeneDataset_X.mat containing information about X unique genes.
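Going from B section datasets to X unique genes means that columns sharing a gene must be combined. This README does not say how ImportAllenToMatlab combines them. The Python sketch below therefore uses an assumed rule: take the mean across a gene's datasets, ignoring NaN. It also assumes a `dataset_to_gene` mapping derived from the section dataset information. `collapse_to_genes` is a hypothetical helper.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def collapse_to_genes(matrix: Sequence[Sequence[float]], dataset_ids: Sequence[int],
                      dataset_to_gene: Dict[int, Hashable]) -> Tuple[List[List[float]], List[Hashable]]:
    """Average the columns that belong to the same gene, ignoring NaN entries.

    Returns (gene_matrix, genes), with genes in order of first appearance.
    A gene whose values are all NaN in a row stays NaN in that row.
    """
    genes: List[Hashable] = []
    cols_for_gene: Dict[Hashable, List[int]] = defaultdict(list)
    for j, d in enumerate(dataset_ids):
        g = dataset_to_gene[d]
        if g not in cols_for_gene:
            genes.append(g)
        cols_for_gene[g].append(j)
    gene_matrix = []
    for row in matrix:
        new_row = []
        for g in genes:
            vals = [row[j] for j in cols_for_gene[g] if not math.isnan(row[j])]
            new_row.append(sum(vals) / len(vals) if vals else float("nan"))
        gene_matrix.append(new_row)
    return gene_matrix, genes
```

Other reasonable choices include keeping only the dataset with the best coverage or weighting by section quality. Which rule is appropriate depends on the analysis.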

Computing a structure mask

Example pipeline: first generate .csv files of structure IDs and the matching structure information (for interpretation). For example, for the Oh et al. 213-region parcellation, run:

WriteStructureIDs

This generates structIDs_Oh.csv and structInfo_Oh.csv. These files are listed as inputs in the Python file MakeCCFMasks, so running

MakeCCFMasks

generates a mask for these structures, saved as mask_Oh.h5.
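To make the idea of a structure mask concrete, here is a toy sketch that labels each voxel of an annotation volume with the 1-based index of the listed structure containing it, and 0 elsewhere. This is one plausible representation only; the actual contents of mask_Oh.h5 may differ. Several inputs are stand-ins. The annotation volume is a small nested list rather than the real CCF volume. The `descendants` mapping (each structure ID to the IDs in its subtree) is assumed; in practice it would likely be derived from the AllenSDK structure tree. Writing to HDF5 (for example with h5py) is omitted. `make_label_mask` is a hypothetical helper.

```python
from typing import Dict, List, Sequence, Set

Volume = List[List[List[int]]]


def make_label_mask(annotation: Volume, structure_ids: Sequence[int],
                    descendants: Dict[int, Set[int]]) -> Volume:
    """Label voxels with the 1-based index of the listed structure that contains them.

    annotation holds annotation IDs (0 = outside the brain). descendants maps
    each structure ID to its subtree, including itself; a structure missing
    from the mapping is treated as having no descendants. If subtrees overlap,
    the structure listed first wins. Voxels matching no listed structure get 0.
    """
    label_of: Dict[int, int] = {}
    for k, sid in enumerate(structure_ids, start=1):
        for member in descendants.get(sid, {sid}):
            label_of.setdefault(member, k)
    return [[[label_of.get(v, 0) for v in row] for row in plane] for plane in annotation]


if __name__ == "__main__":
    # Toy 2x2x2 volume with made-up annotation IDs.
    volume = [[[0, 11], [12, 20]], [[20, 13], [0, 0]]]
    print(make_label_mask(volume, [1, 2], {1: {1, 11, 12}, 2: {2, 20}}))
```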
