What is the source of the data in the AveragedTissues experiment found in the Experiment folder in ResNet 6?

 


1. Several tissue-profiling microarray experiments covering different normal human tissue samples were selected from the GEO database. To make clustering more accurate, all of these experiments were based on the same platform, GPL96 (Affymetrix GeneChip Human Genome U133 Array Set HG-U133A), which is the most common platform for such experiments. The complete list of GEO datasets is:

GDS534 - Smoking-induced changes in airway transcriptome
GDS596 - Large-scale analysis of the human transcriptome
GDS860 - Endothelin-1 effect on fibroblasts
GDS1505 - Cultured skin substitute
GDS1096 - Normal tissues of various types
GDS1304 - Cigarette smoking effect on small airway epithelium
GDS1663 - Expression data from different research centers
GDS1376 - Essential thrombocythemia
GDS1775 - Major leukocyte types
GDS2021 - Coronary smooth muscle cell response to beta-1 receptor blockers
GSE3920 - EC_interferon


2. For all of these experiments, the original probes provided in the GEO SOFT files were semi-automatically mapped to Entrez GeneIDs.

3. All experiments were then combined into a single table, with rows corresponding to probes and columns corresponding to tissue samples. All tissue samples were manually mapped to a compiled list of normal human tissues. In general, there were 3 or more replicate samples of each tissue type.

4. All expression values in the constructed table are normalized by log2 transformation, followed by global mean centering:

Xij = Xij – Mean(Sj) + Mean(E)

Where: Xij is the expression value of probe i in sample j; Mean(Sj) is the mean expression value of sample (column) j; Mean(E) is the mean expression value of the entire table. Missing values were ignored.
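
The normalization in step 4 can be illustrated with a minimal Python sketch. It is not the code used to build the experiment: the function name normalize, the use of None to mark a missing value, and the toy input are assumptions made for illustration. It also assumes all present values are positive and that every sample has at least one present value.

```python
import math

def normalize(table):
    """log2-transform a probes x samples table, then apply global mean centering.

    table is a list of rows (probes); each row is a list of values (samples).
    None marks a missing value (an assumption of this sketch).
    """
    # log2 transformation; missing values stay missing
    logged = [[None if v is None else math.log2(v) for v in row] for row in table]

    # Mean(Sj): mean of each sample (column), ignoring missing values
    n_cols = len(logged[0])
    col_means = []
    for j in range(n_cols):
        present = [row[j] for row in logged if row[j] is not None]
        col_means.append(sum(present) / len(present))

    # Mean(E): mean of the entire table, ignoring missing values
    all_present = [v for row in logged for v in row if v is not None]
    table_mean = sum(all_present) / len(all_present)

    # Xij = Xij - Mean(Sj) + Mean(E)
    return [
        [None if v is None else v - col_means[j] + table_mean
         for j, v in enumerate(row)]
        for row in logged
    ]

# Toy input: 3 probes x 2 samples, one missing value
raw = [[2.0, 8.0], [8.0, 32.0], [None, 128.0]]
centered = normalize(raw)
```

After centering, every sample (column) has the same mean, equal to the mean of the whole log2 table, so samples from different experiments become comparable on a common scale.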

5. Sample quality and consistency were assessed by clustering the samples with the UPGMA (average linkage clustering) algorithm, using Pearson correlation as the distance metric. The 197 tissue samples representing 28 distinct tissue types (which showed consistent clustering) were selected. The list of tissue types is below:

Adrenal
B cells
Basophils
Bone marrow
Brain
Bronchial epithelia
Endothelium
Eosinophils
Fibroblast
Heart
Kidney
Liver
Lymph node
Macrophages
Mast cells
Monocytes
Neutrophils
NK cells
Pancreas
Pituitary
Platelet
Skeletal Muscle
Skin
Smooth Muscle
Spleen
T cells
Testis
Thymus
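
The sample clustering in step 5 can be sketched as follows. This is an illustration only, not the tool that was actually used. It assumes the distance between two samples is 1 - r, where r is their Pearson correlation (the exact conversion from correlation to distance is not stated above), and it assumes profiles with no missing values and non-constant expression. The names pearson and upgma and the sample labels are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def upgma(profiles):
    """Average linkage (UPGMA) clustering of samples.

    profiles maps a sample name to its list of expression values.
    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a tuple of sample names.
    """
    names = list(profiles)
    # Pairwise sample distances, assumed here to be 1 - Pearson r
    d = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d[frozenset((a, b))] = 1.0 - pearson(profiles[a], profiles[b])

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster sample pairs
        total = sum(d[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters and merge it
        best, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j], best))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical samples: replicates of the same tissue should merge first
samples = {
    "liver_1": [1.0, 2.0, 3.0, 4.0],
    "liver_2": [1.1, 2.1, 2.9, 4.2],
    "brain_1": [4.0, 3.0, 2.0, 1.0],
    "brain_2": [4.2, 2.9, 2.1, 1.0],
}
for a, b, dist in upgma(samples):
    print(a, b, round(dist, 3))
```

A sample that merges with replicates of its own tissue early and at a small distance is consistent. A sample that joins the tree late, or joins a different tissue, is a candidate for exclusion.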

6. The expression values were averaged across multiple probes representing the same gene and across multiple samples representing the same tissue, producing the final table of 12979 genes x 28 tissues.
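
The averaging in step 6 can be sketched in Python as shown below. The order of averaging (probes within a gene first, then samples within a tissue) is an assumption, since the description does not specify it, and the two orders can give different results when values are missing. The function names and the probe, gene and sample identifiers are hypothetical.

```python
from collections import defaultdict

def mean(values):
    """Mean of the present values; None if every value is missing."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

def average_table(rows, probe_to_gene, sample_to_tissue):
    """Collapse a probes x samples table into a genes x tissues table.

    rows maps probe -> {sample: value}; the result maps gene -> {tissue: value}.
    """
    # Collect, per gene and sample, the values of all probes for that gene
    by_gene = defaultdict(lambda: defaultdict(list))
    for probe, samples in rows.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # probe without a gene mapping is skipped
        for sample, value in samples.items():
            by_gene[gene][sample].append(value)

    result = {}
    for gene, samples in by_gene.items():
        # Average probes per sample, then group the results by tissue
        by_tissue = defaultdict(list)
        for sample, values in samples.items():
            by_tissue[sample_to_tissue[sample]].append(mean(values))
        # Average the replicate samples of each tissue
        result[gene] = {tissue: mean(vals) for tissue, vals in by_tissue.items()}
    return result

# Hypothetical input: 3 probes, 2 genes, 3 samples, 2 tissues
rows = {
    "p1": {"s1": 2.0, "s2": 4.0, "s3": 10.0},
    "p2": {"s1": 4.0, "s2": 6.0, "s3": 12.0},
    "p3": {"s1": 1.0, "s2": 3.0, "s3": 5.0},
}
probe_to_gene = {"p1": "G1", "p2": "G1", "p3": "G2"}
sample_to_tissue = {"s1": "Liver", "s2": "Liver", "s3": "Brain"}
table = average_table(rows, probe_to_gene, sample_to_tissue)
```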