Currently only HGNC-approved gene symbols are supported.
Yes! This is strongly recommended for reproducibility.
Once a run has completed, any abundance estimates, markers and plots can be downloaded from their respective tabs.
Of course! If you do so, please include references for any dataset(s) you have used and cite our paper.
Further information can be found in the About tab. Alternatively, you can contact us by email or through social media.
Glioblastoma (GBM) is a highly aggressive and incurable form of brain cancer. A large part of GBM malignancy can be attributed to heterogeneity: GBM tumour cells, along with their interactions with the tumour microenvironment, create a complex milieu that ultimately promotes disease progression and causes therapeutic failure.
GBMDeconvoluteR allows users to estimate the relative abundance of various immune and stromal cell populations within bulk GBM expression profiles. It also provides abundance estimates for the four neoplastic cell states described by Neftel et al. (2019), which are thought to drive heterogeneity among malignant glioblastoma cells: neural-progenitor-like (NPC); oligodendrocyte-progenitor-like (OPC); astrocyte-like (AC); and mesenchymal-like (MES).
Many computational tools currently exist for estimating cell populations from bulk RNA-sequencing (RNA-seq) data. Broadly speaking, these tools fall into two categories: reference-based (supervised) approaches and reference-free (unsupervised) approaches. Each category has its advantages and limitations; however, the reliability of the reference is often cited (Cobos et al. (2020)) as a major limiting factor in obtaining accurate results. This is because there is often a discrepancy between the biology of the samples and that of the reference used for estimation: Gervin et al. (2019) have shown that samples whose phenotypes differ from that of the population of interest reduce the performance of reference-based methods.
GBMDeconvoluteR uses a reference-based deconvolution method called MCPCounter, which Sturm et al. (2019) showed to perform favourably when compared with other methods. To account for the reference reliability issues mentioned above, we derived a set of GBM immune cell markers using five publicly available single-cell RNA-seq datasets. More information on this process can be found here: [Placeholder for paper link].
When a new bulk expression dataset is uploaded, any genes which have zero expression across all samples will be filtered out prior to refining neoplastic cell-state markers.
Expression profiles will also be placed into log2-space prior to refining markers and deconvolution.
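The two pre-processing steps above can be sketched in a few lines of Python. This is only an illustration, not the app's own code: the function name, the toy gene table and, in particular, the +1 pseudocount inside the logarithm are assumptions made so the example runs on data containing zeros.

```python
import math

def preprocess(expr):
    """expr maps a gene symbol to a list of non-negative expression values, one per sample."""
    # Step 1: drop genes whose expression is zero in every sample.
    kept = {gene: values for gene, values in expr.items() if any(v != 0 for v in values)}
    # Step 2: move to log2 space. The +1 pseudocount is an assumption of this sketch.
    return {gene: [math.log2(v + 1) for v in values] for gene, values in kept.items()}

bulk = {
    "PTPRC": [0.0, 3.0, 7.0],
    "OLIG2": [15.0, 1.0, 0.0],
    "GENE_X": [0.0, 0.0, 0.0],  # hypothetical gene with no expression in any sample
}

logged = preprocess(bulk)
print(sorted(logged))    # ['OLIG2', 'PTPRC']
print(logged["PTPRC"])   # [0.0, 2.0, 3.0]
```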
The neoplastic cell-state markers described by Neftel et al. (2019) were derived using single-cell RNA-seq data. However, the expression of a gene in a bulk sample reflects the combined contribution of every cell type that expresses it, so many genes that are good markers for a particular cellular state in single-cell data may not be good markers in bulk data.
To exclude such genes, we follow the procedure outlined in the paper under the section titled “Bulk scores defined for TCGA samples”. Briefly, this involves the following steps:
Define initial bulk scores as the average expression of the marker genes for each neoplastic cell state.
Calculate the correlation of each neoplastic cell-state gene with the initial scores.
Exclude genes if their correlation is below 0.4 or if the correlation is higher for a different neoplastic cell-state.
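The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used by GBMDeconvoluteR or by Neftel et al.: it assumes Pearson correlation, log2-scale input, and uses hypothetical gene names (m1, n1, ...) and a hypothetical function name, refine_markers.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def refine_markers(expr, markers, min_cor=0.4):
    """expr: gene -> per-sample expression; markers: cell state -> candidate genes."""
    n_samples = len(next(iter(expr.values())))
    # Step 1: initial bulk score per state = mean expression of its marker genes.
    scores = {}
    for state, genes in markers.items():
        present = [g for g in genes if g in expr]
        if present:
            scores[state] = [sum(expr[g][i] for g in present) / len(present)
                             for i in range(n_samples)]
    refined = {}
    for state, genes in markers.items():
        refined[state] = []
        for g in genes:
            if g not in expr:
                continue
            # Step 2: correlate the gene with the initial score of every state.
            cors = {s: pearson(expr[g], sc) for s, sc in scores.items()}
            own = cors[state]
            # Step 3: keep the gene only if its own-state correlation reaches the
            # threshold and no other state correlates more strongly.
            if own >= min_cor and all(own >= c for s, c in cors.items() if s != state):
                refined[state].append(g)
    return refined

# Toy data: four samples, hypothetical genes.
expr = {
    "m1": [1, 2, 3, 4], "m2": [2, 3, 4, 5],
    "m3": [4, 3, 2, 1],  # listed as a MES marker but anti-correlated with the MES score
    "n1": [5, 4, 2, 1], "n2": [4, 4, 1, 1],
}
markers = {"MES": ["m1", "m2", "m3"], "NPC": ["n1", "n2"]}

refined = refine_markers(expr, markers)
print(refined)  # {'MES': ['m1', 'm2'], 'NPC': ['n1', 'n2']}
```

In this toy case m3 is removed because its correlation with the initial MES score is negative, so it is both below the 0.4 threshold and more strongly correlated with the NPC score.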
The scores returned by GBMDeconvoluteR are expressed in arbitrary units and are proportional to the amount of each estimated cell population in a given sample. Moreover, each estimated population may have a different arbitrary unit.
Because of this, the scores CANNOT be used to compare the abundance of different populations within the same sample. They do, however, allow each cell population to be compared between samples.
This is a fundamental difference between MCPCounter (the deconvolution method employed in GBMDeconvoluteR) and other methods such as Cibersort(X), which estimate the relative composition within an overall sample mixture and therefore allow comparison between populations within a sample, but not between samples.
These fundamental differences are illustrated in Figure 1.
Figure 1. Comparison of MCPCounter and CibersortX scores for different configurations of sample mixture compositions. A.) Schematic representation of three possible cell mixtures. B.) Indicative CibersortX population estimates. C.) Indicative MCPCounter population abundance estimates. We observe that the estimates returned from CibersortX for the first two mixes are similar, as they are expressed as percentages of cells among the screened populations only, regardless of the total infiltration in the sample. Conversely, MCPCounter scores are proportional to the amount of each cell population in the total sample, which allows inter-sample comparison for each population. However, these scores are expressed in a different arbitrary unit for each population, which prevents intra-sample comparison between populations; CibersortX, by contrast, allows this type of comparison.
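The distinction between the two kinds of score can also be shown with some toy arithmetic. The sketch below does not run MCPCounter or CibersortX. It takes the description above at face value and computes (a) fractions among the screened populations only and (b) a score proportional to each population's share of the whole sample, scaled by a population-specific factor. The cell counts, population names and scale factors are all hypothetical.

```python
# Hypothetical cell counts for three mixtures; "other" stands for unscreened cells.
mixes = {
    "mix1": {"T cells": 10, "Macrophages": 10, "other": 80},
    "mix2": {"T cells": 40, "Macrophages": 40, "other": 20},
    "mix3": {"T cells": 10, "Macrophages": 40, "other": 50},
}
screened = ["T cells", "Macrophages"]

# Hypothetical population-specific scale factors standing in for the arbitrary units.
unit = {"T cells": 2.0, "Macrophages": 5.0}

def fractions_among_screened(mix):
    """Composition-style estimate: share of each screened population among screened cells."""
    total = sum(mix[p] for p in screened)
    return {p: mix[p] / total for p in screened}

def abundance_like_scores(mix):
    """Abundance-style estimate: share of the whole sample, in a per-population unit."""
    total = sum(mix.values())
    return {p: unit[p] * mix[p] / total for p in screened}

for name, mix in mixes.items():
    print(name, fractions_among_screened(mix), abundance_like_scores(mix))
```

Here mix1 and mix2 give identical fractions (0.5 each) even though mix2 is four times more infiltrated, whereas the abundance-style score for each population is higher in mix2. Within mix1 the two populations have equal cell counts but different abundance-style scores, which is why such scores should not be compared across populations.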
Version | 1.5.0
Date | 09/09/2022
Author | © Shoaib Ali Ajaib (2022)
Code | This app was built using Shiny and the full code is available on GitHub
License | This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License