1. Search Database

The search page allows users to select a cancer type and then search a list of genes for different results. Users can input one or more gene symbols to query associations among the different data types in the TCGA-derived database. Users can also select existing gene sets involved in several representative signaling pathways to search for data associations. When a search request is submitted, a result list appears in the left navigation bar of the result page. The query result bar presents the gene expression, survival analysis, co-expression, ceNetwork and somatic mutation modules.

2. View Results

2.1 Gene Expression

Users can get gene expression information, including gene symbol, Ensembl ID, expression level and heatmap. Gene information and the gene expression boxplot can be viewed by clicking the gene symbol and the Plot button, respectively. The boxplot can be grouped by stage, race, age, gender and grade. The expression level of an individual sample is displayed in a floater when the mouse hovers over a color block of the heatmap. Clicking the Ensembl ID opens the Ensembl database, where the detailed information of the gene can be checked.

2.2 Co-Expression

Users can click on the gene ID to see the corresponding results, including a scatter plot and a co-expression network. The co-expression result can be queried by entering a single gene symbol or Ensembl ID in the search box. The scatter plot shows the FPKM expression values of each gene pair (the submitted gene and a co-expressed gene) across all TCGA samples of the given cancer type. The co-expression network is built from the gene pairs with the highest Pearson correlation coefficients (top 25, top 50 or top 100). In the co-expression network, a shorter edge means a higher Pearson correlation coefficient.
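
The ranking described above can be sketched as follows. This is only an illustration, not the database's actual implementation: the gene names and FPKM values are hypothetical, and ranking by the signed coefficient (rather than its absolute value) is an assumption.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def top_coexpressed(query, expression, n):
    """Return the n genes most correlated with `query`, highest r first."""
    scores = [
        (gene, pearson(expression[query], values))
        for gene, values in expression.items()
        if gene != query
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:n]

# Hypothetical FPKM values for four samples (illustrative only).
fpkm = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
    "GENE_D": [1.0, 3.0, 2.0, 4.0],
}
top = top_coexpressed("GENE_A", fpkm, 2)
```

With real data, `n` would be 25, 50 or 100 to match the network sizes offered on the page.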

2.3 ceNetwork

In the ceNetwork, mRNA, lncRNA and miRNA are represented by nodes of different colors: a dark red node is a miRNA correlated with the mRNA and lncRNA in a ceNetwork triplet, a dark blue node is the lncRNA in a triplet, and a purple node is the mRNA in a triplet. Gene symbols retrieved by the user are marked by nodes with a red outer circle. Interactions among mRNA, lncRNA and miRNA are shown by distinct edge types. A blue edge is an interaction between a miRNA and a lncRNA; its arrow points from the miRNA to the lncRNA and represents a negative correlation between them. Likewise, a green arrowed edge represents a negative correlation between a miRNA and an mRNA. A positive correlation between an mRNA and a lncRNA is represented by a dotted line. Users can click a node or input a gene symbol to select a gene for query, and double-click the node to see gene information. The ceNetwork image can be downloaded as SVG or PNG, and the ceNetwork triplet data can be downloaded as an Excel file.
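
As a rough sketch of how the edge types above follow from the triplets, each (miRNA, lncRNA, mRNA) triplet can be expanded into three styled edges. The function name, the tuple layout and the node names are hypothetical; only the edge styles and correlation signs come from the description above.

```python
def triplet_edges(triplets):
    """Expand (miRNA, lncRNA, mRNA) triplets into styled network edges.

    Each edge is (source, target, style, correlation). Shared edges,
    such as one miRNA-lncRNA pair appearing in two triplets, are kept once.
    """
    edges = set()
    for mirna, lncrna, mrna in triplets:
        # Blue arrow from miRNA to lncRNA: negative correlation.
        edges.add((mirna, lncrna, "blue arrow", "negative"))
        # Green arrow from miRNA to mRNA: negative correlation.
        edges.add((mirna, mrna, "green arrow", "negative"))
        # Dotted line between mRNA and lncRNA: positive correlation.
        edges.add((mrna, lncrna, "dotted line", "positive"))
    return sorted(edges)

# Hypothetical triplets (placeholder names, not real database records).
triplets = [
    ("miR-X", "LNC-1", "GENE-1"),
    ("miR-X", "LNC-1", "GENE-2"),
]
edges = triplet_edges(triplets)
```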

2.4 Survival Analysis

In the Survival Analysis section, survival plots are generated from two perspectives: Genetic profile and Clinical attribute. In the Genetic profile part, patients are divided into two groups according to the expression of a given gene; in the Clinical attribute part, patients are divided into several groups according to clinical attributes. Users can click the tag in the upper-left corner of the results page to switch between the queried genes in the Genetic profile part, and can download the image and data of the survival information. In the Clinical attribute results, the TCGA samples are grouped by clinical features; users can choose one attribute group and select its subdivisions. The survival plot is then generated according to the selected attributes.
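
The two-group comparison in the Genetic profile part can be illustrated with a minimal sketch. The cutoff used by the database is not stated here, so the median split below is an assumption, as are the Kaplan-Meier estimator, the function names and the patient data.

```python
from statistics import median

def split_by_expression(expression):
    """Split patients into high/low groups at the median (assumed cutoff)."""
    cutoff = median(expression.values())
    high = [p for p, v in expression.items() if v > cutoff]
    low = [p for p, v in expression.items() if v <= cutoff]
    return high, low

def kaplan_meier(times, events):
    """Kaplan-Meier estimate as (time, survival) at each event time.

    `events` holds 1 for an observed death and 0 for a censored patient.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical patients: expression value and (months, event) follow-up.
expression = {"P1": 1.0, "P2": 2.0, "P3": 8.0, "P4": 9.0}
follow_up = {"P1": (10, 1), "P2": (12, 0), "P3": (3, 1), "P4": (5, 1)}

high, low = split_by_expression(expression)
high_curve = kaplan_meier([follow_up[p][0] for p in high],
                          [follow_up[p][1] for p in high])
low_curve = kaplan_meier([follow_up[p][0] for p in low],
                         [follow_up[p][1] for p in low])
```

The same `kaplan_meier` function would apply to the Clinical attribute part, with one curve per selected subdivision instead of two expression groups.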

2.5 Overall Mutation

In the Overall Mutation section, users can browse the overall mutation types of specific genes across all TCGA samples. Each color block in the chart represents a mutation type; for example, a dark red block means a missense mutation and a green block means an inframe mutation. The number in each row is the percentage of mutated samples among all samples. The floater shows the mutation type and TCGA sample ID for each sample.
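
The per-row percentage can be sketched as below, assuming that a sample with several mutations in the same gene is counted once. The record layout, the function name and the data are hypothetical.

```python
def mutated_percent(mutations, total_samples):
    """Percentage of samples carrying at least one mutation, per gene."""
    samples_by_gene = {}
    for gene, sample_id, _mutation_type in mutations:
        # A set ensures each sample is counted once per gene.
        samples_by_gene.setdefault(gene, set()).add(sample_id)
    return {
        gene: 100.0 * len(samples) / total_samples
        for gene, samples in samples_by_gene.items()
    }

# Hypothetical mutation records: (gene, sample ID, mutation type).
records = [
    ("GENE_A", "S1", "missense"),
    ("GENE_A", "S1", "inframe"),
    ("GENE_A", "S2", "missense"),
    ("GENE_B", "S3", "frameshift"),
]
percent = mutated_percent(records, total_samples=10)
```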

2.6 Somatic Mutation

Users can scan the mutation landscape of the query genes at the transcript level in the selected cancer type. The vertical axis represents the number of samples with a mutation, and the horizontal axis represents the functional peptide domains. Mutation diagram circles are colored according to the corresponding mutation type. When different mutation types occur at a single position, the color of the circle is determined by the most frequent mutation type. Mutation types include inframe mutations, splice site mutations, frameshift mutations, missense mutations and other mutations (all other types). Hovering the mouse over a circle displays the mutation information and highlights the samples with that mutation in the table below. Hovering over a domain color block displays the domain information, including the domain name and an external link to Pfam. The table contains the TCGA somatic mutation details, including Sample ID, Protein Change, Mutation Type, etc. Users can download the somatic mutation table data, which are derived from the TCGA database.
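
The rule for choosing the circle's mutation type can be sketched as follows. The record layout and function name are hypothetical, the input is assumed to hold one record per mutated sample, and the tie-breaking behavior (first type seen wins) is an assumption rather than documented behavior.

```python
from collections import Counter, defaultdict

def dominant_type_by_position(mutations):
    """Map each position to (most frequent mutation type, sample count)."""
    counts = defaultdict(Counter)
    for position, mutation_type in mutations:
        counts[position][mutation_type] += 1
    return {
        # most_common(1) returns the type with the highest count;
        # ties resolve to the type encountered first.
        position: (counter.most_common(1)[0][0], sum(counter.values()))
        for position, counter in counts.items()
    }

# Hypothetical records: (protein position, mutation type), one per sample.
records = [
    (12, "missense"),
    (12, "missense"),
    (12, "frameshift"),
    (61, "splice site"),
]
circles = dominant_type_by_position(records)
```

Here the sample count corresponds to the circle's height on the vertical axis, and the dominant type determines its color.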

3. Statistics and Download

In the Statistics section of the database, users can browse Case Infor, Project Infor and Dataset Infor for detailed statistics: case information classified by clinical attributes, project information containing the case distribution by primary site within a project, and the top 20 most mutated genes among the 33 cancer types in TCGA.

The dataset statistics table mainly includes the cancer type name, primary sites, sample size and four fundamental data types. Users can sort the table by cancer name in alphabetical order or by sample size in numerical order. Clicking the number listed under a data type field (e.g. mRNA) downloads the corresponding raw data matrix for further use; matrices are available for all four data types.