About WikiORA

WikiORA integrates Wikidata, Gene Ontology, PanglaoDB, and Wikipedia for a seamless, encyclopedic enrichment analysis.

WikiORA is a tool designed to simplify the process of gene set over-representation analysis by integrating data from Wikidata and Wikipedia. Our goal is to provide researchers with an intuitive platform to identify significantly enriched gene sets in their data using curated information from various sources.

How It Works

WikiORA follows these steps to perform over-representation analysis:

Users submit a list of genes.
For each gene set, the overlap between the user-provided gene list and the genes associated with the gene set is calculated.
The p-value is calculated using the hypergeometric test, representing the probability of observing at least as many overlapping genes by chance, given all the background genes in the library.
WikiORA applies the Benjamini-Hochberg correction to adjust p-values for multiple testing.
Results are sorted by p-value to identify the most significantly over-represented terms.
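
The steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not WikiORA's actual implementation: the function names (hypergeom_pvalue, benjamini_hochberg, over_representation), the dictionary layout of gene_sets, and the background_size parameter are all assumptions made for the example. It also assumes every gene in the user list belongs to the background.

```python
from math import comb


def hypergeom_pvalue(overlap, set_size, list_size, total_genes):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X counts how many of the `list_size` submitted genes fall in a gene set
    of `set_size` genes, drawn from a background of `total_genes` genes.
    """
    upper = min(set_size, list_size)
    denom = comb(total_genes, list_size)
    tail = sum(
        comb(set_size, k) * comb(total_genes - set_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denom


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def over_representation(user_genes, gene_sets, background_size):
    """Hypothetical driver: overlap, p-value, q-value, sorted by p-value."""
    user = set(user_genes)
    rows = []
    for term, members in gene_sets.items():
        members = set(members)
        overlap = user & members
        p = hypergeom_pvalue(len(overlap), len(members), len(user), background_size)
        rows.append({"term": term, "overlap": sorted(overlap), "p_value": p})
    qvalues = benjamini_hochberg([row["p_value"] for row in rows])
    for row, q in zip(rows, qvalues):
        row["q_value"] = q
    return sorted(rows, key=lambda row: row["p_value"])
```

Calling over_representation(["g1", "g2"], {"A": ["g1", "g2"], "B": ["g3", "g4"]}, 10) with these made-up gene names ranks term "A" first, since both submitted genes fall in that set.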

Statistical Metrics Explained

Here are the definitions of the metrics generated by WikiORA. Only some of them are shown in the dashboard, but all are included in the downloadable TSV file.

Term: The identifier of the gene set.
Description: A brief description of the gene set.
Wikipedia URL: A link to the Wikipedia page associated with the gene set.
Overlap: The genes from the user list that overlap with the genes in the gene set. Each gene includes a link to its Wikipedia page and its status.
Count: The number of overlapping genes.
p-value: The probability of observing an overlap at least this large by chance, calculated using the hypergeometric test.
q-value: The p-value adjusted for multiple testing using the Benjamini-Hochberg correction.

Odds Ratio: Measures the association strength, calculated as:

```
odds_ratio = (overlap_count * (total_genes - gene_set_size - input_gene_list_size + overlap_count)) / max((gene_set_size - overlap_count) * (input_gene_list_size - overlap_count), 1)
```

Combined Score: An overall significance measure, calculated as:
```
combined_score = -log10(p_value) * odds_ratio
```
Gene Ratio: The ratio of overlapping genes to the total genes in the gene set, calculated as:
```
gene_ratio = overlap_count / gene_set_size
```
Gene Set Size: The number of genes listed in this particular gene set.
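
As a worked illustration, the three formulas above can be combined into one small Python function. The function name enrichment_metrics and the returned dictionary keys are assumptions made for this sketch. The variable names follow the formulas as written on this page.

```python
from math import log10


def enrichment_metrics(overlap_count, gene_set_size, input_gene_list_size,
                       total_genes, p_value):
    """Compute odds ratio, combined score and gene ratio as defined above."""
    numerator = overlap_count * (
        total_genes - gene_set_size - input_gene_list_size + overlap_count
    )
    # max(..., 1) avoids division by zero when one list fully contains the overlap.
    denominator = max(
        (gene_set_size - overlap_count) * (input_gene_list_size - overlap_count), 1
    )
    odds_ratio = numerator / denominator
    # Note: log10 raises ValueError if p_value is exactly 0.
    combined_score = -log10(p_value) * odds_ratio
    gene_ratio = overlap_count / gene_set_size
    return {
        "odds_ratio": odds_ratio,
        "combined_score": combined_score,
        "gene_ratio": gene_ratio,
    }


# Illustrative numbers only; the p-value here is made up, not derived from the counts.
metrics = enrichment_metrics(
    overlap_count=5,
    gene_set_size=50,
    input_gene_list_size=100,
    total_genes=20000,
    p_value=0.001,
)
```

With these illustrative inputs the odds ratio is 99275 / 4275, roughly 23.22, the combined score is about 69.67, and the gene ratio is 0.1.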

Data Sources

WikiORA uses Wikidata as the sole data source for its gene sets, selecting only terms with linked Wikipedia pages. Wikidata combines community curation with data imports. For cell type markers, the major source of information is PanglaoDB. For biological processes, molecular functions, and cellular components, the major sources are the Gene Ontology Annotation (GOA) Database and the Gene Ontology Resource.

The gene sets available for this version can be retrieved from the Download page. To ensure reproducibility, older versions of the tool are archived on GitHub.

Citing

Manuscript in preparation.

Lubiana, T., & Nakaya, H. (2024). WikiORA (version 0.4.0). Retrieved from https://wikiora.sysbio.tools

Team

WikiORA is developed in Brazil by a team of bioinformaticians passionate about open knowledge. The project is led by Tiago Lubiana at the Computational Systems Biology Laboratory, headed by Prof. Helder Nakaya.

Contact Us

If you have any questions, feedback, or suggestions, please contact us via GitHub.

GitHub

If you like our project, give us a ⭐ on GitHub!