About WikiORA

Example Workflow

WikiORA integrates Wikidata, Gene Ontology, PanglaoDB, and Wikipedia for a seamless, encyclopedic enrichment analysis.

WikiORA is a tool designed to simplify gene set over-representation analysis by integrating data from Wikidata and Wikipedia. Our goal is to provide researchers with an intuitive platform for identifying significantly enriched gene sets in their data, using curated information from various sources.

How It Works

WikiORA follows these steps to perform over-representation analysis:

  1. Users submit a list of genes.
  2. For each gene set, the overlap between the user-provided gene list and the genes associated with the gene set is calculated.
  3. The p-value is calculated using the hypergeometric test. It represents the probability of observing at least as many overlapping genes by chance, given all the background genes in the library.
  4. WikiORA applies the Benjamini-Hochberg correction to adjust p-values for multiple testing.
  5. Results are sorted by p-value to identify the most significantly over-represented terms.
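The steps above can be sketched in Python as a minimal illustration. This is not WikiORA's actual implementation. The function names (`hypergeometric_sf`, `benjamini_hochberg`, `over_representation`) and the toy gene sets are hypothetical. The sketch assumes that the background is the union of all genes in the library and that submitted genes absent from the library are discarded. It uses only the standard library (`math.comb`, Python 3.9+).

```python
from math import comb


def hypergeometric_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N background genes, K genes in the set, n query genes)."""
    upper = min(K, n)
    # Sum the exact probabilities of observing k, k+1, ..., upper overlapping genes.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def over_representation(user_genes: list[str], library: dict[str, list[str]]):
    """Return (term, overlap, p_value, adjusted_p) tuples sorted by p-value."""
    # Assumption: the background is every gene that appears in any gene set.
    background = set().union(*map(set, library.values()))
    query = set(user_genes) & background  # assumption: drop genes outside the library
    N, n = len(background), len(query)
    rows = []
    for term, members in library.items():
        member_set = set(members)
        k = len(query & member_set)
        rows.append((term, k, hypergeometric_sf(k, N, len(member_set), n)))
    adjusted = benjamini_hochberg([p for _, _, p in rows])
    results = [(t, k, p, q) for (t, k, p), q in zip(rows, adjusted)]
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy library and query, for illustration only.
library = {
    "term_A": ["G1", "G2", "G3"],
    "term_B": ["G4", "G5", "G6", "G7"],
    "term_C": ["G1", "G8", "G9", "G10"],
}
results = over_representation(["G1", "G2", "G3", "X99"], library)
for term, overlap, p, q in results:
    print(f"{term}\toverlap={overlap}\tp={p:.4g}\tadj_p={q:.4g}")
```

Computing the tail sum with exact integer binomial coefficients keeps the example dependency-free. For large libraries, a statistics package's hypergeometric survival function would be the more practical choice.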

Statistical Metrics Explained

Here are the definitions of the various metrics generated by WikiORA. Only some metrics are shown in the dashboard, but all of them are available in the downloadable TSV file.

Data Sources

WikiORA uses Wikidata as the sole data source for the gene sets, selecting only terms with linked Wikipedia pages. Wikidata combines community curation with data imports. For cell type markers, the main source of information is PanglaoDB. For biological processes, molecular functions, and cellular components, the main sources are the Gene Ontology Annotation (GOA) Database and the Gene Ontology Resource.

The gene sets available in this version can be retrieved from the Download page. To ensure reproducibility, older versions of the tool are archived on GitHub.


Manuscript in preparation.

Lubiana, T., & Nakaya, H. (2024). WikiORA (version 0.4.0). Retrieved from https://wikiora.sysbio.tools


WikiORA is developed in Brazil by a team of bioinformaticians passionate about open knowledge. The project is led by Tiago Lubiana at the Computational Systems Biology Laboratory, headed by Prof. Helder Nakaya.

Contact Us

If you have any questions, feedback, or suggestions, please contact us via GitHub.


If you like our project, give us a ⭐ on GitHub!