Open datasets
Under construction: links to genomes and raw data have not yet been verified and signed off. Disclaimer: this page was generated with Claude (model Sonnet 4.6). It has taken around 1 h so far; doing it by hand would have taken me several days, I guess…
Thoughts on the results: a clear positive is how the datasets were gathered, starting from my PubMed author entries and the Cell Atlas. The structure is good, and where datasets were missing I simply added a link to the missing data. The summaries are also good but, as said, linking to the raw data, genomes and scObjects is not easy, which is the reason I want this site…
We are committed to the principles of open and reproducible science and aim to make all data generated by our group as accessible as possible. While many datasets are publicly available through established repositories, we have found that locating and reusing these data is not always straightforward in practice.
To address this, we provide a curated collection of datasets generated by our team, together with stable identifiers and links to the original sources. Our goal is to facilitate easy access, reuse, and integration of these resources by the wider community.
⚠ Please cite the original publications. These datasets represent a substantial investment of time, expertise, and collaboration. If you use any of the data provided here, we kindly ask that you cite the corresponding original publications. Proper citation acknowledges the work involved and helps demonstrate the impact of shared data to the scientific community and funding bodies.
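To make citing easier, the DOIs listed in the tables below can usually be turned into a BibTeX entry through DOI content negotiation. The following is a minimal sketch, assuming the DOI's registration agency supports the `application/x-bibtex` media type; the function names `bibtex_request` and `fetch_bibtex` are hypothetical helpers, not part of this site.

```python
import urllib.request


def bibtex_request(doi: str) -> urllib.request.Request:
    """Build a request that asks the DOI resolver for a BibTeX record."""
    doi = doi.strip()
    # Accept bare DOIs as well as "doi:" or https://doi.org/ forms
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/x-bibtex"},
    )


def fetch_bibtex(doi: str, timeout: float = 30.0) -> str:
    """Download the BibTeX entry as text (requires network access)."""
    with urllib.request.urlopen(bibtex_request(doi), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example (not executed here because it needs network access):
# print(fetch_bibtex("10.12688/wellcomeopenres.15194.2"))
```

Please check the returned entry against the publication before use, since automatically generated records can contain incomplete author or journal fields.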
🧬 Reference Genomes
| Dataset / Species | Raw Reads (ENA/EBI) | Assembly / Processed | Publication (DOI) | Description |
|---|---|---|---|---|
| Plasmodium falciparum — Human Malaria | ||||
| P. falciparum 3D7, version 3 | N/A — reference strain | PlasmoDB | 10.12688/wellcomeopenres.15194.2 | The canonical P. falciparum 3D7 reference genome (v3), the most widely used and extensively manually curated assembly. Describes improvements from 2002–2019, including corrections to mis-assemblies, UTR annotation and the population reference Pfref1. Wellcome Open Research 2019. |
| P. falciparum — 15 PacBio long-read assemblies (lab + clinical strains) | ENA PRJEB21264 | FTP of genomes and annotation | 10.12688/wellcomeopenres.14571.1 | Nearly complete PacBio assemblies of 15 isolates including Dd2, HB3, GB4 and 10 recent clinical isolates from Africa and Asia. Defines the conserved core genome and characterises the highly structured subtelomeric variable regions. Wellcome Open Research 2018. |
| P. falciparum — long-read assemblies improved with ILRA pipeline | see paper supplement | GitHub (ILRA) | 10.1093/bib/bbad248 | Introduces ILRA (Iterative Long Read Assembler), a pipeline that automatically improves long-read assemblies from contigs to chromosome-scale sequences, validated on P. falciparum and other parasite genomes. Accompanying assemblies and benchmarking data deposited on Zenodo. Briefings in Bioinformatics 2023. |
| P. falciparum varDB — var gene assemblies (~2,400 clinical isolates) | ENA (Pf3k) | FTP site; Zenodo 3549770; GitHub (varDB) | 10.12688/wellcomeopenres.15590.1 | Near-complete var gene repertoires from ~2,400 clinical isolates (Pf3k). Nucleotide + amino acid sequences and domain annotations (DBL, CIDR). Reveals global conservation patterns, recombination hotspots and inter-continental gene flow. Wellcome Open Research 2019. |
| P. falciparum varDB extended — Varia tool + ~755 additional isolates (Pf6) | MalariaGEN Pf6 — see paper | GitHub (Additional755) | 10.1186/s12859-022-04573-6 | Extended var gene database (~3,150 isolates total) released with the Varia tool for predicting PfEMP1 domain composition from short DBLα sequence tags. Enables large-scale var gene typing from amplicon or RNA-seq data. BMC Bioinformatics 2022. |
| Laverania — Chimpanzee & Gorilla Malaria Parasites | ||||
| P. reichenowi + partial P. gaboni — chimpanzee malaria genomes | ENA HG810406–HG810777 | ENA HG810406–HG810777 | 10.1038/ncomms5754 | First complete genome of P. reichenowi (chimpanzee parasite, closest relative of P. falciparum) plus partial P. gaboni sequence. Identifies key differences at erythrocyte invasion loci and conserved multigene family architecture. Nature Communications 2014. |
| All Laverania species — P. praefalciparum, P. adleri, P. billcollinsi, P. blacklocki, P. gaboni, P. reichenowi | ENA PRJEB24389 | ENA PRJEB24389 | 10.1038/s41564-018-0162-2 | Genomes of all known Laverania members infecting gorillas and chimpanzees, including the gorilla parasite P. praefalciparum — the direct ancestor of P. falciparum. Reveals interspecific gene transfers, convergent evolution, and dates the origin of human P. falciparum to 40,000–60,000 years ago. Nature Microbiology 2018. |
| Primate Malaria Parasites (Non-Laverania) | ||||
| P. vivax PvP01 — new reference genome (+ PvC01, PvT01 drafts) | ENA PRJEB10888 | GeneDB (PvP01) | 10.12688/wellcomeopenres.9876.1 | A new P. vivax reference (PvP01) assembled from a Papua Indonesian patient isolate, replacing the fragmented monkey-adapted Salvador-I reference. Fragmentation reduced from >2,500 to 226 scaffolds; manual curation raises annotated gene functions from 38% to 58%. Identifies over 1,200 pir multigene family members — 3× more than Salvador-I. Also includes draft assemblies PvC01 (China) and PvT01 (Thailand). Wellcome Open Research 2016. |
| P. vivax-like — 2 new reference genomes + 9 genotypes | ENA PRJEB22729 | Dryad 10.5061/dryad.32tm1k4 | 10.1371/journal.pbio.2006035 | Two new reference genomes and 9 additional genotypes of P. vivax-like (the closest relative of human P. vivax, found in African great apes). Genomes are highly syntenic with P. vivax but phylogenetically distinct; provides insights into the evolutionary pathway from ape to human parasite. PLOS Biology 2018. |
| P. cynomolgi — improved PcyM assembly | ENA PRJEB19222 | ENA GCA_900180395 | 10.12688/wellcomeopenres.11864.1 | Greatly improved reference assembly for P. cynomolgi M strain (PcyM) using PacBio long-read sequencing — reduced from 1,649 scaffolds to just 56, with near-complete subtelomere coverage. Annotates 6,632 genes and reveals a striking expansion of methyltransferase pseudogenes. Important model for P. vivax hypnozoite biology. Wellcome Open Research 2017. |
| P. malariae + P. ovale curtisi + P. ovale wallikeri | ENA PRJEB14392 | ENA PRJEB14392 | 10.1038/nature21038 | First high-quality genome assemblies for P. malariae and both P. ovale subspecies. Reveals a large chromosomal translocation in P. malariae, novel surface protein families, and candidate genes for hypnozoite formation shared with other relapsing species. Nature 2017. |
| Avian Malaria Parasites | ||||
| P. gallinaceum + P. relictum — avian malaria genomes | ENA PRJEB13596 | ENA PRJEB13596 | 10.1101/gr.218123.116 | High-quality draft genomes of two avian malaria species — the first complete genomes outside mammalian Plasmodium. Identifies 50 avian-specific genes, discovers transposable elements (a first for the genus), and places avian parasites as an outgroup to all mammalian species with divergence estimated at ~10 million years ago. Genome Research 2018. |
| Rodent Malaria Parasites (Model Organisms) | ||||
| P. berghei, P. chabaudi, P. yoelii, P. vinckei — improved rodent malaria genomes | ENA PRJEB2220 | PlasmoDB | 10.1186/s12915-014-0086-0 | Comprehensive re-assembly and re-annotation of four rodent malaria parasite genomes using Illumina short-read technology, paired with genome-wide RNA-seq expression data across multiple life cycle stages. Substantially improves gene models and provides cross-species comparison of multigene families. BMC Biology 2014. |
| Babesia — Tick-Transmitted Blood Parasites | ||||
| Babesia spp. — variant antigen gene evolution dataset | ENA PRJEB4489 | ENA PRJEB4489 | 10.1093/nar/gku322 | Genome sequences and variant antigen gene analysis across multiple Babesia species (B. microti, B. bovis, B. bigemina and others). Reveals a history of genomic innovation in the VESA multigene families underlying host-parasite interaction and immune evasion. Nucleic Acids Research 2014. |
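
For the ENA project accessions in the table above, the raw read files can typically be listed through the ENA Portal API `filereport` endpoint. The sketch below builds the query URL and parses the tab-separated response; the helper names `filereport_url` and `parse_filereport` are hypothetical, and it assumes the project exposes FASTQ files in the `fastq_ftp` field (some long-read projects may only list files under `submitted_ftp`).

```python
import csv
import io
from urllib.parse import urlencode

ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession: str, fields=("run_accession", "fastq_ftp")) -> str:
    """Build the file report URL for a study, sample or run accession."""
    query = urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": ",".join(fields),
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_filereport(tsv_text: str) -> dict:
    """Map each run accession to its list of FASTQ download URLs."""
    runs = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # ENA separates multiple files (e.g. paired reads) with ';'
        # and reports the paths without a URL scheme
        files = [f for f in (row.get("fastq_ftp") or "").split(";") if f]
        runs[row["run_accession"]] = ["ftp://" + f for f in files]
    return runs


# Example (URL only, no download is performed here):
# print(filereport_url("PRJEB21264"))
```

The URL can be opened in a browser or fetched with any HTTP client, and the response text passed to `parse_filereport` to obtain one list of files per sequencing run.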
🔬 Single-Cell Transcriptomics
All atlases can be browsed interactively via our Glasgow Single-Cell Atlas (powered by paraCell). Raw sequencing reads and processed atlas files are listed below. H5AD/RDS files will be added progressively — please check back or contact us.
| Nickname / Atlas | Interactive Viewer | Raw Reads (ENA/GEO) | Processed (H5AD/RDS) | Publication (DOI) | Description |
|---|---|---|---|---|---|
| RA Macrophages Rheumatoid Arthritis | paraCell CXG | see paper | RDS — coming soon | 10.1038/s41591-020-0939-8 | Phenotypic spectrum of synovial tissue macrophages during development and resolution of rheumatoid arthritis. Identifies distinct macrophage subsets that regulate inflammation and drive remission. Nature Medicine 2020. |
| Gamma Delta T-cells CD18 / β2-integrin role | paraCell CXG | see paper | RDS — coming soon | 10.1073/pnas.1921930117 | Role of β2-integrins (CD18) in γδ T cell subset thymic development and peripheral maintenance. PNAS 2020. |
| P. berghei — host cell atlas Erythropoietic niches | paraCell CXG | see paper | | 10.1371/journal.pbio.3001522 | Plasmodium berghei differentiation in erythropoietic niches of the bone marrow and spleen. Host cell maturation modulates parasite invasion and sexual differentiation. PLOS Biology 2022. |
| Clinical Malaria — Ghana Paediatric PBMCs | paraCell CXG | see paper | RDS — coming soon | 10.3389/ebm.2024.10233 | scRNA-seq of PBMCs from Ghanaian children with uncomplicated malaria. Reveals elevated interferon responses and TNF-α/NFκB signalling in monocytes. Experimental Biology and Medicine 2025. |
| COVID Malawi — Lung Atlas Spatial + single-cell | Lung / Immune / Stromal / Nasal / Blood CXG | E-MTAB-13544 | COSMIC on Zenodo | 10.1038/s41591-024-03354-3 | Spatially resolved single-cell atlas of fatal COVID-19 lung disease in a Malawian population. Five complementary atlases (lung, immune, stromal, nasal, blood) plus imaging mass cytometry (IMC). Reveals a distinct cellular signature with dominant interferon-γ response. Nature Medicine 2024. |
| Mouse brain — T. brucei infection | paraCell CXG | see paper | H5AD — coming soon | 10.1038/s41467-022-33542-z | Integrative single-cell and spatial transcriptomic analysis of mouse brain cells during chronic Trypanosoma brucei infection. Reveals reciprocal microglia–plasma cell crosstalk. Nature Communications 2022. |
| T. brucei BSF — bloodstream form | paraCell CXG | ENA PRJEB38110 | RDS — coming soon | 10.1038/s41467-021-25607-2 | Single-cell transcriptomics of bloodstream T. brucei reconstructing cell cycle progression and developmental quorum sensing. Nature Communications 2021. |
| T. brucei cell cycle (BSF + PCF) | BSF / PCF CXG | see paper | H5AD — coming soon | 10.7554/eLife.86325 | Profiling of bloodstream form (BSF) and procyclic form (PCF) T. brucei cell cycle using single-cell transcriptomics. eLife 2023. |
| T. cruzi cell atlas | paraCell CXG | see paper | H5AD — coming soon | 10.1101/2024.10.01.616042 | Complete cell atlas for the full Trypanosoma cruzi life cycle based on >31,000 single-cell transcriptomes. Covers all life cycle stages, reveals surface antigen heterogeneity, differentiation dynamics and comprehensive UTR annotation. Preprint 2024. |
| Ovine abomasum Sheep gut — nematode infection | paraCell CXG | see paper | H5AD — coming soon | 10.3389/fimmu.2021.781108 | Single-cell atlas of sheep abomasal epithelium following parasitic nematode infection. Shows tuft cell expansion and identifies evolutionarily conserved and divergent host responses. Frontiers in Immunology 2021. |
| Toxoplasma-Mouse BMDM ± IFN-γ infection | paraCell CXG | ENA E-MTAB-14453 | H5AD — coming soon | 10.1093/nar/gkaf091 | Dual host–parasite scRNA-seq atlas of Toxoplasma gondii-infected murine bone marrow-derived macrophages (BMDMs), with and without IFN-γ stimulation. Identifies Atp6v0d2 and Ccl8 as novel markers of infected macrophages. Generated to demonstrate paraCell host–parasite interaction functionality. Nucleic Acids Research 2025. |
| Theileria-Cow Sahiwal vs Holstein breeds | paraCell CXG | ENA E-MTAB-14450 | H5AD — coming soon | 10.1093/nar/gkaf091 | First dual host–parasite scRNA-seq atlas of Theileria annulata-infected bovine macrophages, comparing disease-susceptible Holstein (Bos taurus) and disease-tolerant Sahiwal (Bos indicus) breeds. Reveals an active interferon response in tolerant Sahiwal cells versus a pro-carcinogenic gene signature in susceptible Holstein cells. Generated to demonstrate paraCell host–parasite interaction functionality. Nucleic Acids Research 2025. |
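
The raw-read column above mixes two accession styles: ENA project accessions (such as PRJEB38110) and ArrayExpress accessions (such as E-MTAB-13544). As a rough sketch, an accession can be routed to a landing page with a few regular expressions. The patterns cover only the accession styles that appear on this page, the URL templates are assumptions about the current ENA Browser and BioStudies address layout, and `landing_page` is a hypothetical helper name.

```python
import re
from typing import Optional

# (pattern, URL template) pairs; only accession styles used on this page
ACCESSION_ROUTES = [
    # INSDC BioProject accessions, browsable through ENA
    (re.compile(r"^PRJ(EB|NA|DB)\d+$"),
     "https://www.ebi.ac.uk/ena/browser/view/{acc}"),
    # Genome assembly accessions, optionally with a version suffix
    (re.compile(r"^GCA_\d+(\.\d+)?$"),
     "https://www.ebi.ac.uk/ena/browser/view/{acc}"),
    # ArrayExpress experiment accessions, hosted in BioStudies
    (re.compile(r"^E-[A-Z]{4}-\d+$"),
     "https://www.ebi.ac.uk/biostudies/arrayexpress/studies/{acc}"),
]


def landing_page(accession: str) -> Optional[str]:
    """Return a landing-page URL, or None if the accession is not recognised."""
    acc = accession.strip()
    for pattern, template in ACCESSION_ROUTES:
        if pattern.match(acc):
            return template.format(acc=acc)
    return None


# Example:
# landing_page("E-MTAB-14453")
```

Entries such as "see paper" return `None`, which signals that the accession has to be looked up in the data availability statement of the publication.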
📊 Bulk RNA-seq Datasets
Key bulk transcriptomic datasets generated by the group. Further datasets are available via the primary publications.
| Dataset | Raw Reads (ENA/GEO) | Processed Data | Publication (DOI) | Description |
|---|---|---|---|---|
| P. falciparum amplification-free RNA-seq (transcriptome refinement) | ENA PRJNA481507 | see paper supplement | 10.1186/s12864-020-6692-z | Amplification-free RNA-seq to refine the P. falciparum 3D7 transcriptome, improving gene boundary annotation, UTR definition and discovery of novel transcripts. BMC Genomics 2020. |
More bulk RNA-seq datasets (including P. vivax, T. brucei and helminth experiments) will be added in subsequent updates. Please contact us if you are looking for a specific dataset.
Page maintained by: Thomas D. Otto (BNITM, Data Science Centre / Digital Infection Biology). If you cannot find a dataset, if a link is broken, or if you would like to contribute a citation, please get in touch.
Last updated: April 2026. This page is updated iteratively as new datasets become available.
