Dfind old genome assemblies

12/11/2023

Cynomolgus macaque (Macaca fascicularis) and common marmoset (Callithrix jacchus) have been widely used in human biomedical research. Long-standing primate genome assemblies used the human genome as a reference for ordering and orienting the assembled fragments into chromosomes. Here we performed de novo genome assembly of these two species without the human genome-based bias observed in the genome assemblies released earlier. We assembled PacBio long reads, and the resultant contigs were scaffolded with Hi-C data; the scaffolds were further refined based on Hi-C contact maps and alternate de novo assemblies. The assemblies achieved scaffold N50 lengths of 149 Mb and 137 Mb for cynomolgus macaque and common marmoset, respectively. The high fidelity of our assembly is also supported by BAC-end concordance in common marmoset. Our assembly of cynomolgus macaque outperformed all the available assemblies of this species in terms of contiguity. The chromosome-scale genome assemblies produced in this study are valuable resources for non-human primate models and provide an important baseline in human biomedical research.

Cynomolgus macaque (or crab-eating macaque, Macaca fascicularis) and common marmoset (Callithrix jacchus), an Old World monkey and a New World monkey respectively, have been widely used in human biomedical research and drug development, with the expectation that they recapitulate human physiology and pathology 1. Their genomes, consisting of 42 and 46 chromosomes in diploids 2, 3, 4, respectively, were assembled initially using first- and second-generation sequencing technologies. Short-read de novo assembly was not able to resolve complex repetitive genomic regions, and the resulting contigs tended to remain fragmentary. Techniques such as mate-pair sequencing were commonly used to join the contigs into longer scaffolds, albeit with sequence gaps in between. Long-standing non-human primate (NHP) genome assemblies released earlier used the human genome for ordering and orienting the assemblies into chromosomes, which prevents the observation of intrinsic structural differences between the primate genomes 5. For example, a large inversion of around 20 Mb was observed in chromosome 16 of the earlier marmoset genome assembly, which was presumably the result of this ‘humanization’ bias 6.

Recent technological advancements allow us to obtain chromosome-scale assemblies without relying on existing genome assemblies, so that such errors and biases can be avoided. Single-molecule long-read sequencing (Pacific Biosciences and Oxford Nanopore Technologies) has drastically increased the contiguity of assemblies, and chromatin contact profiling with Hi-C and other techniques such as optical mapping have paved the way to reconstructing chromosome-scale sequences. Taking advantage of these recent advancements, the genome sequences of some NHPs including gorilla, orangutan, and chimpanzee were largely improved 5, 7, followed by the ones for bonobo (BioProject accession: PRJNA526933) and northern white-cheeked gibbon (PRJNA369439). High-quality genome assemblies of Old World monkeys were also recently reported, such as the ones for rhesus macaque by three different research groups (PRJNA476474 8, PRJNA509445 9, and PRJNA514196 10), olive baboon (PRJNA527874 11), golden snub-nosed monkey (PRJNA524949 12), and Francois’s langur (PRJNA488530 13). We had also previously produced a pseudo-chromosome assembly of common marmoset using PacBio long reads 6, where ‘humanized’ sequences were still used as a reference. In this study, we focus on cynomolgus macaque and common marmoset to establish a solid baseline for human biomedical research.
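The scaffold N50 figures quoted above (149 Mb and 137 Mb) are a contiguity metric: N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the toy scaffold lengths are illustrative assumptions and do not come from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are taken from longest to shortest until their cumulative
    length reaches at least half of the total; the length of the last
    sequence taken is the N50.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in Mb (toy values, not data from the study).
toy_scaffolds = [80, 70, 50, 40, 30, 20]
print(n50(toy_scaffolds))
```

In the toy example the total is 290 Mb, and the two longest scaffolds (80 + 70 = 150 Mb) already cover more than half of it, so the N50 is 70 Mb. A larger N50 means that more of the assembly sits in long scaffolds, which is why the metric is used to compare contiguity between assemblies.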
