The complete human genome sequence is used as a reference for next-generation sequencing analyses. However, some ethnic ancestries are under-represented in the reference genome (e.g., GRCh37) due to its bias toward European and African ancestries. Here, we perform de novo assembly of three Japanese male genomes using >100× Pacific Biosciences long reads and Bionano Genomics optical maps per sample. We integrate the genomes using the major allele for consensus and anchor the scaffolds using genetic and radiation hybrid maps to reconstruct each chromosome. The resulting genome sequence, JG1, is contiguous, accurate, and carries the Japanese major allele at most loci. We adopt JG1 as the reference for confirmatory exome re-analyses of seven rare-disease Japanese families and find that re-analysis using JG1 reduces total candidate variant calls versus GRCh37 while retaining disease-causing variants. These results suggest that integrating multiple genomes from a single population can aid genome analyses of that population.

The complete human genome sequence [1, 2] has been an invaluable resource for both basic research in human genetics and clinical diagnosis. The complete genome sequence, also called "the reference genome", is currently used as a target for mapping the enormous number of short reads generated using major next-generation sequencing (NGS) techniques [3, 4]. Because the short reads generated in NGS studies are approximately 100–300 bp in length, mapping them to the reference genome is an indispensable step for calling single nucleotide variants (SNVs) and short insertions and deletions (indels) in the sample individuals. The coordinate system of the reference genome is used for biological and medical annotations, such as the position or sequence of specific genes, or sites of causal variants associated with both rare and common diseases. Therefore, the reference genome is one of the most foundational resources in human genetics, and as such, it is maintained and continually updated by the Genome Reference Consortium (GRC). The latest and second-latest versions of the reference genome (GRCh38/hg38 and GRCh37/hg19, published in 2013 and in 2009, respectively) are nearly complete, and both are widely used for NGS analyses and genome annotations [5, 6].

The reference genome was constructed using a hierarchical shotgun sequencing strategy in which fragmented genomic DNA segments cloned in bacterial (BAC) or P1-derived (PAC) artificial chromosome libraries were arranged into a correct physical map to guarantee that the reference genome was haploid (mosaic) [1]. The assembled contigs or scaffolds were then anchored on each chromosome using information from genetic and radiation hybrid (RH) maps, which have thousands to tens of thousands of sequence-tagged site (STS) markers in linkage groups (i.e., chromosomes). It should be noted that these genetic and RH maps are original information sources used to construct the reference genome and are not derived from the reference genome itself.

Although the reference genome is a resource of unparalleled value, several of its characteristics are not ideal for application to NGS analyses, particularly for some populations [7]. For example, although the reference genome is constructed using genetic information from multiple donors, each clone comprising the resulting reference genome is derived from either haploid genome of a particular individual. As such, the reference genome inevitably harbors rare or even private variants. Over 90,000 rare variants were used as reference alleles, including disease-susceptibility variants for thrombophilia and type 2 diabetes [8, 9]. Inclusion of such variants in the reference can lead to erroneous and confusing results of short-read mapping or variant calling [9]. As NGS analyses typically assume that the reference allele is the ancestral, healthy, or major allele for any variable site, the inclusion of such rare alleles may also confuse subsequent interpretations.

Another possible problem associated with the reference genome is that the samples used for its construction are biased toward African and European ancestries. For example, >70% of the reference genome is composed of a BAC library known as RP-11 (aliased RPCI-11) [1] from a donor with both African and European ancestry [10]. With the exception of one donor with an Asian background, all of the donors had a European background, so an Asian haplotype accounts for 4.3% of the reference genome [1, 10]. In addition, recent studies revealed a lack of (population-specific) sequences in the reference genome, and discovered thousands of structural variants (SVs) in worldwide samples [11, 12, 13, 14]. These issues can also complicate short-read mapping and variant calling.
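The idea of integrating several genomes "using the major allele for consensus" can be illustrated with a toy majority vote. The sketch below is not the JG1 pipeline: it assumes three short, already-aligned, equal-length sequences, and the function names, the sample strings, and the `legacy_reference` string are all hypothetical. It only shows why a reference carrying a rare allele yields more apparent differences than a major-allele consensus.

```python
from collections import Counter


def major_allele_consensus(aligned_genomes):
    """Return the most frequent base at each position of equal-length sequences."""
    if not aligned_genomes:
        raise ValueError("need at least one sequence")
    length = len(aligned_genomes[0])
    if any(len(seq) != length for seq in aligned_genomes):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_genomes):
        # Pick the most frequent base; on a tie, the base seen first wins.
        consensus.append(Counter(column).most_common(1)[0][0])
    return "".join(consensus)


def count_differences(sample, reference):
    """Count positions where a sample differs from a reference (toy variant count)."""
    return sum(1 for s, r in zip(sample, reference) if s != r)


# Hypothetical toy data: three aligned sample sequences from one population.
samples = ["ACGTTA", "ACGTTA", "ACGCTA"]

# Hypothetical legacy reference carrying a rare allele (A) at the third position.
legacy_reference = "ACATTA"

consensus = major_allele_consensus(samples)
calls_vs_legacy = sum(count_differences(s, legacy_reference) for s in samples)
calls_vs_consensus = sum(count_differences(s, consensus) for s in samples)
```

In this toy case every sample differs from the legacy reference at the rare-allele position, whereas only the sample carrying a minor allele differs from the consensus. Real assemblies also involve alignment, indels, and structural variants, which this sketch ignores.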