Georgina Navoly: Ancestral diversity in complex disease genetics: from discovery to translation
Disease prevalence, that is, the proportion of individuals in a population with a disease at a given time, can vary by geographic location. For example, the prevalence of obesity is considerably higher in the USA (42% in adults) than in Japan (5%), largely due to differences in dietary patterns such as higher consumption of sugar-sweetened beverages in the USA1, which increases the risk of obesity-associated diseases such as type 2 diabetes and cardiovascular disease2. Air pollution and insufficient physical activity are other geographically stratified exposures that have a strong impact across several complex diseases, including coronary artery disease and stroke3,4. Prevalence rates for some diseases or specific subtypes can also differ between ethnic groups within a country. For example, in England, overall cancer incidence rates are lower in Black and Asian ethnic groups than in the white ethnic group; however, the incidence rate of prostate cancer is 2.1-fold higher in Black men than in white men5.
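The paragraph above moves between prevalence (the proportion of people who have a disease at one time point) and incidence (new cases arising over a period), and the prostate cancer figure is a ratio of incidence rates. A minimal Python sketch of the difference follows; the `Group` record and all counts are hypothetical, chosen only to illustrate the arithmetic, and published registry statistics are usually age-standardized, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Counts for one population group (all numbers used here are hypothetical)."""
    name: str
    population: int      # number of individuals in the group
    current_cases: int   # individuals who have the disease at the time point
    new_cases: int       # new diagnoses during the follow-up period
    person_years: float  # total time at risk contributed by the group


def prevalence(group: Group) -> float:
    """Proportion of the group that has the disease at a given time."""
    return group.current_cases / group.population


def incidence_rate(group: Group) -> float:
    """New cases per person-year at risk."""
    return group.new_cases / group.person_years


def rate_ratio(group: Group, reference: Group) -> float:
    """How many times higher the incidence rate is in `group` than in `reference`."""
    return incidence_rate(group) / incidence_rate(reference)


if __name__ == "__main__":
    a = Group("group A", population=20_000, current_cases=1_000, new_cases=60, person_years=40_000)
    b = Group("group B", population=20_000, current_cases=400, new_cases=30, person_years=40_000)
    print(f"prevalence: A = {prevalence(a):.1%}, B = {prevalence(b):.1%}")
    print(f"incidence rate ratio, A vs B: {rate_ratio(a, b):.1f}-fold")
```

Person-years appear in the denominator of incidence because people contribute different lengths of follow-up, whereas prevalence needs only a head count at a single time point.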
Over the past two decades, genome-wide association studies (GWAS) have uncovered many genetic variants linked to complex traits or diseases. However, the vast majority (86%) of published studies were based on data for individuals with European ancestries6. This Eurocentric bias has ethical, scientific and clinical consequences: the generalizability of genetic findings to other populations is uncertain, and this uncertainty could deepen existing health disparities across global populations7. Polygenic risk scores (PRS) are an important example that illustrates how a lack of ancestral diversity in genetic studies hinders discovery and inclusive research. PRS are derived by aggregating the effects of genetic variants associated with disease and can estimate part of an individual’s susceptibility to diseases8. PRS have shown utility in research settings and, despite limited predictive accuracy, are increasingly incorporated into clinical practice, for example, to identify individuals at higher lifetime risk9,10,11. The predictive accuracy of PRS developed using data from individuals of European ancestry can be up to 4.9-fold lower in non-European populations than in European populations, raising concerns that their clinical use may exacerbate disparities in health outcomes12.
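In its simplest form, a PRS for one person is a weighted sum, over the scored variants, of the number of effect-allele copies that person carries (0, 1 or 2), with weights taken from GWAS effect sizes. The Python sketch below assumes exactly that form; the variant IDs and weights are hypothetical placeholders, and published PRS methods differ mainly in how they select variants and adjust the weights (for example, to account for linkage disequilibrium).

```python
from typing import Mapping

# Hypothetical per-variant weights (log odds ratio per copy of the effect allele),
# standing in for effect sizes taken from GWAS summary statistics.
EFFECT_SIZES: dict[str, float] = {
    "variant_1": 0.12,
    "variant_2": -0.05,
    "variant_3": 0.30,
}


def polygenic_score(dosages: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Sum of weight * effect-allele dosage over all scored variants.

    Variants absent from an individual's genotype data are simply skipped here;
    real pipelines usually impute them or substitute an expected dosage.
    """
    score = 0.0
    for variant, beta in weights.items():
        if variant in dosages:
            score += beta * dosages[variant]
    return score


if __name__ == "__main__":
    person = {"variant_1": 2, "variant_2": 1, "variant_3": 0}
    print(f"PRS = {polygenic_score(person, EFFECT_SIZES):.2f}")
```

Because both the weights and the choice of scored variants come from the discovery GWAS, differences in allele frequencies and linkage disequilibrium between the discovery and target populations change how well the score captures the underlying causal variants, which is one reason accuracy drops when a score is transferred across ancestry groups.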
A lack of ancestral diversity in genetic research also limits the discovery of risk alleles that are common or have stronger effects in specific populations. For example, a damaging missense variant (rs730881101) in TNNT2, which is associated with lower heart function and increased risk of heart failure, was identified in a GWAS of more than 260,000 Japanese individuals and would have been undetectable in European-ancestry-only cohorts13. As another example, an intronic variant (rs77408001) in the ELN gene, associated with kidney function, was identified in a meta-analysis of approximately 80,000 individuals of African ancestries and provided new insights into the pathogenesis of chronic kidney disease14. Addressing imbalanced representation in genetic research is therefore essential to ensuring that advances in genomics benefit all populations equitably.
The past few years have seen major developments in the genetic and genomic data landscape, with large biobanks emerging outside Western countries as well as large studies focusing on underrepresented groups within Western countries15,16,17,18. However, recent political developments in the USA pose emerging challenges to equity in biomedical research19,20. Reducing funding and support for research focusing on diversity and equity in genomics risks undermining efforts to understand complex diseases in diverse populations7,21.
Here, we review how key demographic events and evolutionary forces, including migration, genetic drift and natural selection, contribute to population genetic variation by influencing allele frequency differences, patterns of linkage disequilibrium and population structure. We then discuss how increased ancestral and geographic diversity in genomic resources has contributed to novel genetic discoveries, including population-specific variants, new trait associations and improved fine-mapping resolution. Diverse data resources also make it possible to assess how well genetic findings derived largely from a single ancestry group generalize across diverse ancestries and settings. Finally, we address how genetic ancestry can influence patterns of disease prevalence.
Human evolution and genetic diversity
Applying a population genetics framework can help us to understand observed genetic differences in the context of the demographic events and evolutionary forces that act on allele frequencies over time and across populations24 (Fig. 1). In this section, we introduce the key concepts of population and ancestry before outlining how demographic events, such as migration and archaic introgression, and evolutionary forces, such as genetic drift and natural selection, contribute to variation and why these patterns are relevant for genetic research.
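Genetic drift, one of the forces named above, can be made concrete with a standard population genetics result: in an idealized Wright-Fisher population of N diploid individuals, expected heterozygosity shrinks by a factor of 1 - 1/(2N) each generation. The Python sketch below applies that recursion generation by generation; the population sizes are hypothetical, and mutation, migration and selection are deliberately left out. It illustrates why a few generations at small size (a bottleneck) remove more diversity than many generations at large size.

```python
from typing import Sequence


def expected_heterozygosity(h0: float, population_sizes: Sequence[int]) -> float:
    """Expected heterozygosity after drift in an idealized Wright-Fisher population.

    Each generation with N diploid individuals retains a fraction 1 - 1/(2N) of
    the previous generation's heterozygosity. `population_sizes` lists N for each
    successive generation. Mutation, migration and selection are not modelled.
    """
    h = h0
    for n in population_sizes:
        h *= 1.0 - 1.0 / (2 * n)
    return h


if __name__ == "__main__":
    generations = 100
    stable = [10_000] * generations
    # Hypothetical bottleneck: 20 generations at 100 individuals, then recovery.
    bottleneck = [100] * 20 + [10_000] * (generations - 20)
    print(f"stable population:  {expected_heterozygosity(0.5, stable):.4f}")
    print(f"after a bottleneck: {expected_heterozygosity(0.5, bottleneck):.4f}")
```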
‘Human populations’ refers to groups of people who live in a specific geographic area and share certain characteristics25, including genetics, location, and demographic and cultural aspects26. ‘Genetic ancestry’ refers to a person’s family origins and genetic lineage, traced back through generations25. Genetic ancestry is a concept that is relative to time, geographic context, reference population data and the analytical methods used for ancestry inference27. In this Review, we mean ‘genetically inferred ancestry’ when discussing diversity in genetic research. Of note, when estimated using genetic data, ancestry is a continuous measure shaped by historical patterns of admixture and migration, and no clear line of distinction can be drawn between populations or ancestry groups25. When comparing genetic findings across populations or ancestry groups, it is important to carefully disentangle the social and genetic constructs that can influence them.
Most genetic association studies use methods such as principal component analysis to assign individuals to genetically inferred ancestry groups. This means that an individual’s sample is assigned to a genetically similar reference group and labelled accordingly. The choice of reference groups and the boundaries for assignment are somewhat arbitrary. Throughout this Review, we use the term ancestry group to refer both to the genetic similarity-based groupings used by the primary research studies and to country-based ancestry labels (for example, labelling Biobank Japan as East Asian). However, we acknowledge that assigning individuals to discrete ancestry groups, although useful for comparisons, as done in this Review, oversimplifies the continuous nature of genetic variation28 and is shaped by social and historical context (Box 1). Moreover, individuals do not derive from a single ancestry in any meaningful sense. Given the long history of race concepts being weaponized, it is crucial to distinguish these ancestry groups used in genetic research from socially constructed racial hierarchies29.
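The assignment step described above can be sketched as a nearest-centroid rule on principal component coordinates. The Python example below assumes the principal components have already been computed (for example, by projecting study samples onto a reference panel); the group labels, coordinates and the `max_distance` cut-off are hypothetical, and studies use a variety of rules, including classifiers trained on reference samples. The example makes the arbitrariness visible: a sample lying between two clusters is left unassigned or given a label depending only on the chosen threshold.

```python
import math
from typing import Sequence

Point = tuple[float, ...]  # a sample's coordinates on the leading principal components


def centroid(points: Sequence[Point]) -> Point:
    """Mean position of a reference group's samples in PC space."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def assign_group(sample: Point, reference: dict[str, Sequence[Point]], max_distance: float) -> str:
    """Label a sample with the closest reference-group centroid.

    Samples farther than `max_distance` from every centroid are returned as
    "unassigned"; the cut-off is an arbitrary, hypothetical choice.
    """
    best_label, best_dist = "unassigned", math.inf
    for label, points in reference.items():
        dist = math.dist(sample, centroid(points))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= max_distance else "unassigned"


if __name__ == "__main__":
    # Hypothetical reference samples on two principal components.
    reference = {"ref_A": [(0.0, 0.0), (0.2, 0.1)], "ref_B": [(3.0, 3.0), (3.2, 2.9)]}
    for sample in [(0.1, 0.0), (1.6, 1.5)]:
        print(sample, assign_group(sample, reference, max_distance=1.0))
```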
Box 1 Ancestry research in the shadow of racism
Ancestry differences in disease prevalence raise the question of whether genetics contribute to them. However, correlating disease with ancestry is often confounded, because ancestry is connected with ethnicity or race and therefore with myriad differences in cultural practices, as well as with systemic bias, all of which can affect disease risk. In general, the validity and justification of group comparisons need to be considered with extreme caution. Historically, there have been repeated attempts to explain observed differences in complex traits between ‘races’ through genetics to confirm racist stereotypes21. The complexity of these group comparisons with both known and hidden biases and the pervasive influence of the exposome on complex traits lead to a high risk of incorrect conclusions, and there is also a substantial risk of intentional misappropriation of any comparative genetic research22,23.
It is important to acknowledge that current research practices often assign individuals to discrete ancestry groups, an inherently imperfect approach given the continuous nature of human genetic variation29. Although these groupings serve practical purposes in assessing representation and controlling for population stratification, they are not independent of social and historical context, which influences the number and choice of reference populations and the scale at which population structure is considered. Therefore, the use of genetic ancestry groups in research risks reinforcing problematic concepts25. The historical weaponization of ‘biological race’ concepts across colonial contexts reminds us that genetic ancestry categories, while sometimes scientifically useful, must not be conflated with socially constructed racial hierarchies159. These discredited notions continue to have a pervasive influence on people’s lives, including in the context of health-care systems160. Consequently, many observable group differences reflect the enduring legacy of racism rather than biological reality161. The historical and present-day misuse of genetic differences to justify racial hierarchies demands heightened vigilance.
Recent political developments in the USA, including legislation such as Florida’s ‘Stop WOKE Act’, which prohibits discussions of systemic racism or diversity in federally funded projects162, and executive orders that directly impede research into health disparities, exemplify the ongoing risks of conflating ancestry research with ideological agendas. Concurrently, funding for studies addressing equity and diversity has faced targeted cuts163,164,165, reflecting a broader backlash against equity-focused science166. These policies mirror historical efforts to suppress research that challenges racial hierarchies, underscoring the need for vigilance in defending rigorous, inclusive genomics.
Migration
Africa is the origin of all modern humans30,31. Any two humans are about 99.9% identical at the DNA sequence level, and most human genetic variation is shared across populations32. Approximately 50,000–100,000 years ago, modern humans began dispersing from Africa33,34, resulting in a severe population bottleneck (a sharp reduction in population size35) that reduced genetic diversity in populations outside of Africa (Fig. 1). Subsequent demographic events, such as migration and admixture (including episodes of archaic introgression, in which archaic humans such as Neanderthals and Denisovans interbred with anatomically modern humans36), acted together with evolutionary forces, such as mutation, genetic drift, gene flow and natural selection, to shape present-day human genetic diversity37. Populations that remained in Africa did not experience the severe bottleneck that reduced genetic diversity in non-African populations and hence have retained greater genetic diversity, which is key to understanding genetic architecture and disease risk6,38.
Because genetic diversity and demographic history differ among populations, so too do patterns of population structure and linkage disequilibrium (that is, the non-random association of alleles at different loci)39. Shorter linkage disequilibrium blocks in African populations reflect more historical recombination and accumulated genetic variation40, producing a rich mosaic of haplotypes. By contrast, European and other populations that have experienced bottlenecks exhibit longer linkage disequilibrium blocks, as recombination has had fewer opportunities to reshuffle alleles41. Leveraging the greater genetic diversity and shorter linkage disequilibrium blocks in African populations can aid the discovery of novel loci, identify population-specific variants and improve fine-mapping resolution42,43.
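Linkage disequilibrium between two biallelic variants is commonly summarized by D = p_AB - p_A p_B and its normalized form r² = D² / [p_A(1 - p_A) p_B(1 - p_B)], where p_A and p_B are allele frequencies and p_AB is the frequency of the haplotype carrying both alleles. The Python sketch below computes r² from phased haplotypes; the haplotypes are hypothetical toy data. In populations with shorter linkage disequilibrium blocks, r² decays over shorter physical distances, so fewer nearby variants remain highly correlated with a causal variant, which narrows the set of candidates during fine-mapping.

```python
from typing import Sequence


def ld_r2(haplotypes: Sequence[tuple[int, int]]) -> float:
    """r^2 between two biallelic loci estimated from phased haplotypes.

    Each haplotype is (allele at locus 1, allele at locus 2), coded 0 or 1.
    D = p_AB - p_A * p_B measures the non-random association of the alleles
    coded 1; r^2 rescales D^2 by the product of the allele-frequency variances.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return d * d / denom


if __name__ == "__main__":
    # Alleles always travel together: complete linkage disequilibrium.
    linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
    # All four haplotypes equally common: alleles fully reshuffled by recombination.
    reshuffled = [(1, 1), (1, 0), (0, 1), (0, 0)]
    print(ld_r2(linked), ld_r2(reshuffled))
```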
