Substitute genetic ancestry for race in research? Not so fast

Race, widely used as a variable in biomedical research and medicine, is an apt indicator of racism, but not of anything biological. Proposals to use genetic ancestry instead of race risk perpetuating the same problems.

Dozens of algorithms widely used in clinical care contain an adjustment factor for a patient’s race. When estimating kidney function, for example, different results are returned depending on whether the patient’s race is entered as “Black” or “non-Black,” although the use of race, at least for kidney function, is contested. Some drugs have been approved only for people in certain self-identified racial groups. Meanwhile, in research, participants’ race is systematically considered at almost every stage of the process, from recruitment to analysis to interpretation of results.

Racial health disparities have reignited debate about the relevance of these uses of race and their potential connection to racism.


Certainly, race is an important variable to track in order to understand the social drivers of health, including the impact of racism. But it’s a very problematic proxy for anything biological.

When researchers try to better capture potentially relevant biological differences between groups, a common proposal is to turn to concepts derived from genetics, and in particular to genetic ancestry.


But using genetic ancestry risks perpetuating the same problems as relying on race, as several colleagues and I argue in an essay in Science magazine’s Policy Forum. In our view, genetic ancestry can be part of the solution to understanding why we differ in our risk of developing disease and in how we respond to therapies, but only if a sufficiently complex conceptualization of it is adopted.

The danger of turning to genetic ancestry stems from the way ancestry is currently used in genetics: as continental categories such as African ancestry, European ancestry, and so on. These categories are easy to confuse with racial categories; European ancestry, for example, is conflated with “white” race. This mixes up a sociopolitical concept with a biological one. The well-meaning “solution” ends up perpetuating the same inherent problem with racial categories: the belief that humans can be sorted on the basis of their biology into a small number of types. Such beliefs have been the source of great harm, which creates an ethical imperative to move away from using continental ancestry categories.

There is also a scientific imperative to move away from their use. Consider what genetic ancestry actually means: an individual’s genetic ancestry is the set of paths along their family tree through which they inherited each segment of their DNA. Population categories are not an integral part of this definition; imposing any set of categories is a choice that researchers must make and justify.

There are good reasons not to impose categories of continental ancestry.

Continental ancestry categories fail to adequately capture human diversity. Newly assembled datasets, such as those referenced in the Science essay, underscore that there are no discrete categories of genetic variation, only fuzzy continua. Recent prominent studies in statistical genetics have shown that in many cases where population categories were previously considered necessary, they can be avoided entirely. When basic and translational researchers can avoid categories, they should.

Continental ancestry categories also give a very incomplete picture of our ancestors. Each of us has ancestors at every point in our species’ past. A set of ancestry categories reflects a single point in that past, so when only one set of categories is used, this multidimensional history is flattened.

New data increasingly allow us to explore different time slices. Modern humans, for example, interbred with Neanderthals about 50,000 years ago. The best model suggests that three different human groups mixed in Europe about 5,000 years ago to shape today’s Europeans. Five hundred years ago, waves of migration and the trade in enslaved people were creating new patterns of genetic diversity in the Americas. These different time frames may have medical relevance. For example, one of the main genetic variants linked to the severity of Covid-19 was later found to lie in a genomic region that humans inherited from Neanderthals. As researchers try to understand the relevance of human genetic backgrounds, they should routinely consider multiple sets of categories representing multiple time frames.

A consideration of the values, ethics, and purpose of human biological research should push researchers, and those who apply the results of this research, away from easy categorization and toward a more complex version of genetic ancestry: one that reflects the continuous nature of genetic variation and its historical depth. Change is never easy. At a minimum, the research community will need new, widely available software tools that enable the use of categories representing multiple time slices, as well as educational materials for researchers, scientists, and clinicians. And publishers and funders will have to reconsider the kinds of work they promote.

The willingness of academic and healthcare institutions to re-examine their use of race presents a window of opportunity to move away from using race as a biological variable. To make the most of this opportunity, they must embrace a complex conceptualization of genetic ancestry rather than let continental labels, which recast past racial groupings in ostensibly race-blind language, become the new default.

Anna CF Lewis is a research associate at the EJ Safra Center for Ethics at Harvard University.
