Linguistic and genetic inheritance do not always match

Author: Juan F. Trillo, PhD in Linguistics and Philosophy (U. Autónoma de Madrid), PhD in Literary Studies (U. Complutense de Madrid).


How far the language spoken by a given population is directly related to its genetic heritage has long been controversial, mainly because defenders of opposing positions each have abundant examples to support their thesis. It is certainly tempting to equate the genetic heritage of a people with its vernacular language; Darwin himself was one of the first to point this out, in his well-known On the Origin of Species. More recent studies, such as those of Cavalli-Sforza and of Sokal at the end of the last century, seemed to confirm the idea. On this view, the dispersal of agricultural groups between 9,000 and 8,000 B.C., during the Holocene and prompted by a slight global warming, would have given rise to today’s major language families, which in turn would have maintained close genetic links. This was the orthodoxy until very recently, and it dismissed cases in which no such coincidence occurred as mere exceptions to the rule.

However, other works, such as that of Lyle Campbell of the University of Hawaii in 2015 [1], warned against the tendency to look for a parallel between linguistic and genetic transmission over time. In his conclusions, Campbell recommended “to avoid proposed but unsubstantiated linguistic phylogenetic hypotheses. Attempts to find human genetic correlations with linguistic entities that are known (at least known by most linguists) not to be well-founded will produce no useful results.”

The commonly accepted hypothesis is that those who speak the same language share the same genetic inheritance, and in most cases this holds. In practice, however, it does not always hold. To trace the origin and causes of these divergences on a broad and precise footing, an interdisciplinary team at the University of Zürich, Switzerland, in collaboration with the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, has designed [2] a genomic database that allows a reliable comparison between genetic and linguistic heritages. The database, named GeLaTo (Genes and Languages Together), holds information on 4,030 individuals, who in turn represent 397 genetic populations (population groups with a minimum of 5 individuals) and 295 different languages (linguistic relationships were based on the Glottolog classification). Chiara Barbieri, a geneticist at the University of Zürich and one of the directors of the study, says that special attention has been paid to cases in which the linguistic-genetic pattern deviates from the norm, studying where and how often these mismatches occur in order to determine how languages and populations spread throughout the world.

Overview of linguistic and genetic similarity. Schematic illustration of possible scenarios of matches and mismatches in the transmission of genes and linguistic traits. Genetic (demographic) history is represented by solid black lines that differentiate groups of people (represented by human shapes). Linguistic history is represented by colored lines, differentiating five language families (a–e). The linguistic histories sometimes move in parallel with the demographic history and sometimes not. Numbers correspond to the different cases: 1. linguistic and genetic (matching) enclave; 2. linguistic mismatch (linguistic enclave); 3. genetic mismatch (genetic enclave); 4. population with genetic distances aligned with their linguistic relatives (matching profile); 5. population with genetic distances misaligned to their linguistic relatives (mismatching profile). Source: Barbieri et al. (2022)

The results of this study show that, although genetic and linguistic heritages coincide in a high percentage of cases, exceptions can reach up to 20%. Many of these exceptions occur when a group that shares a genetic heritage has absorbed the language of neighboring populations. This is the case, for example, in the Andes among speakers of Quechua who live at different altitudes, and among the Damara of Namibia, who are genetically related to Bantu-speaking populations but speak Khoe, a language characteristic of genetically distant groups in the area. On other occasions, groups of migrants absorb the local language, as with the Jewish population of Georgia, which has assimilated the language spoken in the South Caucasus, while the Cochin Jews of India, with whom they share genetic traits, speak a Dravidian language. These exceptions show that certain human groups possess genetic traits of peoples with whom they do not share a linguistic tradition, as is the case with Hungarian speakers, who are genetically related to neighboring populations but linguistically linked to ethnic groups in Siberia.

In essence, the GeLaTo database makes it possible to test the Darwinian idea that the genetic lineage and the language of a people coincide. Although it is far from covering the entire world population, it is large enough to reveal certain patterns of great interest. For the moment, GeLaTo shows that mismatches are not exceptional, as was believed only a few decades ago, but occur regularly throughout history. As expected, they usually arise when a human group that has moved geographically adopts the language of its new neighbors, even if those neighbors are genetically distant. The main reason for such language shifts has long been known: the need to adapt to a culturally and politically dominant environment. The reverse situation, genetic assimilation with the linguistic heritage maintained, is much more unusual. Particularly interesting is the duration of this process, which usually spans several generations. Current and growing global mobility accelerates it and presents equally intriguing possibilities: either linguistic assimilation is accompanied by genetic integration, or it occurs without it, in which case the newly arrived population is culturally assimilated but preserves its genetic identity. The other two options, genetic assimilation without linguistic assimilation and neither of the two, are, as already noted, much less frequent, especially the latter.

This is an extensive field of anthropological research that is only now developing and will need further study in the immediate future. Likewise, growth in the information held in GeLaTo can contribute significantly to a clearer picture of the demographic and cultural trajectory of the world population. As Campbell proposed, the study of the relationship between both heritages should be an opportunity for geneticists and linguists to collaborate in the search for a better understanding of the dynamics that regulate the interaction between societies and linguistic transmission.


  1. Campbell, L. (2015). Do Languages and Genes Correlate? Language Dynamics and Change. doi: 10.1163/22105832-00502007
  2. Barbieri, C., Blasi, D. E., Arango-Isaza, E., Sotiropoulos, A. G., Hammarström, H., Wichmann, S., Greenhill, S. J., Gray, R. D., Forkel, R., Bickel, B., and Shimizu, K. K. (2022). A global analysis of matches and mismatches between human genetic and linguistic histories. PNAS. doi: 10.1073/pnas.2122084119
