Genetics of educational attainment

Educational attainment, the number of years a person spends in the education system, interests scientists because statistical analyses show that it is related to many aspects of life, including adult income, health and even life expectancy.

Geneticists study traits that vary within a population by asking several questions: whether genetic variation underlies the variability in behavior, what the relative contributions of genetic and environmental factors are, how many genes are involved, whether the same genetic effects are associated with other traits, and whether there are evolutionary reasons why the variability has been maintained generation after generation (1).

In the field of education, researchers have analyzed heritability, a statistical parameter that describes the proportion of variation in a population that is attributable to genetic factors, for academic achievement, psychological characteristics of students, learning disabilities, neurodevelopmental disorders and educational attainment.

In the largest genetic study conducted to date, an international team of researchers reported in the journal Nature Genetics (2) that more than a thousand variants in a person’s genome are associated with how long they stay in school. Most of these variants are single nucleotide polymorphisms (SNPs), a change in one of the “letters” of the DNA.

These data are important, but they are only part of a much more complex picture. Genetic diversity explains only a part, and not the major one, of the differences in education between groups of people. Environmental influences, which include factors as diverse as family income or the educational level achieved by parents, carry greater weight.

The hope is that studies of this type can help us understand what teachers and administrators can do to prevent dropout, so that as few students as possible leave the classroom early. The identification of gene variants, which is becoming simpler, faster and cheaper, may in the future help us identify children who need special attention, monitoring and greater support. By better understanding this two-way interplay between genes and the environment, we will be better placed to judge how improving a child’s learning conditions affects behavior and performance.

The study has an obvious strength and also a weakness. It is based on data from a huge sample, the genetic sequences of 1.1 million people, but all of them were white people of European origin. From a research point of view, it makes sense to use a sample that is as homogeneous as possible, because associations with genes are then easier to find, but the study leaves out most of the world population. In fact, when the team tried to use the identified variants to explain differences in schooling time among African-Americans, the tool failed and the predictions did not match the actual data.

An important caveat: the genetic sequence does not allow us to predict the trajectory of a specific child. Genetic patterns are averages over large groups. It seems clear that genetics plays a role in what level of education a given child will reach and how long he or she will remain in the educational system, but a prediction that works for large groups does not necessarily work for individual cases.

Another aspect that complicates the interpretation of the results and their use in the classroom is that the variants did not have a uniform effect: their influence varied from country to country (3). Researchers cannot yet explain these differences, but one possibility is differences in school curricula: if one country emphasizes memorization in math and another emphasizes problem-solving, then some gene variants may be more beneficial for students in one country than for those in another.

This study is a good example of the path followed by genetic studies in recent decades, a development Carl Zimmer explains beautifully (3). Early research on the genetic influence on schooling and educational attainment was carried out in families. These studies found that identical twins, whose genomes are virtually the same, had more similar academic trajectories than fraternal twins, who are genetically as different as any two siblings but happen to be born on the same day. Later studies comparing siblings with step-siblings, or with siblings adopted by different families, confirmed a slight genetic influence on time spent in the educational system. At the beginning of the 21st century, social scientists tried to locate the genes involved in school performance, but the results were generally poor because the samples were too small for a diverse population and for a behavior that, as we have seen, is modulated by an enormous number of genes and numerous environmental factors.
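The logic of the twin comparison can be sketched with Falconer's formula, a classic textbook approximation that is not taken from the study discussed here. It estimates heritability from the gap between identical-twin and fraternal-twin correlations. The correlation values below are hypothetical and chosen only for illustration.

```python
def falconer_decomposition(r_mz: float, r_dz: float) -> dict:
    """Split trait variance into three shares using Falconer's approximation.

    r_mz: trait correlation between identical (monozygotic) twins
    r_dz: trait correlation between fraternal (dizygotic) twins
    """
    return {
        "genetic": 2.0 * (r_mz - r_dz),             # heritability estimate
        "shared_environment": 2.0 * r_dz - r_mz,    # family-wide influences
        "unique_environment": 1.0 - r_mz,           # everything else, incl. noise
    }


# Hypothetical twin correlations for years of schooling (illustrative only).
shares = falconer_decomposition(r_mz=0.75, r_dz=0.55)
for name, value in shares.items():
    print(f"{name}: {value:.2f}")
```

The more alike identical twins are relative to fraternal twins, the larger the genetic share. The method rests on strong assumptions, for example that both kinds of twins share their environment to the same degree.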

However, in recent years massive sequencing has become much cheaper, and many people have had their genomes sequenced for disease prevention, genealogy, curiosity or other reasons. Many of them have agreed that their personal data can be used in research. In 2011, Daniel J. Benjamin, an economist at the University of Southern California, formed the Social Science Genetic Association Consortium with a group of colleagues. The idea was to put the large amount of genetic information being accumulated through biomedical research to use in the social sciences. When people take part in a study of a type of cancer or of hypertension, to give two examples, they fill out questionnaires about their personal history, and one of the items commonly included is the educational level they have reached. By 2016, Benjamin and his collaborators had information on some 300,000 people and had identified 71 gene variants associated with education, but two developments then multiplied the scope of these studies. On the one hand, the United Kingdom created the UK Biobank, which quickly gathered genetic information from 442,183 people; on the other hand, the company 23andMe contributed the data of 365,539 clients who had voluntarily agreed that their genetic data could be used in research.

With these databases, our capacity to reliably detect specific genetic polymorphisms associated with several behavioral traits increased substantially (1). An important conclusion emerging from studies to date is that single polymorphisms almost always have small effects, and thus their detection requires very large samples. As larger and larger research samples become available, the number of credibly established associations with a wide range of outcomes will continue to grow. The larger samples will also allow researchers to construct increasingly powerful predictors, called polygenic scores, from genetic information.
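A rough calculation shows why small effects demand very large samples. The sketch below uses a standard normal approximation for statistical power together with the conventional genome-wide significance threshold of 5e-8. The function name and the effect sizes are assumptions made for illustration; none of this is taken from the study's own methods.

```python
import math
from statistics import NormalDist


def required_sample_size(r2: float, alpha: float = 5e-8, power: float = 0.8) -> int:
    """Approximate sample size needed to detect one variant.

    r2:    fraction of trait variance explained by the variant (hypothetical)
    alpha: two-sided significance threshold (5e-8 is the usual GWAS convention)
    power: desired probability of detecting a true effect
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    # Normal approximation: N * r2 / (1 - r2) must reach (z_alpha + z_power)^2
    return math.ceil((z_alpha + z_power) ** 2 * (1 - r2) / r2)


# Hypothetical effect sizes: 1%, 0.1% and 0.01% of variance explained.
for r2 in (0.01, 0.001, 0.0001):
    print(f"variance explained {r2:.2%}: about {required_sample_size(r2):,} people")
```

Under these assumptions, each tenfold drop in the variance a variant explains raises the required sample roughly tenfold, so a variant explaining a hundredth of a percent of the variance already calls for hundreds of thousands of participants.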

The researchers obtained their results by analyzing 71 data sets, which included over a million participants from 15 different countries. All the participants had European ancestry and were 30 years or older. In this population, the international group found 1,271 gene variants, single nucleotide polymorphisms or SNPs, that were more common either in people who had completed a long education or in people who had left the education system early (2). Even so, it is important to note that the associations between a SNP and leaving education early or late were subtle. They were statistically valid because of the enormous size of the sample, but when groups of people who had and did not have a specific gene variant were compared, their average time in school differed by only a few days. Even the variants with the largest effects predict, on average, only about three more weeks of schooling in those who have them compared to those who don’t.

The next important discovery concerns where those SNPs are located in the genome. They were not randomly distributed: most were associated with genes active in the nervous system, especially genes involved in one of the key aspects of brain development, synaptogenesis, the formation of new neuronal connections. The idea is that when it comes to staying in the educational system, the important factor does not seem to be how fast new information is acquired but how fast it is shared between several brain regions. In other words, what matters is not how fast the signal travels from point A to point B of a biological connection, but how complex the connections between A and B are. A third finding was perhaps more surprising. Some variants linked to education seemed to act not through the brains of the students but through the people from whom they had inherited those variants, their parents. Perhaps a gene variant generates a certain parenting behavior, which makes the children of those parents stay longer in the educational system.

Based on all these data, the researchers calculated a genetic score for academic success: the more variants associated with a long stay in education a person carried, the higher the score. The researchers calculated the score for 4,775 Americans and divided them into five groups. In the quintile with the lowest scores, 12% finished college; in the quintile with the highest scores, 57% did. A similar pattern appeared for the probability of repeating a grade at school: 29% of the lowest quintile did so, compared with only 8% of the highest. However, when the same score was applied to African-Americans, it did not correctly predict their academic performance, probably because genetic markers identified in one population are not reliable predictors in populations of different ancestry.
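The idea of a score that adds up many small effects and is then split into fifths can be sketched as follows. This is a minimal illustration on simulated data, not the researchers' pipeline: the function names, the number of variants, the effect weights and the genotypes are all hypothetical.

```python
import random


def polygenic_score(allele_counts, weights):
    """Weighted sum of allele counts (0, 1 or 2 copies of each variant)."""
    return sum(count * weight for count, weight in zip(allele_counts, weights))


def quintile_groups(scores):
    """Return a quintile index per person: 0 = lowest fifth, 4 = highest fifth."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, person in enumerate(order):
        groups[person] = rank * 5 // len(scores)
    return groups


# Simulated data (hypothetical): 1,000 people, 50 variants with tiny effects.
rng = random.Random(0)
n_people, n_variants = 1000, 50
weights = [rng.gauss(0.0, 0.01) for _ in range(n_variants)]
genotypes = [[rng.choice((0, 1, 2)) for _ in range(n_variants)]
             for _ in range(n_people)]

scores = [polygenic_score(g, weights) for g in genotypes]
groups = quintile_groups(scores)

for q in range(5):
    members = [s for s, g in zip(scores, groups) if g == q]
    print(f"quintile {q + 1}: {len(members)} people, "
          f"mean score {sum(members) / len(members):.3f}")
```

In a real analysis the weights would come from the association study itself, and outcomes such as college completion would then be compared across the five groups.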

By the study’s conclusion, the researchers had identified a number of genetic factors that allowed them to explain between 11 and 13 per cent of the variance in educational attainment. The group plans to expand the study to two million volunteers or more and believes it will find thousands of additional variants linked to education. They also think that the huge number of sequenced genomes now available will provide information on many normal and pathological behaviors. One example is a study on insomnia with data from 1.3 million people; others, all of them with samples of more than one million people, will be published in the coming months (3).
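To put the 11 to 13 per cent figure in perspective, the short calculation below converts a share of variance explained into the implied correlation between score and outcome, and into the spread that remains unexplained. It relies only on standard statistical identities; the function names are hypothetical, and the two inputs are the bounds quoted above.

```python
import math


def implied_correlation(r2: float) -> float:
    """Correlation between predictor and outcome implied by variance explained."""
    return math.sqrt(r2)


def remaining_spread(r2: float) -> float:
    """Standard deviation left unexplained, as a fraction of the original."""
    return math.sqrt(1.0 - r2)


for r2 in (0.11, 0.13):
    print(f"variance explained {r2:.0%}: "
          f"correlation {implied_correlation(r2):.2f}, "
          f"remaining spread {remaining_spread(r2):.0%} of the original")
```

A score explaining 11 to 13 per cent of the variance corresponds to a correlation of roughly one third and leaves well over nine tenths of the original spread between individuals unexplained, which is why the score separates large groups but says little about any single child.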


  1. Marioni RE, Ritchie SJ, Joshi PK, Hagenaars SP, Okbay A, Fischer K, Adams MJ, Hill WD, Davies G; Social Science Genetic Association Consortium, Nagy R, Amador C, Läll K, Metspalu A, Liewald DC, Campbell A, Wilson JF, Hayward C, Esko T, Porteous DJ, Gale CR, Deary IJ (2016) Genetic variants linked to education predict longevity. Proc Natl Acad Sci U S A. 113(47): 13366-13371. doi: 10.1073/pnas.1605334113
  2. Lee JJ, Wedow R,… Cesarini D (2018) Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nature Genetics. doi: 10.1038/s41588-018-0147-3
  3. Zimmer C (2018) Many Genes Play a Role in Educational Attainment, Enormous Genetic Study Finds. The New York Times. July 23.
