How migration events have dramatically reshaped the genetic landscape of Africa
The largest genetic study of South Africans shows how three major migrations shaped variations between southern Africa’s major ethno-linguistic groups, which may hold the key to treating various inherited diseases.
First published in the Daily Maverick 168 weekly newspaper.
The largest genetic study ever undertaken of South Africans has challenged the presumption that all southeastern Bantu-speaking groups are a single genetic entity – and this has major implications for the study of diseases.
The southeastern Bantu language family includes isiZulu, isiXhosa, siSwati, Xitsonga, Tshivenda, Sepedi, Sesotho and Setswana. Despite linguistic differences, these groups of people are treated mostly as a single group in genetic studies.
Almost 80% of South Africans speak one of the southeastern Bantu languages as their first language. Their origins can be traced to farmers of west central Africa, whose descendants over the past 2,000 years spread south of the equator and into southern Africa.
Professor Michèle Ramsay, director of the Sydney Brenner Institute for Molecular Bioscience at the University of the Witwatersrand (Wits) and the corresponding author of the study, said that to test this presumption, “the largest study with genome-wide genotyping in South African populations was undertaken with 5,000 participants. This is a very detailed analysis of genetic markers across the whole genome.”
The research, published in the journal Nature Communications, was carried out by a multidisciplinary team of geneticists, bioinformaticians, linguists, historians and archaeologists at Wits University, including Ramsay, Dhriti Sengupta, Ananyo Choudhury, Scott Hazelhurst, Shaun Aron and Gavin Whitelaw, along with experts at the University of Limpopo and partners in Belgium, Sweden and Switzerland.
The archaeological record and rock art evidence trace the presence of a San-like hunter-gatherer culture in southern Africa to at least 20,000 to 40,000 years ago.
“Three sets of migration events have dramatically reshaped the genetic landscape of this geographic region in the last two millennia. The first of these was a relatively small-scale migration of east African pastoralists, who introduced pastoralism to southern Africa about 2,000 years ago. This population was subsequently assimilated by local southern African San hunter-gatherer groups, forming a new population that was ancestral to the Khoekhoe herder populations.
“Today, southern African Khoe and San populations collectively refer to hunter-gatherer (San) and herder (Khoekhoe) communities. While Khoe-San groups are distributed over a large geographic area today (spanning the Northern Cape province of South Africa, large parts of Namibia, Botswana, and southern Angola), these groups are scattered, small and marginalised.
“The introduction of pastoralism in the region was closely followed by the arrival of the second set of migrants, that is the Bantu-speaking agro-pastoralists. The archaeological record suggests that ancestors of the current-day [Bantu-speaking] populations undertook different waves of migration instead of a single large-scale movement.
“The earliest communities spread along the east coast to reach the KwaZulu-Natal south coast by the mid-fifth century AD, while the final major episode of settlement is estimated to be around AD1350. These archaeologically distinct groups gradually spread across present-day South Africa, interacting to various degrees with the Khoe-San groups … giving rise to South Africa’s diverse [Bantu-speaking] communities.
“The third major movement into southern Africa was during the colonial era in the last four centuries when European colonists settled in the area. During this period slave trade introduced additional intercontinental gene flow giving rise to complex genomic admixture patterns in current-day southern African populations.”
Since these migrations took place, varying degrees of sedentism (the practice of living in one place for a long time), population movements and interaction with Khoe and San communities, as well as people speaking other southeastern Bantu languages, ultimately generated what are today distinct southern African languages such as isiZulu, isiXhosa and Sesotho.
Despite these linguistic differences, these groups are treated mostly as a single group in genetic studies. Understanding genetic diversity in a population is critical to the success of disease genetic studies. If two genetically distinct populations are treated as one, the methods normally used to find disease genes could be error-prone.
Most people on Earth are genetically more similar than different; however, small differences matter for how experts understand complex diseases.
“Southeastern Bantu speakers have a clear linguistic division – they speak more than nine distinct languages – and their geography is clear: some of the groups are found more frequently in the north, some in central, and some in southern Africa. Yet despite these characteristics, the [southeastern Bantu language] groups have so far been treated as a single genetic entity,” said Choudhury.
These groups are too different from each other to be treated as a single genetic unit, the research has shown.
“So, if you are treating, say, the Tsonga and the Xhosa as the same population – as was often done until now – you might get a completely wrong gene implicated for a disease,” said Sengupta.
“There are not major differences, but small cumulative differences in populations that were geographically isolated for about 1,000 years and who encountered and mixed in different ways with other populations (for example the Khoe and San). Many of the differences may not have any phenotypic [observable physical traits] implications, but some may be related to markers that are associated with susceptibility to diseases,” said Ramsay.
“We wanted to see whether this population sub-structure could interfere in studies on disease susceptibility. What we showed is that if you do a study in South Africa on people who self-identify as southeastern Bantu speakers, one cannot treat them as a homogeneous group.
“If you are doing a case-control study to find genetic markers for association with common diseases like diabetes, cancer or hypertension, and your study cases are predominantly from people of one ethnolinguistic group and your controls are from another, you may find associations that are due to ethnic differences and not association with the disease. So you could make the wrong assumptions about what caused susceptibility to a particular disease,” Ramsay added.
A common way to test whether a genetic variant causes a disease, or predisposes a person to it, is to take a set of individuals with the disease (such as high blood pressure or diabetes) and another set of healthy individuals without it, and compare how often the variant occurs in each set. If the variant shows a notable difference in frequency between the two sets, it is taken as a sign that the variant could be associated with the disease.
“However, this approach depends entirely on the underlying assumption that the two groups consist of genetically similar individuals. One of the major highlights of our study is the observation that Bantu-speakers from two geographic regions – or two ethnolinguistic groups – cannot be treated as if they are the same when it comes to disease genetic studies,” said Choudhury.
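The pitfall the researchers describe can be illustrated with a small calculation. The sketch below is not the study's method: the two groups, their variant frequencies (10% and 30%) and the sample sizes are hypothetical numbers chosen for illustration, and the variant is assumed to have no effect on disease at all. It compares expected allele counts in cases and controls using a standard Pearson chi-square statistic for a 2x2 table.

```python
# Hypothetical frequencies of one variant in two groups (illustrative numbers,
# not taken from the study). The variant is assumed to have NO effect on disease.
FREQ = {"group_a": 0.10, "group_b": 0.30}


def expected_allele_counts(n_people, share_group_a):
    """Expected (variant, other) allele counts in a sample of diploid people,
    where share_group_a is the fraction of the sample drawn from group A."""
    freq = share_group_a * FREQ["group_a"] + (1 - share_group_a) * FREQ["group_b"]
    alleles = 2 * n_people  # two copies of each gene per person
    return freq * alleles, (1 - freq) * alleles


def chi_square_2x2(cases, controls):
    """Pearson chi-square statistic for a 2x2 table of allele counts."""
    a, b = cases
    c, d = controls
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Balanced design: cases and controls are each drawn equally from both groups.
balanced = chi_square_2x2(expected_allele_counts(1000, 0.5),
                          expected_allele_counts(1000, 0.5))

# Skewed design: cases come mostly from group A, controls mostly from group B.
skewed = chi_square_2x2(expected_allele_counts(1000, 0.9),
                        expected_allele_counts(1000, 0.1))

print(f"balanced design: chi-square = {balanced:.1f}")
print(f"skewed design:   chi-square = {skewed:.1f}")
```

With one degree of freedom, a chi-square value above about 3.84 is conventionally read as significant at the 5% level. Under these assumptions the balanced design gives a statistic of zero, while the skewed design gives a value far above that cutoff, even though the variant has nothing to do with the disease. The apparent association comes entirely from which groups the cases and controls were drawn from.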
The study detected major variations in genetic contribution from the Khoe and San into southeastern Bantu-speaking groups; some groups have received a lot of genetic influx from Khoe and San people, while others have had very little genetic exchange with these groups. This variation ranged on average from about 2% in Tsonga to more than 20% in Xhosa and Tswana.
“The study showed that there could be substantial errors in disease gene discovery and disease risk estimation if the differences between south-eastern Bantu-speaking groups are not taken into consideration,” said Sengupta.
The genetic data also show major differences in the history of these groups over the past 1,000 years. Genetic exchanges were found to have occurred at different points in time, suggesting a unique journey for each group over the past millennium.
These genetic differences are strong enough to impact the outcomes of biomedical genetic research.
Sengupta emphasised that ethnolinguistic identities are complex and cautioned against extrapolating broad conclusions from the findings: “Although genetic data showed differences between groups, there was also a substantial amount of overlap. While findings regarding differences could have huge value from a research perspective, they should not be generalised,” she said.
Ramsay said: “We would love to expand the Southern African Human Genome Programme we started in 2011 with funding from the Department of Science and Innovation. We had ambitions to sequence 10,000 South African genomes, but there was no funding for this. It is important to consider what we want to achieve from a scientific point of view and then to assess the sample size that would be needed to achieve our goals. These same samples and their associated phenotype data are also being used to do many other studies on genetic associations with cardiometabolic diseases.”
The effort is also part of the broader Human Heredity and Health in Africa (H3Africa) consortium, a collaboration between the African Society of Human Genetics, the National Institutes of Health in the US and the Wellcome Trust, to boost the study of genomics and the environmental determinants of diseases that are common among African populations.
Professor Ambroise Wonkam, director of Genetic Medicine of African Populations at the University of Cape Town’s Division of Human Genetics, has a vision to work through H3Africa to sequence the genomes of three million people from across the continent. Fewer than 2% of all human genomes analysed to date have been those of people of African ancestry.
“The reference genome sequences built from the Human Genome Project are missing many variants from African ancestral genomes. A 2019 study estimated that a genome representing the DNA of the African population would have about 10% more DNA than the current reference,” he writes in Nature. DM168