Tracing the origins and migratory paths of East Asians, their founding fathers and their Y chromosome haplotypes

Balaresque, Patricia, et al. Y-chromosome descent clusters and male differential reproductive success: young lineage expansions dominate Asian pastoral nomadic populations. European Journal of Human Genetics (2015) 23, 1413–1422; doi:10.1038/ejhg.2014.285; published online 14 January 2015

Abstract

High-frequency microsatellite haplotypes of the male-specific Y-chromosome can signal past episodes of high reproductive success of particular men and their patrilineal descendants. Previously, two examples of such successful Y-lineages have been described in Asia, both associated with Altaic-speaking pastoral nomadic societies, and putatively linked to dynasties descending, respectively, from Genghis Khan and Giocangga. Here we surveyed a total of 5321 Y-chromosomes from 127 Asian populations, including novel Y-SNP and microsatellite data on 461 Central Asian males, to ask whether additional lineage expansions could be identified. Based on the most frequent eight-microsatellite haplotypes, we objectively defined 11 descent clusters (DCs), each within a specific haplogroup, that represent likely past instances of high male reproductive success, including the two previously identified cases. Analysis of the geographical patterns and ages of these DCs and their associated cultural characteristics showed that the most successful lineages are found both among sedentary agriculturalists and pastoral nomads, and expanded between 2100 BCE and 1100 CE. However, those with recent origins in the historical period are almost exclusively found in Altaic-speaking pastoral nomadic populations, which may reflect a shift in political organisation in pastoralist economies and a greater ease of transmission of Y-chromosomes through time and space facilitated by the use of horses.

Introduction

Reproductive success, widely used as a measure of fitness, is described as the genetic contribution of an individual to future generations.1 Human behavioural ecologists use various fertility-specific measures to directly estimate reproductive success in extant populations (eg, Strassmann and Gillespie2); however, indirect estimates of past reproductive success can also be obtained through evolutionary population genetic approaches, which demonstrate that its cultural transmission can be extremely effective.3, 4

High variance of male reproductive success is detectable from genetic data because it leads to many Y-chromosomal lineages becoming extinct through drift and others expanding markedly. For any increase to lead to a high population frequency, two factors are needed: biological, with a need for men to be fertile, and cultural, with a continued transmission of reproductive success over generations. Indeed, previously, strong signals of successful transmission of Y-lineages have been associated with recent social selection, and were explained by inherited social status, with two cases in Asia and one in the British Isles. The best-known instance is the finding that ~0.5% of the world’s Y-chromosomes belongs to a single Asian patrilineage,5 descending from a common ancestor in historical times, and suggested to be due to the imperial dynasty founded by Genghis Khan (died 1227). The lineage is characterised by a high-frequency Y-microsatellite haplotype and a set of close mutational neighbours—a so-called ‘star-cluster’. In its structure and haplotype diversity, it resembles the recent descent clusters (DCs) observed within rare British surnames,6 despite its presence in populations spread over an enormous geographical range and speaking many different languages. Subsequently, two further examples of high-frequency DCs were described and were associated with the Qing dynasty descendants of Giocangga (died 1582) in Asia,7 and the Irish early medieval ‘Uí Néill’ dynasty in Europe.8 These three examples might suggest an association between high reproductive success (detectable at the population level) and both wealth and socio-political power, as is observed in some present-day populations (eg, the Tsimané of Bolivia9).

In this study, we ask whether the two signals of continued transmission of success over generations detected in mainland Asia are the only examples of likely recent social selection, or if similar patterns can be detected among the Y-chromosomes of this continent, known for the numerous expansive polities that emerged there during the Bronze Age and later (beginning 5000–4000 YBP). Pinpointing the historical figures associated with any such signals of Y-lineage transmission would require their identification via ancient DNA testing, and/or a comparison with certified living descendants and will not be attempted here. Instead, we aim to understand whether efficient transmission is linked to rules that apply in populations with specific histories of subsistence or culture, or can be detected in other groups with diverse cultural features. We also ask whether any expansions detected converge to the same time period or reflect distinct temporal episodes.

To address these questions, we generated novel Y-microsatellite and Y-SNP data and combined these with published data in order to carry out a systematic analysis of the most frequent haplotypes and their associated DCs. We chose a lineage-based, or interpopulation approach, rather than an intrapopulation approach,10 as we reasoned that dominant lineages are likely to migrate.11 We considered the geographical pattern of identified expansions, their ages and the cultural factors (language and mode of subsistence) characteristic of each population in which they were detected. In total, the 5321 chromosomes analysed belong to 127 populations distributed from the Middle East to Korea. Our analysis focuses upon the 15 most frequent haplotypes that include two haplotypes reflecting the previously recognised ‘Khan’ and ‘Giocangga’ expansions.5, 7 Our analysis shows that the most successful Y-chromosomes in Asia form part of expansions that began between 2100 BCE and 1100 CE, found both among sedentary agriculturalists and pastoral nomads. The ages of these expansions are considered in their cultural contexts.

Also excerpted from the above article.

Identification of unusually frequent haplotypes and associated DCs (Descent Clusters)

Table 1. Features of the fifteen primary descent clusters

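Core microsatellite haplotype: columns DYS19 to DYS439.

| DC | n | Simplified haplogroup (a) | Highest resolution haplogroup | DYS19 | DYS389I | DYS389b | DYS390 | DYS391 | DYS392 | DYS393 | DYS439 | Link with other DC? |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DC1 | 71 | C | C3 (xC3c) | 16 | 13 | 16 | 25 | 10 | 11 | 13 | 10 | No |
| DC2 | 47 | R1a | R1a1 | 16 | 14 | 18 | 25 | 11 | 11 | 13 | 10 | No |
| DC3 | 43 | K(xN1c1,P) | L | 14 | 12 | 16 | 22 | 10 | 14 | 11 | 13 | No |
| DC4 | 41 | K(xN1c1,P) | O2b | 16 | 14 | 15 | 23 | 10 | 13 | 13 | 12 | No |
| DC5 | 32 | J | J | 14 | 13 | 17 | 23 | 11 | 11 | 12 | 11 | No |
| DC6 | 30 | K(xN1c1,P) | O3a3c | 14 | 12 | 16 | 24 | 10 | 14 | 12 | 12 | No |
| DC7 | 28 | J | J | 14 | 13 | 17 | 23 | 10 | 11 | 12 | 11 | DC5 (b) |
| DC8 | 28 | C | C3c | 16 | 13 | 16 | 24 | 9 | 11 | 13 | 11 | No |
| DC9 | 28 | J | J | 14 | 13 | 16 | 23 | 10 | 11 | 12 | 11 | DC5 (b) |
| DC10 | 24 | K(xN1c1,P) | N | 14 | 14 | 16 | 23 | 10 | 14 | 14 | 10 | No |
| DC11 | 24 | Y(xC,D,E,J,K) | H1a | 15 | 13 | 16 | 22 | 10 | 11 | 12 | 12 | No |
| DC12 | 24 | K(xN1c1,P) | O2a | 15 | 13 | 16 | 24 | 10 | 13 | 14 | 11 | No |
| DC13 | 23 | J | J1 | 14 | 13 | 16 | 23 | 11 | 11 | 12 | 11 | DC5 (b) |
| DC14 | 23 | K(xN1c1,P) | O2a | 15 | 13 | 16 | 25 | 11 | 13 | 14 | 12 | No |
| DC15 | 22 | R1a | R1a1 | 16 | 13 | 18 | 25 | 11 | 11 | 13 | 10 | DC2 (b) |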

Abbreviation: DC, descent cluster.

a Simplified to allow combination of published data sets.

b These DCs were incorporated into the analysis of the linked DC.
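The links in the last column of Table 1 can be checked by counting repeat-unit differences between core haplotypes. The sketch below is illustrative only. It assumes a simple stepwise distance (the sum of absolute repeat-count differences over the eight loci), which may not be the exact criterion the authors applied, and the names `CORES`, `LINKS` and `step_distance` are hypothetical. The haplotype values are copied from Table 1.

```python
# Core haplotypes copied from Table 1. Locus order:
# DYS19, DYS389I, DYS389b, DYS390, DYS391, DYS392, DYS393, DYS439
CORES = {
    "DC2": (16, 14, 18, 25, 11, 11, 13, 10),
    "DC5": (14, 13, 17, 23, 11, 11, 12, 11),
    "DC7": (14, 13, 17, 23, 10, 11, 12, 11),
    "DC9": (14, 13, 16, 23, 10, 11, 12, 11),
    "DC13": (14, 13, 16, 23, 11, 11, 12, 11),
    "DC15": (16, 13, 18, 25, 11, 11, 13, 10),
}

# Links listed in the last column of Table 1:
# (linked DC, DC into whose analysis it was incorporated)
LINKS = [("DC7", "DC5"), ("DC9", "DC5"), ("DC13", "DC5"), ("DC15", "DC2")]


def step_distance(a, b):
    """Sum of absolute repeat-count differences across loci (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("haplotypes must have the same number of loci")
    return sum(abs(x - y) for x, y in zip(a, b))


for linked, target in LINKS:
    print(linked, "->", target, step_distance(CORES[linked], CORES[target]))
```

On these values, DC7, DC13 and DC15 each sit one step from the core they are linked to, and DC9 sits two steps from DC5, which fits their treatment as close mutational neighbours of a larger cluster.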

Through haplotyping of Central Asian samples (Supplementary Table 2) and a literature survey, we collected 5321 eight-locus Y-chromosomal microsatellite haplotypes belonging to 127 Asian populations (Figure 1 and Supplementary Table 1).

We ranked haplotypes by frequency (Figure 2), reasoning that particularly frequent haplotypes should represent potential cores of clusters indicating past expansions of paternal lineages. The sample contained 2552 distinct haplotypes, 67% of which were unique, and 15 haplotypes (0.6%) were each present >20 times (and up to 71 times), indicating probable examples of successful transmissions (Figure 2). We focused on these 15 haplotypes and studied their spatial distributions and ages to illuminate the history of high reproductive success of their respective common ancestors.
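The ranking step described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name `rank_haplotypes`, the `min_copies` parameter and the toy sample are all assumptions, and only the ">20 copies" threshold comes from the text.

```python
from collections import Counter


def rank_haplotypes(haplotypes, min_copies=20):
    """Count identical haplotypes and list those seen more than min_copies times.

    haplotypes: iterable of tuples of repeat counts, one tuple per male.
    Returns (number of distinct haplotypes, number seen exactly once,
    list of (haplotype, count) sorted by decreasing count).
    """
    counts = Counter(haplotypes)
    n_distinct = len(counts)
    n_singletons = sum(1 for c in counts.values() if c == 1)
    frequent = [(h, c) for h, c in counts.most_common() if c > min_copies]
    return n_distinct, n_singletons, frequent


# Toy sample (hypothetical): two core haplotypes from Table 1 plus one
# made-up neighbour; the real data set has 5321 haplotypes.
hap_a = (16, 13, 16, 25, 10, 11, 13, 10)  # DC1 core (Table 1)
hap_b = (16, 13, 16, 25, 10, 11, 13, 11)  # invented one-step neighbour
hap_c = (14, 12, 16, 22, 10, 14, 11, 13)  # DC3 core (Table 1)
toy_sample = [hap_a] * 3 + [hap_b] * 2 + [hap_c]

# A threshold of 2 is used here only because the toy sample is tiny.
n_distinct, n_singletons, frequent = rank_haplotypes(toy_sample, min_copies=2)
print(n_distinct, n_singletons, frequent)
```

With the default `min_copies=20`, the same function applied to the full data set would return the candidate cores that the text describes as present more than 20 times.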

See Fig. 7. The TMRCA estimates are consistent with six expansions beginning in protohistorical times (DCs 2, 5, 6, 11, 12 and 14), and four beginning in historical times (DCs 1, 3, 8 and 10; Table 2). DCs detected in the different periods differ (Figure 7) in regard to mode of subsistence (χ²=345.59; df=2; P<0.0001) and language (χ²=211.67; df=4; P<0.0001). This suggests a shift in population dynamics between the two periods, with an increase of successful Y-lineage expansions in Altaic-speaking pastoral nomadic populations in more recent times. The historical-period expansions show the highest growth rates (Table 2).
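For readers unfamiliar with the test behind these χ² values, the following sketch computes Pearson's chi-square statistic and its degrees of freedom for a contingency table. The counts are hypothetical, since the underlying table is not reproduced here, and the function name is an assumption. A table of two periods by three subsistence modes gives df=2, and two periods by five language groups would give df=4, which would be consistent with the values quoted, although the exact categories used by the authors are not stated in this excerpt.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# HYPOTHETICAL counts, for illustration only (not data from the study).
# Rows: protohistorical, historical.
# Columns: agricultural, multiple, pastoral nomadic.
hypothetical_counts = [
    [30, 10, 10],
    [10, 10, 30],
]

statistic, df = chi_square_independence(hypothetical_counts)
print(f"chi2 = {statistic:.2f}, df = {df}")
```

The statistic grows with the gap between observed and expected counts, so values as large as those reported indicate that DC carriers are distributed very differently across subsistence modes and languages in the two periods.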

Characteristics of DCs (Descent Clusters)

A total of 2000 (37.8%) males carry Y-chromosomes belonging to these DCs, and they are remarkably widespread, with only three of the 127 populations analysed lacking any DC chromosomes; DC2 is also detected in one of the two ancient DNA population samples in our database (Krasnoyarsk_Kurgan17). Overall DC proportions in each population vary greatly from 5 to 96.7% (Supplementary Table 3).

As DC4 is restricted to Korea,18 it was not considered for interpopulation analysis. For the remaining 10 DCs, we explored three characteristics for each: its geographical frequency distribution, the geographical distribution of its mean microsatellite variance to indicate its likely expansion source and its TMRCA to suggest the age of expansion. Frequency and variance are represented on maps (Figure 3 and Supplementary Figure 2), and are summarised in Table 2 and Figure 4, which also gives TMRCA estimates. Note that the dating method used, BATWING, provides estimates with large confidence intervals.
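The use of variance to suggest an expansion source can be illustrated as follows. This is a sketch under stated assumptions: it takes the mean microsatellite variance of a population to be the per-locus variance of repeat counts averaged over loci, uses the population (not sample) variance, and works on invented haplotypes. The names `mean_microsatellite_variance`, `pop_x` and `pop_y` are hypothetical, and the authors' exact estimator may differ.

```python
from statistics import pvariance


def mean_microsatellite_variance(haplotypes):
    """Average, over loci, of the variance in repeat count within one population."""
    loci = list(zip(*haplotypes))  # transpose: one tuple of repeat counts per locus
    return sum(pvariance(locus) for locus in loci) / len(loci)


core = (16, 13, 16, 25, 10, 11, 13, 10)  # DC1 core haplotype (Table 1)

# HYPOTHETICAL populations carrying chromosomes of one DC.
populations = {
    # Only the core haplotype is present: no diversity has accumulated.
    "pop_x": [core, core, core, core],
    # Two chromosomes differ from the core by one step at DYS390.
    "pop_y": [
        core,
        (16, 13, 16, 24, 10, 11, 13, 10),
        (16, 13, 16, 26, 10, 11, 13, 10),
        core,
    ],
}

variances = {name: mean_microsatellite_variance(h) for name, h in populations.items()}
likely_source = max(variances, key=variances.get)
print(variances, likely_source)
```

The reasoning is that mutations accumulate with time, so the population holding the most diverse set of chromosomes within a DC has probably carried it longest and is a candidate origin.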

Fig 3 Geographical distribution of frequency and microsatellite variance for two DCs. The examples here are for DC1 and DC12 (see Supplementary Figure 2 for the remainder). Open circles indicate populations lacking the DC and filled circles indicate populations possessing it. Increasing frequency is indicated by darkening colour as indicated in the scale to the top right. The population with maximum microsatellite variance for the DC is indicated by a coloured circle and the direction of the expansion based on DC variance is indicated by arrows.

Fig 4 Schematic synthesis of DC features illustrated geographically. Black open circles indicate potential location origins and arrows indicate directionality. Mean TMRCA is given for each DC.

To test whether individuals included in these 10 expansions share any particular cultural features, such as language or subsistence methods, as is the case for previously published examples,5, 7 we performed a co-inertia test (Figure 6). This revealed the preferential association of DCs 12 and 14 with Sino-Tibetan and Austro-Asiatic languages and a ‘multiple’ mode of subsistence, of DCs 2 and 5 with Indo-European languages and ‘agricultural’ subsistence, and of DCs 1, 8 and 10 with Altaic languages and pastoral nomadism (Figure 6). Genetic and cultural patterns (mode of subsistence and language) are significantly correlated (Rv=0.47, P-value <0.0001, N=95), suggesting that several expansions share the same cultural characteristics.

Table 2. TMRCA estimates for the 11 DCs and summary of geographical distributions and likely origins

| DC | Core ht (n) | n in DC | Max steps from core | hg | TMRCA mean/y (95% CI) | Growth rate per generation (alpha) | Origin | Max variance (pop) | Period (a) | Cultural period |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 (b) | 71 | 142 | 3 | C3 (xC3c) | 951 (212–3826) | 0.344 | Silk Road (NE China/Mongolia) | 0.131 (Inner Mongolian) | H: 1060 CE | Khitan Empire-Liao Dynasty / Mongol Empire |
| 2 | 47 | 271 | 6 | R1a1 | 3284 (222–22922) | 0.129 | N. India/Tibet/Central Asia | 0.321 (Uyghur) | P: 1300 BCE | Indus Valley Civilisation (mid-late Bronze Age) |
| 3 | 43 | 95 | 4 | L | 929 (129–5219) | 0.386 | Fertile Crescent | 0.154 (Iran) | H: 1100 CE | Islamic Golden Age; Pre Islam/Post Islam |
| 4 | 41 | 109 | 3 | O2b | 1721 (184–10054) | 0.230 | Korea | 0.179 (Korean) | H: 300 CE | Proto-Three Kingdoms / Several States Period |
| 5 | 32 | 505 | 6 | J | 2730 (576–10373) | 0.111 | Fertile Crescent | 0.315 (Anatolia NE) | P: 700 BCE | Neo-Assyrian, Neo-Babylonian Empire (late Bronze Age) |
| 6 | 30 | 211 | 4 | O3a3c | 2887 (208–20202) | 0.152 | Tibet | 0.279 (Tibet) | P: 900 BCE | Vedic Period – Iron Age |
| 8 (c) | 28 | 59 | 3 | C3/C3c | 1325 (57–14756) | 0.373 | Silk Road (NE China/Mongolia) | 0.104 (Oroqen) | H: 700 CE | Rouran / Uyghur Khaganate / Khitan Empire-Liao Dynasty |
| 10 | 24 | 26 | 3 | N | 1158 (116–8353) | 0.323 | Silk Road (NE China/Mongolia) | 0.058 (Hezhe) | H: 850 CE | Rouran / Uyghur Khaganate / Khitan Empire-Liao Dynasty |
| 11 | 24 | 95 | 4 | H1a | 2296 (73–24529) | 0.255 | India/Central Asia | 0.111 (Muria) | P: 300 BCE | Maurya Empire |
| 12 | 24 | 152 | 6 | O2a | 3540 (369–19967) | 0.117 | Laos | 0.243 (So) | P: 1500 BCE | Early-mid Bronze Age |
| 14 | 23 | 335 | 6 | O2a | 4125 (326–29437) | 0.09 | Laos | 0.324 (Alak) | P: 2100 BCE | Early-mid Bronze Age |

a H, historical; P, protohistorical; n, number of chromosomes.

b ‘Khan’ haplotype cluster.

c ‘Giocangga’ haplotype cluster.
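The relation between the TMRCA column (years before present) and the Period column (calendar date) in Table 2 can be reproduced with simple arithmetic. The sketch below assumes a reference year of 2011, which is an inference from the DC1 row (951 years corresponds to 1060 CE) rather than something stated in the text, and it ignores the absence of a year zero. The names are hypothetical; the TMRCA means and period dates are copied from Table 2, with BCE dates written as negative numbers.

```python
REFERENCE_YEAR = 2011  # ASSUMPTION: inferred from DC1 (951 years <-> 1060 CE)

# (TMRCA mean in years, period date from Table 2; negative = BCE)
TABLE2 = {
    "DC1": (951, 1060),
    "DC2": (3284, -1300),
    "DC3": (929, 1100),
    "DC4": (1721, 300),
    "DC5": (2730, -700),
    "DC6": (2887, -900),
    "DC8": (1325, 700),
    "DC10": (1158, 850),
    "DC11": (2296, -300),
    "DC12": (3540, -1500),
    "DC14": (4125, -2100),
}


def tmrca_to_calendar_year(tmrca_years, reference_year=REFERENCE_YEAR):
    """Convert an age in years before the reference year to a calendar year."""
    return reference_year - tmrca_years


def format_year(year):
    """Render a signed year as 'N CE' or 'N BCE' (no year-zero correction)."""
    return f"{-year} BCE" if year < 0 else f"{year} CE"


for dc, (tmrca, table_date) in TABLE2.items():
    computed = tmrca_to_calendar_year(tmrca)
    print(dc, format_year(computed), "vs Table 2:", format_year(table_date))
```

Under this assumption every computed date falls within about 30 years of the rounded date given in the Period column. Because the 95% confidence intervals span thousands of years, these point dates are best read as rough indications only.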

Associating protohistorical and historical expansions with subsistence mode and language

Six of the 10 DCs analysed originated during the Bronze Age or earlier (2100–300 BCE), and correspond to 30% of the individuals in our data set. The ‘origins’ of these expansions lie within three main regions: South East Asia, including Laos (DCs 12 and 14), Tibet/East India (DCs 6 and 11) and Central Asia/Fertile Crescent (DC2 and DC5). The co-inertia analysis shows that these expansions are found mainly among agricultural populations, speaking Indo-European and Austro-Asiatic languages. DCs 5, 12 and 14 have their putative sources centred on the Fertile Crescent and South East Asia, places known as centres of agricultural innovation;38 these DCs could, then, be by-products of the civilisations that underwent a shift to agriculture during the Bronze Age (Figure 4). The origin of DC2 is located in Central Asia and its TMRCA estimate is 1300 BCE; this is consistent with the ancient DNA Y-chromosomes from the Middle Bronze Age (Andronovo) detected in DC2.17

Four expansions date to the historical period (700–1100 CE). The most recent (DC3; 1100 CE), originating in the Near East and extending to the South East Indian coast, is mainly detected in agricultural populations. It could be linked to the rapid expansion of Muslim power from the Middle East, across Central Asia and to the borders of China and India, after the establishment of a unified polity in the Arabian Peninsula by Muhammad in the 7th century and under the subsequent Caliphates.

The three other expansions (DCs 1, 8 and 10; 700–1060 CE) encompass 79% of individuals involved in historical expansions and are almost exclusively detected in Altaic-speaking pastoral nomadic populations. The territory is large and follows the Silk Road corridor. The signal of expansion spreads from East to West (from Mongolia to the Caspian Sea), as DC1 has its source in Inner Mongolia (hgC3[xC3c]), DC8 in the Oroqen (hgC3c) and DC10 in the Hezhe (hgN1). Given their different sources and associated haplogroups, these DCs are likely to represent three distinct expansions. These expansions show high variance values not only at their sources but also in other locations (Supplementary Table 3); this could be explained by a rapid migration of founders and parallel transmissions to generate diversity in these different places, all descending from a unique ancestor.

The two previously recognised Asian star-clusters (‘Khan’ and ‘Giocangga’5, 7) are identified here, and correspond to DCs 1 and 8, respectively. Their TMRCAs are estimated at 1060 CE for DC1 (Table 2), almost identical to the published estimate of ~1000 YA,5 and at 700 CE for DC8, older than the published estimate of 590±340 YA.7 The ‘Khan’ DC remains the most striking signal of an Asian expansion lineage, representing 2.7% of the entire data set, and the highest number of identical Y-chromosomes (N=71). Interestingly, the westward directions of expansions DC8 and DC10, their potential sources in northeast China, their geographic extents from China to Karakalpakia, and also the Altaic-speaking populations associated with them, could also indicate involvement of the Imperial or elite lineages associated with the Khitan Empire. Abaoji, Emperor Taizu of Liao and the Great Khan of the Khitans (died 926 CE), who officially designated his eldest son as his successor, maintained a pattern of seasonal movements typical of pastoral societies, which could also explain the geographic distribution of these lineages.

Discussion

Highly represented Y-lineages as a proxy for high reproductive success

Reproductive success, widely used as a measure of fitness, is the genetic contribution of one individual to future generations; it implies that offspring are plentiful and that they survive. In humans, offspring number and survival can be strongly influenced by culture.4 Examples of long-term high reproductive success have been revealed by unusually frequent Asian Y-chromosome haplotypes suggested to descend from two individuals, Genghis Khan5 and Giocangga,7 and to have spread in past generations via social selection (because of associated power and prestige) during historical times. In this study, we used a similar approach, analysing 5321 Y-microsatellite haplotypes belonging to 127 Asian populations to ask whether other examples of continued Y-lineage transmission are observed.

A favourable post-Bronze Age socio-political context for successful Y-lineage transmissions

We have detected 11 highly represented Y-chromosome lineages among the 5321 males analysed; altogether the DCs derived from 11 founding chromosomes represent a large proportion (38%) of the Y-chromosomes analysed, attesting to the importance of efficient Y-lineage transmission in the history of the continent. The presence of DCs with TMRCAs from 2100 BCE to 1100 CE suggests that the socio-cultural context, from the Bronze Age up to more recent historical periods, has been sufficiently favourable to ensure a continuous transmission of these different Y-lineages. Indeed, several admixture events have deeply affected the Asian gene pool during this period, involving migrant groups coming from North East and South East Asia.39 It has been argued40 that the development and impact of complex polities beginning with the Bronze Age ~2500–100 BCE in Central Asia were responsible for drastic changes in population structure, movements and organisation. These polities and early states emerged from local traditions of mobility and multiresource pastoralism, including agriculture and distributed hierarchy and administration,41, 42 with a royal or imperial form of leadership combining elements of kinship and political office.42 Archaeological evidence from the late Bronze and early Iron Ages (1400–400 BCE) confirms the existence of control hierarchies associated with wealth differences and socio-political power, even before the emergence of political entities.42 The development of these polities was accompanied by the emergence of stratified societies, recognised elite lineages and differences in power and status between individuals.

High reproductive success is often associated with high social status, ‘prestigious’ men having higher intramarital fertility, lower offspring mortality9 and access to a greater than average number of wives.43 This suggests that the link between status and fitness detected in modern populations (eg, Nettle and Pollet,44 Snyder et al45 and Heyer et al46) also existed in ancient populations. For the TMRCA period from 2100 BCE to 1100 CE, the DCs are equally distributed between sedentary agriculturalist (36.8%) and pastoral nomadic populations (36.7%), indicating that uninterrupted transmission of Y-lineages over many generations could occur in both groups. Cultural transmission of fertility, reproductive success and prestige, seen in contemporaneous populations,47 could also explain the persistence of these frequent Y-lineages over time. The high variance of DC proportions among populations indicates that the ‘memes’ responsible for cultural transmission can vary greatly among populations, even when populations share similar cultural characteristics.

A shift from sedentary agriculturalist to pastoral nomadic populations for successful Y-lineages

Our study highlights a difference in population composition between protohistorical and historical periods. In protohistorical times, DCs are detected mainly in agriculturalist but also in multiresource and pastoralist populations, whereas in historical times they are predominantly seen in pastoralist populations. This shift towards almost exclusively pastoralist populations suggests a modification of social organisation in Asia and a higher probability of generating reproductively successful lineages in pastoral-derived societies and states. The development of complex polities began with the Bronze Age in Central Asia ~2500–100 BCE.40 Agriculture arrived ~6000 BCE (Djeitun, Turkmenistan48, 49) and dispersed during the Bronze Age (3000–2000 BCE) in coexistence with hunter–gatherer communities. The pastoral nomadic lifestyle, linked to horse domestication,50 emerged in Central Asia ~5000 BCE and gained importance during the late Bronze Age. The pastoral nomadic tribes eventually came to dominate the Ponto–Caspian steppes during the first millennium BCE and entered historical records as Scythians (Herodotus) or Sakas (Persian accounts).51

The over-representation of historical-period DCs in pastoral nomadic Altaic-speaking populations is compatible with the development of new forms of political and social organisation in pastoralist economies. Among present-day pastoral nomads, patrilineal descent rules lead to a cultural transmission of reproductive success (Heyer et al, submitted) in male lineages.

New social systems and economic adaptations emerged after horse domestication. Horse-riding greatly enhanced both east–west connections and north–south trade between Siberia and southerly regions, and allowed new techniques of warfare, a key element explaining the successes of mobile pastoralists in their conflicts with more sedentary societies.41 A series of expansive polities emerged in Inner Asia by 200 BCE: 15 steppe polities beginning with Xiongnu (Khunnu) ~200 BCE and concluding with the Zunghars in the mid-18th century have been described, including the Mongol and the Qing empires/dynasties.42, 52 The DCs detected along the Silk Road corridor originated from 700 to 1060 CE and may be associated with dynasties that coexisted with five major empires that dominated the Ponto–Caspian steppes from the 7th to the 13th centuries: the Khitan (Great Liao), Tangut Xia, Jurchin, Kara-Khitan and the Mongol Empires.42 In order to identify the prestigious DC founders among potential candidates, the expansion strategies of these different dynasties, and also the mechanisms of co-inheritance of Y-lineages and prestige via marriage rules should be carefully investigated. As an example, sororal polygyny (in which a man can marry two or more sisters) followed by a shift to the Han Chinese system of taking one wife and one or more concubines, as occurred among the Liao elite throughout the length of the Liao dynasty, are likely to promote the efficient transmission of prestigious lineages among males.

Further investigations including ancient DNA approaches, and next-generation sequencing of modern expansion lineage Y-chromosomes could be undertaken to refine the history of the expansion lineages we have observed, and perhaps to identify the prestigious and powerful pastoralist founders associated with them.

***End of article***

Read also:

Genghis Khan Not the Only Genes in Town – Genetic Founding Fathers of Asia were Mystery Men. Ancient Origins, 29 January 2015, by Liz Leafloor