Ancient corridors via Tharus (Nepal) for migrations from East Asia

Out of India through Tharus to Southeast Asia and … to Japan

Fornarino, Simon et al., Mitochondrial and Y-chromosome diversity of the Tharus (Nepal): a reservoir of genetic variation, BMC Evolutionary Biology 2009, 9:154, doi:10.1186/1471-2148-9-154:

Background: Central Asia and the Indian subcontinent represent an area considered as a source and a reservoir for human genetic diversity, with many markers taking root here, most of which are the ancestral state of eastern and western haplogroups, while others are local. Between these two regions, Terai (Nepal) is a pivotal passageway allowing, in different times, multiple population interactions, although because of its highly malarial environment, it was scarcely inhabited until a few decades ago, when malaria was eradicated. One of the oldest and the largest indigenous people of Terai is represented by the malaria resistant Tharus, whose gene pool could still retain traces of ancient complex interactions. Until now, however, investigations on their genetic structure have been scarce mainly identifying East Asian signatures.

Results
High-resolution analyses of mitochondrial-DNA (including 34 complete sequences) and Y-chromosome (67 SNPs and 12 STRs) variations carried out in 173 Tharus (two groups from Central and one from Eastern Terai), and 104 Indians (Hindus from Terai and New Delhi and tribals from Andhra Pradesh) allowed the identification of three principal components: East Asian, West Eurasian and Indian, the last including both local and inter-regional sub-components, at least for the Y chromosome.

Conclusion
Although remarkable quantitative and qualitative differences appear among the various population groups and also between sexes within the same group, many mitochondrial-DNA and Y-chromosome lineages are shared or derived from ancient Indian haplogroups, thus revealing a deep shared ancestry between Tharus and Indians. Interestingly, the local Y-chromosome Indian component observed in the Andhra-Pradesh tribals is present in all Tharu groups, whereas the inter-regional component strongly prevails in the two Hindu samples and other Nepalese populations.

The complete sequencing of mtDNAs from unresolved haplogroups also provided informative markers that greatly improved the mtDNA phylogeny and allowed the identification of ancient relationships between Tharus and Malaysia, the Andaman Islands and Japan as well as between India and North and East Africa…

“Of particular interest is the link emerging between Tharus and tribals from Andhra Pradesh, as well illustrated by the Y-chromosome PCA plots (Figure 8) and by the high prevalence in these two populations of the local Y-chromosome haplogroup component (Figure 9), in comparison to the Hindus and to the other populations of Nepal [37] where the inter-regional component is clearly predominant. This further supports a deep common ancestry between Tharus and Indians, probably due to the legacy of the first settlers who arrived from the Indian coasts during the out-of-Africa dispersal.

The links between the Central Tharus and the Andaman Islanders through Northeast India (Hg M31), between the Eastern Tharus and Japan (Hg R30) and between Central Tharus and Malaysia (Hg M21), are ancient.”

“The East Asian component made up by haplogroups C(xC5), D, N, O3, Q, and K*, and mainly represented by Hg O3, is, on the whole, much more frequent among Tharus (39.8%) than among Indians (7.7%). The high Tharu frequency, mostly accounted for by the subgroup O3-M117 (83.8%), shows a wide range in the three groups with significant differences between Th-CI vs both Th-CII (P < 0.02) and Th-E (P = 0.001). Among the less represented East Asian markers of interest is Hg D that is very frequent in Tibet, absent in other Nepalese populations [37] but present in six Central Tharus: as D1-M15 in two Th-CI subjects and as D*-M174 in four Th-CII subjects. The latter, by showing the DYS392 -7 repeat allele that characterizes the D3-P47 chromosomes [37], could belong to the recently identified Hg D3* [73]. In addition, two other haplogroups were encountered: K-M9* in a single Eastern Tharus and Q1-P36 in two Tharus-CII. Hg Q, which is present in Tibetans, was seen in only one sample from Kathmandu [37]. In Indians, the very scarce East Asian component was represented by three Hg O3 (each belonging to a different sub-haplogroup and to a different Indian sample), one C3-M217 in Terai (previously observed only in a few Kathmandu and Tibetan samples [37]), two N1-LLY22g*, one in Terai and one in New Delhi and by three Q1-P36 in New Delhi. Only three East Asian haplogroups, Q1-P36, O3-M134* and O3-M117, are shared between Tharus and Indians.

The Indian subcontinent component includes lineages of haplogroups C, F, H, L, O, R and among Indians it ranges from 80% in the New Delhi sample to 85% in Terai, and to 90% in the Andhra Pradesh. Among Tharus, with the exception of an incidence of ~32% in the Th-CI group, it reaches values around 50% in the other two groups….”

“Of particular interest is the detection of haplogroups M21 and M31 (two subjects each) among the central Tharus. The Tharu M21 sequence (Figure 5) shares nine mutations with one of the three M21 lineages found in all Orang Asli groups of Malaysia [24] and in other groups from Southeast Asia [44], belonging to the sub-group M21b. The Tharu M31 sequence, together with one Megalaya mtDNA [31], clusters with one West Bengal Rajbhansi [21,27] and defines a sub-group of M31b. This subclade, together with M31a2 of the tribal Lodha, Lambadi and Chenchu populations, represents the Indian counterparts of the M31a1 Andaman lineages [27], further supporting a common ancestry of the Indian subcontinent and people of the Bengal Bay islands.

As for the R haplogroups, R7 and R30 are of particular interest. Very informative for the structure and for the age evaluation of haplogroup R7 is the Andhra Pradesh sequence #56 (Figure 5) that defines an extremely deep branch of the R7 in India. This branch shares with the root of the phylogeny of Chaubey et al.[54] only the mutations 13105, 16319 and, in addition, it does not display the 16260 and 16261 mutations characterizing the R7a and R7b branches observed in different R samples from Indian groups [11,52,54-57] and, interestingly, in one R7 Tutsi from Rwanda (unpublished data). Two Tharu mtDNAs, one from Chitwan and one from Eastern Terai, belong to the R30 haplogroup. The first is closely related to two Indian sequences, one from Andhra Pradesh and the other from Uttar Pradesh, and contributes to define a sub-clade of the R30a [54]. The second joins a Punjab sequence [54] with a Japanese deep lineage [22] indicating an ancient link between India and Japan. A more recent connection with Japan is, in turn, revealed by the F1d haplogroup showing a tight linkage between an Eastern Tharu sequence and two Japanese mtDNAs.

Ancient Hg O2-M95

“The T deletion further characterizes the HgO2-M95 clade that is considered a genetic footprint of the earliest Palaeolithic Austro-Asiatic settlers in the Indian subcontinent [14,71,74], and also as an autochthonous Indian Austro-Asiatic population marker…[Tdel, was first noticed in haplogroup O2-P31 while typing the P31 marker and was confirmed by sequencing. This is due to a T deletion in the 6T stretch starting at np 127, adjacent to the P31 T to C transition [63]. The T deletion, not found in the other examined Hg O derivatives, is always present in our O2 samples (all tribals; four of the Eastern Tharus and one from Andhra Pradesh). Taking into account that this haplogroup is often recognized through markers different from P31 and that in other studies, where the P31 was examined [64,65], a technique not detecting Tdel was employed, additional DHPLC/sequencing analyses of P31 chromosomes are necessary to evaluate the extent of the contemporary presence of the two mutations. It is worth noting that these samples were also all positive for the PK4 marker recently observed in four Pakistani Pathans...] The remaining endogenous haplogroups include haplogroup C5-M356, shared between Indians and Tharus (two in the Terai Hindus and one in the Tharus-CII), haplogroup F-M89* and its new derivative F5-M481, both considered as tribal markers and observed in Andhra Pradesh (10.3%).”

The Nepalese populations in Tharus “examined by Gayden et al. [37], apart from the homogeneous Tamang sample that displays almost exclusively the East Asian haplogroup O3-M134, the Newar and Kathmandu groups, like Tharus, show an important Indian component. However, whereas in the first two, the inter-regional haplogroups are most represented, in the Tharus the local ones are prevalent (Figure 9). Both quantitative and qualitative differences emerge from the East Asian component: on the whole it is most frequent and heterogeneous among Tharus, especially in the Chitwan groups which, in addition to the frequent Hg-O3-M117, show the Hgs D and Q, reflecting a Tibetan influence.

The analyses carried out on the mtDNA and Y chromosome of the Tharus, one of the oldest and the largest indigenous people of Terai, have shown a complex genetic structure within which are identifiable: i) a deep common ancestry between Tharus and Indians, not previously reported, more evident for mtDNA but also revealed by the prevalence of the local Indian Y-chromosome subcomponent, as in the tribals of Andhra Pradesh; ii) a significant East Asian genetic contribution both in the male and female gene pool; iii) a western heritage, clearly evident for the Y-chromosome; iv) a remarkable heterogeneity of the Tharu population (with the Eastern Tharus more dissimilar to the others) ascribable both to various exogenous influences and to subgroup specific lineages stemming from a shared genetic background with Indians.

Particularly informative has been the complete mtDNA sequencing that further supports a deep differentiation of mtDNA haplogroups in the Indian subcontinent, indicating that some branches are geographically or socially specific, while others are widespread. The improvement in the mtDNA phylogeny has also allowed the identification of ancient relationships between Tharus, not only with the Indian subcontinent area, including Pakistan, but also with the Andaman Islands, Malaysia, and Japan, as well as between India and North and East Africa

…

Hua-Wei Wang et al., Revisiting the role of the Himalayas in peopling Nepal: insights from mitochondrial genomes, Journal of Human Genetics (2012) 57, 228–234; doi:10.1038/jhg.2012.8; published online 22 March 2012

Abstract

The Himalayas were believed to be a formidable geographical barrier between South and East Asia. The observed high frequency of the East Eurasian paternal lineages in Nepal led some researchers to suggest that these lineages were introduced into Nepal from Tibet directly; however, it is also possible that the East Eurasian genetic components might trace their origins to northeast India, where abundant East Eurasian maternal lineages have been detected. To trace the origin of the Nepalese maternal genetic components, especially those of East Eurasian ancestry, and then to better understand the role of the Himalayas in peopling Nepal, we have studied the maternal genetic composition extensively, especially the East Eurasian lineages, in Nepalese and its surrounding populations. Our results revealed the closer affinity between the Nepalese and the Tibetans; specifically, the Nepalese lineages of East Eurasian ancestry generally are phylogenetically closer to the ones from Tibet, albeit a few mitochondrial DNA haplotypes, likely resulting from recent gene flow, were shared between the Nepalese and northeast Indians. It seems that Tibet was most likely to be the homeland for most of the East Eurasian components in the Nepalese. Taking into account the previous observation on the Y chromosome, it is now convincing that bearers of the East Eurasian genetic components had entered Nepal across the Himalayas around 6 thousand years ago (kya), a scenario in good agreement with the previous results from linguistics and archaeology.

…

Chuan-Chao Wang et al., Genetic Structure of Qiangic Populations Residing in the Western Sichuan Corridor

Abstract

The Qiangic languages in western Sichuan (WSC) are believed to be the oldest branch of the Sino-Tibetan linguistic family, and therefore, all Sino-Tibetan populations might have originated in WSC. However, very few genetic investigations have been done on Qiangic populations and no genetic evidences for the origin of Sino-Tibetan populations have been provided. By using the informative Y chromosome and mitochondrial DNA (mtDNA) markers, we analyzed the genetic structure of Qiangic populations. Our results revealed a predominantly Northern Asian-specific component in Qiangic populations, especially in maternal lineages. The Qiangic populations are an admixture of the northward migrations of East Asian initial settlers with Y chromosome haplogroup D (D1-M15 and the later originated D3a-P47) in the late Paleolithic age, and the southward Di-Qiang people with dominant haplogroup O3a2c1*-M134 and O3a2c1a-M117 in the Neolithic Age

According to the nomenclature of Y Chromosome Consortium (YCC) [16], [20], 23 SNP haplogroups were determined from the 127 male individual samples (Figure 2a, Table S1, and Table S2). Haplogroup D1-M15 and its subhaplogroups, which are widely distributed across East Asia including most of the Tibeto-Burman, Tai-Kadai and Hmong-Mien speaking populations [4], [14], [75] (Figure S1 in Doc S1), are also prevalent in the four studied populations (44.44% and 12.50% in Horpa-Danba and Horpa-Daofu, respectively; 8.70% in Tibetan-Xinlong and 6.38% in Tibetan-Yajiang). Haplogroup D3a-P47 is almost exclusively distributed in Tibeto-Burman populations [4], [14], [75] (Figure S1 in Doc S1) and is also found at high frequency in Horpa-Daofu, Tibetan-Xinlong and Tibetan-Yajiang, but is absent in Horpa-Danba. Haplogroup O1a1-P203, which occurs at high frequencies in Tai-Kadai speaking people along the southeast coast of China and Taiwan aborigines [16], [75], is also observed at a high frequency in Yajiang (21.28%) and moderate frequencies in Daofu and Xinlong (6.25% and 8.70%, respectively), but is absent in Danba. The major lineages in the Indo-China Peninsula, O2a1-M95 and its subhaplogroups, are also found at moderate or relatively low levels in the four studied populations. Haplogroup O3-M122 is the most common haplogroup in China and prevalent throughout East and Southeast Asia, comprising roughly 25–37% of the studied Qiangic populations. O3a1c-002611, O3a2c1-M134, and O3a2c1a-M117 are three main subclades of O3, each accounting for 12–17% of the Han Chinese [16], [75]. However, their frequencies vary a lot in Qiangic populations. O3a1c-002611 comprises 15.22% of Xinlong Tibetans, but is absent in the three other populations. O3a2c1*-M134 accounts for about 6% of the Horpa-Danba and Tibetans of Xinlong and Yajiang, but is absent in Horpa-Daofu.
Haplogroup O3a2c1a-M117, which exhibits high frequencies in other Tibeto-Burman populations, is also observed at high frequencies in Horpa-Danba and Tibetan-Yajiang (22.22% and 19.15%, respectively), and moderate frequencies in Horpa-Daofu and Tibetan-Xinlong (12.50% and 10.87%, respectively). Haplogroup C-M130 has a very wide distribution and might represent one of the earliest settlements in East Asia. Haplogroup C* (M130+, M105−, M38−, M217−, M347−, and M356−) has been found at low frequencies along the southern coast of mainland East Asia as well as throughout the islands of Southeast Asia [75], [76]. In spite of the wide distribution of C*, they all have similar STR haplotypes (DYS19, 15; DYS389I, 12; DYS389b, 16; DYS390, 21; DYS391, 10; DYS392, 11; DYS393). There are two C* individuals detected in this study, one in Horpa-Danba and the other in Tibetan-Xinlong. Those two individuals also have the same STR haplotype as mentioned above. Haplogroup C3-M217 is the most widespread subclade of C-M130, and reaches the highest frequencies among the populations of northern East Asia, especially in Mongolians [75]–[77]. Haplogroup C3-M217 has also been found in Tibetan-Yajiang at a frequency of 10.64%, but is totally absent in the other three populations. Haplogroup N-M231 has both a unique and widespread distribution throughout northern Eurasia and reaches its highest frequency among most of the Uralic populations as well as some Altaic populations. Haplogroup N1c1a-M178 is the most common subclade of N-M231 and is thought to have originated in China [75], [78]. N1c1a-M178 has also been detected in Horpa-Daofu and Tibetan-Xinlong at 12.50% and 2.17%, respectively. The 17-STR haplotype of N1c1a individuals in Horpa-Daofu is exactly the same as that of some Komi people in Russia [79], [80]. However, the haplotype of the N1c1a individual in Xinlong shows more similarity with samples of its surrounding populations (unpublished data).
It is particularly noteworthy that Central-South Asia related haplogroups J-M304 and R2-M124 [81] have also been detected at low frequencies in Qiangic populations.

PCA and STR genetic distance analysis

The paternal genetic relationships among Qiangic, Tibeto-Burman, and other East Asian populations were discerned with the aid of additional published Y chromosome datasets. We used a PCA based on the distribution of Y chromosome haplogroup frequencies of 51 populations to show the overall clustering pattern (Figure 3a, Table S3). Results of PCA are presented by the plots of the first two principal components (PCs), which together account for 31.31% of the Y chromosome variation in these populations. The first PC revealed a clear north-south geographic division between Altaic and Sino-Tibetan, Tai-Kadai & Hmong-Mien. Haplogroups C3-M217, G-M201, J-P209, and R-M207 were found to contribute most to the northern pole of Altaic. Haplogroup O-M175 contributed most to the southern pole. Sino-Tibetan, Tai-Kadai and Hmong-Mien populations showed different distributions of the second PC. Horpa-Danba, Horpa-Daofu, Tibetan-Xinlong, and Tibetan-Yajiang were clustered within the Sino-Tibetan group, which reflected a clear linguistic clustering pattern. Haplogroups O3a1c-002611, O3a2c1*-M134, and O3a2c1a-M117 contributed most to the Sino-Tibetan pole. In contrast, haplogroups O3a2b*-M7 and O2a1-M95 were concentrated at the Tai-Kadai and Hmong-Mien pole. The four western Sichuan populations clustered tightly together with other Tibeto-Burman populations, such as Qiang, Tibetan-Yunnan, Yi, and Tujia, mostly due to high frequencies of haplogroups D3a-P47, O3a2c1a-M117, D1-M15, and O3a2c1*-M134. In the STR genetic distance based neighbor-joining tree, Horpa-Daofu, Tibetan-Yajiang, and Tibetan-Xinlong also clustered tightly with Tibeto-Burman populations. However, Horpa-Danba was closely related to Han and Hmong-Mien populations (Figure S2 in Doc S1). As PCA was performed from frequencies of haplogroups and genetic distance was obtained from only 6 STR markers (Table S4), the results are suggestive but not conclusive.
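
As a rough illustration of the kind of analysis described above, the following sketch runs a PCA on a small populations-by-haplogroups frequency matrix. The population labels and frequencies are invented for illustration and are not taken from Table S3; the function name pca_scores is hypothetical, and the authors' actual software and preprocessing are not stated in the excerpt.

```python
import numpy as np

def pca_scores(freq, n_components=2):
    """PCA of a populations x haplogroups frequency matrix via SVD.

    Returns the PC scores of each population and the fraction of total
    variance explained by each retained component.
    """
    X = np.asarray(freq, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each haplogroup column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / (S ** 2).sum()
    return scores, explained[:n_components]

# Hypothetical data: rows are populations, columns are haplogroup frequencies.
populations = ["PopA", "PopB", "PopC", "PopD"]
haplogroups = ["D1-M15", "D3a-P47", "O3a2c1a-M117", "C3-M217"]
freq = [
    [0.40, 0.05, 0.25, 0.00],
    [0.10, 0.30, 0.15, 0.00],
    [0.05, 0.25, 0.20, 0.10],
    [0.00, 0.00, 0.10, 0.45],
]

scores, explained = pca_scores(freq)
for name, (pc1, pc2) in zip(populations, scores):
    print(f"{name}: PC1={pc1:+.3f} PC2={pc2:+.3f}")
print("variance explained by PC1, PC2:", explained.round(3))
```

Populations with similar haplogroup profiles end up close together on the PC1/PC2 plot, which is how the north-south and linguistic clusters in the text would be read.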

Network analysis and time estimation

To discern the detailed relationship between the D3a-P47, O3a2c1a-M117, D1-M15, and O3a2c1*-M134 haplogroups in Tibeto-Burman and other related populations, a median-joining network was constructed based on Y-STR haplotypes of those haplogroups (Figure 4). A clear Sino-Tibetan vs. Tai-Kadai and Hmong-Mien divergence can be inferred from the network of D1-M15, though sporadic haplotype sharing exists. Furthermore, within the Sino-Tibetan populations, haplogroup D1-M15 contains distinct STR haplotypes between Qiangic populations, Northern Han, and Tibetan-Tibet, implying that D1-M15 experienced a series of founder effects or strong bottlenecks and a secondary expansion in Sino-Tibetan populations. In the network of D3a-P47, the divergence between Qiang and Tibetan with other Tibeto-Burman populations has been observed. Other Tibeto-Burman populations only have a subset of the Qiang and Tibetan haplotypes. The star-like network of D3a-P47 also suggests population expansion in Tibetans. The network of O3a2c1*-M134 shows a clear divergence between Tibetan and northern populations (Northern Han and Altaic). Southern Han and Tai-Kadai samples constitute the center of the network and act as a bridge connecting Tibetan and northern populations, which supports the southern origin and northern expansion of O3a2c1*-M134. Most of the Qiangic samples belonging to haplogroup O3a2c1*-M134 share haplotypes with northern populations, indicating a recent gene flow from northern populations to Qiangic populations. A population expansion has also been observed in the star-like network of haplogroup O3a2c1a-M117. However, the haplotypes of O3a2c1a-M117 are extensively shared among all the East Asian populations.

To get more insight into the origin of the East Eurasian maternal components observed in the Nepalese and therefore test the two competing scenarios about how these components had been introduced into Nepal, we focused on the phylogenetic affinity between these components and those from the Tibetan, northeast Indian and northwest Indian populations. Fig 3 illustrates the principal component analysis plot of the 43 populations under study, which was constructed based merely on the East Eurasian lineages. Among the five Nepalese populations under study, three clustered with the Tibetans (Fig 3). After we considered all the Nepalese regional populations as a whole and calculated its Fst value with the populations from its neighboring regions, the smallest genetic distance was observed between the Nepalese and the Tibetans … from the nodes occupied almost exclusively by the Tibetan lineages and only a few haplotypes are shared sporadically between the Nepalese and the northern Indians. Taken together, the Nepalese lineages of East Eurasian ancestry generally show closer affinity with the ones from Tibet, albeit a few mtDNA haplotypes, likely resulting from recent gene flow, were shared between the Nepalese and northern (including northeast) Indians (Figures 4 and 5).
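
The Fst comparison mentioned above can be illustrated with a minimal sketch. It assumes the simplest haplotype-frequency form of the statistic, Fst = (Ht - Hs) / Ht with equal weighting of the two populations; the excerpt does not say which estimator or software the authors used, and all population names and frequencies below are invented.

```python
def gene_diversity(freqs):
    """Expected haplotype diversity H = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())

def pairwise_fst(pop1, pop2):
    """Fst = (Ht - Hs) / Ht for two populations given as {haplotype: frequency}.

    Assumes equal weighting of both populations (a simplification).
    """
    hs = (gene_diversity(pop1) + gene_diversity(pop2)) / 2      # mean within-population diversity
    pooled = {h: (pop1.get(h, 0.0) + pop2.get(h, 0.0)) / 2
              for h in set(pop1) | set(pop2)}
    ht = gene_diversity(pooled)                                  # diversity of the pooled population
    return 0.0 if ht == 0 else (ht - hs) / ht

# Hypothetical haplotype frequencies, for illustration only.
pop_x = {"h1": 0.50, "h2": 0.30, "h3": 0.20}
pop_y = {"h1": 0.45, "h2": 0.35, "h3": 0.20}   # shares most haplotypes with pop_x
pop_z = {"h1": 0.10, "h4": 0.60, "h5": 0.30}   # shares little with pop_x

fst_xy = pairwise_fst(pop_x, pop_y)
fst_xz = pairwise_fst(pop_x, pop_z)
print(f"Fst(x, y) = {fst_xy:.4f}")
print(f"Fst(x, z) = {fst_xz:.4f}")
```

A smaller value means the two populations draw on more similar haplotype pools, which is the sense in which the smallest distance is reported between the Nepalese and the Tibetans.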

Even though we focused on the East Eurasian lineages identified in the Nepalese populations, we did observe a number of Nepalese-specific haplotypes, strongly suggesting their rather ancient origin and most plausibly de novo differentiation in Nepal. To get some hint at the arrival time of the lineages, we have focused on two clades from haplogroups G2a and M9a1a2, simply because both clades contain the Nepalese haplotypes at their terminal branch or basal node and likely have differentiated in Nepal; estimating their ages would then help to date the arrival time of the migration from Tibet. In fact, time estimation results revealed that haplogroups G2a2 and M9a1a2a have very similar ages of ~5.7 kya, and this age becomes a little older (~6 kya) when the calibration rate proposed by Forster et al.44 was used. To this end, the very similar ages of both haplogroups, which likely had differentiated in situ in Nepal, strongly suggest that the bearers of these East Eurasian maternal components would have arrived in Nepal no later than 5.7 kya (Table 4). In retrospect, previous work has suggested that the maternal genetic components from northern East Eurasia were introduced into Tibet around 8.2 kya,1 and our time estimation results fit this dating frame very well. It is then conceivable that the settlement of Nepal by the bearers of the East Eurasian genetic components likely occurred before 5.7 kya, a result in good agreement with the archeological findings reporting shared Neolithic features between Nepal and Tibet (references therein).49
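
Clade ages of this kind are commonly obtained with the rho statistic: the average number of mutations separating sampled lineages from the clade's root haplotype, multiplied by a calibrated number of years per mutation. The sketch below is only an illustration of that idea. The mutation counts are invented, the default rate (one HVS-I transition per 20,180 years, the widely cited Forster et al. 1996 calibration) is an assumption rather than the rate used for the estimates above, and the standard error shown holds only for a perfectly star-like tree.

```python
import math

def rho_age(steps_from_root, years_per_mutation=20180):
    """Estimate a clade age from mutation counts to the root haplotype.

    steps_from_root: number of mutations separating each sampled lineage
    from the inferred root. years_per_mutation is an assumed calibration.
    Returns (rho, age in years, standard error of the age in years).
    """
    n = len(steps_from_root)
    rho = sum(steps_from_root) / n
    sigma = math.sqrt(rho / n)       # star-like tree assumption only
    return rho, rho * years_per_mutation, sigma * years_per_mutation

# Hypothetical clade: ten sampled lineages, three of them one step from the root.
steps = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
rho, age, se = rho_age(steps)
print(f"rho = {rho:.2f}, age = {age / 1000:.1f} kya (SE {se / 1000:.1f} kya)")
```

Because the age scales linearly with the calibration rate, switching rates shifts the estimate proportionally, which is why the text reports a slightly older age under a different calibration.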

Previous studies have observed substantial East Eurasian genetic components in the Nepalese populations;4,5 however, it remains controversial whether the East Eurasian lineages have been introduced into Nepal from Tibet directly (across the Himalayas)4,6 or via northeast India.5,8,50 By extensively analyzing the mtDNA variation in Nepalese, Tibetan and northern Indian populations, our observations, based on the principal component analysis, Fst and admixture estimation, revealed the closer genetic affinity between the Nepalese and the Tibetans, and this result was further substantiated by the median networks (Figures 4 and 5), in which most of the Nepalese mtDNAs prevalent among northern Asian populations shared the haplotypes with the Tibetans at root level or branched off directly from the nodes consisting almost exclusively of the Tibetan lineages. Our results strongly suggest that most of the East Eurasian maternal components identified in the Nepalese were introduced directly from Tibet,4,6 and the time estimation results further date this peopling scenario to about 6 kya. Indeed, this inference seems to be in striking accordance with the historically recorded passes (such as the Kodari and Rasuwa Passes), which have bridged the Nepalese and the Tibetans since ancient times.3 However, the observed gene flow from northeast India suggests genetic contribution, albeit limited, from this region, a scenario echoing the proposed inland dispersal route.50 In this spirit, our findings complete the understanding of the origin of the Nepalese and the way in which the East Eurasian genetic components had been introduced into Nepal. Taking into account the previous observation on the Y chromosome, it is now convincing that bearers of the East Eurasian genetic components had entered Nepal across the Himalayas around 6 thousand years ago (kya), a scenario in good agreement with the previous results from linguistics and archeology.

We then estimated the coalescence and expansion time of Y chromosome lineages in Qiangic populations (Table 1). The ages estimated using the evolutionary rate are about two or three times higher than those using genealogical rates. As the times using genealogical rates fit well with sequence-based estimates in Y chromosome lineage dating [82], we present results from the genealogical calculations in the following section. Haplogroup D can be traced back to the late Palaeolithic period, while other subhaplogroups more likely coalesce in the Neolithic. The lineage expansion times all fall into the Neolithic, ranging from 4.2 to 7.5 kya.
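
Why the two kinds of rate give ages that differ by a factor of two to three can be seen in a small sketch of Y-STR dating by average squared distance (ASD), where the age in generations is ASD divided by the per-locus mutation rate. The haplotypes are invented, and the two rates are commonly cited values assumed here (an evolutionary rate of 6.9e-4 and a genealogical rate of 2.0e-3 per locus per 25-year generation); the excerpt does not state which values or which estimator the authors used.

```python
def asd_to_ancestral(haplotypes, ancestral):
    """Average squared difference in repeat counts from the ancestral haplotype,
    averaged over all sampled haplotypes and all loci."""
    total = sum((h - a) ** 2
                for hap in haplotypes
                for h, a in zip(hap, ancestral))
    return total / (len(haplotypes) * len(ancestral))

def age_years(asd, rate_per_generation, years_per_generation=25):
    """Under a stepwise mutation model, expected ASD = rate * generations."""
    return asd / rate_per_generation * years_per_generation

# Hypothetical 3-locus Y-STR haplotypes (repeat counts), for illustration only.
ancestral = [15, 12, 21]
haplotypes = [[15, 12, 21], [16, 12, 21], [15, 12, 20], [15, 13, 21]]

EVOLUTIONARY_RATE = 6.9e-4   # assumed value, per locus per generation
GENEALOGICAL_RATE = 2.0e-3   # assumed value, per locus per generation

asd = asd_to_ancestral(haplotypes, ancestral)
t_evo = age_years(asd, EVOLUTIONARY_RATE)
t_gen = age_years(asd, GENEALOGICAL_RATE)
print(f"ASD = {asd:.3f}")
print(f"evolutionary-rate age: {t_evo:.0f} years")
print(f"genealogical-rate age: {t_gen:.0f} years")
print(f"ratio: {t_evo / t_gen:.2f}")
```

Since the same ASD is divided by a rate roughly three times smaller, the evolutionary-rate age comes out roughly three times older, in line with the two-to-three-fold difference noted in the text.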

A total of 397 samples were successfully assigned to mtDNA haplogroups using a combination of HVS-I sequence motifs and single nucleotide polymorphisms (SNPs) distributed around the coding region of the mtDNA genome. A total of 79 haplogroups or paragroups (unclassified lineages within a clade marked with an asterisk [*]) were identified (Figure 2b, Table S1 and Table S2), all within the two principal out-of-Africa macrohaplogroups: M and N (including R). Macrohaplogroup M and its subhaplogroups comprise 59.70% of the Qiangic maternal gene pool, and macrohaplogroup N and its subhaplogroups comprise the remaining 40.30%. The most prevalent haplogroups within macrohaplogroup M, haplogroups D and G, represent 18.14% and 13.60% of all the samples. Within macrohaplogroup N, haplogroups A and F are the most common lineages, accounting for 13.60% and 10.58% of Qiangic, respectively. The majority of the mtDNA lineages belong to eastern Eurasian specific groups, including those from Northeast Asia (A, D4, D5, G, C, and Z) [83]–[85] and Southern China or Southeast Asia (B, F, M7, and R9) [54]. Only two U samples in Yajiang might be traced for their origins to western or southern Eurasia, comprising 0.5% of Qiangic. The frequencies of Southern China or Southeast Asia specific haplogroups in Horpa-Danba, Horpa-Daofu, Tibetan-Xinlong, and Tibetan-Yajiang are 26.09%, 22.50%, 27.73%, and 21.35%, respectively. However, Tibetan-Yajiang, Horpa-Danba, Horpa-Daofu and, to a lesser extent, Tibetan-Xinlong, display a considerable Northeast Asian proportion of lineages (56.77%, 56.52%, 55.00%, and 43.70%, respectively). Consistent with other studied Tibetan populations on the Tibetan Plateau, Qiangic populations also showed a strong similarity with Northeast Asian populations.

We performed a PCA using the mtDNA haplogroup frequencies of Qiangic groups in this study and 68 other populations to see the detailed genetic patterns of those populations (Figure 3b, Table S3). The first PC revealed a clear geographic division between northern populations (Altaic and Northern Han) and southern populations (Southern Han, Tai-Kadai, and Hmong-Mien). Qiangic groups were clustered in the northern pole due to the high frequencies of haplogroups A and G. Han Chinese and Tibeto-Burman populations showed significantly different distributions in the second PC. Qiangic populations were clustered within the Tibeto-Burman group due to the existence of haplogroups M9a’b and M13.

Phylogeography of Macrohaplogroup M.

Macrohaplogroup M and its subhaplogroups represent the majority of the Qiangic maternal lineages, with frequencies ranging from 65.22% in Horpa-Danba to 57.98% in Tibetan-Xinlong. Haplogroup D4 and G are the most frequent sub-clades of macrohaplogroup M in Qiangic populations, each comprising 13.60%. Haplogroup D4, which is prevalent throughout Central Asia [85], Northeast Asia [86], [87], and Southwest China [5], [8], [65], [66], represents the majority of haplogroup D samples in Horpa-Danba (17.39%), Tibetan-Yajiang (13.54%), Tibetan-Xinlong (13.45%), and Horpa-Daofu (10.00%). The haplotypes of D4* were extensively shared among Qiangic, Tibetan, Han Chinese, and Altaic (Figure 5). Specifically, sub-haplogroup D4j3 was detected in Horpa-Danba and Horpa-Daofu with considerable frequencies (4.35% and 5.00%, respectively). The age estimates generated for D4* and D4j3 in Qiangic were about 15 kya (Table 3). In addition, the population growth factor, Fu’s Fs values of haplogroups D4* and D4j3, were significantly negative (Table 4), implying post-LGM expansions of those two lineages in Qiangic.

Haplogroup G is found at high frequencies in northeastern Siberia but it is also common among populations of the Japanese Archipelago and Korean Peninsula. This haplogroup also comprises an average of 20% of the maternal gene pool of the Tharus from Nepal [88] and accounts for more than 10% in the Tibetan populations of Nagqu, Chamdo, Lhasa, Garze, and Monba [5]. In this study, haplogroup G and subhaplogroups G2a, G2b1b, G3, and G3a1 account for 20% of Horpa-Daofu and reach frequencies greater than 10% in the three other Qiangic populations. Subhaplogroup G2a is represented as four distinct HVS-I motif types: 16129–16223–16278–16362 (I), frequent in Tibetan and Southern Han but nearly absent in Altaics; 16223–16227–16278–16362 (II), frequent in all the above three populations and probably experienced population expansion in Altaics (Figure 5); 16193–16223–16278–16362 (III), exclusive in South Asia. All of the G2a samples in Horpa-Daofu harbor haplotype II but add one more mutation at site 16304. However, most of the Tibetan-Xinlong samples belong to haplotype I (50%). Subhaplogroup G2b1b was first reported as a novel haplogroup in northeast India and has a low frequency distribution in Tibet and surrounding regions [89], [90]. This haplogroup accounts for 4.69%, 2.50%, and 0.84% of Tibetan-Yajiang, Horpa-Daofu, and Tibetan-Xinlong, respectively. Compared with other Tibetan samples, 72.73% of Qiangic G2b1b samples were detected with a mutation at site 16356, thus forming some exclusive clades in the network (Figure 5). Subhaplogroup G3 comprises 6.77%, 5.00%, 3.36%, and 2.17% of Tibetan-Yajiang, Horpa-Daofu, Tibetan-Xinlong, and Horpa-Danba, respectively. Two Yajiang samples are further defined as G3a1 by a mutation at site 16215. In addition, we have found two Horpa-Danba G2a samples bearing both G2a (16278) and G3 (16274) characteristic mutations and thus we could not tell the exact haplogroup classification of those two samples.
The coalescence time estimates of G*, G2b1b, and G3 were all around 20 kya and the age of G2a even reached about 34 kya (Table 3). However, it is noteworthy that the arrival time of these haplogroups at the Tibetan Plateau might be somewhat more recent than their coalescent ages would indicate, because nearly all these haplogroups (except G2b1b) had already differentiated before their arrival on the plateau (Figure 5). The exclusive clades in the network (Figure 5) and the significant negative Fu’s Fs values (Table 4) of G2a and G3 suggest the probable isolation and secondary population expansion of the two lineages.

Haplogroup M8 has two sublineages, haplogroups C and Z. Haplogroup C is a common lineage, which is widespread in East Asia and Siberia and is one of the founder lineages among Native Americans [6]. Haplogroup C comprises 8–10% of Horpa-Danba and Tibetan-Yajiang, but was detected at a very low frequency or was even absent in Tibetan-Xinlong and Horpa-Daofu. Almost 60% of the C samples in the present study harbored a specific HVS-I motif 16093–16298–16327 and were assigned as C4d. One Horpa-Danba individual with HVS-I motif 16298–16327 is also classified as C4d through complete sequencing (Doc S2). Haplogroup C4d has been supposed to be Tibetan-specific, with frequencies ranging from 1.6% to 5.0% in populations of Tibet [5]. However, the frequency of C4d in Tibetan-Yajiang even reaches 6.25%. In addition, all the reported C4d samples in Tibet and Qinghai have the same motif as mentioned above. However, 25% of the C4d samples in Yajiang share another mutation at site 16111. About 23% of C samples in Qiangic with a mutation at site 16357 might be assigned as C4a2′3′4, which is also restricted to Tibeto-Burman populations. Haplogroup Z is observed at relatively low frequencies in Qiangic populations.

M9a’b is widely distributed in mainland East Asia [89] and Japan, and reaches its greatest frequency and diversity in Tibet [5], [8] and its surrounding regions, including Nepal [88] and northeast India [90], [91]. It has been proposed recently that haplogroup M9a’b had most likely originated in southern China and/or mainland Southeast Asia. After the LGM, M9a’b might have been involved in some northward migrations in mainland East Asia [60]. In the present study, the frequencies of M9a’b in Horpa-Danba, Horpa-Daofu, Tibetan-Xinlong, and Tibetan-Yajiang are 4.35%, 10%, 13.45%, and 6.77%, respectively. Most M9a* samples (62.5%) of Qiangic shared the main haplotype that clustered in the central largest clade with other Tibeto-Burman populations in the network. However, the estimated age of M9a* is relatively young at about 7 kya. M9b is largely restricted to the non-Tibetans in southern China and southwest China [60]. We have detected low frequencies of M9b in Horpa-Danba and Tibetan-Xinlong (2.17% and 0.84%, respectively). In the networks of M9a1a and M9a1b, most of the Qiangic samples shared the descent types, giving a clear signal of out-of-Tibet migrations of those haplogroups. The age estimates generated for M9a1a and M9a1b1 in Qiangic were around 12–13 kya (Table 3), consistent with the proposed post-glacial dispersal of the M9a’b lineages.

Haplogroup M13a has been found at its greatest frequency and diversity in Tibet, but it has also been detected at very low frequencies in Siberian Buryat, Yakut, Altaian Kazakh, and Ewenki [85], and central Asian Kirghizs [92] as well as Barghuts [84], [93], [94]. The frequency of haplogroup M13a in Qiangic populations is remarkable, accounting for 3.27% of all samples. In the network of haplogroups M13a1 and M13a2, Qiangic and Tibeto-Burman samples formed some almost exclusive clades. This strongly suggests that these specific lineages have de novo origins within Tibetans. Specifically, 70% of subhaplogroup M13a1b samples in Qiangic share the same haplotype. A coalescence time estimate for M13a1b corresponded to 5.7 kya (Table 3), suggesting a relatively recent Neolithic expansion out of Tibet and an even more recent arrival of this lineage in northern Asia.

Qiangic populations also exhibit some basal Eurasian mtDNA lineages. Haplogroup M62, for example, was first reported in Northeast India [90] and since then has been reported in several populations at low frequency throughout Tibet [5], [8]. Zhao et al. suggested that M62 might represent the genetic relics of the initial Late Paleolithic settlers (>21 kya) on the Tibetan Plateau. In this study, we observed haplogroup M62b in three Yajiang Tibetans. The haplotype of those three individuals differs from all other reported M62 samples by a mutation at site 16305. Likewise, haplogroup M74a was detected in one Xinlong Tibetan, whose haplotype bears a distinctive mutation at site 16274 shared only with one Maonan individual, one Zhuang individual, and one Hainan Han Chinese [52]. Haplogroup M33c was found in a Tibetan sample from Yajiang with a haplotype similar to that of some Hmong-Mien samples [52].

Phylogeography of Macrohaplogroup N.

Haplogroup R and its subhaplogroups (B and F) represent the majority of the lineages branching from the basal N trunk, accounting for 26.09%, 22.50%, 28.57%, and 23.44% of the maternal diversity in Horpa-Danba, Horpa-Daofu, Tibetan-Xinlong, and Tibetan-Yajiang, respectively. Subhaplogroup B4* is the most frequent lineage of haplogroup B in Qiangic, comprising 4.53% of all the samples. In the network of B4*, the root clade is composed almost exclusively of non-Tibeto-Burman samples, whereas the Tibeto-Burman samples only formed some small clusters or shared the terminal types, suggesting that B4* had already differentiated before its arrival in Tibet. Subhaplogroup F1* is the most frequent lineage of haplogroup F in Qiangic, accounting for 5.54% of all the samples, and comprising as much as 12.5% of Horpa-Daofu. The age estimate generated for F1* in Qiangic was around 5 kya (Table 3). The exclusive Qiangic cluster of F1* in the network suggests a strong bottleneck or founder effect in its Neolithic migration towards the plateau. The significant negative values of the growth factor estimates (Table 4) suggest a secondary expansion and probable selection of the F1* lineage during its adaptation on the plateau.

Haplogroup N* is almost exclusively represented by haplogroup A in our samples. Haplogroup A is widely distributed in northern and eastern Asia, occurring at frequencies of 5%–10% in different populations [85]. Haplogroup A also has an average frequency of nearly 9% on the plateau [5]. Subhaplogroup A4*, which is mainly found in Central, Northeast and Southwest Asia, is the most frequent sublineage of haplogroup A in Qiangic, accounting for 2.17%, 5.00%, 4.20%, and 12.50% of Horpa-Danba, Horpa-Daofu, Tibetan-Xinlong, and Tibetan-Yajiang, respectively. Network analysis of haplogroup A4* revealed a star-like pattern and thus showed a signal of population expansion on the plateau (Figure 5). The probable population expansion was also confirmed by growth summary statistics in this lineage (Table 4). Subhaplogroup A11 split from the root of haplogroup A very early and formed a distinct lineage. A11a and A11b, the two sublineages of A11, have different distribution patterns. Most of the A11 samples in Tibet belong to A11* or A11a and only a few have a control-region substitution at site 16234, assigned as A11b. However, almost all the A11 samples in the Tibeto-Burman and Han Chinese populations of Yunnan belong to A11b. In the present study, three of five A11 samples belonged to A11* and the other two were assigned as A11b.

Discussion

The Sino-Tibetan linguistic family comprises some 460 languages distributed in East Asia, Southeast Asia, and parts of South Asia, including the Chinese and Tibeto-Burman subfamilies [1]. Despite intense linguistic, archaeological, and genetic research, where the Sino-Tibetan speakers came from and how they dispersed remain major open questions. One widely accepted hypothesis states that the ancestors of the Sino-Tibetan population were originally from the Neolithic Age Di-Qiang people in the upper and middle Yellow River basin. The Di people gradually developed into Han Chinese and Qiangic populations after the collapse of the Later Liang dynasty (one of the Sixteen Kingdoms, AD 386–403). Here, we integrated the Y chromosome and mtDNA evidence of Qiangic populations to provide a broader framework for reconstructing the history of Sino-Tibetan.

From the paternal Y chromosome perspective, haplogroup D1-M15 originated from D*-M174 during its migration into mainland East Asia [95]. Around 50–60 kya, a subgroup of haplogroups D*-M174 and D1-M15 started their northward migration through the WSC corridor into present-day Qinghai province, and then probably moved along the well-known route, called the Tibeto-Burman corridor, to enter the Himalayas [95]. Haplogroup D*-M174 probably gave rise to D3a-P47 in Tibet [95]. Haplogroup D3a-P47 experienced a recent population expansion on the Tibetan Plateau, and then probably migrated southward via the WSC corridor and gradually became the main genetic component of Tibeto-Burman populations in present-day Sichuan, Yunnan, and Guangxi provinces. Y chromosome haplogroup D might provide evidence of late Palaeolithic human activity on the plateau. Genetic relics of the late Palaeolithic age have also been detected on the maternal side, for example, haplogroup M62b. In addition, a number of Paleolithic sites have been excavated across the Tibetan Plateau [96]–[99], documenting the earliest human presence on the plateau, dated to 20–30 kya.

Around 20–40 kya, a population with dominant haplogroup O3-M122 Y chromosomes (haplogroup O3a1c-002611, O3a2c1*-M134, O3a2c1a-M117, and probably other O3 lineages) finally reached the upper and middle Yellow River basin and formed the Di-Qiang populations. During the Neolithic period, the Di-Qiang people experienced a relatively large population expansion. A subgroup of the Di-Qiang people with dominant haplogroups O3a2c1*-M134 and O3a2c1a-M117, now called the Proto-Tibeto-Burman people, left their Yellow River homeland and probably also moved along the Tibeto-Burman corridor, embarking on large-scale westward migrations to present-day Qinghai province and then southward to the Himalayas, or migrated southward directly via the WSC corridor to Yunnan and Guangxi, where they mixed with D-M174 lineages and developed into Tibeto-Burman populations. However, haplogroup O3a2c1*-M134 might have already reached Tibet, predating the above southward migration together with O3a2c1a-M117, judging from the high diversity in the network of O3a2c1*-M134 (Figure 4). In addition, another branch of the Di-Qiang people, the proto-Chinese, with dominant haplogroup O3a1c-002611, migrated eastward to the central China plain area, the middle and lower Yellow River Valley, and integrated gradually with the natives (probably populations with haplogroup C-M130 or D-M174) around 5–6 kya. Subsequently, the Di-Qiang people that resided in the upper and middle Yellow River basin with haplogroups O3a2c1*-M134 and O3a2c1a-M117 formed the well-known Yan-Huang tribe (Hot Emperor and Yellow Emperor), and the eastward branch with O3a1c-002611 developed into the Dong Yi tribe. The Yan-Huang tribe together with the Dong Yi tribe gradually developed into a large population known as Han Chinese. With the expansion of Han Chinese, especially southward, this group became the largest of the 56 officially recognized ethnic populations in China.

The role that haplogroup O3-M122 lineages played in the origin of Tibeto-Burman populations suggests extensive genetic input from northern Asians. This suggestion has been supported by previous studies employing autosomal STR [100], [101], Y chromosome [33], [34], and mtDNA [5]–[9]. It is not surprising that the maternal variation of Qiangic populations was also largely contributed by haplogroups prevalent in northern Asia, including haplogroups A, C, D, and G. In addition, cultural features of the upper Yellow River basin, such as painted pottery, millet agriculture, and urn burial, are prevalent in the Neolithic sites of WSC, probably due to demic diffusion via the genetic corridor [102]. However, we still could not rule out the possibility that the complex genetic structure of Qiangic populations might be due to repeated admixture from surrounding populations, which provides directions for future work.

…

Gayden T et al., Genetic insights into the origins of Tibeto-Burman populations in the Himalayas. J Hum Genet. 2009 Apr;54(4):216-23. doi: 10.1038/jhg.2009.14. Epub 2009 Feb 27.

Abstract

The Himalayan mountain range has played a dual role in shaping the genetic landscape of the region by (1) delineating east–west migrations including the Silk Road and (2) restricting human dispersals, especially from the Indian subcontinent into the Tibetan plateau. In this study, 15 hypervariable autosomal STR loci were employed to evaluate the genetic relationships of three populations from Nepal (Kathmandu, Newar and Tamang) and a general collection from Tibet. These Himalayan groups were compared to geographically targeted worldwide populations as well as Tibeto-Burman (TB) speaking groups from Northeast India. Our results suggest a Northeast Asian origin for the Himalayan populations with subsequent gene flow from South Asia into the Kathmandu valley and the Newar population, corroborating a previous Y-chromosome study. In contrast, Tamang and Tibet exhibit limited genetic contributions from South Asia, possibly due to the orographic obstacle presented by the Himalayan massif. The TB groups from Northeast India are genetically distinct compared to their counterparts from the Himalayas probably resulting from prolonged isolation and/or founder effects.

…

Close genetic ties have been reported between the Tamang and Tibet.¹ It is likely that Tamangs are descendants of Tibetans who migrated south and settled in the southern region of the Himalayan range.¹ This affinity is reflected in both CA plots (Figures 2 and 3) and NJ dendrograms (Figures 4 and 5). The Tibetan connection to the Tamang is also evident in their shared cultural and religious practices. The partitioning of these two populations with Bhutan and their proximity to the general collection from Nepal (Figures 2, 3, 4 and 5) may be associated with Neolithic migrants carrying Y-haplogroup O3a5-M134, an East Asian-specific marker, shared among TB populations.^{1, 3, 4, 9, 60} The Himalayan populations, with the exception of Newar and Kathmandu, segregate close to the Northeast Asian cluster in agreement with the admixture analyses results (Table 3). Northeast Asia is the major contributor to both Tibet (63.4%) and Tamang (59.7%) whereas Newar (44.7%) and Bhutan (41.1%) received equivalent percentages, followed by Kathmandu (22.3%). These results corroborate studies indicating a shared common ancestry between Tibet and the Northeast Asian collections of Japan and Korea by a variety of marker systems, including classical,^{61, 62} autosomal,⁶³ Y-chromosome^{1, 12, 64, 65} and mtDNA.^{12, 64, 66, 67}

More than half of the Tibetan men possess the YAP polymorphic Alu insertion in their Y-chromosome which is believed to have originated in Central Asia,^{1, 4, 11, 14} although its source remains highly debated.^{64, 68, 69} In this study, however, given the lack of representative Central Asian populations due to the paucity of the data available from the region, no clear connections were made between Tibet and its possible Central Asian genetic contributors. Afghanistan is the sole Central Asian collection included in the analyses and appears to make no contributions to any of the Himalayan groups except for a minor influence in Kathmandu (12.9%).

To evaluate the genetic relationships between the Himalayan collections and the neighboring TB-speaking populations at the regional level, six Northeast Indian TB groups were included in the phylogenetic and statistical analyses performed using the 13 core CODIS STR loci. These Northeast Indian TB groups map distantly from both the Himalayan and East Asian populations in the CA graph (Figure 3), inconsistent with previous Y-chromosome and mtDNA studies which report a high degree of genetic homogeneity between Himalayan and Northeast Indian TB groups.^{3, 4, 9, 70} The discrepancy observed between Y-chromosome and microsatellite polymorphisms in the Northeast Indian TB groups may be explained by a male founder effect from Northeast Asia and their subsequent genetic isolation for an extended period of time following their arrival.⁹

Altogether, our results suggest a Northeast Asian ancestry for the Himalayan populations with subsequent genetic admixture in Kathmandu and Newar populations from South Asia. South Asian influences in Tibet and Tamang are negligible most likely due to the natural barrier presented by the Himalayas.¹ Tamang, Tibet and Bhutan display close genetic affiliations in all analyses possibly indicating a shared common ancestry. The biparental markers examined in this study reveal unique genetic profiles for the Northeast Indian TB groups, which are distinct from their Himalayan counterparts implying limited gene flow, geographic isolation and/or founder effects.

Bacterial marker studies (of the stomach bacterium Helicobacter pylori) support the above picture of an early migration route out of India, carried first into SEA (Thailand and Cambodia), then into Cambodia and Vietnam, then carried by the ancestors of the Thai from Southern China into Thailand. At a much later stage, the bacterium arrives from India again, penetrating Malaysian populations within the past 200 years.

Evolutionary History of Helicobacter pylori Sequences Reflect Past Human Migrations in Southeast Asia. PLoS ONE:

Abstract

The human population history in Southeast Asia was shaped by numerous migrations and population expansions. Their reconstruction based on archaeological, linguistic or human genetic data is often hampered by the limited number of informative polymorphisms in classical human genetic markers, such as the hypervariable regions of the mitochondrial DNA. Here, we analyse housekeeping gene sequences of the human stomach bacterium Helicobacter pylori from various countries in Southeast Asia and we provide evidence that H. pylori accompanied at least three ancient human migrations into this area:

i) a migration from India introducing hpEurope bacteria into Thailand, Cambodia and Malaysia;

ii) a migration of the ancestors of Austro-Asiatic speaking people into Vietnam and Cambodia carrying hspEAsia bacteria;

and iii) a migration of the ancestors of the Thai people from Southern China into Thailand carrying H. pylori of population hpAsia2. Moreover, the H. pylori sequences reflect iv) the migrations of Chinese to Thailand and Malaysia within the last 200 years spreading hspEAsia strains, and v) migrations of Indians to Malaysia within the last 200 years distributing both hpAsia2 and hpEurope bacteria.

…

The fragmented distribution of speakers of the five major language families in Southeast Asia is the result of extensive human migrations. Hmong-Mien, Austro-Asiatic and Austronesian are considered the older language families in the region [1], whereas the presence of the Sino-Tibetan and Tai-Kadai language families can be attributed to relatively recent population expansions. Most fragmented is the distribution of Hmong-Mien speakers, who live in numerous small enclaves surrounded by Sino-Tibetan and Tai-Kadai speakers in Southern China, Laos and Northern Vietnam because of an extreme expansion of the Chinese subfamily of Sino-Tibetan (mostly during the Zhou dynasty, 1100 to 221 BC), which distributed Chinese languages continuously over a large region from North to South China, pushing speakers of other languages further south and west. The Austro-Asiatic language family (with the examples of Vietnamese from Vietnam and Khmer from Cambodia) was previously distributed from Vietnam in the east and South China in the north to the Malay Peninsula in the south and North India to the west [2], before massive expansions of Indo-European speakers in India and Tibeto-Burman speakers (a subgroup of Sino-Tibetan different from Chinese) from South China into Myanmar restricted Austro-Asiatic languages to numerous enclaves in this area.

A subsequent expansion of Tai-Kadai speakers during the early second millennium AD from their homeland in South China into Thailand and Laos replaced Austro-Asiatic speakers in large parts of Southeast Asia that previously belonged to the Khmer empire [3], [4], [5]. Subsequently, Tai-Kadai is found from South China over Thailand to the Malay Peninsula and Myanmar.

In historic times, parts of Southeast Asia have repeatedly been ruled by colonial forces, but there has never been overall occupation [1], [4]. The Han Chinese invaded North Vietnam (Tonkin) in the 1st century BC and stayed for nearly a millennium, after which Vietnamese dynasties from North Vietnam conquered central Vietnam (Annam) and South Vietnam (Cochin China). The French occupied the same area (Tonkin, Annam, Cochin China) during a far shorter period (1863–1953), and added present day Cambodia and Laos to their colonial French Indochina. Both of these colonial episodes excluded Siam (Thailand), the only country in Southeast Asia never colonized by a European power.

Archaeology suggests an ancient close connection between India and the Thailand/Cambodia region through settlement [6], [7], [8], [9], accompanied by an increasing exposure to Indian culture from about 300 BC. Early state-like societies from Southeast Asia, called by the Sanskrit term “mandala”, had in common the adoption of Indian forms of religion (Hinduism), the Sanskrit language and aspects of government (Funan mandala from 100 to 550 AD, Chenla mandala from 550 to 802 AD and Angkorian mandala from 802 to 1431 AD) [4]. However, the Indian influence in Southeast Asia was not supported by human mitochondrial DNA (mtDNA) data [10], [11], [12].

In previous studies, we have used housekeeping gene sequences of a bacterial parasite which infects the stomach of most humans, Helicobacter pylori, to elucidate the patterns of human prehistory. H. pylori accompanied modern humans during their migrations out of Africa ca. 60,000 years ago [13], and subsequent geographic separation plus founder effects have resulted in genetic populations of bacterial strains that are specific for large continental areas. In all, 7 bacterial genetic populations have been described… The specific geographic distribution and ethnic association of the H. pylori populations reflects numerous ancient and historic human migrations which established H. pylori sequences as a useful genetic marker to unravel debated topics in human population history. For example, the genetic variation in H. pylori has shown more discriminatory power in determining the ancient sources of human migrations in the Ladakh region of Northern India [19] and in the Pacific (Austronesian expansion) [16] than traditional human genetic markers such as the hypervariable region (HVS1) of mtDNA. Therefore, we analysed H. pylori sequences from Cambodia, which borders Thailand to its west and northwest, Vietnam to its east and southeast and Laos to its north, to gain additional insights into the human population history in continental Southeast Asia.

…

Gayden, Tenzin et al., The Himalayas: barrier and conduit for gene flow. Am J Phys Anthropol. 2013 Jun;151(2):169-82. doi: 10.1002/ajpa.22240. Epub 2013 Apr 12.

“Although previous Y-chromosome studies indicate that the Himalayas served as a natural barrier for gene flow from the south to the Tibetan plateau, this region is believed to have played an important role as a corridor for human migrations between East and West Eurasia along the ancient Silk Road.” The analysis of mitochondrial DNA variation in 344 samples from three Nepalese collections (Newar, Kathmandu and Tamang) and a general population of Tibet “revealed a predominantly East Asian-specific component in Tibet and Tamang, whereas Newar and Kathmandu are both characterized by a combination of East and South Central Asian lineages. Newar and Kathmandu harbor several deep-rooted Indian lineages, including M2, R5, and U2, whose coalescent times from this study (U2, >40 kya) and previous reports (M2 and R5, >50 kya) suggest that Nepal was inhabited during the initial peopling of South Central Asia.”

The study confirmed “that while the Himalayas acted as a geographic barrier for human movement from the Indian subcontinent to the Tibetan highland, it also served as a conduit for gene flow between Central and East Asia.”

…

The missing link and the route through the Terai region of land-locked Nepal. The Terai mountain passes of Kodari and Rasuwa, though not the only passes, are the only mountain passes in land-locked Nepal that were not seasonal (i.e. they were accessible at all times, while others were often covered in deep snow). Ancient migrations and trade took place along them, and they were important for the Silk Route trade. This was an important migratory route from India into China.

After the arrival of the Out-of-India ancestral Austro-Asiatic branch in SEA, O expands rapidly and diverges into the Sinitic Qiang, Tibeto-Burman, Hmong, and Tai-Kadai branches, and most of the studies show the expansion routes taken out of MSEA.

Looking at the earliest branches of the East Asians

Two groups may have branched off from the ancestral A-A early, one into the Sinitic Qiang in North Asia/Central Asia, another into the Southern Tibeto-Burmans, with the remnant ancestral A-A continuing on to SEA/South China, from where the major Austronesian expansions begin: an Austronesian group moving down the coast through Fujian and southwards along the coast of Vietnam, branching off into Taiwan along the way, and a Tai-Kadai group diverging from the pre-Austronesian homeland in East Asia/S. China.

Thus from here on, an Out-of-India ancestral A-A expands in a way that is consonant with Sagart’s scenario of O1a out of the coastal plains of Central Eastern China in Jiangsu-Zhejiang-Fujian-Taiwan.

See Laurent Sagart’s highly plausible ideas on the spread of pre-Austronesians to Taiwan and out of Taiwan in “The higher phylogeny of Austronesian and the position of Tai-Kadai”:

“…pre-Austronesians spoke a language related to Sino-Tibetan, and that they reached Taiwan from a location in NE China where millet and rice were cultivated, and where ritual evulsion of the upper lateral incisors in boys and girls was practiced. The eastern China seaboard region north of the Yangzi estuary, from north Jiangsu to north Shandong, is the one area in East Asia where the distribution of these three traits overlaps in the period before the arrival of the Austronesians in Taiwan: thus both rice and millet were cultivated in Xihe in north Shandong (Wright 2004) c. 8000 BP and in Longqiuzhuang in the lower Huai basin c. 7000-5000 BP. Tooth evulsion is attested from 6500 BP on in Shandong and north Jiangsu (Han and Nakahashi 1996). We may surmise that before they reached Taiwan, the pre-Austronesians were expanding southward along the coastal plains of central-eastern China in Jiangsu, Zhejiang and north Fujian. We can expect that archaeological sites with rice, Setaria, tooth evulsion, and a technology intermediate between the Dawenkou culture of north-east China and Ta-Pen-K’eng of Taiwan will eventually appear there.
If this scenario is correct, it is likely that the passage to Taiwan did not exhaust the pre-An population of the Fujian coast. More likely, this population continued expanding along the coast in a south-westerly direction towards the Pearl River delta, even after a group of them had crossed to Taiwan. Their archaeological traces SW of Fujian are perhaps seen in the Pearl river delta, although direct evidence of agriculture there has so far not appeared; Hedang in the Pearl River delta, with tooth evulsion (Higham 1996:84), c. 3000-2000 BCE, may be one such site. In Taiwan, Tsang (in press) describes the newly excavated site of Nan-kuan-li near Tainan in south-west Taiwan, where a team led by him recently discovered a neolithic culture having rice, millet, and practicing ritual tooth ablation around 5000-4500 BP. In the same paper he argues that the Ta-Pen-K’eng culture, as seen in Nan-kuan-li near Tainan, “has close affinities with the Neolithic cultures of Hong Kong and the Pearl River Delta”. I disagree with Tsang when he concludes that “The Pearl River Delta of Kuangtung is most probably the source area of the Tapenkeng Culture in Taiwan”. I think it more likely that both cultures are descended from a common precursor on the Fujian coast. Pearl River delta sites having affinities to Taiwan TPK like Hedang are also probably too early and too far east to be ancestral to the Tai-Kadai-speaking cultures.
Conclusion
I have presented an explicit account of the early phylogeny of the Austronesian family. The new phylogeny is tree-like. A salient characteristic is that out of a majority of nodes, only one branch leads to further branching (Table 4). This makes Formosan phylogeny similar to Malayo-Polynesian phylogeny. Non-branching nodes can be associated with stay-at-homes, and branching ones with out-migrating groups. PMP has been shown to be part of a taxon that also includes languages of the NE Formosan Coast, as well as Tai-Kadai (as proposed in Sagart 2001; in press, a). That taxon itself is part of a larger taxon including languages of the East coast and south Taiwan.

The most likely location of PAn is in the region of Luilang, Saisiat and Pazeh, in the north-west of Taiwan…This model shows a consistent geographical pattern: early Austronesian speakers settling Taiwan progressively in a counter-clockwise movement, starting from the north-west, then expanding southward along the west coast, and reaching the southern tip of the island before finally settling the east coast from south to north …

… the general pattern is clear: a gradual, unidirectional encirclement of the island by Austronesian speakers. Apparently the main direction of movement was along the coastal plains. This implies that, given a choice, the early Austronesians preferred to expand into the coastal plains. This pattern is consistent with what archaeology and linguistics tell us about their mode of subsistence, which combined exploitation of marine resources, including fishing, with hunting and gathering and cultivation of rice and millet. We may suppose that population movements into the mountains, as with the Saisiats, Atayalics, Thaos, Tsouics and Bununs, were generally late, and made under pressure. Such indeed is the pattern observed in the rest of the Austronesian world (Blust 1999:53). The pattern of progression from the west to the east coast is moreover consistent with archaeological dates for Ta-Pen-K’eng sites, which are older on the west coast than on the east coast.

The geographical stability over time of the initial settlement pattern is striking. Most modern languages are still spoken or were still spoken until recently in the area of the meso-language they are descended from. A major factor in this is the geography of Taiwan, where the central mountain range … [Sagart gives provisional dates of 5500 BP and 4000 BP for the settlement of Taiwanese neolithic sites]

Finally, under the present interpretation, FAMP and FATK, the two Muic languages whose speakers left Taiwan to settle other regions, were probably located in the north-east or north of the island, where the last available agricultural lands had been. The MP and TK migrations out of Taiwan thus appear motivated by the need to find new agricultural lands. It is probably no coincidence that the site of Yuan-Shan near Taipei, in the region where Ketagalan was spoken until the early 20th century, has significant connections to the earliest neolithic of the Philippines (Bellwood 1997: 215)”

…

The Southern Hypothesis: an early articulation is found in the 1999 paper below, which seems to rely on rather simple and circular reasoning:

Bing Su et al., Y-Chromosome Evidence for a Northward Migration of Modern Humans into Eastern Asia during the Last Ice Age, Am J Hum Genet. Dec 1999; 65(6): 1718–1724. Published online Nov 2, 1999. doi: 10.1086/302680

PMCID: PMC128838

The difference between southern and northern populations is further reflected by the results of principal-component analysis (fig. 3). This analysis showed that all northern populations cluster together at the upper-right corner and are well separated from the southern eastern-Asian populations, which are far more diversified than the northern populations. Given the observation that Southeast Asian populations, including Cambodians and Thais, are the most polymorphic, because they exhibit almost all of the Asian-specific haplotypes (see table 1), it is reasonable to conclude that the northern populations derived from the southern populations and that the first settlement of the ancient African immigrants was in mainland Southeast Asia, from which they expanded northward to other parts of eastern Asia. A study by Ballinger et al. also suggested a southern Mongoloid origin of eastern Asians.

To estimate the time of the entry of modern humans into eastern Asia, we typed three Y-chromosome microsatellite loci for individuals carrying the C allele at locus M122—that is, the allele state shared by Asian-specific haplotypes H6–H8. A total of five, eight, and six alleles were observed at DYS391, DYS390, and DYS389, respectively. The single-step mutation model and a mutation rate of 0.18% (Heyer et al. 1997; Bianchi et al. 1998) were used in the estimation. To minimize the possible influence of population substructure on the estimation, only Han Chinese samples were included (160 M122-C individuals in total). When an effective population size of 750–2,000 is assumed (see the Material and Methods section), the number of generations estimated is 919–3,032 for DYS390, the oldest among all three estimations. Therefore, the age of M122C is ∼18,000–60,000 years, if we assume a 20-year generation time. We argue that this estimation reflects the age of the bottleneck event leading to the entrance of modern humans into eastern Asia, since the extensive presence of the M122-C allele in Southeast Asian populations suggests that this mutation predates their entry.
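The conversion from generations to years in the passage above is plain arithmetic. A minimal sketch in Python, using only the figures quoted in the excerpt (919–3,032 generations for DYS390 and an assumed 20-year generation time); the function name is illustrative and does not come from the paper:

```python
def generations_to_years(generations, years_per_generation=20):
    """Convert a number of generations into years for an assumed generation time."""
    return generations * years_per_generation

# Figures quoted in the excerpt for DYS390
lower = generations_to_years(919)
upper = generations_to_years(3032)
print(lower, upper)  # 18380 60640, which the paper rounds to ~18,000-60,000 years
```

The result scales linearly with the assumed generation time: with 25 years per generation (an assumption not made in the paper), the same generation counts would give 22,975–75,800 years.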

It is difficult to accurately date the ancient human migrations (or mutations), because of the errors inherently involved in estimating both the effective population size of the males and the mutation rate. However, our knowledge of morphology and archaeology can help us to narrow the estimated age range. According to the morphological study by Turner et al. (1993), the so-called Sinodont dentition in northern-Asian peoples occurred ∼18,000–25,000 years ago. A similar dentition pattern predominates among all the Southeast Asian populations and was thought to be ancestral to the Sinodont pattern. Consequently, this sequence of dental evolution tends to rule out an 18,000-year colonization dating—the lower boundary of our age estimation—which was based on an overestimated effective population size. In addition, archaeological evidence from the Altai Mountain and Lake Baikal regions of southeastern Siberia are beginning to show the presence of modern human lithic cultures of 25,000–45,000 years ago (Vasil’ev 1993). Therefore, the first entry of eastern-Asian populations should predate the emergence of the lithic culture in northern Asia. Recent evidence from archaeological studies indicates that Papua New Guinea was settled ∼35,000–50,000 years ago by modern humans, aboriginal Australia perhaps even earlier than that (Brown et al. 1992; Swisher et al. ). Hence, if we accept that mainland Southeast Asia is the homeland for all eastern-Asian populations, including Siberian and Oceanian, the upper boundary of the M122-lineage time depth—that is, 60,000 years ago—seems to be a likely estimation of the initial colonization of eastern Asia by modern human populations from Africa.

The last Ice Age occurred 75,000–15,000 years ago, although its distribution and the exact date of its presence in eastern Asia are not clear (Dawson 1992). Interestingly, in a close examination of the collection of hominid fossils in eastern Asia, we found a nontrivial gap between H. sapiens and H. s. sapiens, in terms of time continuity. All the H. sapiens fossils are ≥100,000 years old, whereas all the H. s. sapiens fossils are <50,000 years (with most being 10,000–30,000 years old). Hence, no hominid fossils of 100,000–50,000 years ago have yet been found in eastern Asia, a finding that is particularly anomalous given the abundance of either earlier or later fossil records that have been found in this area (Wu and Poirier 1995; Etler 1996). Both the extensive duration of the temporal discontinuity of the fossil records in China and the distinctive morphological characters of the hominid fossils found before and after would strongly argue against any casual explanation that this gap is attributable to a “missing link.”

In conclusion, the evidence presented in this report indicates that the first entry of modern humans into the southern part of eastern Asia was ∼60,000 years ago, followed by a northward migration coinciding with glaciers receding in that area. It may suggest that the old hominids living in eastern Asia disappeared before or during the last Ice Age and that the modern humans of African descent made their way to the vast land of eastern Asia. A subpopulation with predominantly H6 and H8 haplotypes later made the arduous journey to the north, which contributed to the peopling of northern China and then Siberia.

So back to the Southern Hypothesis, just exactly how “Southern” in character is the Japanese population?

There are other studies that challenge the Southern Hypothesis, coming to different conclusions or at least a more variegated and modulated picture of origins that include a Northern cline and North Asian components.

The above excerpted study on the Qiangic populations of the Western Sichuan corridor already discussed in considerable depth the many possible northern genetic contributions and affinities with the Altaic or East Eurasian elements.

See also the following:

Tian C, Kosoy R, Lee A, Ransom M, Belmont JW, et al. (2008), Analysis of East Asia Genetic Substructure Using Genome-Wide SNP Arrays PLoS ONE 3(12): e3862. doi:10.1371/journal.pone.000386

Excerpted from the extract:

This study combines high density SNP array genotypes from studies of EAS population groups within the Human Genome Diversity Panel (HGDP) [12,13] with those of several additional population groups of EAS ancestry. The use of high density SNP genotypes containing over 200 K common autosomal genotypes allows a more comprehensive analysis than those previously performed using a limited number of autosomal genotypes. It also complements studies of mitochondrial and Y chromosome haplogroups as well as classical markers that provide important information with respect to part of the history of particular EAS ethnic groups [14–20]. Our study expands on previous analyses using HGDP population groups [13] by examining additional parameters of population structure/diversity…

With respect to population groups derived from very populous groups, the data indicate that Japanese and Korean were very closely related, as were Korean and Han Chinese but that these groups are much further from the south-east Asian populations (Filipino and Vietnamese). The Han Chinese and Japanese groups showed larger separation than either with Korean, although the paired Fst values were still small relative to Chinese/Filipino Fst. The Fst values also showed a close relationship between the Dai ethnic group in China and the Vietnamese population sample. Each of the groups had large paired Fst values with the Yakut from Siberia with the exception of the Mongolian, Hezhen and Oroqen ethnic groups that derive from north-eastern China or Mongolia. The relative size of the Fst values also generally corresponded to the geographical separation of the EAS population groups (depicted in Figure 1).
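For readers unfamiliar with paired Fst values, here is a rough sketch of how such a number can be computed from allele frequencies. The excerpt does not say which estimator the authors used, so this assumes the textbook heterozygosity-based definition, Fst = (H_T − H_S) / H_T, with both populations weighted equally and the result taken as a ratio of sums across loci. The allele frequencies below are hypothetical and are not taken from the study.

```python
def pairwise_fst(freqs_a, freqs_b):
    """Heterozygosity-based Fst between two populations.

    freqs_a and freqs_b hold the frequency of one allele at each biallelic
    locus. Assumes equal population weights (an illustrative simplification).
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both populations need the same number of loci")
    total_ht = 0.0  # expected heterozygosity of the pooled population
    total_hs = 0.0  # mean expected heterozygosity within populations
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_mean = (p_a + p_b) / 2
        total_ht += 2 * p_mean * (1 - p_mean)
        total_hs += (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
    if total_ht == 0:
        return 0.0  # no variation at any locus, so no differentiation
    return (total_ht - total_hs) / total_ht

# Hypothetical frequencies at three loci
close_pair = pairwise_fst([0.50, 0.30, 0.80], [0.52, 0.28, 0.79])
distant_pair = pairwise_fst([0.50, 0.30, 0.80], [0.70, 0.10, 0.55])
print(close_pair < distant_pair)  # True
```

Populations with similar allele frequencies give a value near 0, and populations fixed for different alleles give 1, which is the sense in which the excerpt calls some paired values "small" and others "large".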

…

Principal Component Analyses Using >200 K SNPs Show Substructure Relationships

To further explore the relationship among EAS population groups and examine population substructure PCA was performed using the genotype results from a set of >200 K SNPs. Analyses were done with and without the inclusion of the Yakut population thought to originate in central Asia, since PCA results are influenced by the inclusion or exclusion of different population groups and we were interested in the relationship between EAS and central Asian populations. The first two principal components in these analyses display the largest genotype variation (Table 2) and are graphically depicted in Figure 1. Inclusion of the Yakut group showed a possible cline in PC1/PC2 that extends from the current Siberian location of the Yakut to the northern East Asian population groups (Figure 1A). Interestingly, the position of the different population groups shows a remarkable correspondence with the geographic origin of each group. This is more clearly suggested when the Yakut population is excluded (Figure 1B) and is best illustrated by comparing these geographic locations with rotated PCA results (Figure 1C and D). Additional, PCA analyses including the central Asian Uygur and Hazara population groups were also performed but these did not show a clear relationship with the EAS (Supplemental Figure S1).

The PCA results for PC1 and PC2 are generally consistent with the relative paired Fst values with respect to the distance separation among the different population groups. For example the position of the Korean group approximately midway between the HapMap CHB and JPT groups both graphically (Figure 1) and as discussed above for paired Fst values. It is also consistent with the closer relationship between the Dai ethnic group and the Vietnamese subjects. However, the first two PCs do not show the full relationships among the population groups. For example the Lahu ethnic group appears to be closely related to the Cambodian ethnic group (Figure 1), although the paired Fst value is relatively large (Table 1). Examination of additional PCs shows the large difference between the Lahu and Cambodian ethnic groups in PCs 3, 4 and 5 (Figure 2). Using both the Kruskal-Wallis test [23], a nonparametric alternative to the ANOVA, and a split half reliability test (see Methods) substructure was present in multiple principal components (Table 2). Substantial population substructure can be observed by the nonrandom grouping of population groups that extends through PC7

For the entire EAS population groups studied, the majority of substructure variation defined by PCA appears to be within the first 4 PCs (Table 2). The eigenvalues plateau after PC4 with only small differences observed in subsequent PCs (Figure 3a). The proportion of the sum of the eigenvalues above this plateau provides a measure of the relative amount of substructure variation defined by each PC (Figure 3b). For the total EAS group, >90% of the substructure is defined in the first four PCs by this measurement. For the group of the five populations representing the most populous ethnic groups studied the first two PCs account for 90% of the variation above the plateau.
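The "proportion of the sum of the eigenvalues above this plateau" can be illustrated with a short sketch. This is one possible reading of the measure, not the authors' code: each eigenvalue's excess over the plateau level is divided by the total excess. The eigenvalues and the plateau level below are hypothetical.

```python
def substructure_share(eigenvalues, plateau):
    """Share of above-plateau eigenvalue mass attributed to each PC."""
    excess = [max(value - plateau, 0.0) for value in eigenvalues]
    total = sum(excess)
    if total == 0:
        return [0.0 for _ in eigenvalues]
    return [value / total for value in excess]

# Hypothetical eigenvalues that level off after the fourth component
eigenvalues = [5.0, 3.0, 2.0, 1.5, 1.1, 1.0]
shares = substructure_share(eigenvalues, plateau=1.0)
first_four = sum(shares[:4])
print(first_four > 0.9)  # True
```

Under this reading, a statement such as ">90% of the substructure is defined in the first four PCs" means the first four shares sum to more than 0.9.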

Similar analyses were also performed using population sets restricted to the more closely related Han Chinese, Japanese, and Korean groups, as well as a group restricted to Han Chinese and Chinese Americans (Table 2). These results as expected indicated substantially less substructure. However, even the subject set limited to Han Chinese and Chinese Americans showed substructure in PC1 using the split half reliability test and with the self identified groupings (ANOVA result). The relationship among the Han Chinese can be demonstrated in PCAs performed either including or excluding other EAS populations (Figure 4). Although there is variability in the distribution of many of the self-identified groups there was a general northwest/southeast gradient within these Chinese participants. In PC1 the North Han Chinese (HGDP from north central China[12]) were most separated from the southern Chinese participants including the Chinese American participants from Taiwan or with self-reported southern China origin.

The graphic representation of the first two PCs showed close correspondence to the historical geographical location and/or sample collection site for most of the EAS population groups. Thus, despite admixture and perhaps uncertain migration patterns, overall the largest component of genotypic variation that is discernable by reducing high order data (all genotypes) to lower order variations (PCs) is consistent with the population geography. This finding supports hypotheses that the relationships among the EAS populations are largely explained by clines formed by demic expansion(s). We speculate that the inclusion of many different related ethnic groups has recapitulated the most common events that separated these ethnic groups. The first PC axis accounting for the largest variation has a north/south orientation. One major part of this pattern forms a line from Siberia (Yakut) to Mongolia to Eastern China (Figure 1). The PCA analyses also suggest that at least two separate clines originating or terminating in eastern China at one end and Cambodia and the Philippines at the other end. In addition there is another cline extending from Eastern China to the Korean peninsular and Japan.

Multiple previous studies have examined the relationship between and possible origins of different EAS population groups. Analysis of mitochondrial and Y chromosome haplogroups as well as a limited numbers of classical markers and microsatellite polymorphisms have also provided results that are generally consistent with a north/south orientation of relationships between different EAS population groups [15]–[18]. However, there are exceptions with some studies failing to show this relationship [19], [25]. Summarized by a recent review [26] there are three different postulates regarding the origins of EAS population groups: 1) South East Asian origin [14]–[18], 2) North Asian origin [27] and 3) a combination of northern and southern origin[19], [20]. However, the majority of studies have supported a South-East Asian origin for most EAS populations and include detailed analyses of the age of specific mitochondrial haplogroups, Y chromosome sequences as well as limited marker studies [26]. In contrast, hierarchical trees in the recent HGDP study [13] show branching points consistent with a Yakut derivation. Recent studies using a novel copying model statistical approach appear to suggest an initial northern and southern origin (Cambodians, Mongolians, Xibo, Yi , Tu, Daur, and Naxi receiving large contributions from central-Asian populations) that contribute to Han ancestry [28]. These studies also provide data supporting the derivation of many other EAS groups from a Han expansion (including She, Japanese, Dai, Lahu and Miao). While the current study does not strongly support any of these hypotheses, it does suggest that eastern China is central to the events shaping the population groups in this region. …

Kang L, et al., Northward genetic penetration across the Himalayas viewed from Sherpa people. Mitochondrial DNA. 2014 Mar 11.

Abstract

The Himalayas have been suggested as a natural barrier for human migrations, especially the northward dispersals from the Indian Subcontinent to Tibetan Plateau. However, although the majority of Sherpa have a Tibeto-Burman origin, considerable genetic components from Indian Subcontinent have been observed in Sherpa people living in Tibet. The western Y chromosomal haplogroups R1a1a-M17, J-M304, and F*-M89 comprise almost 17% of Sherpa paternal gene pool. In the maternal side, M5c2, M21d, and U from the west also count up to 8% of Sherpa people. Those lineages with South Asian origin indicate that the Himalayas have been permeable to bidirectional gene flow.

“…In this paper, we mainly focus on the origin and migration pattern of Sherpa people. According to the historical literature, Sherpa migrated from the Kham region in eastern Tibet and western Sichuan to the southern foot of the Himalayas (Oppitz, 1968). However, some folktales suggest the Sherpa as descendants of the Tangut Kingdom (1038 to 1227 AD) who fled their homeland in Muyag district to escape Mongol invasion (Gong-Bo, 2011). Here, we use informative Y chromosome and mtDNA markers to give a clue about the northward gene flow across the Himalayas and shed light on the origin of the Sherpa.”

“According to the nomenclature of Y Chromosome Consortium (YCC) (Karafet et al., 2008; Yan et al., 2011), nine SNP haplogroups were determined from the 84 male individual samples (Figure 1a and Table S1). Haplogroup D1-M15, which is supposed to be the Paleolithic genetic legacy with a wide distribution among most Tibeto-Burman, Tai-Kadai, and Hmong-Mien populations (Shi et al., 2008), is also prevalent in Sherpa (11.90%). Haplogroup D3-P99 and its sublineage D3a-P47 are almost exclusively distributed in Tibeto-Burman populations (Shi et al., 2008), and also found highly frequent in Sherpa (7.14% and 15.48%, respectively). Haplogroup O3a2c1a-M117, one of the three main sublineages of O3, accounts for about 16% of Han Chinese and also exhibits high frequencies in Tibeto-Burman populations (Wang & Li, 2013; Yan et al., 2011). In this study, O3a2c1a-M117 comprises nearly half of Sherpa people (45.24%). The frequencies of another two main components of Sino-Tibetan populations, O3a2c1*-M134 and O3a1c-002611 (Wang et al., 2013; Yan et al., 2011), are negligible in Sherpa (1.19% and 0, respectively). It is particularly noteworthy that Central-South Asia and West Eurasia related haplogroups R1a1a-M17 and J-M304 (Zhong et al., 2011) have also been detected at considerable frequencies in WSC populations, especially R1a1a-M17, which contributes 11.90% of Sherpa. Figure 2(a) illustrates a PCA plot based on 68 populations, including the Sherpa people in this study and 67 reference populations retrieved from literature. Almost all the Tibeto-Burman populations, including Sherpa, cluster together in the middle left corner of the plot, which is accounted for by the extensive sharing of haplogroup D1-M15, D3-P99, and O3a2c1a-M117 among them. The middle and upper right corner depict the Indo-European, Dravidian and Austro-Asiatic populations in the South Asian Subcontinent, due to the high frequency of haplogroup R, L, and H.
The Altaic populations segregate intermediate between the East Asian and South Asian clusters. The Sherpa people slightly tend to deviate from the East Asian cluster owing to its considerable frequency of haplogroup R. We then used the Y-STR induced Rst NJ tree and structure analysis to show the overall clustering pattern of those populations at both the haplogroup and haplotype level. In haplogroup R1a1a-M17, Sherpa is mainly clustered with populations from Afghanistan and India in the NJ tree. However, Sherpa people share most haplotypes with Newar people of Nepal and Brahui people of Pakistan as revealed by the structure plot (Figure 3a). It is worthy to note that only two haplotypes have been identified in ten R1a1a-M17 samples of Sherpa, which suggest a founder effect took place when this lineage was involved in Sherpa or this lineage have experienced bottlenecks.
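The percentages quoted above follow directly from sample counts out of the 84 males. A small sketch in Python checks the one count the excerpt states explicitly (ten R1a1a-M17 samples) and shows how a count can be back-calculated from a reported percentage. The function names are illustrative, and the back-calculated count is an inference from the rounded percentage, not a figure reported in the excerpt.

```python
def frequency_percent(count, sample_size):
    """Haplogroup frequency as a percentage, rounded to two decimals."""
    return round(100 * count / sample_size, 2)

def implied_count(percent, sample_size):
    """Nearest whole number of samples consistent with a reported percentage."""
    return round(percent / 100 * sample_size)

# Ten R1a1a-M17 samples among 84 males, as stated in the excerpt
print(frequency_percent(10, 84))  # 11.9, i.e. the 11.90% quoted

# Back-calculation (an inference): 45.24% of 84 for O3a2c1a-M117
print(implied_count(45.24, 84))  # 38
```

Running the check in both directions is a quick way to catch transcription errors in percentages copied from a paper.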

In haplogroup O3a2c1a-M117, most of the Tibetan populations cluster tightly together in the NJ tree, along with Sherpa and Tamang of Nepal. However, more haplotypes of Sherpa samples share ancestry with Tibetan and other Tibeto-Burman populations from East Asia other than from Nepal (Figure 3b). Similarly, haplotypes of D1-M15 of Sherpa share ancestry with Tibetan, northwestern Han, and Zhuang (included in Tai-Kadai) populations from East Asia, although Sherpa has tended to be segregated away from the Tibetan cluster in the NJ tree. Similarly with R1a1a-M17, the haplotype diversity of D1-M15 samples in Sherpa is very low, and the haplotypes of Sherpa are only a small subset of those of Tibetan populations, probably also due to the founder effect when Sherpa was formed (Figure 3c). As we have mentioned above, haplogroup D3-P99 and D3a-P47 are almost exclusively distributed in Tibeto-Burman populations; but not only that, haplotypes of D3 also show strong similarities among different populations with distinctive and specific seven repeats at locus DYS392 (Figure 3d and Table S1). The haplotypes of J-M304 samples have already given us sufficient information to infer their origin and diffusion, although without typing downstream markers for those samples. The J-M304 samples should be probably assigned into J2 haplogroup and the haplotypes of those samples show strong similarities with those of Indian (in Southwest India) (Chennakrishnaiah et al., 2013), Malaysian Indian (Pamjav et al., 2011), and Lebanese samples (Zalloua et al., 2008). One Sherpa sample has been assigned as paragroup F*, which is observed only infrequently and primarily on the Indian subcontinent (Karafet et al., 2008).

The most common mtDNA haplogroups in Sherpa are A4, C4a3b, M9a1a, D4, and U (including U* and U2a), in order of frequency. The majority of the mtDNA lineages belong to eastern Eurasian specific groups, including those from Northeast Asia (A, D4, D5, G, C, and Z) (Derenko et al., 2003, 2007; Tanaka et al., 2004) and Southern China or Southeast Asia (F, M9, M12 and M13) (Li et al., 2007), accounting for 59.02% and 23.50%, respectively. The South Asian lineages (M5c2, M21d, U* and U2a) also comprise 7.65% of Sherpa people, which is consistent with our previous observation (Kang et al., 2013). The considerable South Asian component of mtDNA lineages in Sherpa also corroborates the above Y chromosome evidence that the northward gene flow from the South Asian Subcontinent into Tibet really happened and has played a non-negligible role in the formation of Sherpa. We used a PCA based on the distribution of mtDNA haplogroup frequencies of 100 populations to show the matrilineal genetic patterns (Figure 2b). Most of the Tibeto-Burman populations, including Sherpa, cluster tightly in the upper left corner of the plot. However, the Indo-European and Dravidian populations have been situated in the lower left corner. Similarly, with the Y chromosome PCA plot, the Altaic populations also aggregate intermediate between the East Asian Tibeto-Burman cluster and the South Asian groups. However, the results based on haplogroup frequency comparisons could be misleading because of the quickly changing frequencies of the mtDNA lineages (Lu et al., 2013). A network analysis of individual lineages will most likely offer a better investigation of maternal relationships among the Sherpa and Himalayan populations.

…

Haplogroup A4, C4a, and M9a comprise more than 60% of Sherpa samples, and the networks of those haplogroups were analyzed based on the HVS-I motif (Figure 4). In haplogroup A4, most haplotypes of Sherpa are shared with Tibeto-Burman, Altaic, and Han Chinese and clustered in the main clade of the network. The Indo-European samples of South Asian Subcontinent are scattered in the terminal nodes throughout the network (Figure 4a), indicating late emergence of A4 into those Indo-European populations. In the network of C4a, nearly all the Sherpa samples cluster together and form a big exclusive clade along with few Nepalese from Katmandu (Figure 4b). Those exclusive haplotypes might represent the ancient component of Sherpa. The initial C4a individuals of Sherpa might have undergone founder events or bottlenecks in their history, and then remained genetically isolated for a long period of time. In the network of M9a, about half Sherpa M9a samples share the root haplotypes with the main ancestral clade, other samples mainly cluster with Indo-European and Tibeto-Burman samples in the terminal small clades (Figure 4c). The star-like networks of M9a and its sublineages have clearly indicated the population expansion of those lineages in Tibeto-Burman populations. The M9a samples of Sherpa and Indo-European populations might probably be results of the expansion of M9a in the Tibet Plateau.

Discussion

Tibeto-Burman origin of Sherpa

About 83% of Sherpa Y chromosomes, including haplogroup C, D, and O, can be assigned an East or Southeast Asian origin. Detailed genetic structures at haplotype level of those lineages reveal strong affinities between Sherpa and Tibeto-Burman populations (especially with Tibetans). From the maternal side, mtDNA lineages that can trace to the East or Southeast Asian origin comprise about 82.5% of Sherpa people, and most of HVS-I haplotypes are shared or closely connected with samples of Tibeto-Burman and Altaic populations. The internal homogeneity observed in some lineages suggests a possible founder effect during the origin of Sherpa, especially for Y chromosome haplogroup D1-M15 and O3a2c1a-M117, mtDNA haplogroup A4 and C4a; that is, Sherpa people of those haplogroups are derived from a small number of migrants from a Tibeto-Burman source population.

Northward gene flow across Himalayas

Considerable South Asian genetic components have been observed in Sherpa. Y chromosome haplogroup R1a1a-M17, J-M304, F*-M89 comprise almost 17% of Sherpa paternal gene pool. However, in the maternal side, M5c2, M21d, U* and U2a only account for 8% of Sherpa people. The obvious sex-biased gene flow might be caused by physiological difference between male and female, especially under the extreme circumstances on the plateau. Paternal lineage R1a1a-M17 might probably experience severe founder effect or bottlenecks in Sherpa. In the previous studies, the negligible gene flow from Indian Subcontinent to Tibetan Plateau has suggested a unidirectional human dispersal pattern in the Himalayan mountain range. However, the considerable northward gene flow across Himalayas has been observed in Sherpa’s case.

…

SNP YDNA updated O tree The north and south divisions are shown at p. 3 here
