The hunt for the cradle of O2, O2b and O3 populations

Y-chromosome haplogroup O2a is negligible in Japan except for a 3.1% frequency in Asahikawa, Hokkaido, which suggests a lack of shared provenance or genetic affinity between the O2 populations in Japan and the Dai or Vietnamese populations. This in turn suggests that Yayoi burial pottery and its makers did not emerge from Thai or Vietnamese populations in ancient times, despite the similarity of the burial pottery traditions.

Burial pottery jar from the Sa Huynh culture, which flourished between 1000 BC and 200 AD in today’s central and southern Vietnam. Source: Wikimedia Commons

We present the frequency distributions of 13 Y-specific STR polymorphisms (DYS19, DXYS156, DYS385, DYS389 I and II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439 and YCAII) and the frequency of the combination of these haplotypes in Vietnamese males.

Viet-Muong total (Cai et al. 2011)
2/27 = 7.4% F-M89(xK-M9)
2/27 = 7.4% O-M175(xO1a-M119, O2a1-M95, O3-M122)
8/27 = 29.6% O3-M122(xM7, M134)
2/27 = 7.4% O3a2c1-M134(xM117)
2/27 = 7.4% O3a2c1a-M117
1/27 = 3.7% O1a-M119(xM110)
10/27 = 37.0% O2a1-M95(xM88)

Kinh from Hanoi, Vietnam (He et al. 2012)
9/76 = 11.8% C2-M217
1/76 = 1.3% K-P131(xN-M231, O-P191, Q1-P36, R-M207)
2/76 = 2.6% N-M231
5/76 = 6.6% O1a1-P203(xM101)
9/76 = 11.8% O2a1-M95(xM88)
23/76 = 30.3% O2a1a-M88
7/76 = 9.2% O3a-P200(xM121, M164, P201, JST002611)
2/76 = 2.6% O3a2-P201(xM7, M134)
8/76 = 10.5% O3a2b-M7
7/76 = 9.2% O3a2c1-M134
2/76 = 2.6% O3a1c-JST002611
1/76 = 1.3% R1a1a-M17

Vietnam (Karafet et al. 2010)
3/70 = 4.3% C2-M217
2/70 = 2.9% D1a1-M15
1/70 = 1.4% J-M304(xJ1-M267, J2-M172)
1/70 = 1.4% J2-M172(xJ2b-M12)
2/70 = 2.9% N-M231 [LLY22g+]
2/70 = 2.9% O3a-P197(xP201, JST002611)
10/70 = 14.3% O3a1c-JST002611
1/70 = 1.4% O3a2-P201(xM7, M134)
4/70 = 5.7% O3a2b-M7
11/70 = 15.7% O3a2c1-M134
4/70 = 5.7% O1a1-P203
1/70 = 1.4% O2-P31(xO2a1-M95, O2b-SRY465)
1/70 = 1.4% O2b-SRY465(x47z)
2/70 = 2.9% O2b1a-47z
5/70 = 7.1% O2a1-M95(xM111)
14/70 = 20.0% O2a1a-M111
5/70 = 7.1% Q1-P36(xM346)
1/70 = 1.4% R1a1a-M17
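
The percentages in the lists above are plain count-over-sample-size ratios. As a minimal sketch, the Viet-Muong figures (Cai et al. 2011) can be re-derived as follows. The dictionary layout and the function name `frequencies` are illustrative choices made here, not something taken from the cited studies.

```python
# Haplogroup counts for the Viet-Muong sample (Cai et al. 2011), as listed above.
viet_muong = {
    "F-M89(xK-M9)": 2,
    "O-M175(xO1a-M119, O2a1-M95, O3-M122)": 2,
    "O3-M122(xM7, M134)": 8,
    "O3a2c1-M134(xM117)": 2,
    "O3a2c1a-M117": 2,
    "O1a-M119(xM110)": 1,
    "O2a1-M95(xM88)": 10,
}


def frequencies(counts):
    """Return {haplogroup: percentage of the sample}, rounded to one decimal."""
    n = sum(counts.values())
    return {hg: round(100 * c / n, 1) for hg, c in counts.items()}


sample_size = sum(viet_muong.values())
freqs = frequencies(viet_muong)

# Print the list in the same "count/n = pct% haplogroup" shape used above.
for hg, pct in freqs.items():
    print(f"{viet_muong[hg]}/{sample_size} = {pct}% {hg}")
```

The same function can be applied to the Kinh (n = 76) and Karafet et al. (n = 70) lists to check that each set of counts adds up to its stated sample size.
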
DYS385: Y-DNA haplogroup O2a is also found, mostly in the northern Thai Hmong
Jatupol Kampuansai and Kullaporn Totsparin
Abstract
DYS385 is a polymorphic marker on the Y chromosome. The distribution of DYS385 genotypes was studied in five Hmong populations residing in the northern part of Thailand. Genotype 13–21 was the most frequently observed in the Hmong. The haplotype diversity of the Hmong (0.8916, n = 90) was lower than that previously reported for the Northern Thai population. The genetic relationships inferred from DYS385 variation do not correlate with Hmong cultural variety (White Hmong and Black Hmong) or with locality. The high polymorphism of DYS385, compared to other loci, makes it very useful for identifying individuals and for paternity testing.
The Hmong are the second largest group of hill tribes in Thailand, with an estimated population of about 111,677 people. The majority of the Hmong migrated slowly southward from China into Laos, then from Laos into northern Thailand, during the first half of the twentieth century. The latest Hmong migration into Thailand occurred about 50-60 years ago.
The haplotype diversity of the Hmong (0.8916) was lower than that previously reported for the Northern Thai (0.9430) (Bhoopat et al. 2003). Consanguineous marriage, together with patrilocal postmarital residence, may be an important factor shaping the low diversity in the Hmong, compared with the inter-ethnic marriage seen in the Northern Thai. However, there is no specific DYS385 variant that distinguishes the Hmong from other ethnic groups. Allele 13, the most frequent allele in the Hmong, also appears at high frequency among East Asian populations. Moreover, the haplotypes 13-17, 13-18 and 13-19, which account for one-third of the Northern Thai (Bhoopat et al. 2003), are observed in most of the Hmong populations. Even though haplotype 13-21 has the highest frequency in the pooled Hmong samples, it is not the most frequent one in Hm1KK and Hm2KL. Thus, no Hmong-specific DYS385 pattern was observed in this study.
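
Haplotype diversity figures such as the 0.8916 and 0.9430 quoted above are commonly computed with Nei's estimator, h = n/(n-1) × (1 - Σ p_i²), where p_i is the frequency of the i-th haplotype in a sample of n chromosomes. The excerpt does not say which estimator the authors used, so the sketch below assumes Nei's formula. The genotype list is hypothetical and serves only to show the calculation.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two chromosomes are needed")
    counts = Counter(haplotypes)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_p2)


# Hypothetical DYS385 genotypes for ten men (NOT data from the study).
toy_sample = ["13-21"] * 4 + ["13-18"] * 3 + ["13-17"] * 2 + ["12-19"]

h = haplotype_diversity(toy_sample)
print(f"haplotype diversity = {h:.4f}")
```

A sample in which every man carries a different genotype gives a value of 1, and a sample with a single shared genotype gives 0, which is why a lower value in the Hmong is read as reduced paternal diversity.
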
Among the Y-STRs, DYS385 shows the highest polymorphism. Its special feature, a pair of duplicated, linked subloci, creates a large number of haplotype variants. Even though distinct DYS385 haplotype frequency distributions are observed among the Hmong, not all population pairs are statistically significantly different, based on linearized Rst values (Table 2). Only Hm3MT is genetically different from all the other Hmong. Most of the populations are closely related to at least one of the other villages. This result suggests that DYS385 diversity alone cannot distinguish genetic differences among Hmong populations. Data from other highly polymorphic Y-STR loci, such as DYS19, DYS460 and DYS461, must be combined to increase the power of discrimination. Interestingly, the Y-STR Haplotype Reference Database (YHRD) (available online at http://www.yhrd.org) suggests an informative forensic Y-STR core set consisting of DYS19, DYS389a/b, DYS390, DYS391, DYS392, DYS393 and DYS385a/b, but the DYS391 locus shows low diversity in the Hmong (Figure 3) and should not be used to distinguish Hmong individuals. This observation leads to our suggestion that, as Y-STR distributions differ between ethnic groups, a survey of genetic marker diversity is still important before establishing a specific Y-STR dataset for forensic purposes in a particular population.
While the evidence shows that O2b did not emerge downstream of the Dai or Hmong populations, it has been theorized that O2b (of Korea and Japan) and O2a (of Southeast Asia) emerged from the same ancestral O2*, which in turn issued from O1; see George van Driem:

Last Glacial Maximum, haplogroup O split up into the subclades O1 (MSY2.2), O2 (M268) and O3 (M122).

The three subclades can be putatively assigned to three geographical loci along an east-west axis without any claim to geographical precision. Whereas haplogroup O1 moved to the drainage of the Pearl River and its tributaries, the bearers of haplogroup O2 moved to southern Yunnan, whilst bearers of haplogroup O3 remained in the Eastern Himalaya. The O2 clade split into O2a (M95) and O2b (M176). Asian rice may have first been domesticated roughly in the area hypothetically imputed to O2 south of the central Yangtze.

The bearers of the subclade O2a became the ancestors of the Austroasiatics, who spread initially to the Salween drainage in northeastern Burma, to northern Thailand and to western Laos. In time, the Austroasiatics would spread as far as the Mekong delta, the Malay peninsula and the Nicobars. Later, early Austroasiatics would introduce both their language and their paternal lineage to indigenous peoples of eastern India, whose descendants are today’s Munda language communities.

Meanwhile, the bearers of the fraternal subclade O2b spread eastward, where they introduced rice agriculture to areas downstream south of the Yangtze. The bearers of the O2b haplogroup continued to sow seed as they continued to move ever further eastward, but they left no linguistic traces. This paternal lineage moved as far as the Korean peninsula and represents the second major wave of peopling attested in the Japanese genome. Yet the Japanese speak a language of the Altaic linguistic phylum, and the peopling of Japan is a distinct episode of prehistory.

At the dawn of the Holocene in the Eastern Himalaya, haplogroup O3 gave rise to the ancestral Trans-Himalayan paternal lineage O3a3c (M134) and the original Hmong-Mien paternal lineage O3a3b (M7). The bearers of haplogroup O3a3c stayed behind in the Eastern Himalaya, whilst bearers of the O3a3b lineage migrated east to settle in areas south of the Yangtze. On their way, the early Hmong-Mien encountered the ancient Austroasiatics, from whom they adopted rice agriculture.

The interaction between ancient Austroasiatics and the early Hmong-Mien not only involved the sharing of rice agriculture technology, but also left high frequencies of haplogroup O2a in today’s Hmong-Mien and haplogroup O3a3b in today’s Austroasiatic populations. The Austroasiatic paternal contribution to Hmong-Mien populations was modest, but the Hmong-Mien paternal contribution to Austroasiatic populations in Southeast Asia was significant. However, the incidence of haplogroup O3a3b in Austroasiatic communities of the Subcontinent is undetectably low. Subsequently, the Hmong-Mien continued to move eastward, as did bearers of haplogroup O2b.

Even further east, the O1 (MSY2.2) paternal lineage gave rise to the O1a (M119) subclade, which moved from the Pearl River to the Min river drainage in the Fujian hill tracts and then across the Taiwan Strait. Formosa consequently became the homeland of the Austronesians. The Malayo-Polynesian expansion via the Philippines into insular Southeast Asia must have entailed the introduction of Austronesian languages by bearers of haplogroup O1a to resident communities, whose original Austroasiatic paternal haplogroup O2a alongside other older paternal lineages would remain dominant even after linguistic assimilation. Similarly, Malagasy is an Austronesian language, but the Malagasy people trace their biological ancestries equally to Borneo and the African mainland.

Back in the Eastern Himalaya, the paternal spread of Trans-Himalayan is preserved in the distribution of Y-chromosomal haplogroup O3a3c (M134). The centre of phylogenetic diversity of the Trans-Himalayan language family is rooted squarely in the Eastern Himalaya, with outliers trailing off towards the loess plains of the Yellow River basin in the northeast.

While van Driem posits a southerly origin of O and O3 on the basis of the Trans-Himalayan distribution of O3a3c, DNA studies of populations in and around the Central Plains show O3 to be securely a northern Han marker. Hengbei emerges as a centre of the northern Han and as a contact zone between northern and southern Han, where the O*, O2a, O3 and N haplogroups are mixed in the population.

Zhao Y-B, Zhang Y, Zhang Q-C, Li H-J, Cui Y-Q, Xu Z, et al. (2015) Ancient DNA Reveals That the Genetic Structure of the Northern Han Chinese Was Shaped Prior to 3,000 Years Ago. PLoS ONE 10(5): e0125676. https://doi.org/10.1371/journal.pone.0125676

Abstract
The Han Chinese are the largest ethnic group in the world, and their origins, development, and expansion are complex. Many genetic studies have shown that Han Chinese can be divided into two distinct groups: northern Han Chinese and southern Han Chinese. The genetic history of the southern Han Chinese has been well studied. However, the genetic history of the northern Han Chinese is still obscure. In order to gain insight into the genetic history of the northern Han Chinese, 89 human remains were sampled from the Hengbei site which is located in the Central Plain and dates back to a key transitional period during the rise of the Han Chinese (approximately 3,000 years ago). We used 64 authentic mtDNA data obtained in this study, 27 Y chromosome SNP data profiles from previously studied Hengbei samples, and genetic datasets of the current Chinese populations and two ancient northern Chinese populations to analyze the relationship between the ancient people of Hengbei and present-day northern Han Chinese. We used a wide range of population genetic analyses, including principal component analyses, shared mtDNA haplotype analyses, and geographic mapping of maternal genetic distances. The results show that the ancient people of Hengbei bore a strong genetic resemblance to present-day northern Han Chinese and were genetically distinct from other present-day Chinese populations and two ancient populations. These findings suggest that the genetic structure of northern Han Chinese was already shaped 3,000 years ago in the Central Plain area.

Introduction
According to historical documents, the generally accepted view is that the Han Chinese can trace their origins to the Huaxia ethnic group, which formed during the Shang and Zhou dynasties (21st–8th centuries BC) in the Central Plain region of China (Fig 1) [2]. During the Han Dynasty (206 BC-220 AD), the Huaxia ethnic group developed into a tribe known as the Han Chinese [3]. Because of their advanced agriculture and technology, this group migrated northward into regions inhabited by many ancient northern ethnic groups that were most likely Altaic in origin [4]. In addition, they migrated south into regions originally inhabited by ancient southern ethnic groups, including those speaking the Daic, Austro-Asiatic, and Hmong-Mien languages [3]. Historically, the Han Chinese dispersed across China, becoming the largest of the 56 officially recognized ethnic groups.

To date, studies of classic genetic markers and microsatellites have revealed that the Han Chinese can be divided into two distinct groups: the northern Han Chinese (NH) and the southern Han Chinese (SH) [9,10]. Based on present-day genetic data from NH, SH, and southern minorities, the genetic history of the SH group has been well studied. The consensus is that the Han Chinese migrated south and contributed greatly to the paternal gene pool of the SH, whereas the Han Chinese and ancient southern ethnic groups both contributed almost equally to the SH maternal gene pool [11]. However, the genetic history of the NH is still obscure. Currently, NH populations inhabit much of northern China, including the Central Plain and many outer regions that were inhabited by ancient northern ethnic groups (Fig 1). The Han Chinese or their ancestors who migrated northward from the Central Plain might have mixed with ancient northern ethnic groups or culturally assimilated the native population. This scenario would indicate that the Han Chinese living in different areas should have genetic profiles that differ from each other. However, genetic analyses have shown that there are no significant differences among the northern Han Chinese populations [12], which has led to conflicting arguments on whether the genetic structure of the NH is the result of an earlier ethnogenesis or, instead, results from a combination of population admixture and continuous migration of the Han Chinese. The addition of ancient DNA analysis on ancient Han Chinese samples provides increased information that can be used to reconstruct recent human evolutionary events in ancient China [13].

Until now, only a few genetic studies have investigated the ancient Han Chinese or their ancestors. These studies have been restricted by small sample sizes [14,15], high levels of kinship among samples [16], and short fragments of mitochondrial DNA (mtDNA) [17,18] and thus provide limited insights into the genetic history of the Han Chinese. Recently, a large number of graves were excavated at a necropolis called Hengbei located in the southern part of Shanxi Province, China, on the Central Plain (Fig 1), that dates back to approximately 3,000 years ago (Zhou dynasty) [19], a key transitional period for the rise of the Han Chinese. In a previous study investigating when haplogroup Q1a1 entered the genetic pool of the Han Chinese, we analyzed Y chromosome single nucleotide polymorphisms (SNPs) from human remains excavated from the Hengbei (HB) site and identified haplogroups for 27 samples [20]. In the present study, we attempted to extract DNA from 89 human remains. Using a combination of Y chromosome SNPs and mtDNA genetic data, we uncover aspects of the genetic structure of the ancient people from the Central Plain region and begin to determine the genetic legacy of the northern Han Chinese in both the maternal and paternal lineages.
Discussion
The Han Chinese originated from the Central Plain region, which is substantially smaller than the region the Han Chinese now occupy. According to historical documents, the Han Chinese suffered many conflicts with natives prior to expansion into their lands [3]. The Han migrated northward into regions inhabited by many ancient northern ethnic groups. Based on their advanced agriculture, technology, and culture, the Han Chinese or their ancestors often had a greater demographic advantage over ancient northern ethnic groups. Thus, the Han Chinese or their ancestors might have played a predominant role in the genetic mixture of populations. This scenario would mean that the genetic structure of the NH was shaped a long time ago. In our study, the HB population showed great genetic affinities with the NH when maternal lineages were tested. First, the HB contained a distribution and component of mtDNA similar to that of the NH and clustered closely together with the NH in the PCA plot. Second, the HB shared more haplotypes with the NH than with other populations in the haplotype-sharing analysis. Third, the FST value from comparisons between the HB and NH populations was lowest and negative. In theory, FST values range between 0 and 1. However, if the estimate of within-group diversity is larger than the estimate of variance among groups, negative FST values can be obtained, and they are treated as equal to zero [48,49]. This indicates that HB bore a very high similarity to NH populations. Considering the location and culture of the HB, we suggest that the HB might have provided a significant contribution to the NH and find that the maternal genetic profiles of the NH were shaped 3,000 years ago.
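
The point about negative FST estimates can be made concrete with a small sketch. It uses the estimator of Hudson, Slatkin and Maddison (1992), FST = 1 - Hw/Hb, where Hw is the mean mismatch between pairs drawn within populations and Hb the mean mismatch between pairs drawn across populations. That choice is an assumption made here for illustration; the excerpt does not say which estimator or software the authors used. The haplogroup labels are toy data, and the function names are hypothetical.

```python
from itertools import combinations, product


def mean_mismatch_within(pop):
    """Proportion of differing pairs among all pairs drawn within one sample."""
    pairs = list(combinations(pop, 2))
    return sum(a != b for a, b in pairs) / len(pairs)


def mean_mismatch_between(pop1, pop2):
    """Proportion of differing pairs when one member comes from each sample."""
    pairs = list(product(pop1, pop2))
    return sum(a != b for a, b in pairs) / len(pairs)


def fst(pop1, pop2):
    """Raw estimate 1 - Hw/Hb; undefined if the two samples share a single haplotype."""
    h_w = (mean_mismatch_within(pop1) + mean_mismatch_within(pop2)) / 2
    h_b = mean_mismatch_between(pop1, pop2)
    return 1 - h_w / h_b


def reported_fst(pop1, pop2):
    """Negative estimates are reported as zero, as described in the text."""
    return max(0.0, fst(pop1, pop2))


# Toy samples (NOT data from the study).
same_a = ["D", "B", "F", "A"]
same_b = ["D", "B", "F", "A"]
apart_a = ["D", "D", "D", "B"]
apart_b = ["F", "F", "F", "A"]

print(fst(same_a, same_b), reported_fst(same_a, same_b))
print(fst(apart_a, apart_b))
```

In the first pair the two samples have the same composition, so within-sample mismatch exceeds between-sample mismatch and the raw estimate falls below zero. In the second pair no haplogroup is shared and the estimate is clearly positive.
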

These conclusions are further supported by the relationship between the HB and NM, XN, and XB. In our study, the PCA plot is consistent with the SH not only mixing with the SM but also with the NH, which is consistent with a previous genetic study that concluded that the SH was formed from almost equal contributions of southward migrating Han Chinese and southern natives [11]. However, the NH and NM group into two separate clusters, which is not consistent with their current geographic distributions because these two populations often live together in the northern region of China. Moreover, XN,XB1 and XB2 pool into the NM and are far away from HB and NH. A haplotype-sharing analysis of the three ancient populations and each present-day Han Chinese population shows that the fraction of haplotypes from HB is significantly higher than that from XN, XB1 and XB2 (all of the p values of HB/XN, HB/XB1 and XB2 are less than 0.01, two-tailed t-test; S4 Fig). In the FST comparisons, the FST values of the XN/HB, XB/HB, XB/NH, XN/NH, and NM/NH are significantly higher, and all of the p values are less than 0.05, indicating that the XN and XB were distinct from the NH and HB (S3 Fig). This finding indicates that the ancient populations of the XN and XB had a limited maternal genetic impact on present-day Han Chinese.

Y chromosome SNP analysis was consistent with the conclusions drawn from studying the maternal lineages. In the paternal lineage, HB contained the haplogroups or sub-haplogroups N, O*, O2a, O3 and Q1a1. The total frequencies of these haplogroups reached high levels (66%–100%) in current Han Chinese [11,27,30,52,53]. Haplogroup Q1a1, which was predominant in HB, is highly specific to the Han Chinese [53]. Haplogroup O3, the second highest frequency (33.34%) in HB, occupies the highest frequencies in almost all current Han Chinese populations (32.5%-76.92%) [11,27,30,52,53]. Moreover, in the PCA plot, HB groups closely with the Han Chinese. These results indicate that the 3,000-year-old ancient people from the Central Plain region share similar paternal genetic profiles with the current Han Chinese. In contrast, XN yielded three haplogroups (N3, Q, and C) but no haplogroup O [54]. The frequency of O in NM is significantly lower than the frequency of O in NH, but the frequency of haplogroup N shows the inverse trend. Moreover, NM has a relatively high frequency of haplogroup R, but NH does not. Thus, the major paternal genetic component of NH was shaped in the Central Plain region of China prior to 3,000 years ago.

According to historical documents, most of the ancient populations that inhabited the northern region of China were nomads. With no permanent settlement, these populations often moved from place to place. In contrast, the ancestors of the Han Chinese were farming people, who often settled down in a region and seldom moved. Following increases in population size, the ancestors of the Han Chinese gradually expanded into the surrounding areas and conflicted with the ancient northern groups. Finally, most of the ancient northern groups gradually disappeared. Because of the large differences in lifestyle and culture between farmers and nomads, most of the ancient northern ethnic populations might have migrated to other areas when they were defeated, and their lands were gradually occupied by the Han Chinese. A similar population replacement model is also found in Europe, where the diffusion of agriculture and language from the Near East was concomitant with a large movement of farmers [13,55–58]. The Han Chinese have the largest population size relative to the populations they admixed with, suggesting a stable genetic structure in the northern Han Chinese for at least the past 3,000 years.

DISTRIBUTION OF MITOCHONDRIAL DNA HAPLOGROUPS

According to a previous study, the haplogroups of the Han Chinese can be classified into the northern East Asian-dominating haplogroups, including A, C, D, G, M8, M9, and Z, and the southern East Asian-dominating haplogroups, including B, F, M7, N*, and R [11]. These haplogroups account for 52.7% and 33.85% of those in the NH, respectively. Among these haplogroups, D, B, F, and A were predominant in the NH, with frequencies of 25.77%, 11.54%, 11.54%, and 8.08%, respectively [11,23,24,28,51]. However, in the SH, the northern and southern East Asian-dominating haplogroups accounted for 35.62% and 51.91%, respectively. The frequencies of haplogroups D, B, F, and A reached 15.68%, 20.85%, 16.29%, and 5.63%, respectively. Notably, in the HB samples, haplogroups D, B, F, and A were also predominant and showed frequencies of 23.44%, 12.5%, 10.93%, and 10.93%, respectively. In addition, the frequency of haplogroup M* was high and reached 17.19%. Other haplogroups such as C, G, M7, M8, M9, Z, N9a and R had lower frequencies at 3.13%, 1.56%, 1.56%, 3.13%, 7.81%, 3.13%, 3.13% and 1.56%, respectively. The northern and southern East Asian-dominating haplogroups account for 50.04% and 26.56%, respectively, which is similar to the values in the NH (S2 Fig).

PRINCIPAL COMPONENT ANALYSIS

To further identify the genetic affinities among the HB, two ancient populations and the present-day Chinese population, represented by 9 NH, 9 NM, 14 SH and 57 SM groups, the mtDNA haplogroup distributions were compared using a PCA. The PCA plot of the first and second components (31.81% of the total variance, Fig 2A) shows that the current populations largely segregate into three main clusters: NH (in orange), SH (in blue) and SM (in gray), and NM (in green). The distribution of populations in the PCA plot was in line with their geographic distribution, and these populations were separated by the first principal component. The populations living in northern China (NH and NM) are located on the right side of the PCA, and they contain the northern East Asian-dominating haplogroups A, C, D, G, M8, M9, and Z. In contrast, the populations living in southern China (SH and SM) are located on the left side of the PCA, and they contain the southern East Asian-dominating haplogroups B, F, M7, and R. Moreover, the NH can be separated from other populations except for two SH (Hubei and Shanghai), using the second principal component. The HB population (PC1 value: 0.071; PC2 value: 1.453) groups closely with the NH (PC1 value: 0.239±0.269; PC2 value: 1.590±0.336). Overall, these results indicate that the HB population shares a similar genetic profile with the NH that is distinct from the NM and ancient northern ethnic groups
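
As a rough illustration of how a PCA of haplogroup frequencies places populations along a component, the sketch below runs a one-component PCA on just three frequency vectors: the pooled NH, SH and HB frequencies of haplogroups D, B, F and A quoted in the haplogroup distribution section. This is far smaller than the paper's analysis, which compared many populations over the full haplogroup set, so the scores will not match the published PC values. The power-iteration routine and all names are choices made here, not the authors' method.

```python
import math

haplogroups = ["D", "B", "F", "A"]
# Pooled mtDNA haplogroup frequencies (%) quoted in the text.
freqs = {
    "NH": [25.77, 11.54, 11.54, 8.08],
    "SH": [15.68, 20.85, 16.29, 5.63],
    "HB": [23.44, 12.50, 10.93, 10.93],
}


def first_principal_component(rows, iterations=200):
    """Return (unit axis, scores) for the leading component via power iteration."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix of the haplogroup frequencies.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    v = [1.0] + [0.0] * (m - 1)  # arbitrary starting direction
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(m)) for c in centered]
    return v, scores


names = list(freqs)
axis, scores = first_principal_component([freqs[k] for k in names])
pc1 = dict(zip(names, scores))

for name in names:
    print(f"{name}: PC1 = {pc1[name]:.2f}")
```

The sign of the axis is arbitrary, so only the relative positions matter. On these four haplogroups HB lands next to NH and well away from SH, in line with the grouping described in the text.
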

O2 and O2b-M176 (relabeled as O1b2-M176) patterns of expansion

Note: Emerging ancient DNA shows a need to modify some of the earlier theories of southerly cradles for the O haplogroups, as well as to update the phylogenetic trees of haplogroup O that were formed prior to aDNA analysis, as seen in:

Wei Deng, Evolution and migration history of the Chinese population inferred from Chinese Y-chromosome evidence J Hum Genet (2004) 49:339–348
DOI 10.1007/s10038-004-0154-3

Banpo Village, Xi’an, Shaanxi Province, is considered by some Chinese to be the cradle of Chinese civilization.

See the updated phylogenetic tree of haplogroup O, especially the positions of O2a and PK4 found in Pakistan. The older literature and studies may all have to be revised substantially.

Kim SH, High frequencies of Y-chromosome haplogroup O2b-SRY465 lineages in Korea: a genetic perspective on the peopling of Korea
The conclusions of the 2011 study should perhaps be revised to reflect the revised and refined positions of Y-DNA O2a and O2b (now O1b2), as well as new aDNA data on the expansion of O3a from the Central Plains into the Liao River Valley…

Yali Xue, Tatiana Zerjal, Weidong Bao, Suling Zhu, Qunfang Shu, Jiujin Xu, Ruofu Du, Songbin Fu, Pu Li, Matthew E. Hurles, Huanming Yang and Chris Tyler-Smith Male Demography in East Asia: A North–South Contrast in Human Population Expansion Times
GENETICS April 1, 2006 vol. 172 no. 4 2431-2439; https://doi.org/10.1534/genetics.105.054270