C2 haplogroup (yDNA) and presence of Diego blood group confirm recent branching of Mongolian group from northern China, Korea and Japan

Haplogroup C-M217 map. Source: Haplogroup C (Wikipedia)

The dominant Y-chromosome C2 haplogroup found in Japanese and Ainu populations belongs to the founder group of paternal Mongoloid lineages. This is supported by the presence of Diego blood group antigens and alleles, which correlate with the expansion of Mongol populations.

Haplogroup C is another extremely old lineage that left Africa approximately 60,000 years ago and spread over most of Eurasia. Two subclades of C are found in Japan: C1a1 (aka C-M8, formerly C1) and C2a (aka C-M93, formerly C3). Both are likely to have been in the Japanese archipelago since the first human beings reached the region 35,000 years ago.

Haplogroup C1a seems to have split around 45,000 years ago in the middle of Eurasia, one group going west to Europe, and the other east to Japan. C1a2 was found among the first Palaeolithic Europeans (Cro-Magnons) during the Aurignacian period, and was still relatively common 7,000 years ago, both among Mesolithic West Europeans and Neolithic farmers from Anatolia. C1a2 is now nearly extinct in Europe. C1a1 is particularly common in Okinawa (7%), Shikoku (10%) and Tohoku (10%), but is apparently absent from Hokkaido and Kyushu.

Haplogroup C2a, also representing 3% of the population, is typically found among the Mongols, Manchus, Koreans and Siberians, which suggests a propagation by the Yayoi farmers. The last surviving tribes of ‘pure’ Ainu people, living on the island of Sakhalin in Russia, just north of Hokkaido, possess 15% of C2a (the remaining 85% being D1b). There is therefore a good chance that C2a could also have come to Japan from Siberia through Sakhalin and Hokkaido. C2a is indeed found at both extremities of the country, peaking in Kyushu (8%) and Hokkaido (5%), but is rare in central Japan, which supports the theory of two separate points of entry.

Source: The origins of the Japanese people (Wapedia)

Red cell polymorphisms can provide evidence of human migration and adaptation patterns. In Eurasia, the distribution of Diego blood group system polymorphisms remains unaddressed. To shed light on the dispersal of the Dia antigen, we performed analyses of correlations between the frequencies of DI*01 allele, C2-M217 and C2-M401 Y-chromosome haplotypes ascribed as being of Mongolian-origin and language affiliations, in 75 Eurasian populations including DI*01 frequency data from the HGDP-CEPH panel. We revealed that DI*01 reaches its highest frequency in Mongolia, Turkmenistan and Kyrgyzstan, expanding southward and westward across Asia with Altaic-speaking nomadic carriers of C2-M217, and even more precisely C2-M401, from their homeland presumably in Mongolia, between the third century BCE and the thirteenth century CE. The present study has highlighted the gene-culture co-migration with the demographic movements that occurred during the past two millennia in Central and East Asia.

Source: Florence Petit, et al., The radial expansion of the Diego blood group system polymorphisms in Asia: mark of co-migration with the Mongol conquests 

In Japan, relatively high frequencies of haplogroup C2 (M217) have been found among the Ainu people and in populations from the island of Kyūshū.

Haplogroup C2 (M217) is the most frequently occurring branch of the major Haplogroup C (M130). This haplogroup is believed to have originated in eastern or central Asia, approximately 14700-5100 BC.

M217 is found at high frequencies among Central Asian peoples (such as Mongolians and Kazakhs), indigenous Siberians, and some Native peoples of North America. It is also found particularly in males belonging to peoples such as the Buryats, Evens, Evenks, and Udegeys.

M217 is found at especially high frequencies among the following peoples: Kalmyks, Buryats, Koryaks, Hazaras, Daurs, Manchus, Itelmens, Sibes, Mongolians, and Oroqens. It is also distributed moderately among Ainu of Japan, Han Chinese, Hani and Hui people, other Tungusic peoples, Koreans, Nivkhs, Tuvinians, Altaians, Uzbeks, and Tujia.

The highest frequencies of Haplogroup C2 are found among the populations of Far East Russia and Mongolia. Haplogroup C2 (M217) is the only major clade of Haplogroup C (M130) found among Native Americans, namely among the Na-Dené peoples.

In Hanoi, Vietnam, C2 has been found at low to moderate frequencies.

C2 Subclade distribution

The subclades of C2 are distributed as follows:

M93: observed sporadically in Japanese populations.

Source: Mongoloid haplogroup C2-M217


The Mongolian branch is the younger, more recent branch

The Y-chromosome haplogroup C2c1a1a1-M407 is a predominant paternal lineage in Mongolic-speaking populations, especially in Buryats and Kalmyks. However, the origin and internal phylogeny of C2c1a1a1-M407 have not been investigated in detail. In this study, we analyzed twenty-three Y-chromosome sequences of haplogroup C2c1a1a1-M407 and its most closely related clades. We generated a high-resolution phylogenetic tree of haplogroup C2c1a1a1-M407 and its upstream clade C2c1a1-CTS2657, including 32 subclades and 144 non-private Y-chromosome polymorphisms. We discover that all available C2c1a1a1-M407 samples from Mongolic-speaking populations belong to its newly defined downstream clade C2c1a1a1b-F8465, whereas all samples of C2c1a1-CTS2657(xF8465) come from northern Han Chinese, Korean, and Japanese. Furthermore, we observe that C2c1a1a1b-F8465 and its subclade C2c1a1a1b1-F8536 expanded at approximately 0.86 and 0.44 thousand years ago, respectively. Therefore, we conclude that C2c1a1a1-M407 in Mongolic-speaking populations has originated from northeastern Asia. C2c1a1a1b1-F8536, the newly defined subclade of C2c1a1a1-M407, probably represents the genetic relationships between ancient Oyrats, modern Kalmyks, Mongolians, and Buryats.

Source: Yun-Zhi Huang et al., Whole sequence analysis indicates a recent southern origin of Mongolian Y-chromosome C2c1a1a1-M407. Molecular Genetics and Genomics, June 2018, Volume 293, Issue 3, pp 657–663

Wei LH et al., Whole-sequence analysis indicates that the Y chromosome C2*-Star Cluster traces back to ordinary Mongols, rather than Genghis Khan.
Eur J Hum Genet. 2018 Feb;26(2):230-237. doi: 10.1038/s41431-017-0012-3. Epub 2018 Jan 22.

The Y-chromosome haplogroup C3*-Star Cluster (revised to C2*-ST in this study) was proposed to be the Y-profile of Genghis Khan. Here, we re-examined the origin of C2*-ST and its associations with Genghis Khan and Mongol populations. We analyzed 34 Y-chromosome sequences of haplogroup C2*-ST and its most closely related lineage. We redefined this paternal lineage as C2b1a3a1-F3796 and generated a highly revised phylogenetic tree of the haplogroup, including 36 sub-lineages and 265 non-private Y-chromosome variants. We performed a comprehensive analysis and age estimation of this lineage in eastern Eurasia, including 18,210 individuals from 292 populations. We discovered that the origin of populations with high frequencies of C2*-ST can be traced to either an ancient Niru’un Mongol clan or ordinary Mongol tribes. Importantly, the age of the most recent common ancestor of C2*-ST (2576 years, 95% CI = 1975-3178) and its sub-lineages, and their expansion patterns, are consistent with the diffusion of all Mongolic-speaking populations, rather than Genghis Khan himself or his close male relatives. We concluded that haplogroup C2*-ST is one of the founder paternal lineages of all Mongolic-speaking populations, and direct evidence of an association between C2*-ST and Genghis Khan has yet to be discovered.
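The dating argument in this abstract can be checked with simple arithmetic. The sketch below assumes that the ages are counted back from 2018 (the publication year) and takes Genghis Khan's birth as roughly 1162 CE. Both values are assumptions introduced here for illustration and are not figures from the paper.

```python
# Illustrative check of the dating argument; not the authors' own method.
REFERENCE_YEAR = 2018     # assumption: ages are counted back from the publication year
GENGHIS_BIRTH_CE = 1162   # assumption: commonly cited approximate birth year

TMRCA_YEARS = 2576            # point estimate quoted in the abstract
CI_LOW, CI_HIGH = 1975, 3178  # 95% confidence interval quoted in the abstract


def years_before_reference(year_ce: int) -> int:
    """Convert a calendar year (CE) into years before the reference year."""
    return REFERENCE_YEAR - year_ce


genghis_age = years_before_reference(GENGHIS_BIRTH_CE)

# Even the youngest age allowed by the interval is older than Genghis Khan.
gap_at_lower_bound = CI_LOW - genghis_age
predates_genghis = CI_LOW > genghis_age

print(f"Genghis Khan's birth: about {genghis_age} years before {REFERENCE_YEAR}")
print(f"Youngest TMRCA in the 95% CI: {CI_LOW} years")
print(f"Gap at the lower bound: {gap_at_lower_bound} years")
print(f"Whole interval predates Genghis Khan: {predates_genghis}")
```

Under these assumptions, the entire confidence interval lies well before Genghis Khan's lifetime, which is in line with the abstract's conclusion that the lineage reflects the wider Mongolic-speaking populations.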

C2 is also characteristic of various Turkic peoples; see Joo-Yup Lee and S. Kuang, A Comparative Analysis of Chinese Historical Sources and Y-DNA Studies with Regard to the Early and Medieval Turkic Peoples, Inner Asia 19, no. 2 (2017): 197–239.

“… the official Chinese histories do not view the Turkic peoples such as the Tiele/Uighur, Kök Türks (Tujue) and Qirghiz as belonging to a single uniform entity called ‘Turks’. Instead, they describe them as forming separate identities. … one may still infer from the given genetic data that the early and medieval Turkic peoples possessed dissimilar sets of Y-chromosome haplogroups with different representative haplogroups, some of which were of West Eurasian origin. This means that the various Turkic peoples did not have a common patrilineal origin or uniform physiognomy. Notably, the Xiongnu themselves, whether they were a Turkic-speaking entity or not, were a hybrid people composed of carriers of both East and Inner Eurasian haplogroups C2, N, and Q and West Eurasian haplogroup R1a1.
The analysis of genetic survey data on the Turkic peoples also allows us to speculate on the Turkic Urheimat. We suggest that it was a geographical region where the carriers of haplogroups C2, N, Q and R1a1 could intermix, since these haplogroups are carried by various past and modern-day Turkic peoples in eastern Inner Asia and the Xiongnu. It has been suggested that the early Turkic peoples probably had contact with Indo-European, Uralic, Yeniseian, and Mongolic groups in their formative period (Golden 2006: 139). … drawing on the findings of DNA studies, we are inclined to think that certain similarities that exist between the Turkic languages and the Mongolic, Tungusic and Uralic languages are at least partly associated with haplogroups C2 and N, among others. More specifically, we conjecture that the Turkic languages came into existence as a result of the fusion of Uralic groups (characterized by a high frequency of haplogroup N subclades) and Proto-Mongolic groups (characterized by a high frequency of haplogroup C2) who also merged with other linguistic groups, including Yeniseian speakers (characterized by a high frequency of haplogroup Q like the Kets) and Indo-European speakers (characterized by a high frequency of haplogroups R1a1). The best candidate for the Turkic Urheimat would then be northern and western Mongolia and Tuva, where all these haplogroups could have intermingled, rather than eastern and southern Mongolia or the Yenisei River and the Altai Mountains regions in Russia….”


Komatsu F et al., Prevalence of Diego blood group Dia antigen in Mongolians: comparison with that in Japanese. Transfus Apher Sci. 2004 Apr;30(2):119-24.
The Diego blood group is composed of Di(a) and Di(b) antigens. Prevalence of the Di(a) antigen is known to be different among races. The Di(a) antigen is generally found in Oriental people. Thus, it is called a Mongoloid factor. In Japanese, the prevalence of this antigen is 8.78%. However, the prevalence in Mongolians had not previously been examined. In September of 2002, we determined this antigen among inhabitants of Ulaanbaatar. It was found in 24 of 242 subjects (9.92%). This prevalence approximates that in Japanese. The Rh blood group phenotypes also showed patterns similar to those in Japanese. These results are not contrary to the presumption that Mongolians and Japanese may have a common racial background.
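To see how close the Mongolian and Japanese figures are, the following sketch recomputes the Ulaanbaatar prevalence and places a 95% Wilson score interval around it. The Wilson interval is a choice made here for illustration; the abstract does not say which statistical comparison, if any, the authors used.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Counts quoted in the abstract: 24 Di(a)-positive subjects out of 242.
mongolian_prevalence = 24 / 242
low, high = wilson_interval(24, 242)

# Japanese prevalence quoted in the abstract (no sample size is given there).
JAPANESE_PREVALENCE = 0.0878

print(f"Ulaanbaatar prevalence: {mongolian_prevalence:.2%}")
print(f"95% Wilson interval: {low:.2%} to {high:.2%}")
print(f"Japanese value inside interval: {low < JAPANESE_PREVALENCE < high}")
```

The Japanese value falls inside the interval for the Ulaanbaatar sample, which fits the abstract's cautious wording that the two prevalences approximate each other.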

Diego as a “Mongolian Factor”

In 1956, in a paper published in the journal Nature (5), Layrisse and Arends stated: “Since the Indians of the American continent are considered to be anthropologically related to the Mongolian people of the old world, we decided to investigate the incidence of the Diego Factor in other available representative Asians living in Venezuela”. They tested 100 unrelated males from Canton (China), living in Venezuela, and detected 5 Diego positive individuals (5% of the Chinese tested). They also tested sixty-five unrelated Japanese and found 8 Diego positive subjects (12.5% of the Japanese tested) (6). These findings indicated that the Diego factor was not restricted to South America and suggested that this antigen was a Mongolian rather than an Indian factor.

In the same year, Lewis et al. (7) showed that the Diego antigen was present in 16 of 148 unrelated Chippewa Indians from northern Minnesota and in 6 of 77 unrelated Japanese from Winnipeg. This finding suggested that Diego might be an Asiatic characteristic.
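The counts reported in these two 1956 studies can be turned into comparable percentages with a few lines of code. The labels below are shorthand introduced for this sketch. Note that 8 of 65 works out to about 12.3%, slightly below the 12.5% quoted above; the source of that small difference is not clear from the text.

```python
# (positive, tested) counts as quoted in the text above.
samples = {
    "Chinese from Canton (Layrisse and Arends)": (5, 100),
    "Japanese in Venezuela (Layrisse and Arends)": (8, 65),
    "Chippewa (Lewis et al.)": (16, 148),
    "Japanese from Winnipeg (Lewis et al.)": (6, 77),
}


def percent_positive(positive: int, tested: int) -> float:
    """Share of Diego-positive subjects, in percent, rounded to one decimal."""
    return round(100 * positive / tested, 1)


rates = {label: percent_positive(*counts) for label, counts in samples.items()}

for label, rate in rates.items():
    positive, tested = samples[label]
    print(f"{label}: {positive}/{tested} = {rate}%")
```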

In 1957, Levine and Robinson (8) noted that the studies by Layrisse and his colleagues on the Diego blood factor in other populations, together with the study of Brazilian Indians by Junqueira et al. (3), suggested that the Diego factor could be Mongolian in origin. They further concluded that the term “Indian” was not appropriate for the Diego blood factor.

Many papers showing the distribution of the Dia antigen considered that it was essentially a Mongolian characteristic, absent in Whites, Blacks, Australian aborigines and other populations (10-39). Table 3 shows the incidence of the Dia antigen in the Chinese and Korean populations.

The book “The Distribution of the Human Blood Groups and Other Polymorphisms” (Mourant et al., 1976) (40) is considered the best account of the early worldwide distribution of Dia across populations.

In 1967, thirteen years after the detection of the anti-Dia, Thompson, Childers and Hatcher identified the anti-Dib (41). As the phenotype Di(a–b–) has not been reported yet, we may assume that only two alleles (Dia and Dib) control the Diego blood group system.

Source: The history of the Diego blood group 

Further information:

Haplogroup C2 (M217), the most numerous and widely dispersed C lineage, is believed to have originated in Southeast or Central Asia and to have spread from there into Northern Asia and the Americas.

Haplogroup C-M217 is the modal haplogroup among Mongolians and most indigenous populations of the Russian Far East, such as the Northern Tungusic peoples, Koryaks, and Itelmens. The frequency of Haplogroup C-M217 tends to be negatively correlated with distance from Mongolia and the Russian Far East, but it still comprises more than ten percent of the total Y-chromosome diversity among the Manchus, Koreans, Ainu, and some Turkic peoples of Central Asia. (The subclade C-P39 is common among males of the indigenous North American peoples whose languages belong to the Na-Dené phylum.)

After sharing a most recent common ancestor with Haplogroup C-F3393 approximately 48,400 [95% CI 46,000 to 50,900] years before present, Haplogroup C-M217 is believed to have begun spreading approximately 34,000 [95% CI 31,500 to 36,700] years before present in eastern or central Asia.

The extremely broad distribution of Haplogroup C-M217 Y-chromosomes, coupled with the fact that the ancestral paragroup C is not found among any of the modern Siberian or North American populations among whom Haplogroup C-M217 predominates, makes the determination of the geographical origin of the defining M217 mutation exceedingly difficult. The presence of Haplogroup C-M217 at a low frequency but relatively high diversity throughout East Asia and parts of Southeast Asia makes that region one likely source. In addition, the C-M217 haplotypes found with high frequency among North Asian populations appear to belong to a different genealogical branch from the C-M217 haplotypes found with low frequency among East and Southeast Asians, which suggests that the marginal presence of C-M217 among modern East and Southeast Asian populations may not be due to recent admixture from Northeast or Central Asia.

More precisely, haplogroup C-M217 is now divided into two primary subclades, C-F1067 and C-L1373. C-L1373 has been found often in populations from Central Asia through North Asia to the Americas, and rarely in individuals from some neighboring regions, such as Europe or East Asia. C-L1373 includes C-P39, which has been found at high frequency in samples of some indigenous North American populations, and C-M48, which is especially frequent among modern Tungusic peoples. The predominantly East Asian C-F1067 subsumes a major clade, C-F2613, and a minor clade, C-CTS4660. The minor clade C-CTS4660 has been found in China (namely, a Han from Fujian and a Dai). The major clade C-F2613 has known representatives from China (Han, Dai, Hezhe, Oroqen, Tujia), Japan, Korea, Vietnam, Bhutan, Bangladesh, Mongolia, Kyrgyzstan (Dungan, Kyrgyz), Afghanistan (Hazara, Tajik), Pakistan (Burusho, Hazara), and Chechnya, and includes the populous subclades C-F845, C-CTS2657, and C-Z8440. C-M407, a notable subclade of C-CTS2657, has expanded in a post-Neolithic time frame to include large percentages of modern Buryat, Soyot, and Hamnigan males in Buryatia in addition to many Kalmyks and other Mongols[31][47][18][48] and members of the Qongirat tribe in Kazakhstan (but only 2 or 0.67% of a sample of 300 Korean males).

Distribution of upstream C

Haplogroup C-M130 seems to have come into existence shortly after SNP mutation M168 occurred for the first time, bringing the modern Haplogroup CT into existence, from which Haplogroup CF, and in turn Haplogroup C, derived. This was probably at least 60,000 years ago.

Haplogroup C-M130 attains its highest frequencies among the indigenous populations of Kazakhstan, Mongolia, the Russian Far East, Polynesia, and Australia, and occurs at moderate frequency among Koreans and the Manchu people. It is therefore hypothesized that Haplogroup C-M130 either originated or underwent its longest period of evolution in the greater Central Asian region or in Southwest Asian coastal regions. Its expansion in East Asia is suggested to have started approximately 40,000 years ago.[3]

Males carrying C-M130 are believed to have migrated to the Americas some 6,000-8,000 years ago, and the haplogroup was carried by Na-Dené-speaking peoples to the northwest Pacific coast of North America.

The distribution of Haplogroup C-M130 is generally limited to populations of northern Asia, eastern Asia, Oceania, and the Americas. Due to the tremendous age of Haplogroup C, numerous secondary mutations have had time to accumulate, and many regionally important subbranches of Haplogroup C-M130 have been identified.

C* (M130) was also identified in prehistoric remains, dating from 34,000 years BP, found in Russia and known as “Kostenki 14”.

Up to 46% of Aboriginal Australian males carried either basal C* (C-M130*), C1b2b* (C-M347*) or C1b2b1 (C-M210), before contact with and significant immigration by Europeans, according to a 2015 study by Nagle et al.[9] This figure corresponds to 20.0% of the Y-chromosomes of 657 modern individuals, before 56% of those samples were excluded as “non-indigenous”. C-M130* was apparently carried by up to 2.7% of Aboriginal males before colonisation; 43% carried C-M347, which has not been found outside Australia.[9][10]
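The relationship between the 20.0% and the "up to 46%" figures can be reproduced with a short calculation. The sketch assumes that none of the excluded "non-indigenous" samples carried haplogroup C, so that every haplogroup C chromosome falls within the retained 44%; this is an assumption made here to illustrate the upper bound and is not a statement taken from Nagle et al.

```python
# Figures quoted in the paragraph above.
C_SHARE_OF_ALL_SAMPLES = 0.20   # haplogroup C among all 657 modern samples
EXCLUDED_SHARE = 0.56           # samples set aside as "non-indigenous"
C_M347_PERCENT = 43.0           # C-M347 among Aboriginal males
C_M130_STAR_PERCENT = 2.7       # basal C-M130* among Aboriginal males

# Assumption: all haplogroup C carriers are in the retained (indigenous) part.
retained_share = 1 - EXCLUDED_SHARE
from_totals = 100 * C_SHARE_OF_ALL_SAMPLES / retained_share

# Independent check: add up the two sub-lineage figures.
from_subclades = C_M347_PERCENT + C_M130_STAR_PERCENT

print(f"Upper bound from totals: {from_totals:.1f}%")
print(f"Sum of sub-lineage figures: {from_subclades:.1f}%")
```

Both routes land between 45% and 46%, so the quoted numbers are consistent with one another within rounding.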

Low levels of C-M130* are carried by males:

Earliest settlement in East Asia

The age of haplogroup O in East Asia is no more than 30 thousand years when estimated from sufficient numbers (>7) of STR markers. Therefore, haplogroup O was not the earliest Y chromosome carried by modern humans into East Asia. Haplogroup C-M130 may represent one of the earliest settlements in East Asia. Haplogroup C has a high to moderate frequency in Far East and Oceania, and lower frequency in Europe and the Americas, but is absent in Africa (Figure 1). Several geographically specific subclades of haplogroup C have been identified, that is, C1-M8, C2-M38, C3-M217, C4-M347, C5-M356 and C6-P55 [25]. Haplogroup C3-M217 is the most widespread subclade, and reaches the highest frequencies among the populations of Mongolia and Siberia. Haplogroup C1-M8 is absolutely restricted to the Japanese and Ryukyuans, appearing at a low frequency of about 5% or less.


Source: Inferring human history in East Asia from Y chromosome genes