Using ALDH2 genotype to trace Yayoi ancestry

Hu Li et al., Refined Geographic Distribution of the Oriental ALDH2*504Lys (née 487Lys) Variant

Mitochondrial aldehyde dehydrogenase (ALDH2) is one of the most important enzymes in human alcohol metabolism. The oriental ALDH2*504Lys variant functions as a dominant negative, greatly reducing activity in heterozygotes and abolishing activity in homozygotes. This allele is associated with serious disorders such as alcoholic liver disease, late-onset Alzheimer disease, colorectal cancer, and esophageal cancer, and is best known for protection against alcoholism. Many hundreds of papers in various languages have been published on this variant, providing allele frequency data for many different populations. To develop a highly refined global geographic distribution of ALDH2*504Lys, we have collected new data on 4,091 individuals from 86 population samples and assembled published data on a total of 80,691 individuals from 366 population samples. The allele is essentially absent in all parts of the world except East Asia. The ALDH2*504Lys allele has its highest frequency in Southeast China, and occurs in most areas of China, Japan, Korea, Mongolia, and Indochina with frequencies gradually declining radially from Southeast China. As the indigenous populations in South China have much lower frequencies than the southern Han migrants from Central China, we conclude that ALDH2*504Lys was carried by Han Chinese as they spread throughout East Asia. Esophageal cancer, with its highest incidence in East Asia, may be associated with ALDH2*504Lys because of a toxic effect of increased acetaldehyde in the tissue where ingested ethanol has its highest concentration. While the distributions of esophageal cancer and ALDH2*504Lys do not precisely correlate, that does not disprove the hypothesis. In general, the study of fine-scale geographic distributions of ALDH2*504Lys and diseases may help in understanding the multiple relationships among genes, diseases, environments, and cultures.

Central China Origin of ALDH2*504Lys
The frequency decline from Southeast China to West and North China is quite smooth. The allele frequencies decrease to less than 20% in Southwest and Central China, and to less than 10% in Manchuria, Mongolia, Xinjiang, and Tibet within the broader region of East Asia. In Central Asia and Siberia, beyond the pronounced genetic influence of Han Chinese, the ALDH2*504Lys allele is rare. The allele is also detected in some Iranian populations, which may be explained by diffusion along the Silk Road. We conclude that the spread of ALDH2*504Lys to the north and west was concomitant with the expansion of Han Chinese and diffusion of the allele into surrounding populations.
Although the ALDH2*504Lys allele frequency reaches a peak in Southeastern Chinese populations, we cannot draw the conclusion that this allele originated there. The population history shows clearly that the Hakka and Minnan Chinese presently in Southeast China are descendants of migrants from Central China (Wen et al., 2004). The indigenous populations in South China, such as Hmong-Mien populations (Hmong and She) from the Yangtze River area, and Daic populations (Kam, Laka, Mulam, and Maonan) from the Pearl River area, exhibit much lower frequencies of ALDH2*504Lys. ALDH2*504Lys is almost absent in the aboriginal populations of Hainan and Taiwan, the two largest islands in South China. Therefore, it is unlikely that the Southeast Chinese obtained the ALDH2*504Lys allele from the indigenous populations. Unlike the gradually decreasing frequency to the north and west, the allele frequency drops sharply to the south. The allele exists at low frequency in Peninsular Southeast Asia, and is rare in the Southeast Asian islands. If this allele had originated in the Southeast Chinese populations after they arrived in the present region, the quick expansion of the allele to the north and west could not be explained. Therefore, we conclude that the ALDH2*504Lys allele was most probably carried south by the Han Chinese migrants from Central China, rather than originating in the indigenous populations in the region where it now has the highest frequencies.
Understanding why the present Central China populations exhibit much lower ALDH2*504Lys frequencies than the Southeast China populations is crucial in the study of the history of this allele. Both the decrease in Central China and the increase in Southeast China should be accounted for. In the history of China, many Altaic populations moved from North China to Central China after wars in the 4th, 12th, and 13th centuries, which also resulted in the migration of some Chinese populations from Central China to South China. These Altaic populations later merged with the Central Chinese populations after their kingdoms or dynasties ended. The most famous examples are the Sienbers (Xianbei, founders of the Former Yan Kingdom, Later Yan Kingdom, Western Qin Kingdom, Southern Liang Kingdom and Southern Yan Kingdom of the Sixteen Kingdoms Period, and of the Northern Dynasties), the Huns (founders of the Han-Zhao Kingdom and Northern Liang Kingdom of the Sixteen Kingdoms Period), the Khitans (founders of the Liao Dynasty), and the Jurchens (founders of the Jin Dynasty). Those Altaic migrants may have included very few or no individuals carrying the ALDH2*504Lys allele, because present Altaic populations have a low frequency of the allele. The merging of these Altaic populations could have decreased the proportion of ALDH2*504Lys in the Central Chinese populations. On the other hand, some as yet unknown protective effects of ALDH2*504Lys against diseases might also have contributed to the increased frequency of this allele in Southern Chinese. Since migrations to South China resulted from wars, the refugees may have been subjected to considerable stress, and a selective advantage could have had great impact. We can speculate that the ALDH2*504Lys heterozygotes had an advantage because they tended to drink less alcohol or had some other advantage (Chen et al., 1999).
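The dilution argument in the paragraph above can be illustrated with a simple one-pulse admixture model, in which the merged population's expected allele frequency is the ancestry-weighted average of the two source frequencies. The Python sketch below is only an illustration under that assumption; the function name and all numeric values are hypothetical and are not taken from the paper.

```python
def admixed_frequency(p_resident: float, p_incoming: float, m: float) -> float:
    """Expected allele frequency after a single pulse of admixture.

    p_resident: allele frequency in the resident population before admixture
    p_incoming: allele frequency in the incoming population
    m:          fraction of the merged gene pool contributed by the incomers
    """
    for value in (p_resident, p_incoming, m):
        if not 0.0 <= value <= 1.0:
            raise ValueError("frequencies and admixture fraction must lie in [0, 1]")
    # Weighted average of the two source frequencies.
    return (1.0 - m) * p_resident + m * p_incoming


# Hypothetical illustration: a resident frequency of 30% is diluted by
# incomers who carry no copies of the allele and contribute 25% of ancestry.
diluted = admixed_frequency(p_resident=0.30, p_incoming=0.0, m=0.25)
print(f"expected frequency after admixture: {diluted:.3f}")
```

Under this model, migrants who lack the allele lower the resident frequency in proportion to their ancestry contribution, which is the direction of effect the paragraph proposes for Central China. The model ignores drift and selection.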
The recent appreciation of other metabolic/pharmacologic roles for ALDH2 (Li et al., 2006; Larson et al., 2007; Chen et al., 2008) suggests that if selective factors are responsible for the high ALDH2*2 (ALDH2*504Lys) frequency in East Asia, their nature may be unrelated to the current association with esophageal cancer or ethanol metabolism. Alternative hypotheses of increased resistance to some disease organisms (Enoch and Goldman, 1990; Yokoyama et al., 2001; Oota et al., 2004; Yokoyama & Omori, 2005; Yang et al., 2007; Li et al., 2008) would also explain a clear advantage to heterozygotes. However, positive selection on ALDH2*504Lys cannot be detected statistically using the extended haplotype test (Sabeti et al., 2007), as very low levels of recombination exist in the genomic region of the ALDH2 locus (Oota et al., 2004). Other methods suggest positive selection on ALDH2*504Lys (Long et al., 2006).
Geographic Association with Esophageal Cancer Incidence
Whatever positive selection may have increased the frequency of ALDH2*504Lys, serious diseases such as esophageal cancer or ischemia could act to decrease the ALDH2*504Lys allele frequency among the populations, since studies report that heavy alcohol drinkers who are heterozygotes for ALDH2*504Lys have a higher risk of esophageal cancer (Yokoyama & Omori, 2005; Yang et al., 2007; Li et al., 2008). In addition, ALDH2 activation was shown to reduce ischemic damage to the heart, suggesting that patients with reduced ALDH2 activity may suffer increased damage during cardiac ischemic events or coronary bypass surgery (Chen et al., 2008). The typical age of onset for esophageal cancer in the high incidence area can be earlier than 30 (He et al., 2006). We compared the geographic distribution of esophageal cancer incidence with the ALDH2*504Lys allele and carrier frequency distributions. We collected the male esophageal cancer incidence data of 355 populations from the literature, covering most countries in the world (Table S2). Central and Southeast China were examined in detail. Figure 3 illustrates the world distribution of esophageal cancer incidence and the details in East Asia. The extremely high incidences only appear in East Asia and some populations in Central Asia where the frequency of ALDH2*504Lys carriers is also high. Comparison of Figure 2 and Figure 3 shows that the distributions are far from identical, although the high cancer incidence areas mostly fall into the high frequency area of the derived allele carriers. The acetaldehyde accumulation resulting from ALDH2*504Lys in those who drink alcohol is certainly not the only risk factor for esophageal cancer. As noted above, ALDH2 also has other metabolic functions that could be independently influencing the distribution of ALDH2 variants (Li et al., 2006; Larson et al., 2007). Some environmental factors such as soil and vegetation characteristics and lifestyles may also be associated with the esophageal cancer risk (Wu et al., 2007; Fan et al., 2008; Moradi, 2008).
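The passage above distinguishes allele frequency from carrier frequency. As a rough illustration of how the two relate, the Python sketch below converts an allele frequency into expected genotype and carrier frequencies, assuming Hardy-Weinberg proportions. That assumption, the function names, and the example value are mine and are not stated in the excerpt; the paper may have used observed genotype counts instead.

```python
def genotype_frequencies(q: float) -> dict[str, float]:
    """Expected genotype frequencies under Hardy-Weinberg proportions.

    q is the frequency of the 504Lys allele; the 504Glu allele has frequency 1 - q.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    p = 1.0 - q
    return {
        "Glu/Glu": p * p,      # non-carriers
        "Glu/Lys": 2 * p * q,  # heterozygous carriers
        "Lys/Lys": q * q,      # homozygous carriers
    }


def carrier_frequency(q: float) -> float:
    """Fraction of individuals carrying at least one 504Lys allele."""
    freqs = genotype_frequencies(q)
    return freqs["Glu/Lys"] + freqs["Lys/Lys"]


# Hypothetical allele frequency of 20%, chosen only for illustration.
q_example = 0.20
print(f"carriers: {carrier_frequency(q_example):.2f}")
print(f"heterozygotes: {genotype_frequencies(q_example)['Glu/Lys']:.2f}")
```

Under this assumption the carrier frequency is always higher than the allele frequency, and at moderate allele frequencies most carriers are heterozygotes, the group for which the excerpt reports an elevated esophageal cancer risk among heavy drinkers.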

In Central Chinese populations, the heritability of esophageal cancer is estimated at around 49% (Han et al., 1994; Li et al., 1998). East Asian migrants in America also have a much higher esophageal cancer incidence than European Americans and African Americans (Parkin et al., 1997), indicating the pronounced heritability of esophageal cancer. Therefore, the incidence of esophageal cancer is affected by multiple factors that interact with the ALDH2*504Lys allele frequency in a complex way. That complexity could explain the differences between the distributions of esophageal cancer and the ALDH2*504Lys allele carriers in East Asia.
In most areas of South China and Southeast Asia, the incidence of esophageal cancer is much lower than that observed in Central China, indicating that there are fewer environmental risk factors and lower susceptibility to esophageal cancer in South China. However, there is still a high incidence area in Southeast China, which might be associated with the highest allele frequency of ALDH2*504Lys in exactly the same geographic area. Whereas the high incidence of esophageal cancer in Southeast China may be a consequence of the high ALDH2*504Lys frequency, it is possible that the high incidence of esophageal cancer in Central China is working to decrease the ALDH2*504Lys frequency, while cultural pressure to consume ethanol increases as the impact of *504Lys decreases. The answer depends on which risk factors are most important in which area and how they interact.
In conclusion, we hypothesize that the oriental ALDH2*504Lys variant might have originated in the ancient Han Chinese population in Central China and spread to most areas of East Asia with the expansion of Han Chinese and their genetic influences on neighboring populations over the past few thousand years. Some diseases such as esophageal cancer show a complex relationship with the frequency of ALDH2*504Lys. Where the ALDH2*504Lys frequency is high for whatever reason, as in Southeast China, there is a clearly increased risk of esophageal cancer in heterozygotes that results in higher esophageal cancer incidences in some subregions. In other areas of China there is also an increased risk of esophageal cancer in heterozygotes (Yang et al., 2007; Wu et al., 2001; Chen, 2005; Yang, 2005; Xiao, 2007), but the lower frequency of ALDH2*504Lys is not sufficient to explain the high incidence of esophageal cancer. More genetic epidemiological investigations in China are required to reveal any possible reciprocal relationship between esophageal cancer and the ALDH2*504Lys allele and to identify the other risk factors that appear to be present.


2004 The evolution and population genetics of the ALDH2 locus: Random genetic drift, selection, and low levels of recombination

Effects of Worldwide Population Subdivision on ALDH2 Linkage Disequilibrium

Yayoi Source of ALDH2 vs. Worldwide map of ALDH2 genotypes

Origins of Japanese lecture by Saitou

2017 Takeuchi et al., The fine-scale genetic structure and evolution of the Japanese population

See also Tracing Jomon and Yayoi ancestries in Japan using ALDH2 and JCV virus genotypes