The settlement of the Americas has been debated for more than 100 years, and questions regarding the timing and spatial patterns of colonization remain open. Phylogenetic studies based on complete human Y chromosome sequences are a highly informative tool for investigating the history of human populations in a given time frame. To study the phylogenetic relationships of Native American lineages and infer the settlement history of the Americas, we analyzed Y chromosome Q Haplogroup, a Pan-American haplogroup that represents practically all Native American lineages in Mesoamerica and South America. We built a phylogenetic tree for Q Haplogroup based on 102 whole Y chromosome sequences, of which 13 new Argentine sequences were provided by our group. Moreover, we identified 1,072 new single nucleotide polymorphisms (SNPs) that contribute to its resolution and diversity. Q-M848 is known to be the most frequent autochthonous sub-haplogroup of the Americas. This is the first genomic study of Q Haplogroup in which current knowledge on Q-M848 sub-lineages is contrasted with the available historical, archaeological and linguistic data. The divergence times, spatial structure and novel SNPs found here for Q-Z780, a less frequent sub-haplogroup autochthonous to the Americas, provide genetic support for a South American settlement before 18,000 years ago. We analyzed how environmental events that occurred during the Younger Dryas period may have affected Native American lineages, and found that these events may have caused a substantial loss of lineages. This could explain the current low frequency of Q-Z780 (and perhaps also of Q-F4674, a possible third sub-haplogroup autochthonous to the Americas). These environmental events could also have acted as a driving force for the expansion and diversification of the Q-M848 sub-lineages, which show a spatial structure that developed during the Younger Dryas period.

Funding: MM was supported by funding from the Consejo Nacional de Investigación Científica y Tecnológica (CONICET; http://www.conicet.gov.ar/?lan=en); the PICT2014 #0396, Agencia Nacional de Promoción Científica y Tecnológica (http://www.agencia.mincyt.gob.ar/); and the Fundación Bunge y Born (https://www.fundacionbyb.org/), Argentina. GB was supported by funding from the PIP2013-2015 #325 of the Consejo Nacional de Investigación Científica y Tecnológica (CONICET; http://www.conicet.gov.ar/?lan=en); and the PICT 2013-2015 #424 of the Agencia Nacional de Promoción Científica y Tecnológica (http://www.agencia.mincyt.gob.ar/), Argentina. CMB was supported by funding from the PIP CONICET 2010 N°1 of the Consejo Nacional de Investigación Científica y Tecnológica (CONICET; http://www.conicet.gov.ar/?lan=en); the PICT 2005 #16-32450; PICT 2008 #715; PICT 2013 #1611; PICT 2015 #2167; and PICT 2017 #27 of the Agencia Nacional de Promoción Científica y Tecnológica (http://www.agencia.mincyt.gob.ar/), Argentina. PBPS, ACM, EJS, JJZ, MC, MS, DRG, EA and ELAG were supported by funding from the Consejo Nacional de Investigación Científica y Tecnológica (CONICET; http://www.conicet.gov.ar/?lan=en); PBPS, DRG, and EA were supported by funding from the Agencia Nacional de Promoción Científica y Tecnológica (http://www.agencia.mincyt.gob.ar/), Argentina. CS and MRS were supported by funding from the Comisión de Investigaciones Científicas de la Provincia de Buenos Aires (www.cic.gba.gob.ar), Argentina. JED was supported by funding from the Universidad Nacional de Jujuy, Argentina (www.unju.edu.ar). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2022 Paz Sepúlveda et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

To provide new insights into the history of American settlement, we present here a complete phylogenetic reassessment of the Y chromosome Q Haplogroup. For the first time in a genomic study of Q Haplogroup, current knowledge on Q-M848 sub-lineages is contrasted with the historical, archaeological, and linguistic data available. We propose a hypothesis for the American settlement based on the divergence times of the Native American lineages and recent archaeological research.

To gain a more integrated view of the settlement of the Americas, it is necessary to analyze the environmental changes that influenced the human populations of the time. The Younger Dryas (YD) was a major, rapid, large-scale climate change detected in the Northern Hemisphere about 12,900–11,600 cal BP [20, 21]. The Younger Dryas impact hypothesis proposes an event at 12,800 cal BP that has been associated with the abrupt YD climatic changes, large-scale megafaunal extinction, and the decline and/or reorganization of human populations [22–24].

The human Y chromosome is used as a highly informative tool to investigate the history of human populations, since it has the longest stretch of non-recombinant DNA in the entire human genome and is transmitted intact from fathers to sons, so that it contains a record of the history of the paternal lineage [16]. Advances in complete Y chromosome sequencing provided by next-generation sequencing (NGS) platforms have made it possible to create robust de novo phylogenetic trees based on a large set of mutational data, where the branch lengths are proportional to the number of SNPs and, therefore, to time [17]. These studies are proving valuable in providing information and new insights into human male history. Y chromosome Q Haplogroup is the only Pan-American haplogroup and represents virtually all Native American lineages in Mesoamerica and South America [6]. The autochthonous Q-M3 sub-haplogroup of Amerindians has been previously described at high frequency [18] and with a star-shaped phylogenetic topology that has been interpreted as reflecting the initial colonization of South America with a rapid expansion ~15 thousand years ago (kya) [7]. Furthermore, it has been observed that Q-M848 sub-lineages (within Q-M3) present a spatial structure in South America that arose as early as ~12.3 kya [5]. Q-Z780 is another autochthonous Native American sub-haplogroup, which occurs at low frequency [19] and is still little studied with genomic data because few sequences are available in databases. A recent report using high-coverage complete sequences dated this lineage to ~17 kya, which was interpreted as reflecting a more complex settlement scenario in the Americas in which the deep branches could reflect a separate out-of-Beringia dispersal after the melting of the glaciers at the end of the Pleistocene [5].

The human settlement of the American continent has been the subject of extensive debate and controversy in the academic community for more than 100 years [1, 2]; central questions related to the time of arrival and to the spatial distribution patterns are still open and under discussion. The scenario most widely accepted by both archaeological and genomic studies proposes a settlement of the American continent with an intermediate chronology, between ~18,500 and 13,000 calibrated years before present (cal BP), and a human entry into South America shortly after [3–8]. Other, more controversial hypotheses suggest a longer chronology with a South American settlement before 18,000 cal BP [9–15].

Results and discussion

We generated new whole Y chromosome sequences from 13 Argentine samples belonging to Q Haplogroup (see Table A in S1 Text), non-randomly chosen to cover the extent of known Y short tandem repeats (STR) diversity [25]. We compared these to 89 samples published worldwide, belonging to Q Haplogroup, including 76 American and 13 Eurasian Y chromosome sequences, resulting in a combined dataset of 102 Y chromosome sequences (for more information about the dataset, see S1 Table).

We analyzed SNPs from the 10.45 Mb of the non-recombinant region of the Y chromosome (NRY) within which reliable genotype calls can be made [26], covered in all 102 modern sequences of this study, resulting in 8,839 high-confidence SNPs (see Methods and S1 Text). We updated the phylogeny for Q Haplogroup, which includes new divergence times estimated in this work (see Fig 1 and S5 Table). Taking all the sequences from this study into account, 19 previously known Q sub-haplogroups were defined; of these, 1 belongs to Q-L275, 1 to Q-F1096, 1 to Q-F4674, 1 to Q-Z780, 1 to Q-L804, and 14 to Q-M848. Moreover, we present here 1,072 novel informative SNPs absent in ISOGG (S2 Table), of which 74 were validated and named Q-GMP1 to Q-GMP74 (see S3 and S4 Tables). GMP51, GMP73 and GMP74 define two new sub-lineages (see S1, S2 Tables and S5 Fig). These new data provide further support for the phylogeny of Q Haplogroup and greatly increase its resolution. Seven of the 13 Argentine sequences presented in this work allowed us to extend sub-haplogroup resolution; of these, 4 belong to Q-M848, 2 to Q-Z780, and 1 to Q-F4674 (see S5 Fig and S1 Table). Moreover, 6 of them, combined with 13 other sequences from databases, are part of phylogenetic branches that are currently polytomic.
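The statement that branch lengths are proportional to SNP counts, and therefore to time, can be illustrated with a minimal strict-clock sketch. This is not the dating procedure used in this study (see Methods for that). The function name `snps_to_years` is hypothetical, and the mutation rate below is an illustrative placeholder that is not taken from this work; only the 10.45 Mb callable region comes from the text.

```python
def snps_to_years(n_snps: float, callable_bp: float, rate_per_bp_per_year: float) -> float:
    """Convert a branch length in SNPs into years under a strict molecular clock.

    Expected SNPs on a branch = rate * callable_bp * years, so
    years = n_snps / (rate * callable_bp).
    """
    if callable_bp <= 0 or rate_per_bp_per_year <= 0:
        raise ValueError("callable_bp and rate_per_bp_per_year must be positive")
    return n_snps / (callable_bp * rate_per_bp_per_year)


# Callable NRY region analyzed in this study (10.45 Mb, from the text).
CALLABLE_BP = 10.45e6

# ASSUMPTION: placeholder mutation rate (per bp per year), for illustration only.
# It is not a value reported in this study; substitute the rate given in Methods.
ASSUMED_RATE = 0.8e-9

years_per_snp = snps_to_years(1, CALLABLE_BP, ASSUMED_RATE)
print(f"Under the assumed rate, one SNP accumulates about every {years_per_snp:.0f} years")
```

Under this simplification, doubling the number of SNPs on a branch doubles its estimated age, which is the sense in which SNP-based branch lengths can be read as time.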

In the following sections, we will discuss the novel results, including the necessary background for a better understanding of our conclusions. The lineages studied in depth in other reports such as: Q-L275 distributed in Eurasia [27]; Q-F1096 in Eurasia, in Athabaskan natives from North America, in Greenland, in ancient Aleutian Islanders, and in ancient Northern Alaskan Athabascans [16, 28–33]; and Q-L804 in Northern Europeans [34, 35], are not analyzed here.

Sub-lineage Q-M346

Q-M346 was dated 25 kya (22–28.3) in the present work, within the range previously reported in the literature [5]. It has been described in Eastern Europe, the Middle East, Central Asia, Eastern Asia, Southern Siberia, and the Americas [18, 32, 36–38]. The most prevalent Native American lineages, Q-Z780 and Q-M3, are derived from Q-M346 [18]. One of its bifurcations is known to separate Q-B28 and Q-L54 [7]. Q-F4674 is a sub-lineage of Q-B28; the latter currently has no clear location in ISOGG [39], though it is the marker most widely described in the literature for this lineage. Q-B28 occurs in Europe and in South and East Asia [6, 39]. Q-F4674 was previously dated 18.3 kya (16.7–19.9) [40]. One of the new Argentine samples sequenced in this study, from San Juan (RUTBE), is presented as a sub-lineage of Q-F4674 along with 2 Sri Lankan samples from the databases (S5 Fig and S1 Table). In turn, the San Juan sample shares Q-Z36057 with 1 of the Sri Lankan individuals; the age of this lineage has not yet been estimated. The occurrence of Q-F4674 in the Americas was recently found and published by our group [38]. In that study, 2 samples from San Juan, including RUTBE, were presented as Q-M346* (derived for Q-M346 and ancestral for Q-L54), very differentiated from each other, and not parentally related. In the present study, we verified that both San Juan samples have Q-F4674 and Q-Z36057. These new results reinforce what was previously proposed by Jurado Medina et al. [38]: Q-M346*, more precisely Q-F4674, would be a third autochthonous sub-haplogroup of the Americas, along with Q-M848 and Q-Z780. Q-F4674 could be part of the gene pool of ancient founding Native American paternal lineages but may not have had as much survival success as the Q-M848 and Q-Z780 sub-lineages [38, 41] (possible reasons for this low frequency are discussed in the Hypothesis on the American Settlement section of this work).
We believe that a higher number of individuals derived for Q-M346* could be found in the American continent, but since studies on Native American lineages that have included Q-M346 have not analyzed Q-L54 [42], Q-M346* sub-lineages are difficult to detect.

Sub-lineage Q-Z780

The Q-Z780 haplogroup is recognized as a Y chromosome founding lineage in the Americas that occurs at low frequency [43]. It is widely distributed, with representatives from Mexico, Peru, Bolivia, Brazil, and Argentina in this study, and its presence has also been reported for Central America, Colombia and Paraguay [6, 19, 36, 38]. Given its low frequency and the scarcity of sequences in databases, little is known about its sub-lineages. According to the best known markers, it can be classified into Q-Z781 and Q-FGC47532 [39], the latter characterizing the ancient Anzick-1 Y chromosome (with a 14C calibrated age of 12,600 cal BP) [44]. Four new SNPs parallel to Q-Z780 and not described in ISOGG were found; one of them was validated and named Q-GMP10 (see S2 and S4 Tables). Two samples added to Q-Z780 in this study (Z8ZMY and S8BAL) allowed its temporal depth to be adjusted to 19.3 kya (17–21.9), older than the 17 kya (15.0–19.3) reported in the literature [5]. Two samples within Q-Z780 are polytomic; 1 of them is a sample from our collection (Z8ZMY), which has 62 private SNPs absent in ISOGG that provide new information on this lineage, 2 of which were validated as Q-GMP13 and Q-GMP14 (see S2 and S4 Tables).

Sub-lineage Q-Z781

Q-Z781 is the most represented sub-lineage of Q-Z780. Given the similar number of Q-Z780 and Q-Z781 sequences, the dates found for both are equal, though older than the 16 kya (14.1–18.1) found by other authors for Q-Z781 [5]. However, older dates, such as 22.9 kya (18.3–27.5), have been obtained from STRs in a large number of samples for this sub-lineage [6]. Q-Z781 branches into Q-Y2816 and Q-YP937. Q-Y2816 is distributed mainly in individuals of Mexican origin [7, 45], and also occurs in an individual from the United States without a defined origin [34]. We found 3 individuals of Mexican origin in this sub-lineage, 2 of whom also share Q-Z782. Q-YP937 is characteristic of South America, with a wide distribution from Peru and Argentina to Brazil. We found 3 new SNPs parallel to Q-YP937 (S2 Table). The dating for this sub-lineage is 18.7 kya (16.5–21.2), which is older than the 12.5 kya (11–14) reported in the literature [6]. We found a new sub-lineage, not described in ISOGG, supported by 2 SNPs named Q-GMP73 and Q-GMP74 (see S6 Fig, S2 and S4 Tables), with a dating of 18.2 kya (16.1–20.6). The phylogenetic association found for this new sub-lineage links Andean individuals with Central-West Argentina at dates for which no archaeological records show such temporal depth for human groups from this region. Further analysis of these findings is provided in the Hypothesis of the American Settlement section.
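The comparisons between the dates estimated here and those in the literature can be made explicit with a small interval-overlap check. The sketch below uses only the point estimates and ranges quoted above for Q-Z780, Q-Z781 and Q-YP937; the names `Estimate` and `intervals_overlap` are hypothetical helpers, and overlap of ranges is only a rough heuristic, not a formal statistical test of agreement.

```python
from typing import NamedTuple


class Estimate(NamedTuple):
    """A divergence-time estimate in kya with its reported range."""
    point: float
    low: float
    high: float


def intervals_overlap(a: Estimate, b: Estimate) -> bool:
    """Return True when the two reported ranges share at least one value."""
    return a.low <= b.high and b.low <= a.high


# (this study, literature) pairs, values in kya as quoted in the text.
comparisons = {
    "Q-Z780": (Estimate(19.3, 17.0, 21.9), Estimate(17.0, 15.0, 19.3)),
    # The text states that the Q-Z781 date equals the Q-Z780 date in this study.
    "Q-Z781": (Estimate(19.3, 17.0, 21.9), Estimate(16.0, 14.1, 18.1)),
    "Q-YP937": (Estimate(18.7, 16.5, 21.2), Estimate(12.5, 11.0, 14.0)),
}

for lineage, (this_study, literature) in comparisons.items():
    status = "overlap" if intervals_overlap(this_study, literature) else "do not overlap"
    print(f"{lineage}: {this_study.point} vs {literature.point} kya, ranges {status}")
```

Read this way, the Q-Z780 and Q-Z781 ranges still overlap with the earlier estimates, whereas the Q-YP937 range lies entirely above the previously reported one.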

Sub-lineage Q-M3

The Q-M3 haplogroup has been previously described as a founder lineage of the Y chromosome in the Americas [46–48] and is the most frequent sub-lineage among present-day Native Americans [18, 49–51]. Although its presence has also been described for some populations from Siberia, it is not known whether these are remnants of the founding lineage or evidence of back-migrations from Beringia to East Asia [37]. The dating found for this marker in the present work is 15.4 kya (13.6–17.4), within the range previously reported in the literature [5]. In recent decades, the internal resolution of Q-M3 has expanded and this lineage is now known to be subdivided into two branches, Q-M848 and Q-Y4308 [5, 6]. Although Q-Y4308 is still underrepresented, it is widely distributed. Its presence has been reported for individuals from the United States in association with speakers of Algonquian languages [52], Eskimo peoples from the extreme Northeast of Asia [52], Mexican individuals [53], and a Tupi-Guarani individual from Southern Brazil reported in this work (S5 Fig), the latter in agreement with previously reported data [5]. Q-M848 is the most represented sub-lineage of the Q Haplogroup in the Americas, and is more frequent in South America than in North America. It has been previously described with a star-shaped topology in which many short branches are connected at the same internal node (S5 Fig) [5–7]. Because Q-M848 is highly represented and Q-Y4308 poorly represented among the samples studied in this work, the datings for Q-M848 and Q-M3 show the same values, within the ranges estimated in the literature [5]. The remains of Kennewick man [54], found on the banks of the Columbia River in the United States, belong to the Q-M848 haplogroup and have been dated to 8,300–9,200 cal BP [5, 55].

Sub-lineage Q-MPB118

The Q-MPB118 sub-lineage was found here in the same Aranã samples from Southeast Brazil and Xavante samples from West Brazil (S6 Fig and S1 Table) as in previous reports [5]. For the moment, this lineage is restricted to Brazilian individuals [56]. We found 6 new SNPs, not yet validated, which are provided in this work as new information on this lineage (S2 Table). The dating found here for this node is 9.7 kya (8.5–11) (S5 Table), similar to previously reported estimates [5]. Since the aim of the present study is to reconstruct the history of the lineages belonging to Q-M848, in section 6 of S1 Text we present further information for each of these sub-lineages regarding the history of their ethnic groups, their linguistic family, and the region they inhabit or inhabited. Q-MPB118 supports a lineage ancestry shared between native Xavante and Aranã groups, both of the Macro-Jê linguistic trunk. Since its differentiation (~9.7 kya), this lineage has been present among human groups from Central-West and Southeast Brazil, although further study of its distribution is still necessary. For a schematic representation of the geographic distribution of Q-MPB118, see S4B Fig.

Sub-lineage Q-SK281

Q-SK281 currently appears to be restricted to Peruvian individuals (S6 Fig) and has been dated 12.6 kya (12.1–13.1) [57]. This study provides 19 new SNPs for its sub-lineage defined by Q-Z35727 (S2 Table). For a schematic representation of the geographic distribution of Q-SK281, see S2C Fig.

Sub-lineage Q-MPB139

Uro samples from Peru and Pasto samples from Ecuador were found here to be supported by Q-MPB139, in agreement with previous reports [5]. The dating found in the present study was 14 kya (12.4–15.9), similar to the one previously reported [5]. Q-MPB139 shows a shared lineage ancestry between the Uros of the Peruvian Altiplano and the Pasto of the Ecuadorian Altiplano, indicating a great temporal depth and vast movements of human groups between the Central and North Andean Areas around 14 kya (12.4–15.9). For a schematic representation of the geographic distribution of Q-MPB139, see S1A Fig. In this lineage the Uro individual is separated from other lineages characteristic of Peruvians and individuals of the Central Andes (such as Q-SK281, Q-Z6658, Q-Y788, Q-Z5908, Q-Z35841, and Q-Z5906), providing genetic support for anthropological and linguistic hypotheses that consider the Uro ethnic group different from neighboring ethnic groups (such as Aymara and Quechua), with its own language, traditions, beliefs, and ways of hunting, fishing, gathering and farming [58, 59]. Previous studies carried out with Y chromosome microsatellites found that the Uros have exclusive lineages different from Aymara, Quechua, and Arawak haplotypes [42], with whom they have been associated in other studies [60]. According to some researchers, the Uros were the first settlers of the Andean Altiplano; however, their origin is unknown and is currently the subject of academic debate [58, 61–64].

Sub-lineage Q-B46

Q-B46 has been previously described as characteristic of Colla individuals from Salta [8, 52]. We corroborate this here and extend its distribution to Northwestern Argentina, since it is present in a sample from this region (S5 Fig and S1 Table). In the present work we contribute 2 new SNPs absent in ISOGG and equivalent to Q-B46, validated as Q-GMP21 and Q-GMP22. Moreover, we found 100 new SNPs not reported in ISOGG that are private to the Catamarca sample of our collection and for which no data are available in the Colla sample from the databases. Three of these SNPs were validated as GMP23 to GMP25 (see S2 and S4 Tables). Although this lineage still has very few sequences and its dating has not been established, this phylogenetic relationship is to be expected, since the autochthonous peoples of the Northwestern region of Argentina have remained permanently related to each other since ancient times through exchange, trade, migration, and the promotion of their artistic and craftwork materials [65].

Sub-lineage Q-Z35505 / Q-Z35497 / Q-B43

So far, the Q-B43 sub-lineage has been described by other authors in Wichi individuals from Salta, Argentina [8], and in individuals from Paraguay and Brazil [6]. The present study includes an individual from the Paresí community, from Mato Grosso, Brazil, obtained from the databases [5], as well as another individual from Salta, from our collection. The phylogenetic relationship found in this work for 3 individuals from East Salta, Argentina, was supported by Q-Z35505, parallel to Q-B43 [39]. Moreover, one of these samples from the East of Salta, from our collection, shares 26 SNPs with the individual from the Paresí community, among which Q-Z35497 is also parallel to the 2 markers mentioned above [39]. We contribute 5 new SNPs to this sub-lineage (S2 Table), 4 of them validated as Q-GMP26 to Q-GMP29 (S4 Table). We also found 50 new private SNPs for our Salta sample of this lineage (S2 Table), 4 of them validated as Q-GMP30 to Q-GMP33 (S4 Table). The dotted line in S5 Fig for this sub-lineage means that further studies are required to define it better; here we encountered some difficulties due to the large amount of missing data in samples for which complete sequences are not available and which are in VCF format (see Section 3 in S1 Text and S1 Table). The dating of this lineage had been previously estimated as 1.5 kya (0.9–2.1) [6], calculated between only 2 samples (GS000016946-ASM and GS000016945-ASM) for which the complete sequence is not available and whose data come from VCF files (see section 3 in S1 Text and S1 Table). The dating found in this work, calculated between only two samples for which the complete sequence is available and which have greater sequencing coverage (N87FK8 and GRC14349596_S) (see section 2 in S1 Text and S1 Table), is 9.6 kya (8.4–10.8). We believe that our results could reflect a temporal estimate more in line with the wider geographic distribution found here for this lineage.
On the other hand, the estimate of ~1.5 kya [6] could reflect some internal sub-lineage with regional differentiation in Argentina’s Northwest that is also characteristic of the Wichi community; this could be better defined in the future with the incorporation of more samples into this lineage. In this study we present genetic evidence that associates, within the same sub-lineage (Q-Z35505/Q-Z35497), Mataguayan-speaking individuals from Gran Chaco and Arawak-speaking individuals from the Mato Grosso region, bordering the Gran Chaco (for more information see section 6 in S1 Text). In this regard, it has been previously argued that Mataguayan-speaking populations may have moved to the Southeast due to pressure from Amazonian groups speaking Arawak languages [66]. In fact, some sort of exchange must have taken place between Mataguayos and local Arawak farmers before their settling in the area, since some archaeological sites in the Gran Chaco reveal similar but more rudimentary decorated pottery [67]. We present genetic support for these hypotheses, adding a temporal depth for Q-Z35505/Q-Z35497 of ~9.6 kya. We cannot determine whether Mataguayan-speaking and Arawak-speaking communities have a common origin or whether the two groups have different origins and later came into contact and admixed, leaving shared genetic traits. The dates found suggest that Gran Chaco could have been inhabited earlier than estimated [68].

Sub-lineage Q-Z6658 / Q-Z5915

This lineage is currently restricted to individuals from Peru and has been previously described [5, 6, 69]. A dating of 12.0 kya (9.5–14.7) has been reported in the literature [69].

Sub-lineage Q-B42

The marker Q-B42 has been previously described as ancestral to Q-B43 (parallel to Q-Z35505) and Q-B46 [8]. Q-B42 is known to be a recurrent mutation that is also used to describe a sub-lineage belonging to the European R haplogroup (R1b1a1b1a1a2c1a3a2a1d3). For this reason, the ISOGG platform does not include this marker within Q Haplogroup, but it is still used in current works on the phylogenetic reconstruction of Q Haplogroup [6]. In the present study, Q-B42 is present among individuals belonging to the sub-lineages Q-B46, Q-Z35505, and Q-Z6658 (discussed above), but it is absent in some individuals within the last two sub-lineages (see S1 Table). S5 Fig proposes the position of Q-B42 based on these results; it is represented with a dotted line to indicate that these findings should be studied further. The two new high-coverage complete sequences belonging to Q-B42 provided in this work allow its dating to be adjusted to 14.2 kya (12.6–16.2), older than the 10.1 kya (8.4–11.8) found in the literature [6]. Q-B42 occurs among individuals from Peru, Northwest Argentina, and Central Chaco. For the Huaca Prieta archaeological site, located near the Pacific coast in Northern Peru, radiocarbon dating indicates intermittent human presence between ~15,000 and 8,000 cal BP [70]. In Northwestern Argentina, several sites date from ~12.0 kya and possibly as early as ~12.8 kya [71]. As proof of the influence on Northwestern Argentina of the first civilizations of the Andean highlands, such as the pre-Inca culture of Tiwanaku, located near Titicaca Lake within the current territory of Bolivia, their cultural legacy has been found in Peru, Chile, and Northwestern Argentina [72]. The Collasuyu was part of the Inca Empire (Tawantinsuyu) and expanded to the Argentine Northwest [73].
Q-B42 could have differentiated in the central Andean region and, at 14.2 kya (12.6–16.2), could be one of the oldest Q-M848 sub-lineages, being part of the gene pool of cultures that settled in this region. This sub-lineage shows links and gene flow among Andean, Chaco, and Amazonian groups, in accordance with archaeological studies that have found cultural evidence showing that Chaco human groups received peripheral influence, both Andean and Amazonian [74]. The phylogenetic relationships found in the present study for Q-B42 and Q-Z35497 provide genetic support for these findings. The characteristics of the Chaco territory, with fluctuating seasonality in relation to the flood levels of the land, may not have been an obstacle to constant interrelation among the human groups of these regions. For a schematic representation of the geographic distribution of Q-B42 and its sub-lineages, see S1C Fig.

Sub-lineage Q-CTS2731

The Q-CTS2731 lineage is currently restricted to native populations of the United States and Mexico [52, 75]. We have found this lineage in 4 Mexican samples from the databases (S1 Table), in agreement with previous reports [5, 6]. The estimate found in the literature for this lineage is 12.4 kya (10.6–14.3) [75]. We contribute 2 new SNPs parallel to Q-CTS2731; 2 new SNPs parallel to the Q-CTS8571 sub-lineage; and 43 new SNPs parallel to the Q-Y26467 sub-lineage, all absent in ISOGG (S2 Table). Q-Y26467 has been described as characteristic of the Zapotec male population of Mexico [6, 52], which has also been observed in the present study (S1 Table). The dating found in this work for this node is 0.6 kya (0.53–0.68), within the range previously reported [6, 76]. The timing of the first human occupation in Mexico is currently under discussion. Archaeological studies consider that Mexico shows consistent evidence of human occupation from at least 40–30 kya [15, 77–81], but dating for that period is controversial. There is less discussion, however, about the scattered sites that are common in Mexico for the 10.5–13 kya period [80, 81]. It is known that the writing style used in the Central Valleys of Oaxaca, belonging to the Zapotec scribal tradition, constitutes the earliest evidence of writing in the American continent. The first tangible manifestations of the graphic system can be dated to approximately 600 years before the Common Era [82]. These archaeological dates are consistent with those found for the Zapotec sub-lineage Q-Y26467. For a schematic representation of the geographic distribution of Q-CTS2731 and its sub-lineages, see S2A Fig.

Sub-lineage Q-CTS11357 / Q-M925

This lineage is known to have a wide distribution, from the Southwestern United States through Mexico and Central America to South America [5, 6, 52, 83]. In the present study, Q-CTS11357 is widely represented by 7 individuals from Mexico, 2 from Colombia, and one from Brazil, in accordance with previously reported data [5, 6]. The dating found in this work for this lineage is 11.3 kya (10.3–13.2), close to that found by other authors [5, 6, 83]. We contributed 4 new SNPs to Q-CTS11357 sub-lineages, absent in ISOGG (S2 Table). Q-CTS11357 is classified into the following sub-lineages. Q-Z5917 is found in individuals from Colombia in the present study and in individuals from Panama and Nicaragua from the databases [6, 84]. Q-SK1974 (equivalent to Q-Y26547) [39] is currently restricted to individuals from Brazil; in fact, we found this sub-lineage in a Brazilian individual from the database, belonging to the Karitiana ethnic group [5, 6, 85]. Q-BZ4012 is at the moment restricted to individuals from the United States [83], is absent in ISOGG, and no data are reported for this sub-lineage in this study. Q-CTS11330 has been described in this and other research works as characteristic of Mexican individuals [5, 6], though it has also been found in one individual from San Salvador de Jujuy [83]. The dating found in this study for Q-CTS11330 is 8.4 kya (7.4–9.6), close to the literature estimates [6]. The Q-CTS11357 lineage is represented by individuals of the Pima and Nahua ethnic groups, speakers of Uto-Aztecan languages, and of the Karitiana ethnic group, speakers of the only remnant of the Arikém linguistic family, a sub-family of the Tupí linguistic trunk (see S1 Table and section 6 in S1 Text). For a schematic representation of the geographic distribution of Q-CTS11357 and its sub-lineages, see S3B Fig.
The Q-CTS11357 lineage shows that approximately 11.3 kya (10.3–13.2) there was a population focus in Mexico that extended to the Southwestern United States and Central America, reaching Colombia and the Brazilian Amazon. At present, this lineage finds its greatest representation and differentiation in Mexico, with Uto-Aztecan-speaking representatives (see S1 Table). This evidence could provide genetic support for previous hypotheses suggesting that the Proto-Uto-Aztecan-speaking community could have formed in Central Mexico, being one of the drivers of the primary domestication of maize [86, 87]. Its expansion towards North America and the Amazon could have been driven by demographic pressure resulting from a growing commitment to the cultivation of this grass. The phylogenetic links found for this lineage are also in agreement with studies on the genetic diversity of maize using contemporary and archaeological samples [88], which show that the maize used by Brazilian indigenous populations, including those from the Amazon, is genetically closer to maize samples from Mexico than to those from other regions such as the Andes. Q-CTS11357 indicates a shared lineage ancestry between Uto-Aztecan- and Arikém-speaking human groups; given its temporal depth, it is likely that this lineage formed part of the gene pool of both the proto-Uto-Aztecan and proto-Tupí speakers. It is not possible to determine whether both groups have a common origin or, having different origins, left shared genetic traits due to their geographic expansion.

Sub-lineage Q-Y27993/Q-Y27992 Currently, this lineage has been reported for individuals from Mexico and Argentina [6, 89]. The ISOGG platform defines Q-Y27993 and Q-Y27992 as parallel; we have found Q-Y27993 in a Mexican sample from the databases and in an Argentine individual belonging to the Chané ethnic group from Salta (S1 Table). Q-Y27992 occurs in a Mixtec individual from Oaxaca, but we have not found any marker shared by the three samples; therefore, in Fig 1 and S5 Fig it is represented with dotted lines, since further study is needed to determine this link. The time estimate found for this lineage is 16.1 kya (14.2–18.2) (see S5 Table), older than that found in the literature [6, 89]. We consider that the current dating calculations for this lineage are subject to biases due to low sample size and lineage resolution. If the three samples are not a monophyletic group, and Q-Y27992 and Q-Y27993 therefore do not belong to the same sub-lineage, then the dating found in this study would be an error. Conversely, if they really are a monophyletic group, our results could indicate that it is one of the oldest lineages of Q-M848, which would call the literature dating into question. However, given that the status of Q-Y27993/Q-Y27992 still requires further studies and higher resolution, in S5 Fig we consider the dating calculated in the literature, 12.6 kya (12.1–13.1) [89]. For a schematic representation of the geographic distribution of Q-Y27993/Q-Y27992, see S2B Fig. Q-Y27993 occurs among individuals from Mexico and Chané from Northern Argentina. The language spoken by the Chané belongs to the Arawak linguistic group, one of the largest and most dispersed linguistic families in the Americas [90]. Q-Y27993 provides genetic support for the links found between Arawak-speaking individuals and Mexican communities, though further studies would be necessary to determine the ethnic relationship found between these regions.
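The discrepancy between the two date estimates for Q-Y27993/Q-Y27992 can be made explicit with a simple interval check. The following is a minimal Python sketch and is not part of the dating procedure used in this study; the helper name `intervals_overlap` is hypothetical, and the ranges given in parentheses in the text are treated here simply as closed intervals in kya, without assuming how they were statistically defined.

```python
def intervals_overlap(a, b):
    """Return True if closed intervals a=(low, high) and b=(low, high) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Ranges quoted in the text, in kya (thousand years ago).
this_study = (14.2, 18.2)  # point estimate 16.1 kya (S5 Table)
literature = (12.1, 13.1)  # point estimate 12.6 kya [89]

overlap = intervals_overlap(this_study, literature)

# Distance between the nearest bounds of the two ranges, in kya.
gap = round(this_study[0] - literature[1], 1)

print(f"ranges overlap: {overlap}; gap between nearest bounds: {gap} kya")
```

Under these assumptions the two ranges do not overlap, which is consistent with the statement that the estimate obtained here is older than the literature value and should be treated with caution until the monophyly of the three samples is resolved.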

Sub-lineage Q-Z19357 This lineage has been reported before in Peru [91] and in Argentine individuals of the Colla ethnic group from Salta province [6]. In this study we corroborate this previous distribution and add a Brazilian individual from a database, belonging to the Maxakalí ethnic group from Minas Gerais, as a new contribution to this sub-lineage (S1 Table and S5 Fig). For a schematic representation of the geographic distribution of Q-Z19357 and its sub-lineages, see S3C Fig. Q-Z19357 provides evidence of a shared lineage ancestry among individuals from Andean Peru, Northwestern Argentina, and the Brazilian Maxakalí ethnic group, with a temporal depth of 8.1 kya (6.7–9.5), as reported in the literature [6]. We cannot determine whether these groups have a common origin or have different origins and were later linked and admixed, leaving shared genetic traits. The greatest current diversity for this lineage occurs among Andean individuals, so it is likely that this sub-lineage differentiated among Andean human groups, perhaps within the current territory of Peru, a region known for being the cradle of great South American civilizations, and later extended its ties to the Macro-Jê-speaking communities of Brazil, to which the Maxakalí groups belong. These people probably also established relationships with Chaco human groups, because this area connects the Andean region with the Brazilian Cerrado Ecoregion, which was extensively inhabited by Macro-Jê speakers before European colonization. In this regard, linguistic studies on languages of the Guaicurú family (spoken by Mocovíes, Toba, Pilagás, and Caduveos), typical of the Chaco region and Mato Grosso do Sul, have shown some grammatical morphemes similar to elements of languages belonging to the Macro-Jê linguistic trunk, widely spread throughout the Central and Eastern regions of Brazil [92, 93].
A larger number of Chaco samples should be analyzed in Y chromosome genomic studies to better understand the human links among these regions.

Sub-lineage Q-MPB016 The phylogenetic relationship found in this work between an Ecuadorian individual of the Cañari ethnic group and a Brazilian individual of the Hupda ethnic group agrees with that previously described [5]. In the present work we contribute 7 new SNPs shared between both samples, not described in the literature and absent in ISOGG (see S2 Table). The dating found in this study for this sub-lineage is 11.2 kya (9.9–12.7), within the range estimated before [5]. Q-MPB016 provides evidence of a shared lineage ancestry between the Cañari ethnic group of Ecuador and the Hupda ethnic group of Northwestern Amazonia in Brazil, with a temporal depth of 11.2 kya (9.9–12.7). We cannot determine whether these groups have a common origin, or whether they had different origins and were then linked and admixed, leaving shared genetic traits. However, the genetic evidence that the Cañari lineage is separate from the characteristic lineages of Peru (such as Q-SK281 or Q-Z6658) would indicate that the Cañaris managed to preserve their ancestral lineage despite the Inca and Spanish conquest processes. The same is observed for the Hupda ethnic group, which presents a lineage differentiated from those found for neighboring Amazonian ethnic groups such as the Arawak (such as Q-Z35497 or Q-Y27993). The links between Cañari and Hupda groups could also be useful for the reconstruction of their ancient history. For a schematic representation of the geographic distribution of Q-MPB016, see S3C Fig.

Sub-lineage Q-Z5908/Q-B48 In the present study, Q-Z5908 was found shared among 6 individuals from Peru, as previously reported [5, 52, 94], and 2 individuals from the Province of Salta, Argentina, one belonging to the Colla ethnic group and the other from the town of Cachi, both also previously described for this lineage [6, 8]. A new Argentine sample in our collection from La Quiaca, Jujuy Province, is added to this sub-lineage as a novelty in this study (S1 Table and S5 Fig). We have found 69 new SNPs for this lineage, absent in ISOGG (S2 Table), one of which is equivalent to Q-Z5912 and another one to Q-Z5910. We have also described a new sub-lineage derived from Q-Z5908, defined by Q-GMP51 (S4 Table and S5 Fig). The remaining 66 SNPs are private to the new sample in our collection, and 7 of them, named Q-GMP52 to Q-GMP58, were validated (see S2 and S4 Tables). Furthermore, the incorporation of a new high-coverage complete sequence into this sub-lineage allows a new estimate of its dating, with values of 13.6 kya (12.0–15.4), older than those reported in the literature [6]. The phylogenetic relationships found for Q-Z5908 show links among human groups from the Central Andes, extending through the territories that today are part of Peru and Northwestern Argentina, with a regional differentiation and defined spatial structure (~13.6 kya). These links resemble what was previously discussed for the Q-B42 lineage in human groups of these regions. For a schematic representation of the geographic distribution of Q-Z5908, see S1B Fig.

Sub-lineage Q-Z35841 In the present study, this lineage has been found in an Argentine individual from the town of Cachi in the province of Salta, as reported in the literature [8], and in a Peruvian individual previously described as part of this lineage [95] (S1 Table and S5 Fig). At present, no study has been able to date it. The phylogenetic relationship found for this clade between a Peruvian individual and one from Salta provides further evidence of the movements and gene flow observed among human groups of the Central Andes, extending through Northwestern Argentina, reinforcing what was found and discussed for Q-Z5908 and Q-B42. For a schematic representation of the geographic distribution of Q-Z35841, see S4C Fig.

Sub-lineage Q-Z5906 The Q-Z5906 sub-lineage includes 11 individuals, of which 5 are Peruvians and 6 are individuals from Northwestern Argentina (S1 Table and S6 Fig). This lineage has been described in the literature as characteristic of individuals from Peru and Bolivia, and of Calchaquí communities and Colla ethnic groups of Argentina [6, 8, 52, 96]. In the present study, 2 new sequences of Argentine individuals are contributed to this sub-lineage, both from La Quiaca, province of Jujuy (see S1 Table and Table A in S1 Text). We have found 30 new SNPs for this lineage, absent in ISOGG, of which 8 are equivalent to Q-B35 and 5 were validated as Q-GMP65 to Q-GMP69; downstream, we found a new sub-lineage validated as Q-GMP70, with 3 other equivalent SNPs, 2 of which were validated as Q-GMP71 and Q-GMP72 (see S2 and S4 Tables). We also detected 2 Q-Z5907 equivalents, and the remaining 16 were private to the new samples of this lineage (S2 Table). The gene flow between Andean human groups of Peru and Northwestern Argentina is reflected once again by Q-Z5906 and its sub-lineages Q-B35, Q-GMP70, and Q-Z5907 (S6 Fig), similar to the links found for the Q-Z35841, Q-Z5908, and Q-B42 lineages, discussed above for human groups of these regions. This lineage was reported in the literature with an estimated dating of 12.88 kya (11.38–14.57) [5]. The present study determined datings of 2.4 kya (2.1–2.7) and 1.7 kya (1.5–1.9) for the derived sub-lineages Q-GMP70 and Q-Z5907, respectively (S5 Table). This indicates a great temporal depth for this lineage but with more recent regional differentiation across the large area between Peru and Northwestern Argentina, which shows the constant interaction and gene flow of these groups over thousands of years. For a schematic representation of the geographic distribution of Q-Z5906 and its sub-lineages, see S2A Fig.
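The breakdown of the 30 new SNPs reported for Q-Z5906 can be checked with a short tally. This is a minimal Python sketch for bookkeeping only; the category labels are ours, and it assumes that the SNP defining Q-GMP70 is counted once in addition to its 3 equivalents, which is the reading under which the "remaining 16" private SNPs add up to the stated total.

```python
# Counts quoted in the text for the new SNPs found in Q-Z5906.
reported_total = 30

categories = {
    "equivalent to Q-B35": 8,
    "defining Q-GMP70 (assumed to count as one SNP)": 1,
    "equivalent to Q-GMP70": 3,
    "equivalent to Q-Z5907": 2,
    "private to the new samples": 16,
}

tally = sum(categories.values())

# Validated markers named in the text: Q-GMP65..69, Q-GMP70, Q-GMP71..72.
validated = [f"Q-GMP{n}" for n in range(65, 73)]

print(f"tally = {tally}, matches reported total: {tally == reported_total}")
print(f"validated markers: {len(validated)}")
```

The tally matches the reported total only under the stated assumption about Q-GMP70; S2 and S4 Tables remain the authoritative source for the per-SNP assignments.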