The Neanderthals who started World War I

Disclosure: I consult for the DNA Diagnostics Center.

The utopian view of genetic genealogy


The utopian view of genetic genealogy is that genetic analyses would align perfectly with one’s family tree. This will never happen, for several reasons. First, genealogical trees do not deal with genetic background, but rather with individuals, whom they married, and whom they had children with. Such trees go back only a few hundred years, at best. None of this is a reliable indicator of genetic background or origins. Second, inbreeding or endogamy “recycles” the genetic material. So while time moves on, as it always does, the genetic data look very much the same because of the inbreeding effect, throwing off any genetic analysis that attempts to align genetic differences with the tree.

Another reason family trees won’t mirror genetic analyses is that genealogies are incomplete and, again, lack genetic information, which raises doubts about the matches output by the genetic analysis. For these reasons, the Y chromosome and mtDNA were considered the preferred tools for reconstructing the family tree. However, they are limited to specific lineages and cannot capture the vast majority of recent demographic events, which can only be captured with the autosomal chromosomes. Since neither family-finding tools nor family trees are accurate, some people may find that their genetic history matches their family tree and others may not.

Not just who but when

It is therefore not surprising that the ability to date genetic events significantly boosts the reliability of the genetic analysis: it potentially provides an anchor that links the genetic events to the family tree.


Over the years different tools have been proposed to date genomic events (e.g., Moorjani et al. 2011; Gravel 2012; Loh et al. 2013; Busby et al. 2015; Busby et al. 2016; Pugach et al. 2016). Dating methods can be broadly divided into two classes of similar nature (Loh et al. 2013):

  • Local ancestry inference methods analyze chromosomes of admixed individuals with the goal of recovering continuous blocks inherited directly from each ancestral population. Because recombination breaks down ancestry tracts through successive generations, the time of admixture can be inferred from the tract length distribution, with the caveat that accurate local ancestry inference becomes difficult when tracts are short or the reference populations used are highly diverged from the true mixing populations.
  • Methods that use the associations between pairs of loci to make inference about admixture. Briefly, recombination breaks down these associations, leaving a signature of the time elapsed since admixture that can be probed by aggregating pairwise LD measurements through an appropriate weighting scheme; the resulting weighted LD curve (as a function of genetic distance) exhibits an exponential decay with rate constant giving the age of admixture.
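To make the second class of methods concrete, here is a minimal Python sketch, not the implementation of any published tool. It assumes a noise-free weighted LD curve of the form A·exp(−n·d), where d is the genetic distance in Morgans and n is the number of generations since admixture, and recovers n with a log-linear least-squares fit. The function names and the simulated values are hypothetical; real tools such as Alder and ROLLOFF also deal with noise, weighting, and standard errors, all of which are ignored here.

```python
import math


def simulate_weighted_ld(distances_morgans, generations, amplitude=1.0):
    """Toy weighted LD curve: amplitude * exp(-generations * distance)."""
    return [amplitude * math.exp(-generations * d) for d in distances_morgans]


def estimate_admixture_generations(distances_morgans, weighted_ld):
    """Fit log(LD) = log(A) - n * d by ordinary least squares and return n."""
    logs = [math.log(value) for value in weighted_ld]
    count = len(distances_morgans)
    mean_d = sum(distances_morgans) / count
    mean_y = sum(logs) / count
    sxy = sum((d - mean_d) * (y - mean_y)
              for d, y in zip(distances_morgans, logs))
    sxx = sum((d - mean_d) ** 2 for d in distances_morgans)
    slope = sxy / sxx
    # The decay rate is the negative slope of the log-linear fit.
    return -slope


# Hypothetical input: distances from 0.5 cM to 20 cM, expressed in Morgans.
distances = [0.005 * i for i in range(1, 41)]
curve = simulate_weighted_ld(distances, generations=38.62)
estimate = estimate_admixture_generations(distances, curve)
print(f"Estimated generations since admixture: {estimate:.2f}")
```

Because the simulated curve is a clean exponential, the fit recovers the simulated value up to floating-point error. With real data the curve is noisy, which is one reason different tools can report different dates.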

The latter approach is very similar to the first one, except that LD statistics can be modeled more easily. If this is still unclear, consider the following very simplistic example. Assume that 4 generations ago, regions in LD were of a certain size. Outbreeding would slice them each generation, whereas inbreeding would have the opposite effect. By assuming that outbreeding occurred in every generation (an assumption we have to make, otherwise there would be no measurable changes) and looking at the LD in the last generation (which is the only thing available to us), we would incorrectly conclude that the original genome (#1) existed 3 generations ago. Why? Because we are blind to inbreeding, which results in no change and makes us “skip a generation.”


Illustrating the IBD process in which identical genetic regions inherited across generations may (or may not) lose the genetic similarity to the ancestor (#1).

Not just when but what

Now that we understand how genetics can be used to date events, let us stop and ask: what exactly are we dating? That’s a good question. In genealogy the answer is obvious: we are dating events (usually births and deaths) that, no doubt, occurred at some point. In genetics, we cannot infer the immediate ancestors of an individual from the genetic data. What we can say is that the individual’s genome exhibits a certain combination of ancestries, which we understand much better. The algorithms discussed so far typically date the major event that caused the LD or local ancestry patterns to change, that is, the major mixing event in which two populations of distinct genetic backgrounds merged their genetic material. Only one such date can be produced, and it typically corresponds to the most recent major event. Clearly, this is a very different concept from the linear, more obvious type of dating that genealogists expect to see and, what is worse, the results vary greatly between tools.

In Marshall et al. (2016), we used Alder to date the mixture event in which the proto-Druze merged their genomes with Syrian populations, and cited the results of two other tools (generation time is usually 20-30 years, so you can do the math):

Using Alder we found that there is evidence for admixture in Druze between the early 9th century and the early 12th century (38.62±5.8 generations ago). This estimate is within the range proposed by two recent studies using the less accurate ROLLOFF (46± 5) and CHROMOPAINTER (31.64±4 generations ago).
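For readers who would rather not do the math by hand, here is a small Python sketch of the conversion from generations to calendar years. The function name, the reference year of 2016 (the publication year of the paper), and the choice to combine the reported error with the full 20-30 year range of generation times are all assumptions made for illustration; they are not the procedure used in the paper.

```python
def admixture_date_range(generations, error, gen_years=(20, 30),
                         reference_year=2016):
    """Return (oldest, youngest) calendar years for an admixture estimate.

    The oldest date pairs the largest generation count with the longest
    generation time; the youngest pairs the smallest count with the
    shortest generation time.
    """
    shortest, longest = gen_years
    oldest = reference_year - (generations + error) * longest
    youngest = reference_year - (generations - error) * shortest
    return oldest, youngest


# Estimates quoted above (generations ago, plus or minus the reported error).
estimates = [
    ("Alder", 38.62, 5.8),
    ("ROLLOFF", 46, 5),
    ("CHROMOPAINTER", 31.64, 4),
]

for tool, generations, error in estimates:
    oldest, youngest = admixture_date_range(generations, error)
    print(f"{tool}: between year {oldest:.0f} and year {youngest:.0f}")
```

Combining both sources of uncertainty in this way gives wider intervals than the centuries quoted above, which illustrates how sensitive a calendar date is to the assumed generation time.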

The GPS Origins model

In GPS Origins, I developed a new class of dating method to obtain the date of the last mixture event, which I will not discuss here or this post would never end. GPS Origins infers the two major components of ancestry, identifies their geographical origins, and calculates the time when they arrived in these regions. It also calculates the migration routes but does not date the historical stops. Previous work showed that such attempts would be highly speculative. All we know is that the stops occurred before the headpoints, and this is how they are reported. The GPS Origins model is the most advanced in that it provides two dates instead of one, but it is still far from being able to date all your ancestors. We are now developing even newer technology to call dates, but don’t hold your breath. It would just be another small step towards the utopian view.


GPS Origins results. Focusing on each of the headpoints shows the time of arrival in that region.

This feature became very popular.

A daring dating model

Recently, a large genetic testing company launched a new feature that attempts to provide even more dates, thus boldly moving towards the utopian world where genetic reconstruction = genealogical tree (see their white paper).



The authors made the following assumption. If we can infer one’s ancestry as a combination of other ancestries (30% Irish, 20% Jewish, 50% African), then we can pair the ancestries in a way that MAY be aligned with the actual history. In such a case, the parents of (30% Irish, 20% Jewish, 50% African) would be [60% Irish, 40% Jewish] and [100% African], and the grandparents would be [100% Irish] who married [80% Jewish, 20% unknown]. The two other grandparents are 100% African, and that’s all we can say about them. Assuming a generation time of 30 years, this method takes us back a good 90 years! The more detailed the ancestral analysis is, the more fractions we would have and the further back in time we can go.

What are the drawbacks of this method? Unfortunately, it is absolute nonsense. While the assumptions of this method are noted in the white paper, they are so unrealistic that the results are ridiculous. Why? Because the tool assumes that all your ancestors were of “pure” ancestry or race. This is a very problematic and dangerous assumption. The individual with 30% Irish, 20% Jewish, and 50% African ancestry could have been produced by many other combinations of parents, such as (60% Irish + 40% African) who married (40% Jewish + 60% African). On what grounds can we assume that Africans did not marry non-Africans? They obviously did in the last generation. Moreover, having a 3% ancestry does not mean that you had a great-great-great-grandfather (100/2/2/2/2/2≈3) who had 100% of that ancestry. Why? Because that 3% ancestry may be shared across many populations, in which case each parent passes on half of their share (1.5%), but the level remains constant through the combination of the two parents (1.5+1.5=3).
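The arithmetic behind this objection can be checked with a few lines of Python. The sketch below assumes the simplest possible model, in which a child’s expected ancestry is the average of the two parents’ ancestries; actual inheritance varies around that average because of recombination. The function name and the "X" and "other" labels are hypothetical.

```python
def child_ancestry(parent_a, parent_b):
    """Expected ancestry of a child: the average of the two parents (in %)."""
    labels = sorted(set(parent_a) | set(parent_b))
    return {label: (parent_a.get(label, 0) + parent_b.get(label, 0)) / 2
            for label in labels}


# The pairing assumed by the "pure ancestry" model.
pure_model = child_ancestry({"Irish": 60, "Jewish": 40}, {"African": 100})

# An equally valid alternative pair of parents.
alternative = child_ancestry({"Irish": 60, "African": 40},
                             {"Jewish": 40, "African": 60})

# Both pairs produce the same child, so the child's percentages alone
# cannot tell us which pair of parents is the real one.
print(pure_model == alternative)

# Naive reading of a 3% component: one "pure" ancestor five generations back.
naive_share = 100 / 2 ** 5

# Alternative: both parents carry the same 3%, so the level never changes.
stable_share = child_ancestry({"X": 3, "other": 97},
                              {"X": 3, "other": 97})["X"]
print(naive_share, stable_share)
```

Under this model the same 3% is explained just as well by a component that has been stable in the population for many generations as by a single recent ancestor, which is exactly why a percentage cannot be read as a date.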

Roberta’s post here shows just how the timeline deviates from reality.


The timeline makes no sense even in the genealogical context. It remains unclear who is the ancestor of whom. It is actually much worse: if populations can be split in such a manner, what is the point of reporting them in the first place? As I wrote here, they are meaningless social constructs, and the Germans of 1870 looked nothing like Germans today (just look at the maps of Germany over time), so what is the point of reporting German ancestry?

The Neanderthals who started World War I 

And this is where we get to the Neanderthals. Consider that all Europeans reportedly share about 3% Neanderthal ancestry (Green et al. 2010). Does that mean that they all had a Neanderthal ancestor living 100 years ago who fought in World War I? Did they murder the heir to the Austro-Hungarian throne or were they like Woodrow Wilson and “too proud to fight”? So many questions.

In summary

The new feature that aims to infer a timeline from the DNA of individuals is based on very poor models. It provides neither new knowledge nor wisdom and is the outcome of very basic misunderstandings of human history. Dating models are far from perfect. Their predictions vary widely, and it is very difficult to calibrate and test them because of our poor understanding of our history. New models based on new concepts are therefore desperately needed.


Busby, G. B. J., G. Band, Q. Si Le, et al. 2016. Admixture into and within sub-Saharan Africa. eLife. 5:e15266.

Busby, G. B. J., G. Hellenthal, F. Montinaro, et al. 2015. The Role of Recent Admixture in Forming the Contemporary West Eurasian Genomic Landscape. Current Biology.

Gravel, S. 2012. Population genetics models of local ancestry. Genetics. 191:607-619.

Green, R. E., J. Krause, A. W. Briggs, et al. 2010. A draft sequence of the Neandertal genome. Science. 328:710-722.

Loh, P. R., M. Lipson, N. Patterson, P. Moorjani, J. K. Pickrell, D. Reich, and B. Berger. 2013. Inferring admixture histories of human populations using linkage disequilibrium. Genetics. 193:1233-1254.

Marshall, S., R. Das, M. Pirooznia, and E. Elhaik. 2016. Reconstructing Druze population history. Scientific Reports. 6:35837.

Moorjani, P., N. Patterson, J. N. Hirschhorn, A. Keinan, L. Hao, G. Atzmon, E. Burns, H. Ostrer, A. L. Price, and D. Reich. 2011. The history of African gene flow into Southern Europeans, Levantines, and Jews. PLoS Genetics. 7:e1001373.

Pugach, I., R. Matveev, V. Spitsyn, S. Makarov, I. Novgorodov, V. Osakovsky, M. Stoneking, and B. Pakendorf. 2016. The Complex Admixture History and Recent Southern Origins of Siberian Populations. Molecular Biology and Evolution. 33:1777-1795.
