Genetic Genealogy through the Ages

The following is based on my talk at RootsTech 2016 sponsored by DNA Diagnostics Center (DDC) (Disclosure: I consult for the company).

Genetic genealogy has now celebrated its “Sweet Sixteen” anniversary. Yes, it has been a bit over 16 years since FamilyTreeDNA introduced the first genetic tests that assessed Y-chromosomal STRs. I was in the midst of my career in the Israeli army at that time (1997-2004), but entered the field in 2010, shortly after autosomal tests became popular.

A brief timeline of the major companies in the market

The Old Era

Until 2010 the market consisted mainly of companies offering Y and mtDNA analyses. Such analyses were and always shall be limited in scope due to the history of mixing in human populations. The narrative promoted by these companies was that some haplogroups are indicators of ancient history based on their frequencies in modern-day populations, which were assumed to have remained constant over time. They had not, but it would take another decade for everyone to realize that. When ancient DNA data became available, it became clear that haplogroup frequencies fluctuated widely through time and space. And if haplogroups increased and decreased in frequency all over the world, what is the point of calling the mtDNA U4 haplogroup “Russian” when it used to be “European” in 6000 BC (Brandt et al. 2013)?

MtDNA haplogroup frequencies inferred from ancient DNA data from Brandt et al. (2013). The “standard model” for mtDNA migration routes is shown in the inset (not part of the original figure). 

In 2010, autosomal DNA kits sold for about $200 and up. However, it wasn’t long before newcomers introduced a lower price tag, just around or below the profitability margin, that helped secure their position as strong players in the field. It is therefore not surprising that over the past decade we saw little diversification in the types of tests companies offered. These can roughly be divided into kinship tests (family finders) and ancestral-breakdown tests (20% Irish, 30% Ashkenazic Jew, etc.). Over time the price war became even fiercer, with genetic tests going for as little as $50, reflecting a loss of about $20 per kit (Illumina’s chips + reagents cost about $70). This unsustainable financial model was maintained through several alternative routes of income: selling additional overpriced tests (you don’t truly believe that a whole mtDNA sequence (16,500 bases) costs $200 when a whole-genome sequence (3 billion bases) costs $1200, right?), selling subscriptions (so it’s not $50 anymore), and selling your DNA data to external companies.
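The arithmetic behind that paragraph can be checked with a few lines of Python. This is only a back-of-the-envelope sketch that uses the figures quoted above; the variable and function names are mine, and the per-base comparison ignores fixed costs such as sample handling.

```python
# Figures quoted in the text (US dollars and DNA bases).
KIT_PRICE = 50            # lowest retail price of an autosomal kit
KIT_COST = 70             # approximate cost of chip + reagents
MTDNA_PRICE = 200         # price of a whole mtDNA sequence
MTDNA_BASES = 16_500
WGS_PRICE = 1200          # price of a whole-genome sequence
WGS_BASES = 3_000_000_000


def loss_per_kit(price, cost):
    """Money lost on each kit sold below cost."""
    return cost - price


def price_per_base(price, bases):
    """Price paid per sequenced base."""
    return price / bases


loss = loss_per_kit(KIT_PRICE, KIT_COST)
mtdna_rate = price_per_base(MTDNA_PRICE, MTDNA_BASES)
wgs_rate = price_per_base(WGS_PRICE, WGS_BASES)
ratio = mtdna_rate / wgs_rate

print(f"Loss per kit: ${loss}")
print(f"mtDNA: ${mtdna_rate:.6f} per base")
print(f"Whole genome: ${wgs_rate:.10f} per base")
print(f"mtDNA costs about {ratio:,.0f} times more per base")
```

With these numbers, the mtDNA test works out to roughly 30,000 times more expensive per base than a whole-genome sequence.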

From DNAeXplained – Genetic Genealogy 

Around 2010, while at the School of Medicine at Johns Hopkins, I joined the Genographic Project. I was asked to design a revolutionary microarray, one that would be slim and powerful, which eventually became known as the GenoChip 2.0 (Elhaik et al. 2013). I also developed most of the ancestry tests, introducing one of the first gene pool models to the industry, and developed the first Neanderthal and Denisovan tests. The GenoChip was the first microarray dedicated to genetic anthropology without any health or trait markers. It was small (only 150,000 markers) and powerful. Prof. Spencer Wells, the project’s leader, called it a shot to the moon and, unless removed from display, the GenoChip kit can be seen at the Hall of Fame in the Smithsonian. The GenoChip was received with mixed feelings: some people thought that more markers mean a better array, while others wanted to see something new. It turned out to be a tremendous success. By 2015 the Genographic Project had sold its last kit, and it has since adopted a different approach, which I will not discuss here.

The New Era

Years passed and it was clear that genetic genealogy was going nowhere. With the price war going on, companies had little motivation to develop new concepts, and as more companies entered the field, the situation went from bad to worse. Currently all major players provide some ancestral breakdown (either gene pools or populations), but their conclusions lack accuracy, context, and scientific rigor. If there are no biomarkers for Ashkenazic Jews (Falk 2014; Elhaik 2016), what in the world do those companies report? What does it even mean to be 20% Jew? Not eating pork only once a week? If the Cohen haplotype has been refuted (time and time again) (Tofanelli et al. 2014), why pretend that it links back to the biblical priests? If populations moved and mixed, what does it mean that a person is 20% German? The borders of Germany have been changing since it was founded. How can anyone report a Celtic ancestry if they have never seen one? And if all they do is relabel Irish as “Celts,” it is highly deceiving.

By 2014, the time was ripe for a new concept, and my colleagues and I introduced the Geographic Population Structure (GPS), a biogeographical or bio-localization tool that predicts the origin of a DNA sample with an accuracy down to the home village, island, or city level in some cases (for unmixed individuals) within a time frame of less than 1000 years. The paper was published in Nature Communications, highlighted in Science, and immediately gained global attention. It is ranked the 8th most-read paper in Nature Communications.

In 2016, GPS was applied to the genomes of 367 Ashkenazic Jews and identified Ancient Ashkenaz, where the DNA signature of Ashkenazic Jews originated. That paper was ranked the most-read paper in GBE, but only the 4th most-read out of all GBE outputs. Given that the first three places are occupied by my other papers, it is not as bad as it sounds. The press release and follow-up articles reached one million readers in the UK within two weeks and over 10 million worldwide (see also my popular science articles in The Conversation and AEON).

GPS predictions for 367 Ashkenazic Jews cluster around four ancient villages whose names resemble the word “Ashkenaz.” From Das et al. (2016).

A twin publication in Scientific Reports followed up with a study of the origin of the Druze, demonstrating the accuracy and robustness of this approach (see also my popular science articles in The Conversation and AEON). These studies even caught the attention of the former and current occupants of 1600 Pennsylvania Ave NW, Washington (Elhaik 2016). You can find all media coverage here.

White House readership for the Elhaik (2016) paper, Washington DC (2017)

GPS Origins

In 2016, DDC entered the genetic genealogy market with GPS Origins™, a new test developed at my lab. Unlike regular GPS, which provides one point of origin, GPS Origins provides two points (the most dominant ancestries), their migration routes, and their times of origin. GPS Origins uses the geographical coordinates and dates to provide historical information on the regions associated with the migrations to help people interpret their results. It is the first and only autosomal-based test, and as such it is most accurate for the recent past while losing some accuracy as it moves back in time. The test does not report ethnicity/ancestry because these are all social constructs. They are not real. They are made up. Think of Yugoslavians: what shall we call them now that their country no longer exists? Can you truly classify people into 20-30 populations, as competing companies do, when the number of populations in the world is estimated at 5000-6000?

An example of a full report can be found here, and here are the results of self-identified Ashkenazic Jewish, Turkish, English, East Asian, and American individuals.

GPS Origins confirmed the half-Cuban ancestry of Katrina Gehman (below), who was adopted in childhood and knew only that she might be half-Cuban. It must have been thrilling for her to get her results. It was also thrilling to read about it.

GPS Origins results for 

GPS Origins uses 36 gene pools (not “ancestries”) and over 800 global populations that act like satellites to steer the DNA to the right place. Why so many? Because this is what it takes for accurate bio-localization. You can compare the 36 gene pool model to a model used by a different company that has low resolution in Asia and America.
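To give a feel for how reference populations can “steer” a sample toward a location, here is a toy sketch in Python. It is not the GPS Origins algorithm: it is a simple nearest-reference, distance-weighted average that I am using purely as an illustration, with three made-up gene pools instead of 36 and three hypothetical reference populations (`ref_A`, `ref_B`, `ref_C`) with invented coordinates.

```python
import math

# Hypothetical reference panel (made-up values for illustration only):
# name -> (gene-pool proportions, (latitude, longitude)).
REFERENCES = {
    "ref_A": ((0.7, 0.2, 0.1), (40.0, 20.0)),
    "ref_B": ((0.2, 0.7, 0.1), (50.0, 30.0)),
    "ref_C": ((0.1, 0.2, 0.7), (30.0, 60.0)),
}


def distance(p, q):
    """Euclidean distance between two gene-pool proportion vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def localize(query, references, k=2, eps=1e-9):
    """Place a sample at the weighted mean of its k closest references.

    Weights are inverse genetic distances, so closer references pull harder.
    Plain averaging of latitude/longitude is a simplification that breaks
    down near the poles or the date line.
    """
    ranked = sorted(references.values(),
                    key=lambda ref: distance(query, ref[0]))[:k]
    weights = [1.0 / (distance(query, pools) + eps) for pools, _ in ranked]
    total = sum(weights)
    lat = sum(w * coords[0] for w, (_, coords) in zip(weights, ranked)) / total
    lon = sum(w * coords[1] for w, (_, coords) in zip(weights, ranked)) / total
    return lat, lon


# A sample whose gene pools sit halfway between ref_A and ref_B.
sample = (0.45, 0.45, 0.10)
print(localize(sample, REFERENCES))
```

In this toy setting, a sample that matches one reference exactly lands on that reference, and a sample halfway between two references lands between them. With only a handful of references the sample can only be pulled toward a few places, which is the intuition behind using hundreds of populations.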

A comparison of GPS Origins’ 36 gene pools and the ancestries provided by a competing provider.

Recently, Dilawer Khan (EurasianDNA.com), who studies East and North Eurasia, reported in his excellent blog that of all existing DNA tests, GPS Origins captures ancestry best.

The main takeaway for me was that West and South Asians have considerable North and East Asian ancestry that is not well represented by programs such as ADMIXTURE, or by results from companies such as 23andMe (except for their admixture date estimator), AncestryDNA, or FTDNA. The company GPSOrigins seems to capture that ancestry better.

With one third of the gene pools dedicated to Asians and hundreds of reference populations covering the entire continent, this is hardly a surprise.

Another piece of good news is that GPS Origins operates on markers shared between all existing microarrays and thereby users from other companies can also get the results without taking a new DNA test. However, that was exactly the problem.

The DREAM microarray

None of the commercial Illumina microarrays were designed for genetic genealogy. They were designed for medical purposes and had known biases toward Europeans (as we saw above). It is impossible to develop additional new concepts (or “apps”) with these microarrays. For that reason, I designed DREAM, a new microarray that can support concepts that do not yet exist. The difference between DREAM and the old-generation arrays is the same as between smartphones and plain cell phones. They can both make phone calls and text one another, but only smartphones allow running apps. In other words, some of the tests that would be developed on DREAM may work on the old arrays, but not all tests. We’ll do our best to support all microarrays, of course.

If GenoChip was a shot to the moon, DREAM is a shot to Mars (with credit to Spencer Wells for the original analogy). It allows once again the development of novel concepts that are not supported by the old-generation microarrays and has the potential to move this field beyond the traditional haplogroup, kinship, and ancestral-breakdown tests.

What now?

After two microarrays (and more to come), half a dozen genetic tests for two leading DTC companies, and numerous papers in population genetics, I can confidently argue that the world does not need more ancestral-breakdown tests, nor kinship tests that are pretty good at finding third-degree relatives (whom you probably already know) but no more than that. If this is the best idea you have got, keep thinking. Rather, it is time to think of new concepts that can accurately describe our shared history. Don’t bother wasting time deciding if you are 10 or 20% Italian or Spanish Jew. What’s the point? South “Italians” are remarkably similar to “Greeks” and the last Spanish Jew left Spain 500 years ago, but no test would tell you that. Instead, ask yourself what type of tests can stretch our imagination and help us learn something new. Men’s and women’s reach should exceed their grasp if they would only dare to dream.

A handout is available here.

3 Responses to Genetic Genealogy through the Ages

  1. Eric Marcus Lent says:

    Thank you for your persistence in refining the DNA testing science. I found our false genealogy in a 1912 book located in the Library of Congress. Many thousands was spent based on one easily found book. DNA testing has revealed my true ancestry. And Scythian and Caucasus Mountain matches and Altaic are well documented.
