The Downfall of 23andMe: Why the DNA Testing Company is Doomed to Fail

23andMe, once a pioneer in the DNA testing industry, is facing an uncertain future. Despite its early success in offering affordable and accessible genetic testing to consumers, there are several reasons why the company is destined for failure.

Privacy Concerns: One of the biggest issues surrounding 23andMe is the privacy of genetic data. Customers who use the service must provide DNA samples, which contain highly sensitive and personal information. It is known that 23andMe sold customer data to the pharmaceutical company GlaxoSmithKline (GSK). Surprising? Not really; 23andMe never hid its intention to capitalize on customers’ data. These privacy concerns have eroded consumer trust, leading to a decline in customer demand and, ultimately, business failure.

Regulatory Challenges: The regulatory landscape surrounding DNA testing is complex and constantly evolving. 23andMe has faced regulatory challenges from the U.S. Food and Drug Administration (FDA) in the past, including being ordered to stop marketing its health-related genetic tests without proper FDA approval. Compliance with changing regulations can be costly and time-consuming, and failure to comply could result in legal penalties and damage to the company’s reputation. Having lost a major chunk of its workforce (14% of its employees were laid off in 2020), 23andMe may struggle to invest in regulatory compliance.

Limited Market Reach: While 23andMe initially gained popularity for its direct-to-consumer DNA testing kits, the market for such services may be reaching saturation. The company’s core product, the genetic testing kit, has become commoditized, with numerous competitors offering similar services at lower prices. As a result, 23andMe’s market share is declining, making it difficult for the company to sustain its growth and profitability. 23andMe already responded with a round of layoffs in 2020 (14% of its workforce) and, as of now, employs 768 people.

Lack of Diversification: Another weakness of 23andMe is its lack of diversification. The company primarily relies on its genetic testing services for revenue generation, with limited offerings beyond DNA testing. This lack of diversification leaves the company vulnerable to changes in consumer preferences and market dynamics. As the demand for genetic testing wanes or shifts to other providers, 23andMe may struggle to adapt and diversify its revenue streams.

Competitive Landscape: The DNA testing industry has become increasingly competitive, with numerous players entering the market and offering similar services. This increased competition has led to price pressures, making it harder for 23andMe to maintain its profit margins. Additionally, larger companies with more resources and established brand names have entered the genetic testing space, posing a significant threat to 23andMe’s market share. The kit’s $100 price has remained unchanged for years; after correcting for inflation, 23andMe is effectively making only $70 per kit.

Customer Retention: Another weakness is the lack of meaningful engagement and long-term customer retention. While the novelty of receiving genetic testing results may attract customers, the company’s ability to retain them and generate repeat business is limited. Unlike subscription-based models that provide ongoing services or products, 23andMe’s one-time genetic testing service is unlikely to sustain customer engagement, reducing the company’s ability to generate recurring revenue. Today, 23andMe does offer a subscription, but it is geared toward diseases and has a relatively small number of subscribers.

Scientific Limitations: The list here is a bit longer.

  • While 23andMe’s test provides some valuable genetic information to consumers, it has many limitations in terms of the accuracy and comprehensiveness of its testing. The tests are limited to specific genetic markers and do not provide a comprehensive picture of an individual’s health risks or conditions. In other words, if you are worried about breast cancer, see a real doctor who will order a full test of all 1,000 known mutations; don’t buy 23andMe’s test, which covers only 3 mutations! As consumers become more knowledgeable about the limitations of genetic testing, they seek more robust and accurate options, leading to a decline in demand for 23andMe’s services.
  • 23andMe’s reliance on self-reported data, on the one hand, and on a limited set of genetic markers, on the other, may raise concerns about the accuracy and comprehensiveness of its testing. The company’s genetic testing relies primarily on self-description, where customers provide information about their traits, health conditions, and other factors. However, self-reported data can be subjective and may not always be accurate. Additionally, the chips 23andMe uses to analyze DNA have changed over the years, and the different chip designs may not capture the same information from customers. This inconsistency in data collection may affect the accuracy and reliability of the results provided by 23andMe, raising questions about the company’s testing methods. 23andMe masks this information in its investor reports in two ways: a) by claiming that it has “genomes” (most of its data cover only a fraction of the genome), and b) by claiming that it can impute >35M SNPs (without reporting the imputation accuracy).
  • 23andMe’s use of tools like Principal Component Analysis (PCA) to control for population structure in its genetic testing may also be a cause for concern. As I recently reported, PCA is not a reliable tool for correcting population structure, and using it can produce misleading or inaccurate results. This reliance on potentially flawed tools may undermine the validity of 23andMe’s genetic testing and raise questions about the scientific rigor of its methods.
  • 23andMe’s health focus has made it neglect the ancestry aspect of its report, the feature that attracts most of its customers (at least those who don’t prefer 23andMe to a doctor visit). This is not surprising: the company DOES NOT make a lot of money on the ancestry report. It is only a teaser to sell the health report, which costs the company almost nothing to generate (as the customer has already purchased the kit). As a result, 23andMe’s tests lack innovation, and the company relies heavily on “influencers” to promote sales. Who are these people? That’s in the next paragraph.

Why is no one else talking about it? We live in a world where journalism is nearly dead. Trends are created and promoted by bloggers and other paid writers who act as affiliates of 23andMe and similar companies. They get a percentage of the sales they send to the company, so they have no interest in criticizing it. They will happily criticize companies that do not pay them, as doing so gives them another chance to promote their cash cows. Impressed that 23andMe is constantly ranked as the “best” genetic kit? Don’t be. You can also be #1, #2, or even #3; it all depends on your budget. Top10, for example, carries a warning: “The listings featured on this site are from companies from which this site receives compensation. This influences where, how and in what order such listings appear on this site.” Many sites won’t even disclose that. Influencers typically take 10% of the sale, which is about $10 in the case of 23andMe, so that $70 net is more like $60.

Why does the company lose money? It costs the company $35–40 to run a test. If it makes $60 on an ancestry kit, its margins are very narrow. In 2022 the company added 1.2M users. Who are these users? We don’t know. Did they purchase the plain kit or the “health” kit? We don’t know either. Let’s assume that 80% of the users got the ancestry kit ($99), in which case the company made $20 × 1,200,000 × 0.8 = $19.2M. If 20% bought the “health” kit ($199), the company made $120 × 1,200,000 × 0.2 = $28.8M. Together, call it roughly $50M in profit from kits. This is simply NOT enough to sustain a company of nearly 800 employees with an average salary of about $150K (it’s California).

Assuming an average salary of $150K/year for its 768 employees, the company already pays $115M in salaries alone! Add $35M in rent, equipment, and marketing, and you arrive at roughly how much money the company lost in 2022 ($151M). Of course, this is a VERY simplistic calculation, but it gives an idea of the costs, which are not shown in the spreadsheet available to investors.
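The back-of-the-envelope arithmetic in the last two paragraphs can be reproduced in a few lines of Python. This is a sketch that uses only the figures quoted above; the 80/20 split between kits and the per-kit margins of $20 and $120 are the same assumptions made in the text, not reported numbers.

```python
# Figures quoted in the text (USD). The 80/20 split between ancestry and
# "health" kits is the author's assumption, not a reported number.
NEW_USERS_2022 = 1_200_000
ANCESTRY_PCT, HEALTH_PCT = 80, 20
ANCESTRY_MARGIN, HEALTH_MARGIN = 20, 120   # per-kit margin after test costs
EMPLOYEES, AVG_SALARY = 768, 150_000
OTHER_COSTS = 35_000_000                   # rent, equipment, marketing

# Integer arithmetic keeps the dollar figures exact.
kit_margin = NEW_USERS_2022 * (ANCESTRY_PCT * ANCESTRY_MARGIN
                               + HEALTH_PCT * HEALTH_MARGIN) // 100
salaries = EMPLOYEES * AVG_SALARY
total_costs = salaries + OTHER_COSTS

print(f"Profit on kits: ${kit_margin / 1e6:.1f}M")   # $48.0M (rounded to ~$50M in the text)
print(f"Salaries:       ${salaries / 1e6:.1f}M")     # $115.2M
print(f"Salaries+other: ${total_costs / 1e6:.1f}M")  # $150.2M
```

Changing the assumed split or margins shows how sensitive the estimate is to numbers the company does not disclose.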

Promises, promises, and more promises… 23andMe’s only hope is to stumble upon a promising marker that would result in a great drug. But don’t hold your breath. According to its latest investor report, the company is not even close to finding the next drug.

Only one product(?) is about to come out of phase 1. 23andMe knows that most drugs fail the process (its own report cites a 90% failure rate). However, it claims that stratification bias (which, as you may recall, is controlled using PCA) plays a major part in those failures. This is quite disappointing for a company that repeatedly promises to cut the time it takes to develop new drugs from 7 years to… well, the company started in 2015, so we are at 8 years and counting. With $400M in reserves and $150M in losses per year ($180M in losses projected for 2023), it’s hard to see 23andMe surviving to see any drug come out of phases 3 and 4 before it runs out of cash ($411M as of now).
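How long would $411M last? Here is a minimal cash-runway sketch using the figures quoted above. It assumes, as a simplification, a constant annual loss and no new revenue, partnerships, or financing.

```python
CASH_ON_HAND = 411_000_000  # "$411M as of now", as quoted above


def runway_years(cash, annual_loss):
    """Years until cash runs out, assuming a constant annual loss and no new money."""
    return cash / annual_loss


# 2022-level loss vs. the loss projected for 2023 (figures from the text).
for loss in (150_000_000, 180_000_000):
    print(f"loss ${loss / 1e6:.0f}M/yr -> {runway_years(CASH_ON_HAND, loss):.1f} years")
# loss $150M/yr -> 2.7 years
# loss $180M/yr -> 2.3 years
```

Either way, the runway is well short of the time a drug typically needs to get through phase 3.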

In conclusion, while 23andMe initially gained attention and popularity as a pioneer in the direct-to-consumer DNA testing industry, it faces many obstacles, including privacy concerns, regulatory challenges, reliance on inaccurate data and tools, the limitations of its genetic testing methods, lack of diversification, increased competition, and more. Because of these obstacles, 23andMe is struggling and may face major challenges in the long term. The company is not profitable and is bleeding money.

As of now, the road ahead for 23andMe appears uncertain, and the company may face significant challenges in maintaining its dominance in the DNA testing market.

So what? Here is a question for the audience: what will happen to the amazing customer data (genotypes and annotations) if 23andMe goes bankrupt? I have no idea, but if your data are with that company, you should have some idea.

Update 5/2023

As of now, 23andMe’s stock is below $2, and its strategic partnership with GSK has ended. In other words, GSK decided NOT to invest any more money in 23andMe. CEO Wojcicki is fishing for new pharma partners that will see something in 23andMe that GSK no longer sees.

A clarification: this is not a recommendation to buy or sell 23andMe products or shares.


Reconstructing the first Levites

The biblical Levites were a group of people from the tribe of Levi set apart for religious service in ancient Israel. The most famous Levite was Aaron, Moses’s big brother and the first priest. The Levites played an important role in the worship and rituals of the Israelites. Their duties included serving as priests, caring for the tabernacle and later the temple, and teaching and interpreting the law. The Levites were not shy about using force to enforce their will.

Drawing by Crispijn van den Broeck (public domain)

During the time of Solomon’s temple, the temple work was divided into 24 priestly divisions, corresponding to 24 priestly lineages (whose connection with Aaron’s lineage is not fully explained). After the Babylonian exile, most of these lineages chose to remain in Babylon, and only four joined Ezra and Nehemiah. After the establishment of Herod’s temple, the duties of the four families were again divided into 24 divisions.

Throughout the history of Israel, the Levites remained an important part of the religious life of the nation. However, their role evolved over time, and by the time of Jesus, the Levitical priesthood had become corrupt and in need of reform. Jesus, for example, was critical of the religious leaders of his day, whom he saw as having lost sight of the true purpose of their role. Such criticism evolved into more complex conflicts between the public, the leadership elite, and the priestly elite, which tore the people apart and led to a civil war that culminated in the destruction of Jerusalem and Herod’s temple. Records of the priests largely ceased after the destruction.

Copy of Roman Triumphal arch panel from Beth Hatefutsoth, showing the spoils taken from Jerusalem temple. ( CC BY 3.0 )

However, curiosity concerning the genetic legacy of the Levites did not disappear. It prompted a series of genetic studies in the late 1990s that aimed to find common genetic markers on the Y chromosome (inherited from father to son, just like surnames) among individuals who carry the surname “Levi.” Initially, these studies reported a small set of mutations, termed the Cohen Modal Haplotype (CMH), found in a third of the people with the “Levi” surname compared with the controls. These studies caused major excitement in the field and among the public, who envisioned that true genetic markers of Aaron’s lineage had survived for thousands of years along with the Levite legacy, even before surnames were invented. Some scientists criticized these studies for selecting mismatched controls of European descent that lacked the CMH. Those concerns were borne out when later studies observed the CMH throughout the Middle East and Africa, showing that the CMH was not unique to Jews but was no more than an African-Middle Eastern signature; it was unreasonable to expect so many carriers of a truly Levite lineage. Moreover, “Levi” is the second most popular surname in Israel, after “Cohen,” a smaller priestly caste selected from the Levite group. It is unreasonable to expect all its bearers to descend from the same ancestors, and even if some individuals carry the markers of an ancient patriarch, there is no way to know which ones are the true Levites. The choice of markers carried by one-third of the participants was arbitrary.

Ethiopian Jews. (Government Press Office, Israel / CC BY-SA 3.0 )

Geneticists were not the only ones asking these questions. As the field of biblical criticism expanded, researchers began challenging the Exodus narrative, in which the Israelites are said to have been enslaved in Egypt and then led out of bondage by Moses. Scholars have pointed out that there is a lack of historical evidence to support the biblical account of the Exodus, including the absence of records of runaway slaves. The main argument against the biblical Exodus is the absence of any historical or archaeological evidence that the Israelites were enslaved in Egypt. Indeed, there are no known Egyptian records of the enslavement of a large group of people and no archaeological evidence of a sudden, large-scale departure of people from Egypt. Many scholars have also pointed out that the biblical account of the Exodus is rife with anachronisms and inconsistencies. For example, the Bible describes the Israelites wandering in the Sinai desert for 40 years, but there is no evidence of a large population living in the desert for such a long period. There is also no evidence of the Israelites living in Egypt during the period in which the Exodus is supposed to have occurred, and no evidence of a large-scale migration from Egypt to the land of Israel. Of course, absence of evidence is not evidence of absence, but the Exodus narrative is not supported by any evidence, and the doubts cannot be brushed away. If the Exodus did not occur, we must ask: who were the Levites? Who invented those stories, and for what purpose?

Departure of the Israelites by David Roberts. ( Public domain )

Ancient DNA and the Egyptian Origins of the Levites?

Modern Bible scholars such as Israel Knohl have put forth a theory concerning the Egyptian origin of the Levites. According to this theory, the Levites were originally an Egyptian cult that migrated to Israel and merged with the native Israelite population.

This theory is based on several lines of evidence, including historical, archaeological, and linguistic data. For example, the names of some Levitical figures, such as Moses and Aaron, resemble names of Egyptian origin, which suggests that the Levites may have had close cultural and religious ties with Egypt.

A cult of priests associated with the god Seth may have left Egypt at some point and later merged with the Israelite population to form the tribe of Levi. This is the best-supported theory in modern research.

A High Priest of Amun in Thebes from his Book of the Dead. (British Museum / CC BY-SA 2.5 )

With the availability of ancient DNA data, geneticists can once again lead research on the Levites. Ancient DNA is DNA that has been preserved in ancient human remains, such as bones or teeth. By extracting and analyzing the DNA from these ancient samples, researchers can reconstruct the genetic makeup of ancient populations and study their evolutionary history.

Considering the Egyptian origin of the Levites and the time of their origin, my colleagues and I were able to reconstruct their ancient DNA from the genomes available from the region, producing, for the first time, a complete reconstruction of the original group of Iron Age Levites (1250 to 1170 BC). The most important advantage of this approach is that it allows anyone to compare their DNA with the ancient Levites’ DNA to test their genetic ancestry and study their priestly legacy.

The Ancient DNA Origins DNA test provides powerful DNA-based tools to help people find ancient ancestors, trace lineages and determine ancient tribal origins. ( Ancient Origins DNA )

It is important to note that ancient DNA data continue to improve in both size and quality, allowing us to update the Levite reconstruction over time. The search for the origin of the Levites is one of the most exciting and important endeavors in modern genetics, one that can deepen our understanding of ancient Jewish history and culture as well as the history of populations in the ancient world.

Unearthing Biblical Ashkenaz, the motherland of Ashkenazic Jewry and Yiddish

For the past 1,000 years or so, the search for the land of Ashkenaz, thought to be the birthplace of Ashkenazic Jews and the Yiddish language, has been one of the longest quests in human history. It is perhaps second in length only to the search for Noah’s Ark, which began in the 3rd century AD. But whereas the golden evidence for Noah’s Ark may be the wooden residues of a shipwreck with some animal remains dated to the first millennium BC, identifying the ancient Ashkenaz mentioned in the Bible only three times (twice in the same context) requires a different type of evidence, such as the discovery of place names combined with genetic or historical evidence.

The place name Ashkenaz occurs three times in the Bible, but by the Middle Ages its exact origin had been forgotten, and as of now, there are no known place names bearing the name “Ashkenaz” that can indicate the origin of Ashkenazic Jewry. Ashkenazic Jewry was first found in Germany, but only in the Middle Ages. Because of the migration of the Ashkenazic Jews, the name Ashkenaz later became associated with Germany. This led to all German Jews being considered “Ashkenazic,” a term that was then applied to central and eastern European Jews who follow Ashkenazic religious customs and speak Yiddish.

The Yiddish language, which consists of Hebrew, German, and Slavic elements and is written in the Aramaic alphabet, has been spoken at least since the 9th century AD, but its origins have been debated by linguists for several centuries. While some have suggested a German origin, others propose a more complex beginning for the language, starting in Slavic lands in Khazaria (the medieval Khazar Empire that covered present-day southern Russia, Kazakhstan, Ukraine, and parts of the Caucasus), followed by Ukraine, and finally Germany. Although the language adopted a German vocabulary, it retained its Slavic grammar, which is why Yiddish is often referred to as “bad German.” The inability of linguists to reach a consensus has led some to lament that the mystery of where Yiddish came from will never be solved.

In 2014, my lab pioneered a new genetic tool, termed Geographic Population Structure (GPS), that converts genome data into ancestral coordinates, operating much like the GPS or sat nav in your car. We reasoned that applying GPS to the genomes of Ashkenazic Jews who are sole Yiddish speakers (or their direct descendants) could pinpoint the origin of their DNA. In the largest genomic study of Ashkenazic Jews, and the first to study Yiddish speakers, we applied our GPS tool to the genomes of more than 360 Yiddish- and non-Yiddish-speaking Ashkenazic Jews.
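The idea of turning genetic data into map coordinates can be illustrated with a toy. The sketch below is NOT the published GPS algorithm. It is a hypothetical simplification that assumes each reference population has an admixture-proportion vector and a known (latitude, longitude), and it places a sample at the inverse-distance-weighted average of the coordinates of its k genetically closest references. The function name `toy_locate` and all numbers are made up for illustration.

```python
import numpy as np


def toy_locate(sample_admix, ref_admix, ref_coords, k=3):
    """Estimate a (lat, lon) for one sample from its admixture proportions.

    Hypothetical simplification: take the k reference populations whose
    admixture vectors are closest to the sample's and average their
    coordinates, weighting each by inverse genetic distance.
    """
    sample_admix = np.asarray(sample_admix, dtype=float)
    ref_admix = np.asarray(ref_admix, dtype=float)
    ref_coords = np.asarray(ref_coords, dtype=float)
    dist = np.linalg.norm(ref_admix - sample_admix, axis=1)  # genetic distance
    nearest = np.argsort(dist)[:k]                            # k closest references
    weights = 1.0 / (dist[nearest] + 1e-9)                    # closer -> heavier
    return weights @ ref_coords[nearest] / weights.sum()


# Hypothetical reference panel: 3 ancestry components, made-up coordinates.
ref_admix = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
ref_coords = [[40.0, 30.0], [50.0, 20.0], [45.0, 40.0]]
midpoint = toy_locate([0.45, 0.45, 0.10], ref_admix, ref_coords, k=2)
print(midpoint)  # about [45, 25]: halfway between the two equally close references
```

Averaging latitude and longitude directly is only reasonable when the references are geographically close; a real tool needs a more careful geographic model.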

Surprisingly, our GPS homed in on north-east Turkey, where we found four primeval villages, one of which was abandoned in the mid-7th century AD.

The DNA of Yiddish speakers originated from four ancient villages in north-east Turkey.

A historical review of the region uncovered four ancient villages (Iskenaz, Eskenaz, Ashanaz, and Ashkuz) whose names derive from the word Ashkenaz, all clustered close to the Silk Road, the ancient network of trade routes. It is likely that these villages mark the location of the lost lands of Ashkenaz.

A map depicting the predicted locations of Ashkenazic Jews (AJs; triangles). The four villages whose names derive from “Ashkenaz,” and adjacent cities, are noted.

The history of a people

The location of the region we termed “Ancient Ashkenaz,” at the crossroads of ancient trade routes, suggests that Yiddish was developed by Iranian and Ashkenazic Jews as they traded on the Silk Road from the first centuries AD until around the 9th century, when they arrived in Slavic lands. The Silk Road was an extensive network of trade routes that connected the East and the West, facilitating the exchange of goods, culture, and ideas between the two regions. This network is believed to have been established as early as the first century AD and remained active until around the 15th century. During this time, “Ancient Ashkenaz” was a hub of commercial activity, attracting traders from all over the world. Iranian and Ashkenazic Jews were among these traders, and as they traveled along the Silk Road, they came into contact with different cultures and languages, which likely influenced the development of Yiddish.

Putting together evidence from linguistics, history, and genetics, we concluded that the ancient Ashkenazic Jews were merchants who developed Yiddish as a secret language — with 251 words for “buy” and “sell” — to maintain their monopoly. They were known to trade in everything from fur to slaves. Yiddish is primarily the language of traders, not scholars or warriors.

By the 8th century, the words “Jew” and “merchant” were practically synonymous, and it was around this time that Ashkenazic Jews began relocating from ancient Ashkenaz to the Khazar Empire to expand their mercantile operations. This Jewish migration led the Turkic Khazar rulers and numerous eastern Slavs living within the Khazar Empire to convert to Judaism so they wouldn’t miss out on the lucrative Silk Road trade between Germany and China. This was not the only religious reform influenced by Jews; throughout history, Jews survived by supporting the rulers and the elite, who in turn converted to Judaism or supported the Jews. Examples extend over two millennia, from Queen Helena of Adiabene, who lived in the first century AD and ruled a small kingdom in present-day Iraq, to Ivanka Trump.

Continued invasions, and finally the Black Death, devastated Khazaria, this last Jewish empire. This led the Ashkenazic Jews to split into two groups: some remained in the Caucasus, and others migrated into eastern Europe and Germany. Both groups still called themselves Ashkenazic Jews; however, the name Ashkenaz became more strongly associated with Germany and the European group, for whom Yiddish became the primary language.

A secret language

Since north-east Turkey is the only place in the world where the place names Iskenaz, Eskenaz, Ashanaz, and Ashkuz exist, and since the region is part of a much larger area known in ancient times as Ashkenaz, this strongly implies that Yiddish was established during the first millennium AD, when Jewish traders moved goods from Asia to Europe. They did so by developing Yiddish, a language that very few besides Jews can speak or understand.

Further evidence of the origin of Ashkenazic Jews can be found in many customs, such as breaking a glass at a wedding ceremony and placing stones on tombstones, which, as linguist Paul Wexler reported, were probably introduced by Slavic converts to Judaism.

By studying the origin of Yiddish using our GPS technology, combined with a citizen science approach, we were able to shed light on one of the most forgotten chapters of history and demonstrate the use of bio-geographical genetic tools to study the origin of languages. For Ashkenazic Jews, these are the ties that bind their history, culture, behavior, and identity.

The genetic ancestry quest

Within three weeks of its publication, this study had been read by 10M people worldwide. A follow-up study was published a year later, addressing some of the issues that had been debated and expanding the analysis to include ancient DNA evidence (links are at the bottom). The most common questions I am asked include why genetic tests taken by Ashkenazic Jews do not point individuals to ancient Ashkenaz and why non-Jews appear to have “Jewish ancestry.” The answer is simple. Genetic tests are the bread and butter of genetic testing companies and are designed to maximize their customers’ satisfaction rather than their knowledge. They also lack localization ability and instead report “ancestry.” In the case of Jews, their ancestry would be reported as the Middle East, i.e., Israel. You can read about it in my blog. Ancestry tests are typically biased and should be avoided if you search for the truth. In 2015, after developing GPS, I developed GPS Origins for HomeDNA. This is the only unbiased test that provides a detailed view of your gene pool and the geographical origins of your maternal and paternal lines. The localization of ancient Ashkenaz in Turkey also explains why non-Jews are often “mistaken” for Jews by DTC companies: Turks (or their ancestors) and Ashkenazic Jews share a long history from their time in the Ashkenaz lands. Note that those interested in their most ancient origins are advised to take the Ancient DNA Origins test, which compares your DNA to the ancient DNA of ancient populations, including the lost Tribes of Israel (before their exile).

The 2016 original study:

The 2017 follow-up study:

How math was recruited to invent the Jewish people

“Who is a Jew?” is a debate that finds its roots in the Iron Age Kingdom of Judah and its conquest by the Achaemenid Persian Empire. The Hebrew word ‘Yehudi’ (Jew in English) has been used at least since 539 BCE to refer to the inhabitants of the conquered kingdom, now called Yehud. ‘Yehud’ gave its name to ‘Yehudi’ (Jew), initially for administrative purposes, to denote someone from Yehud. However, it was not so simple. The Judaean deportees were the first to grapple with the challenges posed by this new concept of ‘Yehudi.’ They were from Judea, but they also retained their Israelite tribal affiliations. Should Yehudi be used as an honorific or as a derogatory term (to criticize the people of Yehud)? The terms ‘Yehudi’ and ‘Yehudim’ are very rare in the Bible, appearing only 75 times, mostly in the Book of Esther, which shows how gradually they were adopted. At that time, circumcision began spreading for various reasons unrelated to religion. Although it became a hallmark of Judaism over time, it was also its undoing, as it repelled new believers and prompted the successful spread of Christianity. As circumcision has always been practiced by non-Jews, it could never be the hallmark it was envisioned to be. The millennia-old debate concerning “Who is a Jew?” thereby persisted to our time, eventually becoming a pseudo-scientific question that geneticists began to tackle at the end of the 20th century.

Geneticists typically seek DNA markers (mutations) that are unique to specific groups and allow one group to be differentiated from others. Over the years, several candidate markers, such as the Cohen Modal Haplotype on the Y chromosome (allegedly identifying members of a priestly class) and even the BRCA genes, were proposed as genetic hallmarks of Jews. However, none of those markers performed as hoped, i.e., was present in most or all Jews while absent from most or all non-Jews, and the search continued (Elhaik 2016).

In the early 21st century, scientists stopped bothering with individual markers that, frustratingly enough, later popped up in non-Jewish populations. Instead, they analyzed hundreds of thousands of markers together, using complex mathematical tools that they did not fully understand. One of those tools, Principal Component Analysis (PCA), allowed them to condense the complex genetic dataset into a much simpler one that could be visualized as a simple, colorful scatter plot. In a previous article, I demonstrated the dangers of this tool and how it shaped the political career of Sen. Elizabeth Warren. From these articles, it should be clear how anyone can produce their favorite results using PCA and why PCA became geneticists’ best friend forever, North Star, crystal ball, used tea leaves, and Wish Bear, all combined.
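What PCA does to a genotype matrix can be sketched in a few lines. This minimal example assumes a NumPy matrix of individuals by SNPs coded 0/1/2 and uses made-up random genotypes. It centres each SNP, runs a singular value decomposition, keeps the top two components as scatter-plot coordinates, and reports how much of the total variance those two axes retain. The function name `pca_scatter_coords` is hypothetical.

```python
import numpy as np


def pca_scatter_coords(genotypes, k=2):
    """Condense an individuals-by-SNPs genotype matrix (coded 0/1/2) into k
    principal-component coordinates, plus the share of variance each keeps."""
    centred = genotypes - genotypes.mean(axis=0)           # centre every SNP
    u, s, _ = np.linalg.svd(centred, full_matrices=False)  # PCA via SVD
    coords = u[:, :k] * s[:k]                              # PC scores per person
    explained = s**2 / np.sum(s**2)                        # variance ratio per PC
    return coords, explained[:k]


# Made-up data: 50 individuals x 1,000 SNPs with allele frequency 0.3.
rng = np.random.default_rng(0)
genotypes = rng.binomial(2, 0.3, size=(50, 1_000))
coords, kept = pca_scatter_coords(genotypes)
print(coords.shape)  # (50, 2)
print(f"{kept.sum():.1%} of the variance survives in the 2-D plot")
```

On random data like this, the two plotted axes keep only a small share of the variance, a reminder of how much information a 2-D PCA plot discards.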

PCA was soon harnessed to tackle the millennia-old question: “Who is a Jew?”

In 2009, David B. Goldstein led a study (Goldstein et al. 2009) that claimed, based on PCA, that Jews (i.e., Ashkenazic Jews [AJs]) are genetically distinct from non-Jews (i.e., Europeans).

Goldstein et al. (2009)

Their PCA results were a devastating blow to the “Jews are not a race” proponents. Goldstein concluded that AJ genomes carry an “unambiguous signature of their Jewish heritage… this seems more likely to be due to their specific Middle Eastern ancestry than to inbreeding.”

Other authors followed in these footsteps, cementing the racial identity of Jews and their Levantine, Biblical-like origins and enshrining PCA as the ultimate Truth Sayer device on ancestry, genealogy, history, evolution, epidemiology, and biogeography, all in one plot! After all, math has spoken! “The evidence for biological Jewishness has become incontrovertible,” declared Harry Ostrer (2012), who offered to settle land disputes in Israel according to the magnitude of the Middle Eastern ancestry in one’s genome, in line with the Zionist vision, at least as he understood it. Ostrer’s offer was extremely generous toward the Palestinians and Bedouins, whose genomes have 56–59% of that ancestry (Das et al. 2016), compared to AJs, who are already a minority between the Jordan River and the Mediterranean Sea, with only a 50–0% Middle Eastern component (Elhaik 2017).

Putting aside Ostrer’s gift for diplomacy, the question of the prophetic powers of PCA remains: can it really be used to differentiate Jews from non-Jews without even being curious about their genitalia? This was no longer a theoretical question, as direct-to-consumer ancestry companies like 23andMe had already adopted PCA to assess ancestry, disease risk, and “cultural traits,” whatever those are. Soon enough, “genetic Jewishness” became a product to be purchased, and genetic Ashkenazic origins a trophy to cherish, no matter how minuscule that trophy was. Math took over where orthodoxy failed and picked up the fight lost to Christianity 2,000 years ago, offering the shortest possible route to Jewishness, with intactivists more welcome than ever. But was it real?

In my recent paper, I showed that PCA results are not reliable, robust, or replicable. I demonstrated how expert users could easily manipulate PCA to generate any desired result (however ridiculous). Is it possible that this is what Goldstein and his colleagues did? To answer this question, let us first replicate their result, along with their poor terminology (A in the figure below). Using the same approach, I can use PCA to show that Turks are distinct from non-Turks (B). Are they also a race? I can show that AJs and Turks either cluster (C), which by PCA logic indicates identity, or do not, just because (D), and that AJs cluster with Spaniards (D), creating conflicting results.

Elhaik (2022), Link to the paper.

The trick with PCA should be evident by now! One can select the markers, number of individuals, and populations that will almost always give us the desired results (here, I only manipulated the populations). Showing that PCA creates conflicting results should be enough to disqualify it as a scientific tool; yet, although scientists noticed this, they kept returning to their Wish Bear, drawing further conclusions about AJs’ origins. Let us examine these claims too.
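The population-selection effect described above can be reproduced on simulated data. The sketch below is a minimal illustration, assuming numpy and three hypothetical populations (A, B, C) with made-up allele frequencies; it is not the data or the pipeline of Elhaik (2022). A and B are close relatives, while C is unrelated. When only A and B are analyzed, PC1 separates them into distinct clusters; once the distant population C is added, PC1 is dominated by C, and A and B collapse together on that axis.

```python
import numpy as np

rng = np.random.default_rng(42)
N_SNPS, N_IND = 1000, 30

# Hypothetical allele frequencies: A and B drift slightly from a shared
# ancestor, while C is an unrelated, distant population.
ancestral = rng.uniform(0.1, 0.9, N_SNPS)
freqs = {
    "A": np.clip(ancestral + rng.normal(0, 0.1, N_SNPS), 0.01, 0.99),
    "B": np.clip(ancestral + rng.normal(0, 0.1, N_SNPS), 0.01, 0.99),
    "C": rng.uniform(0.1, 0.9, N_SNPS),
}

# Diploid genotypes (0, 1, or 2 copies of the counted allele) per individual.
genotypes = {pop: rng.binomial(2, f, size=(N_IND, N_SNPS)) for pop, f in freqs.items()}


def pc1_separation(pops):
    """Run PCA on the chosen populations and return the A-vs-B separation on PC1."""
    X = np.vstack([genotypes[p] for p in pops]).astype(float)
    X -= X.mean(axis=0)                        # center each SNP
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    pc1 = U[:, 0] * S[0]                       # PC1 score of every individual
    labels = np.repeat(pops, N_IND)
    a, b = pc1[labels == "A"], pc1[labels == "B"]
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return abs(a.mean() - b.mean()) / pooled_sd


sep_two = pc1_separation(["A", "B"])           # A and B look like distinct groups
sep_three = pc1_separation(["A", "B", "C"])    # adding C merges A and B on PC1
print(f"A vs B on PC1, A+B only: {sep_two:.1f}; with C added: {sep_three:.1f}")
```

Nothing about A or B changed between the two runs; only the set of populations fed to PCA did.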

The next test series (see the figure below) showed that AJs (solid green circles) are a “population isolate,” a group separated from all other populations (A), as the “Jews are a race” school upholds. I can also show that AJs cluster with Caucasus populations, in support of their origin from Ancient Ashkenaz (Das et al. 2016) (B). I can show that AJs cluster with Amerindians, which must be due to the north Eurasian or Amerindian origins of both groups (C). Could these exciting results be used to support legal claims for Jewish resorts with casinos in places like Brooklyn? I can also show that AJs cluster closer to South Europeans than to Levantines (D) and may be entitled to EU passports! AJs who can no longer take Tel Aviv’s heat and humidity may find relief in their overlap with Finns, solid evidence of their ancient Finnish origin (E). Those who insist on living in the Promised Land can find comfort in the last analysis, which not only refutes all our previous findings but also proves that half of the AJs are of Finnish origin and the other half have the lucrative Levantine origin. I can only hope that each half will find its grouping satisfactory.

Elhaik (2022), Link to the paper.

These examples demonstrate how genetic tools can be abused to support imaginary historical narratives. PCA earned its place as the most popular tool in genetics precisely because of its great flexibility, which is also why none of these results can be trusted.

The enthusiasm of scientists for PCA reminds me of Shakespeare’s Macbeth, who probably described it best: “a tale told by an idiot, full of sound and fury, signifying nothing,” or, in a free translation to modern English, “anyone can do PCA and use it to create a fancy plot that tells a great story that lacks any statistical significance.”

The question of “Who is a Jew?” shall remain open, perhaps forever, as it was never posed as a scientific question but rather as a dilemma forged by the unique historical circumstances in Yehud. The conversions made by Rabbi PCA will need to be undone, at least for those who did not ask for them in the first place.

The fates of the Ten Lost Tribes and the people of Yehud are among the most fascinating questions in history, and fortunately, we do not require PCA to answer them. Instead, projects like Ancient DNA Origins, which employ novel machine learning tools with ancient DNA from Israel (full disclosure: I contributed to it) to study the heritage and legacy of the ancient Israelites and our connection with them, have already made remarkable discoveries.


Das R, et al. 2016. Localizing Ashkenazic Jews to primeval villages in the ancient Iranian lands of Ashkenaz. Genome Biol. Evol. 8:1132–1149.

Elhaik E. 2016. In search of the jüdische Typus: a proposed benchmark to test the genetic basis of Jewishness challenges notions of “Jewish biomarkers”. Front. Genet. 7.

Elhaik E. 2017. Editorial: Population Genetics of Worldwide Jewish People. Front. Genet. 8.

Elhaik E. 2022. Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated. Sci. Rep. 12:14683.

Need AC, et al. 2009. A genome-wide genetic signature of Jewish ancestry perfectly separates individuals with and without full Jewish ancestry in a large random sample of European Americans. Genome Biol. 10:R7.

Ostrer H. 2012. Legacy: a genetic history of the Jewish people. Oxford: Oxford University Press.

Who needs Ancestry informative markers (AIMs)? Are they still useful or relics of the past?

To say that my criticism of PCA rattled some people would be an understatement, and while pushback against (or simply ignoring) the results was expected, some appear to attack the very concept of AIMs and their usefulness. I am not even certain how that challenges my criticism, but for their benefit, and for anyone else who wishes to grasp how population geneticists get things done, I put together this AIMs primer.

What are Ancestry informative markers (AIMs)?

Ancestry informative markers (AIMs) are specific genetic markers that are chosen for their ability to reveal information about an individual’s ancestry and geographic origin. They are particularly useful for understanding the complex histories and migration patterns of different populations and for identifying genetic variations that may be associated with particular diseases.

An example of an ancestry informative marker (AIM) is the genetic variant rs1426654, located in the SLC24A5 gene on chromosome 15. Its derived allele is found at a high frequency in European populations and at a low frequency in East Asian populations. Therefore, the allele a person carries at this marker can help distinguish European from East Asian ancestry. We can use the Geography of Genetic Variants Browser to examine this marker.
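A single marker gives only weak evidence; in practice, the evidence from many AIMs is combined. A minimal sketch of one common way to do this, assuming Hardy-Weinberg proportions, independent (unlinked) markers, and made-up allele frequencies for two hypothetical reference populations, `pop1` and `pop2`: each genotype is scored under each population's frequencies, and the population with the highest total log-likelihood wins.

```python
import math

# Hypothetical frequencies of the counted allele at three markers in two
# reference populations; real analyses take these from a reference panel.
REF_FREQS = {
    "pop1": [0.95, 0.80, 0.10],
    "pop2": [0.05, 0.30, 0.70],
}


def genotype_log_likelihood(genotype, freqs):
    """Log-likelihood of allele counts (0/1/2) under Hardy-Weinberg proportions."""
    ll = 0.0
    for g, p in zip(genotype, freqs):
        # Binomial(2, p) probability of carrying g copies of the allele.
        ll += math.log(math.comb(2, g)) + g * math.log(p) + (2 - g) * math.log(1 - p)
    return ll


def assign(genotype):
    """Return the most likely reference population and all per-population scores."""
    scores = {pop: genotype_log_likelihood(genotype, f) for pop, f in REF_FREQS.items()}
    return max(scores, key=scores.get), scores


best, scores = assign([2, 2, 0])
print(best, scores)
```

This is a hard assignment to one group; admixture-based tools instead estimate a mixture of source populations, but they rely on the same population-specific allele frequencies.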

AIMs are typically chosen for their ability to differentiate between populations or geographic regions. They are often used in genetic research to identify genetic patterns and variations that are associated with particular populations.
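Two simple and widely used measures of how well a marker differentiates two populations are the absolute allele-frequency difference (often called delta) and a two-population Fst. The sketch below ranks hypothetical markers by delta and keeps those above an illustrative cutoff. The marker names, frequencies, and the 0.3 threshold are all made up, and the Fst here is a basic heterozygosity-based (Nei-style) estimator without sample-size correction. Real AIM panels may use these or other criteria.

```python
def delta(p1, p2):
    """Absolute difference in allele frequency between two populations."""
    return abs(p1 - p2)


def fst(p1, p2):
    """Two-population Fst = (H_T - H_S) / H_T from expected heterozygosities."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                            # pooled heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2        # mean within-population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical markers: (frequency in population 1, frequency in population 2).
MARKERS = {
    "snp_a": (0.95, 0.05),
    "snp_b": (0.55, 0.45),
    "snp_c": (0.80, 0.30),
    "snp_d": (0.20, 0.22),
}


def select_aims(markers, min_delta=0.3):
    """Rank markers by delta (largest first) and keep those at or above the cutoff."""
    ranked = sorted(markers, key=lambda m: delta(*markers[m]), reverse=True)
    return [m for m in ranked if delta(*markers[m]) >= min_delta]


print(select_aims(MARKERS))
for name, (p1, p2) in MARKERS.items():
    print(name, round(delta(p1, p2), 2), round(fst(p1, p2), 3))
```

Markers like `snp_b` and `snp_d`, whose frequencies are similar in both groups, carry little ancestry information and are dropped.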

These properties make AIMs a crucial tool in genetic research, with a wide range of applications, including in the fields of biogeography and disease identification. In biogeography, AIMs have been used to trace the migratory patterns of different populations and to understand how human populations have dispersed and interacted with each other over time. This information is vital for studying the evolution and history of different populations, as well as for developing strategies for conserving biodiversity. AIMs can also be used to identify the geographic origins of ancient human remains, providing valuable insights into the movements and interactions of past civilizations.

I have used AIMs to study the genetic history of different populations, including in the development of the GenoChip and the DREAM and mini-DREAM arrays, all of which use large AIM panels to accurately identify an individual’s geographic ancestry and origins. AIMs are also the foundation of the Geographic Population Structure (GPS) tool and GPS Origins, which trace the geographical origins of your parental lines and their migration routes over the past 1000 years.

In addition to their use in biogeography, AIMs are also important in identifying disease markers. Some genetic variants are more common in particular populations and may be associated with an increased risk of specific diseases. By identifying these markers, researchers can more accurately predict an individual’s risk of developing certain conditions and tailor treatment plans accordingly. AIMs can also be used to identify genetic variations that may be associated with drug metabolism, allowing for personalized medicine approaches that take into account an individual’s unique genetic profile.

Are AIMs still being used in genetics or are they yesterday’s news?

AIMs are still being used extensively in genetic research because they are simply part of our genome. Here are some examples of literature sources from 2020-2022 that discuss the use of AIMs in genetic research:

What would genetics look like without AIMs?

Without AIMs, genetic studies would likely be significantly limited in their ability to accurately infer ancestry and understand the genetic structure of different populations. AIMs are specifically chosen for their ability to reveal information about an individual’s ancestry and geographic origin, and are particularly useful for identifying genetic patterns and variations that are associated with particular populations or geographic regions.

Without AIMs, genetic studies would likely have to rely on other types of genetic markers, which may not be as effective at identifying ancestry. This could make it more difficult to accurately infer the ancestry of individuals and to understand the genetic structure of different populations. It could also limit our understanding of the evolutionary history and migration patterns of different populations, as well as our ability to identify disease markers that are associated with particular populations.

In addition, without AIMs, genetic studies may be limited in their ability to accurately predict an individual’s risk of developing certain diseases or to tailor treatment plans accordingly. AIMs can be used to identify genetic variations that may be associated with drug metabolism, allowing for personalized medicine approaches that take into account an individual’s unique genetic profile. Without AIMs, these personalized medicine approaches may not be possible.

Overall, the use of AIMs in genetic research has greatly enhanced our understanding of human ancestry, biogeography, and disease susceptibility.

Warning: scholars are after your child’s penis

“More than the Calf Wants to Suck the Cow Wants to Suckle,” from the Jewish Talmud (Pesachim 112a), teaches us that sometimes the desire to give exceeds the desire to receive. I thought about it today when I read a study published in the journal AIDS and Behavior.

The study titled “The 1982 Medicaid Funding Cessation for Circumcision in California and Circumcision Rates” was authored by Linfield et al. (2022) and funded by the University of Kansas. The abstract:

We investigated California’s 1982 decision to stop funding Medicaid neonatal circumcision. We examined male neonatal circumcision rates for those born 1977–1981 and 1983–1987 by region, race, and insurance status. Overall, West-Medicaid circumcision rates decreased from 56.5% in 1979-81 to 26.7% in 1983-85. California’s 1982 decision to defund Medicaid circumcision coverage was associated with a 25.0-30.8% point decrease in West-Medicaid circumcision rates compared other groups, p < 0.01. This provides the earliest data to support that funding coverage for neonatal circumcision affects circumcision rates and magnifies healthcare disparities. Other states have since defunded Medicaid male neonatal circumcision. Circumcision have been associated with lower rates of sexually transmitted infections including HIV, and urinary tract infections. Lawmakers should consider re-funding Medicaid male neonatal circumcision.
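For readers who want to check the abstract's headline numbers, the raw change in the West-Medicaid circumcision rate between 1979-81 and 1983-85 works out as follows. Note that the abstract's 25.0-30.8 point figure is stated relative to the other groups, so it is a different quantity from this raw before-and-after difference.

```latex
\Delta_{\text{West-Medicaid}} = 56.5\% - 26.7\% = 29.8 \text{ percentage points},
\qquad
\frac{29.8}{56.5} \approx 0.527 \;\; (\approx 52.7\% \text{ relative decline})
```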

In other words, the authors showed that the end of Medicaid funding for circumcision in California was linked with a decrease in circumcision rates (duh). And why is this important? Because circumcision is promoted as protection against HIV. So, if this is the case, why not show that HIV went up in California?

Well, because it’s not true. This figure is from the HIV Epidemiology Annual Report, County of Santa Clara, 2018.

As you can see, HIV rose in the 1980s, declined through the 1990s, and has remained stable since the 2000s.

These trends are NOT correlated with the reduction in circumcision from Linfield et al. (2022).

In fact, Linfield et al.’s study has NOTHING to do with AIDS or behavior. So, why was it published in an AIDS journal? Because the journal’s editor is Seth Kalichman, who spent his career pushing the BS circumcision story. This is what scientists do when they have no fresh ideas: they circulate the same story over and over. This brings us back to the Talmud’s lesson: sometimes the cow (Kalichman, in this case) wants to feed so badly (push papers) that the truth (the calf is not hungry or, in this case, cannot do a proper study) is only a minor issue.

Why does genomic dating terrify The Jerusalem Post?

The Jerusalem Post (a right-wing news site that used to be popular) published a heavily redacted version of the press release on our latest study, which uses AI to date genomes. By now, the study has been covered by dozens of news sites around the world and ranks as the first or second most-read study in Cell Reports Methods. The JPost piece is the shortest version ever published (about half the length of the original press release).

I am very curious to know what they are so afraid of. I assume it is something along the lines of dating (or misdating) skeletons in Israel, yet I am perplexed by the nature of the feared bias (dating skeletons from the period of Moses to Abraham? Or is it the other way around? Dating Jesus to the 21st century? Messing up the Bible? How would that go?)

Seriously, do share; we are always looking for new ideas for groundbreaking research, and I promise to share credit. I suppose the JPost editors don’t know either (otherwise, there would be some kind of warning), which is quite disappointing.

Caught between engaging with their small circle of readers and maintaining the party line, they went with the safe solution of publishing a redacted version of the original press release.

To clarify, The Jerusalem Post should be concerned when science conflicts with its political ideology; that has nothing to do with this study, though, but with the next one.

As the old saying goes, when dating a mummy – don’t be late!

It is no secret that in dating, timing is everything. When studying the past, whether finding coins, bones, or pathogens buried in a mound, the question of when they are from makes the difference between a meaningful and a misleading discovery. Dating is so crucial that Willard Libby of the University of Chicago won the 1960 Nobel Prize in Chemistry for developing radiocarbon dating, which allows dating organic remains.
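For context on what radiocarbon dating actually computes: carbon-14 decays exponentially, so the fraction of 14C left in a sample, relative to a modern standard, maps to an age. The sketch below gives the conventional (uncalibrated) radiocarbon age, which by convention uses the Libby half-life of 5568 years. Converting this to calendar years requires a calibration curve (such as IntCal), which this sketch deliberately omits, and it rejects fractions above 1 (e.g., bomb-era carbon) rather than handling them.

```python
import math

LIBBY_HALF_LIFE = 5568                            # years, conventional value
LIBBY_MEAN_LIFE = LIBBY_HALF_LIFE / math.log(2)   # about 8033 years


def conventional_radiocarbon_age(fraction_modern):
    """Uncalibrated age in years BP from the sample's 14C fraction of a modern standard."""
    if not 0 < fraction_modern <= 1:
        raise ValueError("fraction_modern must be in (0, 1]")
    # Exponential decay: F = exp(-t / mean_life)  =>  t = -mean_life * ln(F)
    return -LIBBY_MEAN_LIFE * math.log(fraction_modern)


print(round(conventional_radiocarbon_age(0.5)))   # half the 14C left: one half-life
```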

Unfortunately, radiocarbon dating (RD) has many shortcomings, although it is considered the gold standard in dating. The only other alternative is archeological dating, which is even worse, as it is highly subjective. As of now, of the 6500 sequenced ancient genomes, 50% are radiocarbon dated, 40% are archeologically dated, and 10% are not dated. If only we could figure out WHEN someone is from just by looking at their DNA…

And this is where we come in. Our goal was to develop a third way, the first in 80 years since the invention of RD, namely genomic dating. Simply put: DNA to age. Our method takes the DNA of any human and outputs when they lived. How?

We already know of DNA mutations that can tell WHERE someone was from, but is it possible that there are DNA mutations that can tell us WHEN someone is from?

To understand how, consider the lactase-persistence mutation in the LCT gene region, which allowed our ancestors to digest lactose into adulthood and has increased rapidly in frequency since the Neolithic. Looking at two ancient genomes, one without and one with the mutation, we can date them to before and after the Neolithic, respectively.
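The single-marker reasoning above can be written as a toy rule: each observed marker state implies a range of plausible dates, and the estimate is the overlap of those ranges. The sketch below assumes hypothetical markers and made-up date ranges (in years before present). It is only a crude illustration: real allele frequencies change gradually, so the absence of an allele does not prove that a genome predates its rise, and hard intervals can conflict outright, which is why probabilistic or machine-learning approaches are needed.

```python
# Hypothetical time-informative markers: each (marker, observed state) maps to
# the range of dates, in years before present, where that state is assumed plausible.
MARKER_RANGES = {
    ("marker_1", 0): (8000, 12000),   # e.g. a lactase-persistence-like allele absent
    ("marker_1", 1): (0, 8000),       # allele present
    ("marker_2", 0): (0, 10000),
    ("marker_2", 1): (3000, 12000),
    ("marker_3", 0): (0, 10500),
    ("marker_3", 1): (10500, 12000),
}


def date_interval(observed):
    """Intersect the date ranges implied by each observed marker state.

    Returns (earliest, latest) in years BP, or None if the ranges conflict.
    """
    lo, hi = 0, float("inf")
    for marker, state in observed.items():
        r_lo, r_hi = MARKER_RANGES[(marker, state)]
        lo, hi = max(lo, r_lo), min(hi, r_hi)
    return None if lo > hi else (lo, hi)


print(date_interval({"marker_1": 1, "marker_2": 1}))   # narrowed by two markers
print(date_interval({"marker_1": 1, "marker_3": 1}))   # contradictory evidence
```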

This may not seem very precise, but if we had thousands of such mutations, each with its range, we could be far more accurate, especially if we have an artificial intelligence algorithm working for us and matching the patterns of such mutations with those of well-dated genomes.
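A minimal sketch of the "match the pattern against well-dated genomes" idea, using k-nearest-neighbor regression on simulated data. This is not TPS (its actual model is described in Behnamian et al. 2022); the allele-frequency trajectories, the 300 reference genomes, the distance measure, and k = 10 are all illustrative assumptions. Each reference genome has a known date, and a new genome is dated by averaging the dates of the references whose genotypes are most similar to it.

```python
import random

T_MAX = 10_000       # oldest date considered, years before present
N_MARKERS = 1_000


def allele_freq(marker, years_bp):
    """Hypothetical trajectory: half the markers become more common with age, half less."""
    x = 0.05 + 0.9 * years_bp / T_MAX
    return x if marker % 2 == 0 else 1 - x


def simulate_genome(years_bp, rng):
    """Diploid allele counts (0, 1, 2) drawn from the date-specific frequencies."""
    genome = []
    for m in range(N_MARKERS):
        f = allele_freq(m, years_bp)
        genome.append((rng.random() < f) + (rng.random() < f))
    return genome


def distance(g1, g2):
    """Mean absolute difference in allele counts between two genomes."""
    return sum(abs(a - b) for a, b in zip(g1, g2)) / len(g1)


def predict_date(query, references, k=10):
    """k-nearest-neighbor regression: average date of the k most similar dated genomes."""
    nearest = sorted(references, key=lambda ref: distance(query, ref[1]))[:k]
    return sum(date for date, _ in nearest) / k


rng = random.Random(1)
dates = [rng.uniform(0, T_MAX) for _ in range(300)]
references = [(d, simulate_genome(d, rng)) for d in dates]
query = simulate_genome(5_000, rng)
print(round(predict_date(query, references)))
```

With one marker the estimate would be nearly useless; with a thousand, the small per-marker signals add up, which is the intuition in the paragraph above.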

We developed the Temporal Population Structure (TPS) tool and used it to date 5,000 ancient and modern genomes, including nearly 500 families, the ultimate challenge for any dating algorithm. When dating families, it is critical not only to predict the correct age of ancient family members but also to place all family members at the same time, not hundreds of years apart. TPS dated relatives to within 17 years of each other, more accurately than alternative dating methods. TPS is freely and publicly available in Dryad.
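One simple way to score how consistently a method dates relatives is the average absolute difference between the predicted dates of every pair of members within the same family. The sketch below uses made-up predictions for two hypothetical families; it is an illustrative metric, not necessarily the exact statistic behind the 17-year figure above. Families with a single member contribute no pairs, and at least one pair is required overall.

```python
from itertools import combinations
from statistics import mean


def mean_within_family_gap(predicted):
    """predicted maps a family ID to the predicted dates (years BP) of its members."""
    gaps = [abs(a - b)
            for dates in predicted.values()
            for a, b in combinations(dates, 2)]
    return mean(gaps)   # raises StatisticsError if no family has two members


# Hypothetical predictions for two families (years before present).
example = {"fam1": [4510, 4498, 4530], "fam2": [3200, 3215]}
print(mean_within_family_gap(example))
```

A good dating method should keep this number small while still getting each family's absolute date right; a method can trivially score zero on this metric by assigning everyone the same date, so it is only meaningful alongside overall accuracy.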

Presenting TPS

Our study was published this week in Cell Reports Methods.

The press release is here.

A popular article was published in Ancient Origins.

A YouTube video is available here.

Citation: Behnamian, S., Esposito, U., Holland, G., Alshehab, G., Dobre, A.M., Pirooznia, M., Brimacombe, C.S., and Elhaik, E. 2022. Temporal population structure, a genetic dating method for ancient Eurasian genomes from the past 10,000 years. Cell Reports Methods.

My Bronze Age Origins test – update (1)

In case you missed my previous post, this is a quick reminder that my latest Bronze Age test has been available on GenePlaza for almost six months now and has received mostly excellent feedback. You can read about the test here. You can look at the preview of the test here.

A screen capture from My Bronze Age Origins Test

Some people may still wonder about the use of ancient DNA in the Bronze Age test rather than DNA from modern-day people, which all the other tests use, including GPS Origins. Does it really work? If it works so well, why doesn’t everyone do it? The following message, which I got from an unidentified user who took the Advanced version of the test, addresses the first question:

I just wanted to tell you how accurate your Bronze Age test is. I confirmed my connection to the San Nicolas Island culture. My grandmother always said that there is some Native American blood in our family, but my parents always dismissed it and then she died and I couldn’t get it verified with any other test. Too bad that I didn’t know that there are different tests with different ranges of dates before I spent money on other tests. I also found out that i am connected to east Asian cultures so it all makes sense and I am reading about it now! Thank you so much for helping me to reconnect with my people.

The San Nicolas Island culture represents one of the most diverse cultures of the Bronze Age test. Unlike modern DNA tests that include modern-day individuals that may have experienced admixture with other populations, the ancient populations preserved the original gene pool signature, in this case, the Amerindian one and, of course, the East Asian-Siberian gene pools. Americans may be familiar with the book Island of the Blue Dolphins; it was inspired by the story of the San Nicolas people.

As for the second question, the answer is simple. If companies reported the genetic similarity with ancient people, their loudest users would revolt. Some people don’t REALLY want to know where they are from. They want a confirmation of what they already know. They are even willing to pay a lot of money for this acknowledgment. Modern tests (except GPS Origins) are designed to make people happy, not to teach them their history. My guess is that large companies would continue avoiding ancient DNA.

Do you want to learn more about your ancient origins? I recently developed a DNA test that compares your DNA to the DNA of the ancient Israelites and many other ancient populations, using ancient DNA from real people who lived in the past. The tests also include a very detailed background on each culture. Check it out here. Just upload your DNA file and order a test, or order a DNA test kit if you have never taken a test before.

New paper: How NOT to apply supervised machine learning in evolutionary studies

Over the past five years, Schrider and Kern have developed a series of evolutionary models and tools that aim to apply supervised machine learning to evolutionary studies. What is supervised machine learning? It’s a statistical tool that is trained on a labeled dataset, say, samples of how 200 people write the digits 0-9, each tagged with the digit it represents. The tool learns to identify the digits 0-9, and the post office can then have a machine that sorts the mail by zip code without a human reading it. So what is the problem with applying it to evolutionary studies? We don’t really have a dataset to train on… those events happened a very long time ago, predating even the current ancient DNA datasets. So how were Schrider and Kern able to publish all their papers? That’s the question we set out to investigate. From the title of the paper, you can guess that we are not the first to say that it’s a no-no, but our paper is the most read one in the (endless?) series of criticism of this work, and it comes with a cartoon! Who says science is not fun?
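To make the role of the training labels concrete, here is a minimal supervised classifier, a nearest-centroid model, trained on a tiny made-up dataset. The two numeric features and the labels "zero" and "one" are hypothetical stand-ins for handwritten-digit measurements. The point is that `train_nearest_centroid` can only learn from pairs whose correct label is already known; if those labels come from a simulation rather than from ground truth, the model learns the simulation.

```python
from statistics import mean


def train_nearest_centroid(examples):
    """examples: list of (feature_vector, known_label) pairs; returns one centroid per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    # Average each feature over all training examples that share a label.
    return {label: [mean(col) for col in zip(*vecs)] for label, vecs in by_label.items()}


def predict(centroids, features):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((x - c) ** 2 for x, c in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical labeled training data: the labels are known to be true.
training = [([0.9, 0.1], "zero"), ([0.8, 0.2], "zero"),
            ([0.1, 0.9], "one"), ([0.2, 0.8], "one")]
model = train_nearest_centroid(training)
print(predict(model, [0.85, 0.15]))
```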

On the Unfounded Enthusiasm for Soft Selective Sweeps III: The Supervised Machine Learning Algorithm That Isn’t

Eran Elhaik and Dan Graur

In the last 15 years or so, soft selective sweep mechanisms have been catapulted from a curiosity of little evolutionary importance to a ubiquitous mechanism claimed to explain most adaptive evolution and, in some cases, most evolution. This transformation was aided by a series of articles by Daniel Schrider and Andrew Kern. Within this series, a paper entitled “Soft sweeps are the dominant mode of adaptation in the human genome” (Schrider and Kern, Mol. Biol. Evol. 2017, 34(8), 1863–1877) attracted a great deal of attention, in particular in conjunction with another paper (Kern and Hahn, Mol. Biol. Evol. 2018, 35(6), 1366–1371), for purporting to discredit the Neutral Theory of Molecular Evolution (Kimura 1968). Here, we address an alleged novelty in Schrider and Kern’s paper, i.e., the claim that their study involved an artificial intelligence technique called supervised machine learning (SML). SML is predicated upon the existence of a training dataset in which the correspondence between the input and output is known empirically to be true. Curiously, Schrider and Kern did not possess a training dataset of genomic segments known a priori to have evolved either neutrally or through soft or hard selective sweeps. Thus, their claim of using SML is thoroughly and utterly misleading. In the absence of legitimate training datasets, Schrider and Kern used: (1) simulations that employ many manipulatable variables and (2) a system of data cherry-picking rivaling the worst excesses in the literature. These two factors, in addition to the lack of negative controls and the irreproducibility of their results due to incomplete methodological detail, lead us to conclude that all evolutionary inferences derived from so-called SML algorithms (e.g., S/HIC) should be taken with a huge shovel of salt.

To read the full paper: