Your DNA Is a File. A Permanent and Irreversible One.

Aastha Thakker
Apr 23

Walk into any crowded place, be it a mall, a train station, a college cafeteria. You immediately start seeing differences. Heights, skin tones, eye colors, the way someone carries themselves. We look wildly different from each other. And yet, if you pulled out a copy of literally anyone’s genome and placed it next to yours, you would find that more than 99.9% of it is identical. That sliver, one base change per thousand, explains almost everything that makes you look, age, and get sick the way you do. That is what genomics studies.

And here is the thing that should immediately grab the attention of anyone in cybersecurity: genomic data is the only data type where a breach does not just hurt the person who was breached. It reaches backwards and forwards in time, implicating parents, siblings, and children who never agreed to anything. You can change your password. You can get a new credit card. You cannot change your genome. Not now, not ever.

So, What Actually Is Genomic Data?

Your genome is the complete set of DNA instructions packed inside almost every cell in your body, roughly 3.2 billion base pairs of A, T, G, and C. When a lab sequences your genome, it produces a digital file in a format such as FASTQ (raw reads), BAM (aligned reads), or VCF (variant calls). These files can run to several gigabytes or more and contain information about your eye color, how your body processes medications, and whether you carry variants linked to breast cancer, early-onset Alzheimer’s, or dozens of other conditions.
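To make the "it is just a file" point concrete, here is a minimal Python sketch that reads one VCF record. The sample line is invented for illustration (the ID, position, and genotype are hypothetical, not real data), and the parser only handles the fixed columns plus a single sample, so treat it as a toy rather than a real VCF reader.

```python
# Minimal sketch: parse one tab-separated VCF data line.
# The record below is made up for illustration; it is not real patient data.

VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def parse_vcf_line(line):
    """Split a VCF data line into its fixed columns and the first sample's fields."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields[:9]))
    record["POS"] = int(record["POS"])
    # FORMAT names the per-sample keys (e.g. GT = genotype); pair them with the values.
    keys = record["FORMAT"].split(":")
    values = fields[9].split(":")
    record["SAMPLE"] = dict(zip(keys, values))
    return record


example = "17\t43000000\trs0000001\tG\tA\t50\tPASS\t.\tGT\t0/1"
rec = parse_vcf_line(example)
print(rec["CHROM"], rec["POS"], rec["REF"], ">", rec["ALT"], rec["SAMPLE"]["GT"])
```

A genotype of 0/1 means one copy of the reference base and one copy of the alternate. One short line of text like this is all it takes to record that a person carries a given variant.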

The data arrives from multiple sources:

Whole Genome Sequencing (WGS): the full read, all 3.2 billion base pairs
Whole Exome Sequencing (WES): just the protein-coding regions, roughly 1–2% of the genome
SNP Arrays: what 23andMe and AncestryDNA use, scanning hundreds of thousands of known variation points
Clinical Sequencing: hospital-grade, focused on diagnosing a specific condition

Why It Matters, Including a Pandemic You Already Survived

Genomics is not theoretical. It is actively reshaping how diseases are diagnosed, how drugs are chosen, how outbreaks are controlled, and how we grow food. Here is the most visceral example most of us have lived through:

COVID-19: A 30,000-Letter File That Changed Everything

In January 2020, researchers in China sequenced SARS-CoV-2 and publicly shared its genome within days of confirming the outbreak. The file was roughly 30,000 base pairs. That single act of sharing unlocked something extraordinary: vaccine developers around the world immediately began designing candidates. The Moderna and Pfizer-BioNTech mRNA vaccines were built targeting one specific genomic feature: the spike protein gene. The sequence was the starting gun (Topol, E. J. (2023). Genomic epidemiology as a public health tool. Nature Medicine).

Without that early genomic share, the vaccine development timeline would have been measured in years, not months. The same genomic surveillance infrastructure then tracked every variant, from Delta to Omicron and beyond (this still gives me goosebumps), giving public health systems real-time intelligence on how the virus was evolving. That is what genomics looks like when it works exactly as intended.

Beyond COVID

In oncology, the shift away from one-size-fits-all chemotherapy toward treatments guided by a patient’s specific tumour genetics is already underway. Pharmacogenomics, the science of how your genes affect how you respond to drugs, is being used in psychiatry and oncology to avoid dangerous side effects and pick treatments that are actually likely to work for a specific person.

Genomics even extends beyond medicine. Agricultural scientists use it to develop crops that can survive drought. Environmental researchers detect pathogens in wastewater before outbreaks reach hospitals. The WHO’s One Health framework treats human, animal, and environmental health as one interconnected system, with genomics as the shared tool running through all three.

The 23andMe Attack: No Zero-Days. No Fancy Malware. Just Reused Passwords.

In 2023, 23andMe, at the time one of the most trusted names in consumer genomics, had genetic data for 6.9 million people exposed. The part that most writeups skip past: their systems were never directly breached. What actually happened is more instructive, and more embarrassing, than a sophisticated hack.

How It Started: Five Months Nobody Noticed

Starting April 29, 2023, a threat actor operating under the handle “Golem” began running a credential stuffing campaign against 23andMe’s login page. Credential stuffing is exactly what it sounds like: take username and password pairs leaked from completely unrelated breaches at other companies, automate login attempts against a new target, and wait for the hits. It works because people reuse passwords. The attacker had no zero-days. No novel exploit. Just dark web combo lists (bulk compilations of previously breached credentials) and an automated tool [23andMe Data Breach: Analyzing Credential Stuffing Attacks, Security Vulnerabilities, and Mitigation Strategies].

23andMe had no effective rate limiting. No screening of passwords against known-compromised credential databases. Two-factor authentication existed but was optional, and the vast majority of users had not turned it on. The minimum password requirement was eight characters with minimal complexity. That was the entire defence [Lessons from the 23andMe Breach and NIST SP 800–63B | Enzoic].
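For a sense of how small the missing controls are, here is a rough Python sketch of two of them: a sliding-window rate limiter and a check against known-compromised passwords. Everything named here (SlidingWindowLimiter, BREACHED_HASHES, the thresholds, the tiny three-password corpus) is a hypothetical stand-in for illustration, not 23andMe’s system or any vendor’s API. A real deployment would use a large breach corpus and would also need to handle attempts spread across many source addresses.

```python
import hashlib
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_attempts per key (e.g. a source IP) within window_seconds."""

    def __init__(self, max_attempts, window_seconds):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts = defaultdict(deque)

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        attempts = self._attempts[key]
        # Forget attempts that have aged out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False
        attempts.append(now)
        return True


def sha1_hex(password):
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


# Hypothetical stand-in for a real corpus of breached-password hashes.
BREACHED_HASHES = {sha1_hex(p) for p in ("password1", "qwerty123", "letmein!")}


def is_known_compromised(password):
    """True if the password appears in the (toy) breached-password set."""
    return sha1_hex(password) in BREACHED_HASHES


limiter = SlidingWindowLimiter(max_attempts=3, window_seconds=60)
results = [limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3, 61)]
print(results)  # [True, True, True, False, True]
print(is_known_compromised("password1"))  # True
```

The fourth attempt inside the minute is refused, and the one at 61 seconds goes through because the earliest attempts have expired. Per-IP limits alone will not stop a stuffing run spread over many addresses, which is why the password screening and 2FA matter alongside it.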

In July 2023, 23andMe’s systems did flag something: an unusual spike of around 400 attempted account transfers. The team investigated. They called it an isolated incident and moved on. It was not isolated. A second, heavier wave of stuffing hit in September. The company only launched a full investigation in October 2023 when an employee found the stolen data being sold on BreachForums. The attack had been running for five months before anyone looked properly [23andMe Data Breach: What Was Exposed, Who Was Affected, and What Happens to Your DNA Now | Security.org].

How 14,000 Accounts Became 6.9 Million Exposed Profiles

This is the part that every security engineer building platforms with social features should study carefully. Attackers directly compromised around 14,000 accounts, about 0.1% of 23andMe’s roughly 14 million customers. A small, boring number. But 23andMe offered an opt-in feature called DNA Relatives, which let users share profile data with others matched as genetic relatives. Once inside those 14,000 accounts, the attacker scraped every DNA Relatives profile those accounts were connected to, cascading outward to expose 5.5 million additional profiles. The Family Tree feature added 1.4 million more.

One opt-in feature, connected across a social graph built on biological relationships, turned 14,000 accounts into 6.9 million exposed people. That is not a vulnerability in a traditional sense. There was no bug. The feature worked exactly as designed. The attack surface was the design itself.
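The amplification is easy to reproduce on a toy graph. The sketch below is a Python illustration under simple assumptions: the names and the relatives mapping are invented, and exposure is modelled as one hop (a compromised account can see the profiles it is directly matched with). It is not a reconstruction of how DNA Relatives was actually implemented.

```python
def exposed_profiles(compromised, relatives):
    """Profiles visible to an attacker: the compromised accounts plus every
    profile those accounts can see through a relative-sharing feature (one hop)."""
    exposed = set(compromised)
    for account in compromised:
        exposed |= relatives.get(account, set())
    return exposed


# Hypothetical toy graph: who each account can see via relative matching.
relatives = {
    "alice": {"bob", "carol", "dave"},
    "erin": {"dave", "frank"},
    "bob": {"alice"},
}
compromised = {"alice", "erin"}

exposed = exposed_profiles(compromised, relatives)
print(len(compromised), "compromised ->", len(exposed), "exposed")  # 2 compromised -> 6 exposed

# The same ratio using the figures reported for the breach.
reported_exposed = 5_500_000 + 1_400_000
print(reported_exposed, round(reported_exposed / 14_000))  # 6900000 493
```

Two stolen logins expose six people in the toy graph. With the reported numbers, each compromised account exposed roughly 490 other profiles on average.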

The Bill: From $1 Million to Bankruptcy

23andMe’s initial incident response estimate: $1–2 million. Forensics, legal fees, user notifications, third-party advisors. That number was almost immediately overtaken by what came next. Multiple class action lawsuits hit, accusing the company of negligence and delayed notification. The settlement: $30 million for affected U.S. customers, plus five years of free identity theft protection and genetic anomaly detection services. Then, in March 2025, 23andMe filed for Chapter 11 bankruptcy with the 2023 breach explicitly cited as a contributing factor. A company that once carried a multi-billion-dollar valuation collapsed, in part, because of a credential stuffing attack that required zero technical sophistication to execute.

The attacker, for reference, was selling individual genetic profiles for $1 to $10 per record on dark web forums. The asymmetry is hard to overstate: a company partially destroyed by someone who probably cleared a few thousand dollars.

And the worst part: unlike stolen credit card numbers or Social Security numbers, this data cannot be recalled, reissued, or changed. Every person whose genomic data is floating on dark web forums right now carries that exposure permanently. So do their biological relatives, who never used 23andMe at all.

It Is Not Just One Company. Everyone Has Skin in This Game.

Laboratories and Clinical Institutions

Clinical labs generate and handle genomic data every day, frequently through systems that predate modern threat models by a decade or more. The 2024 ransomware attack on Change Healthcare, which disrupted prescription processing for millions of Americans and cost UnitedHealth Group over $870 million in direct response costs, showed exactly how high-value this sector is to attackers. Genomic data raises those stakes further. Billing records can be corrected. Genomic sequences cannot be revoked.

Individual Users

When you mail in a spit kit, you are agreeing to a terms of service document that most people never read and that may allow your data to be shared with pharmaceutical research partners, passed to law enforcement under a legal compulsion order, or included in research datasets without further individual consent. The FTC has started scrutinising DTC genomics companies more closely, but enforcement is patchy and the legal framework has real gaps, particularly for companies that do not fall cleanly under HIPAA.

Researchers

Academic genomics runs on data sharing. The science only works at scale. But open sharing is in direct tension with privacy requirements, and most institutions are still using data-sharing models that were not designed with adversarial actors in mind. Federated learning and secure multi-party computation are being researched as alternatives, letting analysis run across distributed data without centralising raw sequences. Adoption is slow.
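To give a flavour of what secure multi-party computation means in practice, here is a small Python sketch of additive secret sharing, one of its basic building blocks. Three hypothetical institutions each hold a private count of variant carriers and want the total without revealing their own number. The counts, the function names, and the modulus are all assumptions for illustration, and the whole thing runs in one process, so it shows the arithmetic only, not a networked protocol.

```python
import secrets

MODULUS = 2**61 - 1  # any modulus larger than the true total works


def share(value, n_parties):
    """Split value into n random-looking shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    n = len(private_values)
    # Each party splits its value and hands one share to every party.
    all_shares = [share(v, n) for v in private_values]
    # Party j adds up the j-th share from everyone; alone this reveals nothing.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Only the partial sums are published; together they reconstruct the total.
    return sum(partial_sums) % MODULUS


carrier_counts = [120, 45, 310]  # hypothetical per-institution counts
total = secure_sum(carrier_counts)
print(total)  # 475
```

No single share or partial sum says anything about an individual institution’s count, yet the published pieces add up to the correct total. Real systems layer authentication, dropout handling, and protections against colluding parties on top of this, which is part of why adoption is slow.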

Nations

Genomic databases have quietly crossed a threshold. They are no longer just healthcare infrastructure: several governments now classify them as strategic assets, in the same category as satellite networks, energy grids, and weapons stockpiles. The logic is straightforward once you see it: a foreign government that holds the genomic profiles of your soldiers, intelligence officers, and their families has a form of leverage that no other stolen dataset can provide. You cannot rotate biology. You cannot patch a genome. The exposure does not degrade over time. It compounds, because the children of those individuals will carry the same biological signatures.

Genomics is one of the most genuinely exciting fields in science right now. The ability to sequence a virus in days and begin vaccine development within weeks, to look at a tumour’s genome and choose the drug most likely to work for that specific patient, to catch a disease before a single symptom appears: these are real things happening right now (and in fact have been for a while), not future promises.

But the same properties that make genomic data so scientifically powerful (its permanence, its specificity, its biological reach into families) make a breach of it unlike any other breach on the threat landscape. Genomic data is not just personal data. It is inherited risk.

See ya next Thursday! Till then wish me luck for exaammmsss :/ Honestly, much needed Again!!!
