Could We Store Our Data in DNA?

A zettabyte is a trillion gigabytes. That’s a lot—but, according to one estimate, humanity will produce a hundred and eighty zettabytes of digital data this year. It all adds up: PowerPoints and selfies; video captured by cameras; electronic health records; data retrieved from smart devices or collected by telescopes and particle accelerators; backups, and backups of the backups. Where should it all go, and how much of it should be kept, and for how long? These questions vex the computer scientists who manage the world’s storage. For them, the cloud isn’t nebulous but a physical system that must be built, paid for, and maintained.

Storage experts speak of a data-temperature scale. On one end, there is “hot” data—Wikipedia or your bank balance—which needs to appear on your screen almost instantly. On the other, there is “cold” data, which might be minutes or even days from your fingertips. The “warm” data in the middle, such as your old photos, can take a few seconds to retrieve. Most data is cold, and a lot of it could probably be erased without consequence. Yet some of it might one day prove critical—say, in a criminal case—and its potential value means that much of it must be preserved, intact, for uncertain lengths of time.

One of the most popular mediums for cold-data storage is magnetic tape. Invented in the nineteen-twenties, it has steadily improved, doubling in capacity every couple of years. The company Quantum, a leader in archival technology, sells tape libraries that are like jukeboxes the size of shipping containers. Inside them, a little robot retrieves data by finding the tapes, which are housed in VHS-like cassettes, and plugging them into drives so that they can be read. “There’s thousands of Quantum robots in the cloud right now, moving your data around,” Eric Bassier, who worked at Quantum for more than sixteen years, told me.

Tape usage increases each year, thanks in part to the hunger of data hoarders like Google. But a year’s worth of humanity’s data, on modern-day magnetic tape, would fill thirty thousand shipping containers. Meanwhile, tapes and drives degrade over time. Tape Ark, an Australian company, helps retrieve data from damaged tape; its C.E.O., Guy Holmes, described rescuing measurements of lunar dust that had been beamed back from the moon after the Apollo missions. He also showed me a video of old tape disintegrating as it moved inside a drive. “These little black specks that you see here on the left of the screen—those are Word documents and Excel spreadsheets that have fallen off the tape because it has become so brittle,” he said.

Magnetic tape may seem like an antiquated technology. And yet some researchers looking to replace it have begun gravitating to an even more ancient alternative. Billions of years ago, evolution stumbled upon DNA as a storage medium. There would be several advantages to translating a computer’s ones and zeros into the bases of genetic material (A, C, T, and G). First, at its theoretical limit, molecules of DNA could store up to a billion gigabytes per cubic millimetre—a density level that would make it possible to fit a shipping-container’s worth of tapes into the volume of a few sesame seeds. Second, properly prepared strands of DNA can reliably last thousands of years: the oldest extant DNA sample is two million years old and is still readable. And, finally, DNA won’t grow obsolete. Because of its importance in the life sciences—and in the functioning of our own bodies—we’ll likely always have the tools to read what we’ve written.
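The figures quoted so far can be checked against one another with some back-of-the-envelope arithmetic. The sketch below uses only numbers from the text (a hundred and eighty zettabytes a year, thirty thousand shipping containers of tape, a billion gigabytes per cubic millimetre). It assumes decimal units, and the variable names are illustrative.

```python
# Back-of-the-envelope check using only figures quoted in the article.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 ZB = 10**21 bytes).
GB = 10**9
ZB = 10**21

yearly_data_bytes = 180 * ZB        # estimated data produced in a year
tape_containers_per_year = 30_000   # shipping containers of tape for that data
dna_bytes_per_mm3 = 10**9 * GB      # theoretical limit: a billion GB per mm^3

# Data held by one shipping container of tape.
bytes_per_container = yearly_data_bytes // tape_containers_per_year

# Volume of DNA needed at the theoretical limit.
container_volume_mm3 = bytes_per_container / dna_bytes_per_mm3
yearly_volume_cm3 = yearly_data_bytes / dna_bytes_per_mm3 / 1000  # 1 cm^3 = 1000 mm^3

print(f"One container of tape: {bytes_per_container / 10**18:.0f} exabytes")
print(f"Same data in DNA: {container_volume_mm3:.0f} cubic millimetres")
print(f"A year of data in DNA: {yearly_volume_cm3:.0f} cubic centimetres")
```

Under these assumptions, one container of tape corresponds to about six cubic millimetres of DNA, which is in line with the "few sesame seeds" comparison, and a full year of data would occupy roughly a hundred and eighty cubic centimetres. These are theoretical-limit figures; a practical system would likely need more room for redundancy and packaging.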

The Soviet physicist Mikhail Samoilovich Neiman proposed the idea of using DNA to store data in 1964, about a decade after the double helix was first mapped by James Watson, Francis Crick, and Rosalind Franklin. But building an actual DNA storage system has proved complicated. First, scientists have to decide how to mathematically encode zeros and ones into DNA’s bases. (There are many options.) Then they have to manufacture chains of those bases on demand. Next, they have to safely store, retrieve, and read those chains, and finally translate them back into bits. The first demonstration of the technology took place in 1988, when Joe Davis, an artist, created a stick figure that he called Microvenus. Davis used an encoding scheme to translate the image, which was five pixels by seven, into a sequence of eighteen bases. With the help of a Harvard lab, he inserted the DNA into E. coli bacteria, which could maintain and replicate the message. The researchers succeeded in reading it back two years later. In 2007, another group performed a similar feat, encoding “E=mc^2 1905!” into a bacterial genome.
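One of the simplest of those many encoding options is to let each base stand for two bits. The sketch below is a minimal illustration of that idea, not the scheme used by Davis or by any other group mentioned here; the mapping table and the function names `encode` and `decode` are assumptions made for the example. Practical schemes add constraints that this version ignores, such as avoiding long runs of the same base and building in error correction.

```python
# Minimal sketch: two bits per base. The mapping is an arbitrary
# assumption for illustration, not a published encoding scheme.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Translate bytes into a string of bases, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Translate a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = "E=mc^2 1905!".encode("ascii")
strand = encode(message)
print(strand)

# Reading the strand back should recover the message exactly.
assert decode(strand) == message
```

Each byte becomes four bases, so the twelve-character message yields a strand of forty-eight bases.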

In 2010, the biologist Craig Venter, who played a key role in sequencing the human genome, worked with colleagues to create a synthetic bacterial genome, which they “watermarked,” encoding text that included their own names and quotes from James Joyce and Richard Feynman. Before they published their paper, in Science, one of its reviewers, the groundbreaking Harvard geneticist George Church, playfully sent his comments to the article’s editor encoded in DNA. That experience piqued Church’s interest, and, in 2012, he and two colleagues successfully stored around six hundred and fifty kilobytes of data in DNA—about seven hundred times the previous record. Their data contained a computer program and a draft of Church’s book “Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves.” On “The Colbert Report,” Church handed Stephen Colbert a dot of DNA containing twenty million copies of his book; Colbert pretended to try to eat it.

In 2018, Microsoft said in a paper that it had stored two hundred megabytes of data in DNA, including a music video, a database of seeds in the Svalbard Global Seed Vault, and the “Universal Declaration of Human Rights” in more than a hundred languages. “Every I.T. company has storage challenges,” Karin Strauss, one of the paper’s senior authors, told me; the researchers wondered if DNA storage might offer a practical solution. Their work incorporated a form of error correction and a form of random access: the ability to retrieve a single file without reading the entire archive. If you want to find the encyclopedia entry for “zebra,” you don’t want to have to scan through the whole alphabet; you want to jump straight to “Z.” The team enabled this by including, in their DNA, sequences of bases that functioned as I.D. tags.
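The role of those I.D. tags can be illustrated with a toy model. In the sketch below, each strand in a shuffled pool begins with a short tag sequence, and retrieval keeps only the strands whose tag matches. The tag sequences, the payloads, and the helper names are all hypothetical; in a real system the selection happens chemically rather than by string comparison, and this sketch does not represent Microsoft's actual design.

```python
import random

TAG_LENGTH = 4  # assumption: real tags would be longer

# Hypothetical tags, one per stored file.
FILE_TAGS = {"zebra": "ACGT", "aardvark": "TGCA"}


def make_strands(name: str, chunks: list[str]) -> list[str]:
    """Prefix every chunk of a file with that file's tag."""
    return [FILE_TAGS[name] + chunk for chunk in chunks]


def retrieve(pool: list[str], name: str) -> list[str]:
    """Return the payloads of strands carrying the file's tag."""
    tag = FILE_TAGS[name]
    return [s[TAG_LENGTH:] for s in pool if s.startswith(tag)]


# A pool mixes strands from all files in no particular order.
pool = make_strands("zebra", ["GGAT", "CCTA"]) + make_strands("aardvark", ["TTAG"])
random.shuffle(pool)

print(sorted(retrieve(pool, "zebra")))
```

Because the pool is unordered, the retrieved chunks come back in no particular sequence; a fuller model would presumably also store a position index inside each strand so the file could be reassembled.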

The technology suddenly seemed practicable. The Intelligence Advanced Research Projects Activity (IARPA) soon launched the Molecular Information Storage (MIST) program and awarded fifty million dollars in grants to develop the technology further. In 2020, Microsoft and other companies founded the DNA Data Storage Alliance. “We think, over probably the next decade, tape is the way to go,” Bassier, the former Quantum employee, told me. “Then we do think DNA data storage has a lot of viability long-term.”

One of the biggest challenges of DNA storage is the actual manufacturing of DNA, known as synthesis. The most common method is slow: it adds bases one at a time. Imagine a single typist entering data letter by letter; to up the speed, you’d want to employ many typists who can work in tandem. In preparation for their 2018 paper, the Microsoft researchers ordered their DNA from a company called Twist, which had developed a silicon chip that has about the same area as a paperback. It is capable of constructing a million different sequences of DNA at the same time. Twist is now working on a chip that can code three orders of magnitude more data, according to Emily Leproust, the company’s C.E.O. and co-founder. The goal is to write DNA at terrific speeds and on a vast scale.

In 2022, I visited Catalog, a startup in Boston that’s pursuing a different approach to DNA writing. In a large space in the former Schrafft’s Candy Factory, Catalog has built a machine it calls Shannon, after Claude Shannon, an early innovator of information theory. The version of Shannon I saw looked like a high-tech stainless-steel printing press; the company is now finalizing a commercial version that’s the size of a large photo booth. As I watched, hundreds of inkjet nozzles deposited tiny droplets full of bases onto a long sheet of clear plastic, which was moving from one end to the other. The bases had been connected together in units called oligos, which are more like words or sentences than letters. Shannon printed collections of them, then added an enzyme that bonded them together into the equivalent of paragraphs. The sheet zigzagged through an incubation chamber, then passed a tool that squeegeed droplets of DNA into a vial—the data archive. It was like a hard drive, in liquid form.

I held a plastic sheet on which the droplets had been allowed to dry, instead of being collected. It had a slight orange tint from an added dye. Looking closer, I saw thousands of tiny dots. In another nearby lab, Hyunjun Park, Catalog’s C.E.O., handed me a small vial containing a droplet of fluid, which held many copies of eight Shakespeare plays. Perhaps the future of data was not a data center, with its humming servers and blinking lights, but a wet lab with beakers and an emergency shower.

Catalog’s system is a mechanical challenge, but also a mathematical one; the encoding scheme that the company uses is not exactly intuitive. Swapnil Bhatia, a Catalog engineer, spent an hour at a whiteboard helping me almost understand the basics. The system, I learned, could use hundreds of bases just to represent a single bit of information—but what it lost in data density it gained in writing speed and cheapness. So far, so good. But then Bhatia moved on to a more complex topic. A DNA-based computer might be able to perform calculations, but with data stored in vials.

Bhatia explained a simple form of processing: searching through text for a word. This could be done chemically, without translating the bases back into bits. It’s possible that other kinds of computation—for example, comparing databases or finding patterns in radio signals—could be performed using data in DNA form, requiring much less energy than an equivalent operation on a silicon-based supercomputer. “I just think of DNA as, like, nature’s data structure,” Bhatia said. “We’re just borrowing.” I imagined the cells in my body not as the components of organs but as a form of information processing that blurred the line between chemistry and computing. The brain can be described as thinking meat—but so can the rest of us.

In the right conditions, DNA can last for millennia; in the wrong ones, it degrades. An easy protective step is to embed the DNA in a compound that isolates it from water, oxygen, radiation, enzymes, microbes, and the like; the compound can then be dissolved later. Or you can dehydrate the DNA into powder and stash it in vacuum-sealed steel capsules. (In January, Catalog and Asimov Press released an anthology of essays and science fiction as both a paper volume and a capsule of dried DNA—the first commercial publication of its kind.) Dried DNA appears to have a long shelf life. Last September, researchers from Microsoft and elsewhere reported that they had placed two DNA-encoded files—a world map and an image of a space shuttle—into a particle accelerator. The DNA was bombarded with as much neutron radiation as it would encounter if it sat in New York City for 4.4 million years. The files remained intact.

A startup called Cache DNA uses another approach: storing DNA in tiny clear spheres. Cache grew out of the lab of Mark Bathe, a biological engineer at M.I.T. At first, Bathe and his team placed their DNA “files” inside silica beads that were about a tenth of the width of a human hair. (They’ve since learned how to use polymers, which are safer and more convenient.) Bathe’s lab also took the step of attaching single-strand DNA “barcodes” to the outside of each sphere. Beads containing images of a tabby cat had labels representing “cat,” “orange,” and “domestic”; beads containing tigers had “cat,” “orange,” and “wild.” The team could distinguish one image from another by using chemicals that made only certain labels glow.
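The label-based selection can be modelled as a set query. The sketch below is only an analogy for the chemistry: it represents each bead as a file name plus a set of barcode labels, and selects the beads that carry every label in a query. The file names and the function name `select` are assumptions for illustration; the labels follow the examples in the text.

```python
# Toy model of barcode-labelled beads. File names are hypothetical;
# the labels follow the examples in the text.
beads = [
    {"file": "tabby.jpg", "labels": {"cat", "orange", "domestic"}},
    {"file": "tiger.jpg", "labels": {"cat", "orange", "wild"}},
]


def select(beads: list[dict], *query: str) -> list[str]:
    """Return files whose beads carry every label in the query."""
    wanted = set(query)
    return [bead["file"] for bead in beads if wanted <= bead["labels"]]


print(select(beads, "cat"))          # both beads match
print(select(beads, "cat", "wild"))  # only the tiger bead matches
```

Combining labels narrows the selection in the same way that adding search terms narrows a database query.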

At M.I.T., Bathe and one of his collaborators, Joseph Berleant, showed me some stored DNA in a lab. Berleant handed me two small vials. One had capsules containing images of lions, tigers, and house cats. The other had other images—an airplane, some fruits, and so on. He’d added fluorescent cat “probes” to each vial, let them sit overnight, and then centrifuged out the “unbound” probes, which hadn’t attached to beads.

We put on tinted glasses and he held the two vials over a special light. The cat vial, but not the other one, glowed pink. It was possible to imagine practical uses for this kind of tagging technology; James Banal, Cache’s co-founder, suggested that, during a pandemic, airport officials could tag viral RNA from nasal swabs with the ages of passengers and the flights they’d taken. Later, scientists could search for the RNA from a new variant and trace it back to its source. Last year, the team demonstrated a model of this system.

There are two ways of imagining the future of DNA data storage. One is to picture it like today’s storage systems, only denser, wetter, and hardier. David A. Markowitz, who launched IARPA’s MIST program, envisioned a system that can—in a day and for a thousand dollars—write a terabyte of data, randomly access and read ten terabytes of data, and fit on a table, in the near future. It’s a “big swing,” he said. Meanwhile, the DNA Data Storage Alliance seeks to conduct market research, educate the public, and set technical specifications so that DNA archives will be interoperable. (They want to avoid standoffs like what happened between Blu-ray and high-definition DVD.) Strauss, of Microsoft, told me that she can imagine the company utilizing DNA for its cloud services.

Another way of picturing DNA storage is as a fundamental reimagining of data—one that will open up new possibilities by allowing information to exist in new places. Bathe imagines watermarking medicine to trace pills; Church, the geneticist, has developed methods that could allow cells to record data in their so-called “junk DNA”—the material that sits between genes and makes up the majority of a genome. (Cells know not to try to turn their junk DNA into proteins.) Such a system might act as a “flight recorder,” Church told me, which means that data about the body’s functioning could be recovered in the case of a heart attack or cancer. Perhaps, he said, visual data could be deposited in the retinal cells of a fly, “turning an insect into a video camera.” Maybe molecular computers, of the sort that other researchers are developing, would write the data into the cells.

Could we write data into our genomes, passing it on when we have children? Some scientists, including Francis Crick, have speculated that aliens or ancient civilizations might have inserted messages into the junk DNA of humans or other animals. In 1999, the computer scientist Jaron Lanier imagined a time capsule that could preserve human knowledge by inserting it into cockroaches’ genomes. Let loose in Manhattan, the time capsule would be “easy to locate, impossible to destroy,” he wrote. Bathe told me that we could preserve a record of our accomplishments in DNA, then scatter it around our solar system.

There’s a sense in which the DNA in our bodies never forgets. Even though it mutates and recombines, we can still track its lineage back billions of years. What would it mean for society if we harnessed DNA to store everything forever? Today, we find archeological remnants of earlier civilizations—tools, tablets, monuments—and use those to guess at what it was like to be them. But, in another couple of decades, we might use biology to store every pixel from every camera, every datum from every scientific observation, every thought, statistic, or transaction.

Whether that sounds utopian or dystopian, a great deal of human life could be immortalized in a DNA cloud—or lake. The data won’t pile up like copies of The New Yorker; instead, through chemical computing, the information will be finely searchable and analyzable. The double helix, which evolved to preserve the best of what nature has to offer, will be conscripted to preserve the best that we have to offer—and the worst, and everything in between. ♦
