One More Use for DNA
Over the last decade, everything has become digital. We no longer capture images on film but in digital files. We don’t send letters; we send email messages. We don’t buy books; we download documents to an e-reader. Every organization has a website. Information is at our fingertips, but the whole system is extremely fragile.
The problems with our digital storage technologies are twofold. First, the data don’t last: once they have been laid down, they must be transferred periodically to keep them fresh. Second, the technology for storing and reading data keeps changing. An amusing example is NASA, which in the early 2000s found that it was unable to access data from the space program of the 1960s and 1970s. So there they were, scouring internet auction sites for second-hand eight-inch floppy drives that could read their priceless data. Similar events of loss or near loss happen all the time. In 2009, when Yahoo! closed its GeoCities service, a huge amount of data was lost, perhaps “the most amount of history in the shortest amount of time, certainly on purpose, in living memory.” Nobody seemed to notice, but if these had been paper documents lost from a library, the outcry would have been anguished indeed. The take-home lesson is that, as a digital society, we need better systems to store and read data. In view of this, some scientists have turned their attention not to a new system but to a tried and true one, much better than modern devices. Enter DNA.
Inside every living cell, there are long strands of a molecule called DNA, which carry information. This information determines how the creature develops from a single cell, and how the creature will function when mature. This molecule, whose structure was first described in 1953, consists of a chain of sugar molecules joined by phosphate groups. Each sugar molecule has attached to it one of four small nitrogen-containing molecules called bases (a base together with its sugar and phosphate is called a nucleotide). The order of the bases along the DNA chain determines the information that the molecule carries. It is also a feature of DNA that it can be copied exactly, endless times, because of the way the bases pair with one another. DNA, therefore, is a system that stores and uses digital data.
Turning back to human technology, everyone agrees that we need a high-density storage medium for data, one that can be preserved for long periods under easily achieved conditions, and one with a proven track record as a bearer of information. On all these criteria, DNA is a proven winner. This molecule can store about 2.2 million gigabytes of data per gram, the equivalent of about 468,000 DVDs in a tiny speck of material. This is superlative information-storing capacity! The Large Hadron Collider in Switzerland, for example, generates about 15 million gigabytes of data per year; at DNA’s density, storage space would be no trouble at all. As for the conditions required, dry storage at room temperature works perfectly well, and the molecules should remain stable for thousands of years, if required.
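A quick back-of-envelope check puts these figures together. The density and LHC numbers are the article’s own; the single-layer DVD capacity of 4.7 gigabytes is a standard figure assumed here for the comparison:

```python
# All figures in gigabytes; DVD capacity is an assumed standard value.
DNA_DENSITY_GB_PER_GRAM = 2_200_000   # 2.2 million GB per gram of DNA
DVD_CAPACITY_GB = 4.7                 # single-layer DVD (assumption)
LHC_OUTPUT_GB_PER_YEAR = 15_000_000   # 15 million GB per year

dvds_per_gram = DNA_DENSITY_GB_PER_GRAM / DVD_CAPACITY_GB
grams_per_lhc_year = LHC_OUTPUT_GB_PER_YEAR / DNA_DENSITY_GB_PER_GRAM

print(f"DVD equivalents per gram: {dvds_per_gram:,.0f}")
print(f"Grams of DNA for one year of LHC data: {grams_per_lhc_year:.1f}")
```

The arithmetic reproduces the article’s 468,000-DVD figure, and shows that even a year of LHC output would fit in roughly seven grams of DNA.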
The remaining issues are what kind of promise DNA holds for storing our digital recordings, images, text and so on, and how we extract the information once it has been stored. These questions occupy the attention of some scientists today. A further concern, of course, is the economics of the technique. In August 2012, three scientists at Harvard Medical School published an account of how they stored the entire text of a genetics book (53,000 words and 11 digital images) in DNA code. An inkjet printer deposited the chemically synthesized DNA onto the surface of a tiny glass chip. Later they re-suspended the DNA in liquid and fed it into a DNA sequencing machine, after which a computer translated the coded information back into English text. And there it was: the text of the book restored, with an error rate of only 2 errors per million bits. Since digital code uses 8 bits per character, this translated into only a few single-letter typos in the whole book. Not too bad a record!
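The arithmetic behind that “few typos” claim is easy to sketch. The word count, bits per character and error rate are the article’s figures; the average word length of six characters (including a space) is an assumption made here for illustration:

```python
# Rough arithmetic behind the "few single-letter typos" claim.
# Word count, 8 bits/character and 2 errors per million bits are from
# the article; 6 characters per word is an illustrative assumption.
words = 53_000
chars = words * 6                 # assumed average characters per word
bits = chars * 8                  # 8 bits per character
errors = bits * 2 / 1_000_000     # 2 errors per million bits
print(f"{bits:,} bits -> about {errors:.0f} bit errors in the whole book")
```

Roughly 2.5 million bits at 2 errors per million gives about five bit errors, i.e. a handful of single-letter typos, matching the article’s claim.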
To convert the book’s text into DNA code, the scientists assigned two of the four nucleotide choices in DNA to represent a 0 in binary code, and the other two to represent a 1. The English text was translated into binary code, and then into its DNA equivalent. To turn this into physical reality, the DNA code was then (metaphorically) chopped into very short blocks, with information added at the end of each short chunk to show where, in the larger scheme of things, that piece belongs. To this point, however, everything was still theoretical.
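The scheme just described can be sketched in a few lines of code. The general idea (two bases per bit value, short addressed chunks) is from the article; the specific choices below, such as A/C for a 0 bit, G/T for a 1 bit, the chunk size and the address width, are illustrative assumptions, not the exact parameters of the Harvard study:

```python
import itertools

# Two bases stand for each bit value (an assumed assignment).
ZERO, ONE = "AC", "GT"

def text_to_bits(text):
    return "".join(f"{ord(c):08b}" for c in text)  # 8 bits per character

def bits_to_dna(bits):
    alt = itertools.cycle([0, 1])  # alternate within each pair of choices
    return "".join((ZERO if b == "0" else ONE)[next(alt)] for b in bits)

def dna_to_bits(dna):
    return "".join("0" if base in ZERO else "1" for base in dna)

def chunk_with_address(dna, size=16, addr_bits=12):
    # Each physical strand carries a slice of the data plus its position,
    # so the short strands can be reassembled in order later.
    return [dna[i:i + size] + bits_to_dna(f"{i // size:0{addr_bits}b}")
            for i in range(0, len(dna), size)]

bits = text_to_bits("DNA")
strand = bits_to_dna(bits)
assert dna_to_bits(strand) == bits   # the round trip recovers every bit
print(strand)
print(chunk_with_address(strand))
```

Because either of two bases can encode each bit, the real study could also steer away from troublesome sequences; the simple alternation above only hints at that freedom.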
There are commercial laboratories that can piece together (synthesize) short strands of DNA with a specified order of nucleotides. The next step, then, was to order the synthesis of about 55,000 different short strands of DNA, each multiplied millions of times. These were then stored in dry form. Later, to recover the information, the DNA sample containing all these different strands was sequenced and read by special machine/computer systems. It is evident that this is not a cheap process!
Then on February 7, 2013, an article in Nature reported some improvements to the system. One of the main sources of error in retrieving the data in the first study arose when the computer miscounted repeating nucleotides: in a run like TTTT, it might miss one of the repeats. As a result, a large team of scientists devised a coding system in which no nucleotide is ever repeated. The rules seem complex, but computers follow whatever rules are programmed into them.
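The “no repeats” trick can be sketched briefly. The data are first written as base-3 digits (0, 1 or 2), and each digit then selects one of the three bases that differ from the previous base, so the same base can never appear twice in a row. The rotation table below is illustrative; the paper’s actual table may differ:

```python
# For each previous base, the three bases that differ from it; a base-3
# digit indexes into this string, so no base is ever repeated.
NEXT = {"A": "CGT", "C": "GTA", "G": "TAC", "T": "ACG"}

def trits_to_dna(trits, start="A"):
    dna, prev = [], start
    for t in trits:
        prev = NEXT[prev][t]   # always differs from the previous base
        dna.append(prev)
    return "".join(dna)

def dna_to_trits(dna, start="A"):
    trits, prev = [], start
    for base in dna:
        trits.append(NEXT[prev].index(base))
        prev = base
    return trits

strand = trits_to_dna([0, 0, 2, 1, 0, 0, 1])
print(strand)
assert all(a != b for a, b in zip(strand, strand[1:]))  # no TTTT-style runs
assert dna_to_trits(strand) == [0, 0, 2, 1, 0, 0, 1]    # fully reversible
```

Even a run of identical digits in the data produces a strand with no repeated bases, which sidesteps the miscounting problem described above.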
The team of Goldman and others (including Ewan Birney, ENCODE’s lead analysis coordinator) stored five files in their DNA sample: all of Shakespeare’s sonnets (as ASCII text); a medium-resolution photograph (in JPEG 2000 format); a 26-second audio recording (in MP3 format); a PDF of Watson and Crick’s original brief 1953 paper on the structure of DNA; and the code that converted the data to base-3 digits (as ASCII text). In a cute stunt, they then shipped the dried material at ambient temperature, without any specialized packaging, from the USA to Germany via the UK. In Germany the 117-nucleotide-long strands of DNA were read by machine and converted back into their original formats.
This study involved only 739 kilobytes of data. An interesting part of the discussion, however, was economic. At commercial rates, the DNA storage method costs about $12,400 per megabyte stored, plus a further $220 per megabyte of data read back from the system. In a world where the Large Hadron Collider, for example, generates about 15 billion megabytes per year, nobody is going to pay to store such data in DNA! However, for data that must be kept for an interval of 600 to 5,000 years, the technique is economic even now, since the data do not have to be repeatedly transferred to fresh media. If the costs of DNA technology fall as expected, within a decade it might be reasonable to store data for 50 years or more with this method.
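The break-even logic can be sketched with a toy comparison. The DNA write and read costs are the article’s figures; the conventional-archive numbers (a cost per megabyte for each migration to fresh media, and a migration every five years) are illustrative assumptions, not figures from the article:

```python
# DNA figures from the article; tape-archive figures are assumptions.
DNA_WRITE = 12_400      # $ per megabyte, paid once at deposit
DNA_READ = 220          # $ per megabyte, paid once at retrieval
TAPE_MIGRATION = 15     # assumed $ per megabyte per media migration
MIGRATION_INTERVAL = 5  # assumed years between migrations

def dna_cost(years):
    return DNA_WRITE + DNA_READ   # pay once, regardless of duration

def tape_cost(years):
    return TAPE_MIGRATION * (years // MIGRATION_INTERVAL)

for years in (50, 500, 5000):
    cheaper = "DNA" if dna_cost(years) < tape_cost(years) else "tape"
    print(f"{years:>5} years: DNA ${dna_cost(years):,}, "
          f"tape ${tape_cost(years):,} -> {cheaper} is cheaper")
```

With these assumed migration costs, conventional media win over decades, but the one-time DNA cost is overtaken somewhere in the thousands of years, which is the shape of the trade-off the Nature discussion describes.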
The advantages of DNA data storage are its extremely high density, its easy storage requirements (on a shelf at room temperature) and, of course, its permanence. The disadvantages are the high cost of the machine/computer systems needed to encode the data and later read it back, and the slowness of access, which depends on how sophisticated the machines are and how many are used. There is also no random access in this system: one must decode the whole file, and there is no modifying the data once they have been deposited.
So our technological society happily seeks to exploit a system which exhibits capacities far, far beyond our pitifully inadequate methods of data storage. A code that stores information never has developed by chance, and never will. This God-given system of DNA may help us protect some information for generations to come. Once again, our technological society borrows designs which God has provided to us in nature.