DNA: Unravelling God’s data storage medium

BY PETER WANYONYI

How come a leopard cub is born with spots and not with stripes? Why are tiger cubs born with stripes rather than spots? After all, both species are members of the genus Panthera and can hybridise with each other, and careful crossing can even produce fertile offspring. What about humans? An African father and an African mother will never produce a half-Chinese baby. Yet we are all humans and can interbreed easily. What drives these breeding results?

The answer is, of course, DNA: deoxyribonucleic acid, perhaps the most miraculous thing that occurs in nature. DNA carries the genetic instructions needed for the reproduction, development, growth and functioning of living things. A typical DNA molecule consists of two strands coiled around each other in a double helix. Each strand is made up of smaller molecules called nucleotides, and each nucleotide is composed of one of four nitrogen-containing compounds called nucleobases (cytosine, guanine, adenine and thymine, abbreviated respectively as C, G, A and T), alongside a sugar and a phosphate group.

Now, the reader is no doubt aware that computers store digital data in binary code: a two-symbol code written with the digits 1 and 0, where each digit is one bit. DNA is a four-symbol (quaternary) code, so each letter of DNA, such as those holding the genetic information in our cells, can carry the equivalent of two bits. This is where it gets interesting: the human body has roughly 30 trillion cells. A typical cell contains about 6 billion letters of DNA, which are various combinations of the nucleobases C, G, A and T. By a commonly quoted estimate, the total DNA in one human being, lined up end to end, would stretch from the earth to the sun and back over 100 times.
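To make the idea concrete, here is a minimal sketch in Python of how binary data could be written out in DNA letters. A four-letter alphabet can represent four values per letter, which is the same as two binary digits, so one byte becomes four bases. The particular mapping (A for 00, C for 01, G for 10, T for 11) and the function names are assumptions chosen for illustration; real DNA storage schemes add error correction and avoid sequences that are hard to synthesise or read.

```python
# Hypothetical mapping, chosen only for illustration:
# 00 -> A, 01 -> C, 10 -> G, 11 -> T
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Write each byte as four bases, two bits per base, highest bits first."""
    letters = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            letters.append(BASES[(byte >> shift) & 0b11])
    return "".join(letters)


def dna_to_bytes(strand: str) -> bytes:
    """Reverse bytes_to_dna: read four bases at a time back into one byte."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # str.index raises ValueError for any letter other than A, C, G, T
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    strand = bytes_to_dna(b"DNA")
    print(strand)                # 3 bytes become 12 bases
    print(dna_to_bytes(strand))  # and decode back to the original bytes
```

In this sketch every byte costs exactly four bases, so the round trip from bytes to letters and back loses nothing.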

The average computing person is familiar with basic data storage units: a bit is a binary digit that is either 1 or 0, and which stores a given state (such as 1 for “on” and 0 for “off”). 8 bits make up one byte. 1000 bytes = 1 kilobyte (KB). 1000 KB = 1 megabyte (MB). 1000 MB = 1 gigabyte (GB). 1000 GB = 1 terabyte (TB), 1000 TB = 1 petabyte (PB), 1000 PB = 1 exabyte (EB), and 1000 EB = 1 zettabyte (ZB).

One gram of human or animal DNA can, using the most basic encoding schemes available today, store 25 PB of data. That’s incredible storage potential. It means, for example, that 200 million high-definition DVDs can be stored in a ball of DNA the size of a golf ball. Using better encoding schemes, one can store up to about 214 PB per gram of DNA, which means all the information ever generated and recorded by humankind could be stored in a room measuring 4m x 4m x 2m.
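For readers who want to check the scale of these numbers, here is a rough back-of-envelope sketch in Python. It assumes decimal units (1 PB is 10 to the power 15 bytes, 1 ZB is 10 to the power 21 bytes) and takes the two densities quoted above, 25 PB and 214 PB per gram, at face value. The function name is made up for this example.

```python
# Decimal storage units, in bytes
PB = 10**15  # petabyte
ZB = 10**21  # zettabyte


def grams_needed(data_bytes: float, density_pb_per_gram: float) -> float:
    """Grams of DNA needed to hold data_bytes at the given density."""
    return data_bytes / (density_pb_per_gram * PB)


if __name__ == "__main__":
    # The densities are the figures quoted in the article, not measurements.
    for density in (25, 214):
        grams = grams_needed(1 * ZB, density)
        print(f"1 ZB at {density} PB per gram: about {grams / 1000:.1f} kg of DNA")
```

At the lower density, one zettabyte works out to 40 kg of DNA; at the higher one, a little under 5 kg.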

Even more mind-boggling: unlike current storage media, DNA kept cool and dry could last hundreds of thousands of years. DNA is also copied faithfully from one generation to the next, which is how genes passed down from some distant great-great-great-great-grandparent of your great-great-great-great-grandfather still find their way down to you and your grandkids. And researchers have reported that data stored in DNA can, in principle, still be read back as a perfect copy of the original even after being retrieved a thousand million million times.

But how would the data be stored in DNA, and where would the DNA come from? No one wants to be a walking hard disk, after all. Easy: bacteria. In addition to their main chromosome, many bacteria carry genetic information in the form of tiny circular rings of double-stranded DNA called plasmids. Plasmids are routinely transferred from one cell to another through a process called conjugation (a word that shares its root with “conjugal” rights). The plasmids can be used to store data in bacteria that are held in a given location. To retrieve the data, one sends special, genetically engineered, mobile (“motile”) bacteria to this site, where they conjugate with the trapped bacteria and capture the data-carrying plasmids. The motile bacteria then carry this information to a device that extracts the plasmids and reads the data they carry. A type of global positioning system that works at the molecular level, using special chemical “beams” to attract bacteria to a given location from where their data can be read, is used to pinpoint the location of the bacteria: three chemical beams will triangulate a given set of bacteria as accurately as GPS locates a signal on the earth today, but in much finer detail.

This technology is still in its infancy, and it might conjure up some uncomfortable memories for those who have watched sci-fi movies like The Matrix: the possibility that some dictator could gain power and then imprison human DNA slaves in giant vats where they are used as little more than data storage beings.

As with all new technologies, DNA data storage is yet to mature, and will take time to come to fruition. There are other challenges: bacteria are notoriously slow movers, so data transfer rates are still very slow. But the development of conducting fluids and the use of molecular GPS systems promise super-fast nano-networks of massive pools of DNA storage bacteria, and the applications of this sort of technology would be absolutely astonishing. Humans, it seems, may be on the verge of cracking God’s own data storage medium.

The author is an information systems professional.
