A Personal Quest for Happiness
The era of personal genomics is upon us! If you believe the hype.
My initial ambition was simply to find out from where our kids got their manifestly lovable quirks. In principle, the idea is simple enough: Sequence mom’s, dad’s, and the kids’ DNA. Then just sort out who got what from whom, and what bits they made up all on their own. It’s a fun science project for the whole family! I’ll cut to the chase, but only if you promise to keep reading. I didn’t get that far… yet. But it’s been a fun ride to that eventual destination, and, if you’ll bear with me, I’d like to take you where I’ve been.
Way back in 2014, I learned that it was possible for a simple layperson (i.e., not a scientist or a medical doctor) to obtain a copy of his/her own DNA. After a bit of searching, I found a company called GeneByGene, which graciously provides this service to the general public. Sequencing a complete genome costs some $7000, which was a bit rich for me. I settled for just the exome, the part that codes for actual proteins. These are the genes that we hear so much about. The rest is just “junk” anyway, so I wasn’t about to pay for it. A couple of weeks and $1300 later, I was scraping cheek cells onto a swab and sticking it in the mail.
Note that GeneByGene is not 23andMe. This service does not pretend to provide a predigested analysis for the average consumer. What you get is your DNA, not a laundry list of physical attributes that you possess. What you choose to do with this data is entirely your business. I would recommend getting it before the FDA puts your genome behind a wall of regulatory bureaucracy. I went into this adventure with my eyes open, expecting and hoping for some real data to analyze, and some real work to do.
I waited patiently for notification to arrive in my inbox that I had been successfully digitized. After the promised 10-week turnaround time had elapsed, I contacted customer service, who politely informed me that the analysis provided by their lab did not meet their quality standards, and I should probably send in a new sample. Maybe I’m not human after all, I thought to myself, hopefully. Maybe I’m a robot… or an alien! Which would be awesome, since I’d always suspected as much, but I went ahead and sent in the new sample anyway, just to be sure.
To my disappointment, I received my complete exome a few weeks later, without so much as a raised eyebrow from the good people at GeneByGene. It had been five months since I had placed my order. Just downloading this mass of data was a challenge, even for an IT professional with a good internet connection such as myself. I’m a deep and complicated person (so I’ve been told), and so the total size of my source code comes to some 50 GB. Even here in 2015, that’s a lot of stuff.
Not having been formally initiated into the field of bioinformatics, I was expecting to receive a file containing a concise list of genes. Instead, I was surprised to find raw output from the sequencing machine, intimately linked to the messy physical details of the inner workings of that machine. Such is the gap between biology and informatics. This forced me to understand the mechanics of DNA sequencing. The DNA that I had provided had been sliced up into small strings of nucleotides. Those strings were read by the machine, but in no particular order. What I received was a large set of all the reads of these short strings. Putting them together into a coherent whole is like assembling a giant puzzle, with millions of overlapping pieces. Fortunately, I had had the foresight to order (at an extra charge of $200) the file that assembles all these pieces. I had no idea how valuable this would be, and I thanked myself for having it.
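For the programmers in the audience, the puzzle can be played in miniature. What follows is only a toy sketch in Python, assuming error-free reads that all come from the same strand. It is certainly not what the lab's software does, and the function names (overlap, greedy_assemble) and the reads themselves are made up for illustration. The idea is to keep gluing together the two pieces that share the longest overlap until nothing overlaps any more.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    max_len = min(len(a), len(b))
    for size in range(max_len, min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two pieces with the largest overlap (toy version)."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                size = overlap(a, b, min_len)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no remaining pair overlaps by at least min_len
        # Glue piece j onto the end of piece i, keeping the shared part once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three made-up reads, deliberately out of order.
reads = ["GTACGTT", "GTTAGC", "ATGCGTA"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

With three pieces this takes no time at all. With millions of pieces, read errors, and repeated stretches of DNA, comparing every pair like this is hopeless, which is a good reason to pay someone else $200 for the assembled file.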
To behold one’s own genetic makeup is an experience I highly recommend. To gaze for the first time upon one’s own makings is, in a certain sense, to come face to face with one’s maker. We are, after all, organic machines, and to be able to view the code by which this particular machine operates makes me feel connected to this material world in a way I never have before.
I had hoped that spending such a sum of money on this project would motivate me to do something interesting with this data. Having it in my hands, I was now forced to confront the awkward question of what my question should be. I decided to hunt for a gene I’d read about in the news, the so-called “happiness gene”. Of course, in characteristic medical fashion, the scientific literature prefers to refer to this gene as SLC6A4, and most of the analysis of its function focuses on depression, schizophrenia, OCD, and other such unpleasantness. Such is the nature of the disconnect between actual science and popular science.
I proceeded to install a long series of open-source tools in my quest to open the 8-GB file that contained my makings. On the edge of despair, I finally found one that worked: Ugene. I proceeded to locate my copy of the SLC6A4 gene, right where it should be, on chromosome 17. At this point, I was experiencing mixed feelings of being, on one hand, quite pleased with myself for having gotten this far, and, on the other, again disappointed to see that my version of SLC6A4 corresponded quite precisely to that found in the genome of the default human.
My next step was to find out exactly which versions, or alleles, of this gene were mine. Each chromosome comes in a pair (one each from mom and dad), but at this point in my bioinformatics education, I still don’t know how to make that distinction in the data I received. I had just one file for chromosome 17 (which I extracted), and it contained a single string of nucleotides, not two. Nonetheless, I persisted.
I spent an afternoon looking all over the internet for a database of alleles, all to no avail. I found a great many fabulous resources and searchable databases, but not what I expected. I began to believe that I was asking the wrong question, or that I had no idea what I was doing (a belief which I still hold). I posted a question to a forum, and got a single reply, which was just a link to yet another not-useful database.
Finally, I decided to work from an ostensibly seminal paper from way back in 2000, which details the alleles of this gene, or at least those that were known 15 years ago. I found myself straining my eyesight to read a low-resolution scan of a figure from this paper which showed the nucleotide sequences for these various alleles. Sure enough, I found them in my genome… mostly.
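Once the letters have been squinted out of the figure, the search itself is the easy part. Here is a minimal sketch in Python, assuming the chromosome is available as one plain string of A, C, G and T. The helper names (reverse_complement, find_all, locate) are my own invention, and the sequences in the example are made up; they are not real SLC6A4 alleles. Because DNA has two strands, and a published sequence may be written from either one, the sketch looks for both the query and its reverse complement.

```python
# Translation table for swapping each base with its pairing partner.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the sequence as it would read on the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(haystack: str, needle: str) -> list[int]:
    """Return every 0-based position where `needle` starts, overlaps included."""
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        start = haystack.find(needle, start + 1)
    return positions


def locate(chromosome: str, query: str) -> dict[str, list[int]]:
    """Search for `query` on both strands, ignoring upper/lower case."""
    chromosome = chromosome.upper()
    query = query.upper()
    return {
        "forward": find_all(chromosome, query),
        "reverse": find_all(chromosome, reverse_complement(query)),
    }


# Made-up stand-ins for a chromosome and a motif copied from a paper.
chromosome = "TTGACCAGGTTTCTGGTCAA"
print(locate(chromosome, "gaccag"))  # {'forward': [2], 'reverse': [12]}
```

This only finds exact matches. A single misread letter from a blurry scan, or a genuine variant in my own DNA, would make the search come up empty, so an empty result proves very little.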
It appears that a certain critical segment of my DNA is missing from the data that I received. This is the crucial bit that would tell me if I have the “long” allele or the “short” one. In the popular press, the long type is associated with those people (surely you know some) who seem happy no matter what, whereas the short type is typical of grumpy old men. I thought I should check, if only to prove to my wife that, actually, despite all appearances, I am a happy camper.
My conclusion from this experience, so far, is that the current state of bioinformatics is far from the future promised by Gattaca. Being an IT specialist, I half expected that I would find standard, consistent, coherent datasets. Instead, I found confusion, contradiction, and chaos. Having worked for a long time in corporate environments, I’m used to this. I expected, however, more structure from the research community. I eagerly await the day when one can spit into a cup and obtain every bit of data that can be extracted from one’s DNA, not just the nature of one’s earwax. Until that day, my quest to find my personal happiness goes on.