Just a quick note that the weigh-in is a bit delayed today as I am transferring the database to a new install. I hope to post tonight!
To keep in tune with yesterday’s theme of Bioinformatics and big data, today I was looking at the growth of our lab, in particular the amount of sequencing performed at our facility.
The HiSeq2500 is Illumina’s major workhorse, and perhaps the most versatile and widely used next-generation sequencing platform in the industry. Running this machine can cost between $10,000 and $30,000 per use, depending on the type of sequencing being performed and the “depth” of sequencing we are looking to do. Depth of sequencing typically refers to how many times we read a particular part of the genome. For most experiments, we like to look at each region about 30 times . . . which gives us some degree of confidence about the underlying sequence at a given spot.
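For anyone who wants to put rough numbers on depth, here is a minimal sketch. It assumes the common back-of-the-envelope estimate (total sequenced bases divided by genome size). The function names, the 100-base read length, and the roughly 3 billion base genome are illustrative assumptions, not figures from our runs.

```python
import math


def mean_depth(num_reads: int, read_length: int, genome_size: int) -> float:
    """Estimate average depth as total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def reads_needed(target_depth: float, read_length: int, genome_size: int) -> int:
    """Estimate how many reads are needed to reach a target average depth."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_depth * genome_size / read_length)


# Illustrative assumptions only: ~3 billion bases, 100-base reads, 30x target.
ASSUMED_GENOME_SIZE = 3_000_000_000
ASSUMED_READ_LENGTH = 100

if __name__ == "__main__":
    n = reads_needed(30, ASSUMED_READ_LENGTH, ASSUMED_GENOME_SIZE)
    print(f"Reads needed for ~30x: {n:,}")
```

This is only an average; real coverage is uneven across the genome, which is part of why we aim for a cushion like 30 times rather than the bare minimum.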
The number of times we run the sequencer has steadily increased over the past few years, though the 2009 time point is a little bit misleading, since we received the instrument in November and were getting trained (and technically it was the GAIIx sequencer). This steady production of data has led to the data management issue that I brought up yesterday. It has also led us to many cool projects to work on.
More and more, newspapers and articles use catchy headlines about personalized medicine: using a person’s underlying DNA sequence to inform a physician about the best treatment path. The more I’ve worked in the field, the more I truly believe this is rapidly approaching reality. Most articles gloss over a whole lot of really difficult and technical things, but I believe we are getting closer and closer to widespread adoption of NGS-based treatment paths.
In our lab alone, we are initiating projects to profile hundreds of Western New Yorkers for all sorts of different things, looking at their genome, epigenome (DNA structure), and microbiome (bacterial profile). This information, while insanely difficult to process and understand, will give us fantastic insights into disease development.
The more we sequence, the more we learn and the more we can improve our algorithms and data analysis pipelines. The growth has been exceptional, and I am excited to see the continued adoption of this technology.
Today I was asked to create a data retention policy that manages ~60TB of our generated sequencing data. This data is both in-house and customer, collaborative and cross-institutional, and generated at our facility and elsewhere. Surely no single policy can cover all of these cases, but nevertheless, I have to formulate something that will work.
The root of the issue is the cost of long-term storage versus the onslaught of data being generated. In an ideal world we could continue to throw resources into hosting these enormous genomic datasets. In reality, the cost of powering and cooling our massive data storage arrays eventually far outweighs the cost of just sequencing again.
A typical sequencing project goes from raw DNA to a FASTQ file (FASTA-style sequence plus quality scores). This file looks something like this:
The first line tells me a lot about where this sequence came from. It tells me that it was sequenced on UB-NGS-01 (our primary sequencer here at UB), on Flowcell 1, as well as the location where this sample was loaded onto the machine. Line two is the actual DNA sequence. At this stage, I have no idea where this DNA belongs in the genome, but after some processing, I can figure that out.
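To make the layout concrete, here is a minimal sketch of reading FASTQ records in Python. It assumes the simple four-lines-per-record layout (header, sequence, a "+" separator, quality string) and a colon-separated header whose first field is the instrument name. The example record is made up for illustration and is not real data from our sequencer; `parse_fastq` and `FastqRecord` are hypothetical names.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    header: str    # line 1, without the leading "@"
    sequence: str  # line 2, the base calls
    quality: str   # line 4, one quality character per base


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text, assuming 4 lines per record."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, "").rstrip("\n")
        plus = next(it, "").rstrip("\n")
        qual = next(it, "").rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near header: {header!r}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], seq, qual)


# Hypothetical record: the header fields and bases are invented.
_seq = "GATTACAGATTACA"
example_lines = [
    "@UB-NGS-01:1:FLOWCELL1:1:1101:1234:5678\n",
    _seq + "\n",
    "+\n",
    "I" * len(_seq) + "\n",
]

if __name__ == "__main__":
    for record in parse_fastq(example_lines):
        instrument = record.header.split(":")[0]
        print(instrument, len(record.sequence))
```

For real files I would reach for an established parser rather than this sketch, since FASTQ files in the wild can be compressed and can contain quirks this does not handle.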
Storing these FASTQ files can take up quite a lot of space. Even compressed, we are currently sitting on 30TB of FASTQ files from all of the experiments generated at our facility. These are the most essential pieces of raw data, as all downstream analysis can be performed from them.
So back to my problem: how long do I really need to keep these FASTQ files for? Should I trust the average researcher to be responsible and hold on to their data? Do I care if they lose it? There is a delicate balance between being a safety net against catastrophic hard drive crashes and over-protecting to the point of significant economic loss.
It is an exciting time to be working in the field of Bioinformatics, but there are very real issues with regard to so-called “BigData”, and some tough decisions are going to have to be made about the long-term value and retention of what we’ve produced thus far.
I got a gym membership at the local YMCA last weekend, and so far this week I have gone every day. I’ve been swimming, trying to get my lap times back to where they were when I was swimming every day a couple years ago. I’ve also been spending some time on the stationary bike, because as my foot is still recovering, I thought it would be good to turn up the resistance and build some strength in my legs until I can run again. The one thing I have learned this week is that the stationary bike is boring. Like, really boring.
Yesterday after my swim I was so bored by the idea of sitting on the stationary bike that I decided to play with fire and walk on the treadmill. After about three minutes of walking on the treadmill, I turned up the speed and started jogging. I figured if my foot started to hurt, I would just stop, and then I would know that it’s not ready yet. I jogged for 10 minutes, and my foot felt just fine! I walked after that, because I am out of shape, but I’m really excited to know that my foot is definitely ready to try running again.
I’m going to start slow, maybe intervals of walking and jogging until I am confident that my foot isn’t going to be mad at me. But after 6 weeks off my foot, I have been going stir crazy. Time to run! Time to train! Time to set a goal and work towards it. I’m beyond excited and ready for the challenge of working back up to running a 5k, 10k, and even that half marathon.
I’m going to go out for a run this afternoon and see what I can do. It would be awesome to be able to run a Turkey Trot this Thanksgiving. Who’s with me?
Here at 2 Fat Nerds each Wednesday is a Weigh-In day, where all of the data from the previous week is collected, displayed, and analyzed. If anyone is interested in taking part in weekly tracking, leave a comment!
Each week, members who have opted to have their weights publicly displayed for the world to see have them graphed on the main blog. It is a great way to keep track over time, and it gives a bit more accountability to your workouts. If you are interested, leave a comment!
Anti-fat points (AFP) is a system designed to give some sort of value to each and every workout, regardless of type. It is based on metabolic-equivalent values from the Compendium of Physical Activities. Submit a workout, get AFP. Each week we post the totals for each person for the week. In addition, you can see the monthly AFP leaderboard on the right-hand sidebar. This is meant to be a fun way to encourage competition to see who can earn the most points!
The threshold is back to 100 AFP: green is over and red is under. Try to get 100 AFP and turn your bar green!
To put more emphasis on current progress and current streaks, I am now only displaying the current active streak for each player. It is calculated the same as before, minimum three days a week. Each consecutive week you gain a link.
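For the curious, the streak rule can be sketched in a few lines of Python. This is a minimal sketch, not the actual site code: it assumes each week is summarized as a count of workout days, listed oldest first, and the function name `current_streak` is hypothetical.

```python
from typing import Sequence


def current_streak(days_per_week: Sequence[int], minimum_days: int = 3) -> int:
    """Count consecutive qualifying weeks, walking back from the latest week.

    A week qualifies when it has at least `minimum_days` workout days.
    The count stops at the first week that falls short.
    """
    streak = 0
    for days in reversed(days_per_week):
        if days < minimum_days:
            break
        streak += 1
    return streak


if __name__ == "__main__":
    # Oldest week first; the one-day week resets everything before it.
    print(current_streak([4, 1, 3, 5, 3]))
```

Because only the active streak is counted, a single missed week resets the chain to zero, no matter how long the earlier run was.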
I too returned to the gym this week, much like my mother and sister. It has been a long time since I’ve worked out with any regularity, for a multitude of reasons, and it was finally time. Despite having to return to the house several times for headphones, gym cards, keys and every other possible thing we forgot to grab in order to work out, we finally made it.
On our way, I realized that Monday Night Football was on ESPN. I was very excited, because last year we cut the cable cord on our expensive packages in favor of the basic broadcast tier, i.e., no ESPN, NFL Network, or MSG for Sabres games. I love watching hockey while working out; the amount of action in the game naturally captures my attention. I thought football would be the same way, but boy was I mistaken.
After fifteen minutes on the treadmill, I was absolutely blown away at just how boring football is. Even watching Fitzpatrick toss touchdowns and J.J. Watt dominate, I grew bored with the constant commercials and the 2 minutes in between plays. I guess I never realized just how slow watching football really is.
After a solid half-hour on the treadmill, I hopped off and spent a while stretching my ever-so-sore left hip area. Football was perfect for this activity . . . low and slow.
Considering the popularity of the NFL in this country, I wish there were ways to speed up the game. Whether it is being less strict on penalties, lowering the play clock, not taking television timeouts, or whatever . . . something should be shaken up to make it a little bit more exciting.
Maybe I’ll start participating in the football workout program. That’s a guaranteed 4 hours+ of working out.
Last week I got to watch my Dad run a Half Marathon. I never, in my wildest dreams, would have imagined my family would turn into half-marathon-running superstars. It’s amazing! Being in the stands at the finish line at the Half was bittersweet. I was incredibly happy to be there, given my recent move halfway across the country. I was excited to watch my dad finish the race and I was proud to hold up the sign that my mom and I made. All the same, it was hard to be there because I had my bib for the same race in my handbag. My bib that didn’t cross the start line, and would later prompt an email from the Boston Athletic Association asking me to complete a survey about why I didn’t run.
Due to a hairline fracture in my left foot, I was unable to run the B.A.A. Half Marathon with my dad. Gracie, who trained her heart out and was really the most ready to run, came down with the flu the night before the race, and decided it was in her health’s best interest to sit that one out. The Distance Medley that the three of us started came down to my dad, running 13.1 for himself and for all of us. It was a difficult day, but despite my disappointment at my lack of participation, attending those events is always inspiring. I left Franklin Park in Boston that day knowing I will run a half marathon. (Although perhaps not as soon as Gracie; she signed up for a race this coming weekend in Ashland, MA. If you’re in the area, go cheer her on!)
When I got back to Michigan I sat on the couch for 5 days, and then went out and got a gym membership. I’ve had a hard time being active since moving. My schedule has been disrupted, and my lack of a full-time job (combined with my foot injury) has shifted my priorities from fitness to cover letters. But listening to myself make excuses reminds me that motivating yourself to get active is never easy. Maintaining fitness is just as hard as getting going for the first time. But my foot feels fine now, and I need to get going again before I gain back the 20 pounds I lost this year by drinking my weight in lattes at my part-time barista gig.
If you have fallen off the wagon just like I have, now is the time to get back on. I know my lack of exercise has made me feel crazy, lazy, and incredibly bored. I don’t feel like myself, and I know working out again will make me feel amazing. Working out will probably make you feel amazing too, so get outside tonight and go for a walk. Or go for a run. Or a bike ride, or a swim, or run up and down your stairs for a while. Just get up and I’ll see you out there.