A Survival Guide for Students Starting an MS in Bioinformatics

29 Sep 2021

I’d been toying with the idea of making a survival guide for my students for a few semesters, and I decided to actually sit down and do it this time. It’s written based on my experience in the MS program where I now teach the introductory course. While it’s written with students going into a particular Bioinformatics MS program in mind, I suspect the lessons apply to a lot of programs. I also imagine a lot of it would apply to a more general data science student as well, and a few things could be handy for grad students in other fields. I could be wrong, but I decided to run it up the flagpole in case it’s useful to anyone else. Learn from my biffs and triumphs.


Stuff I learned doing an MS in Bioinformatics (besides Bioinformatics).

This is a survival guide I’ve been meaning to make for my first semester students.

You are not an imposter

It’s sometimes intimidating to be around the kind of people you meet in the Boston biotech scene. It’s a bustling, affluent area packed with hugely successful companies, top schools, and really smart, driven people.

It’s important to keep in mind you’re not the only regular person who doesn’t get it in a sea of geniuses. Nobody is as sharp as they are at their best all the time - you can be a famous professor and still spill coffee on stuff and start talking in a Zoom meeting with your microphone off.

Even the biggest rock stars started somewhere, and it’s more likely they got where they are by making the most of the opportunities they had, working in a persistent, intelligent way, and getting a little lucky than by being a magical genius you could never hope to comprehend.

Success is not guaranteed but keep in mind that you wouldn’t be here if the committee didn’t think you had a strong chance to hang in the scene.

A note on ‘math people’

Math has been terrifying bio people from time immemorial (and now they have code to deal with too). I was far from a math person, having struggled in undergrad, feeling as though the “why” was missing from the content, before eventually forcing my way into decent standing. The stats used in bioinformatics tend to come from practical concerns, so the “why” is easier to see. It’s still hard sometimes (you’re never really done learning), but it’s gone from something I dreaded to something I can use practically and enjoy, and I find it’s one of the main ways I contribute to my team now. The bottom line is that not feeling like a ‘math person’ doesn’t rule you out of a productive career for a few reasons:

1) (Bio)statistics is a quirky subfield of math - you can’t necessarily assume that having a tough time in trig is going to translate to a tough time here.

2) You can make computers do the grunt work so you can focus on the concepts. We often use code libraries for statistical procedures. This doesn’t mean you can stop trying to understand what is happening but it does mean you can be effective without remembering what every Greek character in a formula stands for. It also means there are fewer errors along the lines of “Drat, I forgot to carry the 1”. Understanding what is an appropriate statistical procedure for your situation (or knowing when you don’t know, and where to look to find out) is more important than being able to write out matrix multiplication by hand.

When I was finishing my MS I took a PhD level stats course in each of my final terms, getting an A and a B while working at a co-op full time. I’m still waiting to feel like a “math person”. I’m not, however, waiting to be able to make meaningful contributions in my lab.

Bioinformatics is hard

I’m not saying most people think this field is a cake-walk or anything, but it’s easy to forget just what we have to juggle. You’ll eventually be expected to know a lot of what a statistician knows, a lot of what a biologist knows, and a lot of what a software developer knows. You don’t have to be a stats whiz to see the numbers get a little concerning here. It’s normal to come across something you’re not familiar with (or don’t remember well) and to look it up. Which brings me to…

Adopt the mindset of a perpetual student

You might think of yourself as a bioinformatics apprentice now. In a way, if you succeed in the field, you’ll be one for the rest of your life (or at least your career). There is just too much to know.

Your job is to be good at learning stuff now

For my first publication, my contribution to the paper was co-writing an app in a language I’d never used before. What’s interesting about this is actually how uninteresting it is - this kind of thing happens all the time. Some people say “I’m starting a new project, time to learn a new technology!” and are only half joking. Stay open minded about learning new stuff and get used to searching and reading documentation carefully. The same is true for the biological and statistical aspects of the field as well - new methods and findings are being reported all the time.

Aggressively pursue mentorship

One important factor to consider when looking at jobs and co-ops is the availability of a mentor. This is so important to me that when offered a transfer to a new department in my work, I told them my acceptance was contingent on them finding me a biostats mentor. Ask about who you’ll be able to go to when you get stuck, or who will help you learn the job. It doesn’t have to be one person either - in a field like this you might have somebody you go to for help with biology concepts and someone else for code. This is fine, just make sure you know these people exist and that you’ll be able to access some of their time. If offered two jobs, one at the most famous institution on earth with an unavailable grouch, and one somewhere you’ve never heard of with someone who cares about teaching, take the second one. You can get fancy later - get good first, and it will be easier.

Read

Grad school is very concentrated in terms of acquiring new information, which is part of the point. However, our field is composed of a lot of nitpicky details - programming minutiae, subtle statistical distinctions, complex biological pathways. This isn’t even to mention the technical processes of sequencing itself. It’s easy to miss the big picture. Reading this book helped me see the forest for the trees (and I find myself reviewing sections). Review-style papers in scientific journals also help catch you up on a topic at a high level. Additionally, take some time to read articles on biology, medicine, and computation that appear in quality magazines and attend talks that strike your interest (you’ll probably have a chance to do so at a lot of companies, and will be bombarded with invites at most academic institutions).

Ask questions

It can sometimes be anxiety-inducing to ask a question, especially if you think it’s a basic one. I know I felt this way sometimes (and still do). It’s usually worth it though, and more often than not, in my experience, people aren’t going to give you a hard time about it.

Treat co-op like a real job

Co-ops vary in their scope and responsibility so it’s not always clear how much of a chance to run with your own project you’ll have. Either way, make the most of the networking opportunities and dive into whatever task you’re given - you never know if it will lead to jobs or more learning opportunities.

Consider not taking classes during co-op if it’s paid

If your co-op pays you enough to live on, consider making it your complete professional focus. I was a TA while on co-op but, at least for me, that was easier than taking a difficult class. The period when I was taking really difficult coursework and working was very demanding. I feel I could have been a better student and worker if it had been spread out better.

Mingle

One of the biggest benefits of grad school is the networking. Introduce yourself to other people and treat them respectfully. The people you meet, both faculty and classmates, can inform you about job opportunities and become future coworkers, friends, or collaborators (all of these have happened to me).

Be intentional with your time and energy

Get organized

Develop a consistent way of capturing and organizing tasks. It doesn’t matter what it is, it doesn’t have to be perfect, and it doesn’t have to be set in stone, but it does need to exist. Find something that works and tweak it as needed. The same goes for planning your days and weeks - do not make a habit of reacting to your day or you’ll be on your heels, stressed, and doing less than your best work for the next 2-3 years.

Get good

The most important book I’ve ever read is called So Good They Can’t Ignore You by Cal Newport, who got famous by being good at the kind of things people do in grad school. I read it in 2012 and I still consider the ideas in it when making all my career decisions. Consider reading it, or find a detailed summary.

Practice

This might seem obvious but is easy to overlook - learning new languages, programming or otherwise, requires practice. Passive consumption of material is often needed to start a new skill but you won’t ever get good enough to hire without pushing yourself to new levels by enacting what you know in a repeated, focused way. There is nothing wrong with repeating exercises that still challenge you. Don’t look at coding like collecting information you need to know - think of it as practicing for a recital. In the internet age, most information is a search away, so simply knowing stuff takes even more of a back seat to being able to do something with what you know.

Focus

Doing well at things that are hard generally requires focus and skill. Grad school is hard (in and of itself) and so are the individual things you will do. Make time to really dive into them without interruptions. More on this (another key book for me) is here, also from Newport.

Multitasking is, by and large, a myth

“Multitasking” on substantive mental work is not real. What people think of as multitasking is just switching between tasks in alternation. This costs more energy than doing two things one at a time. Your mental energy is perhaps the most valuable currency you have in terms of getting graduate-level work done - don’t pay extra taxes on it when you don’t need to.

Use social media thoughtfully

Social media apps are an attempt, by some of the world’s most successful companies, to take your attention away from where you intended to put it. They are deliberately designed, using the same psychological principles that went into the slot machine, to produce addictive behaviors and maximize the time you spend on them. This principle is in direct tension with working intentionally and focusing on difficult material. It’s hard to find two hours a day, but that’s how much time most Americans spend on social media. The cost of this is greater if it makes it harder to focus even when you’re not on it (by training your brain to stop what it’s doing and look for a reward). I’m not saying using them has no benefits or that they will turn you into a zombie, but it’s worth being intentional about how you engage with them, attempting to get the benefits of socializing with minimal drawbacks. For me this looks like checking the platform I like once or twice a day in a browser for a few minutes, but not having it on a mobile device during the work week. The 30 seconds I spend installing it and logging in on Saturdays are nothing next to the mental health and productivity benefits I felt after using it on my terms, not the terms of the people who profit from me (why would they have my best interest in mind?). Some people might be fine with their habits as-is and what works is personal, so experiment if you feel the need to - you can read more about this kind of thing here.

Get stuck as soon as possible

This might sound weird, but when you’re working on an assignment, think of your job as getting stuck as soon as possible. When you’re planning your work, assume you will get stuck and need to wait on a reply from a (probably very busy) classmate, TA, or instructor. Getting as far as you can as soon as you can helps you get things in on time and also makes it easier to help you - if we’re sending a rushed reply or trying to squeeze in a meeting, we’re not going to have as much time to get into the material in detail. I’m not saying you can’t ask for help as things get closer to being due, just that it’s a better experience for both parties if you start early enough to know if you need a hand.

Take care of yourself

Sleep

I once asked one of my undergrad professors for advice on becoming a better student. He is a really smart dude - he had a seemingly inexhaustible knowledge of cell and molecular bio, could lecture beautifully for hours from a single page of notes, and had published as a post-doc at a top university. I assumed he had some workflow that I could never dream up. I was surprised by the simplicity of his answer when he told me he wished he slept more during school. He felt he would have retained more information. Even the highest performers answer to their physiology. I’m not saying I made it through grad school without a few all-nighters (or real late nights), especially balancing a job for most of it. But they were rare. Prevent as many as you can by being strategic.

Maintain a compartment for your life

Grad school is great at invading your life. I found, though, that by planning carefully and focusing hard, I didn’t have to give up everything. If playing board games with your friends on the weekend is what gets you through the day, plan around it. You’re probably going to take a few hits in terms of what you have the time and energy for, but, with care, your outlets can take care of you if you take care of them.

Ask for help

If you’re struggling personally, help is available and you should use it. Grad school is hard enough without leaving help on the table.

You’re a quant now

Congrats, nerd.

Take statistics seriously (and early)

A lot of work in our field is executed by code but motivated by statistics. I personally loved learning to code (and still do) - I love stats too, but it was less intuitive to get started. Code will sometimes tell you when you’re wrong with an error or a warning - a statistical test will not, silently letting you draw bogus conclusions. Thus, it pays to take it seriously, not just as a means to an end. Give these classes the attention they deserve, and…

Consider a data science elective (DA5020/5030)

There is statistics in the program, but it’s an important thing and there is too much to learn in one semester. These additional courses may also expose you to the nuts and bolts of getting datasets in and out of a database and messing around with them, which isn’t always covered in more theoretical stats courses. They may also include “machine learning” related topics in addition to more “classical” statistics. That being said…

Machine learning is built on statistical foundations

Please do not be the person who doesn’t know what logistic regression is for but tries to solve every problem using a neural network. It’s good to explore, and neural nets are awesome, so you should learn about them eventually. But they’re made of pieces of statistics - it’s been said a neural net is just a bunch of logistic regressions standing on top of one another in a trench coat - and you’ll want to know what those pieces are. Misunderstanding a neural net can lead to sorrow, so be wary of putting the cart before the horse. They are one of many tools, all of which have different uses, strengths, and weaknesses (the drawbacks of neural nets are beyond the scope of this article but are important enough to look into).
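To make the trench coat joke concrete, here is a minimal sketch in plain Python. The function names and the hand-picked weights are made up for illustration (nothing here is trained), and real networks often use activations other than the logistic function, so treat it as a cartoon of the idea rather than how any particular library works.

```python
import math


def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_unit(inputs, weights, bias):
    """One logistic regression: a weighted sum of the inputs, then sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def tiny_net(inputs, hidden_params, output_params):
    """A two-layer net: several logistic units feeding one more logistic unit."""
    hidden = [logistic_unit(inputs, w, b) for w, b in hidden_params]
    out_weights, out_bias = output_params
    return logistic_unit(hidden, out_weights, out_bias)


# Hypothetical weights, chosen by hand purely for illustration.
hidden_params = [([1.0, -1.0], 0.0), ([-1.0, 1.0], 0.0)]
output_params = ([2.0, 2.0], -2.0)

x = [0.5, 0.5]
single = logistic_unit(x, [1.0, -1.0], 0.0)          # one regression on its own
stacked = tiny_net(x, hidden_params, output_params)  # regressions on regressions
print(single, stacked)
```

The only new ingredient in `tiny_net` is that the outputs of some logistic units become the inputs of another, which is why understanding the single unit first pays off.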

Learn (make your peace with?) R

The R programming language is notoriously quirky. It’s also built from the ground up to eat statistics of basically any complexity for breakfast. The phrases I’ve uttered in painstakingly becoming a proficient R user would make a British soccer (er, football) fan blush. In fact, I learned it only because I was tricked and forced into doing so by an out-of-date course description that promised me sweet, beautiful Python. I was pretty salty at the time, but this is easily the best annoying thing that ever happened to my career. People all over the place are just waiting to blow up the Slack channel asking you for plots, and R is a galaxy-class tool for getting them the goods. A few things:

You won’t escape R for differential expression, and lots of people are going to ask you to do this for them.

Look up sleuth, limma-voom, and DESeq2 if you think these experiments are coming your way on co-op. If it’s not obvious which is right for the experiment, I go for limma-voom.

Use the tidyverse

This is essentially a dialect of R at this point. Using tidyverse R vs “base” R is the kind of thing people fight about on Twitter or get ideological about, but ignore that. It’s more practical to start with the tidyverse for most people (way less quirky and obtuse, extremely well documented). dplyr and ggplot2 should be among the first components you learn.

Check your types

Do this in any language, but especially R. Part of what makes R weird is the “type system” it uses, so read up on how to check what types are in your dataframe (class, and similar functions). Read about the factor type. We can talk more about R later - let me know if you’re interested, but I’m pushing it in terms of the scope of this article already.
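Since this applies in any language, here is a small sketch of the trap in Python rather than R, using only the standard library. The CSV text and column names are invented for illustration. It shows what can happen when numbers are quietly stored as text and nobody checks.

```python
import csv
import io

# Hypothetical data: values read from a CSV arrive as strings, not numbers.
raw = "gene,count\nA,9\nB,10\nC,100\n"
rows = list(csv.DictReader(io.StringIO(raw)))

counts_as_text = [row["count"] for row in rows]
print(type(counts_as_text[0]))  # check the type before trusting the values

# Strings compare character by character, so "9" beats "100".
largest_text = max(counts_as_text)

# Convert explicitly, and the comparison becomes numeric.
counts = [int(c) for c in counts_as_text]
largest_number = max(counts)

print(largest_text, largest_number)
```

No error or warning is raised for the text comparison, which is why checking types up front is cheaper than debugging a strange result later.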

Misc: On being noticed, being practical, and not shooting yourself in the foot (at least not too often, or without meaning to)

Git is part of your life now

Using version control (which is usually git) is now part of being a functioning adult for you. If you’re writing anything that’s bigger than a throw-away, one-off script, make it into a git repo (I made one just for this article). If you’re not yet experienced enough to tell if it’s a throw-away, one-off script, put it in git just in case. If you think you’re experienced enough to tell, you might be mistaken, so it couldn’t hurt to put it into git anyway. Commit and push your work all the time. It’s difficult to overstate the importance of version control to a good workflow. Think of it like backing up the data on your phone, except that you can instantly go back to every version of your data. And make “branches” of your data where you try and save different things in different ways, so two or more working versions can exist. And, if you like one version better, you can make it the new official version. Or merge them both into one. Or send it to a friend, and have them make a version, which will exist without messing with your version. I once needed to exactly recreate a plot of randomly partitioned data from 6 months prior for the VP of Data Science at my co-op. I’d forgotten how I did it and deleted the code from my computer. If I hadn’t been using git carefully, I’d have been toast. If you’re on a project of non-trivial size or importance and someone tells you that you don’t need version control, it should feel like they just told you “I don’t need to brush my teeth”.

Read the error message, then paste it into a search engine

Seriously, the degree to which this will make your life easier is almost absurd. You’re going to be seeing error messages in several languages throughout the semester and many more in your career. This is where a lot of your time blocks will come from. If the code is syntactically incorrect, it simply won’t run, and there is no way to know if it will take you ten seconds or ten hours to fix it. Luckily, I learned this early on when I took an error I had to a TA and he simply copied it out of my terminal and into a search bar. A post on how to fix it was in the first few results. The internet is bursting with people who want to know why their code is broken, and there are many forums with answers to common questions. Get used to navigating them.

If a post bails you out, bookmark it

Getting stuck is a pain. Getting stuck on something you’ve been stuck on before but can’t recall how to fix is a huge pain. Bookmark or clip posts that save your butt.

Use a virtual machine or spare laptop if you want to tinker with a Linux system

I found it fun to explore Linux - different software, distros (I use Arch BTW), and desktop environments, and so on. However, if you tell a Linux terminal to do something insane, and you have the authority to do so, it will listen to you. Sometimes this leads to a learning experience such as “Hmm, why is my desktop gone?” (true story) or “Hey, GRUB is broken and my computer boots to a black screen and flashing white cursor, what do I do now?” (true story 3+ times). If you want to break stuff and learn to fix it, that’s awesome, just don’t do it on the machine you’re taking a final on tomorrow.

Make something

It’s nice to have a portfolio, hosted somewhere like GitHub, where you can show off things you’ve made. If you do a creative final project, clean it up and put it somewhere public so you can flex for future employers. I also found it fun to do coding or data projects on random things I found interesting, and to show those off too. This will set you apart from those who can only show homework they did. You can look at my site or portfolio for a sense of what these might look like, but don’t be afraid to do your own thing.