Carlos

Carlos is a plant geneticist at Michigan State University. He is new to next-generation sequencing and would like to use a graphical user interface (GUI) to interact with his data.

Quote: “Make it easier for me to analyze my own data, so I’m not stuck in a bottleneck waiting for computational assistance to get the job done.”

Research Interests
Carlos conducts bench-level research to understand how genes and genomes are regulated in Arabidopsis. Specifically, he is interested in discovering how non-genetic factors cause changes in a plant's phenotype. The samples he generates through experimentation are the inputs to high-throughput sequencing methods.

Scientific Goals
The goal of Carlos' research is to understand how genomes are transmitted from generation to generation. How do plants "perceive" inputs such as sun exposure and temperature? How do these inputs affect the output, genome transmission? How and why would two plants with identical genetic information have different phenotypes?

Work Environment
Carlos works in a lab with several colleagues. He uses a laptop and values its mobility, which lets him work anywhere, any time. For this reason, he doesn't bother with a large external monitor to view his data, although some of his colleagues use one.

Core Activities
Here’s a typical workday for Carlos:

  1. Doing test tube work, preparing samples for DNA and RNA extraction
  2. Creating libraries of molecules in specific formats
  3. Sending the libraries to the in-house sequencing facility
  4. Receiving large data files back from the sequencing facility
  5. Working with a computational biologist to mine the data to answer his research questions
  6. Inspecting the data for patterns and testable hypotheses

Collaboration
Carlos collaborates with computational biologists and statisticians on data analysis.  He also works with other plant geneticists, helping them implement the protocols used in his lab so that they are all working from a shared cognitive model of the data. 

Attitudes & Motivation toward Technology
Carlos sees the value to his field of technological advances that allow millions of data points to be analyzed at a time, but he is a little suspicious of the digital output these computational methods produce. He views the analysis as a "black box" that sits between his raw input data and the digital output, and he worries that small, unknowable changes inside the black box could distort his experimental findings.

Carlos thinks that GUIs give the user more transparency and a broader range of analytical possibilities than command-line tools, which offer a precise result but a narrow focus. He likes point-and-click interfaces, but dislikes those that require what he considers repeated, unnecessary on-screen actions: clicking, selecting, scrolling, zooming, and so on. He finds the genome browser he uses clunky and considers its visualization of the data poor.

Tools & Applications
Carlos uses Google and other web-based tools for searching. He uses a genome browser set up by a computational colleague to view graphical displays of data. He also uses Galaxy's collection of point-and-click tools for quality control and some preliminary data analysis.

Technology Skill Level
Carlos' statistical skills are good, but not expert. Computationally, he understands the basics of what is being done to his data, but overall this is not a comfortable area for him. He did some computational work in grad school, but in the intervening 10 years the technology has exploded. Carlos is more interested in being able to work with his data independently through a GUI than in the behind-the-scenes operations performed on the data. What really bothers him is handing his data to someone else to analyze.

Pain Points
Carlos laments that until relatively recently, biologists could work independently and get the job done quickly, analyzing their own data on their own computers. Back then, a biologist might have run an experiment investigating 12 progeny; today that number could be 12 million.

In the brave new world of enormous digital data sets, scientists like Carlos have become dependent on shared resources and need outside help with sophisticated computational analyses run on supercomputers. This step has also become a bottleneck in the research process, because only so many computational biologists are available to lend a hand. In addition, Carlos is uncomfortable that his data are no longer fully under his control at all times.

Carlos is also concerned that while computational scientists are swamped with this kind of "service" work for their less computationally experienced colleagues, they have less time to advance current technologies and develop new ones. Carlos thinks that the plant science community underestimates how rapidly technology evolves, and that the Discovery Environments are not anticipating what will be needed in 5 years; rather than innovating, they are only catching up with what is currently available.

Carlos would like to at least be able to do some initial quality control on the data himself, to see whether they are worth analyzing at all. He feels that plant scientists today are less selective about data collection than in the past, taking a "more is better" approach and relying on computational analysis to pick out the data of interest and leave out the noise.

Wish List
Carlos wishes he had a tool that would let him store notes and snippets of data while he works, for closer examination later. For example, while viewing data in the genome browser, he would like to grab a window, drag it into a "cloud," add a note, and come back to it later.