Managing the Flood of Metabolite Profiling Data
As the field of metabolic profiling evolves, interpreting data and comparing it with genomic and proteomic databases poses a challenge

by Lori Valigra | Dec 01 '04

Metabolic profiling is still an emerging field, a status underscored by the fact that no large public database of metabolites exists yet. In addition, researchers in government, university, and pharmaceutical company labs have been patching together off-the-shelf software with their own proprietary analytical tools to help acquire, interpret, and manage metabolite data, which they keep in comparatively modest and often private databases. Putting together that data, comprising information from mass spectrometry (MS) and/or nuclear magnetic resonance (NMR) techniques, is just the first hurdle. Scientists will face an even bigger challenge in the future when they try to compare metabolic profiling data with that from proteomics, genomics, and other emerging "omics" fields such as kinomics.

One of the big challenges with metabolic profiling is coping with the large amount of data and the statistical issues it raises, says Glenn Cantor, DVM, PhD, principal veterinary pathologist at Bristol-Myers Squibb's Pharmaceutical Research Institute in Princeton, N.J. "There is a certain amount of noise in NMR and MS systems, so it's very challenging to figure out what's noise and what's a real signal. There's a lot of data, too much for a human to scan through, so we need software to help." Cantor adds that no out-of-the-box solution can answer every question about a metabolite, so his company uses software from instrument and software manufacturers and also develops its own, including expert systems and novel methods of multivariate data analysis.

What's in a name?

The study of metabolites goes by several names, including metabolic profiling, metabonomics, and metabolomics. Some researchers increasingly use the terms interchangeably. Others prefer to keep them distinct, for example using metabonomics for a systems approach to metabolite profiling and metabolomics for cell-based metabolite profiling. Metabonomics is also the term drug companies often use, while metabolomics often serves as a general term for metabolite studies. This article uses the terminology the researchers themselves use to describe their work. (See related story, "Metabolic Profiling: Meet the Latest 'Omics,'" on page 39.)

For Cantor, the power of metabonomics is more in hypothesis-generating than in getting complete answers. "It can yield new insights as to how things work, from which we go forward and look at whether patients with a given disease have a particular protein or a particular combination of transcripts or metabolites."

The Holy Grail, though, will be the integration of genomics, proteomics, and metabonomics, as well as toxicology and biology, and devising informatics tools that produce information that leads to new biological insights such as potential drug candidates and pathways, he says. In five to 10 years, "I think we're going to have much more complete integration of all the 'omics,' and by that I don't only mean metabonomics, genomics, and proteomics, but all the newer omics, like kinomics, glycomics, and so on. And I think we'll refine our ability not just to accumulate data, but to insightfully interpret it." A research arm of the US Food and Drug Administration already plans to create such an integrated database over the next several years (See "FDA Research Group Plans Comprehensive Database," p. 49).

Getting back to basics

At the most basic level, managing metabolite data begins with the ability to identify the metabolite. That may sound simple, but there are many molecules in biofluids that remain a mystery, says Gary Siuzdak, PhD, senior director for The Scripps Research Institute, Center for Mass Spectrometry, La Jolla, Calif. Scripps has a database called METLIN that is a repository for mass spectral data on about 1,000 endogenous and drug metabolite samples and tissues.

"It's striking, because we believe we know so much already about what's going on in humans, but the reality is not only do we not know what's going on, but we also don't even know what the molecular players are," Siuzdak says. "One of the main challenges of the metabolite field is the one thing that's easy to do with proteomics, which is to identify the protein on the basis of the fragmentation pattern that you generate from the MS data. With metabolites we're finding it's a lot easier to do quantitative analysis." MS will reveal the mass, but identifying the molecule is tricky. "There's no comprehensive database right now," he says.

Siuzdak adds that metabolite data for 50 patients is about 20 GB in size, which is not a huge amount. "The more difficult part is processing the data, looking for unique molecules that are indicative of whether an individual with a disease has certain ions that are disappearing or ions that are appearing. So we're looking for metabolites that are going to tell us whether the disease is occurring or whether it is in its early stages or not, and that's a real challenge."
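
Siuzdak's point that MS reveals the mass but not the identity can be illustrated with a small lookup sketch. It matches a measured mass against a reference list within a parts-per-million tolerance. The three-entry library, the function name, and the tolerance are assumptions made for illustration; this is not how METLIN or any specific product is queried.

```python
# Hypothetical mini-library: monoisotopic neutral masses in daltons.
LIBRARY = {
    "glucose": 180.06339,
    "phenylalanine": 165.07898,
    "tyrosine": 181.07389,
}

def candidates(measured_mass, tolerance_ppm=10.0):
    """Return (name, error in ppm) for every library entry within the tolerance."""
    hits = []
    for name, mass in LIBRARY.items():
        error_ppm = (measured_mass - mass) / mass * 1e6
        if abs(error_ppm) <= tolerance_ppm:
            hits.append((name, round(error_ppm, 1)))
    return hits

print(candidates(165.0795))  # one candidate in this tiny library
print(candidates(200.0))     # no match: the molecule stays unidentified
```

Even a match is only a candidate. Isomers share the same mass, and a molecule that is missing from the library returns nothing, which is the gap a comprehensive database would fill.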

John Ryals, PhD, agrees. Ryals, CEO of Metabolon Inc., Research Triangle Park, N.C., says the data from metabolic profiling are not as dense as those from RNA profiling or proteomics. That's because there are only about 2,400 to 2,600 metabolites, whereas there are 100,000 mRNAs and one million proteins. "These are not as dense a data set as you see in RNA profiling, because we just don't have that many variables. So there isn't that big of a demand on the data processing or the databases," he says. The hard part comes once the basic information on the metabolites is in the databases and statistical methods have to be applied to analyze it. Biologists aren't accustomed to handling this kind of data, so companies hire mathematicians and statisticians to interpret it.

Metabolon developed software to analyze metabolomics data. "We're actually able to name the molecule and get the quantitation on it," says Ryals. "We essentially had to reengineer very high-end mass spectrometers. We've rewritten a lot of the software on how the data is gathered and analyzed." One problem the software can handle is peak deconvolution during the liquid chromatography mass spectrometry (LC/MS) process. When more than one molecule is under a peak, which happens often, it is necessary to identify the molecules and reassemble them.

Intelligent software

The Metabolon software has what Ryals calls "chemical intelligence," which allows it to go into very complicated peak patterns and identify known molecules by the way they ionize. Coding it took a few years and talented software writers who thoroughly understood the MS instruments. "If you try to do it by hand, the experiment will take a year just to crunch the numbers." Ryals says MS hardware is very good but that, in general, the instrument software that carries the data from the moment the signals are produced to the final answer still needs work.

Donald Chace, PhD, director of the Division of BioAnalytical Chemistry and Mass Spectrometry, Pediatrix Screening Laboratory, Bridgeville, Pa., is using metabolomics to detect and study 35 inborn metabolism disorders in infants. Chace uses some off-the-shelf mass spectrometry software, plus his own company's analytical tools, to look at metabolite ratios to diagnose a disease. The company looks at amino acids in blood to get a picture of their metabolism, as well as acylcarnitines, which are metabolites that transport long-chain fatty acids across the mitochondrial membrane.

The challenge for a laboratory is to do MS well on many samples a day. "The last analysis has to be as good as the first even if you've done 1,000," says Chace. Most of the processing and sample analysis Chace does is run on personal computers. The computers use software packages from instrument companies, such as the NeoLynx Screening Application Manager from Waters Corp., Milford, Mass., for quantitative measurements of phenylalanine and tyrosine in neonatal blood samples by tandem MS. Among other programs, he also uses the Analyst software from MDS Sciex, Concord, Canada, for MS data acquisition and processing.

NMR versus MS

There is no shortage of opinions among users and vendors about whether NMR or MS is better. MS is very sensitive and gathers more data, but NMR is good at identifying unknown molecules by looking at signal changes. An emerging trend is for drug scientists to use both NMR and MS. For example, researchers at Imperial College in London and six pharmaceutical companies collaborated in an effort called COMET, the Consortium on Metabonomic Toxicology, which resulted in a database of NMR spectra covering 150 compounds or biofluid fingerprints that can be used to check for toxicity. The second stage of COMET will perform more mechanistic toxicology studies and supplement the NMR database with MS data.

"With NMR we may see things we cannot see by MS and vice versa. So, it is a complementary technique," says Jose Castro-Perez, LC/MS market development manager at Waters Corp. "When you look at any biological insult toxicity base, you are not looking at just one biomarker, but a series of biomarkers, maybe 20 or 30. The more information you can get out of the system the better," Castro-Perez says of the strength of MS.

But MS is not without its weaknesses, says John Shockcor, business development manager for metabolic profiling at Bruker BioSpin. NMR yields strong quantitative data, and everything in the sample is visible within the limits of detection, which is not the case in MS. "Say our MS colleagues find a new lipid with a mass of 600 and they notice that it is two mass units lower than another lipid they've just found. The first thing they think is there's unsaturation, that there's a double bond. They have no idea where that double bond is and they don't know if it is [a] cis or trans [isomer], so they don't know anything about the stereochemistry. If they want to answer those questions, they have to do NMR." NMR sensitivity is still as much as four orders of magnitude less than that of MS, he says. Shockcor says although NMR and MS data can be merged, he prefers to keep the data separate so he can look at separate data sets.
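
The arithmetic behind Shockcor's lipid example is simple enough to sketch. Each additional double bond (or ring) removes two hydrogen atoms, about 2.016 daltons, so a mass that is two units lower suggests one more unsaturation. The function name and the lipid masses below are hypothetical, and the sketch assumes the two molecules differ only in hydrogen count.

```python
H2_MASS = 2.01565  # monoisotopic mass of two hydrogen atoms, in daltons

def extra_unsaturations(mass_heavier, mass_lighter):
    """Estimate how many H2 units the lighter molecule is missing."""
    return round((mass_heavier - mass_lighter) / H2_MASS)

# Hypothetical lipid masses mirroring the "two mass units lower" case.
print(extra_unsaturations(602.0, 600.0))
```

The result is a count and nothing more. It says nothing about where the double bond sits or whether it is cis or trans, which is the question Shockcor says requires NMR.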

Chace of Pediatrix says that even with the vendor software to help process and analyze some of the data, it still is the job of the laboratory to understand the relationship between metabolites and to diagnose a disease. "You get the software shell from the companies and then you write the algorithms for picking up and deciding what to flag as abnormal or not. The software helps eliminate the normals, so you're left with the abnormals," says Chace. "But a lot of clinical interpretation still is required."
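
The kind of flagging rule Chace describes might look something like the sketch below, which uses the phenylalanine and tyrosine measurements mentioned earlier. It is a guess at the general shape of such a rule, not Pediatrix's algorithm. The cutoffs, sample values, and names are invented for illustration and are not clinical values.

```python
# Hypothetical cutoffs for illustration only; these are NOT clinical values.
PHE_CUTOFF_UMOL_L = 150.0
PHE_TYR_RATIO_CUTOFF = 2.5

def flag_sample(phe, tyr):
    """Return the reasons a sample looks abnormal (an empty list means normal)."""
    ratio = phe / tyr if tyr > 0 else float("inf")
    reasons = []
    if phe > PHE_CUTOFF_UMOL_L:
        reasons.append("phenylalanine above cutoff")
    if ratio > PHE_TYR_RATIO_CUTOFF:
        reasons.append("phenylalanine/tyrosine ratio above cutoff")
    return reasons

# Made-up concentrations in micromoles per liter: (phenylalanine, tyrosine).
samples = {"A": (60.0, 70.0), "B": (400.0, 50.0), "C": (140.0, 40.0)}

# Drop the normals so only the flagged samples remain for clinical review.
abnormal = {sid: flag_sample(*vals) for sid, vals in samples.items() if flag_sample(*vals)}
print(abnormal)
```

Sample C shows why ratios matter: its phenylalanine level alone is under the cutoff, but the ratio still flags it. As Chace notes, a flag is only the start, and interpretation remains with the laboratory.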

Scripps' Siuzdak adds, "Everybody has unique problems they're going after or they see issues they haven't seen addressed by off-the-shelf packages." Metabolites undergo all kinds of statistical distributions that require unique analysis, says Metabolon's Ryals. He points to estrogen as a good example. "It's going to be present in women but not very present in men, and so if you just go across a row looking at estrogen in a population of people, you're going to see a really weird statistical distribution. So how do you handle that kind of data? It's these types of issues that I don't think we have the proper statistical tools for yet."
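
Ryals's estrogen example can be shown with a few made-up numbers. When two groups with very different levels are pooled, the mean describes nobody and the spread balloons; summarizing each group separately is one simple workaround. The values and names below are synthetic, and stratifying is only one possible response, not a method the article attributes to Metabolon.

```python
from statistics import mean, stdev

# Synthetic levels in arbitrary units, tagged by group; invented for illustration.
levels = [("F", 95), ("F", 110), ("F", 102), ("F", 98),
          ("M", 20), ("M", 24), ("M", 18), ("M", 22)]

def summarize(values):
    """Return (mean, sample standard deviation)."""
    return mean(values), stdev(values)

# Pooled summary: treats the whole population as one distribution.
pooled = summarize([value for _, value in levels])

# Stratified summary: one distribution per group.
by_group = {}
for group, value in levels:
    by_group.setdefault(group, []).append(value)
stratified = {group: summarize(values) for group, values in by_group.items()}

print("pooled:", pooled)
print("stratified:", stratified)
```

The pooled mean falls between the two groups where no individual actually sits, and the pooled standard deviation is several times larger than either group's own.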
