calit2

BIRNing to Create a New Standard in Collaboration

1.7.03 -- John Wooley, associate vice chancellor at UCSD, joked recently about the term data mining, which describes the process of analyzing increasingly massive amounts of data so as to discover more fundamental principles.

"It really refers to 'data mine, not yours,'" said Wooley, with a laugh. He was referring to the unfortunate scientific standard that, if you don't have a hand in creating a data set, you can't expect to be able to access it for your research. That common attitude has held scientific progress in check.

Fortunately for the future of both science and society, that tradition is beginning to reverse course, thanks to visionaries like Mark Ellisman. He's a neuroscientist and bioengineer with joint appointments in UCSD's School of Medicine and the Jacobs School of Engineering. He's also a member of Calit²'s Digitally Enabled Genomic Medicine layer.

A year ago, Ellisman embarked on a multi-campus project known as the Biomedical Informatics Research Network (BIRN), funded by NIH's National Center for Research Resources (NCRR). BIRN, which also serves as one of Calit²'s living laboratories, received its second infusion of NIH support this past fall, an award that brings in Calit² partner UCI.

BIRN's goal of enabling data sharing requires the development of a new and ambitious national-scale information infrastructure to store and compare a vast number of data objects - an unprecedented capability for the biomedical research community. BIRN's scientific goal is to integrate a wide variety of types of data, acquired by the most advanced biomedical imaging and General Clinical Research Centers around the U.S., into an extensible knowledge system. To make this possible, BIRN is designing and implementing a hardware and software infrastructure, supported by the necessary expertise, to manage, provide access to, and support analysis across geographically distributed data sets.

Why does this project matter? Imagine you wanted to do a study of all female patients of a certain age that have certain symptoms. And suppose you wanted to investigate the correlation between this set of characteristics and the onset or progression of a particular disease.

You would want to be able to access all relevant data sets no matter where they were, who had collected them, or by what technology they had been acquired. You'd want all this data to be calibrated in a scientifically credible way so you could access it seamlessly, analyze it as quickly as possible, and move on with the next step of your research.

BIRN is using this type of scenario as a carrot to motivate collaboration as a scientific virtue: BIRN hopes to convert convinced data mine-ers into data sharers.

Data sharing will enable better understanding of the morphology of neurological disease, including multiple sclerosis, schizophrenia, Alzheimer's, and Parkinson's, some of the diseases slated for study during the early years of this project. If you were to add in other diseases that pose long-term health problems and seriously impact health care institutions and costs, you can see that, yes, this project matters. And it will matter more, as it grows.

Altogether, there are four grants:

  • Mouse BIRN - Dr. G. Allan Johnson, Duke University, PI
  • Brain Morphology BIRN - Dr. Bruce Rosen, Harvard University, PI
  • Function BIRN - Dr. Steven Potkin, University of California, Irvine, PI
  • BIRN Coordinating Center - Dr. Mark Ellisman, University of California, San Diego, PI

The first three BIRN projects each comprise teams of institutions located around the country.

Mouse BIRN will allow researchers to share and integrate very large, three-dimensional images of the mouse brain, investigating mammalian models associated with human diseases such as Alzheimer's, dementia, and Parkinson's. Brain Morphology BIRN will advance use of biomedical imaging for diagnosis and treatment of neuropsychiatric illness. And Function BIRN will develop a common functional magnetic resonance imaging protocol across instruments and study human brain dysfunction related to the progression and treatment of schizophrenia. (More information on these projects can be found at the BIRN Web site: www.nbirn.net.)

Sounds straightforward, but the technical challenges in making this work are daunting.

Multiple species are being studied, notably, the mouse and human, with the possibility of adding more species that have the potential for yielding information relevant to the human brain. Data are being collected on different types of brain activity, over a range of scales (molecular to the whole brain), over a range of time periods depending on relevance to the image acquisition technology and the topic of study, and using different laboratory methodologies (such as positron emission tomography, high-voltage electron microscopy, and magnetic resonance imaging).

Surprisingly, even identical models of laboratory equipment from the same manufacturer need to be calibrated against one another so that a data set collected by one machine can be correlated accurately with a data set collected by another. On top of that, the image files use different file types and live on different storage technologies, attached to different computers, running different operating systems - and in different places!
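To make the calibration problem concrete, here is a minimal sketch of one simple approach: scanning the same reference object (a "phantom") on two machines and fitting a linear correction that maps one machine's readings onto the reference. This is an illustration only, not BIRN's actual calibration procedure; the linear model, the function names, and the numbers are all assumptions.

```python
from statistics import mean


def fit_linear_correction(measured, reference):
    """Least-squares fit of: reference ~= gain * measured + offset.

    `measured` and `reference` are readings of the same phantom regions
    taken on the scanner being calibrated and on the reference scanner.
    """
    mx, my = mean(measured), mean(reference)
    sxx = sum((x - mx) ** 2 for x in measured)
    if sxx == 0:
        raise ValueError("measured values must not all be identical")
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, reference))
    gain = sxy / sxx
    offset = my - gain * mx
    return gain, offset


def apply_correction(values, gain, offset):
    """Map raw readings from the calibrated scanner onto the reference scale."""
    return [gain * v + offset for v in values]


# Hypothetical phantom readings: scanner B reads consistently 2 units high.
reference = [10.0, 20.0, 30.0]
scanner_b = [12.0, 22.0, 32.0]

gain, offset = fit_linear_correction(scanner_b, reference)
corrected = apply_correction([17.0], gain, offset)
```

Real cross-site calibration is far more involved than a single gain and offset, but the sketch shows why readings from two nominally identical machines cannot simply be pooled without a correction step.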

Was Ellisman crazy to try to address this problem?

On the contrary: Craziness had nothing to do with it. Instead, Ellisman's motivation arose from a lot of hard work over many years, building relationships with respected colleagues and exploring what technologies could advance their mutual scientific interests.

In the tradition of breakthrough science that gathers together the best minds to tackle a problem, this project built carefully on long-nurtured, successful collaborations supported by NIH and the National Science Foundation. Today, the NIH supports research at the following BIRN institutions:

(A full accounting of all participating schools, hospitals, and labs can be found at www.nbirn.net.)

This national partnership is anchored at UCSD by the BIRN Coordinating Center (CC) to support the national sites. The BIRN CC leverages the National Center for Microscopy and Imaging Research, Ellisman's home lab at UCSD; the National Biomedical Computation Resource at UCSD, which coordinates this grant; the San Diego Supercomputer Center (SDSC), which staffs the BIRN Network Operations center and Help Desk; and the National Partnership for Advanced Computational Infrastructure, an NSF grant that has solidified many important and long-standing research collaborations across a variety of institutions and the breadth of the country.

That's a lot of talent to throw at such a complicated scientific problem. But that's exactly the kind of partnership needed to make headway.

Scientific research continues to build on innovation arising from individual PI-driven research. However, to attack problems requiring development of national infrastructure, federal agencies are supporting larger, longer grants for these kinds of ultra-complex research projects that require a broad range of expertise spanning computer science, disciplinary science, and social science. It's that combination of talents that makes it possible to address larger-scale issues.

"Scientific progress is being driven at the interfaces of disciplines that work together to address more complex problems," says Larry Smarr, Calit² director. "We consider BIRN a showcase for this new kind of science."

Francine Berman, director of SDSC, underscores this point: "BIRN is already paying dividends by serving as a model for yet another disciplinary domain, geoscience, through our GEON project. In fact," she emphasizes, "this model really applies to large-scale projects and cyber infrastructure generally."

Day-to-day operations for BIRN are overseen by project manager Mark James at the UCSD BIRN CC. He merges backgrounds in computer science (simulation and modeling) with private sector experience running data centers and software projects.

"We're trying to define processes and procedures, and establish best practices to build a standardized system that's reliable and scalable to support the work of thousands of researchers," he says. By "thousands," though, he's talking only about neuroscientists.

What he doesn't say is that this type of system could potentially be used, with perhaps only minor modification, by nearly any discipline. That's the grander glory of this project: scalability across not just numbers of people, but numbers of disciplines. So multiply the thousands of neuroscience researchers he mentions by an additional number of disciplines of your own guessing that could benefit, and you'll gain a sense of the potential long-term impact of this work.

The development of this distributed cyber infrastructure is, inevitably, also encouraging research on the infrastructure itself: new techniques in databasing, information retrieval, visualization, and computational processing. The infrastructure is currently perceived as a "Data Grid," but as the demands of computational models grow, the emphasis is likely to shift toward greater parity between data and computation.

The infrastructure is based on a "standard issue" local node designed and provided to each site by the CC. This includes a grid computing rack with three Linux-based computers, 1-10 terabytes of storage, the Oracle database, the SDSC Storage Resource Broker, and gigabit network access via Internet2's Abilene network. The cost of each rack is on the order of $90,000, which is paid for by the grant.

Local campus, medical school, or hospital connectivity to the Internet2 gateway is assumed to run at gigabit-per-second speed, which each institution commits to as part of its participation. Each site is also required to provide a network administrator plus a half-time database programmer and a half-time applications programmer, trained by the CC, to support the local node.

The CC is responsible for deploying the network infrastructure, developing software tools (such as for structural and functional analysis) and a common Web portal interface, sharing tools/code across institutions, converting data between tools, and providing support services, such as network monitoring, performance measurement, statistical analysis of how the infrastructure is used, a help desk, problem tracking and resolution, and documentation.

Commonality of the infrastructure for each site is the key to the rapid national-scale buildout of BIRN. The BIRN standard infrastructure is not only supporting NIH biomedical researchers but helping drive requirements of the NSF Middleware Initiative. So, over time, BIRN is likely to have an even more far-reaching impact across a wider range of disciplines.

Just as important as the definition of the hardware, software, and networking required for the local nodes is the CC's role in training and installation. Staff from the CC conduct a pre-site survey and site visit prior to each installation to match their expectations to the site's needs. Then the local node system is pre-configured at SDSC, shipped, and installed. The devil, as they say, can be found in the details of the site in question: What is the available power supply, is the door large enough to accommodate entry of the equipment, can the flooring support 1500 pounds…?

With a local node system installed, users can access data through a standardized Web portal that accommodates use of individual sites' applications. And part of the bargain to gain access to the data is that each site contribute data.

One main advance of this project is the decision not to ask sites to change their data when contributing it. Rather, the choice was made to deploy software to enable the various data sets to "talk" to each other seamlessly as is, regardless of data format, storage medium, hardware platform, or location. That's where the SDSC Storage Resource Broker makes such a formidable contribution to collaboration.
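The idea of leaving each site's data as is and putting a uniform access layer in front of it can be sketched in a few lines. The example below is a toy illustration of that general pattern, not the SDSC Storage Resource Broker's actual interface: the `Catalog` and `Replica` classes, the logical names, and the sample data are all hypothetical.

```python
import csv
import io
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str     # where the data physically lives
    fmt: str      # how the site chose to encode it
    payload: str  # stands in for the remote file contents


def read_json(text):
    """Decode a JSON array of records into a list of dicts."""
    return json.loads(text)


def read_csv(text):
    """Decode CSV text with a header row into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


# One reader per native format; sites never have to convert their data.
READERS = {"json": read_json, "csv": read_csv}


class Catalog:
    """Maps location-independent logical names to site-specific replicas."""

    def __init__(self):
        self._entries = {}

    def register(self, logical_name, replica):
        self._entries[logical_name] = replica

    def fetch(self, logical_name):
        replica = self._entries[logical_name]
        rows = READERS[replica.fmt](replica.payload)
        # Tag each row with its origin so results can be traced back.
        return [{**row, "site": replica.site} for row in rows]


catalog = Catalog()
catalog.register(
    "study/site-a",
    Replica("site-a", "json", '[{"subject": "s1", "volume": "1.2"}]'),
)
catalog.register(
    "study/site-b",
    Replica("site-b", "csv", "subject,volume\ns2,1.4\n"),
)

# The caller asks by logical name and never sees format or location.
rows = catalog.fetch("study/site-a") + catalog.fetch("study/site-b")
```

The design choice mirrored here is that translation happens in the access layer at read time, so a new site or format means adding a reader rather than asking every contributor to reformat existing data.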

Inevitably, this project runs up against issues of data security and confidentiality. Clinical data is governed by the Health Insurance Portability and Accountability Act (HIPAA) of 1996: If you're a patient, your doctor can't release information about you to an outside person without your permission. The act further stipulates that your identity be "anonymized." So the needs of this project are pressing the federal granting agencies and Institutional Review Boards (which oversee research projects with human subjects) to adapt their guidelines on the use of human clinical data to the modern world of Data Grids while preserving all patient rights.
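As a rough illustration of what anonymizing a record before sharing can involve, the sketch below drops direct identifiers and replaces the patient ID with a keyed pseudonym. It is a simplified example under stated assumptions, not a description of BIRN's process and not a statement of what HIPAA requires: the field names, the identifier list, and the key handling are all hypothetical.

```python
import hashlib
import hmac

# Illustrative only; this is not an official list of protected fields.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def pseudonym(patient_id, secret_key):
    """Derive a stable pseudonym from a patient ID using a keyed hash.

    The same ID and key always give the same pseudonym, so records for
    one patient stay linked without exposing the original ID.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(record, secret_key):
    """Return a copy of the record with identifiers removed or replaced."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "patient_id"
    }
    cleaned["subject"] = pseudonym(record["patient_id"], secret_key)
    return cleaned


# Hypothetical record and site-local key (the key would never be shared).
key = b"site-local-secret"
record = {
    "patient_id": "MRN-001",
    "name": "Jane Doe",
    "phone": "555-0100",
    "age": 54,
    "diagnosis": "multiple sclerosis",
}

shared = deidentify(record, key)
```

Removing names and IDs is only a first step; combinations of remaining fields can still identify a person, which is part of why review boards and funding agencies are involved.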

Besides wanting access to all relevant data, researchers are also beginning to want that access to be available anywhere, anytime. And that means wireless. At the Supercomputing 2002 (SC02) meeting in Baltimore in November, BIRN researchers demonstrated wireless access from an iPAQ.

BIRN is also exploring the changing sociology of research management with respect to the various levels of management (CC management, individual project management, or funding agency oversight) and the mechanisms (read: technologies) by which management is supported.

National BIRN participants meet frequently in various groupings: all members of a given project, all project managers, CC staff with participants at a site experiencing a problem, and so forth. These groups meet by phone, videoteleconferencing (supported by the standard issue equipment), and in-person visits.

Weekly status reports are also provided to NCRR. While this frequency is highly unusual in federal granting circles, NCRR considers this degree of communication important to monitor the pulse of the project, and identify and help BIRN management address problem areas to maintain momentum of the project and ensure smooth growth of the infrastructure.

Other questions remain to be answered of the kind that plague many efforts to bridge communities or special interest groups, including how to minimize barriers of potential mistrust or perceived "competitive advantage," how to determine who gets credit for accomplishment in a shared environment, and how to integrate new participants, both technically and sociologically, as the project grows.

Last year, five institutions joined BIRN. Now the overall infrastructure counts 10. And BIRN expects strong growth going forward: NIH expects some 35 General Clinical Research Centers to join the BIRN Data Grid over the next three years. Demand will also increase to incorporate imagery of other organs, diseases, and species.

Looking further down the road, as visionaries do, PI Ellisman says, "The next big push is building on BIRN to support clinical informatics."

BIRN, in its fullest incarnation, will help provide society an information infrastructure supporting individualized health care by aggregating and enabling access to relevant information obtained from all biomedical science, clinical practice, demographic characteristics, and treatment histories.
