GEON All-Hands Meeting Highlights Two-Week Focus on Geosciences
8.20.04 -- Last weekend's Calit²-sponsored, two-day All-Hands Meeting of a project to create "Cyberinfrastructure for the Geosciences" - affectionately known as GEON - was one of five sequential drumbeats struck in San Diego highlighting the importance of the geosciences with respect to driving development of national cyberinfrastructure. This series of events began with the ESRI 2004 International User Conference August 9-13, a GEON two-day NSF site visit, and a meeting of the GEON Advisory Board, and concludes this week with the five-day "Cyberinfrastructure Summer Institute for the Geosciences" at the San Diego Supercomputer Center (SDSC).
GEON is a five-year NSF-funded collaborative Information Technology Research project among IT and geoscience researchers, with the IT aspects led by Chaitan Baru at SDSC and the geoscience led by a group of PIs from 10 universities. Baru, head of the Data and Knowledge Systems group at SDSC, is recognized as a world leader in development of national cyberinfrastructure supporting data-intensive science across a wide range of disciplines. His work has culminated most recently in the $11.6-million GEON grant launched in 2002. GEON has the daunting task of creating infrastructure for the geosciences community and facilitating scientific breakthroughs in the two testbeds (Rocky Mountains and Mid-Atlantic region) on which the group is focusing.
GEON is a flagship SDSC project that has helped focus and drive SDSC's vision and organization. In addition, it has provided a deep, technical "core" - to use a relevant geological term - upon which to build out important and long-term partnerships with the scientific community interested in geosciences and IT issues, including Calit². The all-hands meeting was just one manifestation of that buildout.
This meeting attracted a standing-room-only crowd in SDSC's auditorium, including members of the GEON Advisory Board Larry Smarr and Malcolm Atkinson (all the way from the British Isles), and guest visitors from El Centro de Investigación Científica y de Educación Superior de Ensenada (Center for Scientific Investigation and Higher Education in Ensenada) who are interested in becoming involved in the project. This was the second all-hands meeting the project has held. "I'm delighted to see a larger turnout for this meeting by comparison with last year," said Baru, encouraged by this indication of growing interest in the area. A particularly gratifying development was the growing interest in the geoscience community as evidenced by several groups wanting to become affiliated with GEON.
GEON IT Overview
Baru provided an information technology overview of the project. The goal is to create a services-based, distributed environment that still enables a certain amount of local control by applications scientists. "We're developing cyberinfrastructure," Baru said, "to support the day-to-day conduct of e-science, not just 'hero computations.'"
The development team, by necessity and design, includes both computer and geological scientists to create data-sharing frameworks, identify best practices, and develop useful and usable capabilities and tools. Given the two groups' need to learn each other's "culture" and vocabulary, Baru acknowledged, with a characteristic smile, that "some of us are frankly terrified, but we're getting over that."
To move the results of their work expediently into production practice, the group is using a two-tier approach: Identify commercial tools that can be embraced and develop advanced technology for the public domain.
"We're trying to balance the need of the computer scientists to do research with the need of the applications scientists for real tools that work and provide better access to data," said Baru who, as a computer scientist, welcomes this "tension" with applications disciplines. "They say that the most interesting research happens at the borders of scientific disciplines, and this one between computer scientists and geoscientists is one of the ultimates - it's a very fruitful area for innovation."
The expediency of the GEON approach rests on leveraging the work and expertise of parallel projects that share many of the same interests, including:
. The Calit² OptIPuter project: According to Baru, in an allusion to the Washington, DC, area metro system, whose reach includes the NASA Goddard Space Flight Center (GSFC) in Maryland, "the GSFC is just one subway stop away from providing a connection between Maryland and the Scripps Institution of Oceanography at UCSD." This link will provide GEON researchers with access to NASA-collected satellite and other remote-sensing data sets. In fact, GSFC may become a node on the OptIPuter network.
. Biomedical Informatics Research Network: "We think of BIRN as our big sister project," said Baru. GEON shares personnel with BIRN, leverages its technology for portal development, and learns from its approaches to software releases and engineering.
. Science Environment for Ecological Knowledge: GEON is leveraging SEEK's workflow and scheme technologies.
. Southwest GeoNet: GEON is using its Net-based services, which provide an alternate Web services environment.
. Digital Library for Earth System Education (known as DLESE, pronounced de-lease-ee): GEON is using this project's infrastructure, developed for education, in its research applications and approach to metadata.
. GRid Assessment Probes: GEON will run Grid benchmark probes and implement Inca monitoring infrastructure in the GEONgrid (see below). (Baru is PI of GRASP.)
. NMI GRIDS Center: This center is developing easy-to-use Grid security infrastructure for the GEON, BIRN, and UCSD telescience projects.
. TeraGrid: This multi-year effort to build and deploy the world's largest, most comprehensive distributed infrastructure for open scientific research is using GEON as its chosen application, hosting data and services related to LiDAR data.
. National Laboratory for Advanced Data Research: GEON is working with this joint activity between SDSC and NCSA (co-directed by Baru and Michael Welge of NCSA).
Based on the BIRN philosophy of the importance of standardization to quick launch and robust functionality, GEON is building its GEONgrid, a network of standardized nodes. Each member of the grid runs a point of presence (POP) node with 760 GBytes of disk (half supports the applications work and half the caching and replication of data). Some nodes additionally have a compute component (four-node cluster), and some have additional data storage (four-TByte disk). The Lawrence Livermore Laboratory is working towards adding a one-Teraflops Linux cluster to the computing mix. Other partners include the CUAHSI hydrology consortium, the Southern California Earthquake Center, Chronos, the US Geological Survey, and the Geological Survey of Canada. Industrial partners supporting this project include ESRI, HP, and IBM.
The GEONgrid consists of five software layers, from top to bottom in the stack as follows:
. Portal - This is the entry point with personal login and "myGEON" area where the user can access, manipulate, and save his/her work.
. Application services the user has access to - ability to register data sets and ontologies, GEONsearch capability, and GEONworkbench (which includes the community modeling environment). (An ontology is defined, by www.dictionary.com, as "[a]n explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them." This is the kind of thing that the traditional library community has been doing for years, e.g., the ISBN numbering system that sites the content of individual books within the larger context of all books.)
. Software under development - This layer includes registration and data integration services, indexing services for searching, workflow services, visualization (2-4D), and mapping services.
. Core grid services - This layer includes authentication, monitoring, scheduling, cataloging, data transfer and replication, and so forth.
. Physical grid - This layer includes Red Hat Linux, the ROCKS cluster management software developed at SDSC and led by Phil Papadopoulos, and software related to connection to Internet2.
The GEONgrid is extending the approach of BIRN by accommodating partners, like Bryn Mawr College, that are not connected to Internet2. "From Day 1," said Baru, "we embraced the notion that our system would be based on a heterogeneous network."
GEON is also building on BIRN's revolutionary approach to data sharing. UCSD assistant vice chancellor John Wooley has been quoted as jokingly defining "data mining" as meaning "data mine, not yours." He was referring to the unfortunate scientific standard that, if you don't have a hand in creating a data set, you can't expect to be able to access it for your research. This common attitude has held scientific progress in check - until BIRN decided to turn this paradigm on its head by developing the needed infrastructure that would help the scientific community share their data to support more comprehensive studies for all.
GEON is extending this sharing paradigm by enabling researchers to share their data sets in two ways: They can register the data sets through GEON (second layer listed above), which then hosts the data, or they can register just the schemas, retaining the data sets in their home environments.
Searches in GEON can be spatial, temporal, ontology-based, or natural language conceptual queries. Given that, what does registration of data sets provide the community when a scientist can use a crawler to locate data of interest? Baru responded: "We want to extend the crawler concept. In crawler mode, you don't have any measure of the quality of the data you've found because the reply to your query doesn't come with metadata you can evaluate. Registration, by contrast, requires explicit metadata - a form has to be filled out - so you know who's submitting the data and under what conditions and constraints. Crawling, though, is still useful and important - we want to leverage the NLADR 'deep Web' work led by Michael Welge. Another validation of our approach, by contrast with pure crawling, is that we can collect statistics measuring use through portals, which enable generation of logs of user activity. We believe these logs showing how people use the portal will help improve the quality of our ontologies over time."
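The two registration modes and the explicit-metadata requirement Baru describes can be pictured with a small sketch. It is an illustration only: the class names, fields, and example URL below are assumptions made for this example and do not reflect GEON's actual registration service.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Registration:
    """Hypothetical registry record; field names are illustrative, not GEON's."""
    name: str
    submitter: str                    # who is submitting the data
    conditions: str                   # conditions and constraints on use
    schema: Dict[str, str]            # column name -> type description
    hosted: bool                      # True: GEON hosts the data; False: schema only
    home_url: Optional[str] = None    # where the data stays when it is not hosted


class Registry:
    """Toy in-memory registry that refuses records lacking explicit metadata."""

    def __init__(self) -> None:
        self._records: Dict[str, Registration] = {}

    def register(self, rec: Registration) -> None:
        # Unlike crawling, registration insists on metadata a user can evaluate.
        if not rec.submitter or not rec.conditions:
            raise ValueError("submitter and conditions are required metadata")
        # A schema-only registration must say where the data actually lives.
        if not rec.hosted and not rec.home_url:
            raise ValueError("schema-only registration needs a home location")
        self._records[rec.name] = rec

    def search(self, column: str) -> List[str]:
        """Return names of registered data sets whose schema has this column."""
        return sorted(r.name for r in self._records.values() if column in r.schema)


registry = Registry()
registry.register(Registration(
    "gravity", "researcher A", "open access",
    {"lat": "float", "lon": "float"}, hosted=True))
registry.register(Registration(
    "faults", "researcher B", "cite on use",
    {"lat": "float", "age": "int"}, hosted=False,
    home_url="http://example.org/faults"))   # placeholder URL
print(registry.search("lat"))
```

Both data sets are discoverable through the same search even though only one is physically hosted, which is the point of registering schemas alone.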
GEON IT Advances
Bertram Ludaescher, co-PI, data mediation and integration, discussed GEON IT advances at the all-hands meeting. He said the issue for most domain scientists is how to progress from using available databases and data sets to answering the kinds of natural language queries they want to pose. "Scientists need a more conceptual view of the data," he emphasized, the first steps toward which are becoming possible in the GEONsearch capability, described above. What researchers need, he said, are scientific problem solving environments that provide the scientists' view of data sets and make tools available, including tools for scientists to create new customized tools.
One interesting example Ludaescher cited was the difference between a Canadian and a British ontology. With a given query, the two ontologies will produce different results. But, by articulating the relationship between the two, GEON is able, for example, to enable the British scientist to study Canadian data through "British glasses" and vice versa. Said Ludaescher, "One of our goals is to move from static concepts to process ontologies to enable the domain scientist to 'wander' between process and data that either supports or contradicts that process."
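The idea of viewing one community's data through another's ontology can be sketched as a term mapping applied at query time. Everything in this sketch is hypothetical: the terms, the mapping, and the sample records are invented for illustration and are not drawn from any actual British or Canadian classification or from GEON's mediation software.

```python
from typing import Dict, List, Set

# Hypothetical articulation: each "British" term maps to the "Canadian"
# terms it is taken to cover. Real articulations would be far richer.
ARTICULATION: Dict[str, Set[str]] = {
    "igneous": {"plutonic", "volcanic"},
    "sedimentary": {"sedimentary"},
}

# Invented records labeled with the "Canadian" vocabulary.
CANADIAN_DATA: List[Dict[str, str]] = [
    {"sample": "C-1", "rock_class": "plutonic"},
    {"sample": "C-2", "rock_class": "volcanic"},
    {"sample": "C-3", "rock_class": "sedimentary"},
]


def query_through_british_glasses(
    british_term: str,
    records: List[Dict[str, str]],
    articulation: Dict[str, Set[str]],
) -> List[str]:
    """Translate a British term into Canadian terms, then filter the records."""
    targets = articulation.get(british_term, set())
    return [r["sample"] for r in records if r["rock_class"] in targets]


print(query_through_british_glasses("igneous", CANADIAN_DATA, ARTICULATION))
```

The data are never relabeled. Only the query is translated, so each community keeps its own vocabulary.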
Ludaescher, with help from colleagues Kai Lin and Ilkay Altintas, demonstrated the GEONworkbench, which enables scientists to access data, manipulate it, and conduct various kinds of searches. "This provides the scientist's view of a problem-solving environment," he said.
He distinguished that from an engineer's view, which GEON is also enabling through the Kepler Workflow System, which he described as "an emerging open source tool for scientific discovery workflows." Ludaescher said you can also think of it as "a canvas for wiring components in new ways, then deploying the composite capabilities as one component."
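The "canvas" idea of wiring components together and then treating the result as a single component can be illustrated in a few lines. This is a generic sketch of component composition, not Kepler's API: the function names and the toy processing steps are assumptions made for this example.

```python
from typing import Callable, List, Optional

# A component here is simply a function from a list of values to a list.
Component = Callable[[list], list]


def wire(*components: Component) -> Component:
    """Chain components so each output feeds the next; return one component."""
    def composite(items: list) -> list:
        for component in components:
            items = component(items)
        return items
    return composite


def drop_missing(items: List[Optional[float]]) -> List[float]:
    """Discard missing readings."""
    return [x for x in items if x is not None]


def metres_to_km(items: List[float]) -> List[float]:
    """Convert metres to kilometres."""
    return [x / 1000 for x in items]


def round_values(items: List[float]) -> List[float]:
    """Round to one decimal place."""
    return [round(x, 1) for x in items]


# A composite is itself a component, so it can be wired into a larger one.
clean = wire(drop_missing, metres_to_km)
report = wire(clean, round_values)
result = report([1500, None, 2340])
print(result)
```

Because `wire` returns something of the same type it accepts, composites nest freely, which is the property the quoted description emphasizes.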
Kepler is being designed to be easy to use, especially for non-experts, but extensible for expert users (via a visual programming interface); to have reusable "generic" features; to enable registration and publication of data and process "products"; and to support a range of technical IT and applications requirements (error detection, recovery from failure, data- and compute-intensive tasks, status checks and on-the-fly updates, visualization, semantic and metadata queries, certification, etc.).
Kepler, like GEON development in general, is built on the work of several projects before it, including cheminformatics work by Kim Baldridge, Encyclopedia of Life work by Mark Miller, data mining work by the SKIDL team led by Tony Fountain, and neuroscience work by the BIRN team led by Mark Ellisman. The Kepler 1.0 alpha version is being launched this week at the SDSC summer institute (see kepler-project.org).
GEON Work of Interest to the Standards Community
The GEON project has already caught the attention of the standards community. The World Wide Web Consortium is interested in moving the work into its standards committees, and the Open GIS Consortium is interested in doing the same. "We appreciate these votes of confidence," said Baru, "but the issue is the amount of time we have available. Our first priority is research and development. Additional funding would certainly help us work with the standards community."
Ontology Development Workshops
To keep the project intellectually motivated, GEON is organizing ontology development workshops. Each workshop, led by GEON PIs, typically involves a small group of domain experts from a given community interested in developing an ontology to serve their specialized area of research. By necessity, these workshops also include a few IT experts in data modeling and knowledge representation. To date, workshops held and planned cover the following topics: igneous petrology, seismology, aqueous geochemistry, structural geology, and metamorphic petrology. "These are hands-on meetings," said Baru. "The goal in each case is to develop an ontology during the meeting to provide an electronic 'memory' thereafter to motivate future work."
Increasing Focus on Software, Says SDSC Director Fran Berman
SDSC director Fran Berman opened the all-hands meeting with a presentation that discussed the importance of cyberinfrastructure to the nation. "It's a big push at NSF right now," she said. "Science over the last couple of decades has become a large-scale, multidisciplinary 'team sport.' It's based on technology and collaborative teams trying to address bigger and more complex problems."
Technology has become the "enabler," not limited to computers, but including networks, visualization capabilities, data storage, remote instruments, sensing and handheld devices, and supercomputers. "Witness the complex technology environment that's supporting the 2004 Olympics," said Berman, referring to the Games, which had started the previous day. "In particular, over the last decade, there has been growing recognition that it's the integrated software systems that provide the glue, so you'll see an increasing focus on software as we go forward."
UCSD Division Director Ramesh Rao Addresses Relationship with GEON
At the dinner Saturday night, UCSD division director Ramesh Rao addressed the group with some thoughts on the relationship between Calit² and GEON. He was followed by Nick Weston, Sun Microsystems, with comments on Sun's ambitious technology development plans over the next 10 years.
In addition to Calit², the meeting was sponsored by SDSC and Sun Microsystems. It was Webcast live, and an archival video is to be posted online on the GEON Website (http://www.geongrid.org/).