6 Data Management

6.1 Introduction and Rationale

The data sets obtained throughout the life of the U.S. JGOFS and U.S. GLOBEC programs form the basis for scientific papers which collectively represent the legacy of both programs. The data sets will be used in the post-JGOFS era as test beds for models and as reference points for future studies. These data must be collected, carefully checked, properly assembled and made accessible to the user community. To this end, a data management system has been developed to meet program needs both during and after the programs' lifetimes.

The U.S. JGOFS Long Range Plan outlined an approach to data management based on a distributed system of networked databases and a strong coordination effort. During the pilot studies, with funding from the NSF, the team of Glenn Flierl, James Bishop, David Glover and Satish Paranjpe developed prototypes of a networked, object-based data system. The NODC office at the Woods Hole Oceanographic Institution (G. Heimerdinger and R. Slagel) coordinated assembly and quality control of the data sets. While these efforts were successful as an interim measure, it was necessary to merge and expand these two activities.

With the initial development stage of a data management system ended, we needed to set in place an operational system which could efficiently handle the huge flow of data from the North Atlantic Bloom Experiment, the Equatorial Pacific Process Study and subsequent process studies (e.g., Arabian Sea, Southern Ocean, North Atlantic), as well as the accumulated data sets from the Time-Series stations at Bermuda and Hawaii and the ocean color and CO2 survey data sets.

In response to this need, a proposal to implement data management was submitted to and funded by NSF. The proposal laid out a scheme designed

The primary mechanism for meeting these needs is the establishment of a new Data Management Office (DMO), which is charged with ensuring that the data produced during the program are made available for use rapidly and efficiently.

6.2 Approach and Definition of Program Data Management Needs

U.S. JGOFS and U.S. GLOBEC/Georges Bank principal investigators (PIs) require two distinct types of computer access: local data systems in which they can enter their data and work with them, and access to a much wider system containing data from other projects, as well as historical data sets. To accommodate the varied needs of PIs and other users, the JGOFS Data System was designed with maximum flexibility in mind.

Early in 1994, Glenn Flierl saw an opportunity to adapt the Data System server to the emerging technology of the World Wide Web. He altered the system's protocol to use HTTP, the HyperText Transfer Protocol. The major advantage of this change is that it offers a standard, well-accepted interface to users of the Internet. This represented a turning point in the acceptance of the JGOFS Data Server, since access to the Data System became easy and familiar, just like looking at other Web information.

In addition to observational data, the U.S. JGOFS program needs to be able to serve many other kinds of information - meeting and ship schedules, project abstracts, PI information, data status, etc. This information needs to be readily accessible and kept up-to-date. Building and maintaining a WWW Home page seemed the most effective way to meet this need, and it goes hand-in-hand with the interface we chose to make the Data System available.

While the basic software structure is in place, many modifications and improvements can be anticipated. Thus, for the next several years, a system manager/programmer will be required. Another requirement will be additional storage capacity for the expanding data collection.

6.3 Structure of the Data System

The JGOFS data system was designed with a distributed, object-oriented approach. The guiding philosophy was that the closer we get to the actual data that the originating PI uses in her/his own research, the better. The storage format should be the PI's choice. To date, what we have discovered in the U.S. JGOFS program is that few PIs are able or willing to serve their own data. Many reasons account for this - underpowered PC, unavailable HTTP server, lack of access to a networked Unix workstation - but, whatever the reasons, most of the investigators choose to send a copy of the data to the Data Management Office.

Nonetheless, the data can be stored in a format very close to that originally submitted. Since the system uses translation programs ("methods") to read the data and standard writing routines to create a common appearance, others can network to the data without regard to storage format or location. These methods know how the data are formatted, and the user of the system does not need to know it.
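The idea of per-format translation methods feeding a standard writing routine can be illustrated with a minimal sketch. It is not the JGOFS software: the function names, the format labels ("csv", "columns") and the tab-separated output layout are all assumptions chosen for illustration.

```python
import csv
import io


# Hypothetical reader "methods": each one understands a single storage
# format and yields records as plain dicts keyed by variable name.
def read_comma_separated(text):
    """Read a comma-separated table whose first row names the variables."""
    for row in csv.DictReader(io.StringIO(text)):
        yield dict(row)


def read_whitespace_columns(text):
    """Read whitespace-delimited columns whose first line names the variables."""
    lines = [line for line in text.splitlines() if line.strip()]
    names = lines[0].split()
    for line in lines[1:]:
        yield dict(zip(names, line.split()))


# Registry mapping an assumed format label to its reader method.
METHODS = {
    "csv": read_comma_separated,
    "columns": read_whitespace_columns,
}


def serve(fmt, text):
    """Standard writing routine: emit one common tab-separated listing."""
    records = list(METHODS[fmt](text))
    if not records:
        return ""
    names = list(records[0])
    out = ["\t".join(names)]
    for rec in records:
        out.append("\t".join(rec[name] for name in names))
    return "\n".join(out)
```

In this sketch, calling serve("csv", ...) on a comma-separated file and serve("columns", ...) on a whitespace-delimited file holding the same values returns identical text, so the user never needs to know which storage format the originating PI chose.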

Although the core of the system is the distributed workstation structure, it is clear that a central data management office is needed to oversee the system and keep it operating smoothly. Full cooperation from program PIs will also be necessary. The following sections describe where these responsibilities lie and how they interrelate.

6.3.1 Data Management Office Responsibilities

The NODC data management effort during the North Atlantic Bloom Experiment has demonstrated the necessity for a data manager committed to collecting, tracking, and quality-controlling databases. In the future, such activities will be even more important and significantly greater in scope. Additionally, the personnel at the data management office are required to work with PIs to ensure that all data are properly documented.

Thus, the following activities are the fundamental responsibilities of the data management office:

These functions/duties are divided among three positions: a liaison officer from NOAA's NODC, a Systems Manager, and a Data Management Officer. The latter two are under the nominal supervision of a member of the current Data Management Project and the head of the Planning and Implementation Office, and comprise the staffing for the U.S. JGOFS Data Management Office.

Because of its responsibilities for ocean data archiving, NODC has expressed interest in a continuing role in the U.S. JGOFS data management program. Coordination between NODC and the data system project is required to fully develop the data system structure and data management office plan. NOAA's National Oceanographic Data Center has supported this task with its Liaison Officer located at the Woods Hole Oceanographic Institution and has committed this resource for the long term. Clearly, the effort envisioned is several times that expended during the Bloom Experiment, and resources are required to ensure that the data activities function properly.

6.3.2 Principal Investigators' Responsibilities

The PIs likewise bear crucial responsibility for data management. In order to satisfy the JGOFS requirement for data submission, the PI must:

6.4 The Data Management Office

To implement what has been outlined here, we have established the Data Management Office (DMO) in conjunction with the U.S. JGOFS Planning and Implementation Office, at the Woods Hole Oceanographic Institution.

For the near-term future, and throughout the life of the program, we hope to continue to rely on the support that NODC has provided by making the services of George Heimerdinger available to U.S. JGOFS data management. When he chooses to retire, we look to NODC to support his successor in a like manner. It is our belief that the linkage is a crucial one, both for NODC and for the U.S. JGOFS data management effort.

6.5 Synthesis

As the U.S. JGOFS program moves from the data gathering phase to the data analysis phase, its emphasis is expected to shift increasingly toward synthesis of the accumulated knowledge and its incorporation into models. Access to and use of the data by experimentalists and modelers alike will require a data management system which has developed to meet these needs. The Data Management Office expects to work closely with the community to ensure that the accumulated data sets are available in a manner which will facilitate the synthesis process.

6.6 Resources

Including the contribution from NODC via its support of a liaison officer, the cost of operating the Data Management Office is approximately $350K per year. Depending on the maintenance needs subsequent to the end of the data acquisition phase of U.S. JGOFS (which could continue data generation through 2001), the total cost of data management is estimated to be up to $2.5M.