Save the Bits (STB) Archive Flowchart


Nighttime Data Archiving at NOAO

An observation is made
Any data acquisition software that can execute a postprocessing command after each exposure or sequence of exposures can trigger an archive transaction with the resulting data. Any computer that can access a remote Unix printer queue can archive data.

A unique observation identifier composed of the telescope name and the UT date and time is added to each image header.
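As a rough illustration of such an identifier, the Python sketch below joins a telescope name to a UT timestamp. The layout of the string, the function name observation_id, and the telescope name "kp4m" are assumptions made for this example; the text above states only which components the identifier contains.

```python
from datetime import datetime, timezone


def observation_id(telescope: str, when: datetime) -> str:
    """Join a telescope name and a UT date and time into one identifier.

    The "name.YYYYMMDDTHHMMSS" layout is an assumed format for illustration.
    """
    if when.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    ut = when.astimezone(timezone.utc)  # normalize any time zone to UT
    return f"{telescope}.{ut:%Y%m%dT%H%M%S}"


# Hypothetical telescope name and exposure time.
example = observation_id(
    "kp4m", datetime(1995, 3, 14, 5, 26, 53, tzinfo=timezone.utc)
)
print(example)
```

Because the date and time are part of the string, two exposures from the same telescope receive different identifiers as long as they start in different seconds.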

Front end FITS queue
The data acquisition system supplies the name of any new image (or images) to a front end queue that converts the data to FITS. The impact on observing is negligible, since the observer's software transfers only the image name; a system-level queue handles the data after this point.

FITS formats other than images may also be archived, for instance, binary or ASCII tables.

Transfer the FITS data to the central archive computer
The output of the front end queue at each telescope is the input of the network archive queue. The FITS data are transferred to a central archive computer and queued for processing.

If the network or archive computer goes down, the data remain in the queue at the telescope until the network or computer recovers.
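The hold-until-delivered behavior can be sketched in a few lines of Python. This is not the archive's own queue software; drain_queue, fake_send, and the file names are hypothetical, and a failed transfer is assumed to surface as an OSError.

```python
from collections import deque
from typing import Callable, Deque


def drain_queue(queue: Deque[str], send: Callable[[str], None]) -> int:
    """Send queued files in order; on failure, leave the rest queued."""
    sent = 0
    while queue:
        try:
            send(queue[0])   # hypothetical transfer; raises OSError on failure
        except OSError:
            break            # network or archive computer is down: retry later
        queue.popleft()      # remove an entry only after it was delivered
        sent += 1
    return sent


# Demonstration with a stand-in transfer function that fails while "down".
state = {"up": False}
delivered = []


def fake_send(name: str) -> None:
    if not state["up"]:
        raise ConnectionError("archive computer unreachable")
    delivered.append(name)


pending = deque(["obs001.fits", "obs002.fits"])
first_try = drain_queue(pending, fake_send)   # nothing sent, queue intact
state["up"] = True
second_try = drain_queue(pending, fake_send)  # both files delivered in order
```

An entry leaves the queue only after its transfer succeeds, so an outage delays delivery without losing data or changing the order.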

Translate FITS images into FITS image extensions
The incoming FITS images are translated into FITS image extensions. Other FITS extensions are passed through unchanged. An archive sequence identifier is appended to each header to provide indexing between the image header catalog and the actual archived data.

The checksum of each extension file is written into its FITS header.
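The text does not say which checksum algorithm the archive uses. Purely as an illustration, the Python sketch below assumes a 32-bit ones'-complement sum over big-endian words; the function names are hypothetical, and how the value would be encoded into a header keyword is left out.

```python
import struct


def ones_complement_sum32(data: bytes) -> int:
    """Return the 32-bit ones'-complement sum of data (assumed algorithm)."""
    if len(data) % 4:
        data += b"\x00" * (4 - len(data) % 4)  # pad to a whole 32-bit word
    total = 0
    for (word,) in struct.iter_unpack(">I", data):
        total += word
        # End-around carry: fold any overflow back into the low 32 bits.
        total = (total & 0xFFFFFFFF) + (total >> 32)
    return total


def matches_recorded(data: bytes, recorded: int) -> bool:
    """Compare a freshly computed sum with the value recorded at write time."""
    return ones_complement_sum32(data) == recorded


payload = b"\x00\x00\x00\x01\x00\x00\x00\x02"
recorded = ones_complement_sum32(payload)
print(recorded, matches_recorded(payload, recorded))
```

Recording the sum when the file is written lets a later pass recompute it from the data and detect corruption without consulting any outside record.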

Tape large FITS image extension files
Translated FITS images are accumulated and written to tape in batches of 50 Mb, each batch as a single FITS extension file. This is more efficient, both to write and later to search and read, than writing each image to tape individually.
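The batching step might look like the following Python sketch. It assumes that a batch is closed as soon as the accumulated size reaches the threshold and that "50 Mb" means 50 megabytes; batch_by_size, BATCH_LIMIT, and the image names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

BATCH_LIMIT = 50 * 1024 * 1024  # assumed reading of "50 Mb" as megabytes


def batch_by_size(
    images: Iterable[Tuple[str, int]], limit: int = BATCH_LIMIT
) -> Iterator[List[str]]:
    """Group (name, size_in_bytes) pairs into batches of roughly limit bytes."""
    batch: List[str] = []
    size = 0
    for name, nbytes in images:
        batch.append(name)
        size += nbytes
        if size >= limit:      # threshold reached: this batch goes to tape
            yield batch
            batch, size = [], 0
    if batch:                  # leftover images wait for the next flush
        yield batch


# Small numbers stand in for real image sizes.
batches = list(batch_by_size([("a", 30), ("b", 30), ("c", 10)], limit=50))
print(batches)
```

Writing one large file per batch means fewer file marks on the tape, which is one plausible reason the larger files are faster to write and to search.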

The archive can be configured to write any number of simultaneous tape copies. At least two copies are recommended to safeguard the data.

Update the archive catalog and index
After each large archive file is successfully taped, the header catalog is updated. A separate index file cross-references the catalog and the individual tape, file, and image.
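One way to picture the index is as a mapping from each image's archive sequence identifier to its location on tape. The Python sketch below is a minimal model under that assumption; IndexEntry, index_taped_file, the tape label, and the sequence numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class IndexEntry:
    tape: str          # tape label
    file_number: int   # position of the large extension file on that tape
    image_number: int  # position of the image extension within that file


def index_taped_file(
    index: Dict[int, IndexEntry],
    tape: str,
    file_number: int,
    sequence_ids: Iterable[int],
) -> None:
    """Record where each image of a successfully taped file now lives."""
    for image_number, seq_id in enumerate(sequence_ids, start=1):
        index[seq_id] = IndexEntry(tape, file_number, image_number)


index: Dict[int, IndexEntry] = {}
index_taped_file(index, "T0001", 3, [1041, 1042, 1043])
print(index[1042])
```

Because the catalog holds only header information and the index holds only locations, either can be rebuilt or reorganized without touching the other.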

Daily, the catalog and index are automatically copied from the mountain to NOAO's Tucson headquarters.

Verify each full tape
As each duplicate pair of tapes fills up, the archive takes the drives offline and verifies the file checksums. Meanwhile, taping is shifted to a second pair of drives to keep the archive online. Four tape drives (two pairs) provide this redundancy, but a single drive is sufficient to run the archive.

Swap the verified tapes
Only verified tapes are swapped out of the archive drives in normal operation. A mountain technician keeps track of the tapes as they fill using a monitor program. A simple swap command leads the staff through removing and labeling the filled tapes and mounting new ones.

Ship the tapes to a remote data center
Tapes are regularly transported from the mountain to the downtown data center. Duplicate copies of the tapes can be stored in physically separate locations to protect the data.

The separation of the data and the catalog from the index that cross-references them allows future recasting of either the data or the catalog without affecting the other.

Access the archive database
A simple program unpacks the large FITS extension files into individual FITS images. Given a list of index entries, the archive staff can recover the appropriate images using an IRAF script that requests each needed tape in turn. The desired images are written into a directory from which they may be retrieved directly over the network or written to a distribution tape.
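The tape-by-tape retrieval can be sketched as a grouping step. This Python example is not the IRAF script mentioned above; it assumes each index entry is a (tape, file number, image number) triple, and plan_retrieval and the tape labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Entry = Tuple[str, int, int]  # (tape label, file number, image number)


def plan_retrieval(entries: Iterable[Entry]) -> Dict[str, List[Tuple[int, int]]]:
    """Group requested images by tape and sort them in on-tape order."""
    by_tape: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for tape, file_number, image_number in entries:
        by_tape[tape].append((file_number, image_number))
    # Each tape is requested once; its images are read front to back.
    return {tape: sorted(items) for tape, items in sorted(by_tape.items())}


wanted = [("T0002", 5, 1), ("T0001", 3, 2), ("T0002", 1, 4), ("T0001", 3, 1)]
plan = plan_retrieval(wanted)
for tape, items in plan.items():
    print(tape, items)
```

Grouping by tape means each tape is mounted only once, and sorting within a tape avoids rewinding between reads.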

Any image in the archive is accessible in less than 10 minutes, depending on its location on a particular tape.

Our online database, currently being designed, will be loaded directly over the network either daily or as the archive catalog is updated. No manual handling of the taped data will be needed.

Rob Seaman