Phase 3 Upgrade: Summary of News and Changes

Date: 25.01.2016

The Phase 3 system supports the preparation, validation and ingestion of science data products for storage in the ESO science archive facility, and subsequent data publication to the scientific community.

The underlying software, originally developed more than 5 years ago, is being upgraded to provide a number of essential features that were missing so far, together with substantial improvements to Phase 3 operations from both the user’s and the operator’s perspective. For instance, to reduce the overall time to publication, the user is now notified shortly after uploading the data, before in-depth content validation takes place, if mandatory FITS header keywords are missing.

This page summarizes the news and changes introduced with the upcoming system upgrade, to support users in planning their Phase 3 release and data preparation.

In case of any questions please contact usd-help@eso.org, quoting “Phase 3” as subject.

Data Interface Changes

  • Scope of PRODCATG keyword extended. PRODCATG becomes mandatory for associated FITS files (previously it was mandatory only for SCIENCE files). Previously the category of an associated FITS file was defined by the ASSOCi keyword within the referencing science file. Now, the category must be defined by PRODCATG within the associated file itself, using the same value that was previously assigned to ASSOCi. ASSOCi becomes obsolete for associated FITS files and should be defined only for associated files in other formats, such as PNG. This change affects the following data types:

        ANCILLARY.WEIGHTMAP
        ANCILLARY.VARMAP
        ANCILLARY.SPECTRUM
        ANCILLARY.IMAGE
        ANCILLARY.MASK
        ANCILLARY.RESMAP
        ANCILLARY.RMSMAP
        ANCILLARY.SNRMAP
        ANCILLARY.2DSPECTRUM
        ANCILLARY.MOSSPECTRA*

  • New syntax of CHANGES.USER information. Each line in the CHANGES.USER special file must follow the pattern
        <new_file> UPDATES <old_file>
    with <new_file> referring to a file in this batch submission, and <old_file> referring to a previously submitted file within the same data collection/survey (both in terms of the originally submitted filename, also known as ORIGFILE).

    The previous syntax using DELETE and REPLACE as tokens is no longer supported.

  • Automatic tracking of data versions. If a new file is submitted with the same filename (ORIGFILE) as the previous version, then these two files are automatically recognized by the system as different data versions without a line in CHANGES.USER being needed for this file.

  • New header update mechanism for science files. To update the astrometric or photometric calibration of a previously released FITS image, it is now possible to submit just the modified portion of the FITS header information instead of the entire FITS file, thereby drastically reducing the total amount of data to be transferred in such cases.

  • PHASE3FILELIST extension. For catalogue deliveries using the tile-by-tile scheme, the data files (PRODCATG=SCIENCE.CATALOGTILE) must be logically associated to the catalogue meta file (PRODCATG=SCIENCE.MCATALOG) using the following scheme: MCATALOG includes the list of associated CATALOGTILEs encoded as a dedicated FITS binary table extension. The table stores the filename (ORIGFILE) of each CATALOGTILE, using one record per CATALOGTILE.

    Specifically required FITS header keywords (binary table extension):
        XTENSION= 'BINTABLE'
        BITPIX = 8
        NAXIS = 2
        NAXIS1 = %d
        NAXIS2 = %d
        PCOUNT = 0
        GCOUNT = 1
        TFIELDS = %d
        TTYPEi = 'ORIGFILE'
        TFORMi = 'nA '
        EXTNAME = 'PHASE3FILELIST'

    NAXIS2 equals the total number of CATALOGTILEs. The table must contain at least one ORIGFILE column. It may contain further columns, with TTYPEj different from ORIGFILE, if the data provider intends to record additional file parameters of interest.

  • Definition of catalogue data links. The keywords TXP3Ri and TXP3Ci, which were previously used to define the target catalogue of a data link between two catalogues, are being replaced by the new TXRGFi keyword. TXRGFi (type character string) should declare the filename of the target catalog in terms of the ORIGFILE name.

    Example:

        Previous data link definition:
        TXLNK3 = 'CATALOG ' / Data link type
        TXP3C3 = 'VMC_CAT ' / Target catalogue
        TXP3R3 = 3 / Release number
        TXCTY3 = 'SOURCEID' / Target catalogue's TTYPE

        New data link definition:
        TXLNK3 = 'CATALOG ' / Data link type
        TXRGF3 = 'vmc_er3_yjks_catMetaData.fits' / Target catalogue
        TXCTY3 = 'SOURCEID' / Target catalogue's TTYPE

    Note that catalogue data links point to the filename of the meta catalogue file (PRODCATG=SCIENCE.MCATALOG) in case of multi-tile catalogs.

  • Prefixed notation not supported. Encoding PROV in catalogue data using the prefixed notation becomes obsolete because ORIGFILE references are now also valid across Phase 3 batches (see note 1).
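The CHANGES.USER syntax described above can be checked before submission with a few lines of Python. This is only a minimal sketch, not the validation performed by the Phase 3 system: the function names are hypothetical, filenames are assumed to contain no whitespace, and blank lines are assumed to be ignorable.

```python
import re

# Pattern from the text: <new_file> UPDATES <old_file>
# Assumption: filenames (ORIGFILE) contain no whitespace.
_UPDATE_LINE = re.compile(r"^(\S+)\s+UPDATES\s+(\S+)$")


def parse_changes_user(text):
    """Return a list of (new_file, old_file) pairs.

    Raises ValueError for any line that does not follow the UPDATES pattern,
    which includes lines using the old DELETE or REPLACE tokens.
    """
    pairs = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are skipped
        match = _UPDATE_LINE.match(line)
        if match is None:
            raise ValueError(
                f"line {lineno}: expected '<new_file> UPDATES <old_file>', got {line!r}"
            )
        pairs.append((match.group(1), match.group(2)))
    return pairs


def needs_changes_entry(new_file, old_file):
    """A CHANGES.USER line is only needed when the filename changed;
    files resubmitted under the same ORIGFILE are tracked automatically."""
    return new_file != old_file
```

For example, `parse_changes_user("tile_v2.fits UPDATES tile_v1.fits")` yields one pair, whereas a line such as `old.fits DELETE` raises a `ValueError`. The filenames here are purely illustrative.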
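To make the PHASE3FILELIST keyword template above more concrete, the following sketch derives the keyword values for the simplest case, a table with a single ORIGFILE column. It is an illustration under stated assumptions: the function name and tile filenames are hypothetical, filenames are assumed to be plain ASCII, and the column width is taken to be the length of the longest filename. Writing the actual binary table would normally be left to a FITS library.

```python
def filelist_header(tile_names):
    """Derive PHASE3FILELIST keyword values for a single-column table.

    tile_names: ORIGFILE names of the CATALOGTILEs (assumed ASCII).
    """
    if not tile_names:
        raise ValueError("at least one CATALOGTILE filename is required")
    # With a single 'nA' character column, the row length in bytes (NAXIS1)
    # equals the column width n.
    width = max(len(name) for name in tile_names)
    return {
        "XTENSION": "BINTABLE",
        "BITPIX": 8,
        "NAXIS": 2,
        "NAXIS1": width,            # bytes per table row
        "NAXIS2": len(tile_names),  # one record per CATALOGTILE
        "PCOUNT": 0,
        "GCOUNT": 1,
        "TFIELDS": 1,               # only the mandatory ORIGFILE column
        "TTYPE1": "ORIGFILE",
        "TFORM1": f"{width}A",
        "EXTNAME": "PHASE3FILELIST",
    }


if __name__ == "__main__":
    # Hypothetical tile filenames, for illustration only.
    for key, value in filelist_header(["tile_001.fits", "tile_0002.fits"]).items():
        print(f"{key:8s}= {value!r}")
```

If further columns are added, TFIELDS and NAXIS1 grow accordingly, since NAXIS1 is the sum of the widths of all columns in a row.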

Other Requirements

  • MJD-OBS and MJD-END keywords are now mandatory for files of type PRODCATG= SCIENCE.CATALOGTILE.

  • PROCSOFT keyword is now mandatory for all science data product files (previously it was recommended but not strictly required). PROCSOFT (string data type) indicates the reduction software system, including its version number, used to produce the data product.

  • TEXPTIME keyword becomes mandatory for science data product files with PRODCATG = SCIENCE.SPECTRUM or SCIENCE.SRCTBL. Note: TEXPTIME was already mandatory for IMAGE, MEFIMAGE and CUBE.IFS (see note 2).
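The keyword requirements listed on this page can be combined into a simple pre-submission check. The sketch below is not the Phase 3 validator and covers only the requirements named here, not the full set of mandatory keywords. It assumes the header is available as a plain dictionary, that science products are identified by a PRODCATG value starting with "SCIENCE.", and that IMAGE, MEFIMAGE and CUBE.IFS carry that same prefix. The function name is hypothetical.

```python
# PRODCATG values for which TEXPTIME is mandatory according to the text.
# Assumption: IMAGE, MEFIMAGE and CUBE.IFS carry the SCIENCE. prefix.
TEXPTIME_CATEGORIES = {
    "SCIENCE.SPECTRUM",
    "SCIENCE.SRCTBL",
    "SCIENCE.IMAGE",
    "SCIENCE.MEFIMAGE",
    "SCIENCE.CUBE.IFS",
}


def missing_keywords(header):
    """Return the keywords from this page that are required but absent.

    header: mapping of FITS keyword name to value (a plain dict here).
    """
    prodcatg = header.get("PRODCATG", "")
    if not prodcatg:
        # PRODCATG is mandatory for science and associated FITS files alike.
        return ["PRODCATG"]

    required = []
    if prodcatg.startswith("SCIENCE."):
        required.append("PROCSOFT")
    if prodcatg == "SCIENCE.CATALOGTILE":
        required += ["MJD-OBS", "MJD-END"]
    if prodcatg in TEXPTIME_CATEGORIES:
        required.append("TEXPTIME")
    return [key for key in required if key not in header]
```

For instance, a header containing only `PRODCATG = 'SCIENCE.SPECTRUM'` would be reported as missing PROCSOFT and TEXPTIME, while an associated file with `PRODCATG = 'ANCILLARY.WEIGHTMAP'` passes this particular check.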

Data Release Structure and Submission Process

  • Deprecating data products. If some data published in a previous release turn out to be of sub-standard quality and should be removed from the current collection, the corresponding files need to be listed explicitly by the Phase 3 data provider. The list of original filenames (ASCII format) can then be uploaded to the Phase 3 Release Manager.

  • Release modification type is now dropped. The release modification type (either ‘NEW’, ‘UPDATING’, or ‘SUPERSEDING’) becomes obsolete due to the management of file updates.

  • Multiple catalogues per batch. The previous limit of at most one SCIENCE.CATALOG or SCIENCE.MCATALOG per release has been lifted. Now, any number of science catalogues is supported per batch (release).

    Hence, survey programmes producing multiple catalogues at the same time can include the description of all catalogues in one single PDF document instead of preparing one separate PDF per catalogue. Effectively, by using this scheme, there is less overhead involved in the preparation of data releases for surveys like VMC, VVV, UltraVISTA and PESSTO.

  • Native support for very large releases through multi-batch submission scheme. Very large data releases can be split into a sequence of smaller batches, which can be managed independently from the data release at large according to the data provider’s preferences.

  • Simplification of Phase 3 release management. Before you can submit reduced public survey data to ESO, you need to navigate to the Phase 3 release manager and request a new batch directory to be created. You do not need to care anymore about creating collection and release directories before starting your submission.

  • PDF document upload. The data release description (PDF format) needs to be uploaded via the Phase 3 release manager web interface instead of using FTP as before.

  • Checking data compliance. It is now possible to transfer data to the Phase 3 FTP directory in order to test whether they are compliant, without actually submitting them for archival and publication. This is known as test mode. As a result, the local validator (jar) becomes obsolete and is no longer distributed or supported.

Notes:

1 http://www.eso.org/sci/observing/phase3/faq.html - prov_prefix

2 http://www.eso.org/sci/observing/phase3/faq.html - TEXPTIME