DataCite Schema

From The SBN Wiki
Revision as of 15:53, 18 January 2018 by Raugh (talk | contribs) (Safety Save)

Analysis and summary of the DataCite DOI database schema version 4.1 and the accompanying documentation, prepared with PDS data set applications in mind.

<resource>

ROOT

This is the root element of the submission document, and thus required.

Note that the content of the <resource> element is defined under an xs:all group, so that the immediate child nodes can appear in any order. The order here reflects the order in which they are listed in the schema.
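As a rough sketch of the outer wrapper, a submission document might begin as follows. The namespace and schema location shown are the ones generally published for version 4.1 of the schema, but they are an assumption here and should be checked against the current DataCite documentation. The comment stands in for the child elements described in the rest of this page.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<resource xmlns="http://datacite.org/schema/kernel-4"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
  <!-- Immediate child elements go here, in any order (xs:all group). -->
</resource>
```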

<identifier>

REQUIRED

This is the assigned DOI and must begin with the standard DOI prefix "10." It must also have the XML attribute identifierType present with a value of DOI.

     <identifier identifierType="DOI">10.12345/abcde</identifier>

For submission documents, this field is nil.

<creators>

REQUIRED

This is the author list or equivalent. Creators should appear in priority order. Both individuals and institutions/organizations can be credited as creators.

<creator>

REQUIRED

This class contains the identifying information for one creator. It is repeated for each creator.

<creatorName>

REQUIRED

The string representing the name as it should appear in citations. For personal names, the format is "Family, Given" (all one string). For organizational names, use the formal name of the organization.

This tag has one optional XML attribute, nameType, which takes one of these values:

Organizational
Personal

Best practice is to always include this attribute for creatorName.

<givenName>

OPTIONAL

For personal names, this should contain the string corresponding to the given name in the <creatorName> element. If the creatorName includes only initials, this field may contain the corresponding full name(s).

<familyName>

OPTIONAL

For personal names, this should contain the string corresponding to the family name (surname, patronymic, etc.).

Note: For PDS purposes, at least, we should consider where to associate suffixes (Jr., Sr., III, etc.). If DataCite, ADS, or the AAS journals have a convention, we should follow that.

<nameIdentifier>

OPTIONAL

This element provides a formal identifier for an individual or organization: for example, an author's ORCID, or an organizational DOI. Only public identifiers should appear here. If there is more than one applicable identifier, this element may be repeated.

Required attribute: nameIdentifierScheme. The value should be the common name or acronym for the identifier given (e.g., "ORCID").
Optional attribute: schemeURI. The value should be the URI of the defining organization or schema (e.g., "http://orcid.org").

<affiliation>

OPTIONAL

This element provides an organizational affiliation (as free-format text) for the creator. It may be repeated.
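Putting the creator elements together, a <creators> block might look something like the following sketch. All names, the affiliation, and the all-zero ORCID are placeholders invented for illustration. The first creator shows a creatorName that uses initials, with the full given names supplied in <givenName>.

```xml
<creators>
  <!-- A person: citation name, optional name parts, identifier, affiliation -->
  <creator>
    <creatorName nameType="Personal">Doe, J. Q.</creatorName>
    <givenName>Jane Quinn</givenName>
    <familyName>Doe</familyName>
    <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org">0000-0000-0000-0000</nameIdentifier>
    <affiliation>Example University</affiliation>
  </creator>
  <!-- An organization credited as the second creator -->
  <creator>
    <creatorName nameType="Organizational">Example Instrument Team</creatorName>
  </creator>
</creators>
```

The creators are listed in priority order, so the individual would be cited ahead of the organization.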

<titles>

REQUIRED

This class lists names or titles for the resource being identified. At least one title must be provided. This is the title that will be used to format citations.

<title>

REQUIRED

This element contains a single title. It may be repeated for alternate titles where appropriate.

Optional attribute: titleType. This must have one of the following values:
  • AlternativeTitle
  • Subtitle
  • TranslatedTitle
  • Other

Note: For PDS purposes, the formal title should always be listed first, and additional titles should always be either alternatives or translations and identified accordingly through the titleType attribute.

Optional attribute: xml:lang. This should contain one of the standard ISO 2- or 3-letter codes (but this is not validated). Note that this indicates only the language of the associated title string, not the language of the resource.

PDS should require this attribute whenever a title is designated as a "TranslatedTitle".
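A <titles> block following the PDS suggestions above might look like this sketch. The title strings are invented placeholders; the formal title comes first with no titleType, and the translated title carries xml:lang.

```xml
<titles>
  <!-- Formal title, listed first and used for citations -->
  <title xml:lang="en">Example Comet Photometry Collection</title>
  <!-- Additional titles, identified through titleType -->
  <title titleType="AlternativeTitle" xml:lang="en">Example Comet Photometry V1.0</title>
  <title titleType="TranslatedTitle" xml:lang="fr">Exemple de collection de photométrie cométaire</title>
</titles>
```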

<publisher>

REQUIRED

This element identifies the publisher/distributor/curator of the resource. It is used in creating citations.

For PDS data sets, this should always be "NASA Planetary Data System".

<publicationYear>

REQUIRED

The year the resource was made available to the public. This is used in creating citations.

For PDS data sets, this must be the four-digit year in which the data were publicly posted in the format and version associated with this DOI. There are other date fields in which significant dates (like the data collection period) can be indicated.
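For a PDS data set, the <publisher> and <publicationYear> elements would then appear roughly as follows. The year is only an illustrative value.

```xml
<publisher>NASA Planetary Data System</publisher>
<!-- Year the data were publicly posted in this format and version -->
<publicationYear>2018</publicationYear>
```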

<resourceType>

REQUIRED

This element takes a free-format text description of the type of resource associated with the DOI.

Required attribute: resourceTypeGeneral. This must have one of the following values:
  • Audiovisual
  • Collection
  • DataPaper
  • Dataset
  • Event
  • Image
  • InteractiveResource
  • Model
  • PhysicalObject
  • Service
  • Software
  • Sound
  • Text
  • Workflow
  • Other

Best practice generally is to consider the resourceTypeGeneral as the broader term which is then modified by the value string, so that a classification can be formed by concatenating the two with '/'. So, for example:

    <resourceType resourceTypeGeneral="Dataset">PDS4 Data Collection</resourceType>

would read as "Dataset/PDS4 Data Collection".

Best practice for "Text", specifically, is for the value to be taken from the CASRAI dictionary "Output Types" Sub-Element list at http://dictionary.casrai.org/Output_Types.

Note: PDS needs to consider how to use the values of resourceTypeGeneral with respect to individual data products, collections, and bundles. The CASRAI dictionary does not have a similar breakdown for data sets, so PDS should also develop (and enforce) a controlled value list for the content of this element at least for its own purposes, to facilitate searching and metrics generation. ADS and DataCite should be consulted.

<subjects>

OPTIONAL