Difference between revisions of "DataCite Schema"

From The SBN Wiki

Revision as of 19:53, 18 January 2018

Analysis and summary of the DataCite DOI database schema version 4.1 and the accompanying documentation, prepared with PDS data set applications in mind.

<resource>

ROOT

This is the root of the submission file document, and thus required.

Note that the content of the <resource> element is defined under an xs:all group, so that the immediate child nodes can appear in any order. The order here reflects the order in which they are listed in the schema.

<identifier>

REQUIRED

This is the assigned DOI and must begin with "10.", the leading string common to all DOIs. It must also have the XML attribute identifierType present with a value of DOI.

     <identifier identifierType="DOI">10.12345/abcde</identifier>

For submission documents, this field is nil.

<creators>

REQUIRED

This is the author list or equivalent. Creators should appear in priority order. Both individuals and institutions/organizations can be credited as creators.

<creator>

REQUIRED

This class contains the identifying information for one creator. It is repeated for each creator.

<creatorName>

REQUIRED

The string representing the name as it should appear in citations. For personal names, the format is "Family, Given" (all one string). For organizational names, use the formal name of the organization.

This tag has one optional XML attribute, nameType, which takes one of these values:

Organizational
Personal

Best practice is to always include this attribute for creatorName.

<givenName>

OPTIONAL

For personal names, this should contain the string corresponding to the given name in the <creatorName> element. If the creatorName includes only initials, this field may contain the corresponding full name(s).

<familyName>

OPTIONAL

For personal names, this should contain the string corresponding to the family name (surname, patronymic, etc.).

Note: For PDS purposes, at least, we should consider where to associate suffixes (Jr., Sr., III, etc.). If DataCite, ADS, or the AAS journals have a convention, we should follow that.

<nameIdentifier>

OPTIONAL

This element provides a formal identifier for an individual or organization - for example, an author's ORCID, or an organizational DOI. Only public identifiers should appear here. If there is more than one applicable identifier, this element may be repeated.

Required attribute: nameIdentifierScheme. The value should be the common name or acronym for the identifier given (like "ORCID").
Optional attribute: schemeURI. The value should be the URI of the defining organization or schema ("http://orcid.org", e.g.).

<affiliation>

OPTIONAL

This element provides an organizational affiliation (as free-format text) for the creator. It may be repeated.
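
As a rough sketch of how the pieces above fit together, a <creators> block with one personal and one organizational creator might look like the following. All names, the affiliation, and the all-zero ORCID are placeholders invented for illustration.

    <creators>
      <creator>
        <creatorName nameType="Personal">Doe, Jane</creatorName>
        <givenName>Jane</givenName>
        <familyName>Doe</familyName>
        <!-- placeholder ORCID, not a real identifier -->
        <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org">0000-0000-0000-0000</nameIdentifier>
        <affiliation>Example University</affiliation>
      </creator>
      <creator>
        <creatorName nameType="Organizational">Example Instrument Team</creatorName>
      </creator>
    </creators>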

<titles>

REQUIRED

This class lists names or titles for the resource being identified. At least one title must be provided. This is the title that will be used to format citations.

<title>

REQUIRED

This element contains a single title. It may be repeated for alternate titles where appropriate.

Optional attribute: titleType. This must have one of the following values:
  • AlternativeTitle
  • Subtitle
  • TranslatedTitle
  • Other

Note: For PDS purposes, the formal title should always be listed first, and additional titles should always be either alternatives or translations and identified accordingly through the titleType attribute.

Optional attribute: xml:lang. This should contain one of the standard ISO 2- or 3-letter codes (but this is not validated). Note that this indicates only the language of the associated title string, not the language of the resource.

PDS should require this attribute whenever a title is designated as a "TranslatedTitle".
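
A minimal sketch of a <titles> block following the PDS notes above, with the formal title first and a translation flagged through titleType and xml:lang. Both title strings are made up for illustration.

    <titles>
      <title xml:lang="en">Example Comet Imaging Data Collection</title>
      <title titleType="TranslatedTitle" xml:lang="fr">Collection d'images de comètes (exemple)</title>
    </titles>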

<publisher>

REQUIRED

This element identifies the publisher/distributor/curator of the resource. It is used in creating citations.

For PDS data sets, this should always be "NASA Planetary Data System".

<publicationYear>

REQUIRED

The year the resource was made available to the public. This is used in creating citations.

For PDS data sets, this must be the four-digit year in which the data were publicly posted in the format and version associated with this DOI. There are other date fields in which significant dates (like the data collection period) can be indicated.
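
For a PDS data set, these two elements would then presumably read as follows. The year shown is only an illustrative value.

    <publisher>NASA Planetary Data System</publisher>
    <publicationYear>2018</publicationYear>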

<resourceType>

REQUIRED

This element takes a free-format text description of the type of resource associated with the DOI.

Required attribute: resourceTypeGeneral. This must have one of the following values:
  • Audiovisual
  • Collection
  • DataPaper
  • Dataset
  • Event
  • Image
  • InteractiveResource
  • Model
  • PhysicalObject
  • Service
  • Software
  • Sound
  • Text
  • Workflow
  • Other

Best practice generally is to consider the resourceTypeGeneral as the broader term which is then modified by the value string, so that a classification can be formed by concatenating the two with '/'. So, for example:

    <resourceType resourceTypeGeneral="Dataset">PDS4 Data Collection</resourceType>

would read as "Dataset/PDS4 Data Collection".

Best practice for "Text", specifically, is for the value to be taken from the CASRAI dictionary "Output Types" Sub-Element list at http://dictionary.casrai.org/Output_Types.

Note: PDS needs to consider how to use the values of resourceTypeGeneral with respect to individual data products, collections, and bundles. Looks like DataCite would like to be consistent with Dublin Core usage in this, and that would make sense for us as well - but decisions here will have consequences elsewhere in the database. Consistency across PDS would be highly desirable here.

The CASRAI dictionary does not have a similar breakdown for data sets, so PDS should also develop (and enforce) a controlled value list for the content of this element at least for its own purposes, to facilitate searching and metrics generation. ADS and DataCite should be consulted.

<subjects>

OPTIONAL

This element lists keyword-type classifications as are commonly associated with journal articles.


<subject>

OPTIONAL

This element provides a string that corresponds to a keyword or similar classifier for the resource. It may be repeated as desired. Each occurrence should contain only a single taxonomic-type entry, and the taxonomy should be indicated via the optional attributes as far as possible.

Optional attribute: subjectScheme. This should be the name of the taxonomy or authority. There is no controlled value list.
Optional attribute: schemeURI. This should be a reference to the taxonomy definition or reference site.
Optional attribute: valueURI. If there is a URL, for example, for the definition of the specific term being used, include it here.
Optional attribute: xml:lang. Use this attribute to provide the standard ISO abbreviation for the language of the term.

PDS really should find a reference taxonomy to use specifically for this field and the corresponding label fields. There are one or two viable candidates in community use, and it would probably not benefit anyone for us to try to create a new one.

Note that for hierarchical taxonomies, a single instance of <subject> should express the entire hierarchy as a single string in the appropriate notation - there is no implied relationship between <subject> elements.
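
A sketch of how this might look in practice, assuming a hypothetical taxonomy called "Example Taxonomy" hosted at example.org that uses "/" as its hierarchy separator. The second entry shows a plain keyword with no scheme attributes.

    <subjects>
      <!-- the whole hierarchy is carried in one string -->
      <subject subjectScheme="Example Taxonomy" schemeURI="https://example.org/taxonomy" valueURI="https://example.org/taxonomy/small-bodies/comets" xml:lang="en">Small Bodies/Comets</subject>
      <subject xml:lang="en">photometry</subject>
    </subjects>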

<contributors>

OPTIONAL

This element provides a means for identifying people and organizations, other than the previously identified <creator>, who contributed to the creation, management, curation, distribution, etc., of the resource being described.

<contributor>

OPTIONAL

This element identifies a person or organization who made or makes some contribution to the resource. There is a required attribute to define the type of contribution. The element may be repeated as needed.

Required attribute: contributorType. This must have one of the following values:
  • ContactPerson
  • DataCollector
  • DataCurator
  • DataManager
  • Distributor
  • Editor
  • HostingInstitution
  • Producer
  • ProjectLeader
  • ProjectManager
  • ProjectMember
  • RegistrationAgency
  • RegistrationAuthority
  • RelatedPerson
  • Researcher
  • ResearchGroup
  • RightsHolder
  • Sponsor
  • Supervisor
  • WorkPackageLeader
  • Other

These are all defined in the appendix to the DataCite schema description document.

PDS needs to define its own usage and interpretation of these terms for uniform application across nodes. This should be considered fairly urgent, so we can incorporate this from the beginning in our DOI database info.

<contributorName>

REQUIRED

The name of a single contributing person or organization. As in the case of <creatorName>, this should be in the format "Family, Given" for personal names, and the formal name for organizations.

Optional attribute: nameType. It must have one of these two values:
  • Personal
  • Organizational

Best practice is to use the optional attribute.

<givenName>

OPTIONAL

The given name of a personal name, analogous to the same field for <creatorName>.

<familyName>

OPTIONAL

The surname or patronymic of a personal name, analogous to the same field for <creatorName>.

<nameIdentifier>

OPTIONAL

A formal identifier for a person or organization, such as a personal ORCID or an organizational DOI. It may be repeated if there is more than one applicable identifier.

Required attribute: nameIdentifierScheme. This is the type of the identifier ("ORCID" or "DOI", e.g.).
Optional attribute: schemeURI. This is a URI reference to the identifier definition or defining organization.

<affiliation>

OPTIONAL

This element contains the name of an organization or institution with which the named contributor is affiliated. It is a free-format text field. It should be repeated for each unique affiliation when there is more than one.
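
Putting the contributor elements together, a <contributors> block might be sketched as below. The names and affiliation are placeholders, and the contributorType values were picked from the list above only for illustration, not as a recommended PDS usage.

    <contributors>
      <contributor contributorType="DataCurator">
        <contributorName nameType="Personal">Roe, Richard</contributorName>
        <givenName>Richard</givenName>
        <familyName>Roe</familyName>
        <affiliation>Example Archive Node</affiliation>
      </contributor>
      <contributor contributorType="HostingInstitution">
        <contributorName nameType="Organizational">Example Data Center</contributorName>
      </contributor>
    </contributors>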


<dates>

OPTIONAL

This element provides a way to include various significant dates in the DOI database record.

<date>

OPTIONAL

One significant date for the resource. Dates should be in ISO 8601 format and can be to any precision (but this is not schematically enforced). This element may be repeated as needed for each date.

Required attribute: dateType. This indicates the significance of the date and must be one of the following values:
  • Accepted
  • Available
  • Collected
  • Copyrighted
  • Created
  • Issued
  • Other
  • Submitted
  • Updated
  • Valid

These are defined in the DataCite schema description document.

Optional attribute: dateInformation. This should be a very brief clarification of the dateType, where necessary.

PDS needs to consider carefully how to use the dateType values and possible dateInformation, especially in the context of the larger DOI and ADS databases. It's particularly important to get these right.
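
For illustration only, a <dates> block with ISO 8601 values at different precisions might look like this. The dates, the choice of dateType values, and the dateInformation text are all invented, and do not represent a PDS decision on how these types should be used.

    <dates>
      <!-- year-only precision -->
      <date dateType="Created">2005</date>
      <!-- an ISO 8601 interval for a collection period -->
      <date dateType="Collected">2005-07-01/2005-07-04</date>
      <date dateType="Available" dateInformation="First public release">2006-01-15</date>
    </dates>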

<language>

OPTIONAL

The natural language of the resource. This is defined as being of type xs:language, which provides syntax validation but does not actually fully enforce that values come from the "IETF BCP 47, ISO 639-1 language code," as specified in the description.
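
As a small example, a resource written in English could be tagged with the two-letter code below. A value with a region subtag, such as "en-US", has the same xs:language syntax and would also validate.

    <language>en</language>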

<alternateIdentifiers>

OPTIONAL

This element lists alternate identifiers for the same instance of the resource (as opposed to physically distinct, duplicate copies with their own identifiers). The identifiers should be unique and controlled within some context which should be specified.


From the way this is described, it sounds like the PDS4 LIDVID would be an "alternate identifier". But typically we would want the LIDVID in a citation, so that needs some consideration. PDS should also probably define standard values for alternateIdentifierType for PDS identifiers.

<alternateIdentifier>

OPTIONAL

This element provides one instance of an alternate identifier for the resource. It may be repeated as desired.

Required attribute: alternateIdentifierType. This string must describe the source or context of the identifier.
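
If the LIDVID were recorded this way, the entry might be sketched as follows. Both the alternateIdentifierType string and the LIDVID itself are hypothetical, since PDS has not yet defined standard values.

    <alternateIdentifiers>
      <alternateIdentifier alternateIdentifierType="PDS4 LIDVID">urn:nasa:pds:example_bundle:example_collection::1.0</alternateIdentifier>
    </alternateIdentifiers>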

<relatedIdentifiers>

OPTIONAL

This element lists identifiers for other resources related to this resource in some specific way. Note that it has half a dozen attributes, only two of which are required, to help in defining the relationship. Standard values are defined in the DataCite schema documentation.

Required attribute: relatedIdentifierType. The value must come from the following list:
  • ARK
  • arXiv
  • bibcode
  • DOI
  • EAN13
  • EISSN
  • Handle
  • IGSN
  • ISBN
  • ISSN
  • ISTC
  • LISSN
  • LSID
  • PMID
  • PURL
  • UPC
  • URL
  • URN
Required attribute: relationType. The value must come from the following list:
  • IsCitedBy
  • Cites
  • IsSupplementTo
  • IsSupplementedBy
  • IsContinuedBy
  • Continues
  • IsNewVersionOf
  • IsPreviousVersionOf
  • IsPartOf
  • HasPart
  • IsReferencedBy
  • References
  • IsDocumentedBy
  • Documents
  • IsCompiledBy
  • Compiles
  • IsVariantFormOf
  • IsOriginalFormOf
  • IsIdenticalTo
  • HasMetadata
  • IsMetadataFor
  • Reviews
  • IsReviewedBy
  • IsDerivedFrom
  • IsSourceOf
  • Describes
  • IsDescribedBy
  • HasVersion
  • IsVersionOf
  • Requires
  • IsRequiredBy

Once again, PDS needs to give some institutional thought to how to use these both in initial submissions, and how to keep them updated when related products are tagged with DOIs.

Optional attribute: resourceTypeGeneral. This attribute is identical to the one of the same name in <resourceType>, above.
These optional attributes should only be used when the value of the relationType attribute is either IsMetadataFor or HasMetadata (this is not validated):
Optional attribute: relatedMetadataScheme. This indicates the ID or name of a metadata definition standard.
Optional attribute: schemeURI. This should be the URI of the named metadata standard.
Optional attribute: schemeType. The DataCite definition is not clear, but this looks like a specific file format type for the referenced metadata standard (such as "XSD").
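
In the DataCite schema the attributes above are carried by each <relatedIdentifier> child of this element. A sketch with two relations follows. The DOI, the URLs, and the metadata scheme name are placeholders, and the second entry shows the metadata-specific attributes used together with HasMetadata.

    <relatedIdentifiers>
      <relatedIdentifier relatedIdentifierType="DOI" relationType="IsSupplementTo">10.12345/fghij</relatedIdentifier>
      <!-- metadata attributes apply only with HasMetadata or IsMetadataFor -->
      <relatedIdentifier relatedIdentifierType="URL" relationType="HasMetadata" relatedMetadataScheme="Example Scheme" schemeURI="https://example.org/schema" schemeType="XSD">https://example.org/metadata/label.xml</relatedIdentifier>
    </relatedIdentifiers>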

<sizes>

OPTIONAL

This element provides unstructured size information. In other words, it is not required to be numeric and there are no syntax constraints on the content.

<size>

OPTIONAL

A single size specification string, like "18GB" or "Three volumes". This element may be repeated as needed or desired.

PDS should consider making systematic use of this field. It might be helpful to users coming at the data from the publication side.
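
Using the two sample strings mentioned above, a <sizes> block could be written as:

    <sizes>
      <size>18GB</size>
      <size>Three volumes</size>
    </sizes>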

<formats>

OPTIONAL

This class indicates the physical/digital format(s) of the resource.

<format>

OPTIONAL

This element contains a text description of the format. It is not constrained.

Best practice is to use a file extension or MIME type string as the value.

PDS should be more formal about the content here. We also need to think about possible mixed-format products, like single documents that comprise multiple files, some of which are text and some images/graphics, and the best way to describe format for collections (if at all).
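
Following the MIME type best practice, a product delivered as a CSV table with a PDF document might be described as below. The formats chosen are only examples, and this sketch does not settle the mixed-format question raised above.

    <formats>
      <format>text/csv</format>
      <format>application/pdf</format>
    </formats>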

<version>

OPTIONAL

A version number associated with the resource.

Best practice is to obtain a new DOI for a major version change.

Because of the traceability and reproducibility concerns involved in research data, PDS should almost certainly forbid the use of this element. That is, it should be a PDS requirement that new versions of PDS products with DOIs get their own DOIs. The <relatedIdentifiers> element can be used to link the two versions in the DOI database.
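
As a sketch of that linkage, the record for a new version could point back to its predecessor with IsNewVersionOf, assuming the earlier version was registered under the placeholder DOI 10.12345/abcde. The older record would carry the inverse relation, IsPreviousVersionOf, pointing at the new DOI.

    <relatedIdentifiers>
      <!-- in the record for the new version -->
      <relatedIdentifier relatedIdentifierType="DOI" relationType="IsNewVersionOf">10.12345/abcde</relatedIdentifier>
    </relatedIdentifiers>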