Digital object identifier

A digital object identifier (DOI) is a character string used to uniquely identify an electronic document or other entity. Unlike a URL, which can change when a publisher of online content changes its web server's file structure, the DOI for a document remains fixed over the lifetime of the document. The DOI System provides a mechanism for locating an up-to-date URL for a document from its DOI, and for associating other forms of metadata with an object. Naming a document by its DOI therefore provides a more stable mechanism than a URL for linking to online content.[1]

The DOI System is implemented through a federation of DOI Registration Agencies coordinated by the International DOI Foundation,[2] which developed and controls the system. The DOI System has been developed and implemented in a range of publishing applications since 2000; by late 2009 approximately 43 million DOI names had been assigned by some 4,000 organisations.[3]

DOI names

A DOI name takes the form of a character string divided into two parts: a prefix and a suffix. The prefix identifies the registrant of the name, and the suffix is chosen by the registrant and identifies the specific object associated with that DOI. Most legal Unicode characters are allowed in these strings, which are interpreted in a case-insensitive manner.

For example, in the DOI name 10.1000/182, the prefix is 10.1000 and the suffix is 182. All DOI names start with "10.", and the characters 1000 in the prefix identify the registrant; in this case the registrant is the International DOI Foundation itself. The suffix, or item ID, 182 identifies a single object (in this case, the latest version of the DOI Handbook). Citations using DOI names should be printed as doi:10.1000/182. When the citation is a hypertext link, it is recommended to embed the link as a URL by prefixing the DOI name with "http://dx.doi.org/" and omitting the "doi:" label; e.g., the DOI name doi:10.1000/182 is linked as http://dx.doi.org/10.1000/182. This URL gives the location of an HTTP proxy server which redirects web accesses to the correct online location of the linked item.
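The prefix/suffix structure described above can be illustrated with a minimal Python sketch. The names DoiName, parse_doi and same_doi are hypothetical helpers introduced here for illustration and are not part of any DOI software; the sketch assumes that the first "/" separates prefix from suffix and that case-insensitivity can be approximated with Unicode case folding.

```python
from typing import NamedTuple


class DoiName(NamedTuple):
    """Hypothetical container for the two parts of a DOI name."""
    prefix: str  # e.g. "10.1000", identifies the registrant
    suffix: str  # e.g. "182", chosen by the registrant


def parse_doi(text: str) -> DoiName:
    """Split a DOI name into prefix and suffix at the first '/'."""
    name = text.strip()
    # Drop the "doi:" label used in printed citations, if present.
    if name.lower().startswith("doi:"):
        name = name[len("doi:"):]
    prefix, sep, suffix = name.partition("/")
    # All DOI names start with "10." followed by a registrant code.
    if not sep or not suffix or not prefix.startswith("10.") or len(prefix) <= 3:
        raise ValueError(f"not a DOI name: {text!r}")
    return DoiName(prefix, suffix)


def same_doi(a: str, b: str) -> bool:
    """Compare two DOI names case-insensitively (assumption: case folding)."""
    pa, pb = parse_doi(a), parse_doi(b)
    return (pa.prefix.casefold(), pa.suffix.casefold()) == (
        pb.prefix.casefold(),
        pb.suffix.casefold(),
    )
```

With these helpers, parse_doi("doi:10.1000/182") yields the prefix "10.1000" and the suffix "182", and same_doi treats "10.1000/ABC" and "doi:10.1000/abc" as the same name.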

DOI names can identify pieces of intellectual property (such as texts, images, audio or video items, and software) in both electronic and physical forms, performances, and abstract works such as licenses, parties to a transaction, etc. They can be applied to objects at varying levels of granularity: DOI names can identify a journal, an individual issue of a journal, an individual article in the journal, or a single table in that article. The choice of granularity is left to the assigner, but in the DOI System it must be declared as part of the metadata associated with a DOI name, using a data dictionary based on the indecs Content Model.

Applications

Major applications of the DOI System currently include:

  • persistent citations in scholarly materials (journal articles, books, etc.) through CrossRef, a consortium of around 3,000 publishers;
  • scientific data sets, through DataCite, a consortium of leading research libraries, technical information providers, and scientific data centers;
  • European Union official publications, through the EU publications office.

An example of an application that makes good use of DOI System functionality is the OECD's publication service SourceOECD: each table or graph in an OECD publication carries a DOI name that leads to an Excel file of the underlying data. Further development of such services is planned.[4]

A multilingual European DOI Registration Agency (RA), mEDRA, and a Chinese RA, Wanfang Data, are active in non-English language markets. Expansion to other sectors is planned by the International DOI Foundation.

Features and benefits

DOI names were developed with the key intended benefits of:

  • Persistent identification: each DOI name unequivocally and permanently identifies the object with which it is associated
  • Network actionability: each DOI name resolves to one or more web pages or other data assigned by the publisher
  • Semantic interoperability: metadata can be provided which allows unambiguous communication to any user, from any place, at any point of a distribution chain, with relevant pieces of information about the identified objects and their relationships.

To achieve this, the DOI System combines a social infrastructure with two underlying technologies, the Handle System and the indecs Content Model. Its technical infrastructure inherits the features and capabilities of both.

The Handle System ensures that the DOI name:

  • is not based on any changeable attributes of the entity (location, ownership, or any other attribute that may change without changing the referent's identity);
  • is opaque (preferably a "dumb number": a well known pattern invites assumptions that may be misleading, and meaningful semantics may not translate across languages and may cause trademark conflicts);
  • is unique within the system (to avoid collisions and referential uncertainty);
  • supports optional but desirable features (human-readable, cut-and-paste-able, embeddable; fits common systems, e.g., URI specification).

And that the DOI name's resolution mechanism:

  • is reliable (using redundancy, no single points of failure, and fast enough to not appear broken);
  • is scalable (higher loads simply managed with more computers);
  • is flexible (can adapt to changing computing environments; useful to new applications);
  • is trusted (both resolution and administration have technical trust methods; an operating organization is committed to the long term);
  • builds on an open architecture (encouraging a community to leverage its efforts in building applications on the infrastructure);
  • is transparent (users need not know the infrastructure details).

The Handle System's ability to provide administrative granularity, multiple resolution, and data typing was key to its selection for the DOI System. The Handle System is part of a Digital Object Architecture, which treats a digital object in the computer science sense: an identifiable item of structured information in digital form within a network-based computer environment. Any object in the more general, ontological sense (any "thing") may be represented as a digital object, so there is no inconsistency in this use in the DOI System.

The indecs Content Model is the basis of the DOI System's approach to assigning metadata to define a referent and its relationships. This approach places importance on:

  • unique identification;
  • functional granularity;
  • appropriate access;
  • designated authority; and
  • independence of specific business model or legal framework.

The International DOI Foundation (IDF) oversees the integration of these technologies and operation of the system through a technical and social infrastructure. The social infrastructure of a federation of independent registration agencies offering DOI services was modelled on existing successful federated deployments of identifiers such as GS1 and ISBN.

DOI names may be used with other appropriate technology to provide added services, e.g., OpenURL for context-sensitive linking. The DOI directory is OpenURL-enabled, so it can recognize a user with access to an OpenURL link resolver. Hence, on resolving, metadata can be pulled from the DOI agency CrossRef to create an OpenURL targeting the user's local link resolver. Such an OpenURL link that contains a DOI name is persistent; publishers who use the CrossRef DOI System to identify their content make their products OpenURL-aware.

Comparison with other identifier schemes

A DOI name differs from commonly used Internet pointers to material, such as the URL, in that it identifies an object as a first-class entity, not simply the place where the object is located. It implements the URI (URN) concept and adds to it a data model and social infrastructure.[5]

A DOI name also differs from standard identifier registries such as the ISBN, ISRC, etc. The purpose of an identifier registry is to manage a given collection of identifiers, whereas the primary purpose of the DOI System is to make a collection of identifiers actionable and interoperable, where that collection can include identifiers from many other controlled collections.[6]

The DOI System offers persistent, semantically interoperable resolution to related current data, and is best suited to material that will be used in services outside the direct control of the issuing assigner (e.g., public citation, or managing content of value). It uses a managed registry (providing social and technical infrastructure). It does not assume any specific business model for the provision of identifiers or services, and enables other existing services to link to it in defined ways.

Several approaches for making identifiers persistent have been proposed, but comparing them is difficult because they do not all do the same thing: referring imprecisely to a set of schemes as "identifiers" does not mean that they can be compared easily. Other "identifier systems" may be enabling technologies with low barriers to entry, providing an easy-to-use labeling mechanism that allows anyone to set up a new instance (examples include PURL, URLs, GUIDs, etc.), but they may lack some of the functionality of a registry-controlled scheme and will usually lack accompanying metadata in a controlled scheme. The DOI System does not take this approach and should not be compared directly to such identifier schemes. Various applications that build on such enabling technologies with added features have been devised, meeting some of the features offered by the DOI System for specific sectors (e.g., ARK).

A DOI name does not depend on the object's location and, in this way, is similar to a Uniform Resource Name (URN) or Persistent Uniform Resource Locator (PURL) but differs from an ordinary Uniform Resource Locator (URL). URLs are often used as substitute identifiers for documents on the Internet (better characterised as URIs), although the same document at two different locations has two URLs. By contrast, persistent identifiers such as DOI names identify objects as first-class entities: two instances of the same object would have the same DOI name.

Resolution

DOI name resolution is provided through the Handle System, developed by the Corporation for National Research Initiatives (CNRI), and is freely available to any user encountering a DOI name. Resolution redirects the user from a DOI name to one or more pieces of typed data: URLs representing instances of the object, services such as e-mail, or one or more items of metadata. To the Handle System, a DOI name is a handle; it therefore has a set of values assigned to it and may be thought of as a record that consists of a group of fields. Each handle value must have a data type specified in its "<type>" field, which defines the syntax and semantics of its data.
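The view of a DOI name as a record of typed values can be modelled with a small Python sketch. This is only an illustration of the idea, not the Handle System's actual data format or API: the class names, the type labels "URL" and "EMAIL" as used here, and the example.org values are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HandleValue:
    """One field of a handle record (hypothetical model)."""
    type: str  # data type; defines the syntax and semantics of `data`
    data: str


@dataclass
class HandleRecord:
    """A handle (here, a DOI name) with its set of typed values."""
    handle: str
    values: List[HandleValue] = field(default_factory=list)

    def values_of_type(self, wanted: str) -> List[str]:
        """Return the data of every value whose type matches `wanted`."""
        return [v.data for v in self.values if v.type == wanted]


# Made-up record: two locations for the object plus a contact address.
record = HandleRecord(
    handle="10.1000/182",
    values=[
        HandleValue("URL", "http://example.org/handbook.html"),
        HandleValue("URL", "http://mirror.example.org/handbook.html"),
        HandleValue("EMAIL", "contact@example.org"),
    ],
)
```

Selecting by type, as in record.values_of_type("URL"), is one way to picture multiple resolution: a single name can lead to several pieces of data, and the type of each value tells a client how to interpret it.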

To resolve a DOI name, a user may enter it into a DOI resolver (e.g., at www.doi.org) or represent it as an HTTP URL by prefixing the DOI name with the string

http://dx.doi.org/

For example, to resolve the DOI name 10.1000/182, enter the address "http://dx.doi.org/10.1000/182". Web pages or other hypertext documents can include hypertext links in this form. Some browsers allow the direct resolution of a DOI name (or other handle) with an add-on such as the CNRI Handle Extension for Firefox, which enables the browser to access handle or DOI URIs like hdl:4263537/4000 or doi:10.1000/1 using the native Handle System protocol. The extension also replaces references to web-to-handle proxy servers with native resolution.
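Building the proxy URL described above amounts to simple string handling, as the following Python sketch suggests. The function name doi_to_url is hypothetical, and the percent-encoding step is an assumption added here so that suffix characters that are not URL-safe still produce a usable link; it is not described in the text above.

```python
from urllib.parse import quote

# Proxy server address given in the text above.
PROXY = "http://dx.doi.org/"


def doi_to_url(citation: str) -> str:
    """Turn a DOI name or a printed citation such as 'doi:10.1000/182' into a proxy URL."""
    name = citation.strip()
    # Omit the "doi:" label used in printed citations.
    if name.lower().startswith("doi:"):
        name = name[len("doi:"):]
    # Assumption: percent-encode unsafe characters, keeping "/" literal.
    return PROXY + quote(name, safe="/")
```

For instance, doi_to_url("doi:10.1000/182") gives http://dx.doi.org/10.1000/182, the same address as in the example above. The sketch only constructs the link; following it relies on the proxy server to redirect to the current location of the object.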

Metadata

Each DOI name is associated with a set of metadata. The extent of this metadata may be defined by an application profile: a small kernel of data common to all DOI names can optionally be extended with other relevant data, which may be public or restricted. The metadata can be existing data from another scheme, which can be mapped to a DOI Application Profile using a data dictionary based on the indecs Content Model.

Registrants may update the metadata about their content at any time (for example, when some publication data changes or when the primary URL to which the DOI name resolves is modified).

Organizational structure

The International DOI Foundation (IDF), a non-profit organisation created in 1998, is the governance body of the DOI System.[7] It safeguards all intellectual property rights relating to the DOI System, manages common operational features, and supports the development and promotion of the DOI System. The IDF ensures that any improvements made to the DOI System (including creation, maintenance, registration, resolution and policymaking of DOI names) are available to any DOI registrant. It also prevents third parties from imposing additional licensing requirements beyond those of the IDF on users of the DOI system.

The IDF is controlled by a Board elected by the members of the Foundation, with an appointed Managing Agent who is responsible for co-ordinating and planning its activities. Membership is open to all organizations with an interest in electronic publishing and related enabling technologies. The IDF holds annual open meetings on DOI and related issues; the 2010 meeting is provisionally scheduled to be held in Hannover, Germany, in mid-year.

DOI Registration Agencies (RAs), appointed by the IDF, provide services to DOI registrants: they allocate DOI prefixes, register DOI names, and provide the necessary infrastructure to allow registrants to declare and maintain metadata and state data. Registration Agencies are also expected to actively promote the widespread adoption of the DOI System, to cooperate with the IDF in the development of the DOI System as a whole, and to provide services on behalf of their specific user community. A list of current Registration Agencies is maintained by the International DOI Foundation.

Registration agencies generally charge a fee to assign a new DOI name; parts of these fees are used to support the IDF. The DOI system overall, through the IDF, operates on a not-for-profit cost recovery basis.

Standardization

The DOI System is currently being standardised through the International Organization for Standardization, in its technical committee on identification and description, TC46/SC9. The Draft International Standard ISO/DIS 26324, Information and documentation - Digital Object Identifier System, was released for ballot on 5 October 2009; voting will close on 5 March 2010.[8] DOI is a registered URI under the info URI[9] specification (IETF RFC 4452, "The 'info' URI Scheme for Information Assets with Identifiers in Public Namespaces"[10]); info:doi/ is the info URI namespace of Digital Object Identifiers. The DOI syntax is a NISO standard, first standardised in 2000: ANSI/NISO Z39.84-2005, Syntax for the Digital Object Identifier.[11]
