Metadata: a Primer

 

Table of Contents

• What is Metadata, Anyway?
• What is a Metadata Dictionary?
• What is a Metadata Element?
• What is a Sub-Element (Qualified or Refined)?
• What is an Authority, Controlled Vocabulary & Structured Syntax?
• What are Element Attributes?
• What is an Application Profile?
• What is Dublin Core (DCMI)?
• What is IEEE-LOM: Learning Object Metadata?
• What is a DTD and an XML Schema?
• What is a Namespace?
• What are Metadata Mappings and Crosswalks?
• What are the OAI-Open Archives Initiative and ORE-Object Reuse and Exchange?

 

This article was compiled by Paul E Burrows, Information Architect & Service Management, Teaching & Learning Technologies, University of Utah 

 

What is Metadata, Anyway?

“Metadata” is descriptive information about a resource. The resource may be video or audio, an image or graphic, a text-based document, or any other informational item whether electronic or not.

The primary purpose of metadata is to enhance findability and facilitate sharing...the ability to describe a resource and allow someone to discover, review, select, and retrieve an item.

Examples of metadata include...

  • the name of an item
  • descriptions or abstracts about its content
  • keywords or subject classifications
  • file formats
  • authors, producers, distributors, publishers
  • copyright and usage restrictions

Excellent metadata needs to be structured in some way. Descriptions should not be created in a random or ad hoc manner; in other words, metadata should follow a well-documented, formalized scheme. The opposite of a standardized metadata scheme is called a "folksonomy," which a Wikipedia article describes this way:

In contrast to professionally developed taxonomies with controlled vocabularies, folksonomies are unsystematic and, from an information scientist's point of view, unsophisticated; however, for Internet users, they dramatically lower content categorization costs because there is no complicated, hierarchically organized nomenclature to learn. One simply creates and applies tags on the fly.

Folksonomy-based metadata is often referred to as "tagging."  Many contributor-centric websites, including YouTube, encourage their authors to add a series of tags to capture the descriptive essence of their work so that others can conduct searches and find items of interest.

 


What is a Metadata Dictionary?

It's all about definitions. When creating a systematic method for describing media resources, you have to start by creating metadata categories. Whether created from scratch or harvested from other sources, metadata categories must be defined...thus a metadata dictionary.

In defining a dictionary...

  • The metadata categories are called ELEMENTS.
  • An ELEMENT may stand alone, or be further refined by creating a SUB-ELEMENT or QUALIFIED ELEMENT.
  • Each ELEMENT has a carefully defined set of qualities or ATTRIBUTES.
  • The defined elements are combined into an APPLICATION PROFILE that is specific to the needs of a particular community or type of user.
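The relationships in the list above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names, the attribute keys, and the profile name "example-video-profile" are hypothetical and are not taken from any published metadata dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One metadata category (ELEMENT) in the dictionary."""
    name: str
    definition: str
    # ATTRIBUTES: the defined qualities of the element (hypothetical keys).
    attributes: dict = field(default_factory=dict)
    # SUB-ELEMENTS that refine or qualify this element.
    sub_elements: list = field(default_factory=list)


@dataclass
class ApplicationProfile:
    """A set of elements combined for one community of users."""
    name: str
    elements: list = field(default_factory=list)

    def element_names(self):
        return [element.name for element in self.elements]


role = Element("role", "The function a contributor performed.")
contributor = Element(
    "contributor",
    "A person or organization that helped create the resource.",
    attributes={"obligation": "optional", "repeatable": True},
    sub_elements=[role],
)
title = Element(
    "title",
    "The name given to the resource.",
    attributes={"obligation": "mandatory", "repeatable": False},
)

profile = ApplicationProfile("example-video-profile", [title, contributor])
print(profile.element_names())
```

The point of the sketch is that the dictionary defines each element once, while a profile only selects and combines elements that are already defined.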

 


What is a Metadata Element?

The Periodic Table of Elements contains a carefully structured visualization of the chemical building blocks of the universe as we know it. Metadata Elements are the descriptive building blocks used to verbally or visually describe the world of resources, assets, media items, or "essence."

End of metaphor (our apologies to chemists everywhere). A typical metadata scheme or application profile consists of many Metadata Elements assembled into a framework or model.

 

 


What is a Sub-Element (Qualified or Refined)?

We're talking about drilling down into the descriptive core of a media item or resource. If the descriptors attached to a specific element aren't specific or expressive enough to fully identify an item, then that element needs to be further refined or qualified by using a companion or associated "sub-element." These are often built in a hierarchical structure.

The reason to use sub-elements is to achieve a detailed level of description that best suits a community of users (such as educational communities) without going overboard. After all, at some point in the process, a real, live person must describe an asset using the elements provided. An overly simplistic set of elements fails to capture the nature of that resource. An overly compulsive set of elements may capture every aspect of a resource, but be too difficult to use, too time consuming to implement, and too confusing to understand by most humans. One needs a set of elements and qualified elements that is "just right."

An element and its associated sub-element descriptions are often referred to as a "container," implying that the descriptions are related in nature and should be transported or conveyed to information systems intact, as a bundle. A simple example might be the contributing creators of a video program in which one "contributor container" contains an individual's name (e.g., Smithee, Alan) plus that person's role (director). A second instance of a "contributor container" contains another name (e.g., Doe, Jane) plus that person's role (executive producer). The name is "qualified" by a role or function. Unless a name and its associated role are paired in a container, then all you achieve is a list of names and disassociated roles.
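A minimal sketch of the "contributor container" idea follows. The class name Contributor and its field names are assumptions made for illustration; the names and roles are the ones used in the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contributor:
    """A container that keeps a name together with its qualifying role."""
    name: str
    role: str


contributors = [
    Contributor(name="Smithee, Alan", role="director"),
    Contributor(name="Doe, Jane", role="executive producer"),
]

# With containers, the pairing survives any reordering or transport.
roles_by_name = {c.name: c.role for c in contributors}

# Without containers there are only two independent lists. Once each list
# is sorted (or otherwise reordered) separately, the pairing is lost.
names = sorted(c.name for c in contributors)
roles = sorted(c.role for c in contributors)
mismatched = list(zip(names, roles))

print(roles_by_name["Doe, Jane"])
print(mismatched[0])
```

In the second half of the sketch, re-pairing the two sorted lists attaches "director" to the wrong person, which is the "list of names and disassociated roles" problem described above.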

 


What is an Authority, Controlled Vocabulary & Structured Syntax?

In addition to refining the descriptions of an element by creating sub-elements that "qualify," refinement can be achieved by establishing restrictions on how data is actually entered for a descriptor.

Refinement is also achieved if the grammar of a descriptor is controlled. A good example of this type of refinement is the order in which a person's name is displayed, e.g., LastName, FirstName MiddleName, Credentials (for a very interesting discussion on the complexities of displaying names, see "Representing People's Names in Dublin Core").

Another example is the manner in which dates are represented, whether you order the data by Month/Day/Year, Day/Month/Year, or Year/Month/Day (for a discussion of the representation variables in displaying dates and times, see the W3C report on "Date and Time Formats").

In order to better control the terms and descriptions used while cataloging, some metadata elements can employ ways to refine or "encode/enter" your data, using formal notations, vocabularies or parsing rules.

  • AUTHORITY FILE: Use an "authority file" from another agency that specifies how to properly enter descriptive information for a type of metadata element. It may provide taxonomies of terms organized into logical hierarchies, such as the Library of Congress "subject" terms.
  • CONTROLLED VOCABULARY: Use a short listing of prescribed terms, often called a "controlled vocabulary." The best practice is to select a term or terms from a picklist. The picklist ensures consistency in data entry.
  • STRUCTURED SYNTAX: Follow a particular structured syntax, punctuation or grammar when entering data, e.g., LastName, FirstName MiddleName, Credentials or dates as 2005-02-24 (YYYY-MM-DD).

Controlling the descriptions entered for a metadata element ultimately means that end users are able to conduct successful searches for relevant media items and avoid an explosive number of irrelevant "hits."
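The two simpler kinds of refinement, a controlled vocabulary and a structured syntax, can be checked mechanically at data entry. The sketch below is one possible approach, not a prescribed method: the picklist contents, the function names, and the exact name pattern are assumptions chosen for illustration.

```python
import re
from datetime import datetime

# Hypothetical picklist standing in for a controlled vocabulary.
MEDIA_TYPES = {"video", "audio", "image", "text"}

# Assumed pattern for "LastName, FirstName MiddleName, Credentials",
# with the credentials part optional.
NAME_PATTERN = re.compile(r"[^,]+, [^,]+(, [^,]+)?")

DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}")


def in_vocabulary(term):
    """True if the term appears on the picklist."""
    return term in MEDIA_TYPES


def is_structured_name(value):
    """True if the value follows the comma-separated name syntax."""
    return NAME_PATTERN.fullmatch(value) is not None


def is_iso_date(value):
    """True if the value is a real calendar date written as YYYY-MM-DD."""
    if DATE_SHAPE.fullmatch(value) is None:
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


print(in_vocabulary("video"), in_vocabulary("movie"))
print(is_structured_name("Doe, Jane Q., PhD"), is_structured_name("Jane Doe"))
print(is_iso_date("2005-02-24"), is_iso_date("02/24/2005"))
```

An authority file would work like the picklist check, except that the list of permitted terms is maintained by an outside agency rather than locally.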

 


What are Element Attributes?

Once again the topic focuses on specificity and standardization. There are many specifications on how to define data elements. If one hopes to share metadata descriptions with other organizations and entities (interoperability), then it's best to follow an established set of guidelines in setting up and defining metadata elements. A commonly understood framework allows diverse groups to appreciate, even harvest, data from each other.

In its underlying structure, a metadata scheme employs a modified standard for describing data elements used in databases and documents. It is called ISO/IEC 11179: Specification and Standardization of Data Elements. When using this standard, each descriptor or metadata element is identified by numerous attributes or characteristics that define the meaning and use of an element.

Although end-users are mostly unaware of the underlying structures of a metadata schema, those of us who mark up or create descriptions are immersed in understanding and applying metadata field attributes. What are some of the typical attributes used in the underlying structures of a metadata scheme?

NAME
The actual name of the descriptor or element.

DEFINITION
A brief definition of the descriptor or element. Guidelines for understanding how to use an individual element from UMAP are described under the attribute Guidelines for Usage.

REFINEMENTS & ENCODING SCHEMES
In order to better control the terms and descriptions used while cataloging, some metadata elements can exploit ways to refine or "encode/enter" your data, using formal notations, vocabularies or parsing rules.

  • Use an "authority file" from another agency that specifies how to properly enter descriptive information for a type of metadata element. It may provide taxonomies of terms organized into logical hierarchies, such as the Library of Congress "subject" terms.
  • Use a short listing of prescribed terms, often called a "controlled vocabulary." The best practice is to select a term or terms from a picklist. The picklist ensures consistency in data entry.
  • Follow a particular syntax, punctuation or grammar when entering data, e.g., LastName, FirstName MiddleName, Credentials. 

Controlling the descriptions entered for a metadata element ultimately means that end users are able to conduct successful searches for relevant media items.

GUIDELINES FOR USAGE
The Guidelines are a brief user's guide to understanding UMAP, its elements, their intended meanings, and the proper way to use them when entering data or cataloging.

OBLIGATION TO USE
A metadata element does NOT necessarily have to be used or contain data when cataloging a media item. The "Obligation to Use" a metadata element is defined by these options...

  • MANDATORY: must use
  • MANDATORY IF APPLICABLE: must use if it makes sense for a media item
  • OPTIONAL: completely up to you
  • RECOMMENDED: USHE-Core thinks it is a good idea to enter data for this element

REPEATABLE ELEMENT
Some metadata schemes, such as the Dublin Core, suggest that if you need to catalog more than one term or data entry for a single element, you repeat the element, with each instance holding a different term. For example, the element alternativeModes may have two entries, one for "SAP1 - Spanish" and one for "ClosedCaption - English." This attribute may be defined as...

  • REPEATABLE
  • UNBOUNDED
  • MINIMUM OCCURRENCE
  • MAXIMUM OCCURRENCE
  • or may actually use a number
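Obligation and repeatability can both be expressed as occurrence counts, which makes them easy to check. The following is a sketch under assumptions: the ElementRule class, the rule values for "title", and the treatment of "mandatory" as a minimum occurrence of 1 are illustrative choices, not part of any cited standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ElementRule:
    """Occurrence rules for one element (hypothetical structure)."""
    name: str
    min_occurs: int = 0            # 1 or more means the element is mandatory
    max_occurs: Optional[int] = 1  # None means unbounded


def check_record(record, rules):
    """Return a list of problems; record maps element name to a list of values."""
    problems = []
    for rule in rules:
        count = len(record.get(rule.name, []))
        if count < rule.min_occurs:
            problems.append(
                f"{rule.name}: needs at least {rule.min_occurs}, found {count}"
            )
        if rule.max_occurs is not None and count > rule.max_occurs:
            problems.append(
                f"{rule.name}: allows at most {rule.max_occurs}, found {count}"
            )
    return problems


rules = [
    ElementRule("title", min_occurs=1, max_occurs=1),
    ElementRule("alternativeModes", min_occurs=0, max_occurs=None),
]

record = {"alternativeModes": ["SAP1 - Spanish", "ClosedCaption - English"]}
print(check_record(record, rules))

record["title"] = ["Canyon Geology"]
print(check_record(record, rules))
```

The first call reports the missing mandatory title; the repeated alternativeModes entries are accepted because that element is unbounded.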

TYPE OF DATA ENTRY
Any database designer must indicate what type of data is permitted for a field. The type of data permitted is often defined as TEXT STRING, NUMBER, INTEGER, DATE, DATETIME, TIME, CHAR, etc. USHE-Core uses relatively few data entry types. Most are text strings. Even the metadata elements which contain date/time stamps are considered to be text strings because of our use of the W3C-DTF encoding rules for dates and times, a profile based on ISO 8601.

http://www.w3.org/TR/NOTE-datetime
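Because the dates are stored as text strings, a cataloging tool has to check their shape itself. The sketch below tests a string against the six levels of granularity described in the W3C note (year, year-month, complete date, and a complete date with time to the minute, second, or fraction of a second plus a time zone designator). The function name is an assumption, and the check is deliberately shallow: it confirms the layout only and does not reject impossible values such as month 13.

```python
import re

# Layout of the W3C-DTF granularities. A time part, when present,
# must end with a time zone designator: "Z" or an offset such as +01:00.
W3CDTF = re.compile(
    r"\d{4}"                       # YYYY
    r"(-\d{2}"                     # -MM
    r"(-\d{2}"                     # -DD
    r"(T\d{2}:\d{2}"               # Thh:mm
    r"(:\d{2}(\.\d+)?)?"           # :ss and optional fraction
    r"(Z|[+-]\d{2}:\d{2})"         # time zone designator
    r")?)?)?"
)


def looks_like_w3cdtf(value):
    """True if the string has the layout of a W3C-DTF date or date-time."""
    return W3CDTF.fullmatch(value) is not None


for sample in ("1997", "1997-07-16", "1997-07-16T19:20+01:00", "07/16/1997"):
    print(sample, looks_like_w3cdtf(sample))
```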

EXAMPLES
To illustrate the proper use of an element, USHE-Core provides some real world examples, in particular when the definition for an element is rather dense or confusing.

ELEMENT LABEL
(an Administrative Attribute) Usually the attribute called Name and the attribute called Label are the same. The Label is used to indicate the exact manner in which an element is referenced.

ELEMENT VERSION
(an Administrative Attribute) As metadata is developed, elements and the meanings attached to them change over time, producing several versions. Like the numbered editions of released software, Element Version indicates the version you are viewing (hopefully, the most recent version).

NAMESPACE IDENTIFIER
(an Administrative Attribute) A unique name that identifies the organization that developed an XML Schema. A namespace is identified via a Uniform Resource Identifier (a URL or URN). For example, namespaces for Dublin Core elements and qualifiers have been expressed respectively in XML as:

xmlns:dc = "http://dublincore.org/elements/1.0/"
xmlns:dcq = "http://dublincore.org/qualifiers/1.0/"

(DCMI currently publishes the 15 core elements under the namespace URI http://purl.org/dc/elements/1.1/.)

The use of namespaces allows the definition of an element to be unambiguously identified with a URI, even though the label "title" alone might occur in many metadata sets. In more general terms, one can think of any closed set of names as a namespace. Thus, a controlled vocabulary such as the Library of Congress Subject Headings, a set of metadata elements such as DC, or the set of all URLs in a given domain can be thought of as a namespace that is managed by the authority that is in charge of that particular set of terms.

REGISTRATION AUTHORITY
(an Administrative Attribute) A system to provide management of metadata elements. Metadata registries are formal systems that provide authoritative information about the semantics and structure of data elements. Each element will include the definition of the element, the qualifiers associated with it, mappings to multilingual versions and elements in other schema.

A registration authority facilitates the consistent use of a metadata element by all parties and communities. It also contributes to the longevity of a metadata element as it maintains its integrity over time.

LANGUAGE OF THE ELEMENT
(an Administrative Attribute) Depending on the Registration Authority for a metadata element or its country of origination and usage, the language used to define an element is indicated. Standards exist to express languages in either two-letter or three-letter codes.

This attribute, "Language of the Element," refers to the language used to define the element and has nothing to do with the language used in the media item you are cataloging. A descriptor called language is often used to identify the primary language of a media item.

ISO-639-2: Codes for the representation of names of languages as a 3-letter code.
http://www.loc.gov/standards/iso639-2

 


What is an Application Profile?

An Application Profile is a set of metadata elements, policies, and guidelines defined for a particular application or situation. The elements may be harvested from one or more element sets, thus allowing a given application or profile to use pre-established, well-formed, standardized metadata in addition to other metadata descriptors that are created and defined locally (custom metadata). For example, a given application might choose a subset of the Dublin Core that meets its needs, or may select elements from the Dublin Core, another element set, and several locally defined elements, all combined in a single schema. An Application Profile is not complete without documentation that defines the policies and best practices appropriate to its use and application.

http://library.csun.edu/mwoodley/dublincoreglossary.html 

 

An application profile has been defined by the ARIADNE Foundation in a paper entitled "Application Profiles: Mixing and Matching Metadata Schemas," by Rachel Heery and Manjula Patel. Excerpts are included below.

Application profiles consist of data elements drawn from one or more namespace schemas combined together by implementors and optimised for a particular local application. Application profiles are useful as they allow the implementor to declare how they are using standard schemas.

The experience of implementors is critical to effective metadata management...implementors use standard metadata schemas in a pragmatic way. This is not new, to re-work Diane Hillmann’s maxim ‘there are no metadata police’, implementors will bend and fit metadata schemas for their own purposes.

Schema application profiles are distinguished by a number of characteristics. They...

May draw on one or more existing namespaces
The application profile may use elements from one or more different element sets, but the application profile cannot create new elements not defined in existing namespaces.

Introduce no new data elements
All elements in an application profile are drawn from elsewhere, from distinct namespace schemas. If an implementor wishes to create ‘new’ elements that do not exist elsewhere then (under this model) they must create their own namespace schema, and take responsibility for ‘declaring’ and maintaining that schema.

May specify permitted schemes and values
Often individual implementations wish to specify which range of values are permitted for a particular element, in other words they want to specify a particular controlled vocabulary for use in metadata created in accordance with that schema. The implementor may also want to specify mandatory schemes to be used for particular elements, for example particular date formats, particular formats for personal names.

Can refine standard definitions
The application profile can refine the definitions within the namespace schema, but it may only make the definition semantically narrower or more specific. This is to take account of situations where particular implementations use domain specific, or resource specific language.

http://www.ariadne.ac.uk/issue25/app-profiles/
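The characteristics quoted above can be sketched in code. Everything here is illustrative: the local namespace URI, the abbreviated list of Dublin Core elements, the permitted values for "format", and the function names are assumptions, not part of the Heery and Patel model itself.

```python
DC = "http://purl.org/dc/elements/1.1/"
LOCAL = "http://example.org/local-schema/"  # hypothetical local namespace

# Elements declared by each namespace schema (Dublin Core list abbreviated).
NAMESPACE_SCHEMAS = {
    DC: {"title", "creator", "date", "format"},
    LOCAL: {"alternativeModes"},
}

# The profile only selects declared elements and may narrow their values.
PROFILE = [
    {"namespace": DC, "element": "title"},
    {"namespace": DC, "element": "format",
     "permitted_values": {"video", "audio"}},
    {"namespace": LOCAL, "element": "alternativeModes"},
]


def undeclared_elements(profile, schemas):
    """List profile entries that no namespace schema declares."""
    return [
        (entry["namespace"], entry["element"])
        for entry in profile
        if entry["element"] not in schemas.get(entry["namespace"], set())
    ]


def value_allowed(profile, namespace, element, value):
    """True if the profile uses the element and permits the value."""
    for entry in profile:
        if entry["namespace"] == namespace and entry["element"] == element:
            permitted = entry.get("permitted_values")
            return permitted is None or value in permitted
    return False  # the profile does not use this element at all


print(undeclared_elements(PROFILE, NAMESPACE_SCHEMAS))
print(value_allowed(PROFILE, DC, "format", "video"))
print(value_allowed(PROFILE, DC, "format", "hologram"))
```

Note that the locally defined element lives in its own namespace schema, which is how this model lets an implementor add "new" elements without the profile itself declaring any.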

 

 

 


What is Dublin Core (DCMI)?

Dublin Core (ISO 15836) is an international standard for resource discovery (http://dublincore.org). Many other metadata schemes are able to map their metadata to the standardized Dublin Core fields.

The Dublin Core Metadata Initiative (DCMI) is an open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models. DCMI's activities include consensus-driven working groups, global workshops, conferences, standards liaison, and educational efforts to promote widespread acceptance of metadata standards and practices.

The Dublin Core is a 15-element metadata element set intended to facilitate discovery of electronic resources. The Dublin Core has been in development since 1995 through a series of focused invitational workshops that gather experts from the library world, the networking and digital library research communities, and a variety of content specialties.

The Dublin Core Metadata Initiative is the body responsible for the ongoing maintenance of Dublin Core. The work of DCMI is done by contributors from many institutions in many countries. DCMI is a consensus-driven organization organized into working groups to address particular problems and tasks. DCMI working groups are open to all interested parties. Instructions for joining can be found at the DCMI web site under their section labeled "Working Groups."
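As a concrete illustration, a simple Dublin Core description can be written as XML using the 15 element names and the namespace URI http://purl.org/dc/elements/1.1/. In the sketch below, the wrapper element "record", the function name, and the sample values are assumptions made for the example; DCMI does not prescribe them.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

ET.register_namespace("dc", DC_NS)


def build_record(fields):
    """Build a record from (element, value) pairs; elements may repeat."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


record = build_record([
    ("title", "Canyon Geology"),
    ("creator", "Doe, Jane"),
    ("subject", "geology"),
    ("subject", "erosion"),
    ("date", "2005-02-24"),
])
print(ET.tostring(record, encoding="unicode"))
```

The "subject" element appears twice, which follows the Dublin Core practice of repeating an element once per value.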

 


What is IEEE-LOM: Learning Object Metadata? 


IEEE 1484.12.1-2002, Learning Object Metadata Standard

A very readable summary of IEEE-LOM and its intent is available from a Wikipedia article found at this URL:
http://en.wikipedia.org/wiki/Learning_object_metadata

A learning object is not particularly useful unless it can be described, identified, and evaluated for its chief instructional benefits as well as being discovered by others to be repurposed in their own learning environments. There are many options for describing media items.

 

 

Metadata descriptors are available in several broad categories:

  • objective metadata (file types, media types, technical parameters)
  • subjective metadata (descriptions, keywords)
  • rights metadata (copyright, usage rights)

But not until the Learning Technology Standards Committee (LTSC) of the IEEE (Institute of Electrical and Electronics Engineers) tackled the challenge were there additional metadata descriptions to describe the educational utilization of learning objects within a specific instructional setting or classroom. Thus was developed the IEEE-LOM, Learning Object Metadata Standard.

IEEE-LOM is formally known as IEEE 1484.12.1-2002, Learning Object Metadata Standard: LOMV1.0 Base Schema.

BACKGROUND (taken from the IMS Learning Resource Meta-data Best Practice and Implementation Guide)

In 1997, the IMS Project, part of the non-profit EDUCOM consortium (now EDUCAUSE) of US institutions of higher education and their vendor partners, established an effort to develop open, market-based standards for online learning, including specifications for learning content meta-data.

Also in 1997, groups within the National Institute for Standards and Technology (NIST) and the IEEE P.1484 study group (now the IEEE Learning Technology Standards Committee - LTSC) began similar efforts. The NIST effort merged with the IMS effort, and IMS began collaborating with the ARIADNE Project, a European Project with an active meta-data definition effort. 

In 1998, IMS and ARIADNE submitted a joint proposal and specification to IEEE, which formed the basis for the IEEE Learning Object Meta-Data (LOM) Draft Standard, which was a classification for a pre-draft IEEE Draft Standard. IMS publicized the IEEE work through the IMS community in the US, UK, Europe, Australia, and Singapore during 1999 and brought the resulting feedback into the ongoing specification development process.

SCOPE (taken from the IMS Learning Resource Meta-data Best Practice and Implementation Guide)

The IEEE LOM Draft Standard defines a set of meta-data elements that can be used to describe learning resources. This includes the element names, definitions, datatypes, and field lengths. The specification also defines a conceptual structure for the meta-data. The specification includes conformance statements for how meta-data documents must be organized and how applications must behave in order to be considered IEEE-conforming.

The IEEE LOM Draft Standard is intended to support consistent definition of meta-data elements across multiple implementations.
The IMS Learning Resource Meta-Data Best Practice and Implementation Guide therefore includes or references:
  • IEEE Learning Object Meta-Data Working Draft, Version 6.1
  • IMS Learning Resource Meta-Data XML Binding, Version 1.2
  • IMS Learning Resource Meta-Data Information Model, Version 1.2
  • IMS Taxonomy and Vocabulary Lists

The IMS Learning Resource Meta-data Best Practice and Implementation Guide provides general guidance about how an application may use LOM meta-data elements. The IMS Learning Resource XML Binding specification provides a sample XML representation and XML control files (DTD, XSD) to assist developers with their meta-data implementations. None of these IEEE or IMS documents address details of meta-data implementation, such as its architecture, programming language, and data storage approach.

 

EXTENSIONS (taken from the IMS Learning Resource Meta-data Best Practice and Implementation Guide)

There has been, and continues to be, much debate on extending meta-data for uses beyond search engine retrieval. At this point, individual developers and implementers must make decisions on how to best extend meta-data.

The LOM rule regarding extensions is that they shall not conflict with or alter specified meta-data elements. The intent is to discourage developers from creating new elements that replace or duplicate elements in the LOM standard. For example, a meta-data instance should not have a new element, say TitleAndVersion, that is used as a replacement for already existing elements; in this case the title and version of the meta-data structure.

 

All of that said, the metadata dictionary and organization of the descriptors in IEEE-LOM are comprehensive. For a schematic overview of the hierarchy of the actual metadata elements of IEEE-LOM, refer to the document located at this URL:

http://www.imsglobal.org/metadata/imsmdv1p2p1/imsmd_bestv1p2p1.html#1197022

In general, IEEE-LOM's metadata gathers its descriptors under these aggregations...

  • general subjective descriptions
  • lifecycle factors
  • administrative metametadata
  • technical/objective identifiers
  • rights and intellectual property considerations
  • relationships to other resources
  • annotations
  • formal classifications
  • educational metadata


Of importance to this discussion is the inclusion by IEEE-LOM of educational metadata or descriptions on how an individual media item or learning object is best intended to be used in educational settings. The specific metadata elements are...

  • interactivity type
  • learning resource type
  • interactivity level
  • semantic density
  • intended end user role
  • context
  • typical age range
  • difficulty
  • typical learning time
  • description
  • language

Certainly not all of these descriptors are applicable in all educational settings, whether K-12 or higher education. Importantly, through its educational metadata, IEEE-LOM addresses the next level of reusability for learning objects, which is to identify objects with standardized utilization parameters, where appropriate and informative to educators, faculty, and students. K-12 educators may find similarities to the rubrics associated with specific lesson plans used in the classroom to teach specific topics and subjects, provide learning activities, and conduct assessment.
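The educational descriptors listed above can be pictured as a record in which every field is optional, since not all of them apply in every setting. This is a sketch only: the class name and the sample values are illustrative assumptions and are not drawn from the official LOM vocabularies or XML binding.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class EducationalMetadata:
    """One field per educational descriptor; all are optional."""
    interactivity_type: Optional[str] = None
    learning_resource_type: Optional[str] = None
    interactivity_level: Optional[str] = None
    semantic_density: Optional[str] = None
    intended_end_user_role: Optional[str] = None
    context: Optional[str] = None
    typical_age_range: Optional[str] = None
    difficulty: Optional[str] = None
    typical_learning_time: Optional[str] = None
    description: Optional[str] = None
    language: Optional[str] = None

    def populated(self):
        """Return only the descriptors that were actually filled in."""
        return {k: v for k, v in asdict(self).items() if v is not None}


# Illustrative values for a single learning object.
lesson_video = EducationalMetadata(
    learning_resource_type="lecture",
    intended_end_user_role="learner",
    context="higher education",
    difficulty="medium",
    language="en",
)
print(lesson_video.populated())
```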

 

 


What is a DTD and an XML Schema? 

A metadata schema and the actual descriptions of media items that use it need to be presented in some logical, clearly expressed manner so that the information can be understood. More importantly, using well-formed methods to express metadata schemas and descriptions allows different parties to share data; they are communicating using the same language and the same grammar.

A language that is often used to express well-formed data is XML, Extensible Markup Language. Unfortunately, unless the party offering and the party accepting the well-formed data are using a common grammar, information is likely to be mangled as it is interpreted and validated.

This situation is where a DTD (Document Type Definition) or an XML Schema (also called an XSD, for XML Schema Definition) is used to define the grammar and validate the data being shared. Some have stated that a DTD or XML Schema functions as a blueprint for describing the structure of the XML language in a document. These blueprints supply the...

  • Sequence in which elements appear in an XML document
  • Interrelationships between different elements (parent-child associations or nested relationships)
  • Types of data that are used to express elements and attributes (text string, number, date, timestamp, etc.)

DTDs have been around longer than XML Schemas and are very widely used. However, they have limitations: a DTD is written in a non-XML syntax, supports few data types, cannot identify namespaces, and offers no extensibility or inheritance. XML Schemas do not have these limitations, and they also allow users to craft their own data types.

Typically, more complex data structures, with multiple data types, require the use of an XML Schema over a DTD.
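To make the comparison concrete, the sketch below holds a small DTD and a roughly equivalent XML Schema as text. The element names (program, title, dateCreated, durationSeconds) are hypothetical. Because an XML Schema is itself an XML document, ordinary XML tools can read it, and the short script at the end lists the sequence of child elements and the data type declared for each. The DTD can state the sequence but can only describe each value as character data.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# A DTD uses its own non-XML syntax and treats every value as text.
DTD_TEXT = """\
<!ELEMENT program (title, dateCreated, durationSeconds)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT dateCreated (#PCDATA)>
<!ELEMENT durationSeconds (#PCDATA)>
"""

# The XML Schema states the same sequence and also assigns data types.
XSD_TEXT = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="program">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="dateCreated" type="xs:date"/>
        <xs:element name="durationSeconds" type="xs:integer"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

schema = ET.fromstring(XSD_TEXT)
parent = schema.find(f"{{{XS}}}element")
children = parent.findall(f".//{{{XS}}}sequence/{{{XS}}}element")
declared = [(child.get("name"), child.get("type")) for child in children]

print(parent.get("name"))
for name, data_type in declared:
    print(" ", name, data_type)
```

The script only reads the schema; validating an actual document against an XSD or DTD requires a validating parser, which is outside the scope of this sketch.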

For additional information, listed here are Primers, XML Schema Definitions and Specifications as provided by W3C:

 

 

For an example of a well-formed and documented metadata schema and associated XSD, refer to the PBCore Metadata Schema (Public Broadcasting Core) using the following URLs:

 


What is a Namespace?

There are many metadata schemes available for use by various industries and communities, each with their own set of elements and definitions.

The creation of a "Namespace" that is referenced by schema makers and schema users is done in order to distinguish one set of element names from another set used by a different schema. For example, the element "description" may have divergent meanings from one set of metadata to another. Two or more developers may be using an identical element name.

By declaring a formal Namespace in which a specific metadata schema declares the existence and meaning of its metadata elements and names, we avoid name collisions and confusion.

A Namespace declares a "bread crumb trail" between real world applications of a schema's metadata and its humble origins...or at least it points to the party responsible for its creation in the first place.
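The following sketch shows two elements that share the name "description" but belong to different namespaces. The Dublin Core URI is the one DCMI publishes; the "local" namespace URI, the wrapper element, and the sample text are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

DOC = """\
<record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:local="http://example.org/local-schema/">
  <dc:description>A documentary about canyon geology.</dc:description>
  <local:description>Shelf 12, tape 4</local:description>
</record>
"""

NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "local": "http://example.org/local-schema/",  # hypothetical
}

root = ET.fromstring(DOC)

# The parser expands each prefix to its full URI, so the two
# "description" elements end up with different qualified names.
tags = [child.tag for child in root]

dc_description = root.find("dc:description", NAMESPACES).text
local_description = root.find("local:description", NAMESPACES).text

print(tags)
print(dc_description)
print(local_description)
```

The short prefixes are only a convenience inside one document. It is the URI behind each prefix that identifies which schema an element name belongs to.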

 


What are Metadata Mappings and Crosswalks?

A metadata standard is created in order to identify (in an organized and logical manner) how content, knowledge, and media are to be described. These descriptions often deal with four major categories:

  1. Objective descriptions
  2. Subjective descriptions
  3. Educational use descriptions
  4. Rights and rights use descriptions

 

These descriptions are expressed through metadata "elements." An element is a named placeholder for a very specific type of information, e.g., a title, an author's name, a country, a creation date, a set of keywords, etc.

Different metadata standards exist in order to serve the needs of particular user communities, such as public broadcasters, TV program listings, libraries, medical practitioners, artists, global positioning data, museum collections, statistical and social research, educational applications, and so on and so on.

In most cases, each metadata element that is employed in a metadata standard uses a consistent set of properties or attributes for identification and definition. These properties are outlined in ISO/IEC 11179: Specification and Standardization of Data Elements and are expressed in half a dozen published documents and drafts. Technically speaking, a metadata dictionary is considered to be "cognizant of ISO/IEC 11179."

What happens when one community desires to share metadata information entered in its systems with another community that maintains its own metadata standard? In a perfect world, each metadata element from the "source" metadata standard could be paired with a similar metadata element in the "target" metadata standard, and the data would be transferred.

Unfortunately, such a pure one-to-one pairing or "harmonization" is rare. Although each standard may use a common method to express the properties of its metadata elements, the actual data held within the element may not "crosswalk" or "map" perfectly.
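In the ideal one-to-one case, a crosswalk is little more than a lookup table. The sketch below assumes a hypothetical source standard with elements named programTitle, producer, airDate, and nielsenRating, mapped onto Dublin Core element names; the pairings are illustrative and not an official mapping.

```python
# Hypothetical source element names paired with Dublin Core targets.
CROSSWALK = {
    "programTitle": "title",
    "producer": "creator",
    "airDate": "date",
}


def apply_crosswalk(source_record, crosswalk):
    """Split a record into mapped values and values with no target element."""
    mapped, unmapped = {}, {}
    for name, value in source_record.items():
        target = crosswalk.get(name)
        if target is None:
            unmapped[name] = value
        else:
            mapped[target] = value
    return mapped, unmapped


source = {
    "programTitle": "Canyon Geology",
    "producer": "Doe, Jane",
    "airDate": "2005-02-24",
    "nielsenRating": "2.1",
}

mapped, unmapped = apply_crosswalk(source, CROSSWALK)
print(mapped)
print(unmapped)
```

Even in this simple case one source element has no companion in the target, so it is reported rather than silently dropped.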

The following quote was extracted from an excellent article entitled Issues in Crosswalking Content Metadata Standards. It was originally published through NISO, the National Information Standards Organization, and is authored by Margaret St. Pierre of Blue Angel Technologies, Inc. and William P. LaPlant, Jr., of the U.S. Bureau of the Census Statistical Research Division. The web page is unfortunately no longer available.

A crosswalk is a specification for mapping one metadata standard to another. Crosswalks provide the ability to make the contents of elements defined in one metadata standard available to communities using related metadata standards. Unfortunately, the specification of a crosswalk is a difficult and error-prone task requiring in-depth knowledge and specialized expertise in the associated metadata standards.

Obtaining the expertise to develop a crosswalk is particularly problematic because the metadata standards themselves are often developed independently, and specified differently using specialized terminology, methods and processes. Furthermore, maintaining the crosswalk as the metadata standards change becomes even more problematic due to the need to sustain a historical perspective and ongoing expertise in the associated standards.

 

When harmonizing metadata elements from different standards, there are several points of intersection where collisions, rather than merging, may occur.

 

Matching Semantic Definitions

An element in the source standard may not find a companion element in the target standard because the definition, semantics, or meaning of the elements are different. With such a mismatch, a descriptor may not translate well.

 

Matching Element-to-Element Relationships

Suppose the source standard uses separate metadata elements to identify the (1) Last name of a person, (2) First name, (3) Middle name, and (4) Credentials for an individual. What if the target standard only employs a single element to contain all of a person's names, prefixes and suffixes? How do the "many" elements of the source map to the "one" element in the target? There is a "many-to-one" mismatch. Likewise, there may exist a "one-to-many" element mismatch between the source and target standards. Furthermore, one standard may contain extra elements and descriptors that cannot even be paired with the other system.
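The "many-to-one" case above might be handled along these lines. This is a minimal sketch that assumes the target element expects a "Last, First Middle, Credentials" layout; the function name and the layout are assumptions made for illustration, not rules from any particular standard.

```python
def combine_name(last, first="", middle="", credentials=""):
    """Merge four source elements into one target name string.

    Assumed (hypothetical) target layout: "Last, First Middle, Credentials".
    Empty parts are skipped so no stray separators are produced.
    """
    given = " ".join(part for part in (first, middle) if part)
    name = f"{last}, {given}" if given else last
    if credentials:
        name = f"{name}, {credentials}"
    return name


full = combine_name("Doe", "Jane", "Q", "PhD")
no_middle = combine_name("Doe", "Jane")
```

Going the other way ("one-to-many") is harder: once the parts are merged into a single string, splitting them back apart reliably depends on every value having been entered in the same layout.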

 

Matching & Converting Content

The properties for a metadata element may define or restrict its contents by...

  1. data types (e.g., text, numeric, string, date),
  2. ranges of values,
  3. data refinements derived from the use of various authorities, controlled vocabularies, or specific syntaxes for the presentation of the data (e.g., keywords that are separated by semicolons),
  4. repeatability of the element in order to express multiple values or descriptions,
  5. mandatory or optional usage of the element when entering values.

 Even though a metadata element from a source standard may semantically match an element in a target standard, the rules governing how the actual data is entered in the element may differ between the systems. The mismatch may be resolved by some form of conversion or data reformatting. Consistency in how data was originally entered is key to formulating automatic conversion utilities or crosswalks.
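Two small conversions of this kind are sketched below. The sketch assumes a source system that stores keywords in one semicolon-separated field and dates as MM/DD/YYYY, and a target that wants one value per repeated element and ISO 8601 dates. Both conventions are assumptions chosen for illustration.

```python
from datetime import datetime


def split_keywords(value):
    """Turn one semicolon-separated field into a list of repeatable values."""
    return [keyword.strip() for keyword in value.split(";") if keyword.strip()]


def to_iso_date(value, source_format="%m/%d/%Y"):
    """Reformat a date string into ISO 8601 (YYYY-MM-DD).

    Raises ValueError when the value does not follow source_format,
    which is what happens when data was entered inconsistently.
    """
    return datetime.strptime(value, source_format).date().isoformat()


keywords = split_keywords("metadata; crosswalk ;Dublin Core;")
iso_date = to_iso_date("03/07/2009")
```

A value such as "7 March 2009" would fail the date conversion, which is why consistent data entry matters so much for automated crosswalks.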

 

Matching Single vs. Multiple or Compound Data Objects

Many asset management systems and databases allow the relationships between several data records/media items to be expressed. For example, a video program might have a transcript (text document), brochure (pdf), DVD (non-digital medium for order fulfillment), and other items associated with it. If an end user searches for the video program, the search results report the related media items as well. These associated/related items are often housed as a "multiple" or "compound" data object. Many databases actually refer to them as "container fields." If the source and target metadata systems use different methods to identify and report multiple or compound objects, then a mismatch in mapping will occur.

 

Matching Hierarchical and Flat Metadata Standards

Some metadata standards, like IEEE-LOM (Learning Object Metadata), use a very hierarchical structure to organize the relationships between metadata elements. These relationships can often become quite complex. Other standards, such as Dublin Core, are flat in nature, with no implied or expressed hierarchy. Trying to pair metadata elements between a hierarchical and a flat system can be troublesome.
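To make the difficulty concrete, the sketch below flattens a small nested record into a flat set of elements. The nested record is only loosely modeled on a hierarchical standard, and the path-to-element table is hypothetical. Neither is an official IEEE-LOM to Dublin Core crosswalk.

```python
def flatten(node, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf of a nested dict."""
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


# Hypothetical mapping from hierarchical paths to flat element names.
PATH_TO_FLAT = {
    "general.title": "title",
    "lifeCycle.contribute.entity": "creator",
    "technical.format": "format",
}


def to_flat(record):
    """Map a nested record to flat, repeatable elements; drop unmapped paths."""
    flat = {}
    for path, value in flatten(record):
        target = PATH_TO_FLAT.get(path)
        if target is not None:
            flat.setdefault(target, []).append(value)
    return flat


nested = {
    "general": {"title": "Intro to Metadata"},
    "lifeCycle": {"contribute": {"role": "author", "entity": "Jane Q Doe"}},
    "technical": {"format": "video/mp4"},
}
flat = to_flat(nested)
```

Note what is lost: the "role" that qualified the contributor has no home in the flat record, so the fact that Jane Q Doe was specifically the author survives only by convention.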

 


What are the OAI-Open Archives Initiative and ORE-Object Reuse and Exchange?

There is a brief article on the Open Archives Initiative in Wikipedia. This concisely written excerpt provides an overview of the OAI...

The Open Archives Initiative (OAI) is an attempt to build a "low-barrier interoperability framework" for digital archives (aka "institutional repositories") containing digital content (aka "digital libraries"). It allows people (Service Providers) to harvest metadata (from Data Providers). This metadata is used to provide "value-added services", often by combining different data sets.

Initially, the initiative has been involved in the development of a technological framework and interoperability standards specifically for enhancing access to e-print archives, in order to increase the availability of scholarly communication; OAI is, therefore, closely related to the Open Access movement. The developed technology and standards, though, are applicable in a much broader domain than scholarly publishing alone.

The OAI technical infrastructure, specified in the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), currently in version 2.0, defines a mechanism for data providers to expose their metadata. This protocol mandates that individual archives map their metadata to the Dublin Core, a simple and common metadata set for this purpose.

Funding for the initiative comes from various organizations including the Joint Information Systems Committee. 
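To give a feel for the harvesting mechanism described in the excerpt, an OAI-PMH request is an ordinary HTTP request whose query string names a "verb" and its arguments. The sketch below only builds such a URL; it does not contact any server. The base URL is a placeholder and the helper name is invented for illustration, while `ListRecords`, `metadataPrefix`, `from`, `set`, and the `oai_dc` (Dublin Core) prefix are part of OAI-PMH 2.0.

```python
from urllib.parse import urlencode


def build_harvest_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # harvest only records changed since this date
    if set_spec:
        params["set"] = set_spec    # restrict the harvest to one set
    return f"{base_url}?{urlencode(params)}"


# Placeholder base URL; a real data provider publishes its own.
url = build_harvest_url("https://repository.example.org/oai", from_date="2009-01-01")
```

A service provider would fetch this URL and parse the XML response, which carries the Dublin Core records exposed by the data provider.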

New work has begun on the reuse and exchange of aggregations of content items, often referred to as "Compound Data or Digital Objects." There is a brief article with a listing of resources on the OAI initiative for ORE or Object Reuse and Exchange.

 


 


 

 
