VRA - Images, the newsletter of the VRA

Images, April 2010
vol.7, no.2

Digital Scene and Heard
Edited by Elizabeth Meyer (University of Cincinnati)
Digital Initiatives Advisory Group

Embedded Metadata, Part 2: Use Cases
By Greg Reser, Johanna Bauman, Steve Tatum, Sheryl Frisch

Introduction
In part one of this article, we traced the development of embedded metadata (data about an image that is encoded as part of the digital file itself) and showed how several standards have been developed over the years to make embedded metadata easier to read and edit on a wide variety of hardware and software platforms. One of the newest of these standards is Adobe's XMP1. It is becoming widely accepted and can encode all of the legacy schemas as well as custom user-defined schemas.

Our goal in part two of this exploration into embedded metadata is to answer the questions that managers of image collections (who seldom send image files to press or advertising agencies) might ask about how embedded metadata could benefit them and the users of their visual resources or digital library collections. We have selected three specific use cases that are currently being implemented to showcase the use of embedded metadata in action. These use cases have also been selected because they exemplify three different applications of embedded XMP metadata in a VR or library environment: 1) as a means of gathering image metadata from faculty and students, 2) as a part of image production workflow in a VR collection or library, and 3) as a means of providing easy access to image files.

Multiple Schemas: Which one should I use?
In each of the use cases below, the image collection managers are using a mix of existing and custom schemas. The various standards developed over the years (TIFF, EXIF, IPTC IIM, and XMP) are widely used and can be found in almost every photo application. With all of these standards available to you, it would be natural to ask "Which one should I use?" The short answer is XMP, but this is a bit misleading. XMP is simply an encoding standard (serialized as RDF/XML, a form of XML) that can represent any schema you choose to use, including the older EXIF, IPTC IIM, and TIFF tags. XMP's real power comes from its extensibility: its ability to be customized to your needs. Of course, with great power comes great responsibility.
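To make the idea of "any schema you choose" concrete, here is a minimal sketch in Python of an XMP-style packet that carries one standard Dublin Core property alongside one custom property. The `mycoll` namespace and its `OrderStatus` property are invented for this illustration, and the outer `xpacket` wrapper that real files use is left out for brevity.

```python
import xml.etree.ElementTree as ET

NS = {
    # Published namespace URIs.
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    # Hypothetical custom namespace, invented for this example.
    "mycoll": "http://example.org/ns/mycoll/1.0/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Ask ElementTree to use readable prefixes when serializing.
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, name):
    """Return the qualified tag name ElementTree expects: {uri}name."""
    return f"{{{NS[prefix]}}}{name}"


def build_xmp(title, order_status):
    """Build a small RDF/XML packet mixing Dublin Core and a custom schema."""
    xmpmeta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})

    # In XMP, dc:title is a language alternative: rdf:Alt holding rdf:li items.
    title_el = ET.SubElement(desc, q("dc", "title"))
    alt = ET.SubElement(title_el, q("rdf", "Alt"))
    li = ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"})
    li.text = title

    # A custom property sits next to the standard one in the same packet.
    status = ET.SubElement(desc, q("mycoll", "OrderStatus"))
    status.text = order_status

    return ET.tostring(xmpmeta, encoding="unicode")


packet = build_xmp("Speedway grandstand", "cataloged")
print(packet)
```

An application that only knows Dublin Core can still read the title, while the custom property is simply ignored by software that does not recognize its namespace.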

Recognizing the potential for confusion that multiple schemas and mappings might create, several hardware and software manufacturers (including Apple, Adobe, Canon, Microsoft, Nokia, and Sony) banded together in 2006 to form the Metadata Working Group.2 The focus of the group is to preserve digital image metadata and assure its seamless interoperability on and across all applications, devices, and services. To foster cooperation and encourage adoption, MWG is based on a formal legal framework and a royalty-free intellectual property policy. In other words, they want everyone to benefit: consumers and manufacturers. In 2008, MWG published its “Guidelines for Handling Image Metadata”3, which lays out how metadata should be mapped across schemas and how it should be written and read to maintain this mapping. Any custom embedded metadata schema or application developed for the management of digital images in VR collections and libraries should follow these guidelines.

Embedded XMP Metadata in Action

1. Collecting Metadata from Faculty and Scholars
Properly harnessed, XMP has the power to serve as a normalizing standard that will allow image collection managers to accept data from users who do not have access to a central database. Using a standardized XMP panel provided by the local image collection manager, a professor working in the field could manage her images by cataloging them directly in Photoshop or Adobe Bridge on site. At the end of her trip, she has a set of cataloged images that she can search and organize as she sees fit from her own desktop; share with colleagues who can do the same; submit for publication together with data; and, finally, submit back to the image collection manager for incorporation into a central VR database or digital library.

Use Case: Steve Tatum, Virginia Tech - Faculty field photographs of Virginia Speedways enhanced by student catalogers and ingested into multiple databases4
When a faculty member takes photographs in the field as part of their research5, they are generating assets of great value to multiple departments on a campus. For this project, which documents the auto racing speedways of Virginia6, the materials are distributed in two databases, three if you count the contributor's local files as a mini database.

The process begins with the faculty member photographing speedways and scanning ephemera such as tickets and posters. He conducts interviews, researches print resources like newspapers, and then records what he knows about each image in the default Photoshop Description panel. The image files with embedded metadata are then transferred to the Visual Resources Collection where student catalogers view the original description and then do further research to create new catalog records. These records are entered into a spreadsheet and also embedded in the image along with the original descriptions. Finally, cataloging is reviewed by the VR curator and sent to two databases: Virginia Tech's Luna Insight collection and The Virginia Tech Digital Libraries and Archives.

This process is complicated by the fact that each database requires a different schema: Dublin Core Terms for the Libraries and Archives and Luna Insight and VRA Core for the VR Collection. XMP's ability to store separate data sets of RDF/XML is a perfect way to deal with this multi-schema requirement. The faculty member's original data is stored in the standard Photoshop fields (mostly Dublin Core Elements). This data remains unedited and functions as primary source material. The cataloger's data is entered using custom XMP panels with several schemas. To describe the resource (photograph, ticket stub), DC Terms7 and SKOS8 are used to comply with the Virginia Tech Libraries and Archives standards. To describe the speedways as works of architecture, a simple set of flat VRA Core9 elements is used. Each schema is used for a specific purpose and represents a different aspect of the resources. The role of embedded metadata in this project is multilayered: it transmits information, serves as a convenient reference, and functions as an archive of the original data captured.
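As a rough illustration of how several schemas can coexist in one packet, the sketch below parses a hand-written sample and groups its properties by namespace, so that each downstream database could pick out only the data set it needs. The sample values are placeholders, the `vraflat` namespace URI is a stand-in invented for this example (it is not the official VRA Core namespace), and the property names are not taken from the Virginia Tech panels.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hand-written sample: one rdf:Description block per schema.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:source>Original field note from the photographer</dc:source>
  </rdf:Description>
  <rdf:Description rdf:about=""
      xmlns:dcterms="http://purl.org/dc/terms/">
   <dcterms:spatial>Virginia</dcterms:spatial>
  </rdf:Description>
  <rdf:Description rdf:about=""
      xmlns:vraflat="http://example.org/ns/vra-flat/">
   <vraflat:worktype>speedway</vraflat:worktype>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def properties_by_namespace(packet):
    """Return {namespace URI: {property name: text}} for simple properties."""
    root = ET.fromstring(packet)
    grouped = defaultdict(dict)
    for desc in root.iter(f"{{{RDF}}}Description"):
        for prop in desc:
            # ElementTree tags look like "{uri}local"; split the two parts.
            uri, _, local = prop.tag[1:].partition("}")
            grouped[uri][local] = (prop.text or "").strip()
    return dict(grouped)


for uri, props in properties_by_namespace(SAMPLE).items():
    print(uri, props)
```

This sketch only handles simple text properties. Structured values such as language alternatives or bags would need extra handling.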

2. Image Production Workflow
Embedded metadata can also be used to track image production from order creation to final cataloging. As an image moves through the production process, staff can add metadata that records order information, photographer instructions or notes, quality control details, and even pre-cataloging data entered by assistants. Anyone in the office can open an image and see what stage of production it is in: who shot it, who edited it, who cataloged it, whether it is ready for delivery, and what rights have been assigned to it. These attributes would also be searchable, allowing staff to easily find all the images in a given order that have not been cropped, or all of the images that need to be cataloged. All of this embedded metadata could be automatically extracted to a central database to become the starting point for full cataloging.

Use Case: Sheryl Frisch, Cal Poly - Student assistants enter data for ingest into VR database10
This process was developed as a solution to the restrictions of our IT system, which does not allow multiple student assistants to enter data directly in the central database.

Going in the opposite direction, starting from the database where the work records reside and writing the data into the image files, would be more expedient, but that is not possible at this time. Having student assistants work in Adobe Bridge allows us to accomplish our data entry tasks in an efficient way.

The process begins with students entering image data directly from the source using CS3 Bridge and three custom XMP panels, “Creator”, “Image”, and “Work”, which loosely follow the VRA Core. We don’t have enough resources to support cataloging a complete Core record for each image, so we focus on brief records. Once the data has been entered in Bridge, it is exported as a text file. This is imported into Excel, where it is given a final review and then saved as a new text file. It is then imported into our central EmbARK11 database and ultimately formatted for the Cal Poly Web Kiosk. Currently we have two data export formats for CS3. In the first, we can select either all or specific fields to export. The second is a CSU template that works with the WorldImage database12. The CSU export template adds columns for data required for the WorldImage database and populates the fields that were filled in by the student during the initial data capture. This way, correct data is only entered once and then transformed as needed. Data that was not entered properly is cleaned up and imported back into the image files. The filename is the key field.
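The round trip described above, with the filename as the key field, can be sketched in a few lines of Python. This is only an illustration of the matching logic, not the Cal Poly scripts themselves (which are JavaScript running inside Bridge). The tab-delimited layout, the column names, and the sample rows are all assumptions.

```python
import csv
import io

# Assumed export format: tab-delimited text with a header row.
EXPORTED = (
    "Filename\tCreator\tTitle\n"
    "img001.tif\tsmith, jane\tSample Title One\n"
    "img002.tif\tDoe, John\tSample Title Two\n"
)

# The reviewed file only needs to contain the rows that were fixed.
CORRECTED = (
    "Filename\tCreator\tTitle\n"
    "img001.tif\tSmith, Jane\tSample Title One\n"
)


def load_rows(text):
    """Parse tab-delimited text into {filename: row dict}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["Filename"]: row for row in reader}


def apply_corrections(exported, corrected):
    """Overwrite exported values with non-empty reviewed values, matched by filename."""
    merged = {}
    for filename, row in exported.items():
        fixed = corrected.get(filename, {})
        merged[filename] = {
            field: fixed.get(field) or value for field, value in row.items()
        }
    return merged


merged = apply_corrections(load_rows(EXPORTED), load_rows(CORRECTED))
for filename, row in merged.items():
    print(filename, row["Creator"])
```

Because every row is matched on the filename, renaming a file between export and re-import would break the link, which is one reason to settle file naming before data entry begins.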

To accomplish all of this, we wrote scripts in JavaScript and placed them in the Adobe Bridge CS3 startup folder. The scripts include easy-to-use graphical user interfaces that facilitate the import and export process. The Adobe Library files had to be modified so that the scripts could “talk” to the panels. In this process, embedded metadata serves as the starting point of cataloging and functions as a transfer method from workstation to central database.

3. Image Access
As the two examples above have already shown, one added benefit of producing images with embedded metadata is that the images can be searched across software platforms such as Lightroom, iPhoto, ACDSee, and Picasa, as well as directly through the computer's operating system (Mac OS X and Windows 7). This is a distributed database model, which you might be familiar with in the form of iTunes, where each user downloads album track metadata to create their own searchable music database that can be managed in any way they choose. iTunes embeds the metadata in the audio files so that when you transfer your files to an iPod, you can see the album and song titles. The final use case demonstrates the seamless integration of workflow and image access using embedded metadata.

Use Case: Greg Reser, UCSD - Accession record and image production tracking with PDFs13
With image accession record documentation files growing every year while file cabinet space shrinks, the UCSD Arts Library decided to switch from paper to digital. Creating PDF files of scanned slide sheets and printed documents has many benefits: the scan can be adjusted to show both the text on the slide label and the image on the film, the files can be accessed from users' workstations, and the files can be backed up for reliability.

These benefits alone are enough to make switching to PDFs for accession records worthwhile, but embedding administrative data in the PDF file header so that it can be read when the file is open is even more powerful, since it makes it possible to view the status record without searching the Library's main database. All members of the staff can open a PDF at their own desk and have instant access to everything that has been recorded about an order. Since the metadata is encoded using XMP, it is searchable using Adobe Bridge, which most team members use while working on image orders. It is also possible to export the metadata to the Library's database and DAM if desired.

An added benefit would be the ability to embed production workflow data to simplify tracking the status and history of an order. The accession PDF could become a "job ticket" where each participant in the production process can sign off when their part is complete. Having one location for this would be much better than the scattered worksheets and notebooks we previously used. Creating a custom input form (known as a "panel" in Adobe terms) also allows us to control some parameters of the data input, a feature that is not available in the Library's database. For instance, the accession numbers must be six digits each and the date must be formatted MM/DD/YYYY. There are also several drop-down lists which provide controlled values and speed up data entry. Commonly used terms are saved and presented as "most recently used" lists which allow the user to instantly fill in a field with one click.
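The two input rules mentioned above (six-digit accession numbers and MM/DD/YYYY dates) are easy to express as checks. The Python sketch below is only an illustration of the rules, not the validation code used in the UCSD panel, and the function names are invented for this example.

```python
import re
from datetime import datetime

ACCESSION_RE = re.compile(r"[0-9]{6}")
DATE_SHAPE_RE = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")


def valid_accession(value):
    """True if the value is exactly six ASCII digits."""
    return ACCESSION_RE.fullmatch(value) is not None


def valid_order_date(value):
    """True if the value is a real calendar date written as MM/DD/YYYY."""
    # Check the shape first, because strptime would also accept "4/5/2010".
    if DATE_SHAPE_RE.fullmatch(value) is None:
        return False
    try:
        datetime.strptime(value, "%m/%d/%Y")
    except ValueError:
        return False  # right shape, but not a real date (e.g. 02/30/2010)
    return True


for sample in ("012345", "12345", "04/15/2010", "4/15/2010", "02/30/2010"):
    print(sample, valid_accession(sample), valid_order_date(sample))
```

Checking the date against the calendar, and not just its shape, catches entries such as 02/30/2010 that a pattern alone would let through.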

Since the built-in Adobe metadata panels do not contain all the fields necessary for the UCSD accession records, a custom XMP panel was created with 40 elements. The descriptive elements, such as Order Date, Source Title, and Source Author, were mapped to Dublin Core. By using Dublin Core14 wherever possible, the most essential accession data can be read in the default PDF and Photoshop "Info" windows. This means that even if someone opens the PDF without the Arts Library XMP panel, they will be able to see basic information about the accession record. The other 32 elements were mapped to a custom "ucsdartslib" namespace and record information such as Photographed By, Edited By, Cataloged By, Rights Type, and Export status.
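One way to picture this split is as a lookup table that sends each panel field either to Dublin Core or to the custom namespace. The Python sketch below uses field labels from the paragraph above, but the specific `dc:` targets and the property spellings under `ucsdartslib` are guesses made for illustration, not the actual UCSD mapping, and the sample values are placeholders.

```python
# Assumed mapping from panel labels to (prefix, property) pairs.
# Labels come from the article; the targets are illustrative guesses.
FIELD_MAP = {
    "Order Date": ("dc", "date"),
    "Source Title": ("dc", "title"),
    "Source Author": ("dc", "creator"),
    "Photographed By": ("ucsdartslib", "PhotographedBy"),
    "Edited By": ("ucsdartslib", "EditedBy"),
    "Cataloged By": ("ucsdartslib", "CatalogedBy"),
    "Rights Type": ("ucsdartslib", "RightsType"),
}


def split_by_visibility(record):
    """Separate fields readable in default "Info" windows from custom-only fields."""
    public, custom = {}, {}
    for label, value in record.items():
        prefix, name = FIELD_MAP[label]
        target = public if prefix == "dc" else custom
        target[f"{prefix}:{name}"] = value
    return public, custom


record = {
    "Order Date": "04/15/2010",
    "Source Title": "Sample Source Title",
    "Photographed By": "Staff member A",
    "Rights Type": "Sample rights value",
}
public, custom = split_by_visibility(record)
print("Visible without the custom panel:", public)
print("Needs the custom panel:", custom)
```

Keeping the mapping in one table makes it easy to review which fields a reader without the custom panel will actually see.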

Conclusion
These three examples demonstrate how an XMP panel and custom schema can be tailored to meet a variety of needs. They also highlight the fact that embedded image metadata has both a public and a private face. You can embed just about any information you want to in a custom schema for internal use, whether it's production tracking or primary data collection. However, it is important to distinguish this from externally shared data which must conform to widely recognized standards if you expect it to be understood by other users. Custom XMP schemas are by their nature hidden unless a user has implemented the custom panel in their application or they are using an application that can read all embedded metadata. Many photo applications only display the most common schemas, generally those mapped to IPTC. This is why it is important to decide at the outset if you intend your embedded metadata to be used only by you and your staff or by anyone using common photo tools. You can have it both ways by mapping your public data to the appropriate XMP elements and adhering to the most widely used definitions for them. For instance, use Creator and Date, which are available in common photo tools, to refer to the photograph, not the work shown in the photo. As a matter of best practice we recommend following the guidelines established by the Metadata Working Group and the Microsoft Photo Metadata Policy. In addition, plan carefully before implementing a custom XMP panel for your next (or first) project using embedded metadata.

Johanna Bauman (ARTstor) johanna.bauman@artstor.org
Greg Reser (UCSD) greser@ucsd.edu
Steve Tatum (Virginia Tech) setatum@vt.edu
Sheryl Frisch (California Polytechnic State University) sfrisch@calpoly.edu

1.http://www.adobe.com/products/xmp/

2.http://www.metadataworkinggroup.com/

3.http://www.metadataworkinggroup.com/specs/

4.http://metadatadeluxe.pbworks.com/Virginia%20Landscape%20XMP%20Panels

5.http://archdesign.vt.edu/faculty/brian-katen#speedways

6.http://www.vtnews.vt.edu/story.php?relyear=2004&itemno=480

7.http://dublincore.org/documents/dcmi-terms/

8.http://www.w3.org/TR/skos-reference/skos.html

9.http://www.vraweb.org/projects/vracore4/index.html

10.http://metadatadeluxe.pbworks.com/Cal-Poly-Process

11.http://www.gallerysystems.com/embark

12.http://worldimages.sjsu.edu/

13.http://metadatadeluxe.pbworks.com/UCSD-Accession-Records

14.http://dublincore.org/documents/dces/
