Metadata

Experimental project

This is a sandbox project, which contains experimental code for developer use only.

(formerly 'mediadescriber')

This package accesses metadata and descriptions associated with images (or other media files).

The idea is that you can use desktop image library management tools (iPhoto, Picasa, Windows Photo Gallery, ACDSee, Adobe Bridge/Lightroom) and have the additional descriptive information you add travel with your files when they are uploaded to your Drupal site.
* That's the idea anyway. Each package has its own way of encoding metadata, and not all of these are currently supported.

This package can read and re-use embedded or supplementary metadata sources (EXIF, XMP, descript.ion) that describe uploaded media and absorb them into the Drupal system. This data is then used to provide captions or tagging information for the files.

History

Originally developed for Drupal4, the Drupal5 version of this library was in background development for a while and took some time to tidy up for public release, mostly due to the boring dependencies on external libraries used to do the actual extraction from binary files. See the README or consult the makefile for where and how to get these.

The current re-release of this code (2011-10) is mostly stable on Drupal6, where it will be refactored a little and then moved on to Drupal 7. The git repository master should be expected to be Drupal 6 unless otherwise stated.

Requirements

On its own, this package can use only descript.ion (text) files, or {filename.ext}.txt sidecar files, to provide limited metadata.
Reading EXIF data requires the PHP exif extension to be available.
Additional libraries (ARC2, PJMT) are very helpful, and recommended for real use, as they parse better information from modern image library packages and sources.
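
As a rough illustration of what a descript.ion file provides, here is a minimal Python sketch of a reader. It assumes the common convention for these files: one entry per line, the filename first, then whitespace, then the description, with filenames containing spaces wrapped in double quotes. The function names are hypothetical and this is not the module's own (PHP) parser, which may handle more cases.

```python
def parse_description_line(line):
    """Split one descript.ion line into (filename, description), or None."""
    line = line.strip()
    if not line:
        return None
    if line.startswith('"'):
        # Quoted filename: everything up to the closing quote is the name.
        end = line.find('"', 1)
        if end == -1:
            return None  # malformed line, skip it
        return line[1:end], line[end + 1:].strip()
    # Unquoted filename: the first whitespace separates name from description.
    parts = line.split(None, 1)
    description = parts[1].strip() if len(parts) > 1 else ""
    return parts[0], description


def parse_descript_ion(text):
    """Return a dict mapping filename to description for a whole file."""
    entries = {}
    for line in text.splitlines():
        parsed = parse_description_line(line)
        if parsed is not None:
            filename, description = parsed
            entries[filename] = description
    return entries
```

Calling parse_descript_ion() on the text of a descript.ion file would give a lookup table from each filename to the caption to apply to it.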

Methods

The central module - meta_inspector.module - invokes other metadata extraction modules (meta_descriptionfile.module, meta_pjmt.module (includes EXIF and XMP), meta_database.module) to actually do the format-specific parsing. Each of these implements hooks that return the data they find in their own way.
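
The dispatch pattern described here can be sketched roughly as follows. This is a Python illustration of the shape only: the real modules use Drupal's PHP hook system, and the inspect_file function and the extractor callables are hypothetical stand-ins, not the module's API.

```python
def inspect_file(path, extractors):
    """Ask every registered extractor about a file and collect what it finds.

    `extractors` maps a source name to a callable that takes a file path
    and returns a dict of metadata (empty if it found nothing).
    """
    found = {}
    for source_name, extractor in extractors.items():
        data = extractor(path)
        if data:
            # Keep results grouped by source, since each extractor
            # reports its data in its own way.
            found[source_name] = data
    return found
```

Grouping the results by source keeps the central inspector ignorant of any particular format: adding support for a new format means registering one more extractor.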

Enhancing image_import

In the first case, this package is designed to enhance image_import functionality, to enable rapid uploading of image galleries prepared offline, and to enable those galleries to be migrated between Drupal sites easily.

If an image is detected to have embedded keywords (or dc:subject tags), this module presents the option to turn those keywords into taxonomy terms during the image_import process.

Instructions/Walkthrough

Illustrated instructions on how to work with metadata on the desktop and upload it as part of your image galleries are in the module's advanced help.

About the metadata

Many images come with a lot more information than just title and keywords though. Things like date, camera type, location can all be recorded by modern cameras. Images from production studios can/should have artist and copyright notes embedded also.

Use the 'View Raw Metadata' button on the image_import screen or 'Scan metadata' on the image node edit screen to see the types of stuff that's available.

Large amounts of the metadata presented may be redundant or repeated. Part of that is due to the mix of formats used by encoders, and part is due to the meta inspector 'simplifying' and aliasing complex namespaces down to values we can recognise.
Technically, a value may be called
"Iptc4xmpCore:CreatorContactInfo.Iptc4xmpCore:CiAdrExtadr"
but I 'flatten' it down to "CiAdrExtadr" so we don't go insane. The original namespaced value is also left in place.
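
The flattening can be pictured with a short Python sketch. It assumes the rule is simply "keep the part after the last '.' and the last ':'", which matches the example above but may not be the exact rule the module applies; the function names are hypothetical.

```python
def flatten_key(namespaced_key):
    """Reduce a namespaced metadata key to its final local name."""
    last_segment = namespaced_key.split(".")[-1]  # drop parent structures
    return last_segment.split(":")[-1]            # drop the namespace prefix


def add_flattened_aliases(metadata):
    """Return a copy of metadata with short aliases added beside the originals."""
    result = dict(metadata)
    for key, value in metadata.items():
        # setdefault keeps any existing value, so originals are never overwritten.
        result.setdefault(flatten_key(key), value)
    return result
```

Because the originals are kept, the same value appears under both names, which is one source of the repetition seen in the raw metadata view.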

Mapping metadata to CCK

If you want to capture the available metadata - you must first prepare a place to put it.

Your destination image content-type must have some text fields defined for it using CCK.

Automatic field mapping by name

For example, to capture the 'Location' encoded in some images, add a field called 'field_location' in CCK. If that field is named correctly (prepended with 'field_', lowercased, all odd characters replaced with "_"), then it may get filled with the found data.

Modern systems use the Dublin Core for labelling the values, e.g. 'DC:Author'. This can be stored in your node in a field called 'field_dc_author'.
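
The naming rule can be sketched in a few lines of Python. This assumes "odd characters" means anything other than lowercase letters and digits; the function name is hypothetical and the module's own implementation may differ in detail.

```python
import re


def cck_field_name(label):
    """Derive the CCK field name that a metadata label would map to."""
    # Lowercase first, then replace every character that is not a-z or 0-9.
    cleaned = re.sub(r"[^a-z0-9]", "_", label.strip().lower())
    return "field_" + cleaned
```

With this rule, 'Location' becomes 'field_location' and 'DC:Author' becomes 'field_dc_author', matching the two examples above.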

Assisted field mapping using wizard

If you want your fields to be named something else, or want to change this behaviour, a wizard is available to help you explicitly match the technical metadata labels to your preferred CCK field names.
See /admin/settings/meta
It can even help you create the CCK field on-the-fly.

Roadmap/Unstable

Write-back of metadata is in development. Currently only descript.ion files are updated (triggered when you update an image node title or description), but replacement of embedded EXIF or XMP data would be nice to have (current libraries don't support this well, or require command-line tools).

Eventually, this project should also be able to assist re-applying metadata such as copyright notices to imagecache image derivatives. Using current PHP libraries, resized images used on a website lose any metadata that they previously had.

Additional media support is all possible in theory. PDF files already contain XMP capabilities, and DOCX can too (or something similar). The same applies to videos of a couple of formats (though it's not widely seen). MP3 or audio files can probably be supported, though a bridge utility from this project to the much stronger getID3.module would be nice. I have no idea about whether this can help you with Flash.

There is currently no direct interaction with the Drupal 'Media' project. Interest in that area is welcome.

Development by Dan Morrison (dman), a New Zealand Drupal expert
