I have the extraction of Exif-tags working, apart from one or two possibly related things:

1. Imported tags are glued together. An image with two tags for two different people ends up with one tag. There is a thread here that discusses this and recommends using the dev version plus a taxonomy patch, but apparently that doesn't work in my case. The attached image shows two names that ought to be separate tags, but apparently the software doesn't know how to handle the semicolon.

2. The display of the tags in Taxonomy edit mode is strange anyway, see the attached picture. Maybe it's more a Taxonomy problem than an Exif problem, let me know.

Comment  File          Size       Author
#7       a15.jpg       111.01 KB  theorichel
#7       66460001.JPG  2.56 MB    theorichel
         ExifTags.PNG  7.45 KB    theorichel

Comments

TheoRichel created an issue. See original summary.

theorichel’s picture

In another thread on this I read that some people assume the comma is used as a separator in keyword tags. That may very often be the case, but when I look at my pictures (which were tagged with MS Photo Gallery) the semicolon is used. I see it in Drupal, in Windows Explorer and in some online Exif readers. Some of these online tools also report that the keywords are stored in a field called 'XPKeywords', with or without a space between XP and Keywords. So far, toying with XP-based field names has had no effect, but maybe I should edit something in a PHP file somewhere. Or is there a location where I can change the data separator? Suggestions are very welcome.

jphautin’s picture

Hello,

As mentioned in your request, the separator is hardcoded in the module for now. I will integrate the taxonomy patch in a release version soon, with a new configuration option in the widget to choose the separator character. This should fix your issue.

And to answer your last question: yes, the name looks strange because it is encoded in UTF-8, which is not handled well in the form. I do not know if it is a Core or a Taxonomy issue. It is the same on all major Drupal versions (6 to 8). I could skip the UTF-8 encoding, but then the tag would lose some (or all) of its characters, which is not desired.

Regards,

theorichel’s picture

Thank you. Is there something a batch metadata converter such as ExifTool could do? I have no experience with it, so I am asking first.

theorichel’s picture

FYI: I had a collection of TIFF files, and since your module cannot extract tags from that format I converted them to JPG. According to Windows Explorer the tags are unchanged, but when I load a file in Drupal the string
ExifII*øÈÈ is inserted in front of the tag, and the tag itself is cut off after a few characters.

jphautin’s picture

To be sure I correct the issue, can you upload an image that does not work so I can check the behavior against the code?

theorichel’s picture

Status  File size
new     2.56 MB
new     111.01 KB

Great,

This photo has these 5 tags: Mike Joziasse; Sam Richel; Felix Richel; Niels de Waard; Armando van Oeffelen;
but Drupal considers them as 1.
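The splitting that is missing here can be sketched in a few lines. This is only an illustration in Python, not the module's PHP code; the helper name `split_keywords` and its default separator are assumptions made for the example.

```python
def split_keywords(raw, separator=";"):
    """Split a raw keyword string into individual, trimmed tags.

    Hypothetical helper for illustration only. The separator is a
    parameter because tagging tools differ (comma, semicolon, ...).
    """
    parts = raw.split(separator)
    # Trim whitespace and drop the empty piece left by a trailing separator.
    return [part.strip() for part in parts if part.strip()]


# The keyword string reported for the uploaded photo.
raw_value = (
    "Mike Joziasse; Sam Richel; Felix Richel; "
    "Niels de Waard; Armando van Oeffelen;"
)

tags = split_keywords(raw_value)
print(len(tags))
for tag in tags:
    print(tag)
```

With a comma as separator the same string would stay in one piece, which matches what is seen when the separator is hardcoded and does not fit the tagging tool.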

And the other picture (a15) got strange characters added before the tag, and part of the tag was cut away. (This happened after I converted the file from TIFF to JPG.)

In Windows Explorer both files display normally.

Thanks

  • jphautin committed 2645b71 on 7.x-1.x
    Issue #2656744 by jphautin: Strange characters in Exif Tags
    
jphautin’s picture

Status: Active » Fixed

Hello,

For the first image, that was an issue I have just fixed.
For the second one, I checked the value extracted directly from the library and it is not good.
The value read is not really usable because there is no separator between the names. The prefix is probably there to indicate the encoding (tools like Exiv2, which use Windows UCS-2, do this), but it is not part of the Exif specification and the Exif library does not understand it.

For now, your only free option is to use ExifTool or ACDSee to translate your TIFF metadata to JPEG. Adobe Lightroom might also do the trick.

Check the next release for the first point.

Regards,

theorichel’s picture

It took me a while to discover that you added a new field to the content type, but there it is and it works. Absolutely wonderful, thanks very much!

One minor point: when I set the new field to, for instance, a semicolon, the field is empty again when I revisit it after saving. It doesn't forget the setting, but it is just not visible.

I'll solve the rest of my problems with ExifTool and the like. Thanks again.

jphautin’s picture

Yes, you are right. I made the correction too quickly. I will create a new release with the settings set correctly.

  • jphautin committed cec0bea on 7.x-1.x
    Issue #2656744 by jphautin: Strange characters in Exif Tags
    
theorichel’s picture

One problem remains that, as far as I know, has not been discussed. The tags imported into Drupal display strangely. In Taxonomy they are listed normally, but when I click 'edit' on, for instance, the term 'Aaf', I see A�a�f. When I change this back to regular text, the taxonomy doesn't work anymore.
To solve this I joined the ExifTool forum, where I got a swift response. I copy some of the correspondence below. I sent them a picture that contained the 'Aaf' tag.
PhilH/Exiftool: I'm a bit at a loss. The XMP in the original image you sent contains a Subject of "Aaf". I don't see any funny characters when I extract tags with ExifTool, or when I look at the XMP directly.
To which I replied: XMP? This is supposed to be an Exif tag. In Drupal I get the message that the XMP library is not loaded and that therefore no XMP tags are extracted. They may be there nevertheless, of course, but the file never went through any Adobe product anyway. Also, this 'Aaf' is extracted through 'Keywords' and not through 'Subject'.
And his answer: There is also an EXIF XPKeywords tag which stores the value "Aaf". This is stored with a 2-byte Unicode encoding, so in binary it looks like zero bytes between ASCII characters. I wonder if this is the problem. If so, whatever you are using to read this tag isn't decoding it properly. With exiftool -v you should see this:
--------------------------------
| 10) XPKeywords = Aaf
| - Tag 0x9c9e (8 bytes, int8u[8] read as undef[8]):
| 11a6: 41 00 61 00 66 00 00 00 [A.a.f...]
--------------------------------
Can you shed any light here?
Many thanks
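The byte dump above can be checked by hand. Below is a minimal Python sketch, assuming the XPKeywords value is UTF-16 little-endian (the "2-byte Unicode encoding" PhilH describes) with a trailing null terminator. The function name `decode_xp_keywords` is hypothetical and this is not the module's code.

```python
def decode_xp_keywords(raw_bytes, separator=";"):
    """Decode an XPKeywords byte string and split it into tags.

    Hypothetical helper: assumes UTF-16 little-endian data ending
    in a null terminator, as in the ExifTool dump.
    """
    text = raw_bytes.decode("utf-16-le")
    # Remove the trailing null terminator(s).
    text = text.rstrip("\x00")
    return [part.strip() for part in text.split(separator) if part.strip()]


# The 8 bytes shown by "exiftool -v" for tag 0x9c9e.
raw = bytes.fromhex("41 00 61 00 66 00 00 00")

# Reading the bytes one per character keeps the zero bytes between
# the letters, which is what shows up as odd characters in a form.
naive = raw.decode("latin-1")
print(repr(naive))

# Decoding two bytes per character gives the intended keyword.
print(decode_xp_keywords(raw))
```

If the reader treats each byte as one character, the zero bytes survive between the letters, which would explain a display like A�a�f.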

theorichel’s picture

Anyway, with ExifTool I have been able to find out that the 'extending' of tags with random chunks of words results from the conversion from TIFF to JPG, and that it happens with the program 'CoffeeCup'.

jphautin’s picture

Hello,

Here is the point: most Exif data is stored in Unicode if the chosen language is not English. That makes it necessary to read the data as Unicode.
But when you use a form, the Unicode is not rendered well. I do not know how to change this behavior.
For taxonomy, I have to check the i18n_taxonomy module.

If it is really necessary, you could create a new issue to add a setting to read/store data as Unicode or plain text.

theorichel’s picture

Thank you but I have decided to just delete those distorted tags and add the correct ones manually. Some work but doable. I can live with this.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.