Hi Dan.

Do you have any plans to convert this module to work with D8?

If so, I could use it in my current project, and would be willing to work with you to achieve the conversion.

James

Comments

jlscott created an issue. See original summary.

bucefal91’s picture

FileSize
2.23 KB

Hello, guys!

I needed this functionality (get an image preview for the 1st page of a PDF), so I did a little research. I found this module and https://www.drupal.org/project/pdfpreview. They are pretty similar.

Looking at how both 7.x modules handle this task, I've put together my own version of the same feature, but for D8. I am attaching a tar.gz file with my module (I'm not sure attaching a patch makes sense here, since we are literally talking about a different Drupal core).

A few words about how I did it. Initially I tried to provide a new widget for the file field (just like the 7.x version of PDF to Image does). But I did not like that approach, because the feature is not really about the widget (how you add PDFs); it is about processing those PDFs every time an entity is saved. So I scrapped my widget and took another approach.

I use third party settings (https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21Config%21...) to store in FieldConfig whether the PDFs should be "translated" into images. FieldConfig in D8 is what used to be a field instance in D7. So you can enable/disable the PDF to Image functionality on a per-instance basis through the field admin UI.
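To illustrate the idea of per-instance third party settings, here is a minimal Python sketch. It is only a toy model, not Drupal code: the `FieldConfig` class, the field names and the setting keys (`enabled`, `image_style`) are hypothetical, and the method names merely mirror the getter/setter style of Drupal's third party settings API.

```python
from typing import Any, Dict


class FieldConfig:
    """Toy stand-in for a Drupal 8 FieldConfig (a D7 "field instance")."""

    def __init__(self, field_name: str) -> None:
        self.field_name = field_name
        # Settings are namespaced by the module that owns them.
        self._third_party: Dict[str, Dict[str, Any]] = {}

    def set_third_party_setting(self, module: str, key: str, value: Any) -> None:
        self._third_party.setdefault(module, {})[key] = value

    def get_third_party_setting(self, module: str, key: str, default: Any = None) -> Any:
        return self._third_party.get(module, {}).get(key, default)


# Hypothetical keys: conversion is switched on for one field instance only.
pdf_field = FieldConfig("field_document")
pdf_field.set_third_party_setting("pdf_to_imagefield", "enabled", True)
pdf_field.set_third_party_setting("pdf_to_imagefield", "image_style", "pdf_thumbnail_style")

# A second file field that was never configured keeps the default (disabled).
other_field = FieldConfig("field_attachment")
```

Because the settings live on the field instance, a second file field on the same entity type stays untouched unless it is configured as well.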

My solution also depends on the ImageMagick module (https://www.drupal.org/project/imagemagick). I did not want to code the PDF to image conversion myself, so I opted to "outsource" it to that module. This issue explains how ImageMagick (and thus image styles) can generate images out of PDF files: https://www.drupal.org/node/2815483.

Lastly, I wanted to reuse as many Drupal concepts as possible, so instead of "hard wiring" conversion parameters into the PDF to Image settings like the 7.x version does, I use the well-known Drupal image styles (you can define an image style that converts a PDF to JPG, PNG or anything else). So you configure the image style responsible for the conversion as you please, and then just reference it from the PDF to Image settings.

To summarize, the whole conversion happens approximately in the following steps:

  1. Whenever an entity is saved, check whether it contains any file fields that must be "translated" into images.
  2. For each PDF file, run the configured image style on it.
  3. Save the output (presumably the image file generated from the PDF by the selected image style) into a specified image field on the same entity.

It's a pretty simple scheme, since 90% of the actual work is outsourced to image styles & ImageMagick. The module is only about 7 KB and it does the job well.
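The three steps can be sketched roughly as follows. This is a simplified Python model, not the module's PHP code: `Entity`, `ConversionSettings`, `on_presave` and `fake_style` are hypothetical names, the image style is passed in as a plain function, and the sketch ignores the question of what to do when a preview image already exists.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ConversionSettings:
    """Hypothetical per-field settings (not the module's real schema)."""
    image_style: str
    target_field: str


@dataclass
class Entity:
    """Toy entity: field name -> list of file URIs, one per delta."""
    fields: Dict[str, List[str]] = field(default_factory=dict)


def on_presave(
    entity: Entity,
    settings: Dict[str, ConversionSettings],
    apply_style: Callable[[str, str], str],
) -> Entity:
    # Step 1: look only at file fields configured for conversion.
    for source_field, conf in settings.items():
        pdfs = [
            uri for uri in entity.fields.get(source_field, [])
            if uri.lower().endswith(".pdf")
        ]
        # Step 2: run the configured image style on each PDF.
        previews = [apply_style(conf.image_style, uri) for uri in pdfs]
        # Step 3: store the results in the image field on the same entity.
        entity.fields[conf.target_field] = previews
    return entity


def fake_style(style: str, uri: str) -> str:
    """Stand-in for an image style; returns a made-up derivative path."""
    return f"styles/{style}/{uri}.jpeg"


node = Entity({"field_document": ["report.pdf", "notes.txt"]})
on_presave(
    node,
    {"field_document": ConversionSettings("pdf_thumbnail_style", "field_preview")},
    fake_style,
)
```

Passing the style in as a function reflects the point made above: the code that reacts to the save does not need to know how the conversion is actually performed.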

bucefal91’s picture

Status: Active » Needs review

I'll set it to "needs review" since now we actually have code to be reviewed.

bucefal91’s picture

FileSize
2.67 KB

Slightly updated version. It will not forcefully overwrite the preview image if one already exists. This way it is always possible to upload a custom preview image if the autogenerated one is not desired.

dman’s picture

Thanks for sharing this!
It's great that you described your thoughts and reasoning for your work - this helps a lot to see what has and hasn't been done! It's really helpful.

Reviewing gzips and changes is a little difficult. Can I suggest you put it in a sandbox?
In order to make merging later possible, a process for this can be:

* Create a new empty sandbox of your own
* Clone this 7.x/master module repo into it
* Create a new 8.x (empty) branch
* Add and commit your code there

It would go something like:

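# Clone the existing project without checking out the 7.x files
git clone --no-checkout https://git.drupal.org/project/pdf_to_imagefield.git
cd pdf_to_imagefield
# Start the 8.x branch from the existing history
git checkout -b 8.x-1.x
# Remove the 7.x files so the branch starts empty
git rm -rf .
git commit -m "Reset everything for 8.x branch"
# Point origin at your sandbox repository
git remote set-url origin ${your_repo}
# Copy your files in..
git add .
git commit -m "Rewrite for 8.x"
# The new branch has no upstream yet, so name the remote and branch on the first push
git push -u origin 8.x-1.x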

What this means is that later we'll be able to seamlessly merge this branch into the existing repo - with your history included, despite it being a full rewrite.
Other ways are possible, but often include some copy & paste and history loss.
It also means your own sandbox will be available for independent review and testing.

* If all goes well, I feel we should be able to open an 8.x branch in the project here pretty early on, and can move forwards with that.
I'm suggesting this (slightly more formal) approach, because that means you will get full attribution and ownership of the changes in the logs later - which would not happen if I copy&pasted your zip directly into an 8.x branch immediately.

dman’s picture

Before I hit the code, I'll talk about the good points you make...

The 7.x-3.x version was using a unique widget, because that was the way available in D7 to provide additional config settings on a per-field basis.
For a user, setting preferences about what fields to operate on, where files get stored, what types of files are managed, and all that, is a per-field setting, and available in the field UI for that field. This stuff therefore gets configured as a 'widget' extension.
By adding 'third_party_settings' using a 'form_alter' - you are really just providing widget-like behaviour - only implicitly (based on introspection) instead of explicitly (based on site-builder choices). form_alters are a hidden Drupal-ism, and *I feel* that D8 plugins are more explicit about what they are doing.

We wanted to be able to integrate with the rest of the Drupal framework as decoupled as possible.
For this reason, the process of *creating* the image (the data) is separate from the decision of how it gets displayed in different view_modes (the view).
Thus, setting the dimensions for the 'original' is the domain for the 'create' process, while the rendering choice is a display/theme job. THAT SAID, I do see the economy of trying to re-use descriptions of a drupal image_style ... but it may not always match some of the use-cases we may need.

From visual inspection, I don't see where the actual conversion takes place - it seems that when you describe the outsourcing of the image generation *entirely* to imagemagick ( is #2815483: Convert pdf to jpg truly a complete solution? ) that this becomes entirely an image style problem.
Again, I see some economy in this, and if it works - through admin changes and cache clears, and migrations - that seems pretty powerful.

I do have some concerns about the wider usage, and interactions with other mechanisms for image handling.
But for now, something is better than nothing, so I think that this can make some progress, although I'd have to test it against a few of our expected scenarios.

bucefal91’s picture

Hi :)

I always try to accompany my code with some verbal explanations about what it does :)

I'll open a sandbox next week then.

I also see your reasoning about "preview image creation is one thing, and displaying the preview is another thing" and totally support it.

Yes, you actually can 100% outsource PDF preview generation to image styles - basically you just invoke an image style on a PDF file and you get its preview. This way PDF to imagefield "abstracts away" from any semi-ugly shell calls. I definitely do not know much about possible use cases of PDF to imagefield module. As you've been in charge of the module you must know much better what kind of tasks people expect from the module and whether those expectations can be covered while still 100% outsourcing all the PDF->image dirty work to image styles.

Even if the standard image style effects are not enough to cover 100% of the use cases, I'd still suggest introducing additional image effects through a PDF to imagefield submodule and using them for the actual PDF->image conversion, rather than interacting directly with the ImageMagick binaries. This abstraction via image styles seems valuable to me and gives an additional degree of freedom (somebody can always "plug in" their own conversion backend instead of ImageMagick, and PDF to imagefield will not even notice the change). But that's just my personal opinion.

dman’s picture

Status: Needs review » Active

I've got a few expectations about what current capabilities we don't want to lose, and only a few future wish-list items.

A number of the expectations are in the tests.

I work with a number of legacy or migrated (mostly govt or academic) sites. Many of them have thousands of old PDFs that sometimes get a theme makeover.
Just as often as a brand-new, built-from-scratch site, the target for this enhancement is an existing site with pre-defined architecture and fields, where it arrives as a feature request. This adds some cases that need to be considered.

Points below are just listed, I've not rated them in order of importance yet.

Existing features

  1. Should work when adding a new node with configured fields
  2. Should also work if the module is introduced on an existing site that's already got nodes with PDF files attached
  3. Should be compatible (and round-trip-safe) with an editor choosing and uploading their own preview image if they wish to replace the automatic one
  4. Or if this module is enabled on a site that's already been doing this manually.
  5. Should be available as a bulk operation to let existing content take advantage of the new preview feature.
  6. Should be able to be configured to work selectively on entities that have more than one file attachment field, or more than one image field.
  7. The produced images should be managed and available to all normal rendering expectations, image_style, display_suite, media library, image file rendering options like lightbox. They should be native managed Drupal image files. (Having ImageMagick convert a file of type PDF to a derivative file of type PNG seems like it would risk MIME type or rendering widget confusion. I've already been there with imagecache_actions)
  8. Should be compatible with bulk operations or API calls, such as Migrate, Feeds, VBO, Services, and other automated content import processes.
  9. Must take care to do performance and batch handling gracefully - anticipate and deal with problems in the conversion process. (Cases have existed where a server upgrade (or PATH or permissions change) made ImageMagick libraries stop working, or where a site was migrated from an OS where it worked to one where the requirements had not been installed).
    Supporting the additional install steps (Ghostscript libraries etc) has been damn tricky - so I do largely see wisdom in handing off to ImageMagick module processing - if ImageMagick is the place that takes ownership of all this. Just sorta-permitting it through abuse of custom_actions seems a bit hard to reliably support/document.

Expectations for D8

  1. Should work on fieldable entities (not just nodes)
  2. Must be config-settings compatible. Declares a schema for any field config settings it introduces, and exposes them for import and export.

In D7, I supplied an optional pre-configured 'feature' content type definition - Which has been very helpful for being able to demo, to test, and to develop upon.
I expect that D8 config-management will make this much easier to work upon.

In the maybe-difficult pile:

  1. Going forwards, we want file_entity/media_entity handling. This is the Drupal 8.3 replacement for file-attachments, and well suited to the sort of pdf-file-management challenges we have *but* it's significantly trickier to invent file entries on the fly. OTOH, there is a media API that IMPLICITLY supports a file (PDF) having a PREVIEW as well. (eg video file_entities get a jpeg attached) so if we slip this utility in there it'll become a must-have.
  2. Should be able to produce an array of images (screenshots of all pages) for alternative displays like slideshows or field pagers. This was a strong feature in D6 & D7, though I don't *personally* use it much, it was powerful to be able to provide.
  3. MAY have a way to deal with multiple file uploads : more than one PDF in the same field. This ended up too hard to figure out in D7, so it's unstable/unsupported - don't do that.

I'm not asking you to address all those issues ;-)

But just letting you know my priorities and expected roadmap. I'm not sure yet, but I want to watch out for any potential conflicts between this idea and the direction you've got so far.

I think that it's possible that where you've gone with catching hook_entity_presave is going to work out better than the D7 hack that messed around with form_element_validate .. but there may be similar drawbacks there also.

bucefal91’s picture

Hello!

I got the sandbox ready: https://www.drupal.org/sandbox/bucefal91/2831541. I did the exact setup you requested (cloning this git repo, then swapping the remote URL, then pushing the 8.x-1.x branch into the sandbox repo).

I've tested my code on the following setups:

  • Just a normal content type with file + image fields:
    • Uploading just PDF (image is autogenerated)
    • Uploading both PDF & image (the custom image is respected and not overwritten by the autogenerated one)
    • Updating the PDF and keeping the old preview (by design in such case the old preview is overwritten by the autogenerated, since supposedly the old preview was about the old PDF)
    • Updating both the PDF and its preview (the new custom preview is kept and not overwritten by the autogenerated one)
    • Removing the PDF and keeping preview (the preview is auto-removed since there's no more PDF)
    • Keeping the same PDF and just updating the preview image (the new custom preview image is kept and not overwritten by the autogenerated one).
  • Now the same set up (normal content type with file + image fields), but the whole thing is translatable. Executing the same individual test cases as listed above, but on both translations and making sure the 2 translations do not interfere with each other.
  • Paragraphs: (the website I need it for is using the file + image fields inside of a paragraph). So we have kind of a scheme: content type has paragraph field and the paragraph type has file + image fields. All the same test cases executed on non-translatable paragraphs.
  • Translatable paragraphs (the same set up as above, but now we enable translation on the parent content type + the fields of paragraph entity). Again all the same test cases.
  • As far as I can tell we have a 100% pass (I was testing manually). That's actually why my code has some ugly revision_id lookups: they were necessary to make it work within Paragraphs entities. Due to how the Paragraphs module manages its work, it wasn't trivial to find the "last" version of the field values (file & image) in question in order to conclude whether both or only the PDF was updated in the current "save" operation.
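The overwrite rules exercised by the test cases above can be summarised as a small decision function. This is a hedged Python sketch rather than the module's code: the function name, the `Action` values and the use of `None` for "no file" are illustrative, and the branch for "PDF unchanged, no preview yet" is an assumption that the listed test cases do not cover.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    GENERATE = "generate"  # (re)build the preview from the PDF
    KEEP = "keep"          # leave the image field as submitted
    REMOVE = "remove"      # drop the preview


def decide_preview_action(
    pdf_before: Optional[str],
    pdf_after: Optional[str],
    image_before: Optional[str],
    image_after: Optional[str],
) -> Action:
    # No PDF after the save: the preview has nothing to describe.
    if pdf_after is None:
        return Action.REMOVE
    # The editor supplied a new custom image in this save: respect it.
    if image_after is not None and image_after != image_before:
        return Action.KEEP
    # The PDF changed but the image did not: the old preview is stale.
    if pdf_after != pdf_before:
        return Action.GENERATE
    # Assumption (not in the listed cases): nothing changed, so only
    # generate when there is no preview at all yet.
    return Action.KEEP if image_after is not None else Action.GENERATE
```

For example, `decide_preview_action("a.pdf", "b.pdf", "a.pdf.jpeg", "a.pdf.jpeg")` falls into the "PDF changed, image untouched" branch and asks for regeneration.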

    I've looked through your current 7.x-1.x tests: in theory the current 8x code should pass them.

    From your list of "Existing features", I got the following uncovered:

    Should be available as a bulk operation to let existing content take advantage of the new preview feature.

    Currently no batch autogeneration is available.

    Should be compatible with bulk operations or API calls, such as Migrate, Feeds, VBO, Services, and other automated content import processes.

    As far as I understand it, yes, it's compatible with all of that out of the box, since the code "hooks in" on the ::save() method, so whoever invokes that method (and literally everything must invoke it when it wants to update/create an entity) triggers the conversion.

    Must take care to do performance and batch handling gracefully - anticipate and deal with problems in the conversion process. (Cases have existed where a server upgrade (or PATH or permissions change) made ImageMagick libraries stop working, or where a site was migrated from an OS where it worked to one where the requirements had not been installed).

    Haven't considered performance and throughput yet. Also, a random idea: we could ship PDF to imagefield 8.x module with a pre-created image style (via *.yml file) that generates JPG out of PDF so the users will have less pain setting up and configuring the whole thing. Right now you are expected to create a new image style (for PDF -> image) conversion manually.

    About "In the maybe-difficult pile:"

    Should be able to produce an array of images (screenshots of all pages) for alternative displays like slideshows or field pagers. This was a strong feature in D6 & D7, though I don't *personally* use it much, it was powerful to be able to provide.

    This could be a "show stopper" and keep us from using image styles, because by their very nature image styles map one input file to one output file: you input a PDF and get a JPG. It might be difficult to make an image style work in the scheme where you input a PDF and output multiple JPG files. I'll have to study image styles to see whether it's possible. Maybe precisely here we'll need a custom image effect to cover this discrepancy.

    MAY have a way to deal with multiple file uploads : more than one PDF in the same field. This ended up too hard to figure out in D7, so it's unstable/unsupported - don't do that.

    Right now the 8.x code does the following: it maps PDFs to images one to one within the 2 fields. So if the file field holds the following values:

    • delta 0: Hey-my-cool.pdf
    • delta 1: Just-another-file.pdf

    Then in the corresponding image field you'll wind up with:

    • delta 0: Hey-my-cool.pdf.jpeg (1st page preview of Hey-my-cool.pdf)
    • delta 1: Just-another-file.pdf.jpeg (1st page preview of Just-another-file.pdf)
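The one-to-one delta mapping can be written down in a couple of lines. This Python sketch only reproduces the naming pattern from the example above; `preview_filenames` is a hypothetical helper, and the `.jpeg` suffix is taken from the example rather than from the module's settings.

```python
from typing import Dict, List


def preview_filenames(pdf_filenames: List[str], extension: str = "jpeg") -> Dict[int, str]:
    """Map each delta of the file field to the preview name at the same delta."""
    return {
        delta: f"{name}.{extension}"
        for delta, name in enumerate(pdf_filenames)
    }


previews = preview_filenames(["Hey-my-cool.pdf", "Just-another-file.pdf"])
```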

    dman’s picture

    Thanks for your communication style. It really helps me be sure we are on the same page.

    You've picked up on my thoughts well.
    I've been giving some thought to the wholesale delegation to imagemagick styles, and I feel it's going to be a good way forwards.

    All the test cases you describe seem to be correct and intuitive enough for most editorial needs. Getting the rules right during replacements and re-edits was a bit of a mission the first time around, as you can probably imagine.

    Providing a preset image conversion style is certainly a great step forwards. I was trying to imagine documenting the setup steps, and don't want too many things to go wrong there.
    I agree with your thoughts that (given my experience with imagecache_actions etc) we may even be able to provide a specific image style plugin to wrap and configure the commandline stuff and give it back to ImageMagick - to smooth over the 'custom action' step... while still allowing the manual/custom/yaml one as an option, but making setup less fragile for plug & play.

    I know that VBO proper is not properly available in D8 (yet, AFAIK) but we do have the ability to provide external 'actions' - such as those seen in the drop-down in /admin/content. An action there for 'update pdf derivatives' would be sufficient, will be forwards-compatible with VBO and Rules (when either of them arrive), and is decoupled and scriptable. I've got a few examples of that for D8 already, so as long as the 'action' of running the update is available to be called, we can do that. In fact, for True D8 OO style, the process itself could/should be moved into such an action plugin, and then *invoked* from the framework you already have working. But that's all internal refactoring, maybe.

    It's great to hear that your use-case is paragraphs - that's a great fit, and means we are certainly working with entities (not just nodes) natively! (There is no entity-level support for VBO-style actions yet AFAIK, so that will have to wait.)

    I'm OK to drop the 'multiple' pages step for now. I acknowledge it's not a good fit for this approach, and can be put on the wish-list. Maybe not a show-stopper, as I can imagine a way (if using a self-configured imagecache action that takes parameters) it could be re-introduced later - but only if there is a need.
    Since we first needed this (for fancy, javascripty or Flash pagers) - other plugins have become stronger, and I'd be pointing people towards PDF-native readers in many cases this year.

    My performance concerns mostly stem from the multi-page challenge. But I had also encountered PDFs which (due to either size or formatting or encoding) could make the node-save process stall even if just trying to get the front page. It's something to watch out for. In response to that, I refactored where the action actually ran. Being able to run the update as a batchable, asynchronous action in the background after form-save gave me a lot more robustness there.
    But for single-page stuff, I'm OK to keep it in the single thread process... for now.
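The idea of moving conversion out of the save request and into a batchable background step might look like the sketch below. It is a generic Python illustration, not Drupal's Batch or Queue API: `PreviewQueue`, `fake_convert` and the failure handling are all assumptions made for the example.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class PreviewQueue:
    """Collects PDFs at save time and converts them later in small batches."""

    def __init__(self, convert: Callable[[str], str]) -> None:
        self._convert = convert
        self._pending: Deque[str] = deque()
        self.previews: Dict[str, str] = {}
        self.failures: List[Tuple[str, str]] = []

    def enqueue(self, pdf_uri: str) -> None:
        # Called from the save path: cheap, no conversion happens here.
        self._pending.append(pdf_uri)

    def run(self, limit: int) -> int:
        """Process at most `limit` items; return how many were handled."""
        processed = 0
        while self._pending and processed < limit:
            pdf_uri = self._pending.popleft()
            try:
                self.previews[pdf_uri] = self._convert(pdf_uri)
            except Exception as error:
                # A bad PDF is recorded and skipped instead of stalling the run.
                self.failures.append((pdf_uri, str(error)))
            processed += 1
        return processed

    def __len__(self) -> int:
        return len(self._pending)


def fake_convert(pdf_uri: str) -> str:
    """Stand-in converter that fails for files named 'broken'."""
    if "broken" in pdf_uri:
        raise RuntimeError("conversion failed")
    return pdf_uri + ".jpeg"


queue = PreviewQueue(fake_convert)
for uri in ["a.pdf", "broken.pdf", "c.pdf"]:
    queue.enqueue(uri)
handled = queue.run(limit=2)
```

With a limit per run, one slow or malformed PDF costs a single slot in a background pass rather than blocking the editor's form submission.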

    I'll have to put aside some time to give this a proper roll on a test environment, and maybe look at some D8 tests ... (and ... there goes the weekend)

    bucefal91’s picture

    Minor update from me: now there is also an option to force a certain image toolkit for PDF to image conversion over the default website image toolkit (the one that is configured at /admin/config/media/image-toolkit).

    This is a useful option if you do not want to run all image conversions on ImageMagick but, since ImageMagick is the only toolkit that works with PDFs, you were kind of forced to switch to it. Now within the file field settings you can force the use of ImageMagick no matter what the default website toolkit is.

    amarincolas’s picture

    Hello!

    I just downloaded the code and it does not seem to work anymore. I think something weird is going on between the imagemagick module (8.x-1.0-alpha6) and the PDFs, because the image is not generated.

    Jeff Veit’s picture

    @amarincolas, it works for me on D8.3.

    1. Ensure you have ImageMagick toolkit installed
    2. Set up an image style which uses the Convert action. It's possible you will have to switch to using the ImageMagick toolkit to see this action. Not sure. But this step is crucial, and where I went wrong.
    3. Set up a field to receive the image, on your content type.
    4. Set up or alter a file field - the one that holds the PDF. There will be extra options of how to handle the write to an image field on the content type.
    5. Edit and save a bit of content of that content type, with a PDF file. The image should be auto generated.

    It doesn't yet autogenerate if you don't save - i.e. no bulk import.

    bucefal91, nice module. Thank you.

    sonoutlaw’s picture

    I cannot get this to work. I have spent many hours adding/removing fields, content types, image styles...

    Drupal 8.3
    ImageMagick 6.7.2-7 2017-03-22 Q16

    The image file field is not being populated on node save. ImageMagick cannot do its magic:

    ImageMagick error 1: identify: unable to open image `/home/southna/public_html/sites/default/files/styles/pdf_thumbnail_style/public/pdf-test.pdf.jpeg': No such file or directory @ error/blob.c/OpenBlob/2589. [command: /usr/bin/identify -format 'format:%[magick]|width:%[width]|height:%[height]|exif_orientation:%[EXIF:Orientation]\n' '/home/southna/public_html/sites/default/files/styles/pdf_thumbnail_style/public/pdf-test.pdf.jpeg']

    2ndmile’s picture

    Not working for me. Here is the error I am getting...

    Recoverable fatal error: Argument 1 passed to pdf_to_imagefield_imagemagick_arguments_alter() must be an instance of Drupal\imagemagick\Plugin\ImageToolkit\ImagemagickToolkit, instance of Drupal\imagemagick\ImagemagickExecArguments given, called in /var/www/forceqvidev/drupal/web/core/lib/Drupal/Core/Extension/ModuleHandler.php on line 501 and defined in pdf_to_imagefield_imagemagick_arguments_alter() (line 127 of modules/experimental/pdf_to_imagefield/pdf_to_imagefield.module).

    I have confirmed that Imagemagick and Ghostscript are installed correctly on the server.

    I will report back what I find.

    mcaden’s picture

    Eureka!

    Installing from cloning master on the sandbox git with the latest dependency modules and following the advice from #13, I received the same error as in #15.

    This was resolved by DOWNGRADING the imagemagick module to 1.x instead of 2.x.

    I still had the problem where everything SEEMED fine but there was no resulting image. I was able to solve this by:

    1. Navigate to: admin/config/media/image-toolkit
    2. Expand: "IMAGE FORMATS" -> "ENABLE/DISABLE IMAGE FORMATS"
    3. PDF -> enabled: true

    Not sure why I assumed PDF would be enabled by default but for me it was set to false. Once I did this it worked.

    EDIT: HUGE problem. What happens when file style changes/gets flushed? Answer: The preview is GONE. No way I want to go back through and "re-save" all the PDFs I've uploaded.

    However, at #1014816: Allow image fields to use any extensions the current image toolkit supports (instead of hard-coding jpg, png and gif only) I was actually able to have a PDF upload as an image field, and then simply set an image style for it to display as with the "convert" mentioned above. That seems to solve my problems. I simply used the imagemagick setup from here (making sure PDF was enabled like I mention above) and then applied the patch from that issue.

    sarathkm’s picture

    The issue in #16 is troubling: the preview is lost when image styles are flushed. That means we cannot rely on the image style convert feature to do this job.

    Also, it converts only the first PDF page to an image, unlike the 7.x module.

    mcaden’s picture

    However, at #1014816: Allow image fields to use any extensions the current image toolkit supports (instead of hard-coding jpg, png and gif only) I was actually able to have a PDF upload as an image field, and then simply set an image style for it to display as with the "convert" mentioned above. That seems to solve my problems. I simply used the imagemagick setup from here (making sure PDF was enabled like I mention above) and then applied the patch from that issue.

    For anybody else looking at this issue, that patch got pulled to the 8.5 branch and it was all I needed in order to get a PDF to render as an image preview. I didn't need this module at all.

    markdc’s picture

    Hello 2 years later. =]

    Tried #16 without success.
    And contrary to #18, I can't upload a PDF into an image field using Drupal 8.8.x with ImageMagick installed and PDF enabled for an image field.

    What is the best way to achieve this in 2020?

    Thanks.

    Anybody’s picture

    3 years later :) This great module is still used by 1,576 projects. Is there any plan for Drupal 8 / 9 / 10?

    If not, could the maintainer perhaps put a link to this issue and information about it on the module page? Still I think it would be great, if the Drupal community (includes us all, me also) could create a good alternative!
    If one exists, it should be listed on the module page, please. :)

    Alternatives (or at least starting points) for Drupal 8+ are:

    HitchShock’s picture

    Status: Active » Closed (outdated)

    The ticket is already outdated.

    Anybody’s picture

    Thanks @HitchShock nice to see it's available now! Great work :)

    markdc’s picture

    Awesome. Can’t wait to try this out. Thanks!

    UPDATE:
    It's working great! I've already switched my production site over to this module. Multi-page PDF images have been a long-awaited feature. Thank you!