Hi Dan.
Do you have any plans to convert this module to work with D8?
If so, I could use it in my current project, and would be willing to work with you to achieve the conversion.
James
Comment | File | Size | Author |
---|---|---|---|
#4 | pdf_to_imagefield.tar_.gz | 2.67 KB | bucefal91 |
Comments
Comment #2
bucefal91 commented:
Hello, guys!
I needed this functionality (get an image preview for the 1st page of a PDF), so I did a little research. I found this module and https://www.drupal.org/project/pdfpreview. They are pretty similar.
Looking at how both 7.x modules have handled this task, I've put together my own version of the same feature, but for D8. I am attaching a tar.gz file with my module (not sure if attaching a patch in such cases makes any sense, since we are literally talking about another Drupal core).
A few words about how I did it. Initially I tried to provide a new widget for the file field (just like the 7.x version of PDF to Image does). But then I did not like it, because in fact this is not about the widget (about how you add PDFs); rather, it's about actually processing those PDFs every time an entity is saved. So I wiped my widget clean and took another approach.
I use third party settings (https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21Config%21...) to store in FieldConfig whether the PDFs should be "translated" into images. FieldConfig in D8 is what used to be field instance in D7. So you can enable/disable the PDF to Image functionality on per-instance basis through field admin UI.
My solution also depends on ImageMagick module (https://www.drupal.org/project/imagemagick) - I just did not want to code PDF to image conversion myself, so I opted to "outsource" it into that module. This issue will be of use to know how imagemagick (and thus image styles) can generate images out of PDF files https://www.drupal.org/node/2815483.
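As a rough illustration of what the conversion boils down to outside Drupal: rendering the first page of a PDF with ImageMagick by hand is usually a single `convert` call with a `[0]` page selector. The Python sketch below only builds such a command line. The function names and the density/quality defaults are assumptions made for this example; it is not what the ImageMagick module or this module actually runs.

```python
import subprocess


def first_page_command(pdf_path, image_path, density=150, quality=90):
    """Build an ImageMagick command that renders page 1 of a PDF.

    Hypothetical helper for illustration; the defaults are assumptions.
    """
    return [
        "convert",                 # "magick" on ImageMagick 7 installs
        "-density", str(density),  # rasterization DPI, given before the input
        f"{pdf_path}[0]",          # "[0]" selects the first page only
        "-quality", str(quality),  # compression quality of the output image
        image_path,
    ]


def render_first_page(pdf_path, image_path):
    """Run the command; needs ImageMagick and Ghostscript on the server."""
    subprocess.run(first_page_command(pdf_path, image_path), check=True)


if __name__ == "__main__":
    # Only print the command here so the example runs without ImageMagick.
    print(" ".join(first_page_command("report.pdf", "report.jpg")))
```

The command is kept as a list (not a shell string) so file names with spaces or quotes do not need escaping.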
Lastly, I also wanted to reuse as many Drupal concepts as possible, so instead of "hard wiring" conversion parameters into PDF to Image settings like the 7.x version does, I use the well-known Drupal image styles (you can define an image style that will convert a PDF to JPG, PNG or anything else). So you can configure the image style that will be responsible for conversion as you please, and then you just reference it from the PDF to Image settings.
To summarize, the whole conversion happens approximately in the following steps:
Pretty simple scheme, since 90% of the actual work is outsourced to image styles & ImageMagick. The module is only about 7 KB and it does the job well.
Comment #3
bucefal91 commented:
I'll set it to "needs review" since now we actually have code to be reviewed.
Comment #4
bucefal91 commented:
Slightly updated version. The updated version will not forcefully overwrite the preview image if one already exists. This way there is always a possibility to upload a custom preview image if the autogenerated one is not desired.
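The behaviour described in #2 and #4 (a per-field on/off flag, one preview per PDF, no overwriting of an existing preview) can be sketched in a few lines of Python. This is a simplified model based only on the prose in this thread, not the module's code: the function name, its arguments and the `make_preview` callable are hypothetical, and it ignores harder cases such as a PDF being replaced on an existing entity.

```python
def sync_previews(pdfs, previews, enabled, make_preview):
    """Return the preview list that should be stored alongside `pdfs`.

    pdfs         -- PDF URIs in the file field, in delta order
    previews     -- current image field values (None where a delta is empty)
    enabled      -- the per-field flag (kept in third party settings)
    make_preview -- callable turning a PDF URI into a preview image URI
    """
    if not enabled:
        # Feature switched off for this field: leave the images untouched.
        return list(previews)
    result = []
    for delta, pdf in enumerate(pdfs):
        existing = previews[delta] if delta < len(previews) else None
        # Keep an existing (possibly hand-uploaded) preview; only fill gaps.
        result.append(existing if existing else make_preview(pdf))
    return result


if __name__ == "__main__":
    fake_style = lambda uri: uri + ".jpeg"  # stand-in for the image style
    print(sync_previews(["public://a.pdf", "public://b.pdf"],
                        ["public://custom.png"], True, fake_style))
```

In this model the first PDF keeps its custom preview and only the second one gets a generated image.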
Comment #5
dman (as a volunteer and at Sparks Interactive) commented:
Thanks for sharing this!
It's great that you described your thoughts and reasoning for your work - this helps a lot to see what has and hasn't been done! It's really helpful.
Reviewing gzips and changes is a little difficult. Can I suggest you put it in a sandbox?
In order to make merging later possible, a process for this can be:
* Create a new empty sandbox of your own
* Clone this 7.x/master module repo into it
* Create a new 8.x (empty) branch
* Add and commit your code there
It would go something like:
What this means is that later we'll be able to seamlessly merge this branch into the existing repo - with your history included, despite it being a full rewrite.
Other ways are possible, but often include some copy & paste and history loss.
It also means your own sandbox will be available for independent review and testing.
* If all goes well, I feel we should be able to open an 8.x branch in the project here pretty early on, and can move forwards with that.
I'm suggesting this (slightly more formal) approach, because that means you will get full attribution and ownership of the changes in the logs later - which would not happen if I copy&pasted your zip directly into an 8.x branch immediately.
Comment #6
dman (as a volunteer and at Sparks Interactive) commented:
Before I hit the code, I'll talk about the good points you make...
The 7.x-3.x version was using a unique widget, because that was the way available in D7 to provide additional config settings on a per-field basis.
For a user, setting preferences about what fields to operate on, where files get stored, what types of files are managed, and all that, is a per-field setting, and available in the field UI for that field. This stuff therefore gets configured as a 'widget' extension.
By adding 'third_party_settings' using a 'form_alter' - you are really just providing widget-like behaviour - only implicitly (based on introspection) instead of explicitly (based on site-builder choices). form_alters are a hidden drupal-ism, and *i feel* that D8 plugins are more explicit about what they are doing.
We wanted to be able to integrate with the rest of the Drupal framework as decoupled as possible.
For this reason, the process of *creating* the image (the data) is separate from the decision of how it gets displayed in different view_modes (the view).
Thus, setting the dimensions for the 'original' is the domain for the 'create' process, while the rendering choice is a display/theme job. THAT SAID, I do see the economy of trying to re-use descriptions of a drupal image_style ... but it may not always match some of the use-cases we may need.
From visual inspection, I don't see where the actual conversion takes place - it seems that when you describe the outsourcing of the image generation *entirely* to imagemagick ( is #2815483: Convert pdf to jpg truly a complete solution? ) that this becomes entirely an image style problem.
Again, I see some economy in this, and if it works - through admin changes and cache clears, and migrations - that seems pretty powerful.
I do have some concerns about the wider usage, and interactions with other mechanisms for image handling.
But for now, something is better than nothing, so I think that this can make some progress, although I'd have to test it against a few of our expected scenarios.
Comment #7
bucefal91 commented:
Hi :)
I always try to accompany my code with some verbal explanations about what it does :)
I'll open a sandbox next week then.
I also see your reasoning about "preview image creation is one thing, and displaying the preview is another thing" and totally support it.
Yes, you actually can 100% outsource PDF preview generation to image styles - basically you just invoke an image style on a PDF file and you get its preview. This way PDF to imagefield "abstracts away" from any semi-ugly shell calls. I definitely do not know much about possible use cases of PDF to imagefield module. As you've been in charge of the module you must know much better what kind of tasks people expect from the module and whether those expectations can be covered while still 100% outsourcing all the PDF->image dirty work to image styles.
Even if the standard image style effects are not enough to cover 100% of the use cases, I'd still suggest introducing additional image effects through a PDF to imagefield submodule and using them for the actual PDF->image conversion, rather than interacting directly with the imagemagick binaries. This abstraction via image styles seems valuable to me, and it gives an additional degree of freedom (somebody can always "plug in" their own conversion backend instead of imagemagick, while PDF to imagefield will not even notice the change). But that's just my personal opinion.
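To make "invoke an image style on a PDF file" a bit more concrete, here is a small Python model of how the derivative (preview) location is named. It reflects my understanding of Drupal core's image style naming and is consistent with the path that shows up in the ImageMagick error quoted later in this thread (`styles/pdf_thumbnail_style/public/pdf-test.pdf.jpeg`). The function name is made up for the example, and the real logic has more cases (for instance a derivative scheme that differs from the source scheme).

```python
import posixpath


def derivative_uri(style_name, source_uri, target_ext, default_scheme="public"):
    """Model of where an image style stores its derivative of a file.

    Hypothetical helper: a simplified sketch, not Drupal's implementation.
    """
    scheme, sep, target = source_uri.partition("://")
    if not sep:
        # No stream wrapper scheme given: treat the value as a plain path.
        scheme, target = default_scheme, source_uri
    current_ext = posixpath.splitext(target)[1].lstrip(".").lower()
    # A converting style appends the new extension instead of replacing it.
    if target_ext and target_ext.lower() != current_ext:
        target = f"{target}.{target_ext}"
    return f"{scheme}://styles/{style_name}/{scheme}/{target}"


if __name__ == "__main__":
    print(derivative_uri("pdf_thumbnail_style", "public://pdf-test.pdf", "jpeg"))
```

Because the derivative lives under `styles/`, it is a cache entry rather than a permanent file, which matters whenever image styles are flushed.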
Comment #8
dman (as a volunteer and at Sparks Interactive) commented:
I've got a few expectations about what current capabilities we don't want to lose, and only a few future wish-list items.
A number of the expectations are in the tests.
I work with a number of legacy or migrated (mostly govt or academic) sites. Many of them have thousands of old PDFs that sometimes get a theme makeover.
This enhancement is just as often a feature request on an existing site, with pre-defined architecture and fields, as it is part of a brand new built-from-scratch one. This adds some cases that need to be considered.
Points below are just listed, I've not rated them in order of importance yet.
Existing features
Supporting the additional install steps (Ghostscript libraries etc) has been damn tricky - so I do largely see wisdom in handing off to ImageMagick module processing - if ImageMagick is the place that takes ownership of all this. Just sorta-permitting it through abuse of custom_actions seems a bit hard to reliably support/document.
Expectations for D8
In D7, I supplied an optional pre-configured 'feature' content type definition - Which has been very helpful for being able to demo, to test, and to develop upon.
I expect that D8 config-management will make this much easier to work upon.
In the maybe-difficult pile:
I'm not asking you to address all those issues ;-)
But just letting you know my priorities and expected roadmap. I'm not sure yet, but I want to watch out for any potential conflicts between this idea and the direction you've got so far.
I think that it's possible that where you've gone with catching hook_entity_presave is going to work out better than the D7 hack that messed around with form_element_validate .. but there may be similar drawbacks there also.
Comment #9
bucefal91 (at Websolutions Agency) commented:
Hello!
I got the sandbox ready: https://www.drupal.org/sandbox/bucefal91/2831541. I did the exact setup you requested (cloning this git repo, then swapping the remote URL, then pushing the 8.x-1.x branch into the sandbox repo).
I've tested my code on the following setups:
As far as I can tell we have a 100% pass (I was doing it manually). That's actually why my code has some ugly revision_id lookups - they were necessary to make it work within paragraphs entities. Due to how the Paragraphs module manages its work, it wasn't trivial to find the "last" version of the values of the fields in question (file & image) in order to conclude whether both or only the PDF was updated in the current "save" operation.
I've looked through your current 7.x-1.x tests: in theory the current 8x code should pass them.
From your list of "Existing features", I got the following uncovered:
Currently no batch autogeneration is available.
As far as I understand it, yes, it's compatible with all of it out-of-the-box, since the code "hooks in" upon the ::save() method, so whoever invokes that method (and literally everything must invoke that method when it wants to update/create an entity) triggers the conversion.
Haven't considered performance and throughput yet. Also, a random idea: we could ship the PDF to imagefield 8.x module with a pre-created image style (via a *.yml file) that generates a JPG out of a PDF, so the users will have less pain setting up and configuring the whole thing. Right now you are expected to create a new image style (for PDF -> image conversion) manually.
About "In the maybe-difficult pile:"
This could be a "show stopper" and stop us from using image styles, because by its very nature, image styles manipulate presumably the same file: you input PDF and output JPG. It might be difficult to make image style work in the scheme: you input PDF and output multiple JPG files. I'll have to study image styles to see whether it's possible. Maybe precisely here we'll need a custom image effect to cover such discrepancy.
Right now the 8.x code does the following: it maps PDFs to images one to one within the 2 fields. So if we have in the file field the following values:
Then in the corresponding image field you'll wind up with:
Comment #10
dman (as a volunteer and at Sparks Interactive) commented:
Thanks for your communication style. It really helps me be sure we are on the same page.
You've picked up on my thoughts well.
I've been giving some thought to the wholesale delegation to imagemagick styles, and I feel it's going to be a good way forwards.
All the test cases you describe seem to be correct and intuitive enough for most editorial needs. Getting the rules right during replacements and re-edits was a bit of a mission the first time around, as you can probably imagine.
Providing a preset image conversion style is certainly a great step forwards. I was trying to imagine documenting the setup steps, and don't want too many things to go wrong there.
I agree with your thoughts: given my experience with imagecache_actions etc., we may even be able to provide a specific image style plugin to wrap and configure the commandline stuff and give it back to ImageMagick - to smooth over the 'custom action' step... while still allowing the manual/custom/yaml one as an option, but making setup less fragile for plug & play.
I know that VBO proper is not properly available in D8 (yet, AFAIK) but we do have the ability to provide external 'actions' - such as those seen in the drop-down in /admin/content. An action there for 'update pdf derivatives' would be sufficient, will be forwards-compatible with VBO and Rules (when either of them arrive), and is decoupled and scriptable. I've got a few examples of that for D8 already, so as long as the 'action' of running the update is available to be called, we can do that. In fact, for True D8 OO style, the process itself could/should be moved into such an action plugin, and then *invoked* from the framework you already have working. But that's all internal refactoring, maybe.
It's great to hear that your use-case is paragraphs - that's a great fit, and means we are certainly working with entities (not just nodes) natively! (There is no entity-level support for VBO-style actions yet AFAIK, so that will have to wait.)
I'm OK to drop the 'multiple' pages step for now. I acknowledge it's not a good fit for this approach, and can be put on the wish-list. Maybe not a show-stopper, as I can imagine a way (if using a self-configured imagecache action that takes parameters) it could be re-introduced later - but only if there is a need.
Since we first needed this (for fancy, javascripty or Flash pagers) - other plugins have become stronger, and I'd be pointing people towards PDF-native readers in many cases this year.
My performance concerns mostly stem from the multi-page challenge. But I had also encountered PDFs which (due to either size or formatting or encoding) could make the node-save process stall even if just trying to get the front page. It's something to watch out for. In response to that, I refactored where the action actually ran. Being able to run the update as a batchable, asynchronous action in the background after form-save gave me a lot more robustness there.
But for single-page stuff, I'm OK to keep it in the single thread process... for now.
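The "record the work at save time, convert later in the background" idea can be sketched generically in Python. This is only a toy model of the pattern under my own assumptions: the class, its method names and the in-memory queue are invented for the example and do not correspond to Drupal's Batch or Queue APIs or to the D7 module's code.

```python
from collections import deque


class PreviewQueue:
    """Tiny in-memory stand-in for a batch/queue backend (hypothetical)."""

    def __init__(self, convert):
        self._convert = convert  # callable: PDF URI -> preview URI
        self._pending = deque()
        self.done = {}           # PDF URI -> preview URI
        self.failed = {}         # PDF URI -> error message

    def on_save(self, pdf_uri):
        """Called during entity save: only record the work, do not convert."""
        self._pending.append(pdf_uri)

    def run(self, limit=10):
        """Called later (cron, batch step): convert at most `limit` PDFs."""
        processed = 0
        while self._pending and processed < limit:
            pdf_uri = self._pending.popleft()
            try:
                self.done[pdf_uri] = self._convert(pdf_uri)
            except Exception as error:  # one bad PDF must not block the rest
                self.failed[pdf_uri] = str(error)
            processed += 1
        return processed
```

The point of the split is that a slow or broken PDF can no longer stall the save request; it only shows up in `failed` when the background run reaches it.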
I'll have to put aside some time to give this a proper roll on a test environment, and maybe look at some D8 tests ... (and ... there goes the weekend)
Comment #11
bucefal91 (at Websolutions Agency) commented:
Minor update from me: now there is also an option to force a certain image toolkit for PDF to image conversion over the default website image toolkit (the one that is configured at /admin/config/media/image-toolkit).
This is a useful option if you do not want to run all the image conversions on ImageMagick; until now, since ImageMagick is the only toolkit that works with PDFs, you were kind of forced to switch the whole site onto it. Now within the file field settings you can specify to forcefully use ImageMagick no matter the default website toolkit.
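In other words, the toolkit choice becomes a per-field override with the site default as a fallback. A minimal Python sketch of that rule, assuming a hypothetical `force_toolkit` settings key (the real setting name in the module may differ):

```python
def pick_toolkit(field_settings, site_default):
    """Return the image toolkit ID to use for PDF conversion.

    `field_settings` is the per-field settings dict; "force_toolkit" is a
    hypothetical key, and an empty value means "follow the site default".
    """
    return field_settings.get("force_toolkit") or site_default


if __name__ == "__main__":
    # The site keeps GD for ordinary images, but this field forces ImageMagick.
    print(pick_toolkit({"force_toolkit": "imagemagick"}, "gd"))
    # No override configured: the site-wide toolkit is used.
    print(pick_toolkit({}, "gd"))
```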
Comment #12
amarincolas commented:
Hello!
I just downloaded the code and it does not seem to work anymore. I think something weird is going on between the imagemagick module (8.x-1.0-alpha6) and the PDFs, because the image is not generated from them.
Comment #13
Jeff Veit commented:
@amarincolas, it works for me on D8.3.
It doesn't yet autogenerate if you don't save - i.e. no bulk import.
bucefal91, nice module. Thank you.
Comment #14
sonoutlaw commented:
I cannot get this to work. I have spent many hours adding/removing fields, content types, image styles...
Drupal 8.3
ImageMagick 6.7.2-7 2017-03-22 Q16
The image file field is not being populated on node save. Imagemagick cannot do its magic:
ImageMagick error 1: identify: unable to open image `/home/southna/public_html/sites/default/files/styles/pdf_thumbnail_style/public/pdf-test.pdf.jpeg': No such file or directory @ error/blob.c/OpenBlob/2589. [command: /usr/bin/identify -format 'format:%[magick]|width:%[width]|height:%[height]|exif_orientation:%[EXIF:Orientation]\n' '/home/southna/public_html/sites/default/files/styles/pdf_thumbnail_style/public/pdf-test.pdf.jpeg']
Comment #15
2ndmile commented:
Not working for me. Here is the error I am getting...
Recoverable fatal error: Argument 1 passed to pdf_to_imagefield_imagemagick_arguments_alter() must be an instance of Drupal\imagemagick\Plugin\ImageToolkit\ImagemagickToolkit, instance of Drupal\imagemagick\ImagemagickExecArguments given, called in /var/www/forceqvidev/drupal/web/core/lib/Drupal/Core/Extension/ModuleHandler.php on line 501 and defined in pdf_to_imagefield_imagemagick_arguments_alter() (line 127 of modules/experimental/pdf_to_imagefield/pdf_to_imagefield.module).
I have confirmed that Imagemagick and Ghostscript are installed correctly on the server.
I will report back what I find.
Comment #16
mcaden commented:
Eureka!
After installing by cloning master from the sandbox git, with the latest dependency modules, and following the advice from #13, I received the same error as in #15.
This was resolved by DOWNGRADING the imagemagick module to 1.x instead of 2.x.
I still had the problem where everything SEEMED fine but there was no resulting image. I was able to solve this by:
Not sure why I assumed PDF would be enabled by default but for me it was set to false. Once I did this it worked.
EDIT: HUGE problem. What happens when file style changes/gets flushed? Answer: The preview is GONE. No way I want to go back through and "re-save" all the PDFs I've uploaded.
However, at #1014816: Allow image fields to use any extensions the current image toolkit supports (instead of hard-coding jpg, png and gif only) I was actually able to have a PDF upload as an image field, and then simply set an image style for it to display as with the "convert" mentioned above. That seems to solve my problems. I simply used the imagemagick setup from here (making sure PDF was enabled like I mention above) and then applied the patch from that issue.
Comment #17
sarathkm commented:
The issue in #16 is troubling: the preview is lost when image styles are flushed. That means we cannot rely on the image style convert feature to do this job.
Also, it converts only the first PDF page to an image, unlike the 7.x module.
Comment #18
mcaden commented:
For anybody else looking at this issue, that patch got pulled to the 8.5 branch and it was all I needed in order to get a PDF to render as an image preview. I didn't need this module at all.
Comment #19
markdc commented:
Hello 2 years later. =]
Tried #16 without success.
And contrary to #18, I can't upload a PDF into an image field using Drupal 8.8.x with ImageMagick installed and PDF enabled for an image field.
What is the best way to achieve this in 2020?
Thanks.
Comment #20
Anybody commented:
3 years later :) This great module is still used by 1,576 projects. Is there any plan for Drupal 8 / 9 / 10?
If not, could the maintainer perhaps put a link to this issue and information about it on the module page? Still I think it would be great, if the Drupal community (includes us all, me also) could create a good alternative!
If one exists, it should be listed on the module page, please. :)
Alternatives (or at least starting points) for Drupal 8+ are:
Comment #21
HitchShock commented:
The ticket is already outdated.
Comment #22
Anybody commented:
Thanks @HitchShock, nice to see it's available now! Great work :)
Comment #23
markdc commented:
Awesome. Can’t wait to try this out. Thanks!
UPDATE:
It's working great! I've already switched my production site over to this module. Multi-page PDF images have been a long-awaited feature. Thank you!