CVS edit link for Peritus

I'm PHP programmer and use Drupal about 1,5 from now. Recently I finished my last module, which converts a PDF file to ImageFields and I think its good idea to share it with Drupal community. The work was performed with collaboration of user OnkelTem (http://drupal.org/user/239962) who is requesting CVS account too since he will be the co-maintainer of the module.

Module is implemented as a widget for FileField CCK field. It takes PDF file on its input, renders PDF on pages using IM, and then places result to associated ImageField. Actual task on conversion is perfomed by cron, which takes setting like how many PDFs are processed at once.

This functionality was initially requested by OnkelTem in this thread http://drupal.org/node/773050

Nikita

Comments

elaman’s picture

StatusFileSize
new10.2 KB

Attaching my module...

avpaderno’s picture

Status: Postponed (maintainer needs more info) » Needs work
Issue tags: +Module review

Hello, and thanks for applying for a CVS account. I am adding the review tags, and some volunteers will review your code, pointing out what needs to be changed.

As per http://drupal.org/cvs-application/requirements, the motivation message should be expanded to contain more details about the features of the proposed module, and it should include also a comparison with the existing solutions.

dman’s picture

Issue tags: -Module review

I like the idea. So much that I want to try it.
But first:
Documentation: pretty good.

1* Code style: Needs the normal amount of trivial tidy-up: (from coder.module)

pdf_to_imagefield.module

  • Line 128: string concatenation should be formatted with a space separating the operators (dot .) and non-quote terms
      pdf_to_imagefield_convert_pdf($pdf['filepath'], $path .'/'. $pdf['fid'] .'.jpg', array(), array('-density '.$file->density_x.'x'.$file->density_y));
    
  • Line 132: use stdClass caseCapitalization, it's the one exception to the mixed case style standard
        $image = new StdClass();
    
  • Line 154: Use uppercase for PHP constants, e.g. NULL, TRUE, FALSE
        return false;
    
  • Line 163: Control statements should have one space between the control keyword and opening parenthesis
      foreach(filefield_get_file_references($pdf) as $nid => $vid) {
    
  • Line 171: else statements should begin on a new line
        } else {
    
  • Line 206: string concatenation should be formatted with a space separating the operators (dot .) and non-quote terms
      $command = implode(' ', $extra) .' '.escapeshellarg($source) .' '. implode(' ', $args) .' '. escapeshellarg($dest);
    


pdf_to_imagefield.install


pdf_to_imagefield_file.inc

  • Line 16: use stdClass caseCapitalization, it's the one exception to the mixed case style standard
        $pdf = new StdClass();
    
  • Line 20: missing space after comma
        $pdf->density_x = preg_replace('/^([0-9]+)x[0-9]+$/', '\1' ,$file->field['widget']['density']);
    
  • Line 21: missing space after comma
        $pdf->density_y = preg_replace('/^[0-9]+x([0-9]+)$/', '\1' ,$file->field['widget']['density']);
    

sites/all/modules/modified/pdf_to_imagefield/pdf_to_imagefield_widget.inc

pdf_to_imagefield_widget.inc

  • Line 23: string concatenation should be formatted with a space separating the operators (dot .) and non-quote terms
          $imagefields[$key] = $filefield['widget']['label'] .' ('.$filefield['field_name'].')';
    
  • Line 54: Use an indent of 2 spaces, with no tabs
    	// Check that set density is valid.
    

It would be best if you fix that on a first module for review.

2*

include_once dirname(__FILE__) . '/pdf_to_imagefield_file.inc';

Nowadays we use module_load_include()

3*

function pdf_to_imagefield_init() {
  // If FileField is not available, immediately disable ImageField.
...

hook_init() runs for every page in the site, every time. Not a great place to check for something that might just once ever be needed. Could you move this check into somewhere better, like an admin page or hook_requirements()?

/**
 * Implementation of CCK's hook_widget_settings($op = 'validate').
 */
function pdf_to_imagefield_widget_settings_validate($widget) {
	// Check that set density is valid.
  if (!empty($widget['density']) && !preg_match('/^[0-9]+x[0-9]+$/', $widget['density'])) { 
    form_set_error($resolution, t('Please specify a density in the format XxY (e.g. 120x120).')); 
  } 
}

Looks like a code error there - $resolution is undefined. (not a code review thing, just a possible bug and PHP notice)

Good Drupal code by creating a widget for your special field. Generally fits with the Drupal way well. I didn't see much code that should have been Drupal-ed more. Builds on top of imageAPI/Imagemagick, despite it being a bit awkward. It does access an ImageAPI_ImageMagick _private function - but the reason for that is documented. ImageAPI needs to be prodded like that a bit still. Using that func is still better than reinventing an equivalent one.

I tried it out! Basic UI (content type & field config) seemed to work pretty well, with adequate validation and error-prevention. I could figure out what was required from your description, but a 4-step simple 'Setup/Install' guide in the README as well would be handy. No need for "download and enable" steps, but saying that you have to create the unlimited imagefield before the pdf field would be nice.

4* Module did not work for me at first run, and the cron message didn't suggest what could be done about it.

No pdf previews generated to the book node from emerald.pdf file. But it is still in cron list.
(In my case it seems my imagemagick path hadn't been configured yet - a suggestion to check the imagemagick configs would be appropriate in case of an error)

  if (0 != _imageapi_imagemagick_convert_exec($command, $output, $errors)) {
    // Suggest you actually log an error here?
    return FALSE;
  }

I then enabled some debug, got a bit further and encountered

ImageMagick reported an error: sh: gs: command not found convert: Postscript delegate failed 

... Which I eventually solved by Setting the path to gs in the imagemagick delegates.xml file because I couldn't set my server $PATH.

And then it worked Most excellently!.

So installing isn't 100% smooth, though I know that's out of your control - we just need more error-checking and troubleshooting clues. 4*

Suggestion A note to the user explaining why the PDF isn't immediately converted, and maybe a link to cron or an option to update immediately using batch API would be really nice.

Anyway, that was a full module test, not just a code review...

Security uses normal file uploads through Drupal correctly. It sanitizes and quotes the filenames that will be used on the commandline for conversion. Uses existing Drupal roles etc to control access.

If you can address #1, #2, #3, #4 above, I'd be really happy to see this really useful module in use!

dman’s picture

Issue tags: +Module review

X-post removed tags, sorry.
FYI, Folk have been Wanting this sort of functionality for a while now. I don't think there is anything else in this space.

elaman’s picture

Comparsion:
There is no other modules which provide creation of raster copy of PDF.

Features:
Module is based and supposed to function with modules:

The module relies on ImageMagick toolkit and is supposed to function via 'convert' CLI utility.

elaman’s picture

dman, thanks a lot. I'll do my best to fix this problems in short time.

elaman’s picture

1* Code style: Needs the normal amount of trivial tidy-up: (from coder.module)

Fixed.

2* Nowadays we use module_load_include()

Fixed. I've used ImageField as an example, so this piece of code is from there.

3* hook_init() runs for every page in the site, every time. Not a great place to check for something that might just once ever be needed. Could you move this check into somewhere better, like an admin page or hook_requirements()?

What if i delete this check? This was also copied from ImageField. I've no idea, how it's possible to disable/uninstall FileField module while dependent modules are enabled?

4* Module did not work for me at first run, and the cron message didn't suggest what could be done about it.

Still thinking about how to notify user about not found IM...

elaman’s picture

StatusFileSize
new10.19 KB

Oops, forgot to attach file.

avpaderno’s picture

Status: Needs work » Needs review
dman’s picture

#2 Yeah, you'd see that code in earlier modules. It works, but this is a new part of the API.

#3 delete it totally. If it's from imagefield it's only there for imagefields benefit and would run anyway. I thought it was for some special case that you had encountered. And no, I wouldn't think you could disable that module by accident.

#4 just need some "what do I do now" text. Tell user to check imageapi_imagmagick, (how to) check dependencies on the server, see the README. what does something being left in the cron list mean? For me it didn't seem to mean it would try again so..?

OnkelTem’s picture

Hi, I'm co-developer of the module.

dman, first, thank you for interesting in this and for quick and extensive replies.

#4 makes me think we should do some extra tests either on widget configuration page or on module's settings page.

1) Test if "-density" will work on a server. I have a setup atm, where specification of -density breakes everything - convert generates crappy JPG file while Drupal with watchdog keep silence. Maybe some version check? Yet to find out, i'm working on this.

2) Test that issue with delegates.xml

dman’s picture

The normal case with convert and gs would be debugging on a case-by-case basis. If you have ghostscript installed, it should be available in the $PATH. In my case it was for users, but not for the Apache daemon. It's fixable if you are able to restart the machine and have root etc.
But fixing it is not your problem, just testing if gs works at all would help installation - is all I'm saying.
These things are normal when you require other libraries on random servers. All that's expected is making it a bit easier to find where the problem is. I just documented my travails as a part of that!

elaman’s picture

Changes:
- New file: imagemagick_test.pdf (6,5kb)
- Changes in pdf_to_imagefield_widget.inc
New function:

/**
 * NEW: Helper function to check if ImageMagick is ready to convert
 */
function pdf_to_imagefield_check_imagemagick($widget) {
  // init test varaibles
  $source = drupal_get_path('module', 'pdf_to_imagefield') .'/imagemagick_test.pdf';
  $result = file_directory_path() . '/imagemagick_test.jpg';
  $density = !empty($widget['density']) ? $widget['density'] : '100x100';
  // check conversion
  if (pdf_to_imagefield_convert_pdf($source, $result, array(), array('-density '. $density))) {
    file_delete($result);
    return TRUE;
  }
  return FALSE;
}

and changed existing function:

/**
 * Implementation of CCK's hook_widget_settings($op = 'form').
 */
function pdf_to_imagefield_widget_settings_form($widget) {
  if (!pdf_to_imagefield_check_imagemagick($widget)) {
    drupal_set_message(t('ImageMagick test pdf to image conversion failed. Please check your system ImageMagick and/or Ghostscript configuration. While you seeing this message, pdf files will be queued and stay unprocessed.'), 'error');
  }
  // ...
}
elaman’s picture

StatusFileSize
new12.49 KB

:)

dman’s picture

Status: Needs review » Reviewed & tested by the community

Thanks, those are great ideas. An actual test run even.
I'm really liking this!
Lets get CVS on so we can start supporting it!
Due diligence has been done on the CVS application.

And it's a kick-ass feature!

avpaderno’s picture

Status: Reviewed & tested by the community » Needs work
  1. The license file must be removed as it cannot be committed in Drupal.org repository.
  2.     form_set_error($resolution, t('Please specify a density in the format XxY (e.g. 120x120).')); 
    

    The code is accessing a not defined variable; this would cause the error message to not appear associated with a form field, and the user could not understand which form field contains the not correct value.

  3.   $filefield_settings = module_invoke('filefield', 'widget_settings', 'save', $widget);
    

    Does one of the modules from which the proposed module depends require filefield.module? If filefield.module is not required, then the line after the one I reported will raise an error.

  4. 
    module_load_include('inc', 'pdf_to_imagefield', 'pdf_to_imagefield_file');
    module_load_include('inc', 'pdf_to_imagefield', 'pdf_to_imagefield_widget');
    
    

    It is better to merge the include files into the module file, if they are always being included. To split the code into different files make sense when the files are included in some specific moments, as it happens for the files containing menu callback, which are automatically included from Drupal when the specific path is visited.

elaman’s picture

StatusFileSize
new7.09 KB

Fixes:
1. File Deleted.
2. Fixed.
3. This module depends on FileField module, so it's required.
4. Merged into the module file.

Changes:

function pdf_to_imagefield_convert_pages($file, $path, $pdf) {
  // ...
  -db_query('UPDATE {pdf_to_imagefield} SET finished = %d WHERE fid = %d', 1, $pdf['fid']);
  +db_query('UPDATE {pdf_to_imagefield} SET finished = %d WHERE fid = %d', time(), $pdf['fid']);
  // ...
}
function pdf_to_imagefield_elements() {
  // ...
  +$elements['pdf_to_imagefield_widget']['#process'][] = 'pdf_to_imagefield_widget_process';
  // ...
}
function pdf_to_imagefield_theme() {
    // ...
    +'pdf_to_imagefield_widget_preview' => array(
    +  'arguments' => array('item' => NULL),
    +)
    // ...
}

New:

function theme_pdf_to_imagefield_widget_preview($file) {
  // some code to display the pdf file status
}
OnkelTem’s picture

future TODO:

* GS (Ghostscript) version check. For example, having problem with version 8.15.x - couldn't parse latest .PDFs, created in Adobe Acrobat 9 Pro. Anyway running software released 5 years ago is not very good idea. (GS 8.15.x was released in 2004)

* An option to generate images right now, upon node save. This could be a checkbox, say "[ ] Generate now", turned on/off by default according to the module settings. New access permission for this could be added too.

* Option to choose target format. Currently JPEGs are created, while in certain cases PNG images might be preferable or required.

avpaderno’s picture

Status: Needs work » Fixed

Thank you for your contribution! I am going to update your account.
These are some recommended readings to help with excellent maintainership:

You can find more contributors chatting on the IRC #drupal-contribute channel. So, come hang out and stay involved.
Thank you, also, for your patience with the review process.
Anyone is welcome to participate in the review process. Please consider reviewing other projects that are pending review. I encourage you to learn more about that process and join the group of reviewers.

I thank all the dedicated reviewers as well.

pupil’s picture

does it realy work?

dman’s picture

Status: Fixed » Closed (fixed)
Issue tags: -Module review

Automatically closed -- issue fixed for 2 weeks with no activity.

avpaderno’s picture

Component: Miscellaneous » new project application
Assigned: Unassigned » avpaderno
Issue summary: View changes