Introduction

The ETL (Extract, Load and Transform) module gives the ability to export and denormalize any kind of content. Basically you are able to select content types or custom entities to be exported into a destination database. It also gives you the possibility to alter the data before persist it.

* For a full description of the module, visit the project page:
https://www.drupal.org/sandbox/gabrielmachadosantos/2728473

Requirements

This module requires the following modules:

Automated review results

http://pareview.sh/pareview/httpgitdrupalorgsandboxgabrielmachadosantos2...

False positives:

  • FILE: /var/www/drupal-7-pareview/pareview_temp/inc/etl.admin.inc
    --------------------------------------------------------------------------
    FOUND 0 ERRORS AND 1 WARNING AFFECTING 1 LINE
    --------------------------------------------------------------------------
    209 | WARNING | Do not use the raw $form_state['input'], use
    | | $form_state['values'] instead where possible
    --------------------------------------------------------------------------

Git clone

git clone --branch 7.x-1.x http://git.drupal.org/sandbox/gabrielmachadosantos/2728473.git etl

Manual Reviews of other projects:

https://www.drupal.org/node/2833955#comment-11842518

Comments

gabrielmachadosantos created an issue. See original summary.

PA robot’s picture

Issue summary: View changes

Fixed the git clone URL in the issue summary for non-maintainer users.

We are currently quite busy with all the project applications and we prefer projects with a review bonus. Please help reviewing and put yourself on the high priority list, then we will take a look at your project right away :-)

Also, you should get your friends, colleagues or other community members involved to review this application. Let them go through the review checklist and post a comment that sets this issue to "needs work" (they found some problems with the project) or "reviewed & tested by the community" (they found no major flaws).

I'm a robot and this is an automated message from Project Applications Scraper.

gabrielmachadosantos’s picture

Issue summary: View changes
David Fiaty’s picture

Status: Needs review » Needs work

Hello Gabriel,

Nice module. These are the comments I would make:

1. The etl.install file is empty. Have you planned to add some code there?
If not, it should maybe be removed from the package.

2. In my opinion, there are not enough inline comments in some parts of the code of the module.
For example in etl.admin.inc: etl_entity_settings_form() and etl_get_mapping_config_rows() need a bit of logic commenting.

3. In et.admin.inc, the 2 first functions don't follow the function comments convention.
"Implements hook_[hook_name]" should maybe be added to the comment section of these functions.
It looks like you've done it for the other core hooks.

4. In etl.module line line 37, you are loading entities using the function etl_load_entities().
Since etl_load_entities() returns a value, your code will run faster if you store the result in a variable
before running the foreach loop on it.

As you're loading entities and some of them could be big, this:

foreach (etl_load_entities() as $entity => $info) {
 		
}

Should maybe be changed to this:

$entities = etl_load_entities();
foreach ($entities as $entity => $info) {
 		
}

Thank you.

gabrielmachadosantos’s picture

Status: Needs work » Needs review

Hi David, thanks for the review.

Please follow the considerations below:

1. Removed the install file.

2. I believe the logic and the variable names are pretty straight forward since it is a simple manipulation values.

3. It does follow the comment function convention, those aren't hooks, both of them are callbacks forms. https://www.drupal.org/coding-standards/docs#forms

4. I've updated it but only for easy reading because it won't affect at the code performance. http://stackoverflow.com/questions/11074141/using-function-call-in-forea...

Thanks again.

kapil.ropalekar’s picture

Hi gabrielmachadosantos,

My findings for your modue as follows:

etl.admin.inc

severity: criticalLine 112: table names should be enclosed in {curly_brackets}
$destination_id = $destination_database->query("SELECT column_name FROM information_schema.columns WHERE table_name = '{$destination_table}' ORDER BY ordinal_position LIMIT 1")->fetchField();

Thanks !

klausi’s picture

Status: Needs review » Needs work

Hm, that looks like SQL injection since $destination_table is directly concatenated into the SQL string. Always use placeholders for that, see https://www.drupal.org/writing-secure-code

yogeshmpawar’s picture

Pareview.sh is giving following errors.

http://pareview.sh/pareview/httpgitdrupalorgsandboxgabrielmachadosantos2...

Will do a manual review in some time.

gabrielmachadosantos’s picture

Status: Needs work » Needs review

Thanks kapil.ropalekar and klausi for the review. I've already updated to follow the security standards. Can you please review it?

Yogesh Pawar, there is only one error and it is a false positive as I described in the issue. Thanks.

Evgeny_Yudkin’s picture

Hello,
I`ve reviewed your module, there are some issues:

  1. admin/config/development/etl - I think, better to use description and units. Batch size - 30 what? Looks like 30 entities, but better to add.
  2. Why I should set database and entities in hooks and etc(If I'll want to move one entity from prod to dev to investigate it - now I'll need to deploy code to prod)? Database connection can be opened on fly: https://www.drupal.org/node/18429 In my opinion, it should be like so: special page for single entity or at least node (view/edit/"etl export") and "etl export" views bulk operation, where I can set database, table and etc.

    Also you may add a separate page to import all entities, filtering by bundle/id If no views installed.
  3. Why only database transfer, not transfer during files or during requests (but for drupal->drupal (only during requests) we already have https://www.drupal.org/project/drupal_sync)?
  4. Looks like current module doesnt support entityreferences and field collections. I think better to add field type condition and some hooks to provide possibility to proceed custom fields, references and etc.
  5. etl_load_entities() contains unnecessary variable, there will be notice in etl_cron_queue_info() if no exporting entities defined yet
  6. Now params description: etl_queue_export() and etl_load_entities()
  7. Why "ETL"? Its one more complex abbreviation. Its easier to remember "Entity transfer" or like so.

Kind Regards.

Evgeny_Yudkin’s picture

Status: Needs review » Needs work
gabrielmachadosantos’s picture

Status: Needs work » Needs review

Hi there,
Please consider the points below:

1. The label is "Package size" inside the "Batch" section and it is entities, it is the number of entities to be exported inside a package. I agree that it can have a field description, please feel free the provide a patch with a better description and I will be glad to move forward.

2. I think there might be a misunderstanding on the module's purpose. This module was not created to move content to other environments, there is already a module to do this (deploy), it gives you the ability to copy your Drupal content to other databases to be consumed mostly for BI reports. Database information is not set on hook, it is a variable and it can be set in different places. About database connection, I can't use multiple database connection because I need to perform both operations simultaneously. Entities are set in hooks, which gives the ability of any module set up an entity to be exported or synced.

3. As I explained above, this is not a Drupal -> Drupal sync module, as noted on the module's description, quoted below:
"The ETL (Extract, Transform and Load) module gives the ability to export and denormalize any kind of content. Basically you are able to select content types or custom entities to be exported into a destination database. It also gives you the possibility to alter the data before persist it."

4. It does support entityreferences and any other field type or custom entity and if doesn't you can easily use hook_etl_entity_load_data_alter or hook_etl_get_field_parser_value_alter to make it support.

5. I was not able to find the unnecessary variables on etl_load_entities(), can you please point the line? http://cgit.drupalcode.org/sandbox-gabrielmachadosantos-2728473/tree/etl...

6. The etl_load_entities() doesn't have any params but I will add the params info for etl_queue_export() in the next iteration since it is not impacting on the module's functionality.

7. "ETL" is a very known term used specially on BI area, it stands for "Entity Transform and Load" which means it is not only an Entity Transfer as you said, it does much more than simply move entity content, this information will denormalized and it can be changed during the transition if the user demands.

aryashreep’s picture

Status: Needs review » Needs work

Pareview.sh is giving following errors.

Review of the 7.x-1.x branch (commit d2fcd84):

  • Codespell has found some spelling errors in your code.
    ./inc/etl.admin.inc:459: conection  ==> connection
    
  • No automated test cases were found, did you consider writing Simpletests or PHPUnit tests? This is not a requirement but encouraged for professional software development.

This automated report was generated with PAReview.sh, your friendly project application review script. You can also use the online version to check your project. You have to get a review bonus to get a review from me.

Source: http://pareview.sh/ - PAReview.sh online service

klausi’s picture

Status: Needs work » Needs review

One minor coding standard error is surely not an application blocker, please do a real manual review.

gabrielmachadosantos’s picture

Issue summary: View changes
amaria’s picture

Status: Needs review » Needs work

Automated Review

CodeSniffer errors:

FILE: /root/repos/pareviewsh/pareview_temp/etl.api.php
----------------------------------------------------------------------
FOUND 2 ERRORS AFFECTING 1 LINE
----------------------------------------------------------------------
54 | ERROR | Type hint "array" missing for $entity_data
54 | ERROR | Type hint "array" missing for $info
----------------------------------------------------------------------...

Quite a few of these type hinting errors in Pareview now.

Manual Review

Individual user account
Yes: Follows
No duplication
Yes: Does not cause duplication and/or fragmentation
Master Branch
Yes: Follows the guidelines for master branch.
Licensing
Yes: Follows the licensing requirements.
3rd party assets/code
Yes: Follows the guidelines for 3rd party assets/code.
README.txt/README.md
No: Does not follow the guidelines for in-project documentation and/or the README Template.
  1. (+) Minimum requirements for documentation is to Provide help text in the Drupal UI. even if only to inline the README file.
Code long/complex enough for review
Yes: Follows the guidelines for project length and complexity.
Secure code
Yes: Meets the security requirements.
Coding style & Drupal API usage
  1. (*) I'm getting this error on export...There was an error trying to create the destination table. SQLSTATE[42000]: Syntax error or access violation: 1075 Incorrect table definition; there can be only one auto column and it must be defined as a key Using MySQL db. I believe you need to create a Primary Key. Unless somehow that's expected to be done in the alter routines? If so, should be documented.
  2. (+) Should probably document the administration section.

The starred items (*) are fairly big issues and warrant going back to Needs Work. Items marked with a plus sign (+) are important and should be addressed before a stable project release. The rest of the comments in the code walkthrough are recommendations.

If added, please don't remove the security tag, we keep that for statistics and to show examples of security problems.

This review uses the Project Application Review Template.

PA robot’s picture

Status: Needs work » Closed (won't fix)

Closing due to lack of activity. If you are still working on this application, you should fix all known problems and then set the status to "Needs review". (See also the project application workflow).

I'm a robot and this is an automated message from Project Applications Scraper.