First attempt to create CSV export file.
To do:

  • Check which fields really need to be split into several lines (separator: ";")
  • trim split fields
  • include CCK fields

Comments

rjerome’s picture

Just out of curiosity, why do you need to split the lines containing multiple values?

Ron.

schildi’s picture

OK, maybe I misunderstood something.
As far as I know (at the moment I can't find the document), logical line breaks are indicated by semicolons. For example, if the "author" field contains more than one author, they are separated by ";". To mirror this behavior I converted the semicolons to line breaks, which results in multiple lines in one cell of the resulting table when the CSV file is imported.
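A minimal sketch of that conversion (Python here rather than the patch's actual PHP, and the field content is made up): split on ";", trim each part, and let the CSV writer quote the embedded newlines so they stay inside one cell:

```python
import csv
import io

def split_multivalue(field, sep=";"):
    """Split a multi-value field on the separator and trim each part."""
    return [part.strip() for part in field.split(sep)]

def to_csv_cell(field):
    """Join the parts with newlines: one logical line per value inside a single cell."""
    return "\n".join(split_multivalue(field))

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["authors", to_csv_cell("Miller, A.; Smith, B. ;  Jones, C.")])
# The authors cell is quoted, so a spreadsheet import shows three lines in one cell.
print(buf.getvalue())
```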

regards
Reiner

schildi’s picture

Status: new
Size: 6.83 KB

As mentioned, CCK fields were not handled by the patch delivered above.
The attached new patch also contains code to include CCK fields in the export.
Since the central query is modified, somebody else should check that the export works in all circumstances, especially when exporting in other formats with and without CCK fields.

schildi’s picture

Status: new
Size: 8.85 KB

Please check and test the attached patch.

OK, some changes were needed:
* There were some problems with the left/right join (compare the code!)
* CCK fields were only accessible when exporting as CSV

You can make some online tests at
http://archiv.bgv-rhein-berg.de/biblio

Hint: Nodes 1625 and 2473 have a CCK field ("Standort") associated with them.

rjerome’s picture

Did you forget to attach the new patch or are you referring to the patch attached to #3?

Ron.

schildi’s picture

sorry!

schildi’s picture

Status: new
Size: 11.23 KB

After upgrading my Drupal installation beyond 5.3 I ran into a lot of problems.
So I can't work on this subject any longer and am attaching the last version of the patch to this post.
It contains the configuration part that lets you adjust the field separator etc. Entries in the "variable" table are not generated.
Everything else seems to be OK.

Regards
Schildi

rjerome’s picture

Thanks, I'll try to incorporate this into the next release.

What problems did you have with the upgrade? Were they related to the biblio module?

Ron.

schildi’s picture

No, no relation to the biblio module:
* Users can't edit their own old articles any more (permission denied)
* When saving a new article (node, story) as user admin, I get a blank screen
* When editing a newly created article (as an ordinary user), saving also results in a blank screen. A second try gives the error message "This content has been modified by another user, changes cannot be saved." The node is no longer editable in any way!

Currently I have no idea how to fix this.

rjerome’s picture

This sounds like it may be PHP related. Check the PHP logs for errors and turn on error reporting in php.ini to see if you get any output. You may be running into PHP's memory limits.

Ron.

schildi’s picture

Thanks!

Please take into account that I rearranged a query. Compare, in function "biblio_db_search":

-  $query = db_rewrite_sql("SELECT DISTINCT n.*, b.* FROM {node} ...
+  $query = db_rewrite_sql("SELECT DISTINCT n.*, b.* FROM {biblio} ...

The central point of the query is now the "biblio" table rather than "node", so the clause "WHERE n.type = 'biblio'" becomes obsolete.
I am also not sure whether the expression

SELECT ... FROM {node} ...  LEFT JOIN {biblio}

really does what you want. As I understand it, this results in a table containing ALL records from "node", with possibly empty entries for "biblio".
Is there any use for empty "biblio" data?
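The difference can be seen with a toy version of the two tables (an sqlite stand-in; the real schema has many more columns than this):

```python
import sqlite3

# Miniature stand-ins for Drupal's node and biblio tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE node (nid INTEGER, type TEXT);
CREATE TABLE biblio (nid INTEGER, biblio_year INTEGER);
INSERT INTO node VALUES (1, 'biblio'), (2, 'story');
INSERT INTO biblio VALUES (1, 1999);
""")

# LEFT JOIN keeps every node row; the 'story' node comes back with NULL biblio data.
left = con.execute(
    "SELECT n.nid, b.biblio_year FROM node n "
    "LEFT JOIN biblio b ON n.nid = b.nid ORDER BY n.nid"
).fetchall()
print(left)  # [(1, 1999), (2, None)]

# The WHERE n.type = 'biblio' clause (or starting FROM biblio) drops the empty rows.
filtered = con.execute(
    "SELECT n.nid, b.biblio_year FROM node n "
    "LEFT JOIN biblio b ON n.nid = b.nid "
    "WHERE n.type = 'biblio' ORDER BY n.nid"
).fetchall()
print(filtered)  # [(1, 1999)]
```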

rjerome’s picture

So I guess you resolved your issues with Drupal?

You're right,

SELECT ... FROM {node} ...  LEFT JOIN {biblio}

would yield empty "biblio" data without the "WHERE n.type = 'biblio'" clause. I can't really comment on the implications of reversing the join order until I look at the tables again, but on the surface it seems OK.

Ron.

rjerome’s picture

I now remember why changing the order of that join would be a bad idea: it has to do with revisions. You could run into the same problem you describe above when joining the node table on vid, since there could be multiple revisions of a given node in the biblio table.

Ron.

schildi’s picture

Maybe you are right, and e.g. in function biblio_db_search the line

$join[] = _biblio_cck_join($dummy) . ' left join {node} n  on b.vid=n.vid';

should be changed to

$join[] = _biblio_cck_join($dummy) . ' join {node} n  on b.vid=n.vid';
rjerome’s picture

Another thing I forgot to mention concerns the column headings. While using the "field label" will work for a single entry or multiple entries of the same type (i.e. Journal), it would not work for multiple entries of mixed type, since the field label for a given database field changes with each publication type.

I would propose using the DB field names, perhaps with the "biblio_" prefix stripped off.
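For instance (a Python sketch; the field names below are just plausible examples of biblio's "biblio_"-prefixed columns, not a definitive list):

```python
def csv_heading(db_field, prefix="biblio_"):
    """Use the raw DB field name as the column heading, minus the module prefix."""
    return db_field[len(prefix):] if db_field.startswith(prefix) else db_field

# 'nid' carries no prefix and passes through unchanged.
print([csv_heading(f) for f in ["nid", "biblio_year", "biblio_secondary_title"]])
# ['nid', 'year', 'secondary_title']
```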

Ron.

schildi’s picture

Maybe I did not take enough care over the data structures.

select t.tid, t.name, f.fid, f.name, f.title, d.title from biblio_types t, biblio_type_details d, biblio_fields f where d.tid=t.tid and d.fid=f.fid order by f.fid;

For the export I took column names from table biblio_fields and used the content of "name" as the DB field name and "title" as the label.
These combinations seem unique to me, but "title" from table biblio_type_details varies depending on tid.
Is this what you mean?

From my point of view, using biblio_fields.name and biblio_fields.title is equivalent, since there is a one-to-one relation between them.

Reiner

rjerome’s picture

Yes, you are correct: biblio_fields.name and biblio_fields.title are unique, whereas biblio_type_details.title varies with the type, which is why I would avoid biblio_type_details.title unless you are only exporting a single entry OR all the entries are of the same type.

As for which to use... biblio_fields.name or biblio_fields.title, I guess it depends how the file will ultimately be used. Obviously biblio_fields.title is more suited to human consumption; however, if you wanted to re-import this data back into biblio, then biblio_fields.name would be the better choice.

I've been thinking that this format might be a good choice for doing dumps and restores of the biblio data, so if it were me, I would choose biblio_fields.name, but perhaps the choice of either could be an option. You may not have noticed, but there is a function called biblio_get_db_fields() which will return all the field names (not titles) for you. I suppose it could be modified to return both names and titles as key=>value pairs.
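That key=>value idea could then drive the name-vs-title choice directly; a rough sketch (Python, with hypothetical field names — the real return value of biblio_get_db_fields() may differ):

```python
# Hypothetical name => title map, as a modified biblio_get_db_fields() might return it.
db_fields = {
    "biblio_year": "Year of Publication",
    "biblio_secondary_title": "Secondary Title",
}

def headings(fields, use_titles=False):
    """Machine names suit re-import; human-readable titles suit reading."""
    return list(fields.values()) if use_titles else list(fields.keys())

print(headings(db_fields))                    # ['biblio_year', 'biblio_secondary_title']
print(headings(db_fields, use_titles=True))   # ['Year of Publication', 'Secondary Title']
```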

By the way, I've been working on the 6.x version lately, and I have been integrating this export format into it. I'll back port it to 5.x when it's working.

The other thing that might be required is a check (if you're not already doing one) to see whether CCK is actually installed before trying to add those fields, since it's not a core module.
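A sketch of such a guard (Python stand-in; in Drupal 5 the actual test would be something like module_exists('content'), and the CCK table name below is hypothetical):

```python
def build_joins(cck_enabled):
    """Only add the CCK join when the CCK module is actually installed."""
    joins = ["JOIN {node} n ON b.vid = n.vid"]
    if cck_enabled:
        # Hypothetical CCK data table; the real name depends on the field configuration.
        joins.append("LEFT JOIN {content_type_biblio} c ON n.vid = c.vid")
    return joins

print(build_joins(False))  # the query stays valid even without CCK tables
print(build_joins(True))
```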

Cheers,

Ron.

schildi’s picture

Yes, I also thought about re-importing / updating biblio data. That is why I put the nid field in the first column.
And you are also right that naming conventions for columns depend on usage. If the export is to be used to prepare later updates / re-imports, then column names should refer to DB field names. For humans, labels are easier to read.
Therefore my idea was to make this configurable: the variable "biblio_csv_col_head" should be used to switch between the two choices.

Yes, and again you are right. I did not notice that there is a function named biblio_get_db_fields to deliver field names, and there is no point in duplicating this functionality in a separate "csv" function. Please consolidate the functionality in your function biblio_get_db_fields.

And sorry, I didn't see the impact of not having the CCK module installed. Yes, there should be a check; otherwise the SQL query would fail because of unknown table names.

Thank you very much!

Reiner

rjerome’s picture

Hi Reiner,

Just one other thought with regards to "re-importing". I've been thinking about this one for a while (prior to CSV), and I think having and/or using the nid might be dangerous. The issue is that, given the ability to export the data, someone will surely try to import it into a different site/installation, where the nids will no longer be valid or may already be in use by some other node. For import it's probably better to just create a whole new node and let Drupal deal with the nids. The flip side of that coin is that someone will no doubt try to import into the same installation without removing the originals and end up with duplicate entries, which is why I also need to develop a reliable duplicate detection system.

As they say... The devil is in the details :-)

Ron.

schildi’s picture

Hello Ron

yes, there must be a distinction between update and import. If an update is required, a check has to be done to ensure that all existing nodes are of type "biblio". The checks will probably be a bit more complicated, especially when e.g. CCK entries have to be created or deleted. And what about changing the type from e.g. book to some other biblio (sub-)type?

So updates seem the most complicated case to me. An update has to be developed step by step. One step could be to first implement updates where the types remain untouched and there are no (CCK) fields to be created or deleted.

A simple import should ignore nids completely. And for imports it would be nice to have a method to get rid of all biblio entries created in one (import) run. Maybe all entries generated in one run could get the exact same time stamp?

Reiner

rjerome’s picture

Hi Reiner,

I was just wondering why you would want to do this...

And for imports it would be nice having a method to get rid of all biblio entries created in one (import-)run.

Perhaps something I've been thinking about lately would help... For imports I was thinking about tagging each entry to indicate that it is a new import that is not ready for display. This could be done using the node.status field in the node table, and setting it to some value which is outside the range normally used by Drupal. This would allow the admin to further process/check the nodes before they are made available for display.
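Sketched with sqlite (the value 2 below is an assumption standing in for "some value outside the normal range"; Drupal core itself only uses 0 = unpublished and 1 = published in node.status):

```python
import sqlite3

NEEDS_REVIEW = 2  # outside Drupal's normal node.status range of 0/1

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE node (nid INTEGER, status INTEGER)")
con.executemany("INSERT INTO node VALUES (?, ?)",
                [(1, 1), (2, NEEDS_REVIEW), (3, NEEDS_REVIEW)])

# Freshly imported nodes wait in the review state until an admin publishes them.
pending = [row[0] for row in
           con.execute("SELECT nid FROM node WHERE status = ? ORDER BY nid",
                       (NEEDS_REVIEW,))]
print(pending)  # [2, 3]

# As a side benefit, throwing away a whole import run is a single query.
removed = con.execute("DELETE FROM node WHERE status = ?", (NEEDS_REVIEW,)).rowcount
print(removed)  # 2
```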

Ron.

schildi’s picture

Hello Ron,

interesting idea. That gives you a way to double-check the import before launching it. But if you pop up a box asking the user whether he has double-checked the import before the new entries are activated, he will normally confirm immediately.
So it does not help in the case where you have imported (and activated) records by "accident".

Did you think about updating records while another user is editing one of the nodes in question? Locking? There is a locking module (sorry, I can't find its name at the moment). Maybe it provides a way to block a whole bunch of records. This would be an alternative to your "node status" idea.

Another way could be to use the "log" field in table "node_revisions" to mark the import.

Reiner

rjerome’s picture

Yes, that's true, you can't count on users to actually check anything, but I was thinking that duplicate detection and validation/sanity checks could be done in software, and if they fail, the affected entries could be marked for further investigation. Perhaps two levels of flags: one for entries that passed the sanity checks and are waiting for final approval, and one for entries that didn't pass.

I hadn't thought of record locking. I wonder how Drupal handles that (if at all).

Ron.

schildi’s picture

I found the name of the module I was looking for: it's "checkout". The author explains that
"When a user begins to modify a document, it is considered being 'checked out' of the system for exclusive editing access by that user".

Have a look at
http://drupal.org/project/checkout

Reiner

schildi’s picture

Hello Ron,

did you have a look at the "Node import" module for handling CSV files, on the page http://drupal.org/handbook/modules/node_import ?

Regards
Reiner

bekasu’s picture

Status: Active » Closed (fixed)

Marking issue closed.

Aren Cambre’s picture

Status: Closed (fixed) » Closed (duplicate)

Seeing no recent progress on this, I am re-statusing it as a duplicate of #682044: Support fields (CCK) in D7 Bibliography Module. Fields (CCK) integration would allow easy import and export with the other modules that support CCK data.