I'm trying to export a content type for which only a single node exists, but the whole files directory (GBs of data) always gets exported too.
I first thought it might be because of admin-menu (the module following all the links?), but including "#admin-menu,.admin-blocks,.admin-tabs,.views-admin-links" in the DOM removal doesn't seem to change anything. Is there something I'm doing wrong?

Comments

btopro’s picture

Status: Active » Needs work

I believe I've trapped the file directory bloat. I need to do some more testing on dev and should have an alpha3 out by the end of the day or tomorrow, so you can test whether this issue is resolved.

The syntax of your exclusions is wrong. Here's the list I work from for ELMS (the primary target I'm working against):

div[id=admin-toolbar],div[id=regions_elms_navigation_left],div[id=regions_elms_navigation_top]

This syntax says to remove the divs whose id matches any of those three. You can use similar rules such as div[class=yourclass] and other kinds of crazy target removal. This needs better documentation in the API file (or anywhere); I'll definitely get there though.
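To make the rule format concrete, here is a minimal Python sketch of how a comma-separated list of tag[attr=value] rules could be parsed and applied to a page. It is not the module's code: the function names parse_rules and remove_matching are made up for illustration, it assumes exact attribute matches, and it only handles well-formed XHTML.

```python
import re
import xml.etree.ElementTree as ET

# One rule looks like tag[attr=value], e.g. div[id=admin-toolbar].
RULE = re.compile(r"^\s*(\w+)\[(\w+)=([^\]]+)\]\s*$")


def parse_rules(spec):
    """Split 'div[id=a],div[class=b]' into (tag, attr, value) tuples."""
    rules = []
    for part in spec.split(","):
        match = RULE.match(part)
        if not match:
            # CSS-style input such as '#admin-menu' ends up here.
            raise ValueError(f"unsupported rule: {part!r}")
        rules.append(match.groups())
    return rules


def remove_matching(parent, rules):
    """Recursively drop child elements that match any rule; return the count."""
    removed = 0
    for child in list(parent):
        if any(child.tag == tag and child.get(attr) == value
               for tag, attr, value in rules):
            parent.remove(child)
            removed += 1
        else:
            removed += remove_matching(child, rules)
    return removed


if __name__ == "__main__":
    page = ET.fromstring(
        '<body><div id="admin-toolbar"><a href="/admin">Admin</a></div>'
        '<div id="content"><p>Hello</p></div></body>'
    )
    rules = parse_rules(
        "div[id=admin-toolbar],div[id=regions_elms_navigation_left]"
    )
    remove_matching(page, rules)
    print(ET.tostring(page, encoding="unicode"))
```

Note that a CSS-style string like "#admin-menu,.admin-blocks" is rejected by this sketch, which mirrors why that format had no effect in the removal field.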

thommyboy’s picture

Hi, thanks for your quick reply. My server was pretty busy with some 6 GB of data stored in /files/, and even more so after html_export duplicated those 6 GB under /files/ too: the next export copies 12 GB ;) Thanks for having a look at that!
About the exclusion: I think the field descriptions should be changed too. It seems I misinterpreted "Supply a css style selector".
I really like the idea of the module. I would use it to generate pieces of data for inclusion in other sites. Doing that via Drupal is too slow (our site's bootstrap just takes too long) and Boost is a beast ;)
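The doubling described above (6 GB, then 12 GB) is what you would expect if the files directory is copied into an export folder that itself lives under the files directory. The following Python sketch shows one way to avoid that, by skipping the html_export folder while copying. It is an illustration only, not what the module does; copy_files_dir and the demo layout are hypothetical names.

```python
import shutil
import tempfile
from pathlib import Path


def copy_files_dir(files_dir, export_dir, skip_name="html_export"):
    """Copy files_dir to export_dir, skipping files_dir/<skip_name>.

    Skipping that folder keeps earlier exports (and the export being
    written) out of the copy, so the export size stops doubling.
    """
    files_dir = Path(files_dir).resolve()
    skip = files_dir / skip_name

    def ignore(directory, names):
        # Called once per visited directory; return the names to leave out.
        return [n for n in names if (Path(directory) / n).resolve() == skip]

    shutil.copytree(files_dir, export_dir, ignore=ignore)


if __name__ == "__main__":
    tmp = Path(tempfile.mkdtemp()).resolve()
    files = tmp / "files"
    (files / "images").mkdir(parents=True)
    (files / "images" / "a.png").write_bytes(b"x")
    # A previous export that must not be copied again.
    (files / "html_export" / "export1").mkdir(parents=True)

    target = files / "html_export" / "export2" / "files"
    copy_files_dir(files, target)
    print(sorted(p.name for p in target.iterdir()))
```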

btopro’s picture

Version: 6.x-2.0-alpha2 » 6.x-2.0-alpha3
Status: Needs work » Fixed

Just published alpha3; it should solve this issue, but please test to verify. I normally ask for testing against dev instead of alpha, but static export of managed sites is such a massive task, with so many different environments in play, that I think this is the best way to handle it. I will be working on this more tomorrow to add the ability to supply custom paths manually, so the sooner you can verify that this is fixed the better. Thanks for your support and patience in getting this figured out. I think a lot of systems can benefit from this publishing workflow.

thommyboy’s picture

OK, did some more testing. The files-directory copy issue seems to be resolved.

Exporting just one node, I still get the error "An HTTP error 500 occurred. /de/batch?id=103&op=do", with the error page only saying "An error occurred while processing html_export_copy_all_resources with arguments: Array ( [0] => sites/default/files/html_export/export1331192803 )"

My removal string looks like this: "div[id=admin-menu],div[class=admin-blocks],div[class=admin-tabs],div[class=views-admin-links]", but sites/all/modules/admin_menu/images still gets copied. This might be OK, though (following admin_menu.css and getting the images from there?).

My index is located in /sites/default/files/html_export/exportXYZ/node/10472 and contains (paths made relative by the module) "
" This style.css does not get copied, though some images referenced in style.css ARE?

I use a template file for the special node type I'm trying to export (removing menus, header, and footer, and changing styles to get just the plain content for embedding into another site). This template does not seem to be used, as the resulting index.html contains the header etc. again.

my $sidebar_primary is missing in index.html

If needed, I could provide you with links to the exported node and the original one...

btopro’s picture

Version: 6.x-2.0-alpha3 » 6.x-2.x-dev

Good to hear on the files front

Hmm... I'm running the job on 100s at times and not getting timeouts.

The selector scans the DOM and rips out the things it finds; it doesn't prune the actual assets associated with the things you remove. Assets could be removed using a similar format, but I'll probably have to add some support for targeting the removal of specific CSS / JS.

You could also minimize the export size by installing Masquerade and making a role that has less stuff to bootstrap (like a non-admin role).

I'm not sure what sidebar_primary is, so I can't comment on why it wouldn't be there. This is treated as a direct Drupal page load, so there shouldn't be anything missing. Are you using Context or core blocks to render them on the page?
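To illustrate why images can still be copied after a div is removed: the stylesheet stays linked in the page, and anything it references through url(...) can still be picked up. Here is a rough Python sketch of resolving those references relative to the CSS file. It is only a guess at the general technique, not the module's selection algorithm; css_asset_paths and the bkg.png file name are made up for the example.

```python
import posixpath
import re

# Matches url(x), url('x') and url("x") inside CSS text.
CSS_URL = re.compile(r"url\(\s*['\"]?([^'\")]+?)['\"]?\s*\)")


def css_asset_paths(css_path, css_text):
    """Return site-relative paths of assets referenced by a stylesheet."""
    base = posixpath.dirname(css_path)
    found = []
    for ref in CSS_URL.findall(css_text):
        # Skip absolute URLs, root-relative paths and inline data URIs.
        if ref.startswith(("http://", "https://", "data:", "/")):
            continue
        found.append(posixpath.normpath(posixpath.join(base, ref)))
    return found


if __name__ == "__main__":
    css = "#admin-menu { background: url(images/bkg.png); }"
    print(css_asset_paths("sites/all/modules/admin_menu/admin_menu.css", css))
```

Removing div[id=admin-menu] from the DOM does not change this result, because the reference lives in the stylesheet rather than in the removed markup.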

If you could, please doodle on the page to point out which assets are missing (like the style.css you mention). http://markup.io/ is a cool way of doing this, or use Jing or something similar (or attach the HTML file) to show which lines the missing assets are on, so I can try to strengthen the selection algorithm.

Also, as these are outside of the original issue, I'm still leaving this marked fixed; if these spawn off into larger topics we can open new issues.

thommyboy’s picture

Hi, this doesn't seem to be a timeout. Yesterday (copying 6 GB) I got it after several minutes; today I get the same error after just a few seconds (though it seems the module does finish the export).
sidebar_primary is just a sidebar containing blocks (in my case generated by Context, yes).
Right now it only exports index.html and stops with that error.
I'll do some more testing tomorrow; hungry now ;)

btopro’s picture

Hmm... very strange. If you can see what's going wrong, let me know. It might be that you have .html in the paths you are providing, when it's looking for Drupal paths of the node/1234 or /admin/settings kind. I've been pretty deep (down a black hole) trying to get things to zip correctly, but I'll hopefully be able to look into this more next week, or as more eyes get on the code. Please open new issues for any problems you find, as the original problem in this issue is now fixed.

btopro’s picture

A new hook has been added to the API to allow custom inclusion of additional assets. There will not be a UI for this, as it could be a security nightmare.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.