Problem/Motivation

Part of the Deploy task updates file paths in site content, e.g. sites/oldsite.com/files is updated to sites/newsite.com/files. This path update step also looks for standalone sites that are being moved into a multisite configuration, meaning paths of the form sites/default/ are replaced with sites/newsite.com.

With the current implementation of this update, links to external sites that happen to follow the same path convention are rewritten as well, and consequently broken. For example,

Proposed resolution

The issue seems to stem from the database updates after line 15 in platform/drupal/deploy.inc (referencing current head on 6.x-2.x). The challenge here is to differentiate between links to the site being imported and links to other sites. Relative links can be safely updated, but links that include a domain name need to be handled separately. Checking the domain against the old site URL could work, but may not be sufficient.
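To make the distinction concrete, here is a minimal sketch in Python (not the PHP used in deploy.inc) of classifying a single link before rewriting it. The function name `rewrite_url` and its parameters are hypothetical, and `example.org` stands in for any third-party site. It only rewrites the path; changing the host of the site's own absolute URLs would remain a separate step. As noted above, comparing against the old site URL may not catch every case.

```python
from urllib.parse import urlsplit


def rewrite_url(url, old_host, old_dir, new_dir):
    """Rewrite sites/<old_dir>/ to sites/<new_dir>/ only for relative links
    and for absolute links that point at the old site itself.

    Hypothetical helper for illustration; not part of the deploy code.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()

    # A link with a host that is not the old site is third party: leave it alone.
    if host and host != old_host.lower():
        return url

    old_prefix = "sites/" + old_dir + "/"
    new_prefix = "sites/" + new_dir + "/"

    # Accept both "sites/default/..." and "/sites/default/...".
    leading = "/" if parts.path.startswith("/") else ""
    rest = parts.path[len(leading):]
    if not rest.startswith(old_prefix):
        return url

    new_path = leading + new_prefix + rest[len(old_prefix):]
    return parts._replace(path=new_path).geturl()


if __name__ == "__main__":
    for link in (
        "/sites/default/files/a.png",
        "http://oldsite.com/sites/default/files/a.png",
        "http://example.org/sites/default/files/a.png",
    ):
        print(link, "->", rewrite_url(link, "oldsite.com", "default", "newsite.com"))
```

A per-link check like this requires extracting links from the content first, which is more work than a single mass replace in SQL.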

User interface changes

None.

API changes

None.

Comments

_vid

Would it be possible to test whether 'sites/' . $old_uri . '/' exists in files (filepath), users (picture), and boxes (body)?
If it does, then we don't mass replace sites/default anywhere, especially in the node revisions (body, teaser).
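A rough sketch of that check, in Python rather than the PHP of the deploy code, assuming the column values (filepath, picture, body) have already been read into a list. Both function names are hypothetical.

```python
def uses_own_sites_dir(values, old_uri):
    """Return True if any stored value already references sites/<old_uri>/."""
    needle = "sites/" + old_uri + "/"
    # Treat NULL columns (None) as empty strings.
    return any(needle in (value or "") for value in values)


def prefixes_to_replace(values, old_uri):
    """Choose which path prefixes the mass replace should touch.

    Assumption: if the site already stores paths under sites/<old_uri>/,
    it was not living in sites/default, so sites/default is left alone.
    """
    prefixes = ["sites/" + old_uri]
    if not uses_own_sites_dir(values, old_uri):
        prefixes.append("sites/default")
    return prefixes


if __name__ == "__main__":
    stored = ["sites/oldsite.com/files/logo.png", None]
    print(prefixes_to_replace(stored, "oldsite.com"))
```

This only decides whether to run the sites/default replacement at all; it does not protect third-party links on a site that really was installed in sites/default.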

helmo

Version: 6.x-2.1 » 7.x-3.x-dev
Status: Active » Needs work

I guess this still happens.

Could we add some regex magic to $replace_patterns in platform/drupal/deploy_7.inc to make sure we only match relative URLs?
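One possible shape for such a pattern, sketched in Python rather than as an actual $replace_patterns entry. It assumes a relative path always starts right after a quote, an opening parenthesis, or an equals sign, optionally with one leading slash; the names `RELATIVE_SITES_DEFAULT` and `replace_relative` are made up for the example.

```python
import re

# Assumption: a relative path begins right after ", ', ( or =,
# optionally followed by whitespace and a single leading slash.
# Anything with a host in front (http://host/sites/default/) does not match.
RELATIVE_SITES_DEFAULT = re.compile(r"""(?P<lead>["'(=]\s*/?)sites/default/""")


def replace_relative(text, new_dir):
    """Rewrite sites/default/ to sites/<new_dir>/ for relative paths only."""
    return RELATIVE_SITES_DEFAULT.sub(
        # A function replacement avoids backslash escapes in new_dir being interpreted.
        lambda match: match.group("lead") + "sites/" + new_dir + "/",
        text,
    )


if __name__ == "__main__":
    body = (
        '<img src="/sites/default/files/a.png"> '
        '<a href="http://example.org/sites/default/files/b.pdf">b</a>'
    )
    print(replace_relative(body, "newsite.com"))
```

The trade-off is that absolute URLs pointing at the site's own old domain are skipped too, so they would still need separate handling.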

omega8cc

Hmm, we already handle absolute URLs via $replace_abs_patterns there, so URLs hardcoded via WYSIWYG editors are converted properly as well.

We can't blacklist absolute URLs in these replacements, because we want to handle them for the site's own (legacy) URLs.

I don't think we can avoid converting sites/default to sites/sitename in third-party absolute URLs without side effects, because how can we check whether the site name in an absolute URL is our own and not a third party's? Checking and comparing existing URLs in the content instead of running a mass replace could be very expensive and would cause false positives anyway, because there is just no way to determine whether the site name (old or new) is third party or not.

That said, now I'm not sure why we have $replace_abs_patterns in addition to $replace_patterns. If $replace_patterns really affects absolute URLs, then why do we also need $replace_abs_patterns?

I hope it was not me who added this and now no longer recognizes his own child! :)

[EDIT] It was me!