I've got an odd problem. I'm updating from an old 4.5.8 installation to 5.0 and many posts that included "curly quotes" and other non-standard characters are coming out with garbage in the new displays. I know I solved this same problem in 4.5.8 but for the life of me, I can't remember what I did! This involves almost 3000 posts, some of them in excess of 64k words, so it is a large problem.

Any help out there?

Thanks,
Erin

Comments

Steven

Most likely you were already storing UTF-8 data before, at a time when Drupal was not expecting it. We only started telling the database server that our data was UTF-8 from 4.7 onwards.

There are two possibilities:

  • The data was 'converted' in the update to 5.0 from Latin1 to UTF-8. However, because it was already UTF-8, it got mangled. In this case, you see various bad characters. The best option here is to export your database as is, convert it from UTF-8 back to Latin1, and re-import it the way it comes out. That is: you alter the data, but you leave the CHARACTER SET statements as is.
  • The data was converted as above, but inserted into Latin1 tables. This means that some characters will have been converted to plain ASCII question marks and that the data is permanently damaged. In this case, you have to start with your original 4.5.8 database and convert it to valid UTF-8, both in data and in table definitions, before re-importing it and upgrading.

In both cases, the goal is to get valid UTF-8 data in UTF-8 encoded database tables and columns. The 'conversion' back to Latin1 is just a trick that should result in real UTF-8.
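
To see why converting "back" to Latin1 yields real UTF-8, here is a minimal Python sketch of the first case. It assumes that the database's Latin1 behaves like Windows-1252 (Python's cp1252 codec), and the sample string is made up for illustration. Python's cp1252 leaves a few bytes undefined (for example 0x9D, which occurs in the UTF-8 form of the closing curly quote), so real data may need more care than this.

```python
# Sketch only: assumes the server's "Latin1" is equivalent to Python's cp1252.
original = "\u201cCaf\u00e9\u2019s"        # “Café’s, as typed into the old site

# The old site stored these UTF-8 bytes without telling the database.
stored_bytes = original.encode("utf-8")

# The upgrade believes the bytes are Latin1 and converts them to UTF-8,
# so every byte of a multi-byte character becomes a character of its own.
mangled = stored_bytes.decode("cp1252")
print(mangled)

# The trick: convert the mangled text back to Latin1 bytes. Those bytes
# are the original UTF-8 again, so they only need to be read as UTF-8.
repaired = mangled.encode("cp1252").decode("utf-8")
print(repaired == original)
```

In a database dump, the same idea means re-encoding the file's contents while leaving the CHARACTER SET statements untouched.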

To get started, you need to be sure exactly what is in the database. Get a straight database dump and open it in an editor that understands encodings. Each pass of converting from Latin1 to UTF-8 turns every 1-byte non-ASCII character into 2-3 bytes, which then show up as 2-3 jumbled characters. Count the number of jumbled characters that you see in place of e.g. a quote, and you can tell whether the conversion was applied once or more than once.
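
The counting can also be done mechanically. The sketch below is a heuristic, not a definitive tool: the function name `count_extra_conversions` is hypothetical, and it again assumes the server's Latin1 matches Python's cp1252 codec. It keeps undoing one conversion pass until the text no longer decodes as UTF-8, and reports how many passes it undid.

```python
def count_extra_conversions(text, max_rounds=5):
    """Return (rounds, repaired_text) for a string taken from the dump.

    Hypothetical helper: each round undoes one Latin1 -> UTF-8 pass,
    assuming Latin1 here means cp1252.
    """
    rounds = 0
    for _ in range(max_rounds):
        try:
            candidate = text.encode("cp1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # no longer looks like mis-converted UTF-8
        if candidate == text:
            break  # pure ASCII, nothing left to undo
        text = candidate
        rounds += 1
    return rounds, text


clean = "\u201cquote\u2019"                      # “quote’
once = clean.encode("utf-8").decode("cp1252")    # mangled by one pass
twice = once.encode("utf-8").decode("cp1252")    # mangled by two passes

for sample in (clean, once, twice):
    print(len(sample), count_extra_conversions(sample)[0])
```

Run it on a few known posts from the dump first. If it reports zero rounds for text that still looks wrong, the data was probably damaged in some other way, such as the question mark case described above.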

--
If you have a problem, please search before posting a question.

halfelven

That's a good clue. Though I could wish that such a converter were considered a necessary item to provide, or at least suggest the need for, in upgrades.

- Erin