>I've seen this happen, and you can tie yourself up in interesting
>knots if you try to export a MySQL database from Latin1 to UTF-8...
>you end up with double encoded characters. The result is that data is
>stored in the database with the broken double encoded characters in
>the article name. When you request the article using the UTF-8
>encoded characters, the MW engine thinks you are requesting a new
>page. If you request the page using the broken double encoded
>characters (assuming you can figure out what the double encoded chars
>are), you get the "lost" article.
>
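For anyone following along, here is a rough sketch of how a title ends up double encoded and how the damage can be undone in principle. It is only an illustration: the function names and the sample title are made up, and Python's "latin-1" codec stands in for MySQL's latin1, which differs for a few byte values.

```python
# Illustration only: UTF-8 bytes misread as Latin1 and re-encoded as UTF-8.
# double_encode/repair are hypothetical helper names, not MediaWiki or MySQL APIs.

def double_encode(text):
    """Return what the title looks like after a wrong Latin1 -> UTF-8 conversion."""
    # The column really holds UTF-8 bytes, but they are interpreted as Latin1.
    return text.encode("utf-8").decode("latin-1")


def repair(text):
    """Reverse the mistake: map the characters back to bytes, then read as UTF-8."""
    return text.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    title = "Caf\u00e9"              # hypothetical article title, "Café"
    broken = double_encode(title)    # this is the name the wiki now knows
    print(repr(broken))
    print(repr(broken.encode("utf-8")))  # bytes stored after the bad conversion
    print(repr(repair(broken)))
```

Requesting the page under the original title misses because the stored name is the broken one; requesting the broken form finds the "lost" article.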
>One way to fix this - *IF* this is what happened - is to export the
>tables as Latin1 (they actually contain UTF-8 data) instead of UTF-8,
>so that MySQL doesn't try to "convert" them. Then change the charset
>in the database dump to UTF-8 (use an editor to find/replace Latin1
>with UTF-8), recreate the tables as UTF-8 instead of Latin1 (you may
>need to tweak the create table statements in the database dump), and
>import your database data.
This is exactly the path I was following last night with the stage environment. It *almost* worked. Unfortunately, we are having a few other issues that I have to look at. But yes, you are correct in that this is 95% of the problem.
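The find/replace step on the dump could be sketched roughly like this. It assumes the dump carries declarations of the form `DEFAULT CHARSET=latin1` and `SET NAMES latin1`; the exact output depends on the mysqldump version, so other forms may need handling. The function names and the sample table are hypothetical. It skips INSERT lines so that article text which happens to mention latin1 is left alone.

```python
import re

# Matches charset declarations only, not arbitrary occurrences of "latin1".
_DECL = re.compile(r"(CHARSET=|CHARACTER SET |SET NAMES )latin1\b", re.IGNORECASE)


def retag_dump_line(line):
    """Relabel latin1 declarations as utf8 on one line of a SQL dump."""
    # Data rows are left untouched; only the labels should change.
    if line.lstrip().upper().startswith("INSERT "):
        return line
    return _DECL.sub(lambda m: m.group(1) + "utf8", line)


def retag_dump(text):
    """Apply retag_dump_line to every line, keeping line endings as they are."""
    return "".join(retag_dump_line(l) for l in text.splitlines(keepends=True))


if __name__ == "__main__":
    sample = (
        "/*!40101 SET NAMES latin1 */;\n"
        "CREATE TABLE `page` (\n"
        "  `page_title` varchar(255) NOT NULL\n"
        ") ENGINE=InnoDB DEFAULT CHARSET=latin1;\n"
        "INSERT INTO `page` VALUES ('about latin1');\n"
    )
    print(retag_dump(sample))
```

When reading and writing a real dump file this way, opening it with encoding="latin-1" keeps every byte intact, since that codec maps all 256 byte values one to one.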
>See also:
>http://dev.mysql.com/doc/refman/5.0/en/alter-table.html in
>particular, this bit:
>-------------
>Warning
>The CONVERT TO operation converts column values between the character
>sets. This is not what you want if you have a column in one character
>set (like latin1) but the stored values actually use some other,
>incompatible character set (like utf8). In this case, you have to do
>the following for each such column:
>
>ALTER TABLE t1 CHANGE c1 c1 BLOB;
>ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;
>The reason this works is that there is no conversion when you convert
>to or from BLOB columns.
>------------------
>This may also work, and might be a more appropriate/elegant solution
>if you know which columns are affected.
I haven't tried this yet. This actually may be a much better way to go. Thanks for the tip!
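If there turn out to be many affected columns, the two ALTER statements from the manual could be generated rather than typed by hand. A minimal sketch follows; `blob_roundtrip_sql` and its `final_type` parameter are my own invention, and the caller has to supply the right final column type, since going through BLOB does not remember the original one.

```python
import re

# Only plain identifiers are accepted, so the generated SQL cannot be
# altered by odd table or column names.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def blob_roundtrip_sql(table, column, final_type="TEXT"):
    """Return the two ALTER TABLE statements for one mislabeled column."""
    for name in (table, column):
        if not _IDENT.match(name):
            raise ValueError("unexpected identifier: %r" % (name,))
    return [
        # Step 1: to BLOB, which drops the latin1 label without converting bytes.
        "ALTER TABLE %s CHANGE %s %s BLOB;" % (table, column, column),
        # Step 2: back to a text type, now labeled as utf8.
        "ALTER TABLE %s CHANGE %s %s %s CHARACTER SET utf8;"
        % (table, column, column, final_type),
    ]


if __name__ == "__main__":
    for stmt in blob_roundtrip_sql("t1", "c1"):
        print(stmt)
```

The output should be reviewed before it is run against anything, ideally on a staging copy first.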
>If you've got an offline mirror that you can test the dB on, it's not
>a difficult test to do... if it fixes things, then you know what
>happened :-) and if not... well, we're back at square one again.
I'm trying this all out on our staging wikis. I imported the production data down to rustage.opensuse.org, and that is where I am doing most of my work. I managed to get the titles to at least appear correctly, but we are having some other issues as well. I haven't given up hope though.
-Matt