I'm trying to figure out how to fix the broken names, and some of the contributors are working on a fix for the skin. I would suggest not trying to rebuild the page, as it will probably get overwritten when we try to fix it.
Just a thought on this... Since this recent MediaWiki update was an upgrade from an old MediaWiki instance running on an existing/old MySQL database... is it possible that some of the tables were set to Latin1 but contained UTF-8 encoded data?
Yes, you are on the right path with this.
I've seen this happen, and you can tie yourself in interesting knots if you try to export a MySQL database from Latin1 to UTF-8: you end up with double-encoded characters. The result is that the article name is stored in the database with broken, double-encoded characters. When you request the article using the correct UTF-8 characters, the MW engine thinks you are requesting a new page. If you request the page using the broken double-encoded characters (assuming you can figure out what they are), you get the "lost" article.
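The double-encoding mechanism can be demonstrated in a few lines of Python. This is an illustrative sketch, not MediaWiki or MySQL code; the article title is a made-up example:

```python
# Sketch of the double-encoding failure: UTF-8 bytes sitting in a column
# that MySQL believes is Latin1 get "converted" to UTF-8 a second time.
title = "Küche"                      # an article title with a non-ASCII char
utf8_bytes = title.encode("utf-8")   # the bytes actually stored in the DB
# MySQL treats those bytes as Latin1 and re-encodes them to UTF-8 on export:
double = utf8_bytes.decode("latin-1").encode("utf-8")
print(double.decode("utf-8"))        # prints 'KÃ¼che' -- the broken name
# The damage is reversible by running the conversion backwards:
fixed = double.decode("utf-8").encode("latin-1").decode("utf-8")
print(fixed)                         # prints 'Küche'
```

This is also why the broken page is still reachable: requesting it by the mojibake name ("KÃ¼che" here) matches the bytes actually stored in the database.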
One way to fix this - *IF* this is what happened - is to:

1. Export the tables as Latin1 (even though they contain UTF-8 data) instead of UTF-8, so that MySQL doesn't try to "convert" the data on the way out.
2. Change the charset declarations in the database dump from Latin1 to UTF-8 (use an editor to do a find/replace), so the tables are recreated as UTF-8 instead of Latin1. You may need to tweak the CREATE TABLE statements in the dump.
3. Import the modified dump back into the database.
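Step 2 is just a text substitution on the dump file. Here is a minimal sketch of it in Python, applied to a made-up dump fragment; in practice the dump would come from something like `mysqldump --default-character-set=latin1 --skip-set-charset`, and the table definition below is a placeholder, not the real MediaWiki schema:

```python
# Made-up mysqldump fragment; real dumps will have many such statements.
dump = (
    "CREATE TABLE `page` (\n"
    "  `page_title` varchar(255) NOT NULL\n"
    ") ENGINE=InnoDB DEFAULT CHARSET=latin1;\n"
)
# Recreate the tables as UTF-8 instead of Latin1 (the find/replace step):
converted = dump.replace("CHARSET=latin1", "CHARSET=utf8")
print(converted)
```

The same substitution with sed or an editor works equally well; the point is only that the table *declarations* change while the stored bytes are left untouched.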
This is exactly the path I was following last night with the stage environment. It *almost* worked. Unfortunately, we are having a few other issues that I have to look at. But yes, you are correct in that this is 95% of the problem.
See also: http://dev.mysql.com/doc/refman/5.0/en/alter-table.html - in particular, this bit:

-------------
Warning

The CONVERT TO operation converts column values between the character sets. This is not what you want if you have a column in one character set (like latin1) but the stored values actually use some other, incompatible character set (like utf8). In this case, you have to do the following for each such column:

ALTER TABLE t1 CHANGE c1 c1 BLOB;
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;

The reason this works is that there is no conversion when you convert to or from BLOB columns.
-------------

This also may work... and might be a more appropriate/elegant solution if you know which columns are affected. I haven't tried this yet.

This actually may be a much better way to go. Thanks for the tip!
If you've got an offline mirror that you can test the DB on, it's not a difficult test to do... if it fixes things, then you know what happened :-) and if not... well, we're back at square one again.
I'm trying this all out on our staging wikis. I imported the production data down to rustage.opensuse.org, and that is where I am doing most of my work. I managed to get the titles to at least appear correctly, but we are having some other issues as well. I haven't given up hope though. -Matt