Hello, On Friday, 2 December 2011, Matthew Ehle wrote:
Hi Matthew, please don't roll back the content of the English wiki, as it doesn't have utf-8 characters in the titles, and there have been quite a few changes this week. We are starting the openSUSE board elections today and use the wiki as a platform for the candidates, for example.
I have verified that it doesn't contain UTF8 in the titles and is safe to continue. It will NOT be rolled back, and I have removed the lock on it.
An interesting thing I noticed is that the English wiki has a different collation on the page table than the other wikis. It is a new wiki, unlike the others, which have been upgraded from much earlier versions. That may be a clue as to the root cause and how we can fix it.
Which collation is used (a) for the English (and new German) wiki and (b) for all other wikis? Which collation was used before doing the update on the now broken wikis?
BTW: The MySQL default charset might also be involved. I seem to remember that old MediaWiki versions just used whatever was the default. At least I have a wiki with similar problems where the column uses the default MySQL charset, but fortunately it only affects Special:Listfiles (https://bugzilla.wikimedia.org/show_bug.cgi?id=32207 if you are interested). In my case, the database contains utf-8, but the column is marked as iso-8859-15.
I had a short look at the ru wiki. I don't understand anything there ;-) but the page titles look like double-encoded utf-8 to me. Write some of them to a text file and try
    recode utf-8..$previous_charset $file
<scary idea>
If I understood you right, the problem only affects the page _titles_. It looks like the page title is stored in the "page" table, and not in too many other tables (I found it in some logging and cache tables, which aren't too relevant IMHO). Can you try to just roll back the page titles in the page table?
Run the following query on the _old_ database to get a list of the correct page titles as UPDATE statements:
    select concat('UPDATE page SET page_title=', quote(page_title), ' WHERE page_id=', page_id, ';') from page;
Check that the result is valid utf-8 (or use recode to fix it), make sure your MySQL connection uses utf-8, and then apply the resulting UPDATE queries to the new database.
WARNING: this is completely untested and wrapped in a "<scary idea>" tag for a reason. It might work, but I can't promise anything...
</scary idea>
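To illustrate the double-encoding guess above, here is a minimal Python sketch, untested against the real wiki databases. It assumes the titles were valid utf-8, got read as a single-byte charset, and were then encoded to utf-8 a second time; undoing that one step should give the original title back. The function name repair_double_encoded and the default charset "latin-1" are assumptions made for this example only; the actual previous charset has to be checked on the server.

```python
def repair_double_encoded(title: str, previous_charset: str = "latin-1") -> str:
    """Undo one round of double encoding, if there is one.

    previous_charset is an assumption: it stands for the single-byte charset
    the utf-8 bytes were wrongly interpreted as before being re-encoded.
    """
    try:
        # Turn the characters back into the bytes they were misread from,
        # then decode those bytes as the utf-8 they originally were.
        return title.encode(previous_charset).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in previous_charset, or the bytes are not valid
        # utf-8: the title was probably not double-encoded, so keep it as-is.
        return title


if __name__ == "__main__":
    good = "Заглавная_страница"
    # Simulate the suspected damage: utf-8 bytes misread as latin-1.
    broken = good.encode("utf-8").decode("latin-1")
    print(repair_double_encoded(broken) == good)
```

Titles that are plain ASCII or already correct utf-8 come back unchanged, so running this over a mixed list should be harmless, but it only handles exactly one extra round of encoding.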
The other wiki that is new, of course, is the German wiki. It has the same collation on the page table as the English wiki, and it is different than the old German wiki and all the other wikis.
Guess which UTF8 wiki isn't broken?
Hmmm... ;-)
I hope a patch comes up for the other wikis as well, so we can fix the page titles without losing the new edits. I think there have also been some changes in the German wiki at least. It looks like the German wiki is actually fine, for the reasons mentioned above. I'll remove the lock on it soon. I'm working hard on saving the others. I think the Russian wiki may be a lost cause, but I haven't lost hope yet.
See above - and I don't see a reason why the ru wiki should be "more lost" than other language wikis ;-) Regards, Christian Boltz --
Believe me, the scrap rate among ATA/cheap SATA drives is enormous; most people just don't notice. ;) PS. We deal in that kind of thing, among others, and the return rate is (very) high. So you're a scrap dealer? ;-) [> Mirko Richter and Thomas Hertweck in suse-linux]