Mailinglist Archive: opensuse-web (101 mails)

Re: [opensuse-web] Re: [opensuse-wiki] Wiki Upgrade Problems
Which collation is used for a) the English (and new German) wiki and b)
all other wikis? Which collation was used before doing the update on the
now broken wikis?

They use a binary collation, as opposed to UTF-8 for the rest of the wikis.


BTW: The MySQL default charset might also be involved - I seem to
remember that old mediawiki versions just used whatever was the default.
At least I have a wiki with similar problems where the column uses the
default MySQL charset - but fortunately it only affects
Special:Listfiles (https://bugzilla.wikimedia.org/show_bug.cgi?id=32207
if you are interested). In my case, the database contains utf-8, but the
column is marked as iso-8859-15.

I had a short look at the ru wiki - I don't understand anything there
;-) but the page titles look like double-encoded utf-8 to me. Write some
of them to a text file and try recode utf-8..$previous_charset $file

I thought so too at first. That's not quite the case. I have actually made a
lot of progress on the Russian wiki in stage. From what I have found, it
appears that the update just dumped Latin1 encoded text into a UTF-8 table
without properly encoding the text itself.
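
To make the distinction above concrete, here is a minimal Python sketch of what "double-encoded utf-8" means and how one round of it could be reversed. The function name and the sample title are made up for illustration, and Python's "latin-1" codec is only a stand-in for whatever $previous_charset really was; it is not what the upgrade or recode actually does.

```python
def undo_double_encoding(text):
    """Reverse one round of UTF-8 bytes that were misread as Latin-1.

    Returns the input unchanged when it does not look double-encoded.
    """
    try:
        # Map each character back to the byte it came from, then read
        # those bytes as the UTF-8 they originally were.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Hypothetical page title, mangled the way a double encoding would do it.
original = "Заглавная"
mangled = original.encode("utf-8").decode("latin-1")
repaired = undo_double_encoding(mangled)
```

If the titles were really double-encoded, the repaired string matches the original; text that is already correct passes through untouched, which makes it a cheap first test on a handful of titles.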

<scary idea>
If I understood you right, the problem only affects the page _titles_.
It looks like the page title is stored in the "page" table - and not in
too many other tables (I found it in some logging and cache tables,
which aren't too relevant IMHO).
Also category tables, and several others.

Can you try to just roll back the page titles in the page table?
If we did it immediately after upgrading, it probably would have worked.
However, that table will no longer be consistent with all of the other tables.

Run the following query on the _old_ database to get a list of the
correct page titles as UPDATE statements:

select concat('UPDATE page SET page_title=', quote(page_title),
' WHERE page_id=', page_id, ';') from page;

Check that the result is valid utf-8 (or use recode to fix it), make
sure your MySQL connection uses utf-8 and then apply the resulting
UPDATE queries to the new database.
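
The "check that the result is valid utf-8" step could be sketched in Python roughly as follows. The function name and the sample statements are assumptions for illustration; the input would be the raw bytes of whatever file the query output was saved to.

```python
def invalid_utf8_lines(data):
    """Return the 1-based numbers of lines that are not valid UTF-8."""
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad


# Two made-up statements: the first is valid UTF-8, the second carries
# a lone Latin-1 byte (0xE9) that is not valid UTF-8.
sample = (
    "UPDATE page SET page_title='Заглавная' WHERE page_id=1;".encode("utf-8")
    + b"\n"
    + b"UPDATE page SET page_title='\xe9' WHERE page_id=2;"
)
bad_lines = invalid_utf8_lines(sample)
```

For a real dump, the bytes would come from something like open(path, "rb").read(), and any line numbers reported would need a pass through recode before the statements are applied.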

WARNING: this is completely untested and wrapped in a "<scary idea>" tag
for a reason. It might work, but I can't promise anything...
</scary idea>
Unfortunately, we have too many keys and indexes for me to think that will work
very well.

Also for the other wikis, I hope a patch comes up so we can fix
the page titles without losing the new edits. I think there have also
been some changes, at least in the German wiki.
It looks like the German wiki is actually fine, for the reasons
mentioned above. I'll remove the lock on it soon. I'm working hard
on saving the others. I think the Russian wiki may be a lost cause,
but I haven't lost hope yet.

See above - and I don't see a reason why the ru wiki should be "more
lost" than other language wikis ;-)

There are a LOT of inconsistencies in the Russian wiki, apparently not just
with the UTF-8 page titles. There are a lot of duplicate keys and indexes. I
managed to get the stage database properly encoded, but I lost about two dozen
pages in the process due to duplicate key errors.
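
One way to spot such collisions before converting is sketched below in Python. It assumes that the duplicate key errors come from a unique key on namespace plus title, and that the repair is a simple reversal of a UTF-8-as-Latin-1 misreading; neither has been confirmed for the Russian wiki, and all names and rows here are made up.

```python
from collections import defaultdict


def fix_title(title):
    """Assumed repair step: undo one UTF-8-read-as-Latin-1 round."""
    try:
        return title.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return title


def find_title_collisions(rows):
    """Group page ids whose (namespace, repaired title) would collide.

    rows is an iterable of (page_id, namespace, title) tuples.
    """
    groups = defaultdict(list)
    for page_id, namespace, title in rows:
        groups[(namespace, fix_title(title))].append(page_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical rows: page 1 is mangled, page 2 was created after the
# upgrade with a correct title, page 3 lives in another namespace.
good = "Заглавная"
rows = [
    (1, 0, good.encode("utf-8").decode("latin-1")),
    (2, 0, good),
    (3, 1, good),
]
collisions = find_title_collisions(rows)
```

Pages reported this way could then be merged or renamed by hand before the conversion, instead of being dropped when the key check fails.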




Matthew Ehle
Web Engineer
IS&T


Mobile Phone: (801) 358-1655
mehle@xxxxxxxxxx