
On 4 June 2022 at 23:20:16 CEST, Olav Reinert <seroton10@gmail.com> wrote:
I did in fact spend some time working on a tool for fixing the encoding issues that exist in the forums database. I got it to do something, but it wasn't very good - certainly not something I would let loose on the forums database.
At some point, I discovered this post:
https://forums.opensuse.org/content.php/14-UTF-8-Encoding-Change
TLDR: The encoding errors were created on purpose. Thus, by definition, there isn't anything to fix - even if some posts are full of mojibake.
Obviously, that's also when I dropped working on a tool to fix it.
Since we know in what way the database is broken, all of the errors are fully fixable, provided we know the moment the database broke. I guess I will attempt to fix it myself then, because leaving it as is doesn't sound like a satisfying solution to me, and the actual fix shouldn't be that hard to implement, all things considered.

What I'm thinking of is doing two database imports with different encodings set: one with the old encoding for data whose modify date is before the breakage, and another with the new encoding for data whose modify date is after it. The only real issue is finding the exact moment the database broke, which will be easier with the post above.

LCP [Sasi]
https://lcp.world/
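
To make the two-import idea a bit more concrete, here is a rough Python sketch of the same logic applied to individual rows rather than to whole database imports. Everything specific in it is an assumption for illustration only: the old encoding (Latin-1), the new encoding (UTF-8), the cutoff date, and the `decode_post` helper are all hypothetical, and the real values would have to come from the forums database and the post linked above.

```python
from datetime import datetime

# Assumptions, not facts from the thread: the actual encodings and the
# actual moment of the breakage still have to be determined.
OLD_ENCODING = "latin-1"          # hypothetical pre-breakage encoding
NEW_ENCODING = "utf-8"            # hypothetical post-breakage encoding
CUTOFF = datetime(2012, 1, 1)     # hypothetical moment the database broke


def decode_post(raw: bytes, modified: datetime) -> str:
    """Decode a post's raw bytes using the encoding that was in effect
    when the post was last modified."""
    encoding = OLD_ENCODING if modified < CUTOFF else NEW_ENCODING
    # errors="replace" keeps the pass from aborting on a stray bad byte.
    return raw.decode(encoding, errors="replace")


# Two sample rows holding the same text, stored under different encodings.
old_row = ("Grüße".encode(OLD_ENCODING), datetime(2011, 6, 1))
new_row = ("Grüße".encode(NEW_ENCODING), datetime(2013, 6, 1))

fixed = [decode_post(raw, modified) for raw, modified in (old_row, new_row)]
```

Decoding every row with a single encoding is what produces the mojibake; picking the encoding per row by modify date avoids it, as long as the cutoff is right.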