![](https://seccdn.libravatar.org/avatar/04ffcb9ba74bfcbacabec9db6cee1c3e.jpg?s=120&d=mm&r=g)
Hi, I am intending to restart work on salting discourse, which has been blocked for years by two things: * Migration to vb5 * Fixing the database in vb4 I know there has been some progress on vb4 database fix, but vb5 migration seems to be blocked on auth which would likely be easier to fix by just migrating to discourse right away, since it uses a system we have integrated in other apps in the project already. The problem I am facing is that since the vb4 fix isn't public right now, any attempt at database migration testing I could perform on discourse is not gonna be done on the same state of the database as we would get after applying the fix. How should we proceed then? From my personal pov, setting up a duplicate vb4 setup and fixing a database, so that we can properly test it and apply those fixes back into the main instance is the best way to solve this issue. From there we can just work on migrating to discourse, with testing that actually tests the correct state of the database. Alternatively I'm sure we could find a Nextcloud plugin to replace forums with Nextcloud too, I'm sure that would solve all the problems ;) LCP [Sasi] https://lcp.world/
![](https://seccdn.libravatar.org/avatar/1900ae8cc1d502032c1137e1e9dcb30f.jpg?s=120&d=mm&r=g)
Hi As long as the history can be maintained whether that be in a searchable read-only archive or similar, would it not make more sense just to start fresh and move forward? Setup an instance that works with the current infrastructure login system etc? Training for the Admins and moderators in the new setup as well.
![](https://seccdn.libravatar.org/avatar/04ffcb9ba74bfcbacabec9db6cee1c3e.jpg?s=120&d=mm&r=g)
On Sa, Jun 4 2022 at 14:49:27 -0000, Malcolm Lewis <malcolmlewis@linuxmail.org> wrote:
Hi As long as the history can be maintained whether that be in a searchable read-only archive or similar, would it not make more sense just to start fresh and move forward?
I would much prefer to actually migrate the current forums instead of dropping them, I don't like the idea of breaking the current conversations and moving to new software with potentially unanswered questions. We may end up migrating with broken encoding or posts before 2014 missing or something along those lines at worst.
Setup an instance that works with the current infrastructure login system etc?
Training for the Admins and moderators in the new setup as well.
I can set up an instance right away so we can get past the training as quickly as possible tbh, that's probably an important aspect here. I just need to get in touch with somebody from SUSE IT to set up oidc login, that will hopefully be painless this time. LCP [Sasi] https://lcp.world/
![](https://seccdn.libravatar.org/avatar/008a8db3f6a813af5f8064f2be96e100.jpg?s=120&d=mm&r=g)
On Sat, 04 Jun 2022 14:49:27 -0000, Malcolm Lewis wrote:
Hi As long as the history can be maintained whether that be in a searchable read-only archive or similar, would it not make more sense just to start fresh and move forward?
That makes sense to me as well - I don't know that we need to go back through history "forever" (if we decided we do, maybe we can do an export to a mailing list format and just wrap it into the ML archives? The NNTP gateway is also a ML gateway IIRC).
Setup an instance that works with the current infrastructure login system etc?
Training for the Admins and moderators in the new setup as well.
I think there's a Docker image we could use to just get familiar with the system itself; the bigger deal is working out structure and how what we have translates to Discourse, and all the different config options. I looked at it a few years back, but it's been a while. -- Jim Henderson Please keep on-topic replies on the list so everyone benefits
![](https://seccdn.libravatar.org/avatar/184f2936f5d39b27534f4dd7c4d15bfb.jpg?s=120&d=mm&r=g)
Sasi Olin wrote:
How should we proceed then? From my personal pov, setting up a duplicate vb4 setup and fixing a database, so that we can properly test it and apply those fixes back into the main instance is the best way to solve this issue.
Have you spoken to Olav? My impression is he had already fixed the database issue, but maybe only in vb5. IGnoring the authentication issue, we were ready to migrate to vb5. Otherwise I have no real opinion on it. -- Per Jessen, Zürich (28.1°C) Member, openSUSE Heroes
![](https://seccdn.libravatar.org/avatar/d9f1bf0a7e7bcbfbcc9814ae89f062da.jpg?s=120&d=mm&r=g)
On Sat, 2022-06-04 at 16:07 +0200, Sasi Olin wrote:
Hi,
I am intending to restart work on salting discourse, which has been blocked for years by two things: * Migration to vb5 * Fixing the database in vb4
I know there has been some progress on vb4 database fix, but vb5 migration seems to be blocked on auth which would likely be easier to fix by just migrating to discourse right away, since it uses a system we have integrated in other apps in the project already.
I did in fact spend some time working on a tool for fixing the encoding issues that exist in the forums database. I got it to do something, but it wasn't very good - certainly not something I would let loose on the forums database. At some point, I discovered this post: https://forums.opensuse.org/content.php/14-UTF-8-Encoding-Change TLDR: The encoding errors were created on purpose. Thus, by definition, there isn't anything to fix - even if some posts are full of mojibake. Obviously, that's also when I dropped working on a tool to fix it.
The problem I am facing is that since the vb4 fix isn't public right now, any attempt at database migration testing I could perform on discourse is not gonna be done on the same state of the database as we would get after applying the fix.
How should we proceed then? From my personal pov, setting up a duplicate vb4 setup and fixing a database, so that we can properly test it and apply those fixes back into the main instance is the best way to solve this issue. From there we can just work on migrating to discourse, with testing that actually tests the correct state of the database.
Just go ahead and try to import (a dump of) the current database. Regards, Olav
![](https://seccdn.libravatar.org/avatar/04ffcb9ba74bfcbacabec9db6cee1c3e.jpg?s=120&d=mm&r=g)
Am 4. Juni 2022 23:20:16 MESZ schrieb Olav Reinert <seroton10@gmail.com>:
I did in fact spend some time working on a tool for fixing the encoding issues that exist in the forums database. I got it to do something, but it wasn't very good - certainly not something I would let loose on the forums database.
At some point, I discovered this post:
https://forums.opensuse.org/content.php/14-UTF-8-Encoding-Change
TLDR: The encoding errors were created on purpose. Thus, by definition, there isn't anything to fix - even if some posts are full of mojibake.
Obviously, that's also when I dropped working on a tool to fix it.
Knowing in what way the database is broken, all of the errors are fully fixable, if we know the moment the database broke. I will actually attempt to fix it myself then I guess, because that doesn't sound like a satisfying solution to me, and the actual solution to the problem shouldn't be that hard to implement all things considered. What I'm thinking of doing would be to do two database imports, with different encodings set, one for data with the modify date before the breakage with the old encoding and then the other for data with the modify date after the breakage with the new encoding. The only real issue is finding the exact moment the database broke, which will be easier having the post above. LCP [Sasi] https://lcp.world/
![](https://seccdn.libravatar.org/avatar/04ffcb9ba74bfcbacabec9db6cee1c3e.jpg?s=120&d=mm&r=g)
On So, Jun 5 2022 at 01:21:37 +0200, Sasi Olin <hellcp@opensuse.org> wrote:
What I'm thinking of doing would be to do two database imports, with different encodings set, one for data with the modify date before the breakage with the old encoding and then the other for data with the modify date after the breakage with the new encoding. The only real issue is finding the exact moment the database broke, which will be easier having the post above.
Update on this: picked the date based on that post, and it seems to work to an extent: all the posts after that date seem to be broken (because the db still uses latin1 encoding while vb keeps pushing utf8 content into it) Using ``` mysqldump --host=192.168.47.4 --port=3307 --user=vbulletin \ --password=$password --where "dateline > '1304923020'" webforums vb_post ``` I was able to see the breakage quite easily, and I am able to force mysql to export this as if it was a utf-8 database, so that we get both both parts of the database to be the correct encoding inside the dump Could we maybe create a copy of vb server (and its database), so that we could test what changes we actually need to make for this to work? And on that note, does anyone have a good idea how to find broken characters inside of the dump, so that we can find the earliest date with breakage (in other tables too, I was just using vb_post as an example, because it's the most obvious of the broken tables)? It shouldn't be too difficult to do in an automated way I would assume, but my process for this feels more manual than it probably should be. LCP [Sasi] https://lcp.world/
![](https://seccdn.libravatar.org/avatar/04ffcb9ba74bfcbacabec9db6cee1c3e.jpg?s=120&d=mm&r=g)
On Mo, Jun 6 2022 at 22:24:29 +0200, Sasi Olin <hellcp@opensuse.org> wrote:
Update on this: picked the date based on that post, and it seems to work to an extent: all the posts after that date seem to be broken (because the db still uses latin1 encoding while vb keeps pushing utf8 content into it)
A quick note on that, it's not entirely true, Russian part of the forums was using utf8 since it started, so we will have to exclude all of the posts that belong to that part of the forums and export them together with the rest of utf8 posts after the date. This far I found 1304945809 to be the most recent broken post, but I will have to check it again after I come up with a query that excludes russian forum posts. LCP [Sasi] https://lcp.world/
![](https://seccdn.libravatar.org/avatar/04ffcb9ba74bfcbacabec9db6cee1c3e.jpg?s=120&d=mm&r=g)
On Mi, Jun 15 2022 at 22:19:50 +0200, Sasi Olin <hellcp@opensuse.org> wrote:
A quick note on that, it's not entirely true, Russian part of the forums was using utf8 since it started, so we will have to exclude all of the posts that belong to that part of the forums and export them together with the rest of utf8 posts after the date. This far I found 1304945809 to be the most recent broken post, but I will have to check it again after I come up with a query that excludes russian forum posts.
After some time, I came up with the following queries: ``` # Posts excluding the russian forum (909) select * from vb_post where dateline < '1304944363' and threadid in (select threadid from vb_thread where forumid not in (with recursive cte as (select forumid from vb_forum where forumid='909' union all select c.forumid from vb_forum c join cte ON cte.forumid=c.parentid) select forumid from cte)); # Threads excluding the russian forum (909) select * from vb_thread where dateline < '1304944363' and threadid in (select threadid from vb_thread where forumid not in (with recursive cte as (select forumid from vb_forum where forumid='909' union all select c.forumid from vb_forum c join cte ON cte.forumid=c.parentid) select forumid from cte)); # All of pms select * from vb_pmtext; ``` I am not touching the rest of the forums, since we will not need it for discourse migration anyway. Should we schedule maintenance for the forums and fix them using those? What I would essentially do would be: 1. Dump the entire database into utf8 charset (essentially intentionally breaking it with mysqldump --set-charset=utf8 > ...) 2. Dump the bits of the database with latin1 encoding using the queries above (with mysqldump --set-charset=latin1 --single-transaction=TRUE \ --where="$query_after_where" $database $table > ...) 3. Import the big dump into a new database 4. Override the broken database bits with the small dumps It seems like the least potential for breakage this way, since all of those seem properly indexed but I am open to suggestions. There are some older and newer posts with broken encoding, because a lot of people were using utf8 with nntp before the switch and some people kept using signatures with broken encoding past the utf8 switch. We can keep around the old database and fix the posts one by one, but people did reference broken posts in the forums themselves as well, so it would be breaking the actual flow of the conversations at that point. LCP [Sasi] https://lcp.world/
![](https://seccdn.libravatar.org/avatar/008a8db3f6a813af5f8064f2be96e100.jpg?s=120&d=mm&r=g)
On Fri, 17 Jun 2022 04:30:52 +0200, Sasi Olin wrote:
I am not touching the rest of the forums, since we will not need it for discourse migration anyway. Should we schedule maintenance for the forums and fix them using those? What I would essentially do would be: 1. Dump the entire database into utf8 charset (essentially intentionally breaking it with mysqldump --set-charset=utf8 > ...) 2. Dump the bits of the database with latin1 encoding using the queries above (with mysqldump --set-charset=latin1 --single-transaction=TRUE \ --where="$query_after_where" $database $table > ...) 3. Import the big dump into a new database 4. Override the broken database bits with the small dumps
It seems like the least potential for breakage this way, since all of those seem properly indexed but I am open to suggestions.
There are some older and newer posts with broken encoding, because a lot of people were using utf8 with nntp before the switch and some people kept using signatures with broken encoding past the utf8 switch. We can keep around the old database and fix the posts one by one, but people did reference broken posts in the forums themselves as well, so it would be breaking the actual flow of the conversations at that point.
Coming back to this discussion - it looks like the test environment has been offline for a while (I've noticed a 502 Gateway error for > 1 month now). We're having a discussion inside the forums admin/mod team about how we might move this forward without worrying about migrating the existing content - in the interests of making some progress. But I think it would also inform our discussion to know if anything more has happened with the database migration - and if there's a way for us to get back into the sandbox system to see what structure was able to be built out (I know Discourse only allows categories and sub-categories, but no further levels of sub-categories can be created, which means 2 levels rather than 3, as we have now). Jim -- Jim Henderson Please keep on-topic replies on the list so everyone benefits
![](https://seccdn.libravatar.org/avatar/04ffcb9ba74bfcbacabec9db6cee1c3e.jpg?s=120&d=mm&r=g)
On Mi, Sep 7 2022 at 17:21:49 -0000, Jim Henderson <hendersj@gmail.com> wrote:
Coming back to this discussion - it looks like the test environment has been offline for a while (I've noticed a 502 Gateway error for > 1 month now).
We're having a discussion inside the forums admin/mod team about how we might move this forward without worrying about migrating the existing content - in the interests of making some progress.
Apologies for being quiet here, but I am planning on moving right now, so I couldn't move this any further, I will be able to get back into it in October. This was additionally delayed because I did not get an answer about postgres user and database for a while, so I couldn't even setup an old db for our discourse setup to use. That's why it has been stuck on 502 for this long. When it comes to the migration, I was thinking it may be a good idea to notify the people whose data won't be migrated (blog and articles creators) that they should ask us for backup of their posts. This can be done earlier, assuming nobody will post new blog posts or articles.
But I think it would also inform our discussion to know if anything more has happened with the database migration - and if there's a way for us to get back into the sandbox system to see what structure was able to be built out (I know Discourse only allows categories and sub-categories, but no further levels of sub-categories can be created, which means 2 levels rather than 3, as we have now).
As it works right now, it removes the third layer entirely, so we are left with language categories and topic subcategories. LCP [Jake] https://lcp.world/
![](https://seccdn.libravatar.org/avatar/008a8db3f6a813af5f8064f2be96e100.jpg?s=120&d=mm&r=g)
On Wed, 07 Sep 2022 19:30:29 +0200, Jacob Michalskie wrote:
On Mi, Sep 7 2022 at 17:21:49 -0000, Jim Henderson <hendersj@gmail.com> wrote:
Coming back to this discussion - it looks like the test environment has been offline for a while (I've noticed a 502 Gateway error for > 1 month now).
We're having a discussion inside the forums admin/mod team about how we might move this forward without worrying about migrating the existing content - in the interests of making some progress.
Apologies for being quiet here, but I am planning on moving right now, so I couldn't move this any further, I will be able to get back into it in October.
No worries, Jake - I know how moving is. :)
This was additionally delayed because I did not get an answer about postgres user and database for a while, so I couldn't even setup an old db for our discourse setup to use. That's why it has been stuck on 502 for this long.
That's good to know. I set up my own instance to look at, so the forums team will be able to access it and work on the structure.
When it comes to the migration, I was thinking it may be a good idea to notify the people whose data won't be migrated (blog and articles creators) that they should ask us for backup of their posts. This can be done earlier, assuming nobody will post new blog posts or articles.
That sounds good. I don't know that we're using that functionality a lot anyways.
But I think it would also inform our discussion to know if anything more has happened with the database migration - and if there's a way for us to get back into the sandbox system to see what structure was able to be built out (I know Discourse only allows categories and sub-categories, but no further levels of sub-categories can be created, which means 2 levels rather than 3, as we have now).
As it works right now, it removes the third layer entirely, so we are left with language categories and topic subcategories.
That's good to know. I think that probably makes sense, and was the direction that I was thinking in playing around with my isolated configuration. Good luck with your move! -- Jim Henderson Please keep on-topic replies on the list so everyone benefits
participants (6)
-
Jacob Michalskie
-
Jim Henderson
-
Malcolm Lewis
-
Olav Reinert
-
Per Jessen
-
Sasi Olin