Re: User support data analysis: Request to export specific subforums
Hey Malcolm, thanks for your quick reply,

I have added the Heroes to the recipients of this reply.

We are looking for:
- the full contents of
  - all posts (original posts + replies)
  - thread titles (unless they are already included in the original posts)
- of the [2017-1-1, 2020-12-7] (Dec 12 of this month) period
- exported to JSON

Somewhat doable?

Thanks,
Adrien

On Tue, 8 Dec 2020 at 14:57, Malcolm <malcolmlewis@opensuse.org> wrote:
On Tue, 08 Dec 2020 12:27:47 +0100 Adrien Glauser <adrien.glauser@gmail.com> wrote:
Hi there,
Attila (CC'ed in this very message) and I are working on the next version of the documentation (https://github.com/openSUSE/openSUSE-docs-revamped) and we would like to perform data analysis on the openSUSE platforms to better understand difficulties for different categories of users.
The forums are the last oS platform we haven't collected data from yet. Could you help us extract all contents from the subforums
- Install/Boot/Login
- Applications
- Hardware
- Network/Internet
for the [2017-1-1 -> 2020-12-7] (12th of January 2020) period?
That would be very much appreciated. It would also provide a nice "objective" complement to the poll I am organizing here https://lists.opensuse.org/archives/list/project@lists.opensuse.org/thread/B....
Have a nice day!
Adrien

Hi Adrien,

We have no access to the database; that's under the control of the Heroes team.
What sort of data are you after? We can look at the release prefix data, but that normally gets changed after a release to archive off the old prefix. Or are you just after the thread/post count?
-- Cheers Malcolm °¿° SUSE Knowledge Partner (Linux Counter #276890) Tumbleweed 20201205 | GNOME Shell 3.38.2 | 5.9.12-1-default Intel DQ77MK MB | Xeon E3-1245 V2 X8 @ 3.40 GHz | Intel/Nvidia up 1 day 10:43, 2 users, load average: 0.05, 0.07, 0.30
On Tue, 8 Dec 2020 15:08:49 +0100 Adrien Glauser <adrien.glauser@gmail.com> wrote:
Hey Malcolm, thanks for your quick reply,
I have added the Heroes to the recipients of this reply.
We are looking for:
- the full contents of
  - all posts (original posts + replies)
  - thread titles (unless they are already included in the original posts)
- of the [2017-1-1, 2020-12-7] (Dec 12 of this month) period
- exported to JSON
Somewhat doable?
Hi Adrien,

Please keep the forum-admin group in the reply; there are two other admins to keep in the conversation.

My only concern would be having to take the forums offline for such a data dump.

But again, are there some specific metrics you're after? It seems to me a crafted SQL query would also reduce the amount of data.

@Jim & Gertjan, any other thoughts or concerns?

--
Cheers
Malcolm
Sorry Malcolm, I changed email clients recently and it's still quite a battle.
Exporting from the forums allegedly requires taking down the DB
I certainly DO NOT want to have the forums taken down at any point. I am not a MySQL guy, but in my book `mysqldump` tries to dump without blocking. If that's not enough, I am quite confident there are other ways to export in a non-blocking way. More knowledgeable folks reading this are warmly invited to confirm or refute.
What metrics am I after? I can't be more specific than I've been already. We'll be doing data analysis with ELK, so we need no more and no less than what I've described. The goal is to identify patterns in issues and pain points by performing full-text analysis.
Have a pleasant afternoon,
Adrien

On Tuesday, 8 December 2020, 15:41:00 CET, Malcolm wrote:
Hi Adrien,
Please keep the forum-admin group in the reply; there are two other admins to keep in the conversation.
My only concern would be having to take the forums offline for such a data dump.
But again, are there some specific metrics you're after? It seems to me a crafted SQL query would also reduce the amount of data.
@Jim & Gertjan any other thoughts or concerns?
Adrien Glauser wrote:
Exporting from the forums allegedly requires taking down the DB
I certainly DO NOT want to have the forums taken down at any point. I am not a MySQL guy, but in my book `mysqldump` tries to dump without blocking. If that's not enough, I am quite confident there are other ways to export in a non-blocking way. More knowledgeable folks reading this are warmly invited to confirm or refute.
mysqldump will not be able to export unless mysql is running.

--
Per Jessen, Zürich (1.4°C)
Member, openSUSE Heroes
On Tuesday, 8 December 2020, 15:41:00 CET, Malcolm wrote:
Hi Adrien,
Please keep the forum-admin group in the reply; there are two other admins to keep in the conversation.
My only concern would be having to take the forums offline for such a data dump.
Same here.
But again, are there some specific metrics you're after? It seems to me a crafted SQL query would also reduce the amount of data.
I'd do this with a proper query rather than mysqldump.
@Jim & Gertjan any other thoughts or concerns?
Private Messages would also be included. The feature may not be used much, but to me private is private, and these data should be excluded.
-- Gertjan Lettink a.k.a. Knurpht openSUSE Forums Team
Knurpht-openSUSE wrote:
On Tuesday, 8 December 2020, 15:41:00 CET, Malcolm wrote:
My only concern would be having to take the forums offline for such a data dump.
Same here.
That should not be necessary, I think.
But again, are there some specific metrics you're after? It seems to me a crafted SQL query would also reduce the amount of data.
I'd do this with a proper query rather than mysqldump.
The issue is that this requires knowledge of vBulletin internals that I don't have. If anyone can write the query, I would, however, be happy to run it.

Anyway, there is an open ticket, #80858; maybe it is best to continue there?

--
Per Jessen, Zürich (1.5°C)
Member, openSUSE Heroes
On Tue, 08 Dec 2020 17:22:44 +0100, Knurpht-openSUSE wrote:
Private Messages would also be included. The feature may not be used much, but to me private is private, and these data should be excluded.
Good catch, and I would agree with that. The same holds true for any private forums we have, even though some on the Heroes team have access to that data anyway: we wouldn't want to expose that data publicly.

--
Jim Henderson
Please keep on-topic replies on the list so everyone benefits
On Tue, 08 Dec 2020 08:41:00 -0600, Malcolm wrote:
Hi Adrien,
Please keep the forum-admin group in the reply; there are two other admins to keep in the conversation.
My only concern would be having to take the forums offline for such a data dump.
But again, are there some specific metrics you're after? It seems to me a crafted SQL query would also reduce the amount of data.
@Jim & Gertjan any other thoughts or concerns?
I think a database dump like this would be possible without taking the forums offline; it may even be possible to export from the admincp. Exporting to JSON might require some data transformation, but I think you've got it covered by asking which metrics are being sought. It's possible that the data might be available even without doing a database dump.

--
Jim Henderson
Please keep on-topic replies on the list so everyone benefits
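The "proper query" and JSON export discussed in this thread could look roughly like the sketch below. It is only an illustration: the table and column names (`forum`, `thread`, `post`, `dateline`, `pagetext`) are assumptions modelled on a generic forum schema, not confirmed vBulletin internals, and an in-memory SQLite database stands in for the production MySQL server so that the example runs on its own. Because it selects from an allow-list of subforum titles, private forums and private messages are never read.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical, simplified schema; the real vBulletin tables may differ.
SCHEMA = """
CREATE TABLE forum  (forumid INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE thread (threadid INTEGER PRIMARY KEY, forumid INTEGER, title TEXT);
CREATE TABLE post   (postid INTEGER PRIMARY KEY, threadid INTEGER,
                     dateline INTEGER, pagetext TEXT);
"""


def to_epoch(year, month, day):
    """Unix timestamp (UTC midnight) for a calendar date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def export_posts(conn, forum_titles, start, end):
    """Return posts from the allow-listed subforums with start <= dateline < end."""
    placeholders = ",".join("?" for _ in forum_titles)
    query = f"""
        SELECT f.title, t.threadid, t.title, p.postid, p.dateline, p.pagetext
        FROM post p
        JOIN thread t ON t.threadid = p.threadid
        JOIN forum  f ON f.forumid  = t.forumid
        WHERE f.title IN ({placeholders})
          AND p.dateline >= ? AND p.dateline < ?
        ORDER BY t.threadid, p.dateline
    """
    rows = conn.execute(query, [*forum_titles, start, end])
    keys = ("forum", "thread_id", "thread_title", "post_id", "posted_at", "text")
    return [dict(zip(keys, row)) for row in rows]


def to_json_lines(rows):
    """One JSON object per line, a shape that bulk loaders commonly accept."""
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows)


def demo_connection():
    """Build a tiny in-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO forum VALUES (?, ?)",
                     [(1, "Hardware"), (2, "Private staff area")])
    conn.executemany("INSERT INTO thread VALUES (?, ?, ?)",
                     [(10, 1, "No sound after update"), (20, 2, "Internal")])
    conn.executemany("INSERT INTO post VALUES (?, ?, ?, ?)", [
        (100, 10, to_epoch(2018, 5, 1), "My sound card stopped working."),
        (101, 10, to_epoch(2016, 1, 1), "Post older than the requested range."),
        (200, 20, to_epoch(2019, 1, 1), "Must never be exported."),
    ])
    return conn
```

With the sample data, `export_posts(demo_connection(), ["Hardware"], to_epoch(2017, 1, 1), to_epoch(2020, 12, 8))` returns only the single in-range public post, and `to_json_lines` turns the result into one JSON object per line. The end date here is a placeholder with an exclusive upper bound; the exact period would need to be confirmed first.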
participants (5)
- Adrien Glauser
- Jim Henderson
- Knurpht-openSUSE
- Malcolm
- Per Jessen