[opensuse-buildservice] Proposal to use data_migrate gem for API
Hi all,
I would like to propose that we use the data_migrate https://github.com/ilyakatz/data-migrate gem to handle changes to db data. Here is a PR https://github.com/openSUSE/open-build-service/pull/3701 I have opened where I need to update each notification's event_payload from YAML to JSON serialisation. This doesn't make sense to do in a normal migration because it has nothing to do with the database structure, only with the content of the database. This gem allows us to handle such changes in the same way we handle normal migrations.
The alternative is to use rake tasks for data changes, which is what I initially did in the PR in this commit https://github.com/openSUSE/open-build-service/pull/3701/commits/f2c9ef6a65c....
However, the downside of that is that we always need to update the READMEs to let updaters know exactly which rake tasks need to be run to update their db. The gem also makes it easier to sync data changes with other developers: all that's needed to get your database up to date is to run `rake db:migrate:with_data`.
If you have any opinions please post on the PR to continue this discussion.
Thanks
-- Evan Rolfe Full Stack Web Developer SUSE Linux GmbH, Maxfeldstr. 5, D-90409 Nürnberg Tel: +49-911-74053-0; Fax: +49-911-7417755; https://www.suse.com/ SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
Hi, On 09/04/2017 12:13 PM, Evan Rolfe wrote:
Proposal to use data_migrate gem for API
I'm fine with it and like the idea! Go ahead :) Christian
On Tuesday, 5 September 2017, 16:18:37 CEST, Christian Bruckmayer wrote:
Hi,
On 09/04/2017 12:13 PM, Evan Rolfe wrote:
Proposal to use data_migrate gem for API
Hi all, I would like to propose that we use the data_migrate https://github.com/ilyakatz/data-migrate gem to handle changes to db data. Here is a PR https://github.com/openSUSE/open-build-service/pull/3701 I have opened where I need to update each notification's event_payload from YAML to JSON serialisation. This doesn't make sense to do in a normal migration because it has nothing to do with database structure only the content of the database. This gem allows us to handle such changes in the same way we handle normal migrations.
Well, we used to do that with the current migrations as well? E.g. check the new issue tracker entries or new attributes.
The alternative is to use rake tasks for data changes which is what I initially did in the PR in this commit https://github.com/openSUSE/open-build-service/pull/3701/commits/f2c9ef6a65c....
However, the downside of that is that we always need to update the READMEs to let updaters know exactly which rake tasks need to be run to update their db. The gem also makes it easier to sync data changes with other developers: all that's needed to get your database up to date is to run `rake db:migrate:with_data`.
I don't get this exactly. Does this mean that there is no single command to update all data to the current state? Do you always have to know lots of special commands, documented in the README files, when updating from 2.8 to 2.9? Does this also mean that it breaks our update tests in CI and auto deployment? I am strongly against this in that case... Or am I missing something here?
bye adrian
-- Adrian Schroeter email: adrian@suse.de SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) Maxfeldstraße 5 90409 Nürnberg Germany -- To unsubscribe, e-mail: opensuse-buildservice+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse-buildservice+owner@opensuse.org
Adrian, thanks for raising these concerns. Let me just explain a use case of data migrations:
We have an event_payload column in the notifications table which is serialised in YAML. We need to change this to be serialised in JSON to fix issue #3638 https://github.com/openSUSE/open-build-service/issues/3638 .
To do this we need a script to convert the serialisation of existing notifications from YAML to JSON, but we only want to run this once. We could create a rake task to do it, but then we need to add that rake task to the update guide, so upgrading becomes more complicated, and all other developers will need to run this rake task to get their database up to date too.
The other option is to use a normal database migration to handle the conversion from YAML to JSON. However, then we would have a database migration which does not change the database structure at all. It also means that we have to bundle the database structure changes (which require downtime) with the data changes (which might not require downtime). A data change which takes a long time (e.g. more than an hour) would then be difficult to deploy if there were also database migrations that needed running. (This is precisely the case when changing the serialisation of project_log_entries: the script to do that will take at least a couple of hours because there are ~3 million rows in the project_log_entries table.)
See this Stack Overflow question: Rails migration: only for schema change or also for updating data? https://stackoverflow.com/questions/19387440/rails-migration-only-for-schema...
On 05/09/17 16:42, Adrian Schröter wrote:
I don't get this exactly, does this mean that there is no single command to update all data to current state?
Please keep in mind the distinction between database and data. In this context, by "database" I mean the database structure, and "data" means the content of the rows in tables. So the command to get the database up to date is (as it always has been) `rake db:migrate`. This gem now gives us a new command to get the data up to date (which we didn't have before), which is `rake data:migrate`.
You always have to know lot's of special commands when updating from 2.8 to 2.9 documented in the README files?
No, that's one of the main reasons to use this gem: it gives you `rake data:migrate` so that you don't have to know lots of special commands when updating. It even gives you this rake task, which performs both the database and the data migrate commands:
`rake db:migrate:with_data`
Does this also mean that it breaks our update tests in CI and auto deployment?
It won't break any CI stuff because this is only for existing OBS instances with populated databases. If you're creating a new OBS instance from scratch then this rake task is not necessary. I'm not sure what "auto deployment" is, but the deployment process in the wiki will need to be changed to make sure that we run `rake db:migrate:with_data` instead of just running `rake db:migrate`.
-- Evan Rolfe
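As a rough illustration of the conversion discussed in this message, the following sketch re-serialises YAML payloads as JSON in batches, using only Ruby's standard library. The `Row` struct, `convert_payload`, `migrate_in_batches` and the batch size are hypothetical stand-ins and not code from the PR; it also assumes the payloads contain only plain data (hashes, arrays, strings, numbers). A real data migration would work on the notifications table through the application's models.

```ruby
require "yaml"
require "json"

# Hypothetical stand-in for a database row with a YAML-serialised column.
Row = Struct.new(:id, :event_payload)

# Re-serialise one payload: parse the YAML text and emit the same data as JSON.
# Assumption: payloads hold plain data only, which is all safe_load accepts.
def convert_payload(yaml_text)
  JSON.generate(YAML.safe_load(yaml_text))
end

# Convert rows in fixed-size batches so that a large table is handled in
# small chunks instead of all at once. Returns the number of rows converted.
def migrate_in_batches(rows, batch_size: 1000)
  converted = 0
  rows.each_slice(batch_size) do |batch|
    batch.each do |row|
      row.event_payload = convert_payload(row.event_payload)
      converted += 1
    end
  end
  converted
end

# Sample rows with made-up payloads, serialised as YAML.
rows = [
  Row.new(1, { "project" => "Apache", "package" => "apache2" }.to_yaml),
  Row.new(2, { "project" => "Base", "number" => 42 }.to_yaml)
]

total = migrate_in_batches(rows, batch_size: 1)
puts "converted #{total} rows"
puts rows.first.event_payload
```

Because each row is converted independently, a script of this shape could in principle run while the application stays online, which is the property that separates it from a structural migration in the discussion above.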
On Wednesday, 6 September 2017, 10:02:25 CEST, Evan Rolfe wrote:
Adrian, thanks for raising these concerns. Let me just explain a use case of data migrations:
We have an event_payload column which is serialised in YAML in the notifications table, we need to change this to be serialised in JSON to fix issue #3638 https://github.com/openSUSE/open-build-service/issues/3638 .
To do this we need a script to convert the serialisation of existing notifications from YAML to JSON, but we only want to run this once. We could create a rake task to do it, but then there is the problem that we need to add that rake task to the update guide so upgrading becomes more complicated and also all other developers will need to run this rake task to get their database up to date too.
The other option is to use a normal database migration to handle the conversion from YAML to JSON. However, then we would have a database migration which does not change the database structure at all. It also means that we have to bundle the database structure changes (which require downtime) with the data changes (which might not require downtime). A data change which takes a long time (e.g. more than an hour) would then be difficult to deploy if there were also database migrations that needed running. (This is precisely the case when changing the serialisation of project_log_entries: the script to do that will take at least a couple of hours because there are ~3 million rows in the project_log_entries table.)
Okay, but I don't see this mixture as a big problem, because
* Updaters from OBS 2.8.x to 2.9.x need a downtime anyway.
* Updaters like us, who run on git master, should follow the migrations and understand their nature. We can still run this migration in parallel and without downtime. We just need to ensure that there isn't another migration afterwards, which requires a downtime, right?
Btw, we used to do even structural changes without downtime, if we knew that the old code wouldn't cause problems (e.g. when just adding a new column, which does no harm).
On the other hand, I would like to keep the steps for updating as few as possible for the user. So I am still very much in favor of doing this with our standard migrations, if you don't see a problem with my points above.
good morning :)
adrian
On 08/09/17 08:24, Adrian Schröter wrote:
We just need to ensure that there isn't another migration afterwards, which requires a downtime, right?
This is the part that concerns me: how do we ensure that? If you have a data migration which takes 3 hours but is lumped in with the other database migrations, how are we going to ensure that the 3-hour data migration is run without downtime but the other migrations are run with downtime?
I would also be open to using a rake task for the data migration, if you would prefer that over using a third-party gem to handle this.
-- Evan Rolfe
On Monday, 11 September 2017, 11:53:33 CEST, Evan Rolfe wrote:
On 08/09/17 08:24, Adrian Schröter wrote:
We just need to ensure that there isn't another migration afterwards, which requires a downtime, right?
This is the part that concerns me, how do we ensure that? If you have a data-migration which takes 3 hours but is lumped in with the other database migrations how are we going to ensure that the 3 hour data-migration is run without downtime but the other migrations are run with downtime?
I would also be open to using a rake task for the data migration if you would prefer over using a third party gem to handle this?
I don't mind the rubygem, just the additional step that is needed. Could we also always run the data modifications when "db:migrate" is called? That way you can opt in to doing data changes only, but we don't need to teach people yet another command to run on the next OBS version update.
-- Adrian Schroeter
On 11/09/17 12:43, Adrian Schröter wrote:
Could we run the data modifications also always when "db:migrate" is called?
Yes, I'm sure that's possible, but it would defeat the purpose of having data migrations separate from database migrations.
That way you can opt-in to do data changes only, but we don't need to teach people yet another command to run on next OBS version update.
I don't think we can avoid adding another command to the OBS version update process. If we want to update a table which has ~3 million rows (project_log_entries) then the script to do that will take at least an hour. So as I see it we have these three options (maybe there are alternatives?):
1. Include the script in a normal Rails migration => Downside: the updaters will have the extra step of making sure that this particular migration is run without downtime (I don't know how that would even work).
2. We use the data_migrate gem => Downside: the updaters have to run a second command, `rake data:migrate`.
3. We use a rake task => Downside: the updaters have to run a rake task, and any future data changes will also require new rake tasks, so there are potentially many more steps involved in updating as opposed to just one step: `rake data:migrate`.
I don't see how #1 is going to work, but if you have an idea then I would be open to that too.
-- Evan Rolfe
Hey, On 11.09.2017 13:57, Evan Rolfe wrote:
On 11/09/17 12:43, Adrian Schröter wrote:
Could we run the data modifications also always when "db:migrate" is called?
Yes, I'm sure that's possible, but it would defeat the purpose of having data migrations separate from database migrations.
The point is defaults. So far the default was that every kind of migration was lumped together. So we have two options:
1. Change the default and make our users aware of `rake db:migrate:with_data`.
2. Don't change the default and make `db:migrate` do what `db:migrate:with_data` does. Make another rake task that does the same as the current `db:migrate` for people who want to untangle migration kinds.
I'm sure you can already guess what I would really like to avoid ;-)
Henne
-- Henne Vogelsang http://www.opensuse.org Everybody has a plan, until they get hit. - Mike Tyson
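One possible way to approach the second option is Rake's `enhance`, which appends an extra action to an already defined task. The sketch below is only an illustration under assumptions: the two tasks are stubs that record their names, standing in for the real `db:migrate` (defined by Rails) and `data:migrate` (defined by the data_migrate gem), and `run_log` is a hypothetical helper. Note that this runs all data migrations after all schema migrations, which may not be the same ordering the gem's combined `db:migrate:with_data` task uses.

```ruby
require "rake"

# Records the order in which the stub tasks run (illustration only).
run_log = []

# Stub tasks standing in for the real ones; in an actual application
# these would already be defined by Rails and by the data_migrate gem.
Rake::Task.define_task("db:migrate")   { run_log << "db:migrate" }
Rake::Task.define_task("data:migrate") { run_log << "data:migrate" }

# Append an extra action to db:migrate so that it also triggers data:migrate.
Rake::Task["db:migrate"].enhance do
  Rake::Task["data:migrate"].invoke
end

# Invoking the schema task now runs the data task right after it.
Rake::Task["db:migrate"].invoke
puts run_log.inspect
```

With a default like this, people who want schema changes only would need a separate task that keeps the old behaviour, as the second option describes.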
On 09/11/2017 11:53 AM, Evan Rolfe wrote:
On 08/09/17 08:24, Adrian Schröter wrote:
We just need to ensure that there isn't another migration afterwards, which requires a downtime, right?
This is the part that concerns me, how do we ensure that? If you have a data-migration which takes 3 hours but is lumped in with the other database migrations how are we going to ensure that the 3 hour data-migration is run without downtime but the other migrations are run with downtime?
Well, since we have to run the migrations manually, we could just run them independently, assuming the migrations themselves don't depend on each other. This might not be an option for other people who host OBS, though they are probably not affected by this problem as much as we are, since we host a public service.
Björn
-- Björn Geuken - Rails Developer - Open Build Service SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg) -- To unsubscribe, e-mail: opensuse-buildservice+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse-buildservice+owner@opensuse.org
On 09/04/2017 12:13 PM, Evan Rolfe wrote:
Proposal to use data_migrate gem for API
Sounds good to me. Let's try it out! Björn
Just for the record, we've agreed in a meeting that we will keep database structure migrations separate from database content migrations, the latter of which will be handled by the data_migrate gem. This means that we will need to run `rake db:migrate:with_data` instead of just `rake db:migrate`, so we will need to update the appliance to run that command too.
On 04/09/17 11:13, Evan Rolfe wrote:
Proposal to use data_migrate gem for API
-- Evan Rolfe
participants (5)
- Adrian Schröter
- Björn Geuken
- Christian Bruckmayer
- Evan Rolfe
- Henne Vogelsang