-
Notifications
You must be signed in to change notification settings - Fork 335
Add Safe Ecto Migration guides #720
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add Safe Ecto Migration guides #720
Conversation
| If you're using Phoenix and PhoenixEcto, you will likely appreciate disabling | ||
| the migration lock in the CheckRepoStatus plug during dev to avoid hitting and | ||
| waiting on the advisory lock with concurrent web processes. You can do this by | ||
| adding `migration_lock: false` to the CheckRepoStatus plug in your | ||
| `MyAppWeb.Endpoint`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| If you're using Phoenix and PhoenixEcto, you will likely appreciate disabling | |
| the migration lock in the CheckRepoStatus plug during dev to avoid hitting and | |
| waiting on the advisory lock with concurrent web processes. You can do this by | |
| adding `migration_lock: false` to the CheckRepoStatus plug in your | |
| `MyAppWeb.Endpoint`. | |
| If you're using Phoenix, you will likely need to disable | |
| the migration lock in the CheckRepoStatus plug during dev to avoid hitting and | |
| waiting on the advisory lock with concurrent web processes. You can do this by | |
| adding `migration_lock: false` to the CheckRepoStatus plug in your | |
| `MyAppWeb.Endpoint`. |
|
|
||
| Note: we cannot use `Ecto.Migration.modify/3` as it will include updating the column type as | ||
| well unnecessarily, causing Postgres to rewrite the table. For more information, | ||
| [see this example](https://github.com/fly-apps/safe-ecto-migrations/issues/10). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we link to an external repo? Perhaps we remove the link?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the issue (hah) leaving the link, it's mostly a great issue explaining and reproducing the problem for those that want more info.
Removing the link is fine though
|
|
||
| The issue is that we cannot use `Ecto.Migration.modify/3` as it will include updating the column type as | ||
| well unnecessarily, causing Postgres to rewrite the table. For more information, | ||
| [see this example](https://github.com/fly-apps/safe-ecto-migrations/issues/10). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Link here too!
|
|
||
| ### Good | ||
|
|
||
| **Strategy 1** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should call these **Option 1** and so on for consistency with the rest of the guide?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively rename Option 1 to Strategy. :)
|
|
||
| ## Adding a value to a PostgreSQL enum | ||
|
|
||
| Adding enum values inside a transaction fails in PostgreSQL < 12. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PostgreSQL 12 is already EOLed, so we can remove it.
|
|
||
| Here's how we'll manage the backfill: | ||
|
|
||
| 1. Create a "temporary" table. In this example, we're creating a real table that we'll drop at the end of the data migration. In Postgres, there are [actual temporary tables](https://www.postgresql.org/docs/12/sql-createtable.html) that are discarded after the session is over; we're not using those because we need resiliency in case the data migration encounters an error. The error would cause the session to be over, and therefore the temporary table tracking progress would be lost. Real tables don't have this problem. Likewise, we don't want to store IDs in application memory during the migration for the same reason. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of a temporary table, couldn't we use a temporary column? Or is the issue that removing the column later would be expensive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It assumes the backfill is about one column, and if so that could be another option, but it doesn't have to be: the user might be doing multiple fixes and write to multiple columns.
The point is that we need to store a list of records that a generic operation needs to run on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Jose means you add a column to say "fix applied: yes or no". Then when you iterate through your table doing the updates you use that column to prevent yourself from applying the fix more than once in case of restarts. And whatever filter you used to populate your temporary table can be used to stop the iterating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I understand now. Yeah that could be a good approach but yes also have to consider the same gotchas with adding columns with defaults on large existing tables.
I find a separate temporary table safer and less coupled. Less likely to fool with application logic
| @@ -0,0 +1,365 @@ | |||
| # Backfilling Data | |||
|
|
|||
| When I say "backfilling data", I mean that as any attempt to change data in bulk. This can happen in code through migrations, application code, UIs that allow multiple selections and updates, or in a console connected to a running application. Since bulk changes affect a lot of data, it's always a good idea to have the code reviewed before it runs. You also want to check that it runs efficiently and does not overwhelm the database. Ideally, it's nice when the code is written to be safe to re-run. For these reasons, please don't change data in bulk through a console! | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the only place we use first person in the guides, so we should make it consistent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I kept looking at this phrase and didn't like it. I can rephrase
|
Thank you @dbernheisel! This looks excellent and I do have some quick feedback:
|
|
Absolutely. I'll go through feedback today |
|
The gotchas are important however for each flavor. The MyXQL adapter uses advisory locks so there are less issues with transactions. The Postgres adapter does not default to advisory locks (maybe it should?) so that it can avoid some gotchas MSSQL I have less experience so more verification is needed Sqlite3 is different enough where I don't suspect these same gotchas apply, but again more verification is needed. I think the recipes could be formatted better to have callouts per adapter? It does cover MySQL and Postgres, but is silent on the others. |
Yeah, I was not asking to remove those, those are really great! It was mostly the debugging locks bits and the reference material, so I pushed my changes already. We could have a separate document on debugging PG locks but as I said, easier to do in a future PR. :) From my point of view, nothing else needs to removed. There are still some PG specific commands still but it is very easy for someone to consult those for other databases. |
No description provided.