July 20, 2016 | Tyler Barto
Data Migration Service Overview
AWS Database Migration Service (DMS) migrates a source database to a target database and supports both homogeneous (like-to-like) and heterogeneous (like-to-unlike) migrations.
To replicate data, DMS uses a dedicated replication instance in AWS that reads from the source and writes to the target. Nothing is written to the source database to facilitate the transfer, no agent needs to be installed, and the database remains as it was before the migration.
For all of this to work, a task must be defined in DMS that specifies:
- the endpoints,
- replication instance, and
- type of migration.
Before using DMS for a production database, the schema should be migrated and ready on the target. DMS itself can perform only very basic schema transfer and conversion. For homogeneous migrations, the native tools for the particular database should be used, whereas for heterogeneous migrations a schema conversion tool such as AWS SCT should be run prior to the DMS migration.
To create a migration, the following objects must be defined at the beginning:
- Source Endpoint – existing database from which tables are to be migrated
- Destination Endpoint – target database to which tables are to be migrated
- Replication Instance – special type of instance used exclusively for retrieving data from the source database, performing minor transformations if desired, and writing data to the target database
- Tasks – Define which replication instances and endpoints to use, as well as the migration type
Creating a Replication Instance
When creating a replication instance, there are only a few items that need to be defined, as shown below, such as name, description, instance class, VPC, and whether or not it will be publicly accessible.
Currently, T2 and C4 types can be used for the instance class. When choosing an instance class, especially for a production migration, consider that there is real benefit to selecting a class with more compute resources: the bigger the instance, the faster the migration will complete.
Connectivity is the key consideration when selecting a VPC. Instances in the VPC selected here must be able to connect to both endpoints. If one of the databases is on-premises, there should be a Direct Connect link, a VPN, or some other means of connecting to it. If both endpoints are in AWS but in separate VPCs, VPC peering or a VPN is required.
The VPC also needs a security group with egress rules allowing traffic to both endpoints. The replication instance is given the default security group of the VPC that contains it. If the replication instance has a public IP, this can be changed by looking up its EIP in the EC2 console and altering the security group reference there.
Some additional configuration can be made when creating a replication instance, as shown below:
It is possible to increase the amount of storage attached to the replication instance. While it may seem rational to match the storage used by the database being migrated, the replication instance only uses local storage for cached data. Unless the source sees a great number of transactions, the connection to the target is slow, or something else calls for a large cache, the default size is generally fine.
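As a rough sketch, the settings discussed in this section line up with the parameters of the DMS `CreateReplicationInstance` API. The snippet below only builds the parameter dictionary in Python. The identifier and security group ID are made-up placeholders, and the call to boto3's `dms` client is left as a comment because it needs AWS credentials.

```python
def replication_instance_params(identifier, instance_class, security_group_ids,
                                publicly_accessible=False, storage_gb=None):
    """Build keyword arguments for the DMS CreateReplicationInstance call."""
    params = {
        "ReplicationInstanceIdentifier": identifier,
        "ReplicationInstanceClass": instance_class,
        "VpcSecurityGroupIds": list(security_group_ids),
        "PubliclyAccessible": publicly_accessible,
    }
    # Only override storage when a larger cache is really needed;
    # otherwise DMS applies its default size.
    if storage_gb is not None:
        params["AllocatedStorage"] = storage_gb
    return params


# Placeholder values, not taken from a real account.
params = replication_instance_params(
    "example-migration-instance", "dms.c4.large", ["sg-0123456789abcdef0"]
)
print(params)
# With credentials configured, this could be passed on as:
#   boto3.client("dms").create_replication_instance(**params)
```

Leaving `storage_gb` unset mirrors the advice above: the default cache size is kept unless there is a specific reason to raise it.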
At least two endpoints are needed, a source and destination, with at least one of them existing in AWS. The configuration elements for an endpoint are shown here:
First, we need to define whether the endpoint will be a source or destination and also define the identifier that will be used for reference by the migration task. Next, the database engine and specific connection information must be supplied. Extra connection attributes can also be provided.
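The endpoint fields described here can be sketched as the parameter set for the DMS `CreateEndpoint` API. This is a minimal illustration, not a complete client: the host name, database name, and credentials are hypothetical placeholders, and the dictionary is only printed, not sent to AWS. Note that the API calls the destination side `target`.

```python
def endpoint_params(identifier, role, engine, server, port, database,
                    username, password, extra_attributes=""):
    """Build keyword arguments for the DMS CreateEndpoint call."""
    # This post says "destination"; the API value is "target".
    roles = {"source": "source", "destination": "target", "target": "target"}
    if role not in roles:
        raise ValueError(f"unknown endpoint role: {role!r}")
    params = {
        "EndpointIdentifier": identifier,
        "EndpointType": roles[role],
        "EngineName": engine,
        "ServerName": server,
        "Port": port,
        "DatabaseName": database,
        "Username": username,
        "Password": password,
    }
    # Extra connection attributes are optional, so only send them when given.
    if extra_attributes:
        params["ExtraConnectionAttributes"] = extra_attributes
    return params


# Hypothetical on-premises MySQL source; never hard-code real credentials.
source = endpoint_params(
    "example-source", "source", "mysql", "db.example.internal", 3306,
    "appdb", "dms_user", "REPLACE_ME",
)
print({k: v for k, v in source.items() if k != "Password"})
# With credentials configured, this could be passed on as:
#   boto3.client("dms").create_endpoint(**source)
```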
With all endpoint details in order, it is good practice to test endpoint connectivity, and the creation screen has a handy place to do just that. Simply select the VPC and replication instance from which to test and hit the “Run Test” button, which reports a pass or fail result. For the test to succeed, any network firewalls, NACLs, security groups, etc. must allow the replication instance to communicate with the endpoints on the DB listener ports.
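When the console test fails, a quick TCP check run from a host in the same subnet can help tell a network problem apart from a credentials problem. The sketch below is not part of DMS and does not replace “Run Test”. It only checks whether the listener port accepts a connection, and the function name is an assumption made for this example.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `can_reach("db.example.internal", 1521)` (a hypothetical Oracle listener) returning `False` points at firewalls, NACLs, or security groups, not at the endpoint's user name or password.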
Creating a Task
With a replication instance, source endpoint, and destination endpoint each defined, a task can be created. The task configuration page offers three migration types:
- Migrate existing data – full load
- Migrate existing database and replicate ongoing changes
- Replicate data changes only
“Migrate existing data” discards any transactions that occur during the migration, while “Migrate existing database and replicate ongoing changes” caches the mid-migration transactions and applies them after the initial dump has completed. “Replicate data changes only” assumes a database dump has already been loaded on the target and uses log files on the source to make the target current.
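For anyone scripting task creation, the three console options correspond to the `MigrationType` values accepted by the DMS API. The mapping below uses the labels as they appear in this post, and the helper function is a hypothetical convenience for this example.

```python
# Console labels (as used in this post) mapped to the API's MigrationType values.
MIGRATION_TYPES = {
    "Migrate existing data": "full-load",
    "Migrate existing database and replicate ongoing changes": "full-load-and-cdc",
    "Replicate data changes only": "cdc",
}


def replicates_ongoing_changes(migration_type):
    """True if the migration type applies changes made after the task starts."""
    if migration_type not in MIGRATION_TYPES.values():
        raise ValueError(f"unknown migration type: {migration_type!r}")
    # Both change-capturing types end in "cdc" (change data capture).
    return migration_type.endswith("cdc")


for label, value in MIGRATION_TYPES.items():
    print(f"{label}: {value} (ongoing changes: {replicates_ongoing_changes(value)})")
```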
Other task settings are shown below:
The three options provided for target table preparation are “do nothing”, “drop tables on target”, and “truncate”. Because it is recommended that a good schema already exist on the target, “do nothing” is the ideal choice.
Three choices are available to handle LOB (Large Object) columns:
- Don’t include LOB columns – skips them entirely
- Full LOB mode – indiscriminately includes all LOB columns
- Limited LOB mode – Allows the specification of a maximum LOB size and will include everything smaller than the size given.
The Enable Logging option is not on by default because it incurs charges for CloudWatch usage, but enabling it is highly recommended.
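The table preparation, LOB, and logging choices can also be expressed in the task settings JSON that DMS accepts. The sketch below assembles such a document with the setting names used in the AWS task settings documentation. The function and its `lob_mode` labels are hypothetical, and real task settings contain many more keys than shown here.

```python
import json

PREP_MODES = {"DO_NOTHING", "DROP_AND_CREATE", "TRUNCATE_BEFORE_LOAD"}
LOB_MODES = {"none", "full", "limited"}  # labels invented for this sketch


def task_settings(prep_mode="DO_NOTHING", lob_mode="limited",
                  lob_max_size_kb=32, enable_logging=True):
    """Build a partial DMS task settings document as a dict."""
    if prep_mode not in PREP_MODES:
        raise ValueError(f"unknown prep mode: {prep_mode!r}")
    if lob_mode not in LOB_MODES:
        raise ValueError(f"unknown LOB mode: {lob_mode!r}")
    target_metadata = {
        "SupportLobs": lob_mode != "none",
        "FullLobMode": lob_mode == "full",
        "LimitedSizeLobMode": lob_mode == "limited",
    }
    # The size cap (in KB) only applies in limited LOB mode.
    if lob_mode == "limited":
        target_metadata["LobMaxSize"] = lob_max_size_kb
    return {
        "TargetMetadata": target_metadata,
        "FullLoadSettings": {"TargetTablePrepMode": prep_mode},
        "Logging": {"EnableLogging": enable_logging},
    }


settings_json = json.dumps(task_settings(), indent=2)
print(settings_json)
```

The defaults follow the recommendations above: an existing schema on the target (“do nothing”), limited LOB mode, and logging turned on.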
Custom table mappings can also be defined in JSON format.
By default, a task starts as soon as it is created. If that option is disabled, the task can be started manually from the console. Once started, its progress can be viewed from the console.
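As an illustration of a table mapping, the sketch below generates selection rules in the rules-based JSON layout described in the current AWS documentation. That layout may differ from what the console accepted when this post was written. The schema and table names are hypothetical, and `selection_rule` is a helper invented for this example.

```python
import json


def selection_rule(rule_id, schema, table="%", action="include"):
    """Build one DMS selection rule; '%' is a wildcard matching any name."""
    if action not in ("include", "exclude"):
        raise ValueError(f"unknown rule action: {action!r}")
    return {
        "rule-type": "selection",
        "rule-id": str(rule_id),
        "rule-name": str(rule_id),
        "object-locator": {"schema-name": schema, "table-name": table},
        "rule-action": action,
    }


# Include every table in the hypothetical "sales" schema except audit tables.
table_mappings = {
    "rules": [
        selection_rule(1, "sales"),
        selection_rule(2, "sales", "audit_%", "exclude"),
    ]
}
print(json.dumps(table_mappings, indent=2))
```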
Unless otherwise specified, eight tables will load concurrently. If working with a larger replication instance and the source database would not suffer from the increase in reads, more concurrency can be specified. Conversely, if the source database is on the other side of a slow network connection or heavy load prevents the replication instance from reading many tables concurrently, that number can also be decreased.
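In the task settings JSON, this concurrency is controlled by `MaxFullLoadSubTasks` under `FullLoadSettings`. Here is a minimal sketch of raising or lowering it. The helper name is an assumption, and the fragment would be merged into the rest of the task settings, not used on its own.

```python
import json

DEFAULT_MAX_TABLES = 8  # the default number of tables loaded in parallel


def full_load_concurrency(max_tables=DEFAULT_MAX_TABLES):
    """Build the task settings fragment that sets full-load concurrency."""
    if max_tables < 1:
        raise ValueError("at least one table must load at a time")
    return {"FullLoadSettings": {"MaxFullLoadSubTasks": max_tables}}


# A larger replication instance and a source that tolerates more reads:
print(json.dumps(full_load_concurrency(16)))
# A source behind a slow link or under heavy load:
print(json.dumps(full_load_concurrency(2)))
```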
If the migration is stopped before completion and then restarted, the task picks up where it left off, but any tables that were actively being migrated when it stopped will start over from the beginning.
Hints & Tips
Finally, here are some useful hints and tips for using DMS:
- Check unsupported data types in AWS documentation
- Create only tables and primary keys – no foreign keys
- Reduce contention on the target during migration (disable logging, backups, Multi-AZ) for faster transfer
- The replication instance is CPU-intensive, so size it accordingly
- Enable supplemental logging on source to enable ongoing migration
- MS SQL Server .bak files can’t be used by DMS yet
Hope you found this post useful. Please share your feedback at firstname.lastname@example.org.