AWS Database Migration Service

July 20, 2016

Database Migration Service Overview

DMS is used to migrate a source database to a target database. It supports both homogeneous and heterogeneous migrations, i.e. like-to-like and like-to-unlike migrations.

To replicate data from the source to the target, DMS uses a special replication instance in AWS that reads from the source and writes to the target. Nothing is written to the source database to facilitate the transfer. No agent needs to be installed, and the source database remains as it was before the migration.

For all of this to work, a task must be defined in DMS that specifies:

  • the endpoints,
  • replication instance, and
  • type of migration.

In order to use DMS for a production database, the schema should be migrated and ready to go before starting the migration. DMS can perform only very basic schema transfer and conversion. For homogeneous migrations, the proprietary tools for the particular database should be used, whereas heterogeneous migrations call for a schema conversion tool, for instance AWS SCT, run prior to the DMS migration.

DMS Components

To create a migration, the following objects must be defined at the beginning:

  • Source Endpoint – existing database from which tables are to be migrated
  • Destination Endpoint – target database to which tables are to be migrated
  • Replication Instance – special type of instance used exclusively for retrieving data from the source database, performing nominal transformation if desired, and putting data on the target database
  • Tasks – Define which replication instances and endpoints to use, as well as the migration type

Creating a Replication Instance

When creating a replication instance, only a few items need to be defined, as shown below: name, description, instance class, VPC, and whether or not it will be publicly accessible.

DMS post_pic 1

Currently, T2 and C4 types can be used for the instance class. When choosing an instance class, though, especially for a production migration, consider that there is real benefit to selecting a class with more compute resources: the bigger the instance, the faster the migration will complete.

DMS post_pic 2

Connectivity is the key consideration when selecting a VPC. Instances in the VPC selected here must be able to connect to the endpoints. If there is an on-premises database, there should be a Direct Connect link, a VPN, or some other means of connecting to the equipment. If both endpoints are in AWS but in separate VPCs, then VPC peering or a VPN is required.

The VPC also needs a security group with egress rules allowing traffic to both endpoints. By default, the replication instance is given the default security group of the VPC that contains it. If the replication instance is supplied with a public IP, that can be changed by looking up the EIP in the EC2 console and altering the security group reference there.
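
As a rough illustration of the egress requirement, the sketch below builds one outbound rule per endpoint, in the shape the EC2 `authorize_security_group_egress` call takes for its `IpPermissions` argument. The security group ID, IP addresses, and ports are made-up placeholders, and nothing here contacts AWS.

```python
def egress_rule(endpoint_ip, port):
    """Build one outbound TCP rule allowing traffic to a single database endpoint."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": f"{endpoint_ip}/32"}],
    }


# Hypothetical endpoints: an on-premises Oracle source and a MySQL target in AWS.
ENDPOINTS = [("192.0.2.10", 1521), ("10.0.1.25", 3306)]

egress_request = {
    "GroupId": "sg-0123456789abcdef0",  # placeholder security group ID
    "IpPermissions": [egress_rule(ip, port) for ip, port in ENDPOINTS],
}

# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("ec2").authorize_security_group_egress(**egress_request)
```

Restricting each rule to a single address and listener port keeps the replication instance's outbound access no wider than the migration needs.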

Some additional configuration can be made when creating a replication instance, as shown below:

DMS post_pic 3

It is possible to increase the amount of storage attached to the replication instance. While it may seem rational to match the storage with that used by the database being migrated, the replication instance only uses local storage for cached data. So unless there are a great number of transactions on the replication source, a slow connection to the target, or some other cause resulting in the need for a large cache, the default size is generally fine.
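
The same choices can be made outside the console. The following is a minimal sketch, assuming the boto3 DMS client's `create_replication_instance` parameters; the identifier and security group ID are placeholders. Storage is left out unless explicitly requested, so the service default applies, in line with the advice above.

```python
def replication_instance_request(identifier, instance_class="dms.t2.medium",
                                 subnet_group=None, security_group_ids=(),
                                 storage_gb=None, publicly_accessible=False):
    """Assemble keyword arguments for creating a DMS replication instance."""
    request = {
        "ReplicationInstanceIdentifier": identifier,
        "ReplicationInstanceClass": instance_class,
        "PubliclyAccessible": publicly_accessible,
    }
    if subnet_group:
        # The subnet group determines which VPC the instance lands in.
        request["ReplicationSubnetGroupIdentifier"] = subnet_group
    if security_group_ids:
        request["VpcSecurityGroupIds"] = list(security_group_ids)
    if storage_gb is not None:
        # Only override the default size when a large cache is expected.
        request["AllocatedStorage"] = storage_gb
    return request


# Placeholder values for a production-sized instance.
request = replication_instance_request(
    "prod-migration",
    instance_class="dms.c4.large",
    security_group_ids=["sg-0123456789abcdef0"],
)

# With boto3 installed and credentials configured:
#   boto3.client("dms").create_replication_instance(**request)
```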

Creating Endpoints

At least two endpoints are needed, a source and destination, with at least one of them existing in AWS.  The configuration elements for an endpoint are shown here:

DMS post_pic 4

First, we need to define whether the endpoint will be a source or destination and also define the identifier that will be used for reference by the migration task. Next, the database engine and specific connection information must be supplied. Extra connection attributes can also be provided.

DMS post_pic 5

With all endpoint details in order, it is good practice to test endpoint connectivity, and there is a handy place in the creation screen to do just that, as shown above. Simply select the VPC and replication instance from which to test and hit the “Run Test” button, which reports whether the test passed or failed. For the test to succeed, any network firewalls, NACLs, security groups, etc. along the path must allow the replication instance to communicate with the endpoints on the DB listener ports.
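
For readers scripting this step, here is a minimal sketch of an endpoint definition, assuming the boto3 DMS client's `create_endpoint` parameters. Note that the API calls the destination a `target`. The host name, credentials, and the extra connection attribute are placeholders for illustration only.

```python
def endpoint_request(identifier, endpoint_type, engine, host, port,
                     username, password, database=None, extra_attributes=None):
    """Assemble keyword arguments describing one DMS endpoint."""
    if endpoint_type not in ("source", "target"):
        raise ValueError("endpoint_type must be 'source' or 'target'")
    request = {
        "EndpointIdentifier": identifier,
        "EndpointType": endpoint_type,
        "EngineName": engine,
        "ServerName": host,
        "Port": port,
        "Username": username,
        "Password": password,
    }
    if database:
        request["DatabaseName"] = database
    if extra_attributes:
        # Extra connection attributes are passed as one "name=value;name=value" string.
        request["ExtraConnectionAttributes"] = ";".join(
            f"{name}={value}" for name, value in extra_attributes.items()
        )
    return request


# Placeholder source definition; every value here is hypothetical.
source = endpoint_request(
    "onprem-oracle-source", "source", "oracle",
    host="db.example.internal", port=1521,
    username="dms_user", password="change-me",
    database="ORCL",
    extra_attributes={"addSupplementalLogging": "Y"},
)

# With boto3 installed and credentials configured:
#   dms = boto3.client("dms")
#   endpoint = dms.create_endpoint(**source)
#   dms.test_connection(ReplicationInstanceArn=..., EndpointArn=...)
```

The `test_connection` call in the closing comment is the API counterpart of the “Run Test” button; it needs the ARNs of an existing replication instance and endpoint.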

Creating a Task

With a replication instance, source endpoint, and destination endpoint each defined, a task can be created. The configuration page for a task is shown below:

DMS post_pic 6

When creating the task, the replication instance and endpoints are referenced and the migration type is chosen. There are three types of migrations:

  • Migrate existing data – full load
  • Migrate existing database and replicate ongoing changes
  • Replicate data changes only

“Migrate existing data” discards any transactions that occur during the migration, while “Migrate existing database and replicate ongoing changes” caches the mid-migration transactions and applies them after the initial dump has been completed. “Replicate data changes only” assumes a database dump has already been loaded on the target and uses log files on the source to bring the target up to date.
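
The three console options correspond to the `MigrationType` values accepted by the DMS API (`full-load`, `full-load-and-cdc`, and `cdc`). A small sketch of that correspondence follows; the helper name and its two flags are made up for illustration.

```python
# Console label -> API MigrationType value.
MIGRATION_TYPES = {
    "Migrate existing data": "full-load",
    "Migrate existing database and replicate ongoing changes": "full-load-and-cdc",
    "Replicate data changes only": "cdc",
}


def migration_type(copy_existing_data, replicate_changes):
    """Pick a MigrationType from the two decisions the console options encode."""
    if copy_existing_data and replicate_changes:
        return "full-load-and-cdc"
    if copy_existing_data:
        return "full-load"
    if replicate_changes:
        return "cdc"
    raise ValueError("a task must copy existing data, replicate changes, or both")


# A production cut-over usually wants both the initial copy and ongoing changes.
print(migration_type(copy_existing_data=True, replicate_changes=True))
```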

Other task settings are shown below:

DMS post_pic 7

Three options are provided for target table preparation: “do nothing”, “drop tables on target”, and “truncate”. Since it is recommended that a good schema already exist on the target, “do nothing” is the ideal choice.

Three choices are available to handle LOB (Large Object) columns:

  • Don’t include LOB columns – skips them entirely
  • Full LOB mode – indiscriminately includes all LOB columns
  • Limited LOB mode – Allows the specification of a maximum LOB size and will include everything smaller than the size given.

The Enable Logging option is not on by default because it incurs charges for CloudWatch utilization, but turning it on is highly recommended.
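
When a task is created through the API, these choices travel as a JSON document of task settings. The sketch below is one way to assemble it, assuming the `TargetMetadata`, `FullLoadSettings`, and `Logging` sections of the documented task-settings format; the helper name, its arguments, and the 32 KB LOB limit are illustrative choices, not recommendations from this post.

```python
import json

# Console wording -> TargetTablePrepMode value.
PREP_MODES = {
    "do nothing": "DO_NOTHING",
    "drop tables on target": "DROP_AND_CREATE",
    "truncate": "TRUNCATE_BEFORE_LOAD",
}


def task_settings(prep="do nothing", lob_mode="limited", lob_max_kb=32, logging=True):
    """Return a task-settings JSON string for the options discussed above."""
    if lob_mode not in ("none", "full", "limited"):
        raise ValueError("lob_mode must be 'none', 'full', or 'limited'")
    target_metadata = {
        "SupportLobs": lob_mode != "none",
        "FullLobMode": lob_mode == "full",
        "LimitedSizeLobMode": lob_mode == "limited",
    }
    if lob_mode == "limited":
        # Maximum LOB size in KB; the value used here is only an example.
        target_metadata["LobMaxSize"] = lob_max_kb
    settings = {
        "TargetMetadata": target_metadata,
        "FullLoadSettings": {"TargetTablePrepMode": PREP_MODES[prep]},
        "Logging": {"EnableLogging": logging},
    }
    return json.dumps(settings, indent=2)


print(task_settings())
```

The resulting string would be passed as `ReplicationTaskSettings` when creating the task.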

You can also define custom table mappings using JSON format as demonstrated below:
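
For readers without access to the screenshot, here is a minimal sketch of a table-mapping document in the rule-based selection format described in the AWS documentation. The exact format in the screenshot may differ, and the schema and table names are invented.

```python
import json


def selection_rule(rule_id, schema, table="%", action="include"):
    """Build one selection rule; '%' acts as a wildcard in schema and table names."""
    if action not in ("include", "exclude"):
        raise ValueError("action must be 'include' or 'exclude'")
    return {
        "rule-type": "selection",
        "rule-id": str(rule_id),
        "rule-name": str(rule_id),
        "object-locator": {"schema-name": schema, "table-name": table},
        "rule-action": action,
    }


# Hypothetical mapping: migrate every table in HR except the audit tables.
table_mappings = json.dumps(
    {
        "rules": [
            selection_rule(1, "HR"),
            selection_rule(2, "HR", table="AUDIT_%", action="exclude"),
        ]
    },
    indent=2,
)

print(table_mappings)
```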

DMS post_pic 8

Time to Migrate

By default, a task starts on creation. If that option is disabled, however, the task can be started manually from the console. Once started, its progress can be viewed from the console.

DMS post_pic 9

Unless otherwise specified, eight tables will load concurrently. If working with a larger replication instance, and the source database would not suffer from the increase in reads, more concurrency can be specified. Conversely, if the source database is on the other side of a slow network connection, or heavy load prevents the replication instance from reading many tables concurrently, that number can be decreased.
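
In the task-settings JSON, the concurrency is controlled by `MaxFullLoadSubTasks` under `FullLoadSettings`. A small sketch of adjusting it follows; the helper name is made up, and the upper bound of 49 is taken from the AWS documentation as an assumption that is worth checking for the DMS version in use.

```python
import json

DEFAULT_PARALLEL_TABLES = 8  # the default mentioned above
MAX_PARALLEL_TABLES = 49     # assumed upper bound; verify against current AWS docs


def set_parallel_tables(settings_json, tables):
    """Return task settings with the number of concurrently loaded tables changed."""
    if not 1 <= tables <= MAX_PARALLEL_TABLES:
        raise ValueError(f"tables must be between 1 and {MAX_PARALLEL_TABLES}")
    settings = json.loads(settings_json) if settings_json else {}
    # Leave every other setting untouched and only override the sub-task count.
    settings.setdefault("FullLoadSettings", {})["MaxFullLoadSubTasks"] = tables
    return json.dumps(settings)


# Slow link to the source: halve the default concurrency.
print(set_parallel_tables("", DEFAULT_PARALLEL_TABLES // 2))
```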

If the migration is stopped before completion and then restarted, the task will pick up where it left off, but any tables that were actively being migrated when it stopped will start over completely.
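
When tasks are started through the API rather than the console, the restart behavior is chosen with `StartReplicationTaskType`. A minimal sketch of choosing the value follows, assuming the boto3 DMS client's `start_replication_task` parameters; the helper and its flags are hypothetical, and the ARN is a placeholder.

```python
def start_request(task_arn, previously_run, reload_everything=False):
    """Assemble keyword arguments for starting or restarting a replication task."""
    if not previously_run:
        start_type = "start-replication"   # first run of the task
    elif reload_everything:
        start_type = "reload-target"       # discard progress and load all tables again
    else:
        start_type = "resume-processing"   # continue from where the task stopped
    return {"ReplicationTaskArn": task_arn, "StartReplicationTaskType": start_type}


# Placeholder ARN for a task that was stopped part-way through.
TASK_ARN = "arn:aws:dms:us-east-1:123456789012:task:EXAMPLE"
resume = start_request(TASK_ARN, previously_run=True)

# With boto3 installed and credentials configured:
#   boto3.client("dms").start_replication_task(**resume)
```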

Hints & Tips

Finally, here are some useful hints and tips for using DMS:

  • Check the AWS documentation for unsupported data types
  • Create only tables and primary keys on the target – no foreign keys
  • Reduce contention on the target during migration (disable logging, backups, Multi-AZ) for faster transfer
  • The replication instance is CPU-intensive, so size it accordingly
  • Enable supplemental logging on the source to allow ongoing replication
  • MS SQL Server .bak files can’t be used by DMS yet

Hope you found this post useful. Please share your feedback at info@reancloud.com.
