March 30, 2017 | S. P. T. Krishnan
Part 1 Recap
In part one of this blog series, we set out to install a package, at a specific version, that is not available in the default Amazon Linux repositories. Our example was the R programming language, version 3.3.2. We located this package in an optional repository, but some of its prerequisites were not available. In this concluding post, we install those prerequisites and complete the R 3.3.2 installation.
Installing R 3.3.2 (continued)
Amazon publishes the list of packages that went into the Amazon Linux distribution release 2016-09, and we confirmed that none of the required packages were included. We also confirmed, from the EPEL packages list, that the prerequisite packages were not present in the EPEL repository. This check is needed to rule out the possibility that the packages exist but are hidden by system configuration such as “priority”. The AWS forum thread on the missing packages error (see the links at the end of this post) suggested adding the base CentOS repository to get the packages. We followed this route.
We now launch an Amazon EC2 instance running CentOS 6.5 and observe the repositories that are enabled by default. The bash commands and their output were shown in a code block that has been redacted.
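Since the original console output is not reproduced, here is a minimal sketch of the kind of inspection involved. The helper name “repo_summary” is our own, and the path in the usage comment is the usual location of the CentOS 6 base repo file, which we assume rather than quote from the original output.

```shell
# Print the section headers and the settings that matter when porting a
# repo definition to another distribution: where packages come from,
# whether the repo is enabled, and how GPG verification is configured.
repo_summary() {
    grep -E '^(\[[^]]+\]|name=|mirrorlist=|#?baseurl=|enabled=|gpgcheck=|gpgkey=)' "$1"
}

# Usage on the CentOS instance (assumed file location):
#   yum repolist enabled
#   repo_summary /etc/yum.repos.d/CentOS-Base.repo
```

The filter keeps commented-out “baseurl” lines on purpose, because they show the URL pattern and the variables that need to be replaced later.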
From the console output we inferred a few things, which led to the following steps.
- gpgcheck is ON. This means we need to create a gpgkey file containing the CentOS public key. We can copy the public key from this instance or from the Internet.
- We need to replace the variables used in the URLs (the release version and architecture) with static values, since they would no longer resolve correctly on Amazon Linux.
- We also confirm, by browsing a mirror site, that the required packages (tk, tk-devel and xdg-utils) are present in the CentOS repository.
- To avoid any potential package conflicts between the Amazon base and CentOS base repositories, we explicitly include only the required packages from the CentOS repository, using an “includepkgs” filter.
- Finally, we also set the priority to match that of the Amazon base and EPEL repositories.
- Using appropriate substitution, we derive the following block that will be added to a brand new .repo file.
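The original block is not reproduced here, so what follows is a sketch of what such a repo definition could look like, not the exact file we used. The section name, the mirrorlist URL (which follows the CentOS 6 layout of the time and may no longer resolve) and the priority value of 10 are assumptions; check them against your own instances before use. The snippet writes to a scratch directory so the file can be reviewed before it is copied into “/etc/yum.repos.d”.

```shell
# Write to a scratch directory by default; copying the result into
# /etc/yum.repos.d requires root.
REPO_DIR="${REPO_DIR:-$(mktemp -d)}"

cat > "$REPO_DIR/centos.repo" <<'EOF'
# Section name is hypothetical.
[centos-6-base]
name=CentOS-6 - Base
# $releasever and $basearch are replaced with static values (6, x86_64),
# because on Amazon Linux they would expand to Amazon's own values.
# The URL is an assumption based on the CentOS 6 mirror layout.
mirrorlist=http://mirrorlist.centos.org/?release=6&arch=x86_64&repo=os
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
# Match the priority of the Amazon base and EPEL repos (assumed to be 10).
priority=10
# Expose only the prerequisites that R needs from this repository.
includepkgs=tk tk-devel xdg-utils
EOF

echo "Wrote $REPO_DIR/centos.repo"
```

Listing the three package names explicitly, instead of a wildcard such as “tk*”, keeps unrelated packages that happen to share the prefix out of view.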
Now we switch back to our Amazon Linux instance. We create a new file called “centos.repo” in “/etc/yum.repos.d” and copy the above contents into it. We also create a new file, “/etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6”, and copy the public key from the CentOS instance into it. We then rerun the command “sudo yum -y install R”. Finally, as a test, we install the R machine learning package “caret” and verify that it installs successfully. We use caret because it requires R 3.3.2.
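The steps above can be sketched as follows. The helper names are our own, and the mirror URL is an assumption: “https://cloud.r-project.org” is the CRAN endpoint that redirects to a nearby mirror, and it can be overridden through the CRAN_MIRROR variable.

```shell
# CRAN endpoint that redirects to a nearby mirror (assumed default;
# set CRAN_MIRROR to a static mirror if you prefer one).
CRAN_MIRROR="${CRAN_MIRROR:-https://cloud.r-project.org}"

# Install one R package non-interactively from the chosen mirror.
install_r_package() {
    Rscript -e "install.packages('$1', repos='$CRAN_MIRROR')"
}

# Confirm that the package loads; Rscript exits non-zero if it is missing.
check_r_package() {
    Rscript -e "library('$1')"
}

# Usage on the Amazon Linux instance, as a user who can write to the
# R library directory:
#   sudo yum -y install R
#   install_r_package caret
#   check_r_package caret
```

The separate load check is there because “install.packages” reports a failed installation as a warning, so its exit status alone does not tell you whether the package is usable.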
In the above command, we preselected an R repository that automatically redirects you to the nearest mirror. You are free to choose a static mirror near you from the list of available mirrors, and the documentation for “install.packages” shows the syntax for selecting one.
As with the CentOS 6 repo configuration, we also tighten the packages exposed from epel.repo by adding the filter “includepkgs=R*,libRmath*” to it. This avoids package conflicts between the EPEL, CentOS and Amazon Linux repos.
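If you prefer to script this change instead of editing the file by hand, a sketch along the following lines should work. The function name is hypothetical, the section name “epel” is an assumption about how your epel.repo is laid out, and the one-line “a” command relies on GNU sed.

```shell
# Insert an includepkgs filter directly below a section header in a
# .repo file. Does nothing if the file already has an includepkgs line
# anywhere, which is a deliberately coarse check. Requires GNU sed.
add_includepkgs() {
    repo_file=$1
    section=$2
    patterns=$3
    if grep -q '^includepkgs=' "$repo_file"; then
        return 0
    fi
    sed -i "/^\[$section\]\$/a includepkgs=$patterns" "$repo_file"
}

# Usage (as root, after backing up the file):
#   add_includepkgs /etc/yum.repos.d/epel.repo epel 'R*,libRmath*'
```

Quoting the pattern list in single quotes keeps the shell from expanding the wildcards before they reach the file.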
The results from the previous sections show that we can install R 3.3.2 on Amazon EMR and Amazon EC2 when using Amazon Linux. In this section, we summarize the steps required to get R 3.3.2 up and running, along with the end states of the configuration files involved. We then generalize the steps so you can adapt them to install other packages of interest.
We now provide a generic checklist that we believe will be useful when installing a package, and its prerequisites, on an Amazon Linux instance. Every subsequent step assumes that the answer to the previous step is “no”.
- Is the package installed by default? Use the “which <packagename>” command to find out.
- Is the package available from the default repositories? Use “yum --showduplicates list <packagename>” to find out.
- Is the package available from one or more disabled repositories? Inspect the repo configuration files in “/etc/yum.repos.d” to find out.
- Enable one repo at a time by setting the configuration variable “enabled=1”. Ensure that the priority level is the same as that of the default repositories.
- Rerun step 2 to check whether this exposes the package you are interested in.
- If the package is still not available, enable the CentOS repository by adding the .repo file. Ensure the CentOS version is compatible with the Amazon Linux version.
- The CentOS repo does a GPG-based key check by default. Either disable this or add the CentOS public key as described above.
- You can also download the public key from the official CentOS website if you don’t trust us 😉
- At this stage you should be able to install any package from the Amazon Linux, EPEL and CentOS distributions. If you still can’t find your package, another popular RPM-based distribution that you can try is Fedora. YMMV.
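The first three checks in this list can be wrapped in small shell helpers, sketched below. The function names are our own. We use “command -v” in place of “which” because it is built into the shell, and the disabled-repo check simply looks for “enabled=0” lines, which assumes the repo files follow the usual layout.

```shell
# Step 1: is a command with this name already on the PATH?
is_installed() {
    command -v "$1" >/dev/null 2>&1
}

# Step 2: does any enabled repository offer the package (any version)?
in_enabled_repos() {
    yum --showduplicates list "$1" >/dev/null 2>&1
}

# Step 3: list the repo files that contain at least one disabled repo.
# Takes an optional directory argument, defaulting to /etc/yum.repos.d.
disabled_repos() {
    grep -l '^enabled *= *0' "${1:-/etc/yum.repos.d}"/*.repo 2>/dev/null
}

# Usage:
#   is_installed R || in_enabled_repos R || disabled_repos
```

The usage line mirrors the rule that each step only runs when the previous one answered “no”.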
In summary, we hope that this post has been helpful in providing guidance on how you can install RPM-based packages that are not carried by the Amazon Linux distribution. If you are interested in having our team help you with your projects, please contact us at email@example.com. Also, if you like these kinds of problems and want to join our team, please contact us at firstname.lastname@example.org or visit us at http://www.reancloud.com/company/careers/.
 Amazon Linux – https://aws.amazon.com/amazon-linux-ami/
 Red Hat Enterprise Linux – https://www.redhat.com/en/technologies/linux-platforms/enterprise-linux
 Fedora version history – https://en.wikipedia.org/wiki/Fedora_version_history
 EPEL – https://fedoraproject.org/wiki/EPEL
 EPEL packages list – https://dl.fedoraproject.org/pub/epel/6/x86_64/
 missing packages error – https://forums.aws.amazon.com/thread.jspa?messageID=262860