ITworld.com
MD5 - Message Digest 5
UNIX IN THE ENTERPRISE --- 12/11/2003

Sandra Henry-Stocker

Several weeks ago, this column discussed the cksum (i.e., checksum) command and its usefulness in ensuring that two files on different systems are, in fact, the same file. In this column, we will look at two related commands: the sum command, which calculates a simple checksum and block count for a file, and the md5 command, which produces a highly reliable checksum and is used nearly universally to verify the integrity of free software offered for downloading.

Many people recognize MD5 for its role in assuring the integrity of files even if they have never used the md5 command. This is because numerous implementations of MD5 have been created to facilitate use of the MD5 algorithm in various products and contexts. A Perl programmer, for example, might find the Digest::MD5 Perl module invaluable. Digest::MD5 allows the MD5 Message Digest algorithm to be used from within Perl programs and can be downloaded from http://search.cpan.org/search?dist=Digest-MD5. NOTE: This module requires perl 5.004 or later.

Many other people will recognize MD5 from having seen MD5 files alongside software offered on archive sites. Whether or not you have made use of the MD5 checksums available on these sites, you have probably noticed them and know that they are used to verify the integrity of the downloadable files.

How does the md5 command work?

Like the cksum command, the md5 command computes a checksum on a file so that you can verify that the file is intact, that it has not been changed since it was created or that it is the correct version of the target file. Regardless of where a file is stored and the peculiarities of the system on which it is stored, the md5 command will provide a reliable indication of the file's identity.

The md5 command is an implementation of an algorithm that takes as input a message of arbitrary length (generally a file) and creates from that message or file a 128-bit "fingerprint" or "digest" that represents in concise form the content of the file. As with cksum, the smallest possible change in the file will result in an unmistakable change in the resultant checksum. Analysts capable of properly reviewing the reliability of the algorithm consider it "computationally infeasible" to produce two different files with the same MD5 checksum. In layman's words, the likelihood of two different files accidentally yielding the same MD5 checksum is so infinitesimally small as to not be worth considering.
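The two properties just described, a fixed 128-bit digest and a visible change for the smallest edit, can be illustrated with a short sketch. It uses Python's standard hashlib module rather than the md5 command, which is an assumption made purely for illustration, and the two sample messages are invented.

```python
import hashlib

# Two invented sample messages that differ in a single character.
original = b"The quick brown fox jumps over the lazy dog"
altered = b"The quick brown fox jumps over the lazy cog"

digest_a = hashlib.md5(original).hexdigest()
digest_b = hashlib.md5(altered).hexdigest()

# An MD5 digest is always 128 bits (32 hexadecimal characters),
# no matter how long the input message is.
digest_bits = hashlib.md5(original).digest_size * 8

print(digest_bits)           # size of the digest in bits
print(digest_a)              # checksum of the original message
print(digest_b)              # checksum of the altered message
print(digest_a == digest_b)  # whether the two checksums match
```

Changing one letter gives a digest that shares no obvious pattern with the first, which is what makes a mismatch easy to spot by eye.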

To calculate the MD5 checksum on a file, you simply use the name of the file as an argument to the md5 command:

$ md5 gcc-3.3.2.tar.gz
60ab4d3431786a81be6522cc04bc1827 gcc-3.3.2.tar.gz
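For readers who want the same calculation from within a program, the following is a minimal sketch in Python using the standard hashlib module. The function name md5_of_file, the chunk size and the throwaway sample file are all assumptions introduced for illustration; this is not the md5 command's own implementation.

```python
import hashlib
import os
import tempfile

def md5_of_file(path, chunk_size=65536):
    """Return the MD5 checksum of the file at `path` as 32 hex characters."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so that large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demonstration on a throwaway file (hypothetical content, not a real archive).
with tempfile.TemporaryDirectory() as workdir:
    sample_path = os.path.join(workdir, "sample.tar.gz")
    with open(sample_path, "wb") as handle:
        handle.write(b"abc")
    sample_checksum = md5_of_file(sample_path)
    print(sample_checksum, os.path.basename(sample_path))
```

Reading in chunks gives the same result as hashing the whole file at once, because the digest is updated incrementally.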

To make use of the md5 command, you will need to download and install the software. To my knowledge, it is not included in any Unix distribution. For Solaris systems, the software is available both in package form and in compressed tar format from http://www.sunfreeware.com.

While using MD5 is straightforward, one caution is in order. You must be sure to compute the checksum at the correct point in the process of downloading and installing software. Most of the time, the MD5 checksum provided on the download site will be the checksum for the software in its compressed archive format. If this isn't clear, the name given to the checksum file should indicate what is intended. For example, if the MD5 checksum file is called abc-1.2.3.tar.gz-md5, the md5 command should be run against the abc-1.2.3.tar.gz file. If you unzip the file first, the md5 command will not yield the expected result. In fact, if you subsequently zip the file up again and then run md5, you are still not likely to get the expected result. In almost every case, you should run md5 before you make any changes to the file you have downloaded. The following sequence of commands illustrates the problem:

Let's say we start with two files, one an exact copy of the other.

boson> ls md*
md5-6142000-sol8-intel-local.gz md5-copy.gz

We compute the MD5 checksums on both to verify that they are identical.

boson> md5 md5*
MD5 (md5-6142000-sol8-intel-local.gz) = 28aeaf16b7d50e8b7dcb66f2bb95aecf
MD5 (md5-copy.gz) = 28aeaf16b7d50e8b7dcb66f2bb95aecf

Next, we unzip the copy and zip it up again.

boson> gunzip md5-copy.gz
boson> gzip md5-copy

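When we run the md5 command again, we notice that the original file and the unzipped and re-zipped copy no longer have the same MD5 checksum. The compressed bytes depend on details such as the file name and timestamp recorded in the gzip header, so recompressing identical content need not reproduce the original archive.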

boson> md5 md5*
MD5 (md5-6142000-sol8-intel-local.gz) = 28aeaf16b7d50e8b7dcb66f2bb95aecf
MD5 (md5-copy.gz) = 185272b9bd531c66058dc9695b296cf8

NOTE: This caution also applies to use of the cksum command.
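One way an unzipped and re-zipped file can end up with a different checksum is sketched below. The example uses Python's standard gzip module and an invented payload. It varies only the timestamp stored in the gzip header, which is an assumption about one possible cause and not a diagnosis of the transcript above.

```python
import gzip
import hashlib

# Invented payload standing in for the contents of a package.
payload = b"contents of a hypothetical package\n" * 100

# Compress the same bytes twice, changing only the timestamp that gzip
# records in its header (mtime is given in seconds since the epoch).
first_archive = gzip.compress(payload, mtime=1000000000)
second_archive = gzip.compress(payload, mtime=1000000001)

first_md5 = hashlib.md5(first_archive).hexdigest()
second_md5 = hashlib.md5(second_archive).hexdigest()

# The archives differ, yet they unpack to identical content.
same_archive = first_md5 == second_md5
same_content = gzip.decompress(first_archive) == gzip.decompress(second_archive)
print(same_archive, same_content)
```

The checksum describes the compressed archive byte for byte, not the content inside it, which is why it has to be computed before the file is altered in any way.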

Often, the md5 checksum for an archive will be stored in a separate file meant to be downloaded, displayed and compared against the computed checksum. Sometimes, multiple checksums will be stored in a single text file on a site, listing each product and the expected checksum in a form such as this excerpt from http://www.sunfreeware.com:

65999f654102f5438ac8562d13a6eced gcc-3.3.2.tar.bz2
60ab4d3431786a81be6522cc04bc1827 gcc-3.3.2.tar.gz
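Comparing a download against such a list can be automated. The sketch below, in Python, assumes that each line of the list holds a checksum followed by a file name, as in the excerpt above. The helper names parse_checksum_list and matches_expected are hypothetical, and the bytes being checked are a stand-in for a real download.

```python
import hashlib

def parse_checksum_list(text):
    """Map each file name to its expected MD5 from 'checksum name' lines."""
    expected = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            checksum, name = fields
            expected[name] = checksum.lower()
    return expected

def matches_expected(name, data, expected):
    """Return True if the MD5 of `data` equals the value listed for `name`."""
    return name in expected and hashlib.md5(data).hexdigest() == expected[name]

# The two entries quoted in the excerpt above.
listing = """\
65999f654102f5438ac8562d13a6eced gcc-3.3.2.tar.bz2
60ab4d3431786a81be6522cc04bc1827 gcc-3.3.2.tar.gz
"""
expected = parse_checksum_list(listing)
print(expected["gcc-3.3.2.tar.gz"])

# Stand-in bytes, not the real archive, so this comparison reports a mismatch.
print(matches_expected("gcc-3.3.2.tar.gz", b"not the real archive", expected))
```

In practice the data argument would be the bytes of the downloaded archive, read before it is unzipped.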

How do sum, cksum and md5 compare?

The cksum tool is extremely useful for verifying that two files are the same file - better than depending on file attributes such as file size and dates (both easily forged and often misleading). cksum is handy because most Unix operating systems will include the command.

The md5 command can also be used to compare files across systems, but only if it has been installed on both systems. Both commands can be used to verify the integrity of files installed on a system.

The sum command, because of its simple algorithm and the short length of the checksum it generates, can easily produce the same checksum for two altogether different files. Because of this, it is generally seen as unreliable for detecting file changes. In addition, on Solaris systems, there are two sum commands - /usr/bin/sum and /usr/ucb/sum. These two sum commands will generate different checksums for the same file, each in a slightly different format as shown here:

# /usr/bin/sum performance.ppt
53512 864 performance.ppt
# /usr/ucb/sum performance.ppt
45753 432

This can cause confusion because which sum command is used depends on the user's search path.
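To see why a short, simple checksum collides so easily, consider the following Python sketch of the two 16-bit algorithms traditionally associated with sum. It assumes that /usr/bin/sum uses the System V style algorithm and /usr/ucb/sum the BSD style one. The function names are hypothetical and the code is an approximation for illustration, not the Solaris source.

```python
def bsd_sum(data):
    """16-bit rotating checksum (the algorithm traditionally used by BSD sum)."""
    checksum = 0
    for byte in data:
        # Rotate the 16-bit value right by one bit, then add the next byte.
        checksum = (checksum >> 1) | ((checksum & 1) << 15)
        checksum = (checksum + byte) & 0xFFFF
    return checksum

def sysv_sum(data):
    """16-bit folded byte total (the algorithm traditionally used by System V sum)."""
    total = sum(data) & 0xFFFFFFFF
    folded = (total & 0xFFFF) + (total >> 16)
    return (folded & 0xFFFF) + (folded >> 16)

# Two different inputs made of the same bytes in a different order.
first = b"ab"
second = b"ba"
print(sysv_sum(first), sysv_sum(second))  # the byte total ignores ordering
print(bsd_sum(first), bsd_sum(second))    # the rotation makes ordering matter
```

The System V style total is identical for the two inputs because it simply adds up the bytes. The BSD style checksum distinguishes this pair, but with only 65,536 possible values, collisions between unrelated files are still unavoidable.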

Should you use md5?

You can, of course, download a file without bothering to verify it against its checksum, and only a handful of installation scripts will verify a file against its checksum before allowing you to install the software. Still, the extra time that it takes to verify a download is trivial once the md5 command has been installed.

Of course, it is possible for both a file and its posted MD5 checksum file to be maliciously altered, and nothing except, perhaps, checksums posted at other sites can guard against this. Even so, it is still good practice to verify downloaded files against the expected checksums and to get into the habit of verifying the integrity of important system files - whether manually using a tool such as md5 or in an automated fashion with a tool such as Tripwire.
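The habit of checking important files against known checksums can be sketched in a few lines of Python. This is a simplified illustration of the general idea of recording a baseline and comparing against it later, not a description of how Tripwire works. The function names and the two sample files are hypothetical.

```python
import hashlib
import os
import tempfile

def md5_of_file(path):
    """Return the MD5 checksum of a file as a hex string."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()

def snapshot(paths):
    """Record the current checksum of every path in a baseline dictionary."""
    return {path: md5_of_file(path) for path in paths}

def changed_files(baseline):
    """Return the sorted paths that are missing or no longer match the baseline."""
    changed = []
    for path, recorded in baseline.items():
        if not os.path.exists(path) or md5_of_file(path) != recorded:
            changed.append(path)
    return sorted(changed)

# Demonstration with two throwaway files standing in for system files.
with tempfile.TemporaryDirectory() as workdir:
    stable = os.path.join(workdir, "stable.conf")
    edited = os.path.join(workdir, "edited.conf")
    for path in (stable, edited):
        with open(path, "w") as handle:
            handle.write("setting = 1\n")
    baseline = snapshot([stable, edited])
    # Modify one file after the baseline has been taken.
    with open(edited, "a") as handle:
        handle.write("setting = 2\n")
    report = [os.path.basename(path) for path in changed_files(baseline)]
    print(report)
```

A baseline is only as trustworthy as the place it is kept, so it should be stored somewhere an intruder could not alter it along with the files.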

The MD5 algorithm was developed by Ronald L. Rivest, an MIT professor and one of the founders of RSA Data Security (now RSA Security). To learn more about MD5, RSA and products built around this technology, visit http://www.rsasecurity.com/. For details on the MD5 algorithm, refer to RFC 1321 - The MD5 Message-Digest Algorithm.

 

Sandra Henry-Stocker has been administering Unix systems for nearly 18 years. She describes herself as "USL" (Unix as a second language) but remembers enough English to write books and buy groceries. She currently works for TeleCommunication Systems, a wireless communications company, in Annapolis, Maryland, where no one else necessarily shares any of her opinions. She lives with her second family on a small farm on Maryland's Eastern Shore. Send comments and suggestions to mailto:sstocker@itworld.com.



Copyright © 2003 Accela Communications, Inc. All rights reserved