How to do line-by-line comparison of files in Linux using diff command - Part II

In the first part of this diff command tutorial series, we discussed the basics of the command, including how it works and how to interpret the output it produces. While there's definitely a bit of a learning curve involved with this command line utility, it's worth learning, especially if your daily work involves performing file-related tasks on CLI-only Linux machines.

Assuming that you already know the basic usage of the diff command, in this tutorial we'll discuss the various command line options the tool provides, through some easy-to-understand examples.

But before we proceed, keep in mind that all the examples in this tutorial have been tested on Ubuntu 14.04 with Bash version 4.3.11(1) and diff version 3.3.

Diff command options

1. Report when the files are identical

By default, when the diff command detects that the files being compared are identical, it does not produce any output.

$ diff file1 file2
$

However, there's a command line option (-s) that forces the command to report this in the output:

$ diff -s file1 file2
Files file1 and file2 are identical
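Since diff also signals identical files through its exit status (0 when the files are identical, 1 when they differ, 2 when something goes wrong), -s is mostly a convenience for whoever reads the output. The following is a minimal sketch, assuming GNU diff and a POSIX shell; the scratch directory and the file contents are made up for illustration.

```shell
#!/bin/sh
# Minimal sketch (assumes GNU diff): two identical, made-up files in a
# throwaway directory, compared with -s.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf 'Hi\nHello\nBye\n' > file1
printf 'Hi\nHello\nBye\n' > file2

# With -s, diff prints a message instead of staying silent.
report=$(diff -s file1 file2)
status=$?   # 0 = identical, 1 = different, 2 = trouble

echo "$report"
echo "exit status: $status"
```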

2. Copied context and Unified context

These are basically two different formats in which the diff command can produce its output. Copied context is enabled using the -c command line option, while Unified context is enabled using -u. Following is an example of the former:

$ diff -c file1 file2
*** file1 2016-12-29 09:36:47.175597647 +0530
--- file2 2016-12-29 09:19:55.799558326 +0530
***************
*** 1,3 ****
  Hi
! Helllo
  Bye
--- 1,3 ----
  Hi
! Hello
  Bye

So, in the Copied context output format, the differing lines are indicated by an exclamation mark (!).

And here's the example of the Unified context format:

$ diff -u file1 file2
--- file1 2016-12-29 09:36:47.175597647 +0530
+++ file2 2016-12-29 09:19:55.799558326 +0530
@@ -1,3 +1,3 @@
 Hi
-Helllo
+Hello
 Bye

In this output format, the + and - symbols in front of lines mark the differing lines: '-' denotes a line from file1 that is missing from file2, and '+' denotes a line in file2 that isn't present in file1.
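To try both formats yourself, the sketch below recreates the two example files and prints the body of each format. It assumes GNU diff; because the first two header lines carry timestamps that change on every run, it strips them with tail. It also uses -U NUM, the unified-format option for choosing how many lines of context to show (the Copied context counterpart is -C NUM).

```shell
#!/bin/sh
# Sketch (assumes GNU diff): reproduce the Copied context and Unified
# context examples with the article's file contents.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf 'Hi\nHelllo\nBye\n' > file1
printf 'Hi\nHello\nBye\n' > file2

# Skip the two header lines (file names plus timestamps).
context=$(diff -c file1 file2 | tail -n +3)
unified=$(diff -u file1 file2 | tail -n +3)

# -U 0: unified format with no context lines around the change.
unified0=$(diff -U 0 file1 file2 | tail -n +3)

printf '%s\n\n%s\n\n%s\n' "$context" "$unified" "$unified0"
```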

3. Output an 'ed' script

The diff command is also capable of producing commands that the 'ed' editor can use to convert the original file (file1 in our examples here) into the new file (file2). Here's how you do this:

Suppose file1 and file2 contain the following change:

$ diff file1 file2
2c2
< Helllo
---
> Hello

Now, use the -e command line option to produce output that the 'ed' editor understands, and redirect that output into a file:

$ diff -e file1 file2 > out

Here's what out contains in this case:

2c
Hello
.

What you need to do next is add the command 'w' (write) at the end of the out file:

2c
Hello
.
w

Now, run the following command:

$ ed - file1 < out

And you'll see that file1 and file2 are now identical.

$ diff file1 file2
$
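The manual steps above (generate the script, append 'w', feed it to ed) can be combined in a small shell script. This is a sketch that assumes GNU diff; ed is not installed by default on every distribution, so the script checks for it before applying the changes. Here, ed -s is the POSIX spelling of ed -.

```shell
#!/bin/sh
# Sketch (assumes GNU diff): build a complete ed script from diff -e
# output and apply it to file1.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf 'Hi\nHelllo\nBye\n' > file1
printf 'Hi\nHello\nBye\n' > file2

# diff -e emits only the editing commands; append "w" so ed saves the result.
diff -e file1 file2 > out
echo w >> out
script=$(cat out)

if command -v ed >/dev/null 2>&1; then
    # -s suppresses ed's byte-count messages.
    ed -s file1 < out
    if diff -s file1 file2; then
        applied=yes
    else
        applied=no
    fi
else
    echo "ed is not installed; skipping the apply step"
    applied=skipped
fi
```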

For more information on this functionality, see the section on ed scripts in the GNU diffutils manual.

4. Produce output in two columns

Normally, the diff command produces output in the following way:

$ diff file1 file2
2c2
< Helllo
---
> Hello

But there's also a command line option (-y) that directs diff to produce output in two separate columns, showing the files side by side. Here's an example:

$ diff -y file1 file2
Hi                               Hi
Helllo                         | Hello
Bye                              Bye

As you can see, this output format uses a '|' to mark the lines that differ.
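Besides '|', side-by-side output uses '<' for a line that exists only in the left-hand file and '>' for a line that exists only in the right-hand file. The sketch below, which assumes GNU diff and uses made-up file contents, adds an extra line to file2 to show this, and uses -W to narrow the output to 40 columns (the default width is 130).

```shell
#!/bin/sh
# Sketch (assumes GNU diff): side-by-side output with one changed line
# and one line that exists only in file2.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf 'Hi\nHelllo\nBye\n' > file1
printf 'Hi\nHello\nBye\nCiao\n' > file2

# -W 40 limits the total output width to 40 columns.
sidebyside=$(diff -y -W 40 file1 file2)
echo "$sidebyside"
```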

5. Hide common lines

If you look at the output shown in the previous section (point 4 above), you'll notice that with the -y command line option, diff also prints the lines the two files have in common. In case you need to suppress these identical lines, you can use the --suppress-common-lines option:

$ diff -y --suppress-common-lines file1 file2
Helllo                                   | Hello
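Because every remaining row of this output corresponds to a difference, piping it to wc -l gives a quick count. This is a rough sketch, assuming GNU diff and made-up file contents; note that it counts rows of side-by-side output, not lines in either file.

```shell
#!/bin/sh
# Sketch (assumes GNU diff): count the differing rows of side-by-side output.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf 'Hi\nHelllo\nBye\nExtra\n' > file1
printf 'Hi\nHello\nBye\n' > file2

# Only the changed row (|) and the row present in file1 alone (<) remain.
changed=$(diff -y --suppress-common-lines file1 file2 | wc -l)
echo "rows that differ: $changed"
```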

6. Show C function each change is in

When you use diff to compare two C source files, there's a command line option (-p) that directs the utility to show which C function each change is in. For example, suppose these are the two C files:

file1.c:

#include<stdio.h>

void compare(float x, float y)
{
    if(x == y) // incorrect way
    {
        printf("\n EQUAL \n");
    }
}


int main(void)
{
    compare(1.234, 1.56789);

    return 0;
}

file2.c:

#include<stdio.h>

void compare(float x, float y)
{
    if(x == y)
    {
        printf("\n EQUAL \n");
    }
}


int main(void)
{
    compare(1.234, 1.56789);

    return 0;
}

Here's the output when both files are compared normally:

$ diff file1.c file2.c
5c5
<     if(x == y) // incorrect way
---
>     if(x == y)

And here's the output, when the files are compared using the -p option:

$ diff -p file1.c file2.c 
*** file1.c 2016-12-29 11:45:36.587010816 +0530
--- file2.c 2016-12-29 11:46:39.823013274 +0530
***************
*** 2,8 ****
  
  void compare(float x, float y)
  {
!     if(x == y) // incorrect way
      {
          printf("\n EQUAL \n");
      }
--- 2,8 ----
  
  void compare(float x, float y)
  {
!     if(x == y)
      {
          printf("\n EQUAL \n");
      }

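So, as you can see, with -p, diff switches to the Copied context output format and marks the differing lines with an exclamation mark (!). It also appends the enclosing function's header line to the *************** line that starts each hunk, but only when it finds such a line above the hunk. In this small example, the hunk (including its three lines of context) starts at line 2, and the only line before it is the #include directive, so no function name is shown.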
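To see -p actually name a function, the change has to sit far enough inside a function that the line introducing it lies above the hunk. The sketch below builds a small, hypothetical C file, changes one line inside main(), and runs diff with -u and -p together. It assumes GNU diff, whose manual describes -p as equivalent to -F '^[[:alpha:]$_]' (plus -c unless another format is chosen), that is, the function header is the nearest preceding line that starts with a letter, an underscore, or a dollar sign.

```shell
#!/bin/sh
# Sketch (assumes GNU diff): a made-up C file whose change lies deep
# enough inside main() for -p to report the function name.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

cat > old.c <<'EOF'
#include <stdio.h>

int main(void)
{
    int a = 1;
    int b = 2;
    int c = 3;
    int d = 4;
    printf("%d\n", a + b + c + d);
    return 0;
}
EOF
sed 's/d = 4/d = 5/' old.c > new.c

# -u selects unified format; -p appends the function line to the @@ header.
result=$(diff -u -p old.c new.c)
# printf rather than echo, so the \n inside the C code is printed literally.
printf '%s\n' "$result"
```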

7. Recursively compare subdirectories

The diff command can also compare subdirectories recursively, but that's not its default behavior. For example, take the following case:

$ diff diff-files/ second-diff-files/
diff diff-files/file1 second-diff-files/file1
1c1
< Hi
---
> i
diff diff-files/file2 second-diff-files/file2
2c2
< Hello
---
> ello

Here, diff only compared the files in the top-level directories. But if you use the -r command line option (for recursive diff), you'll see that even the files present in subdirectories are compared:

$ diff -r diff-files/ second-diff-files/
diff -r diff-files/file1 second-diff-files/file1
1c1
< Hi
---
> i
diff -r diff-files/file2 second-diff-files/file2
2c2
< Hello
---
> ello
diff -r diff-files/more-diff-files/file1 second-diff-files/more-diff-files/file1
1c1
< Hi
---
> i
diff -r diff-files/more-diff-files/file2 second-diff-files/more-diff-files/file2
2c2
< Hello
---
> ello
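When you only need to know which files differ, not how, -r combines well with -q (--brief). The sketch below assumes GNU diff and builds two small, hypothetical directory trees that differ only inside a subdirectory, then compares them with and without -r.

```shell
#!/bin/sh
# Sketch (assumes GNU diff): two made-up trees, a/ and b/, that differ
# only in sub/inner.txt.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

mkdir -p a/sub b/sub
printf 'same\n' > a/top.txt
printf 'same\n' > b/top.txt
printf 'Hi\n' > a/sub/inner.txt
printf 'i\n' > b/sub/inner.txt

# Without -r, diff only notes that both trees contain a "sub" directory.
shallow=$(diff a b)
# With -r it descends into sub; -q reports one line per differing file.
deep=$(diff -r -q a b)

printf '%s\n%s\n' "$shallow" "$deep"
```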

8. Treat absent files as empty

The diff command also provides an option that tells it to treat absent files as empty. For example, if you compare file1 with file3 (which doesn't exist), diff's default behavior is to produce an error:

$ diff file1 file3
diff: file3: No such file or directory

This isn't wrong per se; in fact, it makes perfect sense. But there may be cases where you don't want diff to throw an error in such situations (as part of a Bash script, for example). For those scenarios, you can use the -N command line option, which forces the command to treat absent files as empty and continue with the comparison.

$ diff -N file1 file3
1,5d0
< Hi
< 
< Helllo
< 
< Bye
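In a script, the exit status tells you how diff fared: 0 means no differences, 1 means differences were found, and 2 means trouble, such as a missing file. The following sketch, which assumes GNU diff and a made-up three-line file1, shows how -N turns the missing-file error (status 2) into an ordinary difference (status 1).

```shell
#!/bin/sh
# Sketch (assumes GNU diff): compare against file3, which does not exist.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf 'Hi\nHelllo\nBye\n' > file1

diff file1 file3 > /dev/null 2>&1
plain_status=$?   # 2: diff could not read file3

output=$(diff -N file1 file3)
n_status=$?       # 1: file3 treated as empty, so the files simply differ

echo "without -N: $plain_status, with -N: $n_status"
echo "$output"
```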

Conclusion

If you go through both parts of this tutorial series carefully and practice all the examples they contain, it's no stretch to say that you'll end up with a good command of the tool. Of course, we couldn't cover everything related to diff in this series, but rest assured that many of its important features have been covered.

For those who'd like to learn more about the utility, the man page is always there for you. And needless to say, you should keep using the tool frequently with different sets of files to explore different use cases.
