Americas

  • United States
sandra_henrystocker
Unix Dweeb

Breaking Linux files into pieces with the split command

How-To
Oct 05, 20233 mins
Linux

Some simple Linux commands allow you to break up files and reassemble them as needed in order to accommodate size restrictions on file size for storage or email attachments

Linux systems provide a very easy-to-use command for breaking files into pieces. This is something that you might need to do prior to uploading your files to some storage site that limits file sizes or emailing them as attachments. To split a file into pieces, you simply use the split command.

$ split bigfile

By default, the split command uses a very simple naming scheme. The file chunks will be named xaa, xab, xac, etc., and, presumably, if you break up a file that is sufficiently large, you might even get chunks named xza and xzz.

Unless you ask, the command runs without giving you any feedback. You can, however, use the –verbose option if you would like to see the file chunks as they are being created.

$ split –-verbose bigfile
creating file 'xaa'
creating file 'xab'
creating file 'xac'

You can also contribute to the file naming by providing a prefix. For example, to name all the pieces of your original file bigfile.xaa, bigfile.xab and so on, you would add your prefix to the end of your split command like so:

$ split –-verbose bigfile bigfile.
creating file 'bigfile.aa'
creating file 'bigfile.ab'
creating file 'bigfile.ac'

Note that a dot is added to the end of the prefix shown in the above command. Otherwise, the files would have names like bigfilexaa rather than bigfile.xaa.

Note that the split command does not remove your original file, just creates the chunks. If you want to specify the size of the file chunks, you can add that to your command using the -b option. For example:

$ split -b100M bigfile

Specifying Linux file sizes

File sizes can be specified in kilobytes, megabytes, gigabytes … up to yottabytes! Just use the appropriate letter from K, M, G, T, P, E, Z and Y.

If you want your file to be split based on the number of lines in each chunk rather than the number of bytes, you can use the -l (lines) option. In this example, each file will have 1,000 lines except, of course, for the last one which may have fewer lines.

$ split --verbose -l1000 logfile log.
creating file 'log.aa'
creating file 'log.ab'
creating file 'log.ac'
creating file 'log.ad'
creating file 'log.ae'
creating file 'log.af'
creating file 'log.ag'
creating file 'log.ah'
creating file 'log.ai'
creating file 'log.aj'

Reassembling Linux files using a cat command

If you need to reassemble your file from pieces on a remote site, you can do that fairly easily using a cat command like one of these:

$ cat x?? > original.file
$ cat log.?? > original.file

Splitting and reassembling with the commands shown above should work for binary files as well as text files. In this example, we’ve split the zip binary into 50 kilobyte chunks, used cat to reassemble them and then compared the assembled and original files. The diff command verifies that the files are the same.

$ split --verbose -b50K zip zip.
creating file 'zip.aa'
creating file 'zip.ab'
creating file 'zip.ac'
creating file 'zip.ad'
creating file 'zip.ae'
$ cat zip.a? > zip.new
$ diff zip zip.new
$                    

The only caution I have to give at this point is that, if you use split often and use the default naming, you will likely end up overwriting some chunks with others and maybe sometimes having more chunks than you were expecting because some were left over from some earlier split.

sandra_henrystocker
Unix Dweeb

Sandra Henry-Stocker has been administering Unix systems for more than 30 years. She describes herself as "USL" (Unix as a second language) but remembers enough English to write books and buy groceries. She lives in the mountains in Virginia where, when not working with or writing about Unix, she's chasing the bears away from her bird feeders.

The opinions expressed in this blog are those of Sandra Henry-Stocker and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.