I picked up part of this tip from the GNU World Order podcast 6×02, in which Klaatu describes the rather unusual situation of having to move a file larger than 4.7GB between Linux PCs on DVDs, without compression.
Thinking about it, this situation occurs more often than I’d like to think. You see, I always have a couple of 4GB memory sticks with me, but sometimes I want to transfer files that individually are greater than 4GB. Or they are exactly 4GB when the usable space on my thumbdrive is, for instance, 3.8GB. Or I want to transfer files from a journalled filesystem (e.g. NTFS or ext3) to one using the FAT32 file system. FAT32 has a file size limit of 4GB minus one byte. Dragging a 4GB disk image onto it will fail.
Note: Even though the memory sticks you buy are usually NTFS formatted, FAT32 and other non-journalled filesystems are a lot safer on USB memory sticks and external USB disk drives, because journalled filesystems wear out flash drives quicker. However, this means that if you have a FAT32 formatted 16GB memory stick, you still can’t put individual files on it that are larger than 4GB minus one byte.
So what to do? It’s obvious! You disassemble and re-assemble!
In Windows: WinRAR
I often do this with backups that span several DVDs. Instead of manually trying to squeeze as much data onto as few discs as possible, I split the data into DVD-sized chunks and burn those instead.
Here’s how you do it using WinRAR, which comes in both 32-bit and 64-bit versions. Right-click the file(s) or folder you want to split and select Add to archive... Put in a sensible name and the location for the output file(s). Leave Archive format as RAR and under compression method select STORE. This means NO COMPRESSION, less time spent compressing/decompressing, and less chance of data corruption.
Underneath that box, there’s a "Split to volumes, bytes" drop-down. Select "DVD+R: 4481mb" for DVD-sized archives. For other sizes, enter them in bytes. This will create filename.part01.rar, filename.part02.rar, filename.part03.rar and so on, which can now be fitted on the media of your choosing. When you’ve transferred them to the machine you want to use the files on, open filename.part01.rar and WinRAR will handle the rest.
In GNU/Linux: split and cat
So how do you do this without buying Windows 7, slapping it on your Ubuntu box with a trial edition of WinRAR, and losing all your geek cred at once? You use the GNU core utilities split and cat in your favourite bash terminal!
From the GNU coreutils manual: split creates output files containing consecutive or interleaved sections of input. Synopsis:
split [option] [input [prefix]]
Sizes can be given with these suffixes (the coreutils manual lists all options):
‘b’ => 512 ("blocks")
‘KB’ => 1000 (KiloBytes)
‘K’ => 1024 (KibiBytes)
‘MB’ => 1000*1000 (MegaBytes)
‘M’ => 1024*1024 (MebiBytes)
‘GB’ => 1000*1000*1000 (GigaBytes)
‘G’ => 1024*1024*1024 (GibiBytes)
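The difference between the decimal and binary suffixes matters once you get near a hard limit like the one on FAT32. Here is a small sketch in plain shell arithmetic of what 4GB and 4G work out to in bytes. The variable names are just mine for illustration, and it assumes a shell with 64-bit arithmetic, such as bash on a 64-bit system.

```shell
# What split's 4GB (decimal) and 4G (binary) suffixes mean in bytes.
four_gb=$((4 * 1000 * 1000 * 1000))    # 4GB suffix
four_g=$((4 * 1024 * 1024 * 1024))     # 4G suffix
fat32_max=$((four_g - 1))              # 4G minus one byte, the FAT32 file size limit

echo "4GB         = $four_gb bytes"
echo "4G          = $four_g bytes"
echo "FAT32 limit = $fat32_max bytes"
```

So a chunk made with --bytes=4GB is about 295 million bytes smaller than the largest file FAT32 accepts, while --bytes=4G is one byte too big.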
Say I have an 8GB bigfile.iso that I want to split into DVD-sized chunks of 4GB. I run:
$ split --bytes=4GB bigfile.iso smallfile
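This produces smallfileaa and smallfileab, which are 4GB each. To accommodate the FAT32 file size limit of 4GB minus one byte, here’s one that splits at 4GB minus ten bytes (one byte for the limit and nine more to play it safe). Strictly speaking, the FAT32 limit is counted in binary units (4,294,967,295 bytes), so split’s decimal 4GB chunks already fit, but a little margin doesn’t hurt: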
$ split --bytes=3999999990 bigfile.avi fat-friendly.avi
split automatically appends a cat-friendly suffix (‘aa’, ‘ab’ etc.) to the prefix you give it. You can use --verbose with split if you’re splitting many or very large files. When I want to put them back together again (concatenate), I simply use GNU cat to read all of the files in a directory with the smallfile prefix and redirect the output into one large file again.
$ cat smallfile* > bigfile-copy.iso
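If you want to see the whole round trip without waiting on an 8GB file, here is a scaled-down sketch. It assumes GNU coreutils (split, head, cat) and uses made-up names (sample.bin, the piece- prefix) in a scratch directory, so nothing real gets touched.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Make a 2500000-byte sample file as a stand-in for bigfile.iso.
head -c 2500000 /dev/urandom > sample.bin

# Split into 1MB (1000*1000 byte) pieces with the prefix "piece-".
split --bytes=1MB sample.bin piece-

# This leaves piece-aa, piece-ab and piece-ac; the last one holds the remainder.
ls -l piece-*

# The shell expands piece-* in sorted order, so cat reads aa, ab, ac in sequence.
cat piece-* > sample-copy.bin
```

The reassembly only works because the suffixes sort alphabetically. That is also why it pays to pick a prefix that nothing else in the directory shares.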
If you compare the two files bigfile.iso and bigfile-copy.iso using md5sum, you will find that their checksums are exactly the same. Incidentally, having installed the GNU coreutils for win32 on the Windows XP box I’m typing from, I can confirm that it works in XP as well. Now, that’s impressive! :)
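The checksum comparison can be scripted too. This is a minimal sketch, again assuming GNU coreutils and using hypothetical file names (original.bin, rebuilt.bin, the part- prefix) on a small test file rather than a real ISO.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small test file, split it into 100KB pieces and put it back together.
head -c 300000 /dev/urandom > original.bin
split --bytes=100KB original.bin part-
cat part-* > rebuilt.bin

# Reading from stdin keeps the file name out of md5sum's output,
# so the two results can be compared as plain strings.
sum_original=$(md5sum < original.bin)
sum_rebuilt=$(md5sum < rebuilt.bin)

if [ "$sum_original" = "$sum_rebuilt" ]; then
    echo "checksums match"
else
    echo "checksums differ"
fi

# cmp compares byte by byte and stays silent when the files are identical.
cmp original.bin rebuilt.bin && echo "files are byte-identical"
```

For a quick manual check, running md5sum on both files and eyeballing the two hashes does the same job.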
EDIT 8th of March 2011: Check this out!
[Screenshot: concatenating files, then comparing md5 sums using coreutils for win32]