How to split a big CSV file using Linux or macOS Terminal?

Question

Florian Poullin · Accepted Answer

You have a very big CSV file, and you want to split it into several files of n lines. Smaller files let you open them with CSV editors without reaching their limits.

If you are not afraid of using the terminal, macOS and Linux come with a simple utility to split files into parts: split. See full documentation on split man page.

split [-l line_count] [-b byte_count] [-n chunk_count] [file [prefix]]

The command split breaks up files using:

lines - useful for text-based files
bytes - for binary files
number - if you want to split a file into a specific number of parts.

CSV is a format to store structured data using text files. In a CSV file, each line is a data record. To split a CSV file, we'll use split with the lines parameter.

split -l 100000 filename.csv

Basic split CSV file

Split generates files with basic names beginning with x: xaa, xab, xac, etc. See the next section to split your CSV files and rename the generated files.

Using split and renaming generated files

Split has two interesting parameters. The first is -d to use numeric suffixes (1, 2, 3, etc.) instead of letters. The second is the prefix parameter for the generated files. To define the prefix, just add a string after your CSV filename.

split -l 100000 -d filename.csv file_

split CSV file with prefix and numeric suffixes

Then, loop on all files starting with file_ and rename them to add .csv.

for i in $(find file_*); do mv $i "$i.csv"; done

Renaming generated files with .csv

Adding CSV header to generated files

You have several generated files like: file_00.csv, file_01.csv, etc.

But the CSV header line is on the first file (file_00.csv) and all other files don't have a CSV header line.

To copy the file_00.csv first line on all other files, run the following command:

for i in $(find . -type f -name "file_*.csv" -not -name "file_00.csv");
    do echo -e "$(head -1 file_00.csv)
$(cat $i)" > $i;
done

What it does is simple:

Step1

Loop on all file that begin with file_ but are not file_00.csv:

find . -type f -name "file_*.csv" -not -name "file_00.csv"

Step2

For each file in the loop, copy the first line of file_00.csv to it. There is no easy way to prepend a line to a file using the command line. We just use echo with the first line from file_00.csv and the file content from the current file cat $i.

To Sum up the whole process

Step 1: Split the CSV file into parts of 100000 lines and prefix the generated files with "file_".

split -l 100000 -d filename.csv file_

Step 2: Add .csv to all generated files.

for i in $(find file_*); do mv $i "$i.csv"; done

Step 3: Copy header from first generated file at the beginning of the other files.

for i in $(find . -type f -name "file_*.csv" -not -name "file_00.csv");
    do echo -e "$(head -1 file_00.csv)
$(cat $i)" > $i;
done

How to split a big CSV file using Linux or macOS Terminal?

Using split and renaming generated files

Adding CSV header to generated files

To Sum up the whole process

Join the future of list management