You have a very big CSV file, and you want to split it into several files of n lines. Smaller files let you open them with CSV editors without reaching their limits.

If you are not afraid of using the terminal, macOS and Linux come with a simple utility to split files into parts: split. See full documentation on split man page.

split [-l line_count] [-b byte_count] [-n chunk_count] [file [prefix]]

The command split breaks up files using:

  • lines - useful for text-based files
  • bytes - for binary files
  • number - if you want to split a file into a specific number of parts.

CSV is a format to store structured data using text files. In a CSV file, each line is a data record. To split a CSV file, we'll use split with the lines parameter.

split -l 100000 filename.csv
Basic split CSV file
Basic split CSV file

Split generates files with basic names beginning with x: xaa, xab, xac, etc. See the next section to split your CSV files and rename the generated files.

Using split and renaming generated files

Split has two interesting parameters. The first is -d to use numeric suffixes (1, 2, 3, etc.) instead of letters. The second is the prefix parameter for the generated files. To define the prefix, just add a string after your CSV filename.

split -l 100000 -d filename.csv file_
split CSV file with prefix and numeric suffixes
split CSV file with prefix and numeric suffixes

Then, loop on all files starting with file_ and rename them to add .csv.

for i in $(find file_*); do mv $i "$i.csv"; done
Renaming generated files with .csv
Renaming generated files with .csv

Adding CSV header to generated files

You have several generated files like: file_00.csv, file_01.csv, etc.

But the CSV header line is on the first file (file_00.csv) and all other files don't have a CSV header line.

To copy the file_00.csv first line on all other files, run the following command:

for i in $(find . -type f -name "file_*.csv" -not -name "file_00.csv");
    do echo -e "$(head -1 file_00.csv)\n$(cat $i)" > $i;
done

What it does is simple:

Step1

Loop on all file that begin with file_ but are not file_00.csv:

find . -type f -name "file_*.csv" -not -name "file_00.csv"

Step2

For each file in the loop, copy the first line of file_00.csv to it. There is no easy way to prepend a line to a file using the command line. We just use echo with the first line from file_00.csv and the file content from the current file cat $i.

To Sum up the whole process

Step 1: Split the CSV file into parts of 100000 lines and prefix the generated files with "file_".

split -l 100000 -d filename.csv file_

Step 2: Add .csv to all generated files.

for i in $(find file_*); do mv $i "$i.csv"; done

Step 3: Copy header from first generated file at the beginning of the other files.

for i in $(find . -type f -name "file_*.csv" -not -name "file_00.csv");
    do echo -e "$(head -1 file_00.csv)\n$(cat $i)" > $i;
done