Skip to content

merge

Merge sort multiple sorted CSV files into 1+ files

Sorted CSV files as those created using the csv function are sorted on a file by file basis, but not across the files.

dbworkload can assist you with re-writing all rows into new CSV files where the rows are sorted across all files.

This is known as a merge-sort operation.

Example

Create 2 files such as these. Notice the rows are sorted alphabetically on a file by file basis, but not across all files.

$ cat bank/t.0.csv 
aaa
ddd
fff
kkk
rrr
zzz

$ cat bank/t.1.csv 
bbb
ggg
lll
ooo
ttt
yyy

Run the merge command

dbworkload util merge -i bank/

You will notice a new directory has been created. Inspect the file

$ cat bank.merged/out_000.csv 
aaa
bbb
ddd
fff
ggg
kkk
lll
ooo
rrr
ttt
yyy
zzz

Rows are now correctly sorted. Use option --csv-max-rows to split the output in many, smaller files.

See also