
csv

Generating CSV files

You can seed a database quickly by letting dbworkload generate pseudo-random data and import it.

dbworkload takes the DDL as input and creates an intermediate YAML file that defines what kind of data to generate for each column (a string, a number, a date, a bool, and so on), based on the column's data type.

You then refine the YAML file to suit your needs: for example, the size of a string, the range for a date, the precision of a decimal, or a choice among a discrete list of values.

You can also specify the percentage of NULL values for any column and the number of elements in an ARRAY column. You then specify the total row count, the number of rows per file, and which columns, if any, to sort by.
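To make the NULL percentage and ARRAY length settings concrete, here is a minimal Python sketch of how such options could shape a generated column. It is not dbworkload's implementation; the function `generate_column` and its parameters `null_pct`, `array_len`, and `seed` are hypothetical names chosen for illustration.

```python
import random


def generate_column(make_value, rows, null_pct=0.0, array_len=0, seed=0):
    """Generate `rows` values for one column (illustrative only).

    make_value: callable taking a random.Random and returning one value.
    null_pct:   probability (0.0 to 1.0) that a row is NULL (None).
    array_len:  if > 0, each non-NULL row is a list of that many values.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    column = []
    for _ in range(rows):
        if rng.random() < null_pct:
            column.append(None)
        elif array_len > 0:
            column.append([make_value(rng) for _ in range(array_len)])
        else:
            column.append(make_value(rng))
    return column


if __name__ == "__main__":
    # Roughly 20% NULLs, otherwise arrays of 3 small integers.
    print(generate_column(lambda rng: rng.randint(0, 99), 5, null_pct=0.2, array_len=3))
```

Seeding the generator keeps a dataset reproducible between runs, which helps when comparing results across test environments.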

dbworkload then generates the data into CSV or TSV files and compresses them if requested.

You can then optionally merge-sort the files.
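For readers unfamiliar with the idea, merge-sorting combines several files that are each already sorted into a single sorted output without loading everything into memory. The Python sketch below illustrates the general technique with `heapq.merge`; it is not how dbworkload implements it. The function name `merge_sorted_tsv` is hypothetical, and the sketch assumes tab-separated, uncompressed files whose sort key is an integer in the first column.

```python
import csv
import heapq


def merge_sorted_tsv(paths, out_path, key_col=0):
    """Merge TSV files, each already sorted on an integer column, into one file."""
    handles = [open(p, newline="") for p in paths]
    try:
        readers = [csv.reader(h, delimiter="\t") for h in handles]
        with open(out_path, "w", newline="") as out:
            writer = csv.writer(out, delimiter="\t", lineterminator="\n")
            # heapq.merge pulls rows lazily from each reader, so only one
            # row per input file is held in memory at a time.
            for row in heapq.merge(*readers, key=lambda r: int(r[key_col])):
                writer.writerow(row)
    finally:
        for h in handles:
            h.close()
```

The key is converted with `int` so that 10 sorts after 2; comparing the raw strings would order them lexicographically instead.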

Example

Here is a sample input YAML file, bank.yaml.

ref_data:
- count: 1000
  sort-by: 
    - acc_no
  columns:
    acc_no:
      type: sequence
    external_ref_id:
      type: uuid
    created_time:
      type: timestamp
    acc_details:
      type: string

Now let's create a CSV dataset, using only 1 processor.

dbworkload util csv -i bank.yaml -x 1

The generated files will be located inside a directory named bank.

Inspect the output

$ head -n5 bank/ref_data.0_0_0.tsv 
0       3a2edc9d-a96b-4541-99ae-0098527545f7    2008-03-19 06:20:27.209214      CWUh0FWashpmWCx4LF3kb1
1       829de6d6-103c-4707-9668-c4359ef5373c    2014-02-13 22:04:20.168239      QGspICZBHYpRLnHNcg
2       5dd183af-d728-4e12-8b11-2900b6f6880a    2019-04-01 16:14:40.388236      sEUukccOePdnIbiQyVUSi0HS7rL
3       21f00778-5fca-4302-8380-56fa461adfc8    2003-05-21 19:21:21.598455      OQTNwxoZIAdNmcA6fJM5eGDvMJgKJ
4       035dac61-b4a3-40a4-9e4d-0deb50fef3ae    2011-08-15 06:15:40.405698      RvToVnn20BEXoxFzw9QFpCt
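As a quick sanity check on output like the above, the following Python sketch parses each row into typed values and confirms the rows are ordered by acc_no. It assumes the fields are tab-separated and appear in the same order as the columns in bank.yaml (acc_no, external_ref_id, created_time, acc_details); the helper names `parse_rows` and `is_sorted_by_acc_no` are hypothetical.

```python
import csv
import io
import uuid
from datetime import datetime


def parse_rows(text):
    """Parse tab-separated rows into (int, UUID, datetime, str) tuples."""
    rows = []
    for acc_no, ref_id, created, details in csv.reader(io.StringIO(text), delimiter="\t"):
        rows.append((
            int(acc_no),
            uuid.UUID(ref_id),  # raises ValueError if the UUID is malformed
            datetime.strptime(created, "%Y-%m-%d %H:%M:%S.%f"),
            details,
        ))
    return rows


def is_sorted_by_acc_no(rows):
    """Return True if acc_no never decreases from one row to the next."""
    return all(a[0] <= b[0] for a, b in zip(rows, rows[1:]))


if __name__ == "__main__":
    with open("bank/ref_data.0_0_0.tsv", newline="") as f:
        rows = parse_rows(f.read())
    print(len(rows), is_sorted_by_acc_no(rows))
```

A parsing failure here points to a column whose generated values do not match the type you intended in the YAML file.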

See also