Seed the database tables
Create the tables
Create the tables in a database called bank.
PostgreSQL
postgres=# CREATE DATABASE bank;
Now, you need to disconnect and reconnect to the bank database.
psql 'postgres://fabio:postgres@localhost:5432/bank?sslmode=disable'
Once at the SQL prompt, import the file or copy-paste its contents, as you prefer.
bank=# \i bank.ddl
bank=# \d
-- List of relations
-- Schema | Name | Type | Owner
-- --------+----------+-------+-------
-- public | orders | table | fabio
-- public | ref_data | table | fabio
-- (2 rows)
CockroachDB
defaultdb> CREATE DATABASE bank;
defaultdb> USE bank;
bank> \i bank.ddl
bank> SHOW TABLES;
-- schema_name | table_name | type | owner | estimated_row_count | locality
-- --------------+------------+-------+-----------+---------------------+-----------
-- public | orders | table | cockroach | 0 | NULL
-- public | ref_data | table | cockroach | 0 | NULL
-- (2 rows)
Generate datasets
Next, generate some CSV data to seed the database.
dbworkload has two built-in utility functions that can assist with this task:
dbworkload util yaml, which converts a DDL file into a data generation definition file, structured in YAML.
dbworkload util csv, which takes the YAML file as input and outputs CSV files.
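To make the DDL-to-YAML step concrete, here is a minimal, hypothetical sketch of the kind of mapping `dbworkload util yaml` performs: scanning a CREATE TABLE statement and assigning a generator type to each column. The function name and type-mapping rules are illustrative assumptions, not dbworkload's actual implementation.

```python
import re

# Illustrative SQL-type -> generator-type mapping (assumed, not dbworkload's real table)
TYPE_MAP = {
    "uuid": "uuid",
    "timestamp": "timestamp",
    "string": "string",
    "varchar": "string",
    "int": "integer",
}

def ddl_to_generators(ddl: str) -> dict:
    """Extract column name/type pairs from a CREATE TABLE body."""
    body = re.search(r"\((.*)\)", ddl, re.S).group(1)
    cols = {}
    for part in body.split(","):
        m = re.match(r"\s*(\w+)\s+(\w+)", part)
        if m and m.group(2).lower() in TYPE_MAP:
            cols[m.group(1)] = {"type": TYPE_MAP[m.group(2).lower()]}
    return cols

ddl = """CREATE TABLE ref_data (
    acc_no INT,
    external_ref_id UUID,
    created_time TIMESTAMP,
    acc_details STRING
)"""
print(ddl_to_generators(ddl))
```

Note that a generated definition is only a starting point: as we do below, you typically edit the YAML by hand (for example, switching acc_no from a random integer to a sequence).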
Let's use the yaml utility with our bank.ddl file.
dbworkload util yaml -i bank.ddl
For this exercise, we will use the simplified YAML file below.
Replace the content of bank.yaml with the following YAML.
ref_data:
- count: 1000
sort-by:
- acc_no
columns:
acc_no:
type: sequence
external_ref_id:
type: uuid
created_time:
type: timestamp
acc_details:
type: string
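The YAML above asks for 1000 rows sorted by acc_no, where acc_no is a sequence, external_ref_id a random UUID, created_time a random timestamp, and acc_details a random string. As an illustration of what those generator types produce (a sketch, not dbworkload's code), the same rows could be generated like this:

```python
import random
import string
import uuid
from datetime import datetime, timedelta

def gen_rows(count: int):
    base = datetime(2000, 1, 1)
    for acc_no in range(count):  # "sequence" type: 0, 1, 2, ...
        yield (
            acc_no,
            str(uuid.uuid4()),                                    # "uuid" type
            (base + timedelta(seconds=random.randrange(25 * 365 * 86400))
             ).isoformat(sep=" "),                                # "timestamp" type
            "".join(random.choices(string.ascii_letters + string.digits,
                                   k=random.randint(15, 30))),    # "string" type
        )

# Write tab-separated values, matching the .tsv files dbworkload produces.
with open("ref_data.sample.tsv", "w") as f:
    for row in gen_rows(5):
        f.write("\t".join(map(str, row)) + "\n")
```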
Now let's create a CSV dataset
dbworkload util csv -i bank.yaml -x 1
The generated files will be located inside a bank directory. Note that dbworkload writes tab-separated files with a .tsv extension.
$ ls -1 bank
ref_data.0_0_0.tsv
$ wc -l bank/*
1000 bank/ref_data.0_0_0.tsv
Inspect it
$ head -n5 bank/ref_data.0_0_0.tsv
0 3a2edc9d-a96b-4541-99ae-0098527545f7 2008-03-19 06:20:27.209214 CWUh0FWashpmWCx4LF3kb1
1 829de6d6-103c-4707-9668-c4359ef5373c 2014-02-13 22:04:20.168239 QGspICZBHYpRLnHNcg
2 5dd183af-d728-4e12-8b11-2900b6f6880a 2019-04-01 16:14:40.388236 sEUukccOePdnIbiQyVUSi0HS7rL
3 21f00778-5fca-4302-8380-56fa461adfc8 2003-05-21 19:21:21.598455 OQTNwxoZIAdNmcA6fJM5eGDvMJgKJ
4 035dac61-b4a3-40a4-9e4d-0deb50fef3ae 2011-08-15 06:15:40.405698 RvToVnn20BEXoxFzw9QFpCt
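Before loading the file, it can be worth sanity-checking the TSV programmatically. The snippet below is a hypothetical helper that parses a small in-memory sample (so it is self-contained); point it at bank/ref_data.0_0_0.tsv to check the real file.

```python
import csv
import io
import uuid

# Two sample rows in the same four-column, tab-separated layout as above
data = io.StringIO(
    "0\t3a2edc9d-a96b-4541-99ae-0098527545f7\t2008-03-19 06:20:27.209214\tCWUh0FWashpmWCx4LF3kb1\n"
    "1\t829de6d6-103c-4707-9668-c4359ef5373c\t2014-02-13 22:04:20.168239\tQGspICZBHYpRLnHNcg\n"
)

rows = list(csv.reader(data, delimiter="\t"))
assert all(len(r) == 4 for r in rows)           # four columns per row
assert [int(r[0]) for r in rows] == [0, 1]      # acc_no is a 0-based sequence
assert all(uuid.UUID(r[1]) for r in rows)       # external_ref_id is a valid UUID
print(f"{len(rows)} rows OK")
```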
Import datasets
Now we are ready to import the CSV file into our table ref_data.
PostgreSQL
For PostgreSQL, at the SQL prompt, just use COPY
bank=# COPY ref_data FROM '/path/to/workloads/bank/ref_data.0_0_0.tsv' WITH CSV DELIMITER AS e'\t';
COPY 1000
Time: 8.713 ms
CockroachDB
For CockroachDB, my favorite method is to use a webserver for serving the CSV files.
Open a new terminal, then start a simple Python server
cd workloads/bank
python3 -m http.server 3000
If you open your browser at http://localhost:3000, you should see the file ref_data.0_0_0.tsv being served.
At the SQL prompt, import the file
bank> IMPORT INTO ref_data CSV DATA ('http://localhost:3000/ref_data.0_0_0.tsv') WITH delimiter = e'\t', nullif = '';
job_id | status | fraction_completed | rows | index_entries | bytes
----------------------+-----------+--------------------+------+---------------+--------
1013454367369822209 | succeeded | 1 | 1000 | 0 | 71401
(1 row)
Review imported data
Good stuff! Your dataset has been successfully imported. Confirm at the SQL prompt
bank=# SELECT * FROM ref_data LIMIT 5;
acc_no | external_ref_id | created_time | acc_details
--------+--------------------------------------+-------------------------------+--------------------------
0 | a4a689d3-ae12-4ca2-8b74-a9929c89f420 | 2015-06-03 04:03:48.248701-04 | DiKysAoJeZ9hqDehOhE2N
1 | 01abe15c-fcae-4492-812e-37b2212ababa | 2019-06-11 17:57:05.76814-04 | nEYyCwhTJTXXaMfeHeWj
2 | 40cc3ccd-74bf-4688-ae85-7f68cd0e44f2 | 2001-10-10 18:42:21.865742-04 | DFhQ0aUbjqJ9CsczdyWv
3 | 9491509e-0424-4cdf-879f-790cff2d7289 | 2003-08-13 20:26:49.484942-04 | ggpx35EPH698N2MGlUPV13P0
4 | a54b9e47-10c7-4dce-9db8-12c7eaf80e17 | 2022-06-02 22:56:20.445508-04 | JCOhKbB3YMrOOBm
(5 rows)
bank=# SELECT count(*) FROM ref_data;
count
-------
1000
(1 row)
We're now ready to proceed to the next section.