Skip to content

Seed the database tables

Create the tables

Create the tables in a database called bank.

PostgreSQL

postgres=# CREATE DATABASE bank;

Now, you need to disconnect and reconnect to bank

psql 'postgres://fabio:postgres@localhost:5432/bank?sslmode=disable'

Once on the SQL prompt, import the file, or copy-paste, as you prefer.

bank=# \i bank.ddl

bank=# \d
--          List of relations
--  Schema |   Name   | Type  | Owner 
-- --------+----------+-------+-------
--  public | orders   | table | fabio
--  public | ref_data | table | fabio
-- (2 rows)

CockroachDB

defaultdb> CREATE DATABASE bank;

defaultdb> USE bank;

bank> \i bank.ddl

bank> SHOW TABLES;
--   schema_name | table_name | type  |   owner   | estimated_row_count | locality
-- --------------+------------+-------+-----------+---------------------+-----------
--   public      | orders     | table | cockroach |                   0 | NULL
--   public      | ref_data   | table | cockroach |                   0 | NULL
-- (2 rows)

Generate datasets

Next, generate some CSV data to seed the database. dbworkload has 2 built-in utility functions that can assist with this task:

Let's use the yaml utility with our bank.ddl file.

dbworkload util yaml -i bank.ddl

For this exercise, we will use below simplified YAML file. Replace the content of bank.yaml with below YAML

ref_data:
- count: 1000
  sort-by: 
    - acc_no
  columns:
    acc_no:
      type: sequence
    external_ref_id:
      type: uuid
    created_time:
      type: timestamp
    acc_details:
      type: string

Now let's create a CSV dataset

dbworkload util csv -i bank.yaml -x 1

The CSV files will be located inside a bank directory.

$ ls -lh1 bank
ref_data.0_0_0.tsv

$ wc -l bank/*
    1000 bank/ref_data.0_0_0.tsv

Inspect it

$ head -n5 bank/ref_data.0_0_0.tsv 
0       3a2edc9d-a96b-4541-99ae-0098527545f7    2008-03-19 06:20:27.209214      CWUh0FWashpmWCx4LF3kb1
1       829de6d6-103c-4707-9668-c4359ef5373c    2014-02-13 22:04:20.168239      QGspICZBHYpRLnHNcg
2       5dd183af-d728-4e12-8b11-2900b6f6880a    2019-04-01 16:14:40.388236      sEUukccOePdnIbiQyVUSi0HS7rL
3       21f00778-5fca-4302-8380-56fa461adfc8    2003-05-21 19:21:21.598455      OQTNwxoZIAdNmcA6fJM5eGDvMJgKJ
4       035dac61-b4a3-40a4-9e4d-0deb50fef3ae    2011-08-15 06:15:40.405698      RvToVnn20BEXoxFzw9QFpCt

Importing datasets

Now we are ready to import the CSV file into our table ref_data.

PostgreSQL

For PostgreSQL Server, at the SQL prompt, just use COPY

bank=# COPY ref_data FROM '/path/to/workloads/bank/ref_data.0_0_0.csv' WITH CSV DELIMITER AS e'\t';
COPY 1000
Time: 8.713 ms

CockroachDB

For CockroachDB, my favorite method is to use a webserver for serving the CSV files.

Open a new terminal then start a simple python server

cd workloads/bank
python3 -m http.server 3000

If you open your browser at http://localhost:3000, you should see file ref_data.0_0_0.tsv being served.

At the SQL prompt, import the file

bank> IMPORT INTO ref_data CSV DATA ('http://localhost:3000/ref_data.0_0_0.tsv') WITH delimiter = e'\t', nullif = '';
        job_id        |  status   | fraction_completed | rows | index_entries | bytes
----------------------+-----------+--------------------+------+---------------+--------
  1013454367369822209 | succeeded |                  1 | 1000 |             0 | 71401
(1 row)

Review imported data

Good stuff, your dataset has been successfully imported, confirm on the SQL prompt

bank=# SELECT * FROM ref_data LIMIT 5;
 acc_no |           external_ref_id            |         created_time          |       acc_details        
--------+--------------------------------------+-------------------------------+--------------------------
      0 | a4a689d3-ae12-4ca2-8b74-a9929c89f420 | 2015-06-03 04:03:48.248701-04 | DiKysAoJeZ9hqDehOhE2N
      1 | 01abe15c-fcae-4492-812e-37b2212ababa | 2019-06-11 17:57:05.76814-04  | nEYyCwhTJTXXaMfeHeWj
      2 | 40cc3ccd-74bf-4688-ae85-7f68cd0e44f2 | 2001-10-10 18:42:21.865742-04 | DFhQ0aUbjqJ9CsczdyWv
      3 | 9491509e-0424-4cdf-879f-790cff2d7289 | 2003-08-13 20:26:49.484942-04 | ggpx35EPH698N2MGlUPV13P0
      4 | a54b9e47-10c7-4dce-9db8-12c7eaf80e17 | 2022-06-02 22:56:20.445508-04 | JCOhKbB3YMrOOBm
(5 rows)

bank=# SELECT count(*) FROM ref_data;
 count 
-------
  1000
(1 row)

We're then ready to proceed to the next section.