Seed the database tables
Create the tables
Create the tables in a database called bank.
PostgreSQL
postgres=# CREATE DATABASE bank;
Now, you need to disconnect and reconnect to the bank database.
psql 'postgres://fabio:postgres@localhost:5432/bank?sslmode=disable'
Once at the SQL prompt, import the file or copy-paste its contents, as you prefer.
bank=# \i bank.ddl
bank=# \d
-- List of relations
-- Schema | Name | Type | Owner
-- --------+----------+-------+-------
-- public | orders | table | fabio
-- public | ref_data | table | fabio
-- (2 rows)
CockroachDB
defaultdb> CREATE DATABASE bank;
defaultdb> USE bank;
bank> \i bank.ddl
bank> SHOW TABLES;
-- schema_name | table_name | type | owner | estimated_row_count | locality
-- --------------+------------+-------+-----------+---------------------+-----------
-- public | orders | table | cockroach | 0 | NULL
-- public | ref_data | table | cockroach | 0 | NULL
-- (2 rows)
Generate datasets
Next, generate some CSV data to seed the database.
dbworkload has two built-in utility functions that can assist with this task:
dbworkload util yaml, which converts a DDL file into a data generation definition file, structured in YAML.
dbworkload util csv, which takes the YAML file as input and outputs CSV files.
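To make the DDL-to-YAML step concrete, here is a minimal, hypothetical sketch of the kind of mapping `dbworkload util yaml` performs: scanning a CREATE TABLE statement and assigning a generator type to each column. The function name and type-mapping rules are illustrative assumptions, not dbworkload's actual implementation.

```python
import re

# Illustrative SQL-type -> generator-type mapping (assumed, not dbworkload's real table)
TYPE_MAP = {
    "uuid": "uuid",
    "timestamp": "timestamp",
    "string": "string",
    "varchar": "string",
    "int": "integer",
}

def ddl_to_generators(ddl: str) -> dict:
    """Extract column name/type pairs from a CREATE TABLE body."""
    body = re.search(r"\((.*)\)", ddl, re.S).group(1)
    cols = {}
    for part in body.split(","):
        m = re.match(r"\s*(\w+)\s+(\w+)", part)
        if m and m.group(2).lower() in TYPE_MAP:
            cols[m.group(1)] = {"type": TYPE_MAP[m.group(2).lower()]}
    return cols

ddl = """CREATE TABLE ref_data (
    acc_no INT,
    external_ref_id UUID,
    created_time TIMESTAMP,
    acc_details STRING
)"""
print(ddl_to_generators(ddl))
```

Note that a generated definition is only a starting point: as we do below, you typically edit the YAML by hand (for example, switching acc_no from a random integer to a sequence).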
Let's use the yaml utility with our bank.ddl file.
dbworkload util yaml -i bank.ddl
For this exercise, we will use the simplified YAML file below.
Replace the content of bank.yaml with the following YAML.
ref_data:
- count: 1000
sort-by:
- acc_no
columns:
acc_no:
type: sequence
external_ref_id:
type: uuid
created_time:
type: timestamp
acc_details:
type: string
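The YAML above asks for 1000 rows sorted by acc_no, where acc_no is a sequence, external_ref_id a random UUID, created_time a random timestamp, and acc_details a random string. As an illustration of what those generator types produce (a sketch, not dbworkload's code), the same rows could be generated like this:

```python
import random
import string
import uuid
from datetime import datetime, timedelta

def gen_rows(count: int):
    base = datetime(2000, 1, 1)
    for acc_no in range(count):  # "sequence" type: 0, 1, 2, ...
        yield (
            acc_no,
            str(uuid.uuid4()),                                    # "uuid" type
            (base + timedelta(seconds=random.randrange(25 * 365 * 86400))
             ).isoformat(sep=" "),                                # "timestamp" type
            "".join(random.choices(string.ascii_letters + string.digits,
                                   k=random.randint(15, 30))),    # "string" type
        )

# Write tab-separated values, matching the .tsv files dbworkload produces.
with open("ref_data.sample.tsv", "w") as f:
    for row in gen_rows(5):
        f.write("\t".join(map(str, row)) + "\n")
```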
Now let's create a CSV dataset
dbworkload util csv -i bank.yaml -x 1
The generated files will be located inside a bank directory. Note that dbworkload writes tab-separated files with a .tsv extension.
$ ls -1 bank
ref_data.0_0_0.tsv
$ wc -l bank/*
1000 bank/ref_data.0_0_0.tsv
Inspect it
$ head -n5 bank/ref_data.0_0_0.tsv
0 3a2edc9d-a96b-4541-99ae-0098527545f7 2008-03-19 06:20:27.209214 CWUh0FWashpmWCx4LF3kb1
1 829de6d6-103c-4707-9668-c4359ef5373c 2014-02-13 22:04:20.168239 QGspICZBHYpRLnHNcg
2 5dd183af-d728-4e12-8b11-2900b6f6880a 2019-04-01 16:14:40.388236 sEUukccOePdnIbiQyVUSi0HS7rL
3 21f00778-5fca-4302-8380-56fa461adfc8 2003-05-21 19:21:21.598455 OQTNwxoZIAdNmcA6fJM5eGDvMJgKJ
4 035dac61-b4a3-40a4-9e4d-0deb50fef3ae 2011-08-15 06:15:40.405698 RvToVnn20BEXoxFzw9QFpCt
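Before loading the file, it can be worth sanity-checking the TSV programmatically. The snippet below is a hypothetical helper that parses a small in-memory sample (so it is self-contained); point it at bank/ref_data.0_0_0.tsv to check the real file.

```python
import csv
import io
import uuid

# Two sample rows in the same four-column, tab-separated layout as above
data = io.StringIO(
    "0\t3a2edc9d-a96b-4541-99ae-0098527545f7\t2008-03-19 06:20:27.209214\tCWUh0FWashpmWCx4LF3kb1\n"
    "1\t829de6d6-103c-4707-9668-c4359ef5373c\t2014-02-13 22:04:20.168239\tQGspICZBHYpRLnHNcg\n"
)

rows = list(csv.reader(data, delimiter="\t"))
assert all(len(r) == 4 for r in rows)           # four columns per row
assert [int(r[0]) for r in rows] == [0, 1]      # acc_no is a 0-based sequence
assert all(uuid.UUID(r[1]) for r in rows)       # external_ref_id is a valid UUID
print(f"{len(rows)} rows OK")
```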
Import datasets
Now we are ready to import the CSV file into our table ref_data.
PostgreSQL
For PostgreSQL, at the SQL prompt, just use COPY
bank=# COPY ref_data FROM '/path/to/workloads/bank/ref_data.0_0_0.tsv' WITH CSV DELIMITER AS e'\t';
COPY 1000
Time: 8.713 ms
CockroachDB
For CockroachDB, my favorite method is to use a webserver for serving the CSV files.
Open a new terminal, then start a simple Python server
cd workloads/bank
python3 -m http.server 3000
If you open your browser at http://localhost:3000, you should see the file ref_data.0_0_0.tsv being served.
At the SQL prompt, import the file
bank> IMPORT INTO ref_data CSV DATA ('http://localhost:3000/ref_data.0_0_0.tsv') WITH delimiter = e'\t', nullif = '';
job_id | status | fraction_completed | rows | index_entries | bytes
----------------------+-----------+--------------------+------+---------------+--------
1013454367369822209 | succeeded | 1 | 1000 | 0 | 71401
(1 row)
Review imported data
Good stuff! Your dataset has been successfully imported. Confirm at the SQL prompt
bank=# SELECT * FROM ref_data LIMIT 5;
acc_no | external_ref_id | created_time | acc_details
--------+--------------------------------------+-------------------------------+--------------------------
0 | a4a689d3-ae12-4ca2-8b74-a9929c89f420 | 2015-06-03 04:03:48.248701-04 | DiKysAoJeZ9hqDehOhE2N
1 | 01abe15c-fcae-4492-812e-37b2212ababa | 2019-06-11 17:57:05.76814-04 | nEYyCwhTJTXXaMfeHeWj
2 | 40cc3ccd-74bf-4688-ae85-7f68cd0e44f2 | 2001-10-10 18:42:21.865742-04 | DFhQ0aUbjqJ9CsczdyWv
3 | 9491509e-0424-4cdf-879f-790cff2d7289 | 2003-08-13 20:26:49.484942-04 | ggpx35EPH698N2MGlUPV13P0
4 | a54b9e47-10c7-4dce-9db8-12c7eaf80e17 | 2022-06-02 22:56:20.445508-04 | JCOhKbB3YMrOOBm
(5 rows)
bank=# SELECT count(*) FROM ref_data;
count
-------
1000
(1 row)
We're now ready to proceed to the next section.