Source Tables - Setup & Loading Requirements

jhillam · July 1, 2022, 7:03pm

Source Table Location

Once Truelty is set up in your Snowflake account, the following database and schema will be in place, and this is where you need to place and maintain the tables to be processed by Truelty.

Database: TRUELTY
Schema: LZ

Source Table Requirements

Each source table identified for Truelty to process must meet the following requirements:

Have a unique key ID for each record.
The table’s unique key ID must be limited to a single column in the table
1. If your table’s unique key ID is made up of multiple columns, you will need to concatenate the columns into a single column to be used as the table’s unique key ID column.
The Unique Key ID Column MUST be a VARCHAR Data Type.
1. We require it to be varchar to allow the table to be sliced into partitions for load-balancing various processes.
2. If your Unique Key ID Column is not a varchar data type, there is a simple solution.
  1. Define the Unique Key ID column as varchar in the create table DDL script, and Snowflake will automatically convert your loading data to varchar data type.
Include in the table creation DDL the CLUSTER BY clause
1. The cluster needs to be applied to the Unique Key ID column in the following way:
  1. Cluster By (left(<unique key id column name>,5))

Source Table Pre-load Checklist

Pre-load checklist
1. Ensuring that the column representing the Unique Key ID for each record is:
  1. Unique across all records
  2. Has no NULL values
  3. A VARCHAR data type
2. Where possible, in the data loading pipelines, sort your incoming records by your Unique Key ID column using the following:
  1. Order By left(<unique key id column name>,5)
  2. Specifying the ordering will more effectively load the micro-partitions and reduce processing for auto-re-clustering the table.
3. If the data has first and last names in the data
  1. Concatenate the First and Last names into a Full Name column
  2. Delimit First and Last with a single space only
    1. i.e., John Smith
4. Any phone number should be area code & phone number together
5. If you are using country or region codes, we recommend removing them from phone numbers
Your data pipeline for loading the source tables should only insert new records or update existing records
1. We recommend you avoid deleting records (except for duplicates or null records) as this can adversely affect the cluster key and deep chaining of orphaned records.