Data Model

In the CrossEngage Predictions Platform, data is divided into three types: Customer, Transaction, and Activity data. Customer and Transaction data are required for creating Prediction Models, while Activity data is optional.

Data Format

All data is uploaded to the Predictions Platform as CSV files encoded in UTF-8. The structure of the CSV files should comply with the RFC 4180 standard, with the extension that each file includes a header line containing the field names.

  • The fields of the CSV are separated by a comma (,) by default. You can use a different delimiter by changing the default settings for how the file is read.

  • Each line has the same number of fields.

  • Each line ends with a line break (CRLF, a carriage return followed by a line feed). The last line may or may not end with a line break.

  • If a field contains quotation marks, the delimiter (,), or a line break, the field must be enclosed in quotation marks, and any quotation marks inside the field must be written in pairs.
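The rules above can be checked before an upload. The following is a minimal sketch using Python's standard csv module; the function name check_csv is hypothetical and is not part of the Predictions Platform, and the platform's own validation may be stricter or differ in detail.

```python
import csv
import io


def check_csv(data: bytes, delimiter: str = ",") -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    # Rule: files must be encoded in UTF-8.
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc}"]

    # newline="" leaves CRLF untouched so the csv module can handle
    # line breaks inside quoted fields itself.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    rows = list(reader)
    if not rows:
        return ["file is empty, expected a header line"]

    # Rule: every line has the same number of fields as the header.
    problems = []
    expected = len(rows[0])
    for record_no, row in enumerate(rows[1:], start=2):
        if len(row) != expected:
            problems.append(
                f"record {record_no}: {len(row)} fields, expected {expected}"
            )
    return problems
```

Calling check_csv(open("customers_delta.csv", "rb").read()) would return an empty list for a file that passes these two checks. The sketch does not verify field contents or header names.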

Examples

The following is an example of valid CSV, where values are not enclosed in quotation marks.

customer_id, comment, activity_type, activity_timestamp
209147, template: Campaign1, email_sent, 13:00:00
209147, , email_click, 2020-06-17 13:09:14

If one or more values contain a line break or the delimiter (,), the file can be rewritten with every value enclosed in quotation marks:

"customer_id", "comment", "activity_type", "activity_timestamp"
"209147", "template: Campaign1, upsell", "email_sent", "13:00:00"
"209147", "", "email_click", "2020-06-17 13:09:14"

In this example, the first value of "comment" now contains a comma (template: Campaign1, upsell).

Similarly, if one or more values contain quotation marks, the file can be rewritten so that every value is enclosed in quotation marks and quotation marks inside fields are doubled:

"customer_id", "comment", "activity_type", "activity_timestamp"
"209147", "template: ""Campaign1"", upsell", "email_sent", "13:00:00"
"209147", "", "email_click", "2020-06-17 13:09:14"

In this example, the first value of "comment" now contains quotation marks, which are written twice (template: ""Campaign1"", upsell).
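Quoting and doubling do not have to be done by hand. As an illustrative sketch, Python's standard csv module applies both rules when writing; the variable names below are arbitrary, and the rows reuse the values from the examples above. Note that the writer emits no space after the delimiter.

```python
import csv
import io

rows = [
    ["customer_id", "comment", "activity_type", "activity_timestamp"],
    ["209147", 'template: "Campaign1", upsell', "email_sent", "13:00:00"],
    ["209147", "", "email_click", "2020-06-17 13:09:14"],
]

# newline="" keeps the writer's CRLF line endings unchanged.
buffer = io.StringIO(newline="")

# QUOTE_ALL encloses every value in quotation marks; quotation marks
# inside a value are doubled by default (doublequote=True).
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
writer.writerows(rows)

csv_text = buffer.getvalue()
payload = csv_text.encode("utf-8")  # bytes ready to be saved as a .csv file
```

With the default csv.QUOTE_MINIMAL setting instead of csv.QUOTE_ALL, only the values that contain the delimiter, a quotation mark, or a line break would be quoted, which is also valid under the rules above.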

File Structure and Naming

The CrossEngage Predictions Platform uses two types of files to keep data up to date: Base files and Delta files.

Base files contain the complete data for a specific, fully elapsed time period and are part of each data package, e.g. transaction files for individual years. They are not expected to be replaced later.

Delta files contain data of a time period that has not yet fully elapsed, for example, transactions of the current year. Subsequent uploads of current year transactional data will replace the previous data, as the new file is a superset of the old one.

Customer Data

Since your collection of Customers is never considered "complete", customer files are always Delta files. This means that with every upload, you provide the complete and most up-to-date Customer data, which replaces the previous version. Customer Data files use the same name every time: customers_delta.csv.

Transaction Data

Transaction files containing all transaction records of one or more years use the following naming scheme:

  • Transactions 2015 -> transactions_base_2015.csv

  • Transactions 2016 -> transactions_base_2016.csv

  • Transactions 2017-2020 -> transactions_base_2017_2020.csv

Transaction files containing partial records of the current year use the following naming scheme:

  • Transactions 2023 Q1 -> transactions_delta_2023.csv

  • Transactions 2023 Q1+Q2+Q3 -> transactions_delta_2023.csv

Once the year is complete, and a single file with complete records of the year is created, it should be named as a base file again. Uploading a base file for 2023 will cause the platform to disregard the previous delta files for that year.
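The naming scheme above can be captured in a small helper. This is a sketch only: the function transaction_file_name and its parameters are hypothetical, and rejecting multi-year delta files is an assumption based on the examples in the text, which only show single-year delta names.

```python
from typing import Optional


def transaction_file_name(
    first_year: int,
    last_year: Optional[int] = None,
    *,
    complete: bool = True,
) -> str:
    """Build a transaction file name following the scheme described above."""
    if not complete:
        # Assumption: delta files always cover exactly one (current) year.
        if last_year not in (None, first_year):
            raise ValueError("delta files in the text cover a single year")
        return f"transactions_delta_{first_year}.csv"

    if last_year is None or last_year == first_year:
        return f"transactions_base_{first_year}.csv"
    if last_year < first_year:
        raise ValueError("last_year must not be earlier than first_year")
    return f"transactions_base_{first_year}_{last_year}.csv"
```

For example, transaction_file_name(2017, 2020) gives transactions_base_2017_2020.csv, and transaction_file_name(2023, complete=False) gives transactions_delta_2023.csv regardless of how many quarters the file covers.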

Activity Data

Activity files follow a similar naming scheme to the transaction files:

  • Outbound Activities 2015 -> activities_outbound_base_2015.csv

  • Outbound Activities 2016 -> activities_outbound_base_2016.csv

  • Outbound Activities 2017-2020 -> activities_outbound_base_2017_2020.csv

Activity files containing partial records of the current year use the following naming scheme:

  • Outbound Activities 2023 Q1 -> activities_outbound_delta_2023.csv

  • Outbound Activities 2023 Q1+Q2+Q3 -> activities_outbound_delta_2023.csv

Once the year is complete, and a single file with complete records of the year is created, it should be named as a base file again. Uploading a base file for 2023 will cause the platform to disregard the previous delta files for that year.
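To illustrate how a base file supersedes the delta files of the same year, the following sketch filters a list of file names on the client side. The function effective_files and the regular expression are hypothetical; they recognise only the transaction and outbound activity names shown in this section and do not describe how the platform implements this internally.

```python
import re

# Matches e.g. transactions_base_2017_2020.csv or
# activities_outbound_delta_2023.csv (names taken from the text).
NAME_PATTERN = re.compile(
    r"^(?P<kind>transactions|activities_outbound)_(?P<mode>base|delta)"
    r"_(?P<first>\d{4})(?:_(?P<last>\d{4}))?\.csv$"
)


def effective_files(names):
    """Drop delta files whose year is already covered by a base file."""
    parsed = []
    for name in names:
        match = NAME_PATTERN.match(name)
        if match is None:
            raise ValueError(f"unrecognised file name: {name}")
        first = int(match["first"])
        last = int(match["last"] or first)
        parsed.append((name, match["kind"], match["mode"], range(first, last + 1)))

    # Every (kind, year) pair for which a base file exists.
    covered = {
        (kind, year)
        for _, kind, mode, years in parsed
        if mode == "base"
        for year in years
    }
    return [
        name
        for name, kind, mode, years in parsed
        if mode == "base" or not any((kind, year) in covered for year in years)
    ]
```

Given transactions_delta_2023.csv, transactions_base_2023.csv, and activities_outbound_delta_2023.csv, the function keeps the transaction base file and the activity delta file, because only the transactions of 2023 have a base file.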

For more information on the Data Tab, please click here.

For more information on Preparing and Validating your data, please click here.
