Data Wrangling

The Data Wrangling phase is where you bring your data into Cauzen. You upload a CSV file, preview it, and configure column metadata that helps Cauzen understand your data throughout the rest of the workflow.

Uploading a file

Drag and drop a CSV file onto the upload area, or click it to open a file browser. Cauzen accepts standard CSV files up to 50 MB.
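If you prepare files with scripts, a quick local check can catch an oversized or malformed file before you upload it. The sketch below is a hypothetical helper, not part of Cauzen: the function name `check_csv_before_upload` is illustrative. It also assumes that 50 MB means 50 * 1024 * 1024 bytes and that a "standard" CSV is UTF-8 text with a header row and the same number of fields in every record. This page does not confirm any of these details.

```python
import csv
import os

# Assumption: the 50 MB limit is read as 50 * 1024 * 1024 bytes; this page
# does not say whether megabytes are counted in powers of 2 or of 10.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024


def check_csv_before_upload(path, max_bytes=MAX_UPLOAD_BYTES):
    """Return a list of problems; an empty list means the file looks uploadable.

    Hypothetical local pre-check, not part of Cauzen. Assumes UTF-8 text
    with the same number of fields in every record.
    """
    problems = []
    size = os.path.getsize(path)
    if size > max_bytes:
        problems.append(f"file is {size} bytes, over the {max_bytes}-byte limit")
    try:
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            header = next(reader, None)
            if not header:
                problems.append("file has no header row")
                return problems
            for record_no, row in enumerate(reader, start=2):
                if not row:
                    continue  # skip blank lines
                if len(row) != len(header):
                    problems.append(
                        f"record {record_no} has {len(row)} fields, expected {len(header)}"
                    )
                    break  # report only the first ragged record
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
    except csv.Error as exc:
        problems.append(f"CSV parse error: {exc}")
    return problems
```

An empty result means only that the file passed these particular assumed checks. Cauzen may still apply its own validation when you upload.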

Only one dataset can be active at a time. To switch to a different file, click Replace Dataset. You'll be asked to confirm first, because replacing the dataset also resets the causal model and any inference history.

Dataset overview

Once a file is uploaded, the Dataset Overview card shows a summary of your data:

Field	Description
Rows	Total number of data rows
Columns	Total number of columns
Kept Columns	How many columns are included in the analysis
Labeled	How many columns have a custom label
Described	How many columns have a description
File Size	Size of the uploaded file
Data Types	A breakdown of detected types (number, string, boolean, date)
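To make the overview fields concrete, here is a minimal sketch of how counts like these could be computed from raw CSV text. It is not Cauzen's implementation. `summarize_csv` and `detect_type` are hypothetical names, and the type rules are assumptions: true/false values count as booleans, anything Python's `float()` accepts counts as a number, ISO `YYYY-MM-DD` values count as dates, and everything else is a string. Labeled and Described are left out because they depend on metadata you enter later, not on the file itself.

```python
import csv
import io
from collections import Counter
from datetime import date


def _is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_iso_date(value):
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def detect_type(values):
    """Guess one of number, string, boolean, date (hypothetical rules)."""
    present = [v.strip() for v in values if v.strip()]
    if not present:
        return "string"
    if all(v.lower() in ("true", "false") for v in present):
        return "boolean"
    if all(_is_number(v) for v in present):
        return "number"
    if all(_is_iso_date(v) for v in present):
        return "date"
    return "string"


def summarize_csv(text, kept=None):
    """Compute overview-style counts; kept=None means every column is kept."""
    records = [r for r in csv.reader(io.StringIO(text)) if r]
    header, data = records[0], records[1:]
    # Pad short rows so every column has one value per data row.
    columns = {
        name: [row[i] if i < len(row) else "" for row in data]
        for i, name in enumerate(header)
    }
    kept_names = set(header) if kept is None else set(kept) & set(header)
    return {
        "rows": len(data),
        "columns": len(header),
        "kept_columns": len(kept_names),
        "file_size_bytes": len(text.encode("utf-8")),
        "data_types": dict(Counter(detect_type(v) for v in columns.values())),
    }
```

For example, a file with columns `id,age,active,joined` and rows such as `1,34,true,2024-01-05` would summarize as two number columns, one boolean column, and one date column under these assumed rules.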

Previewing your data

The Data Preview table shows the raw contents of your CSV. Use the page controls below the table to browse through the data.

Configuring dataset metadata

Below the preview you can give your dataset a name and a description. These are used by AI features later in the workflow to provide better results.

Click the Generate with AI button next to either field to have Cauzen suggest a name or description based on the column names and sample data.

Configuring column metadata

The Column Metadata table lists every column in your dataset. For each column you can:

Include or exclude columns

Use the toggle in the Include column to mark a column as included or excluded. Excluded columns appear faded and are not used in the causal model or inference queries. Use the checkbox in the column header to toggle all columns at once.

Only include columns that are relevant to your analysis. Excluding irrelevant columns keeps the causal graph clean and improves AI accuracy.

Column status and recommendations

The Status column explains whether a column is included, excluded, or recommended for exclusion. Cauzen automatically flags columns that are structurally unsafe or unlikely to be useful causal variables:

Recommendation	What it means
Identifier-like	The column name looks like an ID, UUID, row number, or record identifier
Row identifier	Every row has a unique value, so the column behaves like an identifier
Sequential date	Date-like values are unique and strictly ordered, which often acts like row order rather than a model variable
Empty	The column has no usable values
Constant	Every row has the same value
Duplicate	The encoded values duplicate an earlier kept column
Linearly dependent	The encoded values are a deterministic linear combination of other kept columns
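The first five recommendations can each be decided by looking at a single column. The sketch below shows one way such checks could work. Several parts of it are assumptions rather than documented Cauzen behavior: the name patterns, the ISO date format, and the order in which checks are applied. It also deliberately skips the uniqueness test for numbers with a fractional part, because a continuous measurement is usually unique per row without being an identifier. `recommend_exclusion` is a hypothetical name, not a Cauzen API.

```python
import re
from datetime import date

# Hypothetical name patterns; Cauzen's actual rules are not documented here.
ID_SUFFIXES = {"id", "uuid", "guid"}
ID_NAMES = {"index", "row", "rownum", "row_number", "record", "record_number"}


def _name_tokens(name):
    # Split camelCase and snake_case: "CustomerID" -> ["customer", "id"]
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    return [t for t in re.split(r"[^a-z0-9]+", spaced.lower()) if t]


def looks_like_identifier(name):
    tokens = _name_tokens(name)
    return bool(tokens) and (tokens[-1] in ID_SUFFIXES or "_".join(tokens) in ID_NAMES)


def _is_continuous(values):
    # True when every value is numeric and at least one has a fractional part.
    try:
        numbers = [float(v) for v in values]
    except ValueError:
        return False
    return any(not n.is_integer() for n in numbers)


def _parse_iso_dates(values):
    try:
        return [date.fromisoformat(v) for v in values]
    except ValueError:
        return None


def _strictly_monotonic(xs):
    pairs = list(zip(xs, xs[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


def recommend_exclusion(name, values):
    """Return the first matching recommendation label, or None (assumed order)."""
    present = [v.strip() for v in values if v.strip()]
    if not present:
        return "Empty"
    if len(set(present)) == 1:
        return "Constant"
    if looks_like_identifier(name):
        return "Identifier-like"
    every_row_unique = len(present) == len(values) and len(set(present)) == len(present)
    if every_row_unique and not _is_continuous(present):
        dates = _parse_iso_dates(present)
        if dates is not None and _strictly_monotonic(dates):
            return "Sequential date"
        return "Row identifier"
    return None
```

In this sketch, Sequential date is checked before Row identifier because unique, ordered dates would otherwise also match the more general uniqueness rule.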

Columns with a recommendation start out excluded, so that causal discovery later in the workflow is less likely to fail. If you re-include one, Cauzen keeps your choice and leaves the recommendation visible as a warning. If a duplicate or dependency was detected, the status also names the column or columns that caused the recommendation.
