Cauzen Docs

Data Wrangling

Upload your CSV data and configure columns for analysis.

The Data Wrangling phase is where you bring your data into Cauzen. You upload a CSV file, preview it, and configure column metadata that helps Cauzen understand your data throughout the rest of the workflow.

Uploading a file

Drag and drop a CSV file onto the upload area, or click it to open a file browser. Cauzen accepts standard CSV files up to 50 MB.
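If you want to catch obvious problems before uploading, a local check along these lines may help. This is a minimal sketch and not part of Cauzen: the `check_csv` helper is hypothetical, and it assumes that "50 MB" means 50 × 1024 × 1024 bytes and that a standard CSV is UTF-8, comma-delimited, and starts with a header row.

```python
import csv
import os

# Assumption: "50 MB" is read as 50 * 1024 * 1024 bytes; Cauzen may count differently.
MAX_BYTES = 50 * 1024 * 1024


def check_csv(path, max_bytes=MAX_BYTES):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    size = os.path.getsize(path)
    if size > max_bytes:
        problems.append(f"file is {size} bytes, limit is {max_bytes}")
    # Assumption: UTF-8 text, comma-delimited, first record is the header.
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header is None:
            problems.append("file has no header row")
            return problems
        for record_no, row in enumerate(reader, start=2):
            if len(row) != len(header):
                problems.append(
                    f"record {record_no} has {len(row)} fields, header has {len(header)}"
                )
                break  # one example is enough for a quick check
    return problems
```

Calling `check_csv("patients.csv")` (a placeholder file name) returns an empty list when the file is within the size limit and every record has as many fields as the header.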

Only one dataset can be active at a time. To switch to a different file, click Replace Dataset. You'll be asked to confirm, because replacing the dataset also resets the causal model and any inference history.

Dataset overview

Once a file is uploaded, the Dataset Overview card shows a summary of your data:

| Field | Description |
| --- | --- |
| Rows | Total number of data rows |
| Columns | Total number of columns |
| Kept Columns | How many columns are included in the analysis |
| Labeled | How many columns have a custom label |
| Described | How many columns have a description |
| File Size | Size of the uploaded file |
| Data Types | A breakdown of detected types (number, string, boolean, date) |

Previewing your data

The Data Preview table shows the raw contents of your CSV. Use the page controls below the table to browse through the data.

Configuring dataset metadata

Below the preview you can give your dataset a name and a description. These are used by AI features later in the workflow to provide better results.

Click the Generate with AI button next to either field to have Cauzen suggest a name or description based on the column names and sample data.

Configuring column metadata

The Column Metadata table lists every column in your dataset. For each column, you can configure the settings described in the following sections.

Include or exclude columns

Use the toggle in the Include column to mark a column as included or excluded. Excluded columns appear faded and are not used in the causal model or inference queries. Use the checkbox in the column header to toggle all columns at once.

Only include columns that are relevant to your analysis. Excluding irrelevant columns keeps the causal graph clean and improves AI accuracy.

Column status and recommendations

The Status column explains whether a column is included, excluded, or recommended for exclusion. Cauzen automatically flags columns that are structurally unsafe or unlikely to be useful causal variables:

| Recommendation | What it means |
| --- | --- |
| Identifier-like | The column name looks like an ID, UUID, row number, or record identifier |
| Row identifier | Every row has a unique value, so the column behaves like an identifier |
| Sequential date | Date-like values are unique and strictly ordered, which often acts like row order rather than a model variable |
| Empty | The column has no usable values |
| Constant | Every row has the same value |
| Duplicate | The encoded values duplicate an earlier kept column |
| Linearly dependent | The encoded values are a deterministic linear combination of other kept columns |

Columns that Cauzen recommends for exclusion start out excluded, so that downstream causal discovery is less likely to fail. If you re-include one, Cauzen keeps your choice and leaves the recommendation visible as a warning. If a duplicate or linear dependency was detected, the status also names the column or columns that caused the recommendation.
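To make the simpler recommendations concrete, here is a rough sketch of how such checks could be written. It is an illustration under stated assumptions, not Cauzen's implementation: the `recommend_exclusions` function and the `ID_NAME` pattern are hypothetical, values are compared as raw strings rather than encoded values, and the sequential date and linearly dependent checks are left out. Note that the row identifier check here would also flag a continuous numeric column whose values all happen to differ, which a real check would likely handle more carefully.

```python
import re

# Hypothetical name pattern; Cauzen's actual identifier heuristic is not documented here.
ID_NAME = re.compile(r"(^|_)(id|uuid|row_?(number|num)|record_?id)$", re.IGNORECASE)


def recommend_exclusions(columns):
    """Map column name -> recommendation for columns matching a simple check.

    `columns` maps each column name to its list of raw string values,
    in the order the columns appear in the file.
    """
    reasons = {}
    kept = {}  # tuple of values -> name of the first kept column with those values
    for name, values in columns.items():
        present = [v for v in values if v.strip() != ""]
        key = tuple(values)
        if ID_NAME.search(name):
            reasons[name] = "Identifier-like"
        elif not present:
            reasons[name] = "Empty"
        elif len(set(values)) == 1:
            reasons[name] = "Constant"
        elif len(present) == len(values) and len(set(values)) == len(values):
            reasons[name] = "Row identifier"
        elif key in kept:
            # Name the earlier column, as the status text does for duplicates.
            reasons[name] = f"Duplicate of {kept[key]}"
        else:
            kept[key] = name
    return reasons
```

Columns that match none of the checks are absent from the returned dictionary, which corresponds to a column that stays included by default.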

Labels

The Label field lets you give a column a human-readable name. For example, a column named bmi_kg_m2 might have a label of Body Mass Index. Labels appear throughout the Causal Modeling and Causal Inference phases.

Click the generate button (✨) next to a label field to have Cauzen suggest a label using AI.

Descriptions

The Description field lets you explain what a column measures. Good descriptions help the AI discovery and inference features understand the meaning of each variable.

Click the generate button next to a description field to generate one automatically.

Data types

The Type badge shows the data type Cauzen detected for each column: number, string, boolean, date, or unknown. The type is detected automatically from the CSV contents and cannot be changed.
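Because the detected type cannot be changed in Cauzen, it can be useful to estimate how a column might be classified before you upload. The sketch below is one plausible way to infer a type from string values. It is not Cauzen's detection logic: the `infer_type` function is hypothetical, and the rules (only `true`/`false` count as boolean, only ISO `YYYY-MM-DD` strings count as dates, an all-blank column is `unknown`) are assumptions made for illustration.

```python
from datetime import date


def _is_number(text):
    """True if Python can parse the text as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def _is_iso_date(text):
    """True if the text is a valid ISO calendar date such as 2024-01-31."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


def infer_type(values):
    """Guess one of: number, string, boolean, date, unknown."""
    present = [v.strip() for v in values if v.strip() != ""]
    if not present:
        return "unknown"  # assumption: nothing to inspect means no type
    if all(v.lower() in {"true", "false"} for v in present):
        return "boolean"
    if all(_is_number(v) for v in present):
        return "number"
    if all(_is_iso_date(v) for v in present):
        return "date"
    return "string"  # fallback for mixed or free-text columns
```

A column with even one non-numeric value falls through to `string` in this sketch, so cleaning stray text out of numeric columns before upload is likely to matter.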

Example values

The Example column shows the first non-empty value from each column so you can verify the data looks correct.
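The same lookup can be reproduced locally to spot-check a file. This is a small sketch, assuming that "non-empty" means "not blank after trimming whitespace"; the `example_values` function is a hypothetical helper, not a Cauzen API.

```python
import csv
import io


def example_values(csv_text):
    """Return {column name: first non-blank value, or None if there is none}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    examples = {name: None for name in reader.fieldnames or []}
    for row in reader:
        for name in examples:
            value = row.get(name)  # None when a record is shorter than the header
            if examples[name] is None and value is not None and value.strip() != "":
                examples[name] = value
        if all(v is not None for v in examples.values()):
            break  # every column has an example; no need to read further
    return examples
```

A column that maps to `None` has no usable values at all, which matches the situation the Empty recommendation describes.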

Continuing to Causal Modeling

When you're satisfied with your column configuration, click Continue to Causal Modeling at the bottom of the page to move to the next phase.

You can always come back to Data Wrangling later to adjust metadata. Your changes will be reflected in subsequent phases.