Data Wrangling
Upload your CSV data and configure columns for analysis.
The Data Wrangling phase is where you bring your data into Cauzen. You upload a CSV file, preview it, and configure column metadata that helps Cauzen understand your data throughout the rest of the workflow.
Uploading a file
Drag and drop a CSV file onto the upload area, or click it to open a file browser. Cauzen accepts standard CSV files up to 50 MB.
Only one dataset can be active at a time. To switch to a different file, click Replace Dataset. You'll be asked to confirm, because replacing the dataset also resets the causal model and any inference history.
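If you want to check a file before uploading it, you can test it locally against the size limit above. The following is a minimal sketch and not part of Cauzen: `check_csv` is a hypothetical helper, it assumes "50 MB" means 50,000,000 bytes (the stricter reading), and it assumes the file is UTF-8 encoded with a header row.

```python
import csv
import os

# Assumption: "50 MB" is read as decimal megabytes, the stricter interpretation.
MAX_BYTES = 50 * 1000 * 1000


def check_csv(path, max_bytes=MAX_BYTES):
    """Return a list of problems found; an empty list means no problem was detected."""
    problems = []
    size = os.path.getsize(path)
    if size > max_bytes:
        problems.append(f"file is {size} bytes, over the {max_bytes}-byte limit")
    # utf-8-sig also accepts plain UTF-8 and drops a leading byte-order mark.
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            problems.append("file is empty")
            return problems
        # Records are counted from 1, with the header as record 1.
        for record_no, row in enumerate(reader, start=2):
            if not row:
                continue  # skip completely blank lines
            if len(row) != len(header):
                problems.append(
                    f"record {record_no} has {len(row)} fields, expected {len(header)}"
                )
                break  # report only the first mismatch
    return problems
```

Calling `check_csv("my_data.csv")` returns an empty list when the file is within the size limit and every record has the same number of fields as the header.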
Dataset overview
Once a file is uploaded, the Dataset Overview card shows a summary of your data:
| Field | Description |
|---|---|
| Rows | Total number of data rows |
| Columns | Total number of columns |
| Kept Columns | How many columns are included in the analysis |
| Labeled | How many columns have a custom label |
| Described | How many columns have a description |
| File Size | Size of the uploaded file |
| Data Types | A breakdown of detected types (number, string, boolean, date) |
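To make the counts in this table concrete, here is a rough sketch of how they could be derived from a CSV and its column metadata. It is an illustration only: the `overview` function and the shape of the `metadata` dictionary are hypothetical, and Cauzen may count these fields differently (for example, it may count labels only on kept columns).

```python
import csv
import io


def overview(csv_text, metadata):
    """Summarize a CSV.

    metadata maps column name -> {"include": bool, "label": str, "description": str}.
    This shape is hypothetical; missing keys fall back to included, unlabeled, undescribed.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = sum(1 for row in reader if row)  # data rows, ignoring blank lines
    meta = [metadata.get(name, {}) for name in header]
    return {
        "Rows": rows,
        "Columns": len(header),
        "Kept Columns": sum(1 for m in meta if m.get("include", True)),
        "Labeled": sum(1 for m in meta if m.get("label")),
        "Described": sum(1 for m in meta if m.get("description")),
        "File Size": len(csv_text.encode("utf-8")),  # size in bytes
    }


summary = overview(
    "id,age,bmi_kg_m2\n1,34,22.5\n2,51,27.1\n",
    {"id": {"include": False}, "bmi_kg_m2": {"label": "Body Mass Index"}},
)
```

In this example, `summary` reports 2 rows, 3 columns, 2 kept columns, 1 labeled column, and 0 described columns.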
Previewing your data
The Data Preview table shows the raw contents of your CSV. Use the page controls below the table to browse through the data.
Configuring dataset metadata
Below the preview, you can give your dataset a name and a description. AI features later in the workflow use both as context to give better results.
Click the Generate with AI button next to either field to have Cauzen suggest a name or description based on the column names and sample data.
Configuring column metadata
The Column Metadata table lists every column in your dataset. For each column, you can set the options described below.
Include or exclude columns
Use the toggle in the Include column to mark a column as included or excluded. Excluded columns appear faded and are not used in the causal model or inference queries. Use the checkbox in the column header to toggle all columns at once.
Only include columns that are relevant to your analysis. Excluding irrelevant columns keeps the causal graph clean and improves AI accuracy.
Column status and recommendations
The Status column explains whether a column is included, excluded, or recommended for exclusion. Cauzen automatically flags columns that are structurally unsafe or unlikely to be useful causal variables:
| Recommendation | What it means |
|---|---|
| Identifier-like | The column name looks like an ID, UUID, row number, or record identifier |
| Row identifier | Every row has a unique value, so the column behaves like an identifier |
| Sequential date | Date-like values are unique and strictly ordered, so the column often acts like row order rather than a model variable |
| Empty | The column has no usable values |
| Constant | Every row has the same value |
| Duplicate | The encoded values duplicate an earlier kept column |
| Linearly dependent | The encoded values are a deterministic linear combination of other kept columns |
Columns that are recommended for exclusion start out excluded, so downstream discovery is less likely to fail. If you re-include one, Cauzen keeps your choice and leaves the recommendation visible as a warning. If a duplicate or dependency was detected, the status also names the column or columns that caused the recommendation.
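Several of these checks are simple enough to approximate yourself. The sketch below illustrates five of them under stated assumptions: the `recommend` function, the name pattern, and the order in which checks are applied are all hypothetical and are not Cauzen's actual rules. It compares raw string values rather than encoded values, and it leaves out the Sequential date and Linearly dependent checks.

```python
import re

# Assumption: a guess at what "looks like an ID" means for a column name.
ID_NAME = re.compile(r"(^|_)(id|uuid|index|row_?num(ber)?)$", re.IGNORECASE)


def recommend(columns):
    """columns maps column name -> list of string cell values, in file order.

    Returns a dict of column name -> recommendation for flagged columns only.
    """
    flagged = {}
    seen = {}  # tuple of values -> first unflagged column holding those values
    for name, values in columns.items():
        present = [v for v in values if v.strip() != ""]
        key = tuple(values)
        if not present:
            flagged[name] = "Empty"
        elif ID_NAME.search(name):
            flagged[name] = "Identifier-like"
        elif len(set(present)) == 1:
            flagged[name] = "Constant"
        elif len(present) == len(values) and len(set(values)) == len(values):
            # Naive rule: this also flags measurements that happen never to repeat.
            flagged[name] = "Row identifier"
        elif key in seen:
            flagged[name] = f"Duplicate of {seen[key]}"
        else:
            seen[key] = name
    return flagged


flags = recommend({
    "patient_id": ["a1", "a2", "a3", "a4"],
    "notes": ["", "", "", ""],
    "country": ["US", "US", "US", "US"],
    "token": ["x9", "k2", "p7", "m4"],
    "age": ["34", "51", "34", "29"],
    "age_copy": ["34", "51", "34", "29"],
    "smoker": ["yes", "no", "no", "yes"],
})
```

Here `age` and `smoker` are left unflagged, while `age_copy` is reported as a duplicate of `age`, mirroring how the status names the column that caused the recommendation.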
Labels
The Label field lets you give a column a human-readable name. For example, a column named bmi_kg_m2 might have a label of Body Mass Index. Labels appear throughout the Causal Modeling and Causal Inference phases.
Click the generate button (✨) next to a label field to have Cauzen suggest a label using AI.
Descriptions
The Description field lets you explain what a column measures. Good descriptions help the AI discovery and inference features understand the meaning of each variable.
Click the generate button next to a description field to generate one automatically.
Data types
The Type badge shows the data type Cauzen detected for each column: number, string, boolean, date, or unknown. The type is detected automatically from the CSV and cannot be changed.
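Because the type cannot be changed after upload, it can help to predict how a column is likely to be read. The following sketch shows one plausible way to infer the five types from raw CSV strings. The `detect_type` function and its rules are assumptions for illustration (for example, it only treats `true`/`false` as booleans and only recognizes ISO-style dates); Cauzen's actual detection logic may differ.

```python
from datetime import date

BOOLEANS = {"true", "false"}  # assumption: only these spellings count as boolean


def is_number(text):
    """True if Python can parse the text as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def is_date(text):
    """True if the text is an ISO-style date such as 2024-01-31."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


def detect_type(values):
    """Infer one type for a column from its string values, ignoring blanks."""
    present = [v.strip() for v in values if v.strip()]
    if not present:
        return "unknown"  # nothing to infer from
    if all(v.lower() in BOOLEANS for v in present):
        return "boolean"
    if all(is_number(v) for v in present):
        return "number"
    if all(is_date(v) for v in present):
        return "date"
    return "string"  # mixed or free-text values
```

Under these rules, a single stray text value such as `n/a` in an otherwise numeric column makes the whole column a string, so cleaning such values before upload is worth considering.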
Example values
The Example column shows the first non-empty value from each column so you can verify the data looks correct.
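The "first non-empty value" rule means the example may come from a later row if the first rows are blank for that column. As a small sketch of that rule (the `example_values` helper is hypothetical and treats whitespace-only cells as empty, which is an assumption):

```python
import csv
import io


def example_values(csv_text):
    """Return column name -> first non-empty value, or None if the column is empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    examples = {name: None for name in (reader.fieldnames or [])}
    for row in reader:
        for name in examples:
            value = row.get(name)
            if examples[name] is None and value is not None and value.strip():
                examples[name] = value
        # Stop reading once every column has an example.
        if all(v is not None for v in examples.values()):
            break
    return examples


examples = example_values("a,b\n,x\n1,\n2,y\n")
```

In this input, column `a` is blank in the first data row, so its example is taken from the second row.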
Continuing to Causal Modeling
When you're satisfied with your column configuration, click Continue to Causal Modeling at the bottom of the page to move to the next phase.
You can always come back to Data Wrangling later to adjust metadata. Your changes will be reflected in subsequent phases.