Causal Modeling
Build a causal graph representing cause-and-effect relationships in your data.
The Causal Modeling phase is where you construct a directed acyclic graph (DAG) — a diagram that represents the cause-and-effect relationships between your variables. Each node is a column from your dataset, and each arrow represents a direct causal influence from one variable to another.
This causal graph is the foundation for the queries you run in the Causal Inference phase.
The canvas
Your columns are displayed as nodes on an interactive canvas. You can drag nodes to rearrange them, zoom in and out, and pan around the canvas.
The graph state is saved in your browser, including node positions, edge types, constraints, zoom, and pan. If you leave the phase or refresh the page, Cauzen restores the canvas as you left it.
Nodes are color-coded:
- Blue nodes — regular variables
- Green nodes — root (exogenous) variables that have no incoming edges, meaning nothing in the model causes them
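As an illustration of the root-node rule above, here is a minimal sketch that finds the nodes with no incoming edges. The edge representation, plain `(source, target)` pairs, and the function name `root_nodes` are assumptions made for this example, not Cauzen's internal data model. It considers Directed edges only; how other edge types affect node coloring is not covered here.

```python
# Hypothetical in-memory model: each directed edge is a (source, target) pair.
def root_nodes(nodes, edges):
    """Return the nodes that no directed edge points to, in input order."""
    targets = {target for _, target in edges}
    return [node for node in nodes if node not in targets]


nodes = ["age", "income", "exercise", "health"]
edges = [("age", "income"), ("age", "health"), ("exercise", "health")]

# Nothing points to "age" or "exercise", so they would be the root nodes.
print(root_nodes(nodes, edges))
```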
Click a node or an edge to select it and view its properties in the panel on the right.
Discovering edges with AI
The fastest way to build a causal model is to click Discover in the toolbar. Cauzen sends the kept columns, dataset metadata, and any background-knowledge constraints to the backend. The backend runs causal discovery, streams progress messages, returns a partial graph when available, and then applies LLM refinement.
Discovery is a starting point, not a final answer. Always review the proposed edges against your domain knowledge and correct any that don't make sense.
While discovery is running, Cauzen shows the current backend progress message. If a partial graph is returned before LLM refinement finishes, the graph appears immediately. When refinement succeeds, the final graph replaces the partial graph and an LLM refinement panel appears with the backend's reasoning. If refinement fails after a partial graph was received, Cauzen keeps the partial graph and shows the backend error.
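The sequence described above can be pictured as a small state update driven by backend events. This is a sketch under stated assumptions: the event names (`progress`, `partial_graph`, `refined_graph`, `error`), the `DiscoveryView` class, and its fields are all hypothetical and do not describe Cauzen's actual protocol.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiscoveryView:
    """Hypothetical snapshot of what the canvas shows during discovery."""
    progress: str = ""
    graph: Optional[dict] = None
    graph_is_partial: bool = False
    reasoning: Optional[str] = None
    error: Optional[str] = None


def apply_event(view, event):
    """Update the view for one backend event (event shapes are assumed)."""
    kind = event["kind"]
    if kind == "progress":
        view.progress = event["message"]
    elif kind == "partial_graph":
        # Show the partial graph right away, before refinement finishes.
        view.graph = event["graph"]
        view.graph_is_partial = True
    elif kind == "refined_graph":
        # The final graph replaces the partial one and carries the reasoning.
        view.graph = event["graph"]
        view.graph_is_partial = False
        view.reasoning = event["reasoning"]
    elif kind == "error":
        # Keep whatever graph is already displayed and surface the error.
        view.error = event["message"]
    return view


partial = {"edges": [("age", "income")]}
view = DiscoveryView()
for event in [
    {"kind": "progress", "message": "Running causal discovery"},
    {"kind": "partial_graph", "graph": partial},
    {"kind": "error", "message": "LLM refinement failed"},
]:
    apply_event(view, event)

# The partial graph survives the failed refinement.
print(view.graph_is_partial, view.error)
```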
If your model already has edges, Cauzen will ask you to confirm before replacing them. Discovery respects any edges you've marked as Required or Forbidden (see Background knowledge below).
Discovery readiness checks
Before sending data to the backend, Cauzen validates the kept columns using the same numeric encoding used for discovery. This catches cases where the correlation matrix would be singular, such as duplicate columns or columns that are exact linear combinations of other columns (for example, a total column kept alongside its parts).
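To make the idea of a singular correlation matrix concrete, the sketch below flags columns that are (numerically) linear combinations of columns that came before them. It is a pure-Python illustration using Gram-Schmidt style projection. The function name `flag_dependent_columns`, the tolerance, and the choice to flag the later column of a dependent set are assumptions for this example and may differ from the check Cauzen actually performs.

```python
def flag_dependent_columns(columns, tol=1e-9):
    """columns maps a name to a list of numbers.

    Returns the names of columns that, after centering, are numerically
    linear combinations of the columns listed before them. Constant
    columns are flagged too, because their correlations are undefined.
    """
    basis = []    # orthogonal residuals of the columns kept so far
    flagged = []
    for name, values in columns.items():
        mean = sum(values) / len(values)
        residual = [v - mean for v in values]
        # Remove the part of this column explained by each kept column.
        for b in basis:
            coef = sum(r * x for r, x in zip(residual, b)) / sum(x * x for x in b)
            residual = [r - coef * x for r, x in zip(residual, b)]
        norm = sum(r * r for r in residual) ** 0.5
        magnitude = sum(v * v for v in values) ** 0.5
        if norm <= tol * magnitude:
            flagged.append(name)   # nothing new left: column is redundant
        else:
            basis.append(residual)
    return flagged


data = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 1.0, 4.0, 3.0, 6.0],
    "a_copy": [1.0, 2.0, 3.0, 4.0, 5.0],   # duplicate of "a"
    "total": [3.0, 3.0, 7.0, 7.0, 11.0],   # equals a + b
}
print(flag_dependent_columns(data))
```

Because the scan is order-dependent, it reports the later column in each redundant group. A real readiness check may choose which column to recommend for exclusion differently.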
If the kept columns are not ready, Cauzen opens a dialog listing recommended exclusions and why each column was flagged. You can:
- Click Apply exclusions and run to mark the recommended columns as excluded and immediately start discovery again.
- Click Review columns to leave the dataset unchanged, then return to Data Wrangling if you want to decide manually.
If the backend still reports a singular matrix, Cauzen shows an actionable error directing you to review or apply recommended exclusions.
Drawing edges manually
To draw an edge between two nodes:
1. Hold Shift.
2. Click and drag from the source node.
3. Release over the target node.
A dashed line follows your cursor while dragging. If the target node is valid, it will be highlighted with a yellow border when you hover over it.
Edges cannot create cycles (the graph must remain acyclic), and duplicate edges between the same pair of nodes are not allowed.
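The two rules above can be checked before an edge is added. The following is a minimal sketch, assuming edges are stored as `(source, target)` pairs. The function name `can_add_edge` is hypothetical, and treating a node pair as already connected regardless of direction is an assumption of this example.

```python
def can_add_edge(edges, source, target):
    """Return True if source -> target keeps the graph acyclic and unique."""
    if source == target:
        return False  # a self-loop is a cycle of length one
    if any({s, t} == {source, target} for s, t in edges):
        return False  # an edge already joins this pair of nodes

    # Adding source -> target creates a cycle exactly when target can
    # already reach source, so search forward from target.
    children = {}
    for s, t in edges:
        children.setdefault(s, []).append(t)
    stack, seen = [target], set()
    while stack:
        node = stack.pop()
        if node == source:
            return False
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children.get(node, []))
    return True


edges = [("a", "b"), ("b", "c")]
print(can_add_edge(edges, "a", "c"))  # a shortcut edge keeps the graph acyclic
print(can_add_edge(edges, "c", "a"))  # would close the loop a -> b -> c -> a
```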
Clearing the model
To remove all edges and start over, click the Clear Model button (trash icon) in the toolbar. You'll be asked to confirm.
Layouts
Use the Layout dropdown in the toolbar to rearrange the nodes on the canvas:
| Layout | Description |
|---|---|
| Hierarchical | Arranges nodes in a top-down tree structure. Good for showing clear causal chains. |
| Grid | Places nodes in an evenly spaced grid. |
| Circle | Arranges nodes in a circle. |
| Force-Directed | Uses physics simulation to space nodes based on their connections. Often gives a natural-looking result for complex graphs. |
The layout only affects the visual arrangement — it does not change the causal relationships.
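As a rough illustration of why a layout cannot alter the model, the sketch below computes Circle positions from the node list alone and never reads or writes the edges. The function name `circle_layout`, the default radius, and the coordinate convention are assumptions for this example, not Cauzen's implementation.

```python
import math


def circle_layout(node_ids, radius=200.0, center=(0.0, 0.0)):
    """Place nodes at equal angles on a circle; returns {node_id: (x, y)}."""
    count = len(node_ids)
    positions = {}
    for index, node_id in enumerate(node_ids):
        angle = 2 * math.pi * index / count
        positions[node_id] = (
            center[0] + radius * math.cos(angle),
            center[1] + radius * math.sin(angle),
        )
    return positions


edges = [("age", "income"), ("income", "health")]
positions = circle_layout(["age", "income", "health", "exercise"])

# Only coordinates were produced; the edge list is exactly as before.
print(positions["age"], edges)
```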
Zooming and fitting
Use the + and − buttons in the toolbar to zoom in and out. Click the Fit to Screen button (frame icon) to zoom the canvas so all nodes are visible.
Node properties
Click any node to view its properties in the right panel:
- Label — the human-readable name from Data Wrangling
- Column — the original column name from the CSV
- Type — the detected data type
- Example — a sample value from the data
- Description — the description from Data Wrangling
Edge properties
Click any edge to view and edit its properties:
Edge type
The Type dropdown controls what kind of relationship the edge represents:
| Type | Symbol | Meaning | Example |
|---|---|---|---|
| Directed | → | A directly causes B | Smoking → lung cancer: smoking changes lung cancer risk. |
| Bidirected | ↔ | A and B share a hidden common cause | Ice cream sales ↔ drownings: hot weather can increase both. |
| Partially directed | o→ | The source endpoint is unresolved and the target endpoint is an arrowhead | Diet o→ weight: diet may change weight, or motivation may affect both diet and weight. |
| Nondirected | o-o | Orientation is unresolved at both endpoints | Exercise o-o sleep: exercise may affect sleep, sleep may affect exercise, or stress may affect both. |
| Undirected | — | A double-tail edge, often representing selection bias through an implicit conditioned selection node | Income — exercise among survey respondents: joining the survey may depend on both. |
Most manually drawn edges should be Directed. Discovery algorithms can also return symmetric edge types (Bidirected, Nondirected, Undirected). For these, the stored source and target only identify the two endpoints of the edge; they do not imply a source-to-target causal direction.
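One way to read the table above is that each edge type is a pair of endpoint marks: a tail, an arrowhead, or a circle for "unresolved". The sketch below encodes that reading. The names `EDGE_TYPES`, `is_symmetric`, and `render` are hypothetical and are not part of Cauzen.

```python
# (mark at the source endpoint, mark at the target endpoint)
EDGE_TYPES = {
    "directed": ("tail", "arrow"),
    "bidirected": ("arrow", "arrow"),
    "partially_directed": ("circle", "arrow"),
    "nondirected": ("circle", "circle"),
    "undirected": ("tail", "tail"),
}

LEFT = {"tail": "", "arrow": "<", "circle": "o"}
RIGHT = {"tail": "", "arrow": ">", "circle": "o"}


def is_symmetric(edge_type):
    """True when both endpoints carry the same mark, so order is arbitrary."""
    source_mark, target_mark = EDGE_TYPES[edge_type]
    return source_mark == target_mark


def render(a, edge_type, b):
    """Draw the edge between a and b using ASCII endpoint marks."""
    source_mark, target_mark = EDGE_TYPES[edge_type]
    return f"{a} {LEFT[source_mark]}-{RIGHT[target_mark]} {b}"


for edge_type in EDGE_TYPES:
    print(render("A", edge_type, "B"), "symmetric:", is_symmetric(edge_type))
```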
Background knowledge
The Constraint dropdown lets you encode prior knowledge about whether an edge should exist:
| Constraint | Meaning |
|---|---|
| None | No constraint — the edge is treated as a discovery result |
| Required | This edge must appear in any discovered model |
| Forbidden | This edge must not appear in any discovered model |
Setting constraints before running Discover lets you build domain expertise into the result. For example, income cannot cause age, so you can mark the income → age edge as Forbidden.
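The guarantee the table describes can be illustrated with a small filter over a list of discovered edges. This sketch shows only the end result (Forbidden edges are absent, Required edges are present). How the backend actually enforces constraints during discovery is not described here, and the function name `apply_constraints` is hypothetical.

```python
def apply_constraints(discovered, required, forbidden):
    """Drop forbidden edges, then make sure every required edge is present."""
    kept = [edge for edge in discovered if edge not in forbidden]
    for edge in required:
        if edge not in kept:
            kept.append(edge)
    return kept


discovered = [("income", "age"), ("exercise", "health")]
forbidden = {("income", "age")}    # income cannot cause age
required = [("age", "health")]     # known from domain expertise

print(apply_constraints(discovered, required, forbidden))
```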
Continuing to Causal Inference
When your causal model reflects your understanding of the system, click Continue to Causal Inference in the bottom-right corner.