Causal Modeling

The Causal Modeling phase is where you construct a directed acyclic graph (DAG) — a diagram that represents the cause-and-effect relationships between your variables. Each node is a column from your dataset, and each arrow represents a direct causal influence from one variable to another.

This causal graph is the foundation for the queries you run in the Causal Inference phase.

The canvas

Your columns are displayed as nodes on an interactive canvas. You can drag nodes to rearrange them, zoom in and out, and pan around the canvas.

The graph state is saved in your browser, including node positions, edge types, constraints, zoom, and pan. If you leave the phase or refresh the page, Cauzen restores the canvas as you left it.

Nodes are color-coded:

Blue nodes — regular variables
Green nodes — root (exogenous) variables that have no incoming edges, meaning nothing in the model causes them
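The root-node rule is easy to state in code. This minimal sketch assumes the model is held as a list of (source, target) pairs of column names containing Directed edges only. It finds the nodes that would be colored green: those that never appear as an edge target. `root_nodes` is a hypothetical helper, not part of Cauzen.

```python
from typing import Iterable


def root_nodes(columns: Iterable[str], edges: Iterable[tuple[str, str]]) -> list[str]:
    """Return the columns that are never the target of an edge, in column order.

    Assumes `edges` holds only Directed edges as (source, target) pairs.
    """
    targets = {target for _, target in edges}
    return [col for col in columns if col not in targets]


columns = ["age", "income", "exercise", "weight"]
edges = [("age", "income"), ("income", "exercise"), ("exercise", "weight")]
print(root_nodes(columns, edges))  # ['age']
```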

Click a node or an edge to select it and view its properties in the panel on the right.

Discovering edges with AI

The fastest way to build a causal model is to click Discover in the toolbar. Cauzen sends the kept columns (those not excluded in Data Wrangling), the dataset metadata, and any background-knowledge constraints to the backend. The backend runs causal discovery, streams progress messages while it works, returns a partial graph when one is available, and then refines the result with an LLM.

Discovery is a starting point, not a final answer. Always review the proposed edges against your domain knowledge and correct any that don't make sense.

While discovery is running, Cauzen shows the current backend progress message. If a partial graph is returned before LLM refinement finishes, the graph appears immediately. When refinement succeeds, the final graph replaces the partial graph and an LLM refinement panel appears with the backend's reasoning. If refinement fails after a partial graph was received, Cauzen keeps the partial graph and shows the backend error.
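The partial-then-final behavior amounts to a small state machine. This sketch assumes a hypothetical message format, namely dicts whose `type` is `progress`, `partial`, `final`, or `error`. Cauzen's actual wire format is not documented here. The sketch shows how a failed refinement leaves an already received partial graph in place.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DiscoveryState:
    progress: str = ""               # latest backend progress message
    edges: Optional[list] = None     # graph currently shown on the canvas
    is_partial: bool = False         # True until refinement succeeds
    reasoning: Optional[str] = None  # text for the LLM refinement panel
    error: Optional[str] = None      # backend error, if any


def apply_message(state: DiscoveryState, msg: dict[str, Any]) -> DiscoveryState:
    """Update `state` in place from one hypothetical streamed message."""
    kind = msg.get("type")
    if kind == "progress":
        state.progress = msg["message"]
    elif kind == "partial":
        # Show the pre-refinement graph immediately.
        state.edges = msg["edges"]
        state.is_partial = True
    elif kind == "final":
        # The refined graph replaces the partial one.
        state.edges = msg["edges"]
        state.is_partial = False
        state.reasoning = msg.get("reasoning")
    elif kind == "error":
        # Record the error but keep any partial graph already received.
        state.error = msg["message"]
    # Unknown message types are ignored.
    return state


state = DiscoveryState()
for msg in [
    {"type": "progress", "message": "Running causal discovery"},
    {"type": "partial", "edges": [("age", "income")]},
    {"type": "error", "message": "LLM refinement failed"},
]:
    apply_message(state, msg)
print(state.edges, state.is_partial, state.error)
# [('age', 'income')] True LLM refinement failed
```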

If your model already has edges, Cauzen will ask you to confirm before replacing them. Discovery respects any edges you've marked as Required or Forbidden (see Background knowledge below).

Discovery readiness checks

Before sending data to the backend, Cauzen validates the kept columns using the same numeric encoding used for discovery. This catches cases where the correlation matrix would be singular, such as duplicate columns or columns that are exact linear combinations of other columns.

If the kept columns are not ready, Cauzen opens a dialog listing recommended exclusions and why each column was flagged. You can:

Click Apply exclusions and run to mark the recommended columns as excluded and immediately start discovery again.
Click Review columns to leave the dataset unchanged, then return to Data Wrangling if you want to decide manually.

If the backend still reports a singular matrix, Cauzen shows an actionable error directing you to review or apply the recommended exclusions.
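One way to approximate the singularity check is to look for columns that do not increase the rank of the centered data matrix. This sketch uses NumPy and a greedy left-to-right pass, and it assumes the kept columns are already numerically encoded. Cauzen's own check, and the column it recommends excluding, may differ. With a greedy pass, for example, the later of two duplicate columns is the one flagged.

```python
import numpy as np


def flag_dependent_columns(data: np.ndarray, names: list[str]) -> list[str]:
    """Greedily flag columns that add no new linear information.

    Columns are centered first, so a constant column becomes all zeros and a
    column equal to a + b * other becomes an exact multiple of `other`. A
    column that does not raise the rank of the columns kept so far would make
    the correlation matrix singular (or undefined, for a constant).
    """
    centered = data - data.mean(axis=0)
    kept: list[int] = []
    flagged: list[str] = []
    for j, name in enumerate(names):
        candidate = centered[:, kept + [j]]
        if np.linalg.matrix_rank(candidate) == len(kept) + 1:
            kept.append(j)
        else:
            flagged.append(name)
    return flagged


rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = rng.normal(size=100)
data = np.column_stack([x, y, x + y, 2 * x + 3, np.full(100, 5.0)])
print(flag_dependent_columns(data, ["x", "y", "x_plus_y", "x_scaled", "constant"]))
# ['x_plus_y', 'x_scaled', 'constant']
```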

Drawing edges manually

To draw an edge between two nodes:

Hold Shift
Click and drag from the source node
Release over the target node

A dashed line follows your cursor while you drag. When you hover over a valid target node, it is highlighted with a yellow border.

Edges cannot create cycles (the graph must remain acyclic), and duplicate edges between the same pair of nodes are not allowed.
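Both rules can be checked before an edge is accepted. This is a generic sketch, not Cauzen's implementation. It assumes every edge is Directed and that "duplicate" covers an existing edge in either direction between the same pair of nodes. The cycle test relies on a standard fact: adding source → target closes a cycle exactly when source is already reachable from target.

```python
from collections import defaultdict


def can_add_edge(edges: list[tuple[str, str]], source: str, target: str) -> bool:
    """Return True if adding source -> target keeps the graph a simple DAG."""
    if source == target:
        return False  # a self-loop is a cycle of length one
    # Treat any existing edge between the same two nodes as a duplicate.
    if any({a, b} == {source, target} for a, b in edges):
        return False
    # Depth-first search from target; reaching source means a cycle would form.
    children = defaultdict(list)
    for a, b in edges:
        children[a].append(b)
    stack, seen = [target], set()
    while stack:
        node = stack.pop()
        if node == source:
            return False
        if node not in seen:
            seen.add(node)
            stack.extend(children[node])
    return True


edges = [("age", "income"), ("income", "exercise")]
print(can_add_edge(edges, "exercise", "age"))  # False: would close a cycle
print(can_add_edge(edges, "income", "age"))    # False: pair already connected
print(can_add_edge(edges, "age", "exercise"))  # True
```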

Clearing the model

To remove all edges and start over, click the Clear Model button (trash icon) in the toolbar. You'll be asked to confirm.

Layouts

Use the Layout dropdown in the toolbar to rearrange the nodes on the canvas:

Layout	Description
Hierarchical	Arranges nodes in a top-down tree structure. Good for showing clear causal chains.
Grid	Places nodes in an evenly spaced grid.
Circle	Arranges nodes in a circle.
Force-Directed	Uses physics simulation to space nodes based on their connections. Often gives a natural-looking result for complex graphs.

The layout only affects the visual arrangement — it does not change the causal relationships.
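A common way to produce a top-down arrangement for a DAG is longest-path layering, which ensures every edge points to a lower row. This page does not say how Cauzen's Hierarchical layout assigns rows. The sketch below shows the generic technique using Python's standard `graphlib` module. It computes only row indices, not x positions, and assumes every edge endpoint appears in `nodes`.

```python
from graphlib import TopologicalSorter


def layer_nodes(nodes: list[str], edges: list[tuple[str, str]]) -> dict[str, int]:
    """Longest-path layering: roots go in row 0 and every other node sits
    one row below its deepest parent."""
    parents: dict[str, set[str]] = {node: set() for node in nodes}
    for source, target in edges:
        parents[target].add(source)
    layers: dict[str, int] = {}
    # static_order() yields each node only after all of its parents.
    for node in TopologicalSorter(parents).static_order():
        layers[node] = max((layers[p] + 1 for p in parents[node]), default=0)
    return layers


nodes = ["age", "income", "exercise", "weight"]
edges = [("age", "income"), ("age", "exercise"), ("income", "exercise"), ("exercise", "weight")]
layers = layer_nodes(nodes, edges)
print([layers[n] for n in nodes])  # [0, 1, 2, 3]
```

Because the rows come from the edges alone, two models with the same edges get the same rows. That matches the point above: the layout is derived from the causal structure but never changes it.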

Zooming and fitting

Use the + and − buttons in the toolbar to zoom in and out. Click the Fit to Screen button (frame icon) to zoom the canvas so all nodes are visible.
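Fitting to the screen comes down to scaling the nodes' bounding box into the viewport and centering it. This is a minimal sketch. It assumes node positions are their centers in canvas coordinates, ignores node size, and uses the convention screen = world × zoom + pan. The `padding` and `max_zoom` parameters are hypothetical, not Cauzen settings.

```python
def fit_to_screen(
    positions: dict[str, tuple[float, float]],
    view_w: float,
    view_h: float,
    padding: float = 40.0,
    max_zoom: float = 4.0,
) -> tuple[float, tuple[float, float]]:
    """Return (zoom, (pan_x, pan_y)) so every node center is inside the view.

    Uses the convention screen = world * zoom + pan.
    """
    xs = [x for x, _ in positions.values()]
    ys = [y for _, y in positions.values()]
    # Floor the extent at 1 so a single node does not divide by zero.
    width = max(max(xs) - min(xs), 1.0)
    height = max(max(ys) - min(ys), 1.0)
    zoom = min((view_w - 2 * padding) / width, (view_h - 2 * padding) / height, max_zoom)
    # Pan so the center of the bounding box lands at the center of the view.
    center_x = (min(xs) + max(xs)) / 2
    center_y = (min(ys) + max(ys)) / 2
    return zoom, (view_w / 2 - center_x * zoom, view_h / 2 - center_y * zoom)


positions = {"age": (0.0, 0.0), "income": (400.0, 0.0), "weight": (200.0, 300.0)}
print(fit_to_screen(positions, 840.0, 640.0, padding=20.0))  # (2.0, (20.0, 20.0))
```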

Node properties

Click any node to view its properties in the right panel:

Label — the human-readable name from Data Wrangling
Column — the original column name from the CSV
Type — the detected data type
Example — a sample value from the data
Description — the description from Data Wrangling

Edge properties

Click any edge to view and edit its properties:

Edge type

The Type dropdown controls what kind of relationship the edge represents:

Type	Symbol	Meaning	Example
Directed	→	A directly causes B	Smoking → lung cancer: smoking changes lung cancer risk.
Bidirected	↔	A and B share a hidden common cause	Ice cream sales ↔ drownings: hot weather can increase both.
Partially directed	o→	The source endpoint is unresolved and the target endpoint is an arrowhead	Diet o→ weight: diet may change weight, or motivation may affect both diet and weight.
Nondirected	o-o	Orientation is unresolved at both endpoints	Exercise o-o sleep: exercise may affect sleep, sleep may affect exercise, or stress may affect both.
Undirected	—	A double-tail edge, often representing selection bias through an implicit conditioned selection node	Income — exercise among survey respondents: joining the survey may depend on both.

Most manually drawn edges should be Directed. Discovery algorithms can also return symmetric edge types (Bidirected, Nondirected, and Undirected). For these, the stored source and target are simply the edge's two endpoints and do not indicate a causal direction.
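The edge types in the table differ only in the mark at each endpoint: tail, arrowhead, or circle. This sketch encodes that view using hypothetical names (`Mark`, `EDGE_TYPES`, `classify`) that are not Cauzen's API. It shows why source and target order matters only when the two marks differ.

```python
from enum import Enum


class Mark(Enum):
    TAIL = "tail"
    ARROW = "arrowhead"
    CIRCLE = "circle"


# (mark at the source end, mark at the target end) -> edge type
EDGE_TYPES = {
    (Mark.TAIL, Mark.ARROW): "Directed",
    (Mark.ARROW, Mark.ARROW): "Bidirected",
    (Mark.CIRCLE, Mark.ARROW): "Partially directed",
    (Mark.CIRCLE, Mark.CIRCLE): "Nondirected",
    (Mark.TAIL, Mark.TAIL): "Undirected",
}


def is_symmetric(source_mark: Mark, target_mark: Mark) -> bool:
    # Same mark at both ends: swapping source and target changes nothing.
    return source_mark == target_mark


def classify(source: str, target: str, source_mark: Mark, target_mark: Mark) -> tuple[str, str, str]:
    """Return (edge type, first node, second node), flipping the pair if needed."""
    if (source_mark, target_mark) not in EDGE_TYPES:
        # e.g. (ARROW, TAIL) is a Directed edge stored from its head end.
        source, target = target, source
        source_mark, target_mark = target_mark, source_mark
    # Tail-circle combinations are not among the listed types and raise KeyError.
    return EDGE_TYPES[(source_mark, target_mark)], source, target


print(classify("weight", "diet", Mark.ARROW, Mark.CIRCLE))  # ('Partially directed', 'diet', 'weight')
print(is_symmetric(Mark.ARROW, Mark.ARROW))                 # True
```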

Background knowledge

The Constraint dropdown lets you encode prior knowledge about whether an edge should exist:

Constraint	Meaning
None	No constraint — the edge is treated as a discovery result
Required	This edge must appear in any discovered model
Forbidden	This edge must not appear in any discovered model

Setting constraints before running Discover is a powerful way to incorporate domain expertise. For example, if you know that income cannot cause age, you can mark the edge income → age as Forbidden.
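In terms of edge sets, the two constraints mean a discovered graph must contain every Required edge and no Forbidden edge. Discovery algorithms normally enforce these constraints during the search itself. This sketch only checks a finished graph against them, treating every edge as a directed (source, target) pair. `check_constraints` is a hypothetical helper.

```python
Edge = tuple[str, str]


def check_constraints(edges: set[Edge], required: set[Edge], forbidden: set[Edge]) -> list[str]:
    """List every way a finished graph violates the background knowledge."""
    problems = []
    for source, target in sorted(required - edges):
        problems.append(f"missing required edge {source} -> {target}")
    for source, target in sorted(forbidden & edges):
        problems.append(f"forbidden edge present: {source} -> {target}")
    return problems


required = {("age", "income")}
forbidden = {("income", "age")}
print(check_constraints({("age", "income")}, required, forbidden))  # []
print(check_constraints({("income", "age")}, required, forbidden))
# ['missing required edge age -> income', 'forbidden edge present: income -> age']
```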

Continuing to Causal Inference

When your causal model reflects your understanding of the system, click Continue to Causal Inference in the bottom-right corner.
