Load data & assign variable types
Upload CSV with mixed data types
Include a header row. Columns can be continuous (numeric values) or categorical (text labels). The tool auto-detects variable types, which you can adjust after upload. Limit: 5,000 rows. Alternatively, select a scenario above to load sample data.
Drag & Drop CSV file (.csv, .tsv, .txt)
Include headers with mixed continuous and categorical columns.
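As a rough illustration of how type auto-detection could work, the sketch below labels a column continuous when every non-empty value parses as a number, and categorical otherwise. The function names and the parsing rule are assumptions made for illustration; the tool's actual detection logic is not documented here.

```python
import csv
import io


def _is_number(text):
    """Return True if the text parses as a float (hypothetical helper)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def detect_column_types(csv_text, max_rows=5000):
    """Label each column 'continuous' or 'categorical' (illustrative rule only)."""
    reader = csv.DictReader(io.StringIO(csv_text))  # first row is the header
    rows = list(reader)
    if len(rows) > max_rows:
        raise ValueError(f"{len(rows)} rows exceeds the {max_rows}-row limit")
    types = {}
    for name in reader.fieldnames or []:
        # Ignore empty cells so a missing value does not force 'categorical'.
        values = [row[name] for row in rows if row[name] not in (None, "")]
        is_numeric = bool(values) and all(_is_number(v) for v in values)
        types[name] = "continuous" if is_numeric else "categorical"
    return types
```

A rule like this would label numerically coded categories (for example, postal codes or 0/1 flags) as continuous, which is one reason to review and adjust the detected types after upload.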
Preprocessing & clustering
Advanced settings
Distance weight parameter (γ)
Gamma (γ) controls the relative weight of categorical vs. continuous variables in distance calculations. Auto mode sets γ to the average standard deviation of the continuous features, which typically works well. Increase γ to give categorical variables more influence; decrease it to prioritize continuous variables.
Auto: γ = (will be calculated after data load)
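To make the role of γ concrete, here is a minimal sketch of the usual k-prototypes dissimilarity: squared Euclidean distance on the continuous values plus γ times the number of categorical mismatches. It assumes the tool follows this standard form and uses the population standard deviation for the Auto rule; both are assumptions, and the function names are hypothetical.

```python
import statistics


def auto_gamma(continuous_columns):
    """Average standard deviation across continuous columns.

    Assumption: population standard deviation; the tool may use the sample version.
    """
    return statistics.mean(statistics.pstdev(col) for col in continuous_columns)


def mixed_distance(x_num, x_cat, c_num, c_cat, gamma):
    """Distance between a row (x) and a cluster prototype (c)."""
    # Continuous part: squared Euclidean distance.
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, c_num))
    # Categorical part: count of labels that differ from the prototype.
    mismatches = sum(a != b for a, b in zip(x_cat, c_cat))
    # Gamma scales how much each mismatch costs relative to the numeric part.
    return numeric + gamma * mismatches
```

With a larger γ, a single mismatched label adds more to the distance, so categorical agreement matters more when rows are assigned to clusters.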
Additional info & guidance
Start with k=3–4 and run diagnostics for k=2–8. Look for an elbow in the cost plot and high silhouette values (>0.3) to identify well-separated clusters. k-prototypes starts from random initializations; because several are run, results are generally stable, though they may still vary slightly between runs.
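For reference, a silhouette value compares how close a row is to its own cluster with how close it is to the nearest other cluster. The sketch below computes the mean silhouette for mixed data, assuming the same kind of mixed distance as k-prototypes (squared Euclidean plus γ times mismatches). Whether the tool computes silhouettes this way is an assumption, and the function names are hypothetical.

```python
def row_distance(a, b, n_numeric, gamma):
    """Mixed distance: the first n_numeric fields are continuous, the rest categorical."""
    numeric = sum((x - y) ** 2 for x, y in zip(a[:n_numeric], b[:n_numeric]))
    mismatches = sum(x != y for x, y in zip(a[n_numeric:], b[n_numeric:]))
    return numeric + gamma * mismatches


def mean_silhouette(rows, labels, n_numeric, gamma):
    """Average silhouette over all rows; values near 1 indicate well-separated clusters."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, row in enumerate(rows):
        # Mean distance from this row to each cluster, excluding the row itself.
        mean_dist = {}
        for c in clusters:
            members = [r for j, r in enumerate(rows) if labels[j] == c and j != i]
            if members:
                total = sum(row_distance(row, m, n_numeric, gamma) for m in members)
                mean_dist[c] = total / len(members)
        own = labels[i]
        if own not in mean_dist:
            scores.append(0.0)  # singleton cluster: silhouette is defined as 0
            continue
        a = mean_dist[own]  # cohesion: distance to own cluster
        b = min(d for c, d in mean_dist.items() if c != own)  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)
```

This version compares every pair of rows, so it is meant to show the calculation rather than to be fast. Computing it for each k from 2 to 8 and comparing the averages mirrors the diagnostic described above.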
Standardization is recommended when continuous variables have very different scales (e.g., age 0–100 vs. spend $0–$10,000). Note: standardization also changes the auto-calculated gamma, because gamma is derived from the standard deviations of the continuous variables, which standardization rescales. If clusters seem overly driven by one variable type, adjust gamma manually.
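The small example below shows why standardization shifts the auto-calculated gamma. It assumes z-score standardization (subtract the mean, divide by the standard deviation) and the average-standard-deviation rule for gamma; the sample values are made up for illustration.

```python
import statistics


def standardize(column):
    """Z-score a column: zero mean, unit standard deviation (assumed method)."""
    mean = statistics.mean(column)
    sd = statistics.pstdev(column)
    if sd == 0:
        return [0.0 for _ in column]  # constant column carries no information
    return [(v - mean) / sd for v in column]


# Made-up sample values on very different scales.
age = [22, 35, 47, 61]
spend = [150, 2400, 5100, 9800]

# Auto gamma on raw data is dominated by the large-scale column (spend).
raw_gamma = statistics.mean([statistics.pstdev(age), statistics.pstdev(spend)])

# After z-scoring, every column has standard deviation 1, so gamma is about 1.
std_gamma = statistics.mean(
    [statistics.pstdev(standardize(age)), statistics.pstdev(standardize(spend))]
)
```

Under these assumptions, a categorical mismatch costs about as much as a one-standard-deviation gap in a single continuous variable once the data is standardized, which is a useful baseline when adjusting gamma manually.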