Group IV · Generative analytics · CSCI 455 / 555 · Spring 2026

PlotForge

An AI-built interactive plotting and statistical analysis interface, delivered end-to-end through vibe coding with Claude Code.

PlotForge is a browser-based platform for mathematical function plotting and statistical data analysis, built end-to-end through vibe coding with Claude Code. It targets students, educators, and lightweight analysts who need to explore datasets without writing code. The platform spans function plotting, data import, eight statistical analysis modules, and seven machine-learning algorithms in a single dark-theme web application backed by Python Flask. The codebase totals approximately 17,700 lines across 31 source files.

Python Flask · SciPy · scikit-learn · Chart.js · Vanilla JS · Claude Code

The brief.

PlotForge addresses three underserved user segments. STEM students regularly need to plot functions, run inferential tests, and inspect model results, but face steep setup costs in Python or MATLAB. Educators need live interactive demonstrations of statistical concepts without managing classroom software environments; Desmos and GeoGebra are interactive but support neither real data nor inferential statistics. Lightweight analysts need a fast scratchpad for uploading a CSV, visualizing distributions, running a quick regression, and exporting findings — before committing to a full programming stack.

The Anaconda 2022 State of Data Science report found that 45% of data practitioners spend significant time navigating between multiple tools rather than performing analysis, and "inadequate tools" ranked among the top three workflow challenges. Python and R dominate data-science usage, but both require environment setup, package management, and scripting skills, which together exclude non-programmers. No lightweight, browser-based tool covers the full introductory curriculum from descriptive statistics through supervised ML without requiring any code.

Target: STEM students, educators teaching introductory statistics, and lightweight analysts who need a fast, code-free scratchpad for CSV exploration, distribution fitting, regression, and ML.

Running python src/app.py is the only setup step. The tool covers the complete introductory statistics curriculum (descriptive statistics, distribution fitting, 8 hypothesis test types, correlation, linear/logistic regression, and supervised/unsupervised ML) with no package installation or coding by end users.
From the write-up

The landscape.

Tool | Approach | Weakness | Our edge
Desmos / GeoGebra | Interactive function graphers | No data import, no inferential statistics, no ML | Adds real data, inferential stats, and ML in the same browser surface
MATLAB | Full numerical computing environment | Heavy install, license cost, no interactive web access | Zero-install, browser-native, free
Jupyter Notebook | Code-first scientific notebook | Requires Python scripting skill and environment management | Code-free chip UI exposes the same statistical coverage to non-programmers
JASP | Free GUI for statistics | No function plotting, limited ML, no web deployment | Adds plotting, full ML coverage, and a single-command web deploy

PlotForge's edge is concrete and feature-specific: a single-command setup with full introductory-stats coverage; a unified workflow where data imported once is immediately available across every analysis tab without re-importing; transparent automatic one-hot encoding for categorical variables (chip UI marks them with a dashed border and a cat badge); interactive curve fitting where users seed initial parameters by clicking on the plot instead of guessing numerical values; and an animated ML training progress bar with a model-specific ETA so users do not abandon long-running fits.

The system.

PlotForge uses a two-tier architecture: a vanilla HTML/CSS/JavaScript frontend with no build step and no framework, and a Python Flask backend that serves both static files and a REST API. The Flask root route (/) serves index.html via send_from_directory, so the page and the API share an origin and relative fetch paths resolve correctly, which they would not if the HTML were opened directly from disk.

The backend (app.py, 1,451 lines) exposes 15 REST endpoints under /api/stats/. SciPy powers the inferential layer — one-sample and two-sample t-tests, Mann-Whitney, Kruskal-Wallis, ANOVA, chi-squared, Shapiro-Wilk, curve fitting (Gaussian, exponential, power, logistic, polynomial, sine), Q-Q plots, ECDF, and preprocessing transforms. scikit-learn supplies linear and logistic regression, random forest, gradient boosting, decision tree, KNN, SVM, K-Means, and PCA, with cross-validation, ROC/PR curves, confusion matrices, and feature importance.
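The shape of one such endpoint can be sketched as follows. This is a minimal illustration, not the code in app.py: the route name /api/stats/descriptive comes from the endpoint list, but the payload key "values", the response field names, and the helper describe() are assumptions made for the example, and the standard-library statistics module stands in for SciPy to keep the sketch short.

```python
from statistics import mean, median, stdev

from flask import Flask, jsonify, request

app = Flask(__name__)


def describe(values):
    """Summary statistics for one numeric column (illustrative field names)."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "std": stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


@app.route("/api/stats/descriptive", methods=["POST"])
def descriptive():
    # Parse the JSON body; fall back to an empty dict if it is missing or invalid.
    payload = request.get_json(silent=True) or {}
    values = payload.get("values")  # hypothetical payload key
    if not isinstance(values, list) or not values:
        return jsonify({"error": "'values' must be a non-empty list"}), 400
    try:
        numbers = [float(v) for v in values]
    except (TypeError, ValueError):
        return jsonify({"error": "'values' must contain only numbers"}), 400
    return jsonify(describe(numbers))
```

Returning a 400 with an explicit error message for malformed input keeps failures visible to the frontend instead of surfacing as an opaque server error.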

The frontend (stats.js 2,378 lines, index.html 646 lines, components.css 1,168 lines, plus 20 additional JS/CSS files — roughly 17,700 lines total) renders all visualizations with Chart.js and implements a chip-style variable selection system across every analysis tab. Chips support three selection modes: single-select radio, multi-select, and type-filtered (Hypothesis Testing automatically switches between numeric-only chips for t-tests/ANOVA and categorical-only chips for Chi-squared).
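The type-filtered mode amounts to a lookup from test type to accepted variable kind. The sketch below expresses that logic in Python for consistency with the backend; the real implementation lives in the JavaScript frontend, and the names CHIP_FILTER, visible_chips, and the "dtype" field are hypothetical.

```python
# Hypothetical mapping from test type to the variable kind its chips accept.
CHIP_FILTER = {
    "ttest": "numeric",
    "anova": "numeric",
    "chisquared": "categorical",
}


def visible_chips(variables, test_type):
    """Return names of variables whose chips stay visible for a given test."""
    wanted = CHIP_FILTER.get(test_type)  # None means "no filtering"
    return [v["name"] for v in variables if wanted is None or v["dtype"] == wanted]


variables = [
    {"name": "score", "dtype": "numeric"},
    {"name": "group", "dtype": "categorical"},
]
print(visible_chips(variables, "ttest"))       # ['score']
print(visible_chips(variables, "chisquared"))  # ['group']
```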

The implementation.

The data flow is linear and stateless on the client. A user imports a CSV, JSON, Excel, pickle, HDF5, or NumPy file, and each column becomes a list-variable object with kind:'list'. The user selects variables via the chip UI; categorical variables are one-hot encoded client-side via _expandVar(v); _statsPost(endpoint, payload) then posts the JSON payload to Flask. Flask returns JSON results, which the frontend renders as tables, Chart.js visualizations, confusion matrices, ROC/PR curves, and feature-importance bars.
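The one-hot expansion step can be illustrated with a short Python sketch. It mirrors the idea behind _expandVar(v) but is not a port of it: the "categorical" flag, the name_level column naming, and the choice to emit one column per level (no reference level dropped) are all assumptions made for the example.

```python
def expand_var(var):
    """One-hot encode a categorical list variable; pass numeric ones through.

    `var` is a dict such as {"name": "group", "kind": "list",
    "categorical": True, "values": [...]}. The "categorical" flag and the
    output column naming are assumptions made for this sketch.
    """
    if not var.get("categorical"):
        return {var["name"]: list(var["values"])}
    levels = sorted(set(var["values"]))
    return {
        f"{var['name']}_{level}": [1 if v == level else 0 for v in var["values"]]
        for level in levels
    }


group = {"name": "group", "kind": "list", "categorical": True,
         "values": ["a", "b", "a", "c"]}
print(expand_var(group))
# {'group_a': [1, 0, 1, 0], 'group_b': [0, 1, 0, 0], 'group_c': [0, 0, 0, 1]}
```

Doing the expansion before transmission means the backend only ever receives numeric columns, which is consistent with the description of the encoding as transparent to the user.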

All features proposed in the original specification were delivered (interactive plotting, data import, regression, hypothesis testing). Additions beyond the proposal included chip-style variable selectors, transparent categorical variable support, seven ML algorithms with full evaluation metrics, an animated training progress bar with ETA computed as $\mathrm{pct} = 90\,(1 - e^{-3t/\hat{T}})$, and six histogram subtypes (histogram, violin, Q-Q, ECDF, boxplot, frequency bar).
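The progress-bar curve is easy to check numerically. The sketch below assumes t is the elapsed time in seconds and T-hat is the model-specific estimated training time; the write-up gives the formula but does not define the symbols, so both readings, along with the function and parameter names, are assumptions.

```python
import math


def progress_pct(elapsed_s, eta_s):
    """Progress shown elapsed_s seconds into a fit estimated to take eta_s."""
    if eta_s <= 0:
        raise ValueError("eta_s must be positive")
    # Exponential approach to 90%: fast at first, then flattening out.
    return 90.0 * (1.0 - math.exp(-3.0 * max(elapsed_s, 0.0) / eta_s))


for t in (0, 5, 10, 20):
    print(t, round(progress_pct(t, eta_s=10), 1))
# 0 0.0
# 5 69.9
# 10 85.5
# 20 89.8
```

Under this reading the bar reaches about 85.5% when the elapsed time equals the estimate and never exceeds 90% however long the fit runs, presumably leaving the last 10% for the moment the response arrives. That last point is an inference from the formula, not something the write-up states.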

The entire platform was built using Claude Code (Claude Sonnet 4.6) running in the terminal with direct file-editing access. A CLAUDE.md at the project root captured architecture, file roles, and naming conventions, allowing Claude Code to resume context across sessions without re-explanation. The chip-selector retrofit across all stats tabs — estimated at 3-4 hours of manual work — was completed in roughly 5 minutes with a single prompt.

POST /api/stats/ttest
POST /api/stats/anova
POST /api/stats/chisquared
POST /api/stats/regression
POST /api/stats/ml-train
POST /api/stats/curve-fit
POST /api/stats/correlation
POST /api/stats/descriptive

Built with AI.

Where AI helped

  • UI scaffolding at scale: chip system, progress bar, and results rendering across 8 tabs were generated with high first-pass correctness from single specifications.
  • Backend statistics implementation: every SciPy and scikit-learn call (cross-validation, ROC curves, Kruskal-Wallis, PCA) was implemented correctly without iteration.
  • CSS consistency: the existing design token system (--acc1, --border2) was maintained across every new component without visual drift.

Where AI struggled

  • Frontend-backend field name mismatches (elbow_k vs elbow_ks, y_test vs y_orig, flat vs nested roc object) required runtime error output to be pasted back for diagnosis — Claude could not predict them from code alone.
  • A CSS height-collapse edge case where the confusion-matrix wrapper had height:0 despite an inner <table> of 120px (caused by overflow-x:auto collapsing flex-column children) needed getBoundingClientRect() inspection and one debug pass to fix.
  • Open-ended architectural questions ("React or vanilla JS?") drew confident answers without full codebase context, sometimes requiring course-correction.

Runtime context is essential for debugging: show Claude the exact error output, not just the code. Specific scope dramatically outperforms vague directives. Humans should make the structural decisions (REST API design, single-file Flask, shared variable state) and hand them to Claude as constraints to execute against.

The evidence.

Codebase size: ~17,700 lines (31 source files, mostly AI-authored)
REST endpoints: 15 (all under /api/stats/)
ML algorithms: 7 (linear/logistic regression, RF, GB, DT, KNN, SVM, plus K-Means and PCA)
Hypothesis tests: 8 types (t-tests, Mann-Whitney, Kruskal-Wallis, ANOVA, chi-squared, Shapiro-Wilk, plus more)
Chip-selector retrofit: ~5 minutes (estimated 3-4 hours manual; Claude Code, one pass)
ML CV mean accuracy: 0.517 (Random Forest on random synthetic data; correctly exposes overfitting versus 100% in-sample accuracy)
One-Sample t-Test on score (μ₀ = 0), n = 60 synthetic observations from Uniform[50,100]: t = 39.10, df = 59, p = 6.90 × 10⁻⁴⁴, 95% CI = [71.61, 79.34]. Switching to Chi-Squared automatically filters chips to show only the categorical group variable, confirming type-aware filtering. End-to-end verification run on a synthetic 60-observation dataset.
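The arithmetic behind the one-sample t statistic can be reproduced with the standard library alone. The sketch below draws its own 60 values from Uniform[50,100] with an arbitrary seed, so it is not the dataset used in the verification run and its t value will differ from the reported 39.10; the function name one_sample_t is hypothetical, and the p-value and confidence interval (which need the t distribution, for example from SciPy) are left out.

```python
import math
import random
import statistics


def one_sample_t(sample, mu0=0.0):
    """Return (t, df) for the null hypothesis that the population mean is mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return (statistics.mean(sample) - mu0) / se, n - 1


random.seed(0)  # arbitrary seed; this is not the dataset used in the write-up
scores = [random.uniform(50, 100) for _ in range(60)]
t, df = one_sample_t(scores, mu0=0.0)
print(f"t = {t:.2f}, df = {df}")
```

As a consistency check on the reported figures: the interval [71.61, 79.34] is centred on 75.475, and 75.475 / 39.10 gives a standard error of about 1.93.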

Limits & next.

Limits

  • Advanced visualizations rely on a fixed analysis catalogue; users cannot register new test types from the UI.
  • Open-ended architectural questions to the LLM (vanilla vs framework, file structure) returned confident answers without full codebase context.
  • Backend field-name drift between Python results and JS renderers needed explicit runtime error output to diagnose — code alone was insufficient.
  • Categorical encoding is automatic but limited to one-hot; ordinal or target encodings are not yet supported.
  • Persistence is in-session only — no saved projects, no notebooks, no sharable URLs.

Next

  • Build a proper evaluation harness to measure first-pass correctness on a held-out set of feature specs.
  • Add saved-session storage so users can return to an imported dataset across visits.
  • Surface AI-assisted explanations next to each test result so beginners can read what a t-statistic means in context.
  • Expand the categorical-encoding strategies beyond one-hot (ordinal, target, hashing).
  • Add interactive curve-fit seeding as a first-class tutorial flow so students learn what each parameter does.