Generative AI for Software Development — CSCI 455 / 555

An undergraduate class · CSCI 455 / 555 · Spring 2026

Fifteen undergraduate scholars spent twelve weeks at the College of William & Mary learning how code-aware language models actually behave: where they hallucinate, how to measure their failures, and how to build workflows that don't collapse in production. Eleven projects shipped this spring.

Modules · VIII

Labs · III

Cohort · 15

Shipped · 11

§ I The Cohort

The 2026 class.

Aidan Basloe · Group 01 · Stock AI

Sam Bennett · Group 05 · BURT++

Nathaniel Callabresi · Group 02 · Video Index

Alan Gonzalez Osorio · Group 03 · Arbitrage

James He · Group 04 · PlotForge

Walker Hyman · Group 11 · Rhizome

Alice Ji · Group 06 · Claim Verify

Jeff Lin · Group 01 · Stock AI

Abby Schwall · Group 07 · Degree Map

Jack Stawasz · Group 04 · PlotForge

Krishna Swaminathan · Group 08 · RAG Rules

Yibarek Tadesse · Group 09 · CodeCaster

Camly Tran · Group 06 · Claim Verify

Lily Walker · Group 02 · Video Index

Carter Williamson · Group 10 · Youth Sports

§ II The Playbook · An overview of the course structure

Eight chapters of theory.

An overview of how the term is built — each chapter is read end-to-end, then drilled in a notebook. Lectures hand off to lab handouts; lab handouts hand off to seminar discussions. The notebook stays open the entire semester.

I · Wk 01-02

Mining repositories

Collecting, cleaning, and tokenizing source-code data from public repositories. PyDriller, BPE, deduplication with MinHash, and the ethics of training on copyleft code.

repositories · BPE · ASTs · licensing

Lab A · shipped
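The MinHash deduplication named in this module can be sketched in a few lines. This is a minimal illustration using only the standard library, not the course's lab code: the function names, the whitespace tokenizer, the 3-token shingles, the 64 hash functions, and the 0.8 threshold are all assumptions chosen for the example.

```python
import hashlib

def shingles(tokens, k=3):
    """Return the set of k-token windows (shingles) over a token list."""
    return {" ".join(tokens[i:i + k]) for i in range(max(1, len(tokens) - k + 1))}

def minhash_signature(items, num_perm=64):
    """One minimum per salted hash function; the salt stands in for a permutation."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(), "big")
            for s in items))
    return signature

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching slots approximates the Jaccard similarity of the sets."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

def dedup(methods, threshold=0.8):
    """Greedy near-duplicate filter: keep a method only if no kept method is too similar."""
    kept = []
    for text in methods:
        sig = minhash_signature(shingles(text.split()))
        if all(estimated_jaccard(sig, other) < threshold for _, other in kept):
            kept.append((text, sig))
    return [text for text, _ in kept]
```

The pairwise comparison here is quadratic in the number of kept methods; at corpus scale the signatures are usually bucketed with locality-sensitive hashing instead.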

II · Wk 03

Modeling code

From n-grams to the naturalness hypothesis. Probability refresher, MLE, perplexity, smoothing, sampling temperature — and why code is more predictable than English.

MLE · perplexity · smoothing · sampling

Pre-lab · spam classifier
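MLE, smoothing, and perplexity fit together in a bigram model over code tokens. The sketch below is illustrative only: the helper names are hypothetical, add-alpha (Laplace) smoothing is one choice among several, and out-of-vocabulary tokens are not handled properly.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Count bigrams and their left contexts; plain MLE would divide these counts."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])
    vocab = set(tokens)
    return bigrams, contexts, vocab

def prob(model, prev, word, alpha=1.0):
    """Add-alpha smoothed estimate of P(word | prev); never zero when alpha > 0."""
    bigrams, contexts, vocab = model
    return (bigrams[(prev, word)] + alpha) / (contexts[prev] + alpha * len(vocab))

def perplexity(model, tokens, alpha=1.0):
    """exp of the average negative log-probability per bigram; lower is more predictable."""
    pairs = list(zip(tokens, tokens[1:]))
    log_sum = sum(math.log(prob(model, p, w, alpha)) for p, w in pairs)
    return math.exp(-log_sum / len(pairs))

train = "for i in range ( n ) : for j in range ( n ) :".split()
model = train_bigram(train)
familiar = perplexity(model, "for i in range ( n ) :".split())
scrambled = perplexity(model, ": ) n ( range in i for".split())
```

On this toy corpus the familiar loop header scores a lower perplexity than the scrambled one, which is the naturalness argument in miniature.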

III · Wk 04

Evaluating rigorously

Classification metrics, BLEU and its discontents, CodeBLEU, pass@k, embeddings, SIDE, and the unglamorous human-evaluation rubric the best papers include without making a show of it.

BLEU · pass@k · CodeBLEU · SIDE

Lab B · shipped
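Of the metrics listed, pass@k is the easiest to get subtly wrong. A common formulation is the unbiased estimator popularized by the HumanEval benchmark: generate n samples per problem, count the c that pass the tests, and compute 1 - C(n-c, k) / C(n, k). The sketch below assumes that formulation; whether the course uses this exact variant is not stated here, and the function name is hypothetical.

```python
import math

def pass_at_k(n, c, k):
    """Estimate the chance that at least one of k samples passes.

    n: samples generated for the problem
    c: samples that passed the unit tests
    k: sample budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the estimate is 1.0 once
    # there are too few failing samples to fill a draw of k.
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Per-benchmark score: average the per-problem estimates.
results = [(10, 3), (10, 0), (10, 10)]  # hypothetical (n, c) pairs
score = sum(pass_at_k(n, c, 5) for n, c in results) / len(results)
```

Averaging per-problem estimates, rather than pooling all samples, keeps easy problems from masking hard ones.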

IV · Wk 05-06

Deep learning

Neural networks, backpropagation, embeddings, LSTM/GRU, attention, transformers, autoregressive generation, pre-training, and fine-tuning — the engine room.

neural nets · LSTM · transformers · CodeBERT

Lab · code completion

V · Wk 07-08

Prompting LLMs

In-context learning, few-shot, chain-of-thought, prompt engineering, RAG, tool use, context-window management, prompt chaining, self-consistency, and prompt evaluation.

ICL · CoT · RAG · tool use

Lab C · shipped

VI · Wk 09

Hallucinations in code

How LLMs fabricate, the CodeHalu taxonomy, RAG mitigation, prompt defenses, tool-augmented generation, production case studies, and hallucination-resistant workflows.

CodeHalu · RAG mitigation · production cases

Workshop · red-team

VII · Wk 10

NP-completeness

Reductions, hardness, and what LLMs do when the underlying problem isn't tractable. Where statistical pattern-matching collides with the unforgiving floor of computational complexity.

reductions · SAT · hardness · complexity

Theory companion

VIII · Wk 11-12

Genetic algorithms

Population search, fitness landscapes, crossover, fitness approximation with LLM predictors, the GA+LLM architecture, and the honest limits of evolutionary search over code.

population · selection · fitness approx · GA + LLM

Capstone-adjacent
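The population, selection, and crossover loop from this module can be shown on a toy problem. The sketch below evolves bitstrings rather than code, and every parameter (tournament size 3, two elites, single-point crossover, 2% mutation) is an assumption picked for illustration. The `fitness` argument is any callable; the module's fitness-approximation idea would swap in a cheaper predictor at that point.

```python
import random

def evolve(fitness, length=20, pop_size=30, generations=40,
           mutation_rate=0.02, seed=0):
    """Minimal generational GA over bitstrings with elitism."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        elite = pop[:2]  # carry the two best forward unchanged
        children = []
        while len(children) < pop_size - len(elite):
            # Tournament selection: best of three random individuals.
            a = max(rng.sample(pop, 3), key=fitness)
            b = max(rng.sample(pop, 3), key=fitness)
            # Single-point crossover, then independent bit-flip mutation.
            cut = rng.randrange(1, length)
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness), history

# OneMax: fitness is simply the number of 1 bits.
best, history = evolve(sum)
```

Because the elites are copied forward untouched, the best fitness never decreases from one generation to the next, which makes `history` a convenient sanity check.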

Grading scheme

Assignments avg · midterm · capstone split · participation

Deliverable	What it is	Weight
Assignments I-III (avg)	Average of three coding assignments — mining, modeling, evaluation.	40%
Midterm	Mid-semester written examination of theory and methods.	10%
Final project	Capstone block · 5–7 page write-up paired with a ten-minute in-class demo of the shipped product. Graded on five rubric criteria.	45%
Participation	Office-hour engagement, seminar discussion, peer review.	5%
Bonus	Additive, not weighted — for exceptional contributions.	+0–5
Total	Weighted components sum to 100%; bonus remains additive.	100%
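The weights above combine as a plain weighted sum. A minimal sketch, assuming each component is scored 0–100 and the bonus is added as percentage points after weighting (the table says the bonus is additive but does not state its units); the function and key names are hypothetical.

```python
# Weights in percent, copied from the grading table.
WEIGHTS = {
    "assignments": 40,    # average of Assignments I-III
    "midterm": 10,
    "final_project": 45,
    "participation": 5,
}

def course_grade(scores, bonus=0.0):
    """Weighted total on a 0-100 scale, plus an additive bonus of 0 to 5 points."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the four weighted components")
    if not 0 <= bonus <= 5:
        raise ValueError("bonus must be between 0 and 5")
    weighted = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
    # Assumption: the bonus is not capped, so a total can exceed 100.
    return weighted + bonus

grade = course_grade(
    {"assignments": 80, "midterm": 70, "final_project": 90, "participation": 100},
    bonus=2,
)
```

Here the weighted components contribute 32 + 7 + 40.5 + 5 = 84.5, and the bonus lifts the total to 86.5.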

§ III The Labs

Five labs, each with its own notebooks.

Each lab pairs a chapter of theory with a hands-on notebook — the artifact a future student inherits. Run them locally, modify them, break them. Lab handouts and source notebooks are linked from each module.

Lab A · Wk 01-02

Repository mining & dedup

Clone three permissively-licensed Java repos, extract methods, tokenize, and dedup. Deliverable: a clean JSONL corpus plus a short report on what got cut and why.

1 handout · 3 notebooks

Pre-lab · Wk 03

ML warmup — spam

Naive Bayes vs Random Forest on a spam corpus — the MLE / smoothing / evaluation muscle you'll re-use on code tokens in the n-gram lab.

1 notebook

Lab B · Wk 04

Evaluating a code model

Fine-tune CodeT5 for code translation, then evaluate with BLEU and CodeBLEU. Two notebooks — one runs the model, one computes the metrics.

2 notebooks

Lab · Wk 05-06

Deep code completion

An end-to-end seq2seq + attention setup for code completion with a small hyperparameter tuner. Reproduces every number in the corresponding lecture.

1 notebook

Lab C · Wk 07-08

Prompting & RAG

Talk to an LLM API through code. ICL, chain-of-thought, and a simple regression suite against your own prompts.

1 notebook

§ IV Final Projects

Real shipped products.

Each group chose a real problem, scoped a system, and built something that runs. Scored on market analysis, differentiation, and technical framework. All eleven groups shipped on schedule — five cleared the bar, six fell short. Results below tell the whole story, sorted by group number.

Group 01

Finance · markets

Stock Investment AI

An algorithmic stock-prediction interface with explainable retrieval-grounded recommendations. Pairs price-signal modeling with LLM-generated reasoning over filings.

Aidan Basloe · Jeff Lin

Group 02

Search · multimedia

Multimodal Video Indexing

Natural-language search across video archives, replacing brittle metadata-only retrieval with vision-and-language embeddings indexed at scene granularity.

Nathaniel Callabresi · Lily Walker

Group 03

Sports · risk

Sports Betting Arbitrage

Real-time cross-sportsbook arbitrage detection with risk-aware position sizing. Surfaces price disagreements before they close.

Alan Gonzalez Osorio

Group 04

Tooling · data

PlotForge

A plotting interface for data analysis aimed at students, educators, and lightweight analysts. Natural-language to charting with iterative refinement.

James He · Jack Stawasz

Group 05

Dev tools · QA

BURT++

A bug-report assistant that translates non-technical user complaints into actionable engineering tickets — clarifying reproduction steps as it goes.

Sam Bennett

Group 06

Civic · verification

GenAI Claim Verification

Retrieval-augmented evidence pipeline for verifying factual claims, attaching source citations with calibrated confidence.

Alice Ji · Camly Tran

Group 07

Education · planning

W&M Degree Map

A planning tool for liberal-arts students navigating complex general-education requirements. Goal-aware course recommendations with clear-eyed prerequisite traversal.

Abby Schwall

Group 08

Sports · rules