“Back to basics” — Implementing various ML algorithms from scratch with a pure, low-level approach, and comparing them against each other.

Overview

This project explores housing price prediction on the Boston Housing dataset using multiple approaches — from hand-coded gradient descent to deep learning — to understand the fundamentals of regression at every level of abstraction.

Dataset

Source: Boston Housing (Kaggle)

  • 506 samples, 13 features, 1 target (MEDV — Median home value in $1000s)
  • Train/Test Split: 80/20 (random state 42)
  • Preprocessing: Z-score standardisation (mean/std computed on train set only, applied to both)
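The split-then-standardise step above can be sketched in pure NumPy. This is an illustrative sketch with synthetic stand-in data, not the project's actual code; the real pipeline seeds its 80/20 split with random state 42.

```python
import numpy as np

# Synthetic stand-ins for the 13 feature columns and the MEDV target
rng = np.random.default_rng(42)
X = rng.normal(size=(506, 13))
y = rng.normal(size=506)

# 80/20 split on a shuffled index
idx = rng.permutation(506)
split = int(0.8 * 506)  # 404 train / 102 test
X_train, X_test = X[idx[:split]], X[idx[split:]]
y_train, y_test = y[idx[:split]], y[idx[split:]]

# Z-score parameters come from the training set only, then apply to both
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mu) / sigma
X_test = (X_test - mu) / sigma
```

Fitting mean and std on the training set only avoids leaking test-set statistics into the model.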

Features

Feature    Description
-------    -----------
CRIM       Per capita crime rate by town
ZN         Proportion of residential land zoned for lots > 25k sqft
INDUS      Proportion of non-retail business acres per town
CHAS       Charles River dummy variable (1 if tract bounds river)
NOX        Nitric oxide concentration (parts per 10 million)
RM         Average number of rooms per dwelling
AGE        Proportion of owner-occupied units built pre-1940
DIS        Weighted distances to five Boston employment centres
RAD        Index of accessibility to radial highways
TAX        Full-value property tax rate per $10,000
PTRATIO    Pupil-teacher ratio by town
B          1000(Bk − 0.63)² where Bk is the proportion of Black residents
LSTAT      Percentage lower status of the population
MEDV       Median value of owner-occupied homes ($1000s) — TARGET

Dataset Analysis

Feature Distributions

Histograms for all 14 variables in the dataset:

[Figure: histogram grid with panels for CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B, LSTAT and MEDV]

Correlation Analysis

[Figures: standard correlation heatmap; hierarchically clustered correlation map]

Key correlations with MEDV (target):

  • Strong positive: RM (rooms) — more rooms → higher price
  • Strong negative: LSTAT (lower status %) — higher LSTAT → lower price
  • Notable negatives: PTRATIO, INDUS, TAX, CRIM
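These relationships can be read straight off a pandas correlation matrix. The sketch below uses a tiny synthetic frame with made-up values (the real numbers come from housing.csv) purely to show the call pattern:

```python
import numpy as np
import pandas as pd

# Tiny synthetic frame with the real column names but made-up values,
# constructed so RM pushes MEDV up and LSTAT pushes it down
rng = np.random.default_rng(0)
rm = rng.normal(6.0, 0.7, 200)
lstat = rng.uniform(2.0, 35.0, 200)
medv = 5.0 * rm - 0.5 * lstat + rng.normal(0.0, 2.0, 200)
df = pd.DataFrame({"RM": rm, "LSTAT": lstat, "MEDV": medv})

# Correlations of every feature with the target, most negative first
corr = df.corr()["MEDV"].drop("MEDV").sort_values()
```

On the real dataset the same one-liner surfaces LSTAT at the negative end and RM at the positive end.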

Methods Implemented

1. Gradient Descent — Linear Regression from Scratch (rawRun.py)

A pure NumPy implementation with no ML library abstractions:

  • Manual weight initialisation (w ~ N(0, 0.01), b = 0)
  • MSE loss with analytical gradient computation
  • Convergence tolerance check (tol = 1e-4)
  • Hyperparameters: α = 0.001, epochs = 1200
# Core update rule (pure NumPy, no ML-library abstractions)
dw = (2 / m) * X_train.T @ (y_pred - y_train)
db = (2 / m) * np.sum(y_pred - y_train)
w -= alpha * dw
b -= alpha * db
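For context, a full loop around this update — initialisation, MSE loss, and the convergence check — could look like the following. The function name and toy data are illustrative, not the actual rawRun.py code:

```python
import numpy as np

def gd_linear_regression(X, y, lr=0.001, epochs=1200, tol=1e-4):
    """Batch gradient descent for linear regression in plain NumPy."""
    m, n = X.shape
    rng = np.random.default_rng(42)
    w = rng.normal(0.0, 0.01, n)  # w ~ N(0, 0.01)
    b = 0.0
    prev_loss = np.inf
    for _ in range(epochs):
        y_pred = X @ w + b
        loss = np.mean((y_pred - y) ** 2)      # MSE
        if abs(prev_loss - loss) < tol:        # convergence tolerance check
            break
        prev_loss = loss
        dw = (2 / m) * X.T @ (y_pred - y)      # analytical gradients
        db = (2 / m) * np.sum(y_pred - y)
        w -= lr * dw
        b -= lr * db
    return w, b

# Toy usage on a standardised, noise-free problem
rng = np.random.default_rng(0)
X = rng.normal(size=(404, 13))
true_w = rng.normal(size=13)
y = X @ true_w + 1.5
w, b = gd_linear_regression(X, y, lr=0.01, epochs=5000)
```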

2. Neural Network — Keras/TensorFlow (NeuralNetwork.py)

A 3-hidden-layer dense network trained with Adam optimiser:

Layer      Units   Activation
Input      13      —
Hidden 1   22      ReLU
Hidden 2   22      ReLU
Hidden 3   22      ReLU
Output     1       Linear
  • Total parameters: 1,343
  • Optimiser: Adam
  • Loss: MSE
  • Batch size: 32, Epochs: 50 (with early stopping, patience=20)
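The architecture in the table could be reproduced with Keras roughly as below. This is a sketch, not the project's actual script; the import is guarded so the parameter-count arithmetic runs even without TensorFlow installed:

```python
# 13-22-22-22-1 dense network (assumes tensorflow.keras is available)
try:
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(13,)),
        keras.layers.Dense(22, activation="relu"),
        keras.layers.Dense(22, activation="relu"),
        keras.layers.Dense(22, activation="relu"),
        keras.layers.Dense(1),  # linear output for regression
    ])
    model.compile(optimizer="adam", loss="mse")
    # Training per the source: batch_size=32, epochs=50, with
    # keras.callbacks.EarlyStopping(patience=20) on validation loss
except ImportError:
    model = None

# Parameter count: (13*22 + 22) + 2*(22*22 + 22) + (22*1 + 1) = 1343
n_params = (13 * 22 + 22) + 2 * (22 * 22 + 22) + (22 * 1 + 1)
```

The arithmetic confirms the 1,343 total parameters quoted above.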

3. Evolutionary Algorithm (Work in Progress)

Scaffolded in EvolutionaryAlgorithm.py — evolves neural network weights using:

  • Tournament selection (k=3)
  • Gaussian mutation (per-gene with configurable rate and sigma)
  • Uniform crossover (per-gene swap with configurable rate)

⚠️ Status: Skeleton implemented — fitness(), ea_training_loop(), and mutation() functions are stubbed and awaiting completion.
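The three operators listed above could look like the following sketch; the names and default parameters are guesses, since fitness() and the training loop remain stubs in the source:

```python
import numpy as np

rng = np.random.default_rng(0)

def tournament_select(pop, fitnesses, k=3):
    """Return the fittest of k randomly sampled individuals (lower is better)."""
    idx = rng.choice(len(pop), size=k, replace=False)
    return pop[min(idx, key=lambda i: fitnesses[i])]

def gaussian_mutate(genome, rate=0.1, sigma=0.1):
    """Add N(0, sigma) noise to each gene independently with probability rate."""
    mask = rng.random(genome.shape) < rate
    return genome + mask * rng.normal(0.0, sigma, genome.shape)

def uniform_crossover(a, b, rate=0.5):
    """Build a child taking each gene from parent a with probability rate."""
    mask = rng.random(a.shape) < rate
    return np.where(mask, a, b)

# Toy usage: 1,343-gene genomes standing in for the flattened NN weights
pop = [rng.normal(size=1343) for _ in range(10)]
fits = [float(np.mean(g ** 2)) for g in pop]  # stand-in fitness (lower = better)
parent_a = tournament_select(pop, fits)
parent_b = tournament_select(pop, fits)
child = gaussian_mutate(uniform_crossover(parent_a, parent_b))
```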


Results Comparison

Metrics

Metric   Gradient Descent (Linear)   Neural Network (Keras)   Evolutionary Algorithm
MSE      32.406                      12.957                   101.712
RMSE     5.692                       3.599                    10.085
MAE      3.378                       2.399                    7.748
R²       0.558                       0.823                    −0.387
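All four metrics are standard and easy to reproduce. A small illustrative helper (not the project's code) on made-up values; note that R² goes negative, as it did for the EA, whenever predictions are worse than simply predicting the mean:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE and R² as reported in the comparison table."""
    err = y_pred - y_true
    mse = float(np.mean(err ** 2))
    rmse = float(np.sqrt(mse))
    mae = float(np.mean(np.abs(err)))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot  # negative when worse than predicting the mean
    return mse, rmse, mae, r2

# Toy check on made-up values (not the project's predictions)
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
mse, rmse, mae, r2 = regression_metrics(y_true, y_pred)
```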

Training Loss Curves

[Figures: gradient descent MSE convergence over ~1200 epochs; NN train/validation loss over ~50 epochs with early stopping; EA fitness convergence over 50 generations]

Test Set Predictions — Actual vs Predicted

[Figures: actual vs predicted scatter plots for gradient descent, neural network and evolutionary algorithm]

Points closer to the red dashed line (y = x) indicate better predictions. The neural network shows tighter clustering around the ideal line, especially in the mid-range values.


Project Structure

Zamin/
├── rawRun.py                 # Gradient descent linear regression (from scratch)
├── NeuralNetwork.py          # Neural network with Keras/TensorFlow
├── EvolutionaryAlgorithm.py  # Evolutionary algorithm for NN weight optimisation
├── dataPreProcess.py         # Dataset visualisation & correlation analysis
├── housing.csv               # Boston Housing dataset (506 × 14)
├── stickyNote.txt            # Project notes & core concepts
└── output/
    ├── dataanalysis/         # Histograms, heatmaps, correlation plots
    ├── linearreg/            # GD loss curves & scatter plots
    ├── nn/                   # NN loss curves, scatter plots & saved models
    └── ea/                   # EA convergence & prediction plots

Key Takeaways

The evolutionary algorithm failed spectacularly, which surprised me; I expected it to fall somewhat short, but not to underperform to this extent. Linear regression and the NN, however, were a delight: solid scores from very simple setups. I found the NN topology taken from the study straightforward to implement and logical.

Future Work

Plenty could be improved. First and foremost: add ridge regularisation to the linear model, and add pruning to the NN. Beyond that, the main goal is to apply these methods to a more sophisticated, real-world dataset. This project was great for its rapid feedback loop and small scale; ramping up to something more fully fledged is the next step. Definitely more to come.