Sole Engineer · Full Stack & ML · May 2025

Investment Advisory & Portfolio Management System

The Architecture

Financial decision-making is one of the highest-stakes domains for software to touch. This project started as a question: how close can a self-built system get to the kind of predictive tooling that institutional investors pay seven figures for?

The system is built around two distinct ML model classes, each chosen for a specific task. ARIMA (AutoRegressive Integrated Moving Average) handles time-series prediction for assets whose price histories show temporal autocorrelation, such as stock prices and gold futures, where past values may carry some forward signal. Random Forest handles risk classification: a supervised learning approach that categorises portfolio positions by risk profile from multi-feature input (volatility, sector, market cap, historical drawdown).
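The page does not say which ARIMA orders were used, so the following is only a minimal sketch of the mechanics, assuming an ARIMA(1,1,0) with drift fitted by conditional least squares in plain Python. In practice the statsmodels `ARIMA` class would normally do the fitting; `fit_arima_110` and `forecast_arima_110` are hypothetical names.

```python
from statistics import fmean


def fit_arima_110(prices):
    """Fit ARIMA(1,1,0) with drift by conditional least squares.

    Differences the series once (d = 1), then regresses each difference on
    the previous one:  dy[t] = c + phi * dy[t-1] + noise.  Returns (c, phi).
    """
    dy = [b - a for a, b in zip(prices, prices[1:])]
    x, y = dy[:-1], dy[1:]
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = my - phi * mx
    return c, phi


def forecast_arima_110(prices, steps):
    """Iterate the AR(1) on differences, then integrate back to price level."""
    c, phi = fit_arima_110(prices)
    level, diff = prices[-1], prices[-1] - prices[-2]
    forecast = []
    for _ in range(steps):
        diff = c + phi * diff  # next predicted change
        level += diff          # undo the differencing
        forecast.append(level)
    return forecast
```

A production fit would normally use maximum likelihood (as statsmodels does) and would also return prediction intervals; this sketch only produces point forecasts.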

Real-time market data is sourced via the Yahoo Finance API, feeding both the live dashboard and the model inference pipeline. This means predictions aren't made against stale data — every forecast runs against the most recent available price history. Gold and real estate forecasting extends the model surface beyond equities, reflecting a realistic multi-asset portfolio view.

The frontend is intentionally lightweight — Flask templates with Plotly for interactive charting and Matplotlib for static visualisations. No JavaScript framework overhead; the complexity budget was reserved entirely for the ML layer. The result is a system where the intelligence is in the models, not the UI.

Risk assessment and classification sit at the core of the advisory layer: each portfolio position receives a risk tier, and the recommendation engine surfaces rebalancing suggestions based on the aggregate risk profile. SQLite handles local persistence during development, with a documented PostgreSQL migration path for production use.
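The aggregation step could look roughly like the sketch below. It assumes each position already has per-tier probabilities from the classifier, weights them by market value, and compares the result with a target profile. The target mix, the 5% tolerance, and both function names are hypothetical, since the actual rebalancing rules are not published here.

```python
TIERS = ("conservative", "moderate", "aggressive")


def portfolio_risk_profile(positions):
    """Value-weighted average of per-position tier probabilities.

    `positions` is a list of (market_value, {tier: probability}) pairs.
    """
    total = sum(value for value, _ in positions)
    profile = {tier: 0.0 for tier in TIERS}
    for value, probs in positions:
        weight = value / total
        for tier in TIERS:
            profile[tier] += weight * probs.get(tier, 0.0)
    return profile


def rebalancing_suggestions(profile, target, tolerance=0.05):
    """Hypothetical rule: flag tiers that drift more than `tolerance` from target."""
    suggestions = []
    for tier in TIERS:
        gap = profile[tier] - target[tier]
        if gap > tolerance:
            suggestions.append(f"reduce {tier} exposure by {gap:.0%}")
        elif gap < -tolerance:
            suggestions.append(f"increase {tier} exposure by {-gap:.0%}")
    return suggestions
```

For a 60/40 split between a mostly aggressive holding and a mostly conservative one, measured against a hypothetical 40/40/20 target, this flags moderate as under-weight and aggressive as over-weight.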

The repo is structured in three clean layers (/frontend, /backend, /docs), with an 18-commit history reflecting iterative model improvement rather than a single dump. The project is experimental, but its engineering is deliberate.

Strategic Methodology

Model-first development — both ML models were trained, evaluated, and validated against historical Yahoo Finance data before any web interface was built. The Flask app was treated as a presentation layer for already-working intelligence, not a framework to build intelligence around. Documentation lives in /docs as a first-class output alongside the code.
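Validating a forecaster against history without leaking the future usually means a walk-forward backtest. The sketch below is an assumption about how such an evaluation might be structured (the project's actual evaluation lives in /docs and is not reproduced here): an expanding window, one-step-ahead predictions, and mean absolute error compared against a naive last-value baseline. Both function names are hypothetical.

```python
from typing import Callable, Sequence


def walk_forward_mae(
    series: Sequence[float],
    forecaster: Callable[[Sequence[float]], float],
    min_train: int,
) -> float:
    """Expanding-window, one-step-ahead backtest.

    At each step t the forecaster sees only series[:t] and predicts
    series[t], so no future observation leaks into a prediction.
    """
    errors = []
    for t in range(min_train, len(series)):
        prediction = forecaster(series[:t])
        errors.append(abs(series[t] - prediction))
    return sum(errors) / len(errors)


def naive_last_value(history: Sequence[float]) -> float:
    """Baseline: tomorrow's price equals today's."""
    return history[-1]
```

A model that cannot beat `naive_last_value` on this metric is adding little over a random-walk assumption, which makes the baseline a useful sanity check before any interface is built.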

Engineering Challenges

Choosing the right model per asset class: ARIMA's autoregressive and moving-average terms assume a stationary series, so each asset's price history was run through ADF tests to decide how much differencing to apply before fitting. Random Forest required careful feature engineering to avoid lookahead bias in the training set.
Integrating real-time Yahoo Finance API data into the model inference pipeline without introducing latency that degrades the user experience — solved with background data caching and incremental model updates rather than full retraining on each request.
Extending forecasting beyond equities to gold and real estate — assets with fundamentally different volatility profiles, seasonal patterns, and data availability required separate preprocessing pipelines and model hyperparameter tuning per asset class.
Building a risk classification system that is meaningful without being overconfident: the Random Forest outputs a probability distribution across risk tiers rather than a hard label, surfacing the model's uncertainty to the user instead of hiding it behind a single verdict.
Visualising multi-dimensional financial data (price history, predicted trajectory, confidence intervals, portfolio allocation) in a web interface without overwhelming non-technical users — Plotly's interactive charts with progressive disclosure (hover for detail, zoom for precision) solved this cleanly.
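The stationarity check in the first challenge can be illustrated with a simplified, zero-lag Dickey-Fuller test written in plain Python. This is a stand-in, not the full ADF test the project describes: ADF adds lagged differences to the regression, and in practice `statsmodels.tsa.stattools.adfuller` would normally be used. `choose_d` is a hypothetical helper that keeps differencing until a unit root is rejected at the 5% level.

```python
import math
from statistics import fmean

# Asymptotic 5% critical value for the Dickey-Fuller test with a constant
# and no trend. Finite-sample critical values are slightly more negative.
DF_CRITICAL_5PCT = -2.86


def dickey_fuller_stat(series):
    """t-statistic on gamma in  dy[t] = alpha + gamma * y[t-1] + e[t]."""
    x = series[:-1]
    dy = [b - a for a, b in zip(series, series[1:])]
    n = len(dy)
    mx, my = fmean(x), fmean(dy)
    sxx = sum((xi - mx) ** 2 for xi in x)
    gamma = sum((xi - mx) * (di - my) for xi, di in zip(x, dy)) / sxx
    alpha = my - gamma * mx
    ssr = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, dy))
    se = math.sqrt(ssr / (n - 2) / sxx)  # standard error of gamma
    return gamma / se


def difference(series):
    return [b - a for a, b in zip(series, series[1:])]


def choose_d(series, max_d=2):
    """Smallest differencing order whose result rejects a unit root.

    Falls back to max_d if no order up to max_d passes the test.
    """
    for d in range(max_d + 1):
        if dickey_fuller_stat(series) < DF_CRITICAL_5PCT:
            return d
        series = difference(series)
    return max_d
```

A strongly negative statistic rejects the unit root; a statistic near zero or positive (as with trending prices) means the series should be differenced before fitting.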

Project Impact

"Where machine learning meets market reality — an AI-powered advisory system that applies ARIMA and Random Forest models to real-world financial data, because predicting the future is a problem worth engineering."

Core Arsenal

Python 3 · Flask · Scikit-Learn · ARIMA · Random Forest Classification · Pandas · NumPy · Plotly · Matplotlib · Yahoo Finance API · SQLite · PostgreSQL · HTML5 / CSS3

STATUS: DEPLOYED

Intelligence Unit

Technical Log.

A high-fidelity breakdown of the build's architectural achievements and performance markers.

Synthesis

"An AI investment advisory web app using ARIMA for time-series stock and gold forecasting, Random Forest for risk classification, the Yahoo Finance API for real-time data and Plotly for interactive portfolio visualisation. Built as a personal ML exploration."

Hard Evidence

ADF stationarity testing applied before ARIMA fitting

Random Forest trained on multi-feature financial input with lookahead bias prevention

Yahoo Finance API live data integration

Plotly interactive charts with confidence interval rendering

SQLite local persistence with PostgreSQL migration path
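Confidence-interval rendering in Plotly is commonly done with two boundary traces, the second using `fill="tonexty"`. The sketch below builds the figure as a plain JSON-serialisable dict under that assumption; the function name, colours, and layout options are illustrative rather than taken from the project.

```python
import json


def forecast_figure(dates, prices, f_dates, f_mean, f_lower, f_upper):
    """Plotly figure spec: price history, forecast line and a shaded CI band.

    The band is two invisible boundary traces; the lower one uses
    fill="tonexty", which shades the area between it and the trace
    listed just before it (the upper bound).
    """
    edge = {"type": "scatter", "x": f_dates, "mode": "lines",
            "line": {"width": 0}, "hoverinfo": "skip", "showlegend": False}
    return {
        "data": [
            {"type": "scatter", "x": dates, "y": prices, "mode": "lines", "name": "History"},
            {**edge, "y": f_upper},
            {**edge, "y": f_lower, "fill": "tonexty", "fillcolor": "rgba(31, 119, 180, 0.2)"},
            {"type": "scatter", "x": f_dates, "y": f_mean, "mode": "lines",
             "name": "Forecast", "line": {"dash": "dash"}},
        ],
        "layout": {
            "hovermode": "x unified",                     # one tooltip per date
            "xaxis": {"rangeslider": {"visible": True}},  # drag to zoom
        },
    }
```

In a Flask template, the dict could be serialised with `json.dumps` and drawn client-side with `Plotly.newPlot(div, fig.data, fig.layout)`; the same dict is also accepted by `plotly.graph_objects.Figure`.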

Marker 1

Demonstrates ML engineering depth — two distinct model classes (ARIMA for time-series, Random Forest for classification) applied to a real-world financial domain with real market data.

Marker 2

Extends forecasting surface to three asset classes — equities, gold, and real estate — each with tailored preprocessing and model configuration.
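Per-asset-class preprocessing can be expressed as configuration. Everything in this sketch is hypothetical: the tuned hyperparameters are not published, so the ARIMA orders, the choice of log transform, and the 12-period seasonal difference for real estate (which assumes monthly index data) are placeholders illustrating the shape of such a config.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AssetConfig:
    log_transform: bool              # model log-prices so moves are proportional
    seasonal_period: Optional[int]   # lag for seasonal differencing, if any
    arima_order: Tuple[int, int, int]


# Placeholder values for illustration only; not the project's tuned settings.
ASSET_CONFIGS = {
    "equity": AssetConfig(log_transform=True, seasonal_period=None, arima_order=(1, 1, 1)),
    "gold": AssetConfig(log_transform=True, seasonal_period=None, arima_order=(2, 1, 0)),
    "real_estate": AssetConfig(log_transform=False, seasonal_period=12, arima_order=(1, 1, 0)),
}


def preprocess(prices, asset_class):
    """Apply the asset class's transform and optional seasonal difference."""
    cfg = ASSET_CONFIGS[asset_class]
    series = [math.log(p) for p in prices] if cfg.log_transform else list(prices)
    if cfg.seasonal_period:
        m = cfg.seasonal_period
        # y[t] - y[t - m]: removes a repeating pattern of period m
        series = [b - a for a, b in zip(series, series[m:])]
    return series
```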

Marker 3

Risk classification outputs probability distributions, not hard labels — a deliberate design decision that surfaces model uncertainty rather than false precision.
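One way to surface uncertainty rather than a verdict is to always show the full breakdown and only name a tier when its probability clears a threshold. The 0.6 threshold and `describe_risk` are hypothetical. If the probabilities come from scikit-learn's `predict_proba`, its columns follow the order of the fitted classifier's `classes_` attribute, so the `tiers` tuple must match that order.

```python
def describe_risk(probs, tiers=("conservative", "moderate", "aggressive"),
                  min_confidence=0.6):
    """Turn a per-tier probability vector into user-facing text without
    collapsing it to a single hard label."""
    ranked = sorted(zip(tiers, probs), key=lambda tp: tp[1], reverse=True)
    (top_tier, top_p), (second_tier, _) = ranked[0], ranked[1]
    breakdown = ", ".join(f"{t} {p:.0%}" for t, p in zip(tiers, probs))
    if top_p >= min_confidence:
        return f"Likely {top_tier} ({breakdown})"
    # No tier is dominant: name the two leading candidates instead.
    return f"Uncertain: between {top_tier} and {second_tier} ({breakdown})"
```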

Marker 4

Shows range across three project types: FastAPI REST APIs in Python (Curator), Flask ML apps in Python (this project), and a professional TypeScript stack, for a genuinely polyglot engineering profile.

Marker 5

Real-time Yahoo Finance integration means predictions run on live data — not a toy trained on a static CSV.
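The background data caching mentioned under Engineering Challenges could be as simple as a time-to-live cache in front of the data fetch. This is a minimal sketch under assumptions: the fetch function (for example a wrapper around a Yahoo Finance client such as the `yfinance` package) is injected, the five-minute TTL is arbitrary, and `refresh_all` is what a background thread or scheduler might call on a timer.

```python
import threading
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Thread-safe cache: serve stored data until it is older than `ttl` seconds."""

    def __init__(self, fetch: Callable[[str], object], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()
        self._store: Dict[str, Tuple[float, object]] = {}

    def get(self, ticker: str):
        now = self._clock()
        with self._lock:
            hit = self._store.get(ticker)
            if hit is not None and now - hit[0] < self._ttl:
                return hit[1]
        value = self._fetch(ticker)  # slow network call happens outside the lock
        with self._lock:
            self._store[ticker] = (now, value)
        return value

    def refresh_all(self):
        """Re-fetch every cached ticker so requests rarely wait on the network."""
        with self._lock:
            tickers = list(self._store)
        for ticker in tickers:
            value = self._fetch(ticker)
            with self._lock:
                self._store[ticker] = (self._clock(), value)
```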

Query Archive

01. Why ARIMA for stock prediction and not an LSTM or transformer-based model?

ARIMA was a deliberate starting point — it's interpretable, statistically grounded, and forces rigorous stationarity testing before any prediction is made. LSTMs can overfit to noise in financial data without careful regularisation. ARIMA's explicit assumptions make failures understandable, which matters more in a learning context than raw prediction accuracy.

02. How does the risk classification system work in practice?

Each portfolio position is featurised — historical volatility, sector, market cap band, max drawdown over 90 days, and correlation with the broader market index. The Random Forest classifier outputs a probability score across three risk tiers (conservative, moderate, aggressive). The advisory engine uses the aggregate portfolio risk profile to generate rebalancing suggestions.
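A lookahead-safe version of this featurisation computes every feature for day t from a trailing window that ends at t. The sketch below assumes daily closing prices for the asset and the market index, aligned by date. It covers only the three numeric features (sector and market-cap band are categorical and would be encoded separately), and the function names and 252-trading-day annualisation are assumptions rather than the project's code.

```python
import math
from statistics import fmean, pstdev


def simple_returns(prices):
    return [b / a - 1 for a, b in zip(prices, prices[1:])]


def max_drawdown(prices):
    """Largest peak-to-trough fall within `prices`, as a positive fraction."""
    peak, worst = prices[0], 0.0
    for p in prices:
        peak = max(peak, p)
        worst = max(worst, (peak - p) / peak)
    return worst


def pearson(x, y):
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def features_as_of(prices, index_prices, t, window=90):
    """Numeric features for day t, built only from days t-window+1 .. t.

    Because the slice stops at t + 1, nothing after day t can leak into the
    row, which is what keeps the training set free of lookahead bias.
    """
    if t + 1 < window:
        raise ValueError("not enough history before day t")
    p = prices[t + 1 - window : t + 1]
    m = index_prices[t + 1 - window : t + 1]
    r, rm = simple_returns(p), simple_returns(m)
    return {
        "volatility": pstdev(r) * math.sqrt(252),  # annualised, 252 trading days
        "max_drawdown_90d": max_drawdown(p),
        "market_correlation": pearson(r, rm),
    }
```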

03. Why Flask and not FastAPI for the backend given that you've used FastAPI before?

Flask was chosen for its tight integration with the Python data science ecosystem — Pandas, Matplotlib, and Scikit-Learn all feel native in a Flask context. For an ML-first project, Flask's simplicity kept the focus on model engineering. FastAPI would be the right choice if this ever evolved into a production API service.

04. Is the system genuinely useful as an investment tool?

It's a research and learning instrument, not financial advice. ARIMA on 90-day windows can surface directional trends, but financial markets are noisy enough that no deterministic model should be trusted blindly. The value of this project lies in the engineering rigour applied to the problem rather than in prediction accuracy, and that accuracy is documented honestly in /docs.