Hybrid ML for Stock Forecasting (APPLE)
A fixed XGBoost → LSTM pipeline with a chronological split that forecasts Apple's (AAPL) next-day closing price.
Executive Summary
- Design: Train XGBoost to predict the t+1 log return, then feed its predictions into an LSTM that produces the final forecast, preserving chronological order throughout.
- Discipline: Chronological split (past → future), strict target alignment, no leakage.
- Explainability: SHAP on the XGBoost stage (global + local).
- Deliverable: Reproducible pipeline with RMSE / R^2 and Directional Accuracy on a held-out test period.
Problem & Target
- Question: Can a sequential hybrid (XGBoost → LSTM) outperform standalone models for next-day AAPL?
- Target: Next-day log return of close; final price is reconstructed from the predicted return.
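A minimal sketch of the target definition and the price reconstruction, assuming the log return is ln(close[t+1] / close[t]). The function names and the sample prices are made up for illustration and are not taken from the project code.

```python
import numpy as np

def next_day_log_return(close):
    """Target at row t: ln(close[t+1] / close[t]). The last row has no target (NaN)."""
    close = np.asarray(close, dtype=float)
    target = np.full(close.shape, np.nan)
    target[:-1] = np.log(close[1:] / close[:-1])
    return target

def reconstruct_price(close_today, predicted_log_return):
    """Invert the log return: close[t+1] = close[t] * exp(r[t])."""
    return np.asarray(close_today, dtype=float) * np.exp(predicted_log_return)

# Made-up closing prices, only to show the round trip
close = np.array([100.0, 102.0, 101.0, 103.0])
target = next_day_log_return(close)
rebuilt = reconstruct_price(close[:-1], target[:-1])
```

Feeding the true returns back through `reconstruct_price` recovers the original next-day closes, which is a quick check that the target is aligned to t+1 and not to t.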
Data & Features
- ~30 years of AAPL daily OHLCV.
- Engineered features include: RSI, MACD, SMA_10, EMA_10, Momentum, Volatility, Bollinger Bands (lower, range), and Lag_1–Lag_3.
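The sketch below shows how a subset of these features could be computed with pandas. The window lengths (10 and 20 days), the choice of lagging log returns rather than prices, and the column names `BB_lower` and `BB_range` are assumptions, since the README does not state them. RSI and MACD are left out to keep the example short.

```python
import numpy as np
import pandas as pd

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add a subset of the listed indicators using only data up to each row's date."""
    out = df.copy()
    close = out["Close"]
    ret = np.log(close / close.shift(1))        # same-day log return, known at t

    out["SMA_10"] = close.rolling(10).mean()
    out["EMA_10"] = close.ewm(span=10, adjust=False).mean()
    out["Momentum"] = close - close.shift(10)   # assumed 10-day lookback
    out["Volatility"] = ret.rolling(10).std()   # assumed 10-day window

    # Bollinger Bands: assumed 20-day window and 2 standard deviations
    mid = close.rolling(20).mean()
    sd = close.rolling(20).std()
    out["BB_lower"] = mid - 2 * sd
    out["BB_range"] = 4 * sd                    # upper band minus lower band

    for k in (1, 2, 3):
        out[f"Lag_{k}"] = ret.shift(k)          # assumed: lagged log returns

    # Drop the warm-up rows where rolling windows are not yet full
    return out.dropna()

# Synthetic prices, for illustration only
rng = np.random.default_rng(42)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 60)))
demo = build_features(pd.DataFrame({"Close": prices}))
```

Every feature here looks backward only (rolling windows and positive shifts), so none of them can see the t+1 close that defines the target.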
Train / Validation / Test
- Split: Chronological. Rows on/before the cutoff date form train/validation; rows after the cutoff form test.
- Leakage controls: Any scaling/encoding is fit on train only; the target is strictly t+1.
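The split and the train-only fitting rule can be sketched as follows. This is an illustration under assumptions: the cutoff date, the helper names, and the use of plain standardization are hypothetical, and the project may use a different scaler.

```python
import numpy as np
import pandas as pd

def chronological_split(df, cutoff):
    """Rows on/before the cutoff form train/validation; rows after form test."""
    cutoff = pd.Timestamp(cutoff)
    return df[df.index <= cutoff], df[df.index > cutoff]

def fit_standardizer(train):
    """Learn mean and std from the training rows only."""
    std = train.std(ddof=0).replace(0, 1.0)  # guard against constant columns
    return train.mean(), std

def apply_standardizer(df, mean, std):
    """Apply statistics learned on train to any split."""
    return (df - mean) / std

# Toy frame with a date index, for illustration only
idx = pd.date_range("2020-01-01", periods=10, freq="D")
frame = pd.DataFrame({"feat": np.arange(10, dtype=float)}, index=idx)

train, test = chronological_split(frame, "2020-01-07")
mean, std = fit_standardizer(train)
train_scaled = apply_standardizer(train, mean, std)
test_scaled = apply_standardizer(test, mean, std)
```

The test rows are transformed with statistics they did not contribute to, so the scaled test values can fall outside the range seen in training. That is expected and is the point of the leakage control.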
Pipeline (high level)
1) Feature engineering on OHLCV (Open, High, Low, Close, Volume)
2) XGBoost predicts the next-day log return, from which the price is reconstructed
3) SHAP explains feature contributions to the XGBoost predictions
4) LSTM consumes the ordered XGBoost prediction stream to model temporal dependence
5) Reconstruct the next-day price from the predicted log return
6) Evaluate with RMSE/R^2 (returns & prices) and Directional Accuracy
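The evaluation step can be sketched with plain NumPy. Directional Accuracy is assumed here to mean the share of days on which the predicted and actual returns have the same sign; the README does not define it, so treat that as one reasonable reading. The sample values are made up.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def directional_accuracy(r_true, r_pred):
    """Share of days where predicted and actual returns share a sign (assumed definition)."""
    r_true, r_pred = np.asarray(r_true, float), np.asarray(r_pred, float)
    return float(np.mean(np.sign(r_true) == np.sign(r_pred)))

# Made-up returns, for illustration only
r_true = np.array([0.01, -0.02, 0.005, -0.01])
r_pred = np.array([0.02, -0.01, -0.005, -0.02])

scores = {
    "rmse": rmse(r_true, r_pred),
    "r2": r2(r_true, r_pred),
    "directional_accuracy": directional_accuracy(r_true, r_pred),
}
```

The same `rmse` and `r2` functions apply to reconstructed prices; Directional Accuracy is only meaningful on returns (or on price changes), not on price levels.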
How to Run
- Use the Colab notebook (badge above) or run locally with your own environment and config.
- Stages: prepare_data → build_features → train_xgb → shap_report → make_sequences → train_lstm → evaluate.
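As an illustration of what the make_sequences stage might do, the sketch below turns the ordered XGBoost prediction stream into sliding windows for the LSTM. The window length, the single-feature input shape, and the alignment of each window with the target at its last position are assumptions, not the project's confirmed implementation.

```python
import numpy as np

def make_sequences(xgb_preds, targets, window=20):
    """Build LSTM inputs of shape (n_samples, window, 1) from ordered predictions.

    Sample i covers predictions i .. i+window-1 and is paired with the
    target at row i+window-1 (that row's t+1 log return).
    """
    preds = np.asarray(xgb_preds, dtype=float)
    targets = np.asarray(targets, dtype=float)
    if len(preds) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    n = len(preds) - window + 1
    if n <= 0:
        raise ValueError("series is shorter than the window")
    # Windows are taken in order; nothing is shuffled
    X = np.stack([preds[i:i + window] for i in range(n)])[..., np.newaxis]
    y = targets[window - 1:]
    return X, y

# Toy series, for illustration only
X, y = make_sequences(np.arange(6), np.arange(6) * 10, window=3)
```

Sequences should be built separately for the train and test periods so that no window straddles the cutoff date.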
Results
Figures: SHAP analysis and final output.
Reproducibility & Quality
- Deterministic seeds (42); no data leakage (fit transforms on train only).
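A minimal sketch of seeding, assuming the seed 42 is applied to Python's `random` module and to NumPy. The helper name `set_seeds` is hypothetical. The model libraries need their own seeds as well, which are noted in comments because the README does not say which LSTM framework is used.

```python
import random
import numpy as np

SEED = 42

def set_seeds(seed=SEED):
    """Seed the general-purpose random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    # Model libraries are seeded separately, for example:
    #   - XGBoost's scikit-learn wrapper accepts random_state=seed
    #   - the deep-learning framework behind the LSTM has its own seed call
    #     (e.g. tf.random.set_seed or torch.manual_seed)

set_seeds()
first = (random.random(), np.random.rand(3))
set_seeds()
second = (random.random(), np.random.rand(3))
```

Re-seeding reproduces the same draws. Bit-for-bit identical model training can still depend on hardware and library versions, so saved artifacts remain useful for audit.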
- Config-driven runs; artifacts saved for audit; concise, documented functions.
Project files (quick access)
Ethics & Disclaimer
Research purpose only; not financial advice. Verify data licensing and corporate actions.
Contact (Ireland)
Abdullah Al Tawab — Dublin · Open to walk-throughs and technical discussion.