metadata
license: unknown

Prediction of Securities

This project contains the files created while working on the coursework.

Project Structure

data/stocks

  • CSV Files: Various CSV files containing stock data and sentiment scores.

    • nytimes.csv: Sentiment scores from the New York Times.
    • reuters.csv: Sentiment scores from Reuters.
    • final_data/: Final processed stock data for specific companies, combined with the NYT and Reuters sentiment scores. These files were used on Kaggle to optimise and test the models.
      • AAPL.csv: Apple Inc. stock data.
      • JPM.csv: JPMorgan Chase & Co. stock data.
      • PG.csv: Procter & Gamble Co. stock data.
      • TM.csv: Toyota Motor Corporation stock data.
      • XOM.csv: Exxon Mobil Corporation stock data.
  • Python Scripts: Scripts related to data preprocessing and sentiment analysis.

    • preprocessing.py: Script for preprocessing stock data.
    • stock_loader.py: Script for loading stock data.
    • __init__.py: Initialization file for the package.
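The README does not document how the sentiment scores were combined with the stock data in final_data/. A minimal sketch of a date-keyed left join, assuming each source provides one score per calendar day and that days without a score are treated as neutral (0.0). The function name, field names and toy values are hypothetical and are not taken from preprocessing.py.

```python
from datetime import date
from typing import Dict, List, Optional


def merge_sentiment(
    prices: List[dict],
    nyt: Dict[date, float],
    reuters: Dict[date, float],
    fill: Optional[float] = 0.0,
) -> List[dict]:
    """Attach NYT and Reuters sentiment scores to each daily price row.

    Hypothetical helper: every price row is kept (left join on "date");
    days missing from a sentiment source get `fill` (0.0 = neutral, an assumption).
    """
    merged = []
    for row in prices:
        day = row["date"]
        merged.append({
            **row,
            "nyt_sentiment": nyt.get(day, fill),
            "reuters_sentiment": reuters.get(day, fill),
        })
    return merged


# Toy, illustrative values only (not real market data).
prices = [
    {"date": date(2020, 1, 2), "close": 100.0},
    {"date": date(2020, 1, 3), "close": 101.5},
]
nyt = {date(2020, 1, 2): 0.4}
reuters = {date(2020, 1, 2): -0.1, date(2020, 1, 3): 0.2}

for row in merge_sentiment(prices, nyt, reuters):
    print(row)
```

A left join keeps every trading day even when a news source has no article for it; whether the original pipeline filled gaps with zeros, forward-filled, or dropped such days is not stated.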

notebooks

  • Local: Local Jupyter notebooks used in the early stages of optimisation and testing.

    • nyt_titles_loader.ipynb: One of the web-scraping notebooks. The others are not included because there were too many and they were spread across Colab and Kaggle.
    • Other notebooks show early attempts to tune RNNs using PyTorch with Optuna.
  • Kaggle: Notebooks from Kaggle for the later stages of optimisation, using the GPU and the pruning callbacks for Keras and XGBoost.

    • regression_plots_and_metrics.ipynb: Final regression values and plots used in the report.
    • classification_plots_and_metrics.ipynb: Final classification values and plots used in the report.
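The README does not say which metrics the report uses. As an illustration only, common choices for regression (RMSE, MAE) and for classification (accuracy) can be computed as below. Treating the classification target as the up/down direction of the price is an assumption, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def direction_labels(prices: Sequence[float]) -> List[int]:
    """Hypothetical classification target: 1 if the price rose vs the previous step, else 0."""
    return [int(curr > prev) for prev, curr in zip(prices, prices[1:])]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
labels = direction_labels([1.0, 2.0, 1.0, 3.0])  # [1, 0, 1]
print(accuracy(labels, [1, 1, 1]))
```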

rnn_model

  • Using Keras: Contains RNN models implemented using Keras.

    • models.py: Model getter functions.
    • optimise.py: Optimisation functions for the Keras models. Only the functions are included; the optimisation itself was run on Kaggle using its Tesla P100 GPU.
    • __init__.py: Initialization file for the package.
  • Using Torch: Contains RNN models implemented using PyTorch.

    • classification.py: Classification RNN models using PyTorch.
    • early_stopping.py: Early stopping utility for RNN models in PyTorch.
    • loaders.py: Data loaders for RNN models in PyTorch.
    • optimise.py: Optimisation routines for RNN models in PyTorch.
    • regression.py: Regression RNN models using PyTorch.
    • train_eval.py: Training and evaluation scripts for RNN models in PyTorch.
    • __init__.py: Initialization file for the package.
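The contents of early_stopping.py are not shown here. A minimal, framework-agnostic sketch of patience-based early stopping, assuming the utility monitors validation loss once per epoch; the class and parameter names (`EarlyStopping`, `patience`, `min_delta`, `step`) are hypothetical.

```python
import math
from typing import Optional


class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs.

    Hypothetical helper; the real early_stopping.py may differ.
    """

    def __init__(self, patience: int = 5, min_delta: float = 0.0) -> None:
        self.patience = patience        # epochs to wait without improvement
        self.min_delta = min_delta      # minimum decrease that counts as improvement
        self.best_loss = math.inf
        self.best_epoch: Optional[int] = None
        self.counter = 0                # epochs since the last improvement

    def step(self, val_loss: float, epoch: int) -> bool:
        """Record this epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience


stopper = EarlyStopping(patience=2)
for epoch, loss in enumerate([0.9, 0.7, 0.72, 0.71, 0.5]):
    if stopper.step(loss, epoch):
        print(f"stopping at epoch {epoch}, best epoch {stopper.best_epoch}")
        break
```

In a PyTorch training loop, one would typically also save `model.state_dict()` whenever a new best loss is recorded, so the best weights can be restored after stopping.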

utils

  • Utility Scripts: Various utility scripts to support the main functionality.
    • sequences.py: Utility functions for getting sequences.
    • stock_loader_utils.py: Utility functions for loading stock data.
    • torch_train_util.py: Utility functions for training PyTorch models.
    • utils.py: General utility functions.
    • __init__.py: Initialization file for the package.
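The README only says that sequences.py gets sequences. A common way to turn a time series into RNN training samples is a sliding window; the sketch below assumes each sample is `window` consecutive values and the target is the value `horizon` steps after the window. The function name and parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_sequences(
    series: Sequence[float], window: int, horizon: int = 1
) -> Tuple[List[List[float]], List[float]]:
    """Split a series into (input window, target) pairs for an RNN.

    Hypothetical helper: each input is `window` consecutive values and the
    target is the value `horizon` steps after the end of that window.
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be >= 1")
    xs, ys = [], []
    for start in range(len(series) - window - horizon + 1):
        end = start + window
        xs.append(list(series[start:end]))
        ys.append(series[end + horizon - 1])
    return xs, ys


xs, ys = make_sequences([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
print(xs)  # [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
print(ys)  # [4.0, 5.0]
```

With several features per day (for example a price plus the two sentiment scores), the same loop works if each element of `series` is a feature list and the target is taken from the relevant column.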