---
license: unknown
---

# Prediction of Securities

This project contains the files produced while preparing the coursework.

## Project Structure

### data/stocks

- **CSV Files**: CSV files containing stock data and sentiment scores.
  - `nytimes.csv`: Sentiment scores from NYTimes.
  - `reuters.csv`: Sentiment scores from Reuters.
- **final_data/**: Final processed stock data for specific companies, combined with sentiment scores from NYT and Reuters. These files were used on Kaggle to optimise and test models.
  - `AAPL.csv`: Apple Inc. stock data.
  - `JPM.csv`: JPMorgan Chase & Co. stock data.
  - `PG.csv`: Procter & Gamble Co. stock data.
  - `TM.csv`: Toyota Motor Corporation stock data.
  - `XOM.csv`: Exxon Mobil Corporation stock data.
- **Python Scripts**: Scripts related to data preprocessing and sentiment analysis.
  - `preprocessing.py`: Script for preprocessing stock data.
  - `stock_loader.py`: Script for loading stock data.
  - `__init__.py`: Initialization file for the package.

### notebooks

- **Local**: Local Jupyter notebooks used in the early stages of optimisation and testing.
  - `nyt_titles_loader.ipynb`: One of the web-scraping notebooks. There were too many to include, and they were spread across Colab and Kaggle.
  - The other files showcase early attempts to use Torch with Optuna to tune RNNs.
- **Kaggle**: Notebooks from Kaggle, covering the later stages of optimisation using a GPU and pruning callbacks for Keras and XGBoost.
  - `regression_plots_and_metrics.ipynb`: Final values and plots used in the report.
  - `classification_plots_and_metrics.ipynb`: Final values and plots used in the report.

### rnn_model

- **Using Keras**: RNN models implemented using Keras.
  - `models.py`: Model getters.
  - `optimise.py`: Optimisation functions for Keras. This file contains only the functions; the optimisation itself was run on Kaggle using their Tesla P100 GPU.
  - `__init__.py`: Initialization file for the package.
- **Using Torch**: RNN models implemented using PyTorch.
  - `classification.py`: Classification RNN models using PyTorch.
  - `early_stopping.py`: Early stopping utility for RNN models in PyTorch.
  - `loaders.py`: Data loaders for RNN models in PyTorch.
  - `optimise.py`: Optimisation routines for RNN models in PyTorch.
  - `regression.py`: Regression RNN models using PyTorch.
  - `train_eval.py`: Training and evaluation scripts for RNN models in PyTorch.
  - `__init__.py`: Initialization file for the package.

### utils

- **Utility Scripts**: Utility scripts that support the main functionality.
  - `sequences.py`: Utility functions for getting sequences.
  - `stock_loader_utils.py`: Utility functions for loading stock data.
  - `torch_train_util.py`: Utility functions for training PyTorch models.
  - `utils.py`: General utility functions.
  - `__init__.py`: Initialization file for the package.
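
#### Example: building input sequences

This README describes `sequences.py` only as "utility functions for getting sequences", so the sketch below is an illustration of the general idea rather than the project's actual code. It assumes a sliding-window scheme in which each run of `window` consecutive values is paired with the value that follows it, which is a common way to prepare a price series for an RNN. The name `make_sequences` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_sequences(
    values: Sequence[float], window: int
) -> Tuple[List[List[float]], List[float]]:
    """Split a series into (input window, next value) pairs.

    Hypothetical helper for illustration; not the project's actual API.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    inputs: List[List[float]] = []
    targets: List[float] = []
    # Each run of `window` consecutive values is paired with the value after it.
    for start in range(len(values) - window):
        inputs.append(list(values[start:start + window]))
        targets.append(values[start + window])
    return inputs, targets


# Toy closing prices, made up for the example.
closes = [10.0, 11.0, 12.0, 13.0, 14.0]
X, y = make_sequences(closes, window=3)
print(X)  # [[10.0, 11.0, 12.0], [11.0, 12.0, 13.0]]
print(y)  # [13.0, 14.0]
```

A series shorter than or equal to `window` yields no pairs, so callers may want to check for empty output before training.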