---
license: unknown
---

# Prediction of Securities
This project contains the files generated while developing the coursework.

## Project Structure

### data/stocks
- **CSV Files**: Various CSV files containing stock data and sentiment scores.
  - `nytimes.csv`: sentiment scores from NYTimes.
  - `reuters.csv`: sentiment scores from Reuters.
  - **final_data/**: Contains the final processed stock data for specific companies, combined with sentiment scores from NYT and Reuters. These files were used on Kaggle to optimise and test models.
    - `AAPL.csv`: Apple Inc. stock data.
    - `JPM.csv`: JPMorgan Chase & Co. stock data.
    - `PG.csv`: Procter & Gamble Co. stock data.
    - `TM.csv`: Toyota Motor Corporation stock data.
    - `XOM.csv`: Exxon Mobil Corporation stock data.

- **Python Scripts**: Scripts related to data preprocessing and sentiment analysis.
  - `preprocessing.py`: Script for preprocessing stock data.
  - `stock_loader.py`: Script for loading stock data.
  - `__init__.py`: Initialization file for the package.

### notebooks
- **Local**: Contains local Jupyter notebooks used in the early stages of optimisation and testing.
  - `nyt_titles_loader.ipynb`: one of the web-scraping notebooks; the others were too numerous to include and were spread across Colab and Kaggle.
  - Other notebooks show early attempts to tune RNNs using PyTorch with Optuna.
- **Kaggle**: Contains notebooks from Kaggle used for later-stage optimisation on GPU, using pruning callbacks for Keras and XGBoost.
  - `regression_plots_and_metrics.ipynb`: final values and plots used in the report.
  - `classification_plots_and_metrics.ipynb`: final values and plots used in the report.

### rnn_model
- **Using Keras**: Contains RNN models implemented using Keras.
  - `models.py`: Model getter functions.
  - `optimise.py`: Optimisation functions for the Keras models (functions only; the optimisation itself was run on Kaggle using its Tesla P100 GPU).
  - `__init__.py`: Initialization file for the package.

- **Using Torch**: Contains RNN models implemented using PyTorch.
  - `classification.py`: Classification RNN models using PyTorch.
  - `early_stopping.py`: Early stopping utility for RNN models in PyTorch.
  - `loaders.py`: Data loaders for RNN models in PyTorch.
  - `optimise.py`: Optimization routines for RNN models in PyTorch.
  - `regression.py`: Regression RNN models using PyTorch.
  - `train_eval.py`: Training and evaluation scripts for RNN models in PyTorch.
  - `__init__.py`: Initialization file for the package.
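
`early_stopping.py` is described only as an early-stopping utility, and its actual interface is not shown in this README. As a minimal sketch, assuming the common patience-based approach (stop once the validation loss has not improved by at least `min_delta` for `patience` consecutive epochs), such a helper could look like the following. The class name `EarlyStopping` and its parameters are hypothetical, not taken from the project.

```python
class EarlyStopping:
    """Hypothetical patience-based early stopping helper (not the project's actual code)."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0) -> None:
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # minimum decrease in loss that counts as an improvement
        self.best_loss = float("inf")
        self.counter = 0              # epochs since the last improvement
        self.should_stop = False

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True once training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: remember the new best loss and reset the counter.
            self.best_loss = val_loss
            self.counter = 0
        else:
            # No improvement: count it and stop once patience is exhausted.
            self.counter += 1
            if self.counter >= self.patience:
                self.should_stop = True
        return self.should_stop


if __name__ == "__main__":
    stopper = EarlyStopping(patience=2)
    for epoch, loss in enumerate([1.0, 0.8, 0.85, 0.9, 0.7]):
        if stopper.step(loss):
            print(f"stopping after epoch {epoch}")  # prints "stopping after epoch 3"
            break
```

In a PyTorch training loop, a helper like this would typically also keep a copy of the best weights (for example via `model.state_dict()`) so they can be restored after stopping.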

### utils
- **Utility Scripts**: Various utility scripts to support the main functionality.
  - `sequences.py`: Utility functions for getting sequences.
  - `stock_loader_utils.py`: Utility functions for loading stock data.
  - `torch_train_util.py`: Utility functions for training PyTorch models.
  - `utils.py`: General utility functions.
  - `__init__.py`: Initialization file for the package.
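
`sequences.py` is described only as providing functions for getting sequences. A minimal sketch, assuming it converts a time series of features (for example daily stock data plus NYT/Reuters sentiment scores) into overlapping fixed-length windows for RNN input, each paired with the target at the following step, might look like this. The function name `make_sequences` and the `window` parameter are hypothetical.

```python
import numpy as np


def make_sequences(features: np.ndarray, targets: np.ndarray, window: int):
    """Hypothetical helper: slice a time series into overlapping windows.

    features: array of shape (n_steps, n_features)
    targets:  array of shape (n_steps,)
    Returns X of shape (n_steps - window, window, n_features) and
    y of shape (n_steps - window,), where y[i] is the target at the
    step immediately after window X[i].
    """
    if window <= 0 or window >= len(features):
        raise ValueError("window must be between 1 and len(features) - 1")
    # Each window covers rows i .. i + window - 1.
    X = np.stack([features[i : i + window] for i in range(len(features) - window)])
    # The target for window i is the value at row i + window.
    y = targets[window:]
    return X, y


if __name__ == "__main__":
    feats = np.arange(10, dtype=float).reshape(5, 2)  # 5 time steps, 2 features
    closes = np.array([10.0, 11.0, 12.0, 13.0, 14.0])
    X, y = make_sequences(feats, closes, window=3)
    print(X.shape, y)  # (2, 3, 2) [13. 14.]
```

Keeping the windows in chronological order (no shuffling before the split) avoids leaking future information into training when the sequences are later divided into train and test sets.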