mtyrrell committed on
Commit
fe497ae
1 Parent(s): 2ff7aaa

Update README.md

Files changed (1)
  1. README.md +60 -3
README.md CHANGED
@@ -37,18 +37,75 @@ It achieves the following results on the evaluation set:
37
 
38
  ## Model description
39
 
40
- More information needed
41
 
42
  ## Intended uses & limitations
43
 
44
- More information needed
45
 
46
  ## Training and evaluation data
47
 
48
- More information needed
49
 
50
  ## Training procedure
51
 
 
 
52
  ### Training hyperparameters
53
 
54
  The following hyperparameters were used during training:
 
37
 
38
  ## Model description
39
 
40
+ The model is a multi-label text classifier based on [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) and fine-tuned on text sourced from national climate policy documents.
41
 
42
  ## Intended uses & limitations
43
 
44
+ The classifier assigns the following classes, which denote the Mitigation categories portrayed in passages extracted from the documents. The Mitigation categories are based on a taxonomy defined by TraCS Climate Strategies for Transport (implemented by GIZ and funded by the International Climate Initiative (IKI) of the German Federal Ministry for Economic Affairs and Climate Action (BMWK)):
45
+
46
+ {0: 'Active mobility',
47
+ 1: 'Alternative fuels',
48
+ 2: 'Aviation improvements',
49
+ 3: 'Comprehensive transport planning',
50
+ 4: 'Digital solutions',
51
+ 5: 'Economic instruments',
52
+ 6: 'Education and behavioral change',
53
+ 7: 'Electric mobility',
54
+ 8: 'Freight efficiency improvements',
55
+ 9: 'Improve infrastructure',
56
+ 10: 'Labels',
57
+ 11: 'Land use',
58
+ 12: 'Public transport improvement',
59
+ 13: 'Shipping improvements',
60
+ 14: 'Transport demand management',
61
+ 15: 'Vehicle improvements'}
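As a rough illustration of how these class indices might be used, the sketch below decodes one set of raw model outputs into category names. It assumes the model emits one logit per class, that each class is scored independently with a sigmoid (the usual multi-label setup), and that a 0.5 threshold is applied. The threshold, the function names, and the example logits are hypothetical and do not come from the model card.

```python
import math

# Class index to label mapping, copied from the list above.
ID2LABEL = {
    0: 'Active mobility', 1: 'Alternative fuels', 2: 'Aviation improvements',
    3: 'Comprehensive transport planning', 4: 'Digital solutions',
    5: 'Economic instruments', 6: 'Education and behavioral change',
    7: 'Electric mobility', 8: 'Freight efficiency improvements',
    9: 'Improve infrastructure', 10: 'Labels', 11: 'Land use',
    12: 'Public transport improvement', 13: 'Shipping improvements',
    14: 'Transport demand management', 15: 'Vehicle improvements',
}


def sigmoid(x: float) -> float:
    """Map a raw logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_labels(logits, threshold=0.5):
    """Return the names of all classes whose sigmoid score reaches the threshold.

    The 0.5 default is an assumption for illustration, not a documented setting.
    """
    if len(logits) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} logits, got {len(logits)}")
    return [ID2LABEL[i] for i, z in enumerate(logits) if sigmoid(z) >= threshold]


# Hypothetical logits for a single passage: two classes score above the threshold.
example_logits = [-3.0] * 16
example_logits[7] = 2.1
example_logits[12] = 0.4
predicted = decode_labels(example_logits)
print(predicted)
```

Because the classifier is multi-label, a passage can receive several categories at once, or none at all if no score reaches the threshold.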
62
+
63
+ The intended use is for climate policy researchers and analysts seeking to automate the process of reviewing lengthy, non-standardized PDF documents to produce summaries and reports.
64
+
65
+ Due to inconsistencies in the training data, the classifier performance leaves room for improvement. The classifier exhibits reasonable multi-class training metrics (F1 ~ 0.5): precision is low (~ 0.4), meaning many predicted labels are false positives, but recall is high (~ 0.75), meaning the classifier casts a wide net and captures most true positives. When tested on real-world unseen data, performance was similar to training validation (F1 ~ 0.5). However, testing was based on a small out-of-sample dataset containing its own inconsistencies. Therefore, classification may prove better or worse in practice.
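The reported figures are consistent with each other, since F1 is the harmonic mean of precision and recall. A minimal check, using the approximate values quoted above (the function name is illustrative):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Approximate values quoted in the model card.
f1 = f1_score(precision=0.4, recall=0.75)
print(round(f1, 2))
```

With precision near 0.4 and recall near 0.75, the harmonic mean lands close to the F1 of roughly 0.5 stated in the card.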
66
 
67
  ## Training and evaluation data
68
 
69
+ The training dataset comprises labelled passages from two sources:
70
+ - [ClimateWatch NDC Sector data](https://www.climatewatchdata.org/data-explorer/historical-emissions?historical-emissions-data-sources=climate-watch&historical-emissions-gases=all-ghg&historical-emissions-regions=All%20Selected&historical-emissions-sectors=total-including-lucf%2Ctotal-including-lucf&page=1). Here we utilized the QA dataset (CW_NDC_data_Sector).
71
+ - [IKI TraCS Climate Strategies for Transport Tracker](https://changing-transport.org/wp-content/uploads/20220722_Tracker_Database.xlsx) implemented by GIZ and funded by the International Climate Initiative (IKI) of the German Federal Ministry for Economic Affairs and Climate Action (BMWK).
72
+
73
+ The combined dataset [GIZ/policy_qa_v0_1](https://huggingface.co/datasets/GIZ/policy_qa_v0_1) contains ~85k rows. Each row appears three times, once per sequence length (denoted by the values 'small', 'medium', and 'large' in the 'strategy' column, which correspond to sequence lengths of 60, 85, and 150 respectively). This effectively reduces the useful size of the dataset to 1/3, and the 'strategy' value should be selected based on the use case. For this training, we utilized the 'medium' samples from the IKITracs data only. Furthermore, for each row, the 'context' column contains 3 samples of varying quality. The approach used to assess quality and select samples is described below.
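To make the effect of the 'strategy' column concrete, here is a small pure-Python sketch that filters toy rows down to a single strategy and source. Only the 'strategy' column name and its three values come from the text above. The 'id' and 'source' fields, the 'CW' source value, and the helper name are hypothetical stand-ins for whatever the real dataset uses.

```python
from collections import Counter

# Sequence length associated with each 'strategy' value, as described above.
SEQUENCE_LENGTHS = {'small': 60, 'medium': 85, 'large': 150}

# Toy rows: every underlying record appears once per strategy.
# 'id' and 'source' are hypothetical field names used for illustration only.
rows = [
    {'id': record_id, 'source': source, 'strategy': strategy}
    for record_id, source in enumerate(['IKITracs', 'IKITracs', 'CW'])
    for strategy in SEQUENCE_LENGTHS
]


def select_rows(rows, strategy='medium', source='IKITracs'):
    """Keep only the rows for one strategy and one data source."""
    return [r for r in rows if r['strategy'] == strategy and r['source'] == source]


selected = select_rows(rows)
counts = Counter(r['strategy'] for r in rows)

print(len(rows), dict(counts))   # three copies of each record, one per strategy
print([r['id'] for r in selected])
```

Picking one strategy keeps exactly one copy of each record, which is why the nominal row count overstates the amount of distinct data.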
74
+
75
+ The pre-processing operations used to produce the final training dataset were as follows:
76
+
77
+ 1. Dataset is filtered based on 'medium' value in 'strategy' column (sequence length = 85), selecting only IKITracs samples.
78
+ 2. For IKITracs, labels are assigned based on the presence of 'parameter' values matching the mapping taxonomy defined by TraCS (see the parameter to category mapping table below).
79
+ 4. If 'context_translated' is available and the 'language' is not English, 'context' is replaced with 'context_translated'.
80
+ 5. The dataset is "exploded" - i.e., the text samples in the 'context' column, which are lists, are converted into separate rows - and labels are merged to align with the associated samples.
81
+ 6. The 'match_onanswer' and 'answerWordcount' columns are used conditionally to select high-quality samples (a high percentage of word matches in 'match_onanswer' is preferred, but a lower percentage is accepted if 'answerWordcount' is high).
82
+ 7. Data is then augmented using sentence shuffle from the ```albumentations``` library
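The explode, quality-selection, and shuffle steps above can be sketched in plain Python as follows. This is not the authors' pipeline: it assumes 'match_onanswer' holds one match fraction per passage in 'context' and that 'answerWordcount' is a single value per row. All thresholds are hypothetical, and the sentence shuffle is a hand-written stand-in rather than the `albumentations`-based transform mentioned in the list.

```python
import random
import re


def explode(row):
    """Turn one row with a list-valued 'context' into one row per passage."""
    return [
        {
            'context': text,
            'labels': row['labels'],  # labels are carried over to every passage
            'match_onanswer': match,
            'answerWordcount': row['answerWordcount'],
        }
        for text, match in zip(row['context'], row['match_onanswer'])
    ]


def is_high_quality(sample, min_match=0.8, relaxed_match=0.5, min_words=20):
    """Prefer a high match fraction; accept a lower one for long answers.

    All three thresholds are hypothetical values chosen for illustration.
    """
    if sample['match_onanswer'] >= min_match:
        return True
    return (sample['match_onanswer'] >= relaxed_match
            and sample['answerWordcount'] >= min_words)


def shuffle_sentences(text, rng):
    """Augment a passage by shuffling the order of its sentences."""
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    rng.shuffle(sentences)
    return ' '.join(sentences)


# Hypothetical row with three candidate passages of varying quality.
row = {
    'context': [
        'Promote electric buses. Expand charging stations. Phase out diesel.',
        'Improve urban transport. Support new vehicles.',
        'General development goals.',
    ],
    'match_onanswer': [0.9, 0.6, 0.2],
    'answerWordcount': 25,
    'labels': [7],
}

samples = explode(row)
kept = [s for s in samples if is_high_quality(s)]
rng = random.Random(0)  # seeded so the augmentation is repeatable
augmented = [dict(s, context=shuffle_sentences(s['context'], rng)) for s in kept]
print(len(samples), len(kept), len(augmented))
```

Shuffling changes only sentence order, so each augmented passage keeps the labels of the passage it was derived from.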
83
+
84
+ ### **Parameter to category mapping taxonomy**
85
+
86
+ |index|Category|Parameter|
87
+ |---|---|---|
88
+ |0|Active mobility|S\_Activemobility,S\_Cycling,S\_Walking|
89
+ |1|Alternative fuels|I\_Altfuels,I\_Biofuel,I\_Ethanol,I\_Hydrogen,I\_LPGCNGLNG,I\_RE|
90
+ |2|Aviation improvements|I\_Aircraftfleet,I\_Airtraffic,I\_Aviation,I\_Capacityairport,I\_CO2certificate,I\_Jetfuel|
91
+ |3|Comprehensive transport planning|A\_Complan,A\_LATM,A\_Natmobplan,A\_SUMP|
92
+ |4|Digital solutions|I\_Autonomous,I\_DataModelling,I\_ITS,I\_Other,S\_Maas,S\_Ondemand,S\_Sharedmob|
93
+ |5|Economic instruments|A\_Economic,A\_Emistrad,A\_Finance,A\_Fossilfuelsubs,A\_Fueltax,A\_Procurement,A\_Roadcharging,A\_Vehicletax|
94
+ |6|Education and behavioral change|I\_Campaigns,I\_Capacity,I\_Ecodriving,I\_Education|
95
+ |7|Electric mobility|I\_Emobility,I\_Emobilitycharging,I\_Emobilitypurchase,I\_ICEdiesel,I\_Smartcharging,S\_Micromobility|
96
+ |8|Freight efficiency improvements|I\_Freighteff,I\_Load,S\_Railfreight|
97
+ |9|Improve infrastructure|S\_Infraexpansion,S\_Infraimprove,S\_Intermodality|
98
+ |10|Labels|I\_Efficiencylabel,I\_Freightlabel,I\_Fuellabel,I\_Transportlabel,I\_Vehiclelabel|
99
+ |11|Land use|A\_Density,A\_Landuse,A\_Mixuse|
100
+ |12|Public transport improvement|S\_BRT,S\_PTIntegration,S\_PTPriority,S\_PublicTransport|
101
+ |13|Shipping improvements|I\_Onshorepower,I\_PortInfra,I\_Shipefficiency,I\_Shipping|
102
+ |14|Transport demand management|A\_Caraccess,A\_Commute,A\_Parkingprice,A\_TDM,A\_Teleworking,A\_Work,S\_Parking|
103
+ |15|Vehicle improvements|A\_LEZ,I\_Efficiencystd,I\_Fuelqualimprove,I\_Inspection,I\_Lowemissionincentive,I\_Vehicleeff,I\_Vehicleimprove,I\_VehicleRestrictions,I\_Vehiclescrappage|
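One way to apply this table is to invert it into a parameter-to-index lookup and build a multi-hot label vector per passage. The sketch below copies only three of the sixteen table rows for brevity. The function name and the choice to silently skip parameters that are not in the table are assumptions, not documented behaviour.

```python
NUM_CLASSES = 16

# Excerpt of the table above (rows 0, 7, and 12); the full mapping has 16 rows.
CATEGORY_PARAMETERS = {
    0: ['S_Activemobility', 'S_Cycling', 'S_Walking'],
    7: ['I_Emobility', 'I_Emobilitycharging', 'I_Emobilitypurchase',
        'I_ICEdiesel', 'I_Smartcharging', 'S_Micromobility'],
    12: ['S_BRT', 'S_PTIntegration', 'S_PTPriority', 'S_PublicTransport'],
}

# Invert the table: each parameter points to exactly one category index.
PARAMETER_TO_CATEGORY = {
    parameter: index
    for index, parameters in CATEGORY_PARAMETERS.items()
    for parameter in parameters
}


def multi_hot(parameters):
    """Build a 16-element 0/1 vector marking every category that is present.

    Parameters missing from the lookup are ignored (an illustrative choice).
    """
    vector = [0] * NUM_CLASSES
    for parameter in parameters:
        index = PARAMETER_TO_CATEGORY.get(parameter)
        if index is not None:
            vector[index] = 1
    return vector


labels = multi_hot(['S_Cycling', 'I_Emobility', 'S_BRT', 'X_Unknown'])
print(labels)
```

Several parameters from the same category collapse into a single 1 in the vector, which matches the multi-label framing of the classifier.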
104
 
105
  ## Training procedure
106
 
107
+ The model hyperparameters were tuned using ```optuna``` over 10 trials on a truncated training and validation dataset. The model was then trained over 5 epochs using the best hyperparameters identified.
108
+
109
  ### Training hyperparameters
110
 
111
  The following hyperparameters were used during training: