How to get the total reward score
Thanks for this great work. I tested the provided example and got the output below. Is the total score the average of these values? (I noticed that the scores for the individual attributes vary widely, e.g. -17.8157 vs. 42.4842.) What is the range of each attribute's score, and how are the attributes weighted to produce the total score?
```
reward_quantiles: tensor([[ -5.9229, -17.8157,  -7.9583,  -1.7157,   6.2691,  11.7173,   3.8997,
          -1.2744,   0.7031,   6.7963,  28.9960,  41.4233,  30.3169,  32.1501,
          38.3505,  42.4842,  38.2890,  34.5722,  43.0714]])
```
Sorry for the late reply. The reward quantiles you posted are not per-attribute scores. They are the quantile estimates of a single aggregated reward distribution, listed in order of increasing quantile level: the first value is the estimate of the 0.05 quantile and the last value is the estimate of the 0.95 quantile.
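To make the mapping concrete, here is a minimal sketch that pairs each posted estimate with its quantile level. It assumes the 19 levels are evenly spaced at 0.05, 0.10, ..., 0.95, which is inferred from the 19 values running from the 0.05 to the 0.95 quantile; the names `levels` and `reward_quantiles` are only illustrative.

```python
# Quantile estimates copied from the reward_quantiles tensor above.
reward_quantiles = [
    -5.9229, -17.8157, -7.9583, -1.7157, 6.2691, 11.7173, 3.8997,
    -1.2744, 0.7031, 6.7963, 28.9960, 41.4233, 30.3169, 32.1501,
    38.3505, 42.4842, 38.2890, 34.5722, 43.0714,
]

# Assumed levels: 19 evenly spaced quantile levels 0.05, 0.10, ..., 0.95.
# Built from integers and rounded to avoid floating-point drift.
levels = [round(0.05 * k, 2) for k in range(1, len(reward_quantiles) + 1)]

# Print each quantile level next to its estimate.
for level, value in zip(levels, reward_quantiles):
    print(f"q={level:.2f}: {value:9.4f}")
```

The ordering is by quantile level, not by value: in this output the 0.10 estimate (-17.8157) is lower than the 0.05 estimate (-5.9229).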
The total reward is returned under the `score` key and is computed as the mean of the quantile estimates.
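A minimal sketch of that computation, using the values posted above. It assumes the `score` key holds exactly this unweighted mean, as the reply describes; the variable names are illustrative.

```python
import numpy as np

# Quantile estimates copied from the reward_quantiles tensor above.
reward_quantiles = np.array([
    -5.9229, -17.8157, -7.9583, -1.7157, 6.2691, 11.7173, 3.8997,
    -1.2744, 0.7031, 6.7963, 28.9960, 41.4233, 30.3169, 32.1501,
    38.3505, 42.4842, 38.2890, 34.5722, 43.0714,
])

# Total reward: the plain arithmetic mean of the 19 quantile estimates.
score = float(reward_quantiles.mean())
print(f"score: {score:.4f}")  # prints score: 17.0712
```

Every estimate gets the same weight (1/19) here; the estimates are not multiplied by their quantile levels.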
```python
import numpy as np

# Quantile estimates from reward_quantiles (levels 0.05 to 0.95)
score_lst = [
    -5.9229, -17.8157, -7.9583, -1.7157, 6.2691, 11.7173, 3.8997,
    -1.2744, 0.7031, 6.7963, 28.9960, 41.4233, 30.3169, 32.1501,
    38.3505, 42.4842, 38.2890, 34.5722, 43.0714,
]

# Weight each estimate by its quantile level: 0.05, 0.10, ..., 0.95
start = 0.05
step = 0.05
count = len(score_lst)  # one weight per estimate (19 levels)
weights_lst = [start + step * i for i in range(count)]

weight_score = []
for score, weight in zip(score_lst, weights_lst):
    weight_score.append(score * weight)
print(f"mean score: {np.mean(weight_score)}")
```

```
mean score: 13.49363552631579
```
Am I right?