How to get the total reward point

by MonteXiaofeng - opened Nov 7, 2024

Thanks for this great work. I tested the model with the provided example and got the output below. Is the average used as the total score? I noticed that the scores of individual attributes vary greatly (for example, -17.8157 versus 42.4842). What is the range of scores for each attribute, and how are the attributes weighted to get the total score?

reward_quantiles: tensor([[ -5.9229, -17.8157, -7.9583, -1.7157, 6.2691, 11.7173, 3.8997,
-1.2744, 0.7031, 6.7963, 28.9960, 41.4233, 30.3169, 32.1501,
38.3505, 42.4842, 38.2890, 34.5722, 43.0714]])

nicolinho

Owner 17 days ago

Sorry for the late reply. The reward quantiles you posted are the quantile estimates of the single aggregated reward distribution, listed in order of increasing quantile level: the first value is the estimate for the 0.05 quantile and the last value is the estimate for the 0.95 quantile.

The total reward is given in the score key and is computed as the average over the quantile estimates.
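
As a rough illustration of the description above, here is a minimal sketch that takes the plain (unweighted) mean of the 19 posted quantile estimates. The evenly spaced levels from 0.05 to 0.95 are an assumption based on the first and last levels mentioned, and the variable names are only illustrative; whether the result matches the model's `score` value exactly has not been checked here.

```python
import numpy as np

# Quantile estimates copied from the output posted in this thread.
reward_quantiles = np.array([
    -5.9229, -17.8157, -7.9583, -1.7157, 6.2691, 11.7173, 3.8997,
    -1.2744, 0.7031, 6.7963, 28.9960, 41.4233, 30.3169, 32.1501,
    38.3505, 42.4842, 38.2890, 34.5722, 43.0714,
])

# Assumed quantile levels: 0.05, 0.10, ..., 0.95 (one per estimate).
levels = np.linspace(0.05, 0.95, num=reward_quantiles.size)

# Total reward as described: the unweighted average of the estimates.
total_reward = reward_quantiles.mean()

for level, estimate in zip(levels, reward_quantiles):
    print(f"quantile {level:.2f}: {estimate:9.4f}")
print(f"total reward (unweighted mean): {total_reward:.4f}")
```

For these values the unweighted mean comes out at roughly 17.07. The quantile levels only label the estimates; they are not used as weights.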

MonteXiaofeng

17 days ago

import numpy as np

score_lst = [
-5.9229,
-17.8157,
-7.9583,
-1.7157,
6.2691,
11.7173,
3.8997,
-1.2744,
0.7031,
6.7963,
28.9960,
41.4233,
30.3169,
32.1501,
38.3505,
42.4842,
38.2890,
34.5722,
43.0714,
]

start = 0.05
step = 0.05
count = 20
weights_lst = [start + step * i for i in range(count)]

weight_score = []
for score, weight in zip(score_lst, weights_lst):
    weight_score.append(score * weight)
print(f"mean score: {np.mean(weight_score)}")

mean score: 13.49363552631579

Am I right?
