Update evaluation/demo_humaneval.md
evaluation/demo_humaneval.md +1 -15
evaluation/demo_humaneval.md
CHANGED
@@ -52,18 +52,4 @@ Results: {'pass@1': 0.1, 'pass@10': 0.7631, 'pass@20': 1.0}
 ````
 
 If we take a closer look at the unit test results for each candidate solution, we find that 2 passed the unit test. This means that we have 2 correct solutions among 20, which corresponds to our pass@1 value `2/20 = 0.1`. The scores pass@10 and pass@20 are higher, because the more samples we select from the candidate completions, the more likely we are to include the correct implementation. As
-for pass@20, it is `1`, since if we select all 20 candidates the problem gets solved which gives 100% success rate.
-
-```python
-
-def truncate_number(number: float) -> float:
-    """ Given a positive floating point number, it can be decomposed into
-    and integer part (largest integer smaller than given number) and decimals
-    (leftover part always smaller than 1).
-
-    Return the decimal part of the number.
-    >>> truncate_number(3.5)
-    0.5
-    """
-    return number % 1
-```
+for pass@20, it is `1`, since if we select all 20 candidates the problem gets solved which gives 100% success rate.
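
The scores in the hunk header can be checked by hand. The sketch below assumes they come from the standard unbiased estimator `1 - C(n-c, k) / C(n, k)`, where `n` is the number of candidates and `c` the number that pass. The diff does not say which estimator was used, but the three reported values are consistent with it for `n = 20`, `c = 2`. The function name `pass_at_k` is illustrative and not taken from the changed file.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Chance that at least one of k candidates, drawn without
    replacement from n candidates of which c are correct, is correct.

    Assumed estimator: 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the score becomes exactly 1.0
    # once every possible draw of k candidates must contain a correct one.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Values from the text: 20 candidates, 2 of which pass the unit test.
scores = {f"pass@{k}": pass_at_k(20, 2, k) for k in (1, 10, 20)}
print(scores)
```

For `k = 10` the ratio reduces to `9/38`, so the score is `29/38`, which agrees with the reported `0.7631` to the digits shown.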