Remove input id tokens
Hi,
Scenario: I have 5 files that contain code. I am trying to evaluate each file and get recommendations from the StarCoder model.
Challenge: I can iterate through the files and get recommendations for each one independently. But when I process them in a single loop, the input_ids from the previous file are still there after the first file is encoded and decoded, so they show up again for the second file. How do I remove the previous file's input_ids tokens?
for file in files:  # the 5 files
    # `query` is built from the current file here (construction not shown)
    input_ids: torch.Tensor = self.tokenizer.encode(query, max_length=7000, return_tensors='pt', truncation=True).to(self.device)
    print(len(input_ids[0]))
For example:
1st file: length of input_ids is 1111
2nd file (2nd iteration): length of input_ids is 3018, but it should be 1907
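One detail in the numbers above may help narrow this down: 1111 + 1907 = 3018 exactly, so the second encoding appears to cover the first file's text plus the second file's. `tokenizer.encode` does not keep state between calls, so a likely cause (an assumption, since the code that builds `query` isn't shown) is that `query` is created before the loop and appended to on each iteration. The sketch below uses a hypothetical whitespace-splitting `encode` stand-in and made-up file contents, not StarCoder's tokenizer. It shows how that pattern produces cumulative lengths, and how rebuilding `query` for each file gives per-file lengths:

```python
# Hypothetical stand-in for self.tokenizer.encode: one "token" per whitespace-separated word.
def encode(text: str) -> list:
    return text.split()


# Hypothetical file contents; in the real flow these would be read from the 5 files.
files = {
    "a.py": "def add(a, b):\n    return a + b",
    "b.py": "class Greeter:\n    def hi(self):\n        print('hi')",
}


def lengths_accumulating(contents):
    """Suspected buggy pattern: `query` lives outside the loop and keeps growing."""
    query = ""
    lengths = []
    for text in contents:
        query += text + "\n"  # the previous file's text is still part of `query`
        lengths.append(len(encode(query)))
    return lengths


def lengths_per_file(contents):
    """Fixed pattern: `query` is rebuilt from scratch for every file."""
    lengths = []
    for text in contents:
        query = text  # only the current file's text
        lengths.append(len(encode(query)))
    return lengths


print(lengths_accumulating(list(files.values())))  # [7, 12]  (12 = 7 + 5)
print(lengths_per_file(list(files.values())))      # [7, 5]
```

In the real loop, the equivalent fix would be to assign `query` (and any prompt template wrapped around it) fresh inside the per-file loop before calling `self.tokenizer.encode(...)`, instead of using `+=` on a variable defined before the loop. The same applies if generated output is appended to a running prompt.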
Please help with a solution for this. Thanks.