Get audio embeddings
#6 opened by epinnock
Is it possible to use this to generate audio embeddings? If so, is there any good documentation on this?
Yes, you can extract hidden states from the model by passing the argument `output_hidden_states=True` to the forward call; see https://huggingface.co/docs/transformers/main/en/model_doc/musicgen#transformers.MusicgenForConditionalGeneration.forward
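For the forward-call route, MusicGen needs decoder inputs in addition to the text inputs. The sketch below follows the pattern shown in the Transformers MusicGen docs and seeds the decoder with one pad token per (prompt, codebook) row. The helper name `forward_hidden_states` is hypothetical. It assumes `model` and `inputs` are the objects built in the `generate` example that follows (a `MusicgenForConditionalGeneration` and the processor output for text prompts).

```python
def forward_hidden_states(model, inputs):
    """Run a single forward pass and return (encoder, decoder) hidden states.

    Hypothetical helper: `model` is a MusicgenForConditionalGeneration and
    `inputs` is the processor output for text prompts (input_ids, attention_mask).
    """
    input_ids = inputs["input_ids"]
    # The decoder expects one row per (prompt, codebook) pair, i.e. shape
    # (batch_size * num_codebooks, seq_len); start every row with the pad token.
    rows = input_ids.shape[0] * model.decoder.num_codebooks
    decoder_input_ids = input_ids.new_full((rows, 1), model.generation_config.pad_token_id)
    outputs = model(**inputs, decoder_input_ids=decoder_input_ids, output_hidden_states=True)
    # Each is a tuple with one tensor per layer (plus the embedding output):
    # text-encoder states first, MusicGen-decoder states second.
    return outputs.encoder_hidden_states, outputs.decoder_hidden_states
```

Usage: `enc_states, dec_states = forward_hidden_states(model, inputs)`, ideally inside `torch.no_grad()` since no gradients are needed for feature extraction.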
Alternatively, you can pass it to the `generate` method:
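```python
from transformers import AutoProcessor, MusicgenForConditionalGeneration

processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")
# Tokenize two text prompts, padded to the same length
inputs = processor(
    text=["80s pop track with bassy drums and synth", "90s rock song with loud guitars and heavy drums"],
    padding=True,
    return_tensors="pt",
)
# return_dict_in_generate=True returns an output object that also holds the hidden states
generated_outputs = model.generate(
    **inputs, do_sample=True, guidance_scale=3, max_new_tokens=256,
    output_hidden_states=True, return_dict_in_generate=True,
)
# Decoder (MusicGen) hidden states, one entry per generation step
decoder_hidden_states = generated_outputs.decoder_hidden_states
```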
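To turn these per-step hidden states into one fixed-size vector per prompt, one common choice (not something MusicGen prescribes) is to average a chosen decoder layer over time. A minimal sketch with a hypothetical helper, assuming `decoder_hidden_states` is a tuple over generation steps, each a tuple over layers of CPU tensors or arrays shaped `(rows, step_len, hidden_dim)`. It also assumes that with `guidance_scale > 1` the batch is doubled for classifier-free guidance with the conditional rows first; check the shapes in your Transformers version before relying on this.

```python
import numpy as np


def pool_generated_hidden_states(decoder_hidden_states, layer=-1, batch_size=None):
    """Mean-pool one decoder layer over all generation steps.

    decoder_hidden_states: tuple over steps, each a tuple over layers of
    arrays / CPU tensors shaped (rows, step_len, hidden_dim).
    Returns an array shaped (rows, hidden_dim), or (batch_size, hidden_dim).
    """
    # Pick the requested layer at every step and join the steps along time.
    steps = [np.asarray(step[layer]) for step in decoder_hidden_states]
    sequence = np.concatenate(steps, axis=1)  # (rows, total_len, hidden_dim)
    pooled = sequence.mean(axis=1)  # average over time -> (rows, hidden_dim)
    # Assumption: with classifier-free guidance the batch is doubled,
    # conditional rows first; keep only those when batch_size is given.
    return pooled if batch_size is None else pooled[:batch_size]
```

With the two prompts above this would be called as `pool_generated_hidden_states(generated_outputs.decoder_hidden_states, batch_size=2)`; tensors on a GPU need `.cpu()` first.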
Which embeddings are you interested in, in particular: the audio codes from EnCodec, or the hidden states from MusicGen?
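If the EnCodec audio codes are what you need, MusicGen exposes its EnCodec model as `model.audio_encoder`. A minimal sketch with a hypothetical helper, assuming `waveform` is mono audio already resampled to the feature extractor's sampling rate, and that `model` and `processor` are loaded as in the example above:

```python
def encodec_audio_codes(model, processor, waveform, sampling_rate=None):
    """Encode a waveform into discrete EnCodec codes via MusicGen's audio encoder.

    Hypothetical helper: `waveform` is assumed to be mono audio at the
    feature extractor's sampling rate (used by default).
    """
    if sampling_rate is None:
        sampling_rate = processor.feature_extractor.sampling_rate
    # Audio-only call: the processor forwards this to the EnCodec feature extractor.
    inputs = processor(audio=waveform, sampling_rate=sampling_rate, padding=True, return_tensors="pt")
    # padding_mask may be absent; EncodecModel.encode accepts None for it.
    encoded = model.audio_encoder.encode(inputs["input_values"], inputs.get("padding_mask"))
    # Expected shape: (num_chunks, batch, num_codebooks, num_frames)
    return encoded.audio_codes
```

As with the forward pass, wrapping the call in `torch.no_grad()` avoids building a gradient graph.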