Update README.md
README.md CHANGED
````diff
@@ -37,9 +37,9 @@ We provide extensive evaluation results of SeamlessM4T-Medium and SeamlessM4T-La
 First, load the processor and a checkpoint of the model:
 
 ```python
-from transformers import AutoProcessor, SeamlessM4TModel
-processor = AutoProcessor.from_pretrained("facebook/hf-seamless-m4t-large")
-model = SeamlessM4TModel.from_pretrained("facebook/hf-seamless-m4t-large")
+>>> from transformers import AutoProcessor, SeamlessM4TModel
+>>> processor = AutoProcessor.from_pretrained("facebook/hf-seamless-m4t-large")
+>>> model = SeamlessM4TModel.from_pretrained("facebook/hf-seamless-m4t-large")
 ```
 
 You can seamlessly use this model on text or on audio, to generate either translated text or translated audio.
@@ -47,14 +47,14 @@ You can seamlessly use this model on text or on audio, to generate either trans
 Here is how to use the processor to process text and audio:
 
 ```python
-# let's load an audio sample from an Arabic speech corpus
-from datasets import load_dataset
-dataset = load_dataset("arabic_speech_corpus", split="test", streaming=True)
-audio_sample = next(iter(dataset))["audio"]
-# now, process it
-audio_inputs = processor(audios=audio_sample["array"], return_tensors="pt")
-# now, process some English text as well
-text_inputs = processor(text = "Hello, my dog is cute", src_lang="eng", return_tensors="pt")
+>>> # let's load an audio sample from an Arabic speech corpus
+>>> from datasets import load_dataset
+>>> dataset = load_dataset("arabic_speech_corpus", split="test", streaming=True)
+>>> audio_sample = next(iter(dataset))["audio"]
+>>> # now, process it
+>>> audio_inputs = processor(audios=audio_sample["array"], return_tensors="pt")
+>>> # now, process some English text as well
+>>> text_inputs = processor(text = "Hello, my dog is cute", src_lang="eng", return_tensors="pt")
 ```
 
 
@@ -63,8 +63,8 @@ text_inputs = processor(text = "Hello, my dog is cute", src_lang="eng", return_t
 [`SeamlessM4TModel`](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t#transformers.SeamlessM4TModel) can *seamlessly* generate text or speech with few or no changes. Let's target Russian voice translation:
 
 ```python
-audio_array_from_text = model.generate(**text_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
-audio_array_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
+>>> audio_array_from_text = model.generate(**text_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
+>>> audio_array_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
 ```
 
 With basically the same code, I've translated English text and Arabic speech to Russian speech samples.
@@ -75,12 +75,12 @@ Similarly, you can generate translated text from audio files or from text with t
 This time, let's translate to French.
 
 ```python
-# from audio
-output_tokens = model.generate(**audio_inputs, tgt_lang="fra", generate_speech=False)
-translated_text_from_audio = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
-# from text
-output_tokens = model.generate(**text_inputs, tgt_lang="fra", generate_speech=False)
-translated_text_from_text = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
+>>> # from audio
+>>> output_tokens = model.generate(**audio_inputs, tgt_lang="fra", generate_speech=False)
+>>> translated_text_from_audio = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
+>>> # from text
+>>> output_tokens = model.generate(**text_inputs, tgt_lang="fra", generate_speech=False)
+>>> translated_text_from_text = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
 ```
 
 
````
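One practical note on the audio-processing hunk above: the SeamlessM4T feature extractor assumes 16 kHz input, and the streamed Arabic speech corpus sample is not guaranteed to arrive at that rate. Below is a minimal sketch of a resampling step that could precede the `processor(...)` call, assuming 🤗 Datasets' `Audio` column casting; the checkpoint and column names are taken from the snippet above, and nothing here is part of the change itself.

```python
from datasets import Audio, load_dataset
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("facebook/hf-seamless-m4t-large")

# stream the same corpus as in the README snippet
dataset = load_dataset("arabic_speech_corpus", split="test", streaming=True)
# cast the audio column so decoded samples are resampled to 16 kHz on the fly
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))
audio_sample = next(iter(dataset))["audio"]

# the array is now at the rate the feature extractor expects
audio_inputs = processor(audios=audio_sample["array"], return_tensors="pt")
```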
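The Russian speech hunk leaves `audio_array_from_text` and `audio_array_from_audio` as raw NumPy waveforms. As a rough usage sketch continuing from those variables, here is one way to write them out as WAV files, assuming the model config exposes the output rate as `sampling_rate` (16 kHz for SeamlessM4T); `scipy` and the file names are illustrative choices, not part of the change.

```python
from scipy.io import wavfile

# SeamlessM4T's vocoder produces 16 kHz audio; assumed to be exposed on the config
sample_rate = model.config.sampling_rate

# write the two waveforms generated in the README example to disk
wavfile.write("speech_from_text.wav", rate=sample_rate, data=audio_array_from_text)
wavfile.write("speech_from_audio.wav", rate=sample_rate, data=audio_array_from_audio)
```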