nguyenbh committed on
Commit 1ea5d67 · verified · 1 Parent(s): a11e830

Update readme
Files changed (1)
  1. README.md +7 -3
README.md CHANGED

@@ -46,6 +46,7 @@ widget:
   - role: user
     content: Can you provide ways to eat combinations of bananas and dragonfruits?
 library_name: transformers
+paper: arxiv.org/abs/2503.01743
 ---
 
 ## Model Summary
@@ -407,8 +408,9 @@ model = AutoModelForCausalLM.from_pretrained(
     model_path,
     device_map="cuda",
     torch_dtype="auto",
-    trust_remote_code=True,
-    attn_implementation='flash_attention_2',
+    trust_remote_code=True,
+    # if you do not have Ampere or later GPUs, change attention to "eager"
+    _attn_implementation='flash_attention_2',
 ).cuda()
 
 # Load generation config
@@ -466,6 +468,8 @@ response = processor.batch_decode(
 print(f'>>> Response\n{response}')
 ```
 
+**Notes**:
+
 ## Responsible AI Considerations
 
 Like other language models, the Phi family of models can potentially behave in ways that are unfair, unreliable, or offensive. Some of the limiting behaviors to be aware of include:
@@ -561,7 +565,7 @@ Note that by default, the Phi-4-multimodal-instruct model uses flash attention,
 * NVIDIA H100
 
 If you want to run the model on:
-* NVIDIA V100 or earlier generation GPUs: call AutoModelForCausalLM.from_pretrained() with attn_implementation="eager"
+* NVIDIA V100 or earlier generation GPUs: call AutoModelForCausalLM.from_pretrained() with _attn_implementation="eager"
 
 ## License
 The model is licensed under the [MIT license](./LICENSE).
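The change above amounts to a backend-selection rule: flash attention 2 requires an Ampere-generation (compute capability 8.x) or newer GPU, and older cards such as the V100 must fall back to `"eager"`. A minimal sketch of that rule, using a hypothetical helper name (`pick_attn_implementation` is not part of the model repo):

```python
def pick_attn_implementation(compute_capability):
    """Choose an attention backend from a CUDA compute capability tuple.

    flash_attention_2 needs Ampere (SM 8.0) or newer; earlier GPUs such
    as the V100 (SM 7.0) must use "eager" instead.
    Illustrative helper only -- not part of the Phi-4 repository.
    """
    major, _minor = compute_capability
    return "flash_attention_2" if major >= 8 else "eager"


# A100 is SM 8.0, V100 is SM 7.0
print(pick_attn_implementation((8, 0)))  # flash_attention_2
print(pick_attn_implementation((7, 0)))  # eager
```

On a live system the capability tuple would come from `torch.cuda.get_device_capability()`, and the result would be passed as the `_attn_implementation` argument to `AutoModelForCausalLM.from_pretrained()` as shown in the diff.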