Commit fd1d732 by ArturBaranowskiAA (1 parent: 5fc77c4)

Update README.md with references to safetensors conversions.

Files changed (1)
  1. README.md +53 -13
README.md CHANGED
@@ -10,11 +10,18 @@ This model card provides an overview of the **Pharia-1-LLM-7B** model family, wh
 
  Pharia-1-LLM-7B comes in two distinct variants, `Pharia-1-LLM-7B-control` and [`Pharia-1-LLM-7B-control-aligned`](https://huggingface.co/Aleph-Alpha/Pharia-1-LLM-7B-control-aligned). Due to being trained on a multilingual corpus, both models are culturally and linguistically optimized for German, French and Spanish. The Pharia-1-LLM-7B models were trained on carefully curated data in compliance with applicable EU and national regulations, including copyright and data privacy laws. With improved token efficiency, the Pharia-1-LLM-7B-control models excel in domain-specific applications, particularly in the automotive and engineering industries. As such, they serve as a valuable complement to the community's selection of weight-available foundation models. `Pharia-1-LLM-7B-control` is engineered to deliver concise, length-controlled responses that match the performance of leading open-source models in the 7B to 8B parameter range. `Pharia-1-LLM-7B-control` can be aligned to user preferences, making it suitable for critical applications without the risk of shutdown behavior. `Pharia-1-LLM-7B-control-aligned` has received additional alignment training to mitigate the risks associated with using the model.
 
  # Model Overview
 
- * **Developed by:** Aleph Alpha Research
 
- * **Model type/architecture:** Autoregressive (causal, decoder-only) transformer large language models with rotary position embeddings, trained on the next-token prediction task. Both `Pharia-1-LLM-7B-control` and `Pharia-1-LLM-7B-control-aligned` are standalone transformer foundation models intended to be integrated into broader AI applications (systems).
 
  * **Language(s):** Trained in English, German, French, Spanish, Italian, Portuguese, and Dutch. Tested in English, German, Spanish, and French.
 
@@ -31,12 +38,12 @@ We provide access to our models through the channels listed below.
 
  * **Intelligence Layer SDK**: After the account is approved, the models can be accessed through the [Intelligence Layer SDK](https://github.com/Aleph-Alpha/intelligence-layer-sdk). It is a source-available library that lets users easily interact with any model in the Pharia-1-LLM-7B model family, as well as with supported third-party models, and build evaluation pipelines to ensure that every application delivers the expected results in production.
 
- * **On-premise installation:** Our customers are supplied with our full LLM stack, including model weights and inference runtime. Contact us for options to deploy Pharia-1-LLM-7B models in any cloud or on-premise environment. We provide our customers with open access to our full model checkpoint, including weights and code, for commercial use.
 
  * **Hugging Face:** The model’s weights are available on Hugging Face under the [Open Aleph License](https://github.com/Aleph-Alpha/.github/blob/main/oal.pdf), which limits usage to educational and research purposes.
 
 
- Please refer to the [changelog](https://docs.aleph-alpha.com/changelog/) for updates to the models served. We do not deprecate officially released versions of old model generations when we release newer versions, so users can continue to access available models.
 
  No prompt data is stored when using our systems, which means that we do not collect PII (personally identifiable information) for any of our public API users, as detailed in our Terms & Conditions. We do not log user inputs to the models. We do not train on user data.
 
@@ -54,7 +61,7 @@ The Pharia-1-LLM-7B models are not to be used for illegal or unlawful actions of
 
  Although we do not inspect the requests sent to our API, we regularly review and monitor potential violations that may be related to our models and, depending on the circumstances of the specific case, take legal action against them. This includes, but is not limited to, enforcement to remove published model content, requesting compensation for damages caused, and account termination or removal of credits.
 
- For non-anonymous reports, we also provide an appeals mechanism for usage policy violations via our dedicated contact address [[email protected]](mailto:[email protected]).
 
  Customers and partners can use our [ticketing system](https://servicedesk.aleph-alpha.de/external) for appeals, claims, and feedback.
 
@@ -62,7 +69,37 @@ Customers and partners are enabled to use our [ticketing system](https://service
 
  ### Inference
 
- To perform inference with the model, you’ll first need to [install the Scaling library](https://github.com/Aleph-Alpha/scaling). Follow the installation instructions provided in the repository's README file. After installation, download the model weights and use the Scaling inference module to load the checkpoint, vocabulary, and configuration files.
 
  ```python
  from pathlib import Path
@@ -73,10 +110,13 @@ inference_model = TransformerInferenceModule.from_checkpoint(
  checkpoint_dir=Path("path/to/Pharia-1-LLM-7B-control-aligned"),
  )
 
- input_text = """<|start_header_id|>user<|end_header_id|>
 
  When was Rome founded?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
 
  """
 
  generation = inference_model.generate(max_tokens=100, input_text=input_text)
@@ -424,11 +464,11 @@ The following table shows the training setup, efficiency and duration for all Ph
  | Hardware Type | Hardware Amount | Avg. measured step duration | Avg. measured MFU | Avg. measured TFLOPS | Iterations (number of update steps) | Training tokens | GPU hours | Total FLOPs |
  | A100 (80GB)<br><br>H100 | Up to 256 GPUs | 8.6 s (A100)<br><br>3.6 s (H100) | 0.66 (A100)<br><br>0.5 (H100) | 215 (A100)<br><br>520 (H100) | 582,000 + 350,000 | ~4.7T + 3T | 356k on A100 + 96k on H100 | 2.75×10<sup>23</sup> + 1.68×10<sup>23</sup> |
 
- The total compute budget is reported in FLOPs, following the [Bloom implementation](https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/e52bdabbde3c6895aceb76c1bced295c2646121f/megatron/training.py#L759), to allow comparison with the [related paper](https://arxiv.org/pdf/2211.05100.pdf).
 
  ### Environmental Impact
 
- Our data center runs on 100% renewable energy, so **no CO2 emissions are incurred for any inference job** executed through the API. Furthermore, the data center operates with a net-zero water footprint.
 
  To estimate CO2 emissions, we base our calculations on the following assumptions:
 
@@ -442,11 +482,11 @@ To estimate CO2 emissions, we base our calculations on the following assumptions
  | Carbon emitted | Carbon emitted accounting for PUE | Power consumption | Note |
  | A100: 0 | A100: 0 | A100: max 400 W per GPU<br><br>H100: max 700 W per GPU | A100: 100% water-powered energy |
 
- These numbers can be put into context, e.g., by comparison with [Estimating the carbon footprint of BLOOM, a 176B parameter language model](https://arxiv.org/pdf/2211.02001.pdf).
 
  # Risks and Limitations
 
- **Note:** Language models are **not agents** and are not optimized for prescriptive actions. Using language models in high-stakes environments, for critical decisions, or to support a user's wellbeing should only be done with additional guardrails in place.
 
  While `Pharia-1-LLM-7B-control-aligned` has received extra training to mitigate risks associated with harmful outputs and biases, it may still produce undesirable completions in some circumstances.
 
@@ -469,7 +509,7 @@ Large language models can sometimes generate undesired outputs that are unsuitab
 
  * Employing a fine-tuned model designed to maintain an appropriate tone and style, including avoiding offensive language.
 
- * Implementing [explainability](https://docs.aleph-alpha.com/docs/tasks/explain/) checks to create an audit trail at the application level.
 
  * Conducting additional validations at the application level to ensure output quality and appropriateness.
 
@@ -507,7 +547,7 @@ Risks may be mitigated by:
 
  * Performing validations on the application layer (e.g., classifying the output).
 
- * Using the repetition penalty (especially when outputs become repetitive) or other parameters available in the API (see the [documentation](https://docs.aleph-alpha.com/api/complete/)).
 
  * Avoiding use cases aimed at retrieving personally identifiable information.
 

 
  Pharia-1-LLM-7B comes in two distinct variants, `Pharia-1-LLM-7B-control` and [`Pharia-1-LLM-7B-control-aligned`](https://huggingface.co/Aleph-Alpha/Pharia-1-LLM-7B-control-aligned). Due to being trained on a multilingual corpus, both models are culturally and linguistically optimized for German, French and Spanish. The Pharia-1-LLM-7B models were trained on carefully curated data in compliance with applicable EU and national regulations, including copyright and data privacy laws. With improved token efficiency, the Pharia-1-LLM-7B-control models excel in domain-specific applications, particularly in the automotive and engineering industries. As such, they serve as a valuable complement to the community's selection of weight-available foundation models. `Pharia-1-LLM-7B-control` is engineered to deliver concise, length-controlled responses that match the performance of leading open-source models in the 7B to 8B parameter range. `Pharia-1-LLM-7B-control` can be aligned to user preferences, making it suitable for critical applications without the risk of shutdown behavior. `Pharia-1-LLM-7B-control-aligned` has received additional alignment training to mitigate the risks associated with using the model.
 
+ You can find all model weights and their corresponding safetensors conversions at the following links:
+
+ - [`Pharia-1-LLM-7B-control`](https://huggingface.co/Aleph-Alpha/Pharia-1-LLM-7B-control)
+ - [`Pharia-1-LLM-7B-control-hf`](https://huggingface.co/Aleph-Alpha/Pharia-1-LLM-7B-control-hf) (Safetensors)
+ - [`Pharia-1-LLM-7B-control-aligned`](https://huggingface.co/Aleph-Alpha/Pharia-1-LLM-7B-control-aligned)
+ - [`Pharia-1-LLM-7B-control-aligned-hf`](https://huggingface.co/Aleph-Alpha/Pharia-1-LLM-7B-control-aligned-hf) (Safetensors)
+
  # Model Overview
 
+ * **Developed by:** Aleph Alpha Research
 
+ * **Model type/architecture:** Autoregressive (causal, decoder-only) transformer large language models with rotary position embeddings, trained on the next-token prediction task. Both `Pharia-1-LLM-7B-control` and `Pharia-1-LLM-7B-control-aligned` are standalone transformer foundation models intended to be integrated into broader AI applications (systems).
 
  * **Language(s):** Trained in English, German, French, Spanish, Italian, Portuguese, and Dutch. Tested in English, German, Spanish, and French.
 

 
  * **Intelligence Layer SDK**: After the account is approved, the models can be accessed through the [Intelligence Layer SDK](https://github.com/Aleph-Alpha/intelligence-layer-sdk). It is a source-available library that lets users easily interact with any model in the Pharia-1-LLM-7B model family, as well as with supported third-party models, and build evaluation pipelines to ensure that every application delivers the expected results in production.
 
+ * **On-premise installation:** Our customers are supplied with our full LLM stack, including model weights and inference runtime. Contact us for options to deploy Pharia-1-LLM-7B models in any cloud or on-premise environment. We provide our customers with open access to our full model checkpoint, including weights and code, for commercial use.
 
  * **Hugging Face:** The model’s weights are available on Hugging Face under the [Open Aleph License](https://github.com/Aleph-Alpha/.github/blob/main/oal.pdf), which limits usage to educational and research purposes.
 
 
+ Please refer to the [changelog](https://docs.aleph-alpha.com/changelog/) for updates to the models served. We do not deprecate officially released versions of old model generations when we release newer versions, so users can continue to access available models.
 
  No prompt data is stored when using our systems, which means that we do not collect PII (personally identifiable information) for any of our public API users, as detailed in our Terms & Conditions. We do not log user inputs to the models. We do not train on user data.
 

 
  Although we do not inspect the requests sent to our API, we regularly review and monitor potential violations that may be related to our models and, depending on the circumstances of the specific case, take legal action against them. This includes, but is not limited to, enforcement to remove published model content, requesting compensation for damages caused, and account termination or removal of credits.
 
+ For non-anonymous reports, we also provide an appeals mechanism for usage policy violations via our dedicated contact address [[email protected]](mailto:[email protected]).
 
  Customers and partners can use our [ticketing system](https://servicedesk.aleph-alpha.de/external) for appeals, claims, and feedback.
 

 
  ### Inference
 
+ You can load the model and tokenizer with the Hugging Face Transformers library, using our safetensors conversions in [`Pharia-1-LLM-7B-control-hf`](https://huggingface.co/Aleph-Alpha/Pharia-1-LLM-7B-control-hf) and [`Pharia-1-LLM-7B-control-aligned-hf`](https://huggingface.co/Aleph-Alpha/Pharia-1-LLM-7B-control-aligned-hf).
+
+ ```python
+ import torch
+
+ from transformers import AutoModelForCausalLM, PreTrainedTokenizerFast
+
+ INPUT = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
+
+ You are a helpful assistant. You give engaging, well-structured answers to user inquiries.<|eot_id|><|start_header_id|>user<|end_header_id|>
+
+ When was Rome founded?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
+
+
+ """
+
+ MODEL_ID = "Aleph-Alpha/Pharia-1-LLM-7B-control-hf"
+
+ tokenizer = PreTrainedTokenizerFast.from_pretrained(MODEL_ID)
+ model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True, torch_dtype=torch.bfloat16)
+
+ device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
+ model = model.to(device)
+
+ inputs = tokenizer(INPUT, return_token_type_ids=False, return_tensors="pt").to(device)
+ outputs = model.generate(**inputs, max_new_tokens=50)
+ generated_text = tokenizer.decode(outputs[0])
+ print(generated_text)
+ ```
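The `INPUT` string above writes out the prompt format by hand. The sketch below builds the same single-turn string from a system message and a user message, which makes it easier to swap in other questions. `build_prompt` and `header` are hypothetical names and are not part of Transformers or Scaling. The layout is copied from the README example, including the two empty lines after the assistant header. Check the tokenizer configuration of the `-hf` repositories before relying on this helper for multi-turn conversations.

```python
from typing import Optional

BEGIN = "<|begin_of_text|>"
EOT = "<|eot_id|>"


def header(role: str) -> str:
    """Return a role header followed by the blank line used in the README examples."""
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n"


def build_prompt(user_message: str, system_message: Optional[str] = None) -> str:
    """Hypothetical helper: render one user turn in the README's prompt layout."""
    parts = [BEGIN]
    if system_message is not None:
        parts.append(header("system") + system_message + EOT)
    parts.append(header("user") + user_message + EOT)
    # The README leaves two empty lines after the assistant header,
    # so one extra newline is added to the header's "\n\n".
    parts.append(header("assistant") + "\n")
    return "".join(parts)


SYSTEM = (
    "You are a helpful assistant. "
    "You give engaging, well-structured answers to user inquiries."
)
prompt = build_prompt("When was Rome founded?", SYSTEM)
print(repr(prompt))
```

The resulting `prompt` can be passed to `tokenizer(...)` in place of `INPUT`.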
+
+ To perform inference with the original model files, you’ll first need to [install the Scaling library](https://github.com/Aleph-Alpha/scaling). Follow the installation instructions provided in the repository's README file. After installation, download the model weights and use the Scaling inference module to load the checkpoint, vocabulary, and configuration files.
 
  ```python
  from pathlib import Path
 
  checkpoint_dir=Path("path/to/Pharia-1-LLM-7B-control-aligned"),
  )
 
+ input_text = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
+
+ You are a helpful assistant. You give engaging, well-structured answers to user inquiries.<|eot_id|><|start_header_id|>user<|end_header_id|>
 
  When was Rome founded?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
 
+
  """
 
  generation = inference_model.generate(max_tokens=100, input_text=input_text)
 
  | Hardware Type | Hardware Amount | Avg. measured step duration | Avg. measured MFU | Avg. measured TFLOPS | Iterations (number of update steps) | Training tokens | GPU hours | Total FLOPs |
  | A100 (80GB)<br><br>H100 | Up to 256 GPUs | 8.6 s (A100)<br><br>3.6 s (H100) | 0.66 (A100)<br><br>0.5 (H100) | 215 (A100)<br><br>520 (H100) | 582,000 + 350,000 | ~4.7T + 3T | 356k on A100 + 96k on H100 | 2.75×10<sup>23</sup> + 1.68×10<sup>23</sup> |
 
+ The total compute budget is reported in FLOPs, following the [Bloom implementation](https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/e52bdabbde3c6895aceb76c1bced295c2646121f/megatron/training.py#L759), to allow comparison with the [related paper](https://arxiv.org/pdf/2211.05100.pdf).
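The two summands in the Total FLOPs column can be roughly reproduced from the other columns of the training table. The estimate multiplies iterations by step duration, number of GPUs and measured per-GPU throughput. This sketch assumes that all 256 GPUs were active for every step, since the table only says "up to 256 GPUs". Treat it as a consistency check, not as the procedure actually used to produce the table.

```python
# Consistency check of the "Total FLOPs" column.
# Assumption: every update step ran on all 256 GPUs (the table says "up to 256").
NUM_GPUS = 256


def total_flops(iterations: int, step_seconds: float, tflops_per_gpu: float) -> float:
    """Return total FLOPs = training seconds * number of GPUs * per-GPU FLOP/s."""
    gpu_seconds = iterations * step_seconds * NUM_GPUS
    return gpu_seconds * tflops_per_gpu * 1e12


a100 = total_flops(582_000, 8.6, 215)  # A100 phase: 582,000 steps at 8.6 s, 215 TFLOPS
h100 = total_flops(350_000, 3.6, 520)  # H100 phase: 350,000 steps at 3.6 s, 520 TFLOPS

print(f"A100: {a100:.3g}")  # A100: 2.75e+23
print(f"H100: {h100:.3g}")  # H100: 1.68e+23
print(f"Total: {a100 + h100:.3g}")  # Total: 4.43e+23
```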
 
  ### Environmental Impact
 
+ Our data center runs on 100% renewable energy, so **no CO2 emissions are incurred for any inference job** executed through the API. Furthermore, the data center operates with a net-zero water footprint.
 
  To estimate CO2 emissions, we base our calculations on the following assumptions:
 

  | Carbon emitted | Carbon emitted accounting for PUE | Power consumption | Note |
  | A100: 0 | A100: 0 | A100: max 400 W per GPU<br><br>H100: max 700 W per GPU | A100: 100% water-powered energy |
 
+ These numbers can be put into context, e.g., by comparison with [Estimating the carbon footprint of BLOOM, a 176B parameter language model](https://arxiv.org/pdf/2211.02001.pdf).
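A rough upper bound on the electricity drawn by the GPUs themselves comes from multiplying the GPU hours in the training table by the maximum power per GPU listed above. This sketch assumes that every GPU ran at its maximum rated power for all reported hours. It ignores CPUs, networking and data-center overhead (PUE), so it is only an illustration of the arithmetic and not the method behind the table's carbon figures.

```python
# Upper bound on GPU energy: GPU hours * maximum power per GPU.
# Assumption: each GPU drew its stated maximum power for every reported hour;
# CPUs, networking, and cooling overhead (PUE) are not included.


def gpu_energy_mwh(gpu_hours: float, max_watts_per_gpu: float) -> float:
    """Return energy in megawatt-hours (watt-hours divided by 1e6)."""
    return gpu_hours * max_watts_per_gpu / 1e6


a100_mwh = gpu_energy_mwh(356_000, 400)  # 356k A100 hours at max 400 W
h100_mwh = gpu_energy_mwh(96_000, 700)   # 96k H100 hours at max 700 W

print(f"A100: {a100_mwh:.1f} MWh, H100: {h100_mwh:.1f} MWh, total: {a100_mwh + h100_mwh:.1f} MWh")
# A100: 142.4 MWh, H100: 67.2 MWh, total: 209.6 MWh
```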
 
  # Risks and Limitations
 
+ **Note:** Language models are **not agents** and are not optimized for prescriptive actions. Using language models in high-stakes environments, for critical decisions, or to support a user's wellbeing should only be done with additional guardrails in place.
 
  While `Pharia-1-LLM-7B-control-aligned` has received extra training to mitigate risks associated with harmful outputs and biases, it may still produce undesirable completions in some circumstances.
 

 
  * Employing a fine-tuned model designed to maintain an appropriate tone and style, including avoiding offensive language.
 
+ * Implementing [explainability](https://docs.aleph-alpha.com/docs/tasks/explain/) checks to create an audit trail at the application level.
 
  * Conducting additional validations at the application level to ensure output quality and appropriateness.
 

 
  * Performing validations on the application layer (e.g., classifying the output).
 
+ * Using the repetition penalty (especially when outputs become repetitive) or other parameters available in the API (see the [documentation](https://docs.aleph-alpha.com/api/complete/)).
 
  * Avoiding use cases aimed at retrieving personally identifiable information.
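The exact repetition-penalty parameters of the Aleph Alpha API are described in the linked documentation. For the Hugging Face path shown in the Inference section, `model.generate` accepts a `repetition_penalty` argument that applies the CTRL-style rule sketched below. Under that rule, the logit of each token already present in the context is divided by the penalty if it is positive and multiplied by it if it is negative. This makes those tokens less likely. The pure-Python version is illustrative only: `apply_repetition_penalty` is a hypothetical helper, not a library function, and the API's own formula may differ.

```python
from typing import Iterable, List


def apply_repetition_penalty(
    logits: List[float], previous_ids: Iterable[int], penalty: float
) -> List[float]:
    """Return a copy of `logits` with a CTRL-style penalty on already-seen tokens.

    A penalty > 1.0 discourages repetition; 1.0 leaves the logits unchanged.
    """
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    adjusted = list(logits)
    for token_id in set(previous_ids):  # each seen token is penalized once
        score = adjusted[token_id]
        # With penalty > 1, dividing a positive logit or multiplying a
        # negative one both lower the score, and thus the token's probability.
        adjusted[token_id] = score * penalty if score < 0 else score / penalty
    return adjusted


logits = [2.0, -1.0, 0.5, 3.0]  # toy vocabulary of four tokens
print(apply_repetition_penalty(logits, previous_ids=[0, 1, 1], penalty=2.0))
# [1.0, -2.0, 0.5, 3.0]
```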