Add app with hf inference
- README.md +16 -1
- app.py +32 -0
- article.md +1 -0
- description.md +3 -0
- requirements.txt +1 -0
README.md
CHANGED
@@ -1,6 +1,6 @@
 ---
 title: Emission Extractor
-emoji:
+emoji: 💨
 colorFrom: gray
 colorTo: green
 sdk: gradio
@@ -8,6 +8,21 @@ sdk_version: 4.16.0
 app_file: app.py
 pinned: false
 license: apache-2.0
+preload_from_hub:
+- mistralai/Mistral-7B-Instruct-v0.2
+- nopperl/emissions-extraction-lora
+datasets:
+- nopperl/sustainability-report-emissions-instruction-style
+- nopperl/corporate-emission-reports
+tags:
+- information-extraction
+- retrieval
+- climate
+- sustainability-reports
+- corporate-social-responsibility
+- emissions
+- greenhouse-gas-emissions
+- co2-emissions
 ---
 
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
app.py
ADDED
@@ -0,0 +1,32 @@
+import gradio as gr
+
+from corporate_emission_reports.inference import extract_emissions
+
+
+def predict(input_method, document_file, document_url):
+    document_path = document_file if input_method == "File" else document_url
+    emissions = extract_emissions(document_path, "mistralai/Mistral-7B-Instruct-v0.2", lora="nopperl/emissions-extraction-lora", engine="hf", low_cpu_mem_usage=True)
+    return emissions.model_dump_json()
+
+with open("description.md", "r") as f:
+    description = f.read().strip()
+
+with open("article.md", "r") as f:
+    article = f.read().strip()
+
+interface = gr.Interface(
+    predict,
+    inputs=[gr.Radio(choices=["File", "URL"], value="File"), gr.File(type="filepath", file_types=[".pdf"], file_count="single", label="Report File"), gr.Textbox(label="Report URL")],
+    outputs=gr.JSON(),
+    description=description,
+    examples = [
+        ["URL", None, "https://www.bms.com/assets/bms/us/en-us/pdf/bmy-2022-esg-report.pdf"],
+        ["URL", None, "https://www.7andi.com/library/dbps_data/_template_/_res/en/sustainability/sustainabilityreport/2022/pdf/2022_all_01.pdf"],
+        ["URL", None, "https://www.infineon.com/dgdl/Sustainability_at+Infineon_2023.pdf?fileId=8ac78c8b8b657de2018c009d03120100"],
+    ],
+    article=article,
+    analytics_enabled=False,
+    cache_examples=False,
+)
+interface.queue().launch(debug=True, share=True)
+
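The `predict` function in app.py picks either the uploaded file path or the URL based on the radio button and passes the result to `extract_emissions` even when it is empty. A minimal sketch of the same selection with an early check, assuming one wants a clear error instead of a failure deep inside extraction. `resolve_document_path` is a hypothetical helper, not part of this commit or of `corporate_emission_reports`, and the assumption that an empty `gr.File` input arrives as `None` should be checked against the Gradio version in use.

```python
from typing import Optional


def resolve_document_path(
    input_method: str,
    document_file: Optional[str],
    document_url: Optional[str],
) -> str:
    """Return the document source to pass to extract_emissions, or raise ValueError.

    Mirrors `document_file if input_method == "File" else document_url`
    from app.py; the validation itself is an added assumption.
    """
    if input_method == "File":
        # Assumed: gr.File(type="filepath") passes None when nothing is uploaded.
        path = document_file or ""
    elif input_method == "URL":
        # Strip stray whitespace pasted into the Textbox.
        path = (document_url or "").strip()
    else:
        raise ValueError(f"unknown input method: {input_method!r}")
    if not path:
        raise ValueError(f"no document provided for input method {input_method!r}")
    return path
```

Inside the Gradio app, such a `ValueError` could be re-raised as `gr.Error` so that the message is shown in the interface rather than only in the logs.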
article.md
ADDED
@@ -0,0 +1 @@
+Technical overview: The system retrieves the relevant pages of the uploaded report using simple search. These pages are input into a finetuned [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) language model, which outputs a JSON object containing the emission information. The system achieves an emission extraction accuracy of 62% and a source citation accuracy of 67% on the [corporate-emission-reports](https://huggingface.co/datasets/nopperl/corporate-emission-reports) dataset. Note that the model is quantized due to resource limitations.
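The overview describes the retrieval step only as "simple search". As an illustration of what such a step can look like, the sketch below ranks pages by how often they mention emission-related keywords and returns the 0-based indices of the best matches. The keyword list, the counting scheme and the names `score_page` and `retrieve_pages` are assumptions made for this example; the project's actual retrieval method may differ.

```python
from typing import List, Sequence

# Hypothetical query terms; the real system's search terms are not documented here.
KEYWORDS = ("scope 1", "scope 2", "scope 3", "greenhouse gas", "ghg", "tco2e")


def score_page(text: str, keywords: Sequence[str] = KEYWORDS) -> int:
    """Total number of case-insensitive keyword occurrences in one page."""
    lowered = text.lower()
    return sum(lowered.count(keyword) for keyword in keywords)


def retrieve_pages(pages: List[str], top_k: int = 3) -> List[int]:
    """Return 0-based indices of the top_k best-scoring pages, in document order."""
    scored = [(score_page(text), index) for index, text in enumerate(pages)]
    # Drop pages that mention none of the keywords.
    hits = [(score, index) for score, index in scored if score > 0]
    # Highest score first; ties go to the earlier page.
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return sorted(index for _, index in hits[:top_k])
```

The returned indices follow the same 0-based convention as the `sources` field; in the described system, the text of the selected pages is what gets passed to the fine-tuned model.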
description.md
ADDED
@@ -0,0 +1,3 @@
+Upload (or provide the URL of) a sustainability report PDF and get the most recent scope 1, 2 and 3 greenhouse gas emissions reported in the document. Since this space does not use a GPU, it will be quite slow. The system was developed for English documents.
+
+Notes about the output: the emission values are in metric tons of CO2eq. The `sources` field is a list of pages containing the emission values. The page numbers are 0-based, e.g. the number 12 will correspond to page number 13 in your PDF viewer.
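Because the `sources` indices are 0-based, a client reading the JSON output has to add one before looking the pages up in a PDF viewer. A minimal sketch using only the standard library; it relies only on the `sources` field described in description.md, while the helper name `sources_to_viewer_pages` and the `scope_1` key in the usage example are hypothetical.

```python
import json
from typing import List


def sources_to_viewer_pages(output_json: str) -> List[int]:
    """Map the 0-based `sources` page indices to 1-based PDF viewer page numbers."""
    data = json.loads(output_json)
    # Treat a missing or null `sources` field as "no cited pages" (an assumption).
    return [page + 1 for page in (data.get("sources") or [])]
```

For example, with made-up values, `sources_to_viewer_pages('{"scope_1": 1200, "sources": [12, 40]}')` returns `[13, 41]`.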
requirements.txt
ADDED
@@ -0,0 +1 @@
+git+https://github.com/nopperl/corporate_emission_reports