## Dataset Processing

### Our Benchmark (processed OIE2016)

First, download our benchmark tailored for compact extractions from [`here`](https://zenodo.org/record/7014032#.YwQQ0OzMJb8) and put it under [`data/OIE2016(processed)`](https://github.com/FarimaFatahi/CompactIE/tree/master/data/OIE2016(processed)).

Second, split out the train, development, and test sets for the constituent extraction model by running:

```
cd "OIE2016(processed)/constituent_model"
python process_constituent_data.py
```

Finally, split out the train, development, and test sets for the constituent linking model by running:

```
cd "OIE2016(processed)/relation_model"
python process_linking_data.py
```

Note that the data folders used for training each model are set to the ones mentioned above.

### Evaluation Benchmarks

Three evaluation benchmarks (**BenchIE**, **CaRB**, and **Wire57**) are used to evaluate CompactIE's performance. Since these datasets do not target compact triples, we exclude triples that have at least one clause within a constituent. To get the final data (JSON format) for these benchmarks, run:

```bash
./process_test_data.sh
```

### Other files

Since the schema design of the table filling model does not support conjunctions inside constituents, we use the conjunction module developed by [`OpenIE6`](https://github.com/dair-iitd/openie6) to break sentences into smaller conjunction-free sentences before passing them to the system. Therefore, for a new test file (`source_file.txt`), first produce the conjunction file (`conjunctions.txt`) and then run:

```
python process.py --source_file source_file.txt --target_file output.json --conjunctions_file conjunctions.txt
```

### Compactness measurement

To measure the compactness metrics mentioned in the paper (AL, NCC, RPA), set the `INPUT_FILE` variable inside the following script to the test file path and run it:

```
python compactness_measurements.py
```
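
As a rough illustration of what a compactness measure can look like, the sketch below computes an average constituent length in whitespace tokens. It assumes that AL stands for the average length of constituents and that extractions are available as (subject, relation, object) string tuples. Both are assumptions made here for illustration, and the `average_constituent_length` helper is hypothetical; it is not part of `compactness_measurements.py`, which remains the reference for the numbers reported in the paper.

```python
from typing import Iterable, Tuple

# Hypothetical in-memory layout: one (subject, relation, object) tuple per extraction.
Triple = Tuple[str, str, str]


def average_constituent_length(triples: Iterable[Triple]) -> float:
    """Return the mean number of whitespace-separated tokens per constituent.

    Every subject, relation, and object counts as one constituent.
    Returns 0.0 when there are no triples.
    """
    lengths = [len(part.split()) for triple in triples for part in triple]
    return sum(lengths) / len(lengths) if lengths else 0.0


if __name__ == "__main__":
    example = [
        ("the quick fox", "jumped over", "the lazy dog"),
        ("she", "reads", "books"),
    ]
    print(f"average constituent length: {average_constituent_length(example):.2f}")
```

Shorter averages indicate more compact constituents under this reading of the metric. Tokenizing on whitespace is a simplification, so the values may differ from those produced by the repository script.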