CSAI Knowledge Aggregator
Introduction
With central teams commonly needing to record, attend, and then process KT (knowledge transfer) recording sessions, or to parse through folders of documents, PDFs, and similar material, there was a clear need to simplify and automate knowledge gathering and generate value more expediently.
Created for personal use, this has been split out for other interested parties to make use of. In the current version on the 'share' branch, this tool is tailored specifically for transcribing mp4 KT recordings (or utilizing existing transcripts provided by various video platforms such as Zoom or Loom) to parse out various knowledge outputs and create KB articles. While initially focused on Central Support knowledge capture, with some minor adjustments, it can be tailored for other applications.
With a long-term goal of generalized content capture and curation, certain outputs may not be relevant for all use cases. Some parameterization has already been implemented but can be further adjusted.
Ideal KT Input Guidance Runbook
Current Outputs
- High-level Summary
- Topic Specific Summaries
- Glossary
- Troubleshooting Steps
- Word Cloud and Matching Symptoms
- KB For each Summary and the Troubleshooting Steps
- Screenshots of any captured Timestamps in Summary/Troubleshooting Steps
Note: Processing.log is also generated in the working directory.
Prerequisites
- Python 3.11
- ffmpeg - Prerequisite for pydub's audio/video manipulation.
Installation
Clone the Repository
git clone -b share --single-branch https://github.com/trilogy-group/cs-ai-kt-transcribe.git
Set up the Python Environment
Pick your poison:

Option 1 - pyenv:
pyenv virtualenv [env_name]
pyenv activate [env_name]

Option 2 - venv (then activate from within the [env_name] directory):
python3 -m venv [env_name]
source ./bin/activate
Installing Dependencies
From your primary venv directory:
./bin/python -m pip install -r requirements.txt
Generate and Populate .env file
Within the primary venv directory, create a file named '.env' and populate it with the content below, replacing [YOUR_API_KEY] with your OpenAI API key:
OPENAI_API_KEY=[YOUR_API_KEY]
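For reference, here is a minimal stdlib-only sketch of how a script might read the key from such a `.env` file. The actual tool may use a library such as python-dotenv instead; the `load_env` helper below is hypothetical.

```python
import os

def load_env(path=".env"):
    """Read simple KEY=VALUE pairs from a .env file into os.environ."""
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comments
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            # Do not clobber variables already set in the environment
            os.environ.setdefault(key.strip(), value.strip())

load_env()
api_key = os.environ.get("OPENAI_API_KEY")
```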
Usage
Basic Usage
--topic and --transcribe are optional parameters that handle two special cases: long-form multi-topic videos and skipping transcription.
By default, the script assumes the input directory contains video content (.mp4 format) for a single topic that requires transcribing. Each video (or transcript, if the optional flag is set to False) in the input directory is processed in sequence. A folder matching the video or transcript file's name is generated, and the various outputs are placed within it. Audio/video precursor artefacts are placed in a generated "Processed" folder.
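The folder layout described above can be sketched roughly as follows. This is a simplified illustration under the stated assumptions, not the tool's actual code; the `prepare_output_dirs` name is hypothetical.

```python
from pathlib import Path

def prepare_output_dirs(input_folder: str):
    """For each .mp4 in the input folder, create an output folder named
    after the file, plus a shared 'Processed' folder for A/V artefacts."""
    input_dir = Path(input_folder)
    processed = input_dir / "Processed"
    processed.mkdir(exist_ok=True)
    output_dirs = []
    for video in sorted(input_dir.glob("*.mp4")):
        out_dir = input_dir / video.stem  # folder name matches the file name
        out_dir.mkdir(exist_ok=True)
        output_dirs.append(out_dir)
    return output_dirs
```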
Once the basic setup above is complete, create an input directory to hold the videos/transcripts to process. Then, run the script using the form below:
kt-transcript.py [--topic [TOPIC]] [--transcribe [TRANSCRIBE]] [input_folder]
Example Usage
./bin/python cs-ai-kt-transcribe/kt-transcript.py --topic True --transcribe True ./Input-Folder
Arguments
positional arguments:
input_folder The folder containing videos/transcripts to process relative to the current working directory.
options:
--topic If set to True, will generate topic-specific summaries in addition to the high-level summary.
--transcribe If set to False, will skip transcribing and leverage an existing '*_full_transcript.txt' file to generate outputs.
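The argument handling above can be sketched with `argparse`. This is a hedged illustration of the documented interface, not the script's verbatim source; the `str2bool` helper is assumed, since the flags accept explicit True/False values.

```python
import argparse

def str2bool(value: str) -> bool:
    """Interpret 'True'/'False'-style flag values passed on the CLI."""
    return str(value).strip().lower() in ("true", "1", "yes")

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Process KT videos/transcripts into knowledge outputs.")
    parser.add_argument("input_folder",
                        help="Folder of videos/transcripts, relative to the CWD.")
    parser.add_argument("--topic", type=str2bool, default=False,
                        help="If True, also generate topic-specific summaries.")
    parser.add_argument("--transcribe", type=str2bool, default=True,
                        help="If False, reuse an existing '*_full_transcript.txt'.")
    return parser
```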
Customizing Outputs
Within the prompts directory in your pyenv, you will find a selection of prompt files that can be tweaked and adjusted to alter the final behaviours of the LLM processing. The prompts provided are tailored for Kandy, a VoIP Telephony product. While this has limited impact on its ability to parse other content, specialising the Persona segment of the prompt for a particular skillset does produce higher-quality results.
While the specific content of the videos being parsed will likely determine the ideal use case, the topic prompt can be altered to provide more targeted/specialized summaries. Note that the "[REPLACE_ME]" placeholder within the topic prompt is handled within the topic processing logic and is not intended to be manually replaced before running. The identified topics are replaced at runtime.
If the transcription element is leveraged and certain terminology or acronyms are not being captured correctly, you can seed the prompt to improve outputs: OpenAI Whisper Docs
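As a sketch of what prompt seeding might look like, the Whisper transcription API accepts a `prompt` parameter that serves as spelling/context guidance for domain terms. The helper name and term list below are illustrative, not the tool's actual code.

```python
def build_transcription_kwargs(audio_file, domain_terms):
    """Build keyword arguments for an OpenAI Whisper transcription call,
    seeding the prompt with domain terminology so acronyms and product
    names are more likely to be transcribed correctly."""
    return {
        "model": "whisper-1",
        "file": audio_file,
        # Whisper treats the prompt as context/spelling guidance
        "prompt": "Terminology: " + ", ".join(domain_terms),
    }

# Usage with the OpenAI client (requires an API key), e.g.:
# client.audio.transcriptions.create(
#     **build_transcription_kwargs(f, ["Kandy", "SBC", "WebRTC"]))
```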