CSAI Knowledge Aggregator

Introduction

Central teams routinely need to record, attend, and then process KT recording sessions, or parse through folders of documents, PDFs, and the like. There was a clear need to simplify and automate this gathering of knowledge and to generate value from it more quickly.

Originally created for personal use, this tool has been split out for other interested parties. In the current version on the 'share' branch, it is tailored specifically to transcribing mp4 KT recordings (or using existing transcripts provided by video platforms such as Zoom or Loom), parsing out various knowledge outputs, and creating KB articles. While initially focused on Central Support knowledge capture, it can be tailored to other applications with some minor adjustments.

With a long-term goal of generalized content capture and curation, certain outputs may not be relevant to all use cases. Some parameterization has already been implemented and can be adjusted further.

Ideal KT Input Guidance Runbook

Example Outputs

Current Outputs

  • High-level Summary
  • Topic Specific Summaries
  • Glossary
  • Troubleshooting Steps
  • Word Cloud and Matching Symptoms
  • KB for each Summary and the Troubleshooting Steps
  • Screenshots of any captured Timestamps in Summary/Troubleshooting Steps

Note: Processing.log is also generated in the working directory.

Prerequisites

  • Python 3.11
  • ffmpeg - required by Pydub for audio/video manipulation (see the sketch below).
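
Pydub shells out to ffmpeg when decoding non-WAV formats such as mp4, which is why it is a prerequisite here. A minimal sketch of the kind of extraction involved (file names are illustrative):

from pydub import AudioSegment

# Pydub invokes ffmpeg under the hood to decode the mp4 container;
# without ffmpeg on the PATH this call fails.
audio = AudioSegment.from_file("kt-session.mp4", format="mp4")

# Export the audio track on its own, ready for transcription.
audio.export("kt-session.mp3", format="mp3")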

Installation

Clone the Repository

git clone -b share --single-branch https://github.com/trilogy-group/cs-ai-kt-transcribe.git

Set up the Python Environment

Pick your poison - pyenv-virtualenv:

pyenv virtualenv [env_name]
pyenv activate [env_name]

or the built-in venv module (activated from within the environment directory):

python3 -m venv [env_name]
source ./bin/activate

Installing Dependencies

From your primary venv directory:

./bin/python -m pip install -r requirements.txt

Generate and Populate .env file

Within the primary venv directory, create a file named '.env' and populate it with the line below, replacing [YOUR_API_KEY] with your OpenAI API key:

OPENAI_API_KEY=[YOUR_API_KEY]
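
How the key is consumed is up to the script; below is a minimal sketch, assuming it loads the file with the python-dotenv package (check requirements.txt for the actual dependency):

import os

from dotenv import load_dotenv
from openai import OpenAI

# Reads .env from the current working directory into the process environment.
load_dotenv()

# The OpenAI client would also pick up OPENAI_API_KEY automatically;
# the explicit lookup is just for illustration.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])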

Usage

Basic Usage

Topic and Transcribe are optional parameters that handle two special cases: long-form multi-topic videos and skipping transcription.

By default, the script assumes you are providing video content (.mp4 format) in the input directory for a single topic that requires transcribing. Each video (or transcript, if the optional flag is set to False) within the provided input directory is processed in sequence. A folder matching the video or transcript file's name is generated, and the various outputs are placed within it. Audio/video precursor artefacts are placed within a generated "Processed" folder.
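
As a rough illustration of that flow (a sketch, not the actual implementation; generate_outputs is a hypothetical stand-in for the real pipeline):

from pathlib import Path

def generate_outputs(source: Path, out_dir: Path) -> None:
    # Hypothetical stand-in for the real steps (transcribe, summarise, etc.).
    print(f"Processing {source.name} -> {out_dir}/")

def process_folder(input_folder: str, transcribe: bool = True) -> None:
    root = Path(input_folder)
    # Audio/video precursor artefacts end up in a generated "Processed" folder.
    (root / "Processed").mkdir(exist_ok=True)

    # Transcribe videos by default; otherwise reuse existing transcripts.
    pattern = "*.mp4" if transcribe else "*_full_transcript.txt"
    for source in sorted(root.glob(pattern)):
        # One output folder per video/transcript, named after the file.
        out_dir = root / source.stem
        out_dir.mkdir(exist_ok=True)
        generate_outputs(source, out_dir)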

Once the basic setup above is completed, create an input directory to hold the videos/transcripts to process. Then run the script using the form below:

kt-transcript.py [--topic [TOPIC]] [--transcribe [TRANSCRIBE]] [input_folder]

Example Usage

./bin/python cs-ai-kt-transcribe/kt-transcript.py --topic True --transcribe True ./Input-Folder

Arguments

positional arguments:
  input_folder              The folder containing videos/transcripts to process relative to the current working directory.

options:
  --topic                   If set to True, will generate topic-specific summaries in addition to the high-level summary. 
  --transcribe              If set to False, will skip transcribing and leverage an existing '*_full_transcript.txt' file to generate outputs.
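
For reference, a minimal argparse definition matching the interface above could look like the sketch below (not the script's actual parser; the defaults shown are assumptions):

import argparse

parser = argparse.ArgumentParser(
    description="Process KT videos/transcripts into knowledge outputs.")
parser.add_argument("input_folder",
                    help="Folder of videos/transcripts, relative to the CWD.")
parser.add_argument("--topic", default="False",
                    help="If True, also generate topic-specific summaries.")
parser.add_argument("--transcribe", default="True",
                    help="If False, reuse an existing '*_full_transcript.txt'.")
args = parser.parse_args()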

Customizing Outputs

Within the prompts directory in your environment, you will find a selection of prompt files that can be tweaked and adjusted to alter the final behaviour of the LLM processing. The prompts provided are tailored for Kandy, a VoIP telephony product. While this has limited impact on the tool's ability to parse other content, specialising the Persona segment of the prompt for a particular skill set does produce higher-quality results.

While the specific content of the videos being parsed will likely determine the ideal use case, the topic prompt can be altered to provide more targeted/specialized summaries. Note that the "[REPLACE_ME]" placeholder within the topic prompt is handled by the topic processing logic and is not intended to be manually replaced before running; the identified topics are substituted at runtime.
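
As a sketch of that runtime substitution (the file name and topic shown are illustrative):

from pathlib import Path

# Load the topic prompt template and substitute an identified topic.
template = Path("prompts/topic_prompt.txt").read_text()
topic_prompt = template.replace("[REPLACE_ME]", "SIP trunk provisioning")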

If the transcription element is used and you find certain terminology/acronyms are not being captured properly, you can seed the prompt to improve outputs: OpenAI Whisper Docs
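
For example, domain terms can be passed via the transcription API's prompt parameter. A sketch using the OpenAI Python client (the term list and file name are illustrative):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("kt-session.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        # Seeding the prompt with domain terms helps Whisper spell them correctly.
        prompt="Kandy, SBC, SIP trunk, WebRTC, CPaaS",
    )

print(transcript.text)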