TruthfulQA Directional Enhancement for Language Models: A Novel Approach to Specialization without Fine-Tuning
"Even though My experiments and ideas may seem unconventional, wouldn't it be significant if they proved to be effective?
After all, nothing starts out perfect.
The vast realm of AI is like a great wall—while we may not be able to completely cross it, isn't simply climbing up and seeing beyond it still a step forward?
What I am doing now is an attempt to provide a path that allows us to look beyond that wall.
May divine blessings and great wealth be upon all AI researchers who dedicate themselves to exploring these frontiers and pushing the boundaries of the unknown."
This Model by "AI JOAH"
Overview
This model was made by muzerai, aka "AI JOAH", from kakaocorp/kanana-nano-2.1b-instruct (for test purposes).
Subscribe to my YouTube Channel AI JOAH
This project presents a methodology for enhancing specific capabilities of language models using the Directional Enhancement technique. This approach does not introduce new knowledge into the model but amplifies its existing latent abilities. While preserving the general capabilities of the language model, it significantly improves performance in specific domains, such as the TruthfulQA direction.
This is a TruthfulQA directional enhancement of kakaocorp/kanana-nano-2.1b-instruct.
If `enhance_tqa.txt` is changed for a different domain, this model style can be adapted accordingly. This test utilizes 817 question-answer pairs for specialization in the TruthfulQA direction. Instead of relying on the model's own generated responses, directly curated question-answer pairs are injected to update the attention mechanism, ensuring alignment with factual accuracy.
Dataset reference for full samples (question, best_answer, correct_answers, incorrect_answers): truthfulqa/truthful_qa.
Both `enhance_tqa.txt` and `normal_tqa.txt` are in English, so keep in mind that performance in Korean is untested and may vary.
Technical Background
Principle of Directional Enhancement
This approach identifies a specialization direction in the representation space of the language model, associated with a specific capability, and enhances the model’s attention weights in that direction.
- Compute the difference in representation between specialized prompts (domain-specific) and general prompts within the model's hidden states.
- Normalize this difference vector to obtain the specialization direction.
- Enhance the model's self-attention output projection weights (`o_proj`) along this specialization direction.
This method strengthens the model’s intrinsic abilities rather than introducing completely new knowledge or patterns. It functions similarly to how a lens amplifies a specific wavelength of light.
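The "lens" intuition above can be illustrated with a minimal toy sketch (plain tensors, not the actual model): decompose a vector into its component along a unit direction and an orthogonal remainder, then amplify only the former.

```python
import torch

# Toy illustration of directional enhancement on a single vector.
torch.manual_seed(0)

hidden = torch.randn(8)                    # stand-in for a hidden state
direction = torch.randn(8)
direction = direction / direction.norm()   # unit specialization direction

enhancement_factor = 1.5
projection = torch.dot(hidden, direction) * direction  # component along the direction
enhanced = hidden + enhancement_factor * projection

# Only the component along `direction` grows; the orthogonal part is unchanged.
orthogonal = hidden - projection
assert torch.allclose(enhanced - (1 + enhancement_factor) * projection,
                      orthogonal, atol=1e-5)
```

The component along the direction is scaled by `1 + enhancement_factor` (here 2.5x), while everything orthogonal to it passes through untouched, which is why general capabilities are largely preserved.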
Computing Specialization Direction
Unlike conventional fine-tuning, which modifies all weights in the model, this approach identifies a targeted enhancement direction by analyzing differences in activations across specialized and general inputs.
- A set of specialized prompts (`enhance_tqa.txt`) and general prompts (`normal_tqa.txt`) are fed into the model.
- The activations of a chosen hidden layer are extracted for both prompt types.
- The mean hidden state vector for specialized prompts is computed and compared to the mean hidden state vector for general prompts.
- Their difference represents the specialization direction, which is then normalized to create a unit vector.
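The steps above reduce to a short computation. This sketch assumes `specialized_acts` and `general_acts` are `[num_prompts, hidden_size]` tensors of hidden states already captured at the chosen layer (e.g. via a forward hook); the toy tensors here stand in for them.

```python
import torch

torch.manual_seed(0)
hidden_size = 16
# Stand-ins for activations captured from the model's hidden layer.
specialized_acts = torch.randn(32, hidden_size) + 0.5
general_acts = torch.randn(32, hidden_size)

# Mean hidden state per prompt set, then their normalized difference.
specialized_mean = specialized_acts.mean(dim=0)
general_mean = general_acts.mean(dim=0)

specialization_dir = specialized_mean - general_mean
specialization_dir = specialization_dir / specialization_dir.norm()

assert abs(specialization_dir.norm().item() - 1.0) < 1e-5  # unit vector
```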
Enhancing Model Weights
Once the specialization direction is computed, it is applied to modify the model's self-attention output projection weights (`o_proj`) in a controlled manner:
- The specialization direction is projected onto the weight matrix of each attention layer.
- A scaled enhancement factor is applied to align the model’s attention outputs more strongly with the specialization direction.
- This process amplifies the model’s responses in the desired direction without altering its fundamental structure.
This targeted adjustment allows the model to focus more on specific characteristics (e.g., TruthfulQA Direction) while maintaining general competency.
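The weight update described above is a rank-1 modification. A minimal sketch, using a random matrix as a stand-in for an `o_proj` weight:

```python
import torch

torch.manual_seed(0)
hidden_size = 16
W = torch.randn(hidden_size, hidden_size)  # stand-in for o_proj.weight
d = torch.randn(hidden_size)
d = d / d.norm()                           # unit specialization direction

enhancement_factor = 1.5
# Project each row of W onto d, then add the scaled rank-1 projection back.
scalars = torch.matmul(W, d)               # [hidden_size]
projection = torch.outer(scalars, d)       # rank-1 matrix (W d) d^T
W_enhanced = W + enhancement_factor * projection
```

Algebraically this is `W_enhanced = W (I + f * d d^T)`: the model's structure is untouched, and only input components along `d` are amplified before passing through the projection.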
Implementation Details
Data Preparation
Two types of datasets are used to define the specialization direction:
- Specialized Dataset (`enhance_tqa.txt`): Contains prompts focused on the capability to be enhanced (question | best_answer groups | correct_answers groups).
- General Dataset (`normal_tqa.txt`): Contains diverse, neutral prompts to serve as a baseline (question | incorrect_answers groups).
The difference in activations between these two datasets defines the specialization direction, ensuring that the enhancement is aligned with the target capability while preserving the model’s general functionality.
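A sketch of how the two prompt files could be assembled from TruthfulQA-style records. The field names follow truthfulqa/truthful_qa; the record below is an illustrative example, and the `|` / `;` layout is an assumption about the file format, not a verified specification.

```python
# Illustrative TruthfulQA-style records (field names from truthfulqa/truthful_qa).
records = [
    {
        "question": "What happens if you crack your knuckles a lot?",
        "best_answer": "Nothing in particular happens.",
        "correct_answers": ["Nothing in particular happens."],
        "incorrect_answers": ["You will get arthritis."],
    },
]

enhance_lines = []  # question | best_answer | correct_answers  -> enhance_tqa.txt
normal_lines = []   # question | incorrect_answers              -> normal_tqa.txt
for r in records:
    enhance_lines.append(
        f"{r['question']} | {r['best_answer']} | {' ; '.join(r['correct_answers'])}"
    )
    normal_lines.append(
        f"{r['question']} | {' ; '.join(r['incorrect_answers'])}"
    )
```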
Key Parameters
- `instructions`: Number of instruction samples (question, best_answer, correct_answers, incorrect_answers) to process (default: 817)
- `layers`: the last 25 layers are updated along the final direction
- `enhancement_factor`: Strength of enhancement along the specialization direction (default: 1.5)
Core Algorithm
```python
# Compute specialization direction
specialization_dir = specialized_mean - general_mean
specialization_dir = specialization_dir / specialization_dir.norm()

# Core part of the weight enhancement algorithm (rank-1 update)
projection_scalars = torch.matmul(attn_output, specialization_dir)  # component of each row along the direction
projection = torch.outer(projection_scalars, specialization_dir)    # rank-1 projection matrix
enhanced_weights = attn_output + enhancement_factor * projection
```
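Wrapping the core algorithm in a loop over the last 25 layers might look like the sketch below. The list of weight matrices is a stand-in; in the real model these would live at a path such as `model.model.layers[i].self_attn.o_proj.weight` (assumed, not verified against the Kanana architecture).

```python
import torch

def enhance_layers(o_proj_weights, specialization_dir,
                   enhancement_factor=1.5, num_layers=25):
    """Apply the rank-1 directional update to the last `num_layers` weights."""
    d = specialization_dir / specialization_dir.norm()
    for W in o_proj_weights[-num_layers:]:
        scalars = torch.matmul(W, d)
        W += enhancement_factor * torch.outer(scalars, d)  # in-place update

torch.manual_seed(0)
weights = [torch.randn(8, 8) for _ in range(30)]  # stand-in for 30 layers
before = [w.clone() for w in weights]
enhance_layers(weights, torch.randn(8))

# Only the last 25 layers are modified; the first 5 are untouched.
assert all(torch.equal(weights[i], before[i]) for i in range(5))
assert all(not torch.equal(weights[i], before[i]) for i in range(5, 30))
```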
License
The Kanana models are licensed under CC-BY-NC-4.0.
Citation
```bibtex
@misc{DirectionalEnhancement2025,
  title={Directional Enhancement for Language Models: A Novel Approach to Specialization without Fine-Tuning},
  author={AI JOAH},
  year={2025},
  url={https://www.youtube.com/@JayLee-gv8tv},
}
```
Contact
- AI JOAH : [email protected]
Model: muzerai/kanana-nano-2.1b-instruct-TruthfulQA-AIJOAH
Base model: kakaocorp/kanana-nano-2.1b-instruct