Hunyuan Video LoRA. Anime, Akame ga Kill! Akame. v1

- Prompt
- Akame, close-up face shot with crimson eyes against red background, speaking brief words with neutral expression, transitioning into head movement leftward and slightly back, suggesting defensive motion, black hair following movement, high contrast lighting with red atmospheric glow behind, white collar visible at neck

- Prompt
- Akame, close-up face shot against dark forest background, focused crimson eyes, mouth moving slightly as she speaks few words, pale skin contrasting with black hair, white collar visible, static shot maintaining same angle throughout brief dialogue, minimal animation focused only on lip movement

- Prompt
- Akame, close-up face shot focusing on determined crimson eyes, speaking briefly while rotating Murasame's blade, steel catching moonlight creating subtle blue gleam, blade reflection mirroring her face, upper features gradually illuminated by blade's light, night atmosphere with soft lighting transition, minimal movement except for sword rotation
This is my first LoRA training, and I have some questions:
- Which caption style works best? I followed a structure like this: """|tag|, |view|, |who + visual description|, |more precise view|"""
- What video resolution should I use? I used [768, 480]. Is it better to have videos at different resolutions or at one unified resolution?
- How should I decide the value of "frame_buckets = [1, 16, 33, 65, 97, 129]"? I chose this set because the videos in the dataset range from 0.6 sec to 4.93 sec (see the rough arithmetic sketch below this list).
- What is the "video_clip_mode" setting? I selected "multiple_overlapping", but why this option instead of the others?
- Which of these matters most if I want to improve the quality of the LoRA:
- A: collect more data;
- B: write better captions;
- C: collect data for only one task or one motion?
- Is it worth training the LoRA on both images and videos, or on videos only?
- It's hard to choose optimal inference parameters because there are so many knobs you can change.
If anyone has answers to the questions above, I will be really happy to read them.
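On the frame_buckets question, here is a rough sanity-check sketch (not diffusion-pipe's actual bucketing code) that maps each clip duration to the largest bucket it can completely fill, assuming the clips run at about 24 fps:

```python
# Rough heuristic only: assumes ~24 fps clips and "largest bucket the clip
# can completely fill"; diffusion-pipe's real bucketing logic may differ.
FPS = 24  # assumed frame rate of the extracted clips
frame_buckets = [1, 16, 33, 65, 97, 129]

def bucket_for(duration_s: float) -> int:
    """Return the largest bucket that the clip has enough frames for."""
    n_frames = int(duration_s * FPS)
    candidates = [b for b in frame_buckets if b <= n_frames]
    return candidates[-1] if candidates else frame_buckets[0]

for d in (0.6, 2.16, 4.93):  # shortest, average, and longest clip in this dataset
    print(f"{d:.2f} sec -> ~{int(d * FPS)} frames -> bucket {bucket_for(d)}")
```

Under that assumption, the 0.6 sec clips have only ~14 frames (below the smallest video bucket of 16) and no clip reaches 129 frames, so the largest bucket would sit unused; this kind of check may help when choosing the bucket list.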
Description
A HunyuanVideo LoRA trained on short clips of Akame from the first episode of the anime: 29 clips in total, with an average length of 2.16 sec.
Trained using the diffusion-pipe repo.
Inference params.
- lora_strength: 1.0
- dtype: bfloat16
- resolution: [[768, 480]] (width, height)
- num_frames: 93
- steps: 20
- embedded_guidance_scale: 9.00. Note: I found that this value worked well for my other LoRA, so I used the same here; it is worth experimenting with.
- enhance video weight: 4.0. Note: this parameter can also be adjusted, and there are other parameters in the Enhance Video node.
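For reference, here is a minimal diffusers-based sketch that maps these settings onto HunyuanVideoPipeline. It assumes the hunyuanvideo-community/HunyuanVideo diffusers checkpoint and that this LoRA's weights load directly via load_lora_weights; the videos on this page were actually generated in a ComfyUI workflow (including the Enhance Video node, which has no direct equivalent here), so treat this only as an approximation of the parameters above.

```python
# Sketch under the assumptions noted above; the LoRA weight file name and
# output fps are guesses, and guidance_scale plays the role of
# embedded_guidance_scale for this guidance-distilled model.
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

model_id = "hunyuanvideo-community/HunyuanVideo"  # assumed diffusers port of the base model
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.float16
)
pipe.load_lora_weights("CCRss/hunyuan_lora_anime_akame")  # may need weight_name=... for this repo
pipe.vae.enable_tiling()
pipe.to("cuda")

video = pipe(
    prompt="Akame, close-up face shot with crimson eyes against red background, ...",
    width=768,
    height=480,
    num_frames=93,
    num_inference_steps=20,
    guidance_scale=9.0,  # embedded guidance, matching embedded_guidance_scale above
).frames[0]
export_to_video(video, "akame.mp4", fps=24)  # output fps is an assumption
```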
Data
- amount: 29 clips from 0.6 to 4.93 sec.
- avg_length: 2.16 sec
The data was collected manually using the OpenShot video editor.
It took around 1 hour to cut the 29 clips from one anime episode, plus another hour to create captions for the clips using Claude 3.5 Sonnet and to manually correct its mistakes.
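As an illustration of that captioning step, a hypothetical pipeline along these lines can produce the first draft of each caption; the exact prompt wording, frame count, and model snapshot are my assumptions, and the output still needs the manual correction pass mentioned above.

```python
# Hypothetical captioning sketch: sample a few frames per clip with OpenCV and
# ask Claude 3.5 Sonnet to describe them in the caption structure used above.
import base64
from pathlib import Path

import cv2
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def sample_frames(clip_path: str, n: int = 3) -> list[bytes]:
    """Grab n evenly spaced frames from a clip as JPEG bytes."""
    cap = cv2.VideoCapture(clip_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(n):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i * (total - 1) / max(n - 1, 1)))
        ok, frame = cap.read()
        if ok:
            frames.append(cv2.imencode(".jpg", frame)[1].tobytes())
    cap.release()
    return frames

def caption_clip(clip_path: str) -> str:
    """Send sampled frames to Claude and return a single draft caption."""
    content = [
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/jpeg",
                "data": base64.b64encode(jpg).decode(),
            },
        }
        for jpg in sample_frames(clip_path)
    ]
    content.append({
        "type": "text",
        "text": "These frames come from one short anime clip of Akame. "
                "Write a single caption of the form: "
                "Akame, <view>, <visual description>, <more precise view>.",
    })
    msg = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=300,
        messages=[{"role": "user", "content": content}],
    )
    return msg.content[0].text

for clip in sorted(Path("clips").glob("*.mp4")):
    # Write a sidecar .txt caption next to each clip (the usual diffusion-pipe dataset layout).
    clip.with_suffix(".txt").write_text(caption_clip(str(clip)))
```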
Model: CCRss/hunyuan_lora_anime_akame
Base model: tencent/HunyuanVideo