Question about model structure
Hi!
Good job!
I'm trying to reproduce your TemporalNet based on the ControlNet code, but the results are poor. I created a second ControlNet model to serve as the "TemporalNet model". It takes the last generated image as input and is initialized with the parameters in `diff_control_sd15_temporalnet_fp16.safetensors`. Its output is added to the middle and decoder blocks of the Stable Diffusion model at the same locations as the output of the ControlNet, which is initialized with `control_sd15_hed.pth`. The control weights are 1.5 for the ControlNet and 0.7 for the TemporalNet, compared with 1.0 in the original ControlNet. Is there a problem with my understanding of the TemporalNet structure? Could you please correct it for me? Thanks!
Hi!
So if I'm reading this right, you're trying to re-create this kind of model by using the generated outputs as a dataset?
Hi!
Actually, I want to run TemporalNet without the WebUI, using only PyTorch code, so I need to understand the structure of TemporalNet and how it works. The statement above ("I just made another controlnet model as "temporalnet model". ... And the control weight is 1.5 and 0.7 for ControlNet and TemporalNet respectively") is my understanding, but judging by the experimental results, I haven't figured out how TemporalNet works. Let me try to clarify it:
- To generate the i-th frame, the TemporalNet uses the generated (i-1)-th frame as input.
- The outputs of the TemporalNet are added to the middle and decoder blocks of the Stable Diffusion model along with the corresponding outputs of the ControlNet. For example, the outputs from the middle block of the TemporalNet and of the ControlNet are both added to the middle block of Stable Diffusion.
- The add operation described in the second point is a weighted sum, that is: $z_{sd_{new}} = z_{sd} + 1.5\times z_{controlnet} + 0.7\times z_{temporalnet}$.
- The parameters of the TemporalNet are loaded from `diff_control_sd15_temporalnet_fp16.safetensors` and those of the ControlNet from `control_sd15_hed.pth`.
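To make the points above concrete, here is a minimal sketch of the combination as I understand it. It is not taken from the TemporalNet or ControlNet code: the function names (`combine_residuals`, `add_to_features`, `generate_frames`) and the `generate_frame` callback are hypothetical, and I am assuming that both networks return one residual per injection point, in the same order. The helpers only use `+` and `*`, so they should work on `torch.Tensor` values as well as on the plain floats used here.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")  # torch.Tensor in practice; anything supporting + and * works


def combine_residuals(
    controlnet_res: Sequence[T],
    temporalnet_res: Sequence[T],
    w_control: float = 1.5,
    w_temporal: float = 0.7,
) -> List[T]:
    """Weighted sum of the two networks' residuals, one entry per injection point.

    Assumption: both lists are ordered the same way (e.g. skip connections
    first, middle block last) and have matching shapes.
    """
    if len(controlnet_res) != len(temporalnet_res):
        raise ValueError("both networks must return the same number of residuals")
    return [
        w_control * c + w_temporal * t
        for c, t in zip(controlnet_res, temporalnet_res)
    ]


def add_to_features(features: Sequence[T], residuals: Sequence[T]) -> List[T]:
    """z_sd_new = z_sd + combined residual, applied at each injection point."""
    return [z + r for z, r in zip(features, residuals)]


def generate_frames(
    num_frames: int,
    initial_prev_frame: T,
    generate_frame: Callable[[int, T], T],
) -> List[T]:
    """Frame loop: the i-th frame is conditioned on the generated (i-1)-th frame.

    `generate_frame(i, prev_frame)` is a hypothetical callback that would run
    the full diffusion sampling with both control networks attached.
    """
    frames: List[T] = []
    prev = initial_prev_frame
    for i in range(num_frames):
        frame = generate_frame(i, prev)
        frames.append(frame)
        prev = frame  # feed the new frame to the TemporalNet on the next step
    return frames
```

What the very first frame should use as its "previous frame" is left open here; `initial_prev_frame` is just a placeholder for whatever is chosen.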
Am I misunderstanding the structure of the TemporalNet? Could you please correct it for me? Thanks again!