jhj0517 committed
Commit f791f84 · 2 Parent(s): d9181cf 52ea725

Merge branch 'master' into huggingface
.github/workflows/ci.yml CHANGED
@@ -28,8 +28,11 @@ jobs:
         with:
           python-version: ${{ matrix.python }}
 
+      - name: Install ffmpeg
+        run: sudo apt-get update && sudo apt-get install -y ffmpeg
+
       - name: Install dependencies
-        run: pip install -r requirements.txt pytest
+        run: pip install -r requirements.txt pytest scikit-image moviepy
 
       - name: Run test
        run: python -m pytest -rs tests
.github/workflows/publish-docker.yml ADDED
@@ -0,0 +1,37 @@
+name: Publish to Docker Hub
+
+on:
+  push:
+    branches:
+      - master
+
+jobs:
+  build-and-push:
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Log in to Docker Hub
+        uses: docker/login-action@v2
+        with:
+          username: ${{ secrets.DOCKER_USERNAME }}
+          password: ${{ secrets.DOCKER_PASSWORD }}
+
+      - name: Checkout repository
+        uses: actions/checkout@v3
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
+      - name: Set up QEMU
+        uses: docker/setup-qemu-action@v3
+
+      - name: Build and push Docker image
+        uses: docker/build-push-action@v5
+        with:
+          context: .
+          file: ./docker/Dockerfile
+          push: true
+          tags: ${{ secrets.DOCKER_USERNAME }}/advancedliveportrait-webui:latest
+
+      - name: Log out of Docker Hub
+        run: docker logout
.gitignore CHANGED
@@ -4,5 +4,7 @@ models/
 outputs/
 *.png
 *.jpg
+*.jpeg
+**/__pycache__
 
 **/.pytest_cache
LICENSE ADDED
@@ -0,0 +1,201 @@
1
+ Apache License
2
+ Version 2.0, January 2004
3
+ http://www.apache.org/licenses/
4
+
5
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6
+
7
+ 1. Definitions.
8
+
9
+ "License" shall mean the terms and conditions for use, reproduction,
10
+ and distribution as defined by Sections 1 through 9 of this document.
11
+
12
+ "Licensor" shall mean the copyright owner or entity authorized by
13
+ the copyright owner that is granting the License.
14
+
15
+ "Legal Entity" shall mean the union of the acting entity and all
16
+ other entities that control, are controlled by, or are under common
17
+ control with that entity. For the purposes of this definition,
18
+ "control" means (i) the power, direct or indirect, to cause the
19
+ direction or management of such entity, whether by contract or
20
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
21
+ outstanding shares, or (iii) beneficial ownership of such entity.
22
+
23
+ "You" (or "Your") shall mean an individual or Legal Entity
24
+ exercising permissions granted by this License.
25
+
26
+ "Source" form shall mean the preferred form for making modifications,
27
+ including but not limited to software source code, documentation
28
+ source, and configuration files.
29
+
30
+ "Object" form shall mean any form resulting from mechanical
31
+ transformation or translation of a Source form, including but
32
+ not limited to compiled object code, generated documentation,
33
+ and conversions to other media types.
34
+
35
+ "Work" shall mean the work of authorship, whether in Source or
36
+ Object form, made available under the License, as indicated by a
37
+ copyright notice that is included in or attached to the work
38
+ (an example is provided in the Appendix below).
39
+
40
+ "Derivative Works" shall mean any work, whether in Source or Object
41
+ form, that is based on (or derived from) the Work and for which the
42
+ editorial revisions, annotations, elaborations, or other modifications
43
+ represent, as a whole, an original work of authorship. For the purposes
44
+ of this License, Derivative Works shall not include works that remain
45
+ separable from, or merely link (or bind by name) to the interfaces of,
46
+ the Work and Derivative Works thereof.
47
+
48
+ "Contribution" shall mean any work of authorship, including
49
+ the original version of the Work and any modifications or additions
50
+ to that Work or Derivative Works thereof, that is intentionally
51
+ submitted to Licensor for inclusion in the Work by the copyright owner
52
+ or by an individual or Legal Entity authorized to submit on behalf of
53
+ the copyright owner. For the purposes of this definition, "submitted"
54
+ means any form of electronic, verbal, or written communication sent
55
+ to the Licensor or its representatives, including but not limited to
56
+ communication on electronic mailing lists, source code control systems,
57
+ and issue tracking systems that are managed by, or on behalf of, the
58
+ Licensor for the purpose of discussing and improving the Work, but
59
+ excluding communication that is conspicuously marked or otherwise
60
+ designated in writing by the copyright owner as "Not a Contribution."
61
+
62
+ "Contributor" shall mean Licensor and any individual or Legal Entity
63
+ on behalf of whom a Contribution has been received by Licensor and
64
+ subsequently incorporated within the Work.
65
+
66
+ 2. Grant of Copyright License. Subject to the terms and conditions of
67
+ this License, each Contributor hereby grants to You a perpetual,
68
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69
+ copyright license to reproduce, prepare Derivative Works of,
70
+ publicly display, publicly perform, sublicense, and distribute the
71
+ Work and such Derivative Works in Source or Object form.
72
+
73
+ 3. Grant of Patent License. Subject to the terms and conditions of
74
+ this License, each Contributor hereby grants to You a perpetual,
75
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76
+ (except as stated in this section) patent license to make, have made,
77
+ use, offer to sell, sell, import, and otherwise transfer the Work,
78
+ where such license applies only to those patent claims licensable
79
+ by such Contributor that are necessarily infringed by their
80
+ Contribution(s) alone or by combination of their Contribution(s)
81
+ with the Work to which such Contribution(s) was submitted. If You
82
+ institute patent litigation against any entity (including a
83
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
84
+ or a Contribution incorporated within the Work constitutes direct
85
+ or contributory patent infringement, then any patent licenses
86
+ granted to You under this License for that Work shall terminate
87
+ as of the date such litigation is filed.
88
+
89
+ 4. Redistribution. You may reproduce and distribute copies of the
90
+ Work or Derivative Works thereof in any medium, with or without
91
+ modifications, and in Source or Object form, provided that You
92
+ meet the following conditions:
93
+
94
+ (a) You must give any other recipients of the Work or
95
+ Derivative Works a copy of this License; and
96
+
97
+ (b) You must cause any modified files to carry prominent notices
98
+ stating that You changed the files; and
99
+
100
+ (c) You must retain, in the Source form of any Derivative Works
101
+ that You distribute, all copyright, patent, trademark, and
102
+ attribution notices from the Source form of the Work,
103
+ excluding those notices that do not pertain to any part of
104
+ the Derivative Works; and
105
+
106
+ (d) If the Work includes a "NOTICE" text file as part of its
107
+ distribution, then any Derivative Works that You distribute must
108
+ include a readable copy of the attribution notices contained
109
+ within such NOTICE file, excluding those notices that do not
110
+ pertain to any part of the Derivative Works, in at least one
111
+ of the following places: within a NOTICE text file distributed
112
+ as part of the Derivative Works; within the Source form or
113
+ documentation, if provided along with the Derivative Works; or,
114
+ within a display generated by the Derivative Works, if and
115
+ wherever such third-party notices normally appear. The contents
116
+ of the NOTICE file are for informational purposes only and
117
+ do not modify the License. You may add Your own attribution
118
+ notices within Derivative Works that You distribute, alongside
119
+ or as an addendum to the NOTICE text from the Work, provided
120
+ that such additional attribution notices cannot be construed
121
+ as modifying the License.
122
+
123
+ You may add Your own copyright statement to Your modifications and
124
+ may provide additional or different license terms and conditions
125
+ for use, reproduction, or distribution of Your modifications, or
126
+ for any such Derivative Works as a whole, provided Your use,
127
+ reproduction, and distribution of the Work otherwise complies with
128
+ the conditions stated in this License.
129
+
130
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
131
+ any Contribution intentionally submitted for inclusion in the Work
132
+ by You to the Licensor shall be under the terms and conditions of
133
+ this License, without any additional terms or conditions.
134
+ Notwithstanding the above, nothing herein shall supersede or modify
135
+ the terms of any separate license agreement you may have executed
136
+ with Licensor regarding such Contributions.
137
+
138
+ 6. Trademarks. This License does not grant permission to use the trade
139
+ names, trademarks, service marks, or product names of the Licensor,
140
+ except as required for reasonable and customary use in describing the
141
+ origin of the Work and reproducing the content of the NOTICE file.
142
+
143
+ 7. Disclaimer of Warranty. Unless required by applicable law or
144
+ agreed to in writing, Licensor provides the Work (and each
145
+ Contributor provides its Contributions) on an "AS IS" BASIS,
146
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147
+ implied, including, without limitation, any warranties or conditions
148
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149
+ PARTICULAR PURPOSE. You are solely responsible for determining the
150
+ appropriateness of using or redistributing the Work and assume any
151
+ risks associated with Your exercise of permissions under this License.
152
+
153
+ 8. Limitation of Liability. In no event and under no legal theory,
154
+ whether in tort (including negligence), contract, or otherwise,
155
+ unless required by applicable law (such as deliberate and grossly
156
+ negligent acts) or agreed to in writing, shall any Contributor be
157
+ liable to You for damages, including any direct, indirect, special,
158
+ incidental, or consequential damages of any character arising as a
159
+ result of this License or out of the use or inability to use the
160
+ Work (including but not limited to damages for loss of goodwill,
161
+ work stoppage, computer failure or malfunction, or any and all
162
+ other commercial damages or losses), even if such Contributor
163
+ has been advised of the possibility of such damages.
164
+
165
+ 9. Accepting Warranty or Additional Liability. While redistributing
166
+ the Work or Derivative Works thereof, You may choose to offer,
167
+ and charge a fee for, acceptance of support, warranty, indemnity,
168
+ or other liability obligations and/or rights consistent with this
169
+ License. However, in accepting such obligations, You may act only
170
+ on Your own behalf and on Your sole responsibility, not on behalf
171
+ of any other Contributor, and only if You agree to indemnify,
172
+ defend, and hold each Contributor harmless for any liability
173
+ incurred by, or claims asserted against, such Contributor by reason
174
+ of your accepting any such warranty or additional liability.
175
+
176
+ END OF TERMS AND CONDITIONS
177
+
178
+ APPENDIX: How to apply the Apache License to your work.
179
+
180
+ To apply the Apache License to your work, attach the following
181
+ boilerplate notice, with the fields enclosed by brackets "[]"
182
+ replaced with your own identifying information. (Don't include
183
+ the brackets!) The text should be enclosed in the appropriate
184
+ comment syntax for the file format. We also recommend that a
185
+ file or class name and description of purpose be included on the
186
+ same "printed page" as the copyright notice for easier
187
+ identification within third-party archives.
188
+
189
+ Copyright 2024 jhj0517
190
+
191
+ Licensed under the Apache License, Version 2.0 (the "License");
192
+ you may not use this file except in compliance with the License.
193
+ You may obtain a copy of the License at
194
+
195
+ http://www.apache.org/licenses/LICENSE-2.0
196
+
197
+ Unless required by applicable law or agreed to in writing, software
198
+ distributed under the License is distributed on an "AS IS" BASIS,
199
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200
+ See the License for the specific language governing permissions and
201
+ limitations under the License.
README.md CHANGED
@@ -25,7 +25,8 @@ You can try it in Colab
 # Installation And Running
 ### Prerequisite
 1. `3.9` <= `python` <= `3.12` : https://www.python.org/downloads/release/python-3110/
-
+2. **(Optional, only if you're using Nvidia GPU)** CUDA 12.4 : https://developer.nvidia.com/cuda-12-4-0-download-archive?target_os=Windows
+3. (Optional, only needed if you use Video Driven) `FFmpeg`: https://ffmpeg.org/download.html <br> After installing `FFmpeg`, make sure to add the FFmpeg/bin folder to your **system PATH**!
 ## Run Locally
 1. git clone this repository
 ```
@@ -51,7 +52,7 @@ If you're using Windows, right-click the script and then click on ***Run with Po
 ```
 git clone https://github.com/jhj0517/AdvancedLivePortrait-WebUI.git
 ```
-2. Build the imade
+2. Build the image
 ```
 docker compose -f docker/docker-compose.yaml build
 ```
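Because the new Video Driven prerequisite shells out to FFmpeg, a quick preflight check can catch a missing or un-PATH-ed install before launching the WebUI. The following is a minimal standalone sketch (not part of the repository), using only the Python standard library:

```python
# Preflight check: is ffmpeg reachable on the system PATH? (illustrative only)
import shutil
import subprocess

ffmpeg = shutil.which("ffmpeg")
if ffmpeg is None:
    raise SystemExit(
        "FFmpeg was not found. Install it from https://ffmpeg.org/download.html "
        "and add the FFmpeg/bin folder to your system PATH."
    )

# Print the detected version as a quick sanity check.
subprocess.run([ffmpeg, "-version"], check=True)
```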
app.py CHANGED
@@ -20,7 +20,7 @@ class App:
20
  )
21
 
22
  @staticmethod
23
- def create_parameters():
24
  return [
25
  gr.Dropdown(label=_("Model Type"), visible=False, interactive=False,
26
  choices=[item.value for item in ModelType], value=ModelType.HUMAN.value),
@@ -38,10 +38,21 @@ class App:
38
  gr.Slider(label=_("WOO"), minimum=-20, maximum=20, step=0.2, value=0),
39
  gr.Slider(label=_("Smile"), minimum=-2.0, maximum=2.0, step=0.01, value=0),
40
  gr.Slider(label=_("Source Ratio"), minimum=0, maximum=1, step=0.01, value=1),
41
- gr.Slider(label=_("Sample Ratio"), minimum=-0.2, maximum=1.2, step=0.01, value=1),
42
- gr.Dropdown(label=_("Sample Parts"),
43
  choices=[part.value for part in SamplePart], value=SamplePart.ALL.value),
44
- gr.Slider(label=_("Crop Factor"), minimum=1.5, maximum=2.5, step=0.1, value=1.7)
 
 
 
 
 
 
 
 
 
 
 
45
  ]
46
 
47
  def launch(self):
@@ -49,41 +60,68 @@ class App:
49
  with self.i18n:
50
  gr.Markdown(REPO_MARKDOWN, elem_id="md_project")
51
 
52
- with gr.Row():
53
- with gr.Column():
54
- img_ref = gr.Image(label=_("Reference Image"))
55
- with gr.Row():
56
- btn_gen = gr.Button("GENERATE", visible=False)
57
- with gr.Row(equal_height=True):
58
- with gr.Column(scale=9):
59
- img_out = gr.Image(label=_("Output Image"))
60
- with gr.Column(scale=1):
61
- expression_parameters = self.create_parameters()
62
- btn_openfolder = gr.Button('📂')
63
- with gr.Accordion("Opt in features", visible=False):
64
- img_sample = gr.Image()
65
- img_motion_link = gr.Image()
66
- tb_exp = gr.Textbox()
67
-
68
- params = expression_parameters + [img_ref]
69
- opt_in_features_params = [img_sample, img_motion_link, tb_exp]
70
-
71
- gr.on(
72
- triggers=[param.change for param in params],
73
- fn=self.inferencer.edit_expression,
74
- inputs=params + opt_in_features_params,
75
- outputs=img_out,
76
- #show_progress="minimal",
77
- queue=True
78
- )
79
-
80
- btn_openfolder.click(
81
- fn=lambda: self.open_folder(self.args.output_dir), inputs=None, outputs=None
82
- )
83
-
84
- btn_gen.click(self.inferencer.edit_expression,
85
- inputs=params + opt_in_features_params,
86
- outputs=img_out)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
87
 
88
  gradio_launch_args = {
89
  "inbrowser": self.args.inbrowser,
 
20
  )
21
 
22
  @staticmethod
23
+ def create_expression_parameters():
24
  return [
25
  gr.Dropdown(label=_("Model Type"), visible=False, interactive=False,
26
  choices=[item.value for item in ModelType], value=ModelType.HUMAN.value),
 
38
  gr.Slider(label=_("WOO"), minimum=-20, maximum=20, step=0.2, value=0),
39
  gr.Slider(label=_("Smile"), minimum=-2.0, maximum=2.0, step=0.01, value=0),
40
  gr.Slider(label=_("Source Ratio"), minimum=0, maximum=1, step=0.01, value=1),
41
+ gr.Slider(label=_("Sample Ratio"), minimum=-0.2, maximum=1.2, step=0.01, value=1, visible=False),
42
+ gr.Dropdown(label=_("Sample Parts"), visible=False,
43
  choices=[part.value for part in SamplePart], value=SamplePart.ALL.value),
44
+ gr.Slider(label=_("Face Crop Factor"), minimum=1.5, maximum=2.5, step=0.1, value=2)
45
+ ]
46
+
47
+ @staticmethod
48
+ def create_video_parameters():
49
+ return [
50
+ gr.Dropdown(label=_("Model Type"), visible=False, interactive=False,
51
+ choices=[item.value for item in ModelType],
52
+ value=ModelType.HUMAN.value),
53
+ gr.Slider(label=_("First frame eyes alignment factor"), minimum=0, maximum=1, step=0.01, value=1),
54
+ gr.Slider(label=_("First frame mouth alignment factor"), minimum=0, maximum=1, step=0.01, value=1),
55
+ gr.Slider(label=_("Face Crop Factor"), minimum=1.5, maximum=2.5, step=0.1, value=2),
56
  ]
57
 
58
  def launch(self):
 
60
  with self.i18n:
61
  gr.Markdown(REPO_MARKDOWN, elem_id="md_project")
62
 
63
+ with gr.Tabs():
64
+ with gr.TabItem(_("Expression Editor")):
65
+ with gr.Row():
66
+ with gr.Column():
67
+ img_ref = gr.Image(label=_("Reference Image"))
68
+ with gr.Row():
69
+ btn_gen = gr.Button("GENERATE", visible=False)
70
+ with gr.Row(equal_height=True):
71
+ with gr.Column(scale=9):
72
+ img_out = gr.Image(label=_("Output Image"))
73
+ with gr.Column(scale=1):
74
+ expression_parameters = self.create_expression_parameters()
75
+ btn_openfolder = gr.Button('📂')
76
+ with gr.Accordion("Opt in features", visible=False):
77
+ img_sample = gr.Image()
78
+
79
+ params = expression_parameters + [img_ref]
80
+ opt_in_features_params = [img_sample]
81
+
82
+ gr.on(
83
+ triggers=[param.change for param in params],
84
+ fn=self.inferencer.edit_expression,
85
+ inputs=params + opt_in_features_params,
86
+ outputs=img_out,
87
+ show_progress="minimal",
88
+ queue=True
89
+ )
90
+
91
+ btn_openfolder.click(
92
+ fn=lambda: self.open_folder(self.args.output_dir), inputs=None, outputs=None
93
+ )
94
+
95
+ btn_gen.click(self.inferencer.edit_expression,
96
+ inputs=params + opt_in_features_params,
97
+ outputs=img_out)
98
+
99
+ with gr.TabItem(_("Video Driven")):
100
+ with gr.Row():
101
+ img_ref = gr.Image(label=_("Reference Image"))
102
+ vid_driven = gr.Video(label=_("Expression Video"))
103
+ with gr.Column():
104
+ vid_params = self.create_video_parameters()
105
+
106
+ with gr.Row():
107
+ btn_gen = gr.Button(_("GENERATE"), variant="primary")
108
+ with gr.Row(equal_height=True):
109
+ with gr.Column(scale=9):
110
+ vid_out = gr.Video(label=_("Output Video"), scale=9)
111
+ with gr.Column(scale=1):
112
+ btn_openfolder = gr.Button('📂')
113
+
114
+ params = vid_params + [img_ref, vid_driven]
115
+
116
+ btn_gen.click(
117
+ fn=self.inferencer.create_video,
118
+ inputs=params,
119
+ outputs=vid_out
120
+ )
121
+ btn_openfolder.click(
122
+ fn=lambda: self.open_folder(os.path.join(self.args.output_dir, "videos")),
123
+ inputs=None, outputs=None
124
+ )
125
 
126
  gradio_launch_args = {
127
  "inbrowser": self.args.inbrowser,
i18n/translation.yaml CHANGED
@@ -24,6 +24,14 @@ en: # English
24
  OnlyEyes: OnlyEyes
25
  All: All
26
  Value above 5 may appear distorted: Value above 5 may appear distorted
 
 
 
 
 
 
 
 
27
 
28
  ko: # Korean
29
  Language: 언어
@@ -51,6 +59,14 @@ ko: # Korean
51
  OnlyEyes: 눈만
52
  All: 전부
53
  Value above 5 may appear distorted: 5 이상은 왜곡돼 보일 수 있습니다.
 
 
 
 
 
 
 
 
54
 
55
  ja: # Japanese
56
  Language: 言語
@@ -78,6 +94,14 @@ ja: # Japanese
78
  OnlyEyes: OnlyEyes
79
  All: All
80
  Value above 5 may appear distorted: Value above 5 may appear distorted
 
 
 
 
 
 
 
 
81
 
82
  es: # Spanish
83
  Language: Idioma
@@ -105,6 +129,14 @@ es: # Spanish
105
  OnlyEyes: OnlyEyes
106
  All: All
107
  Value above 5 may appear distorted: Value above 5 may appear distorted
 
 
 
 
 
 
 
 
108
 
109
  fr: # French
110
  Language: Langue
@@ -132,6 +164,14 @@ fr: # French
132
  OnlyEyes: OnlyEyes
133
  All: All
134
  Value above 5 may appear distorted: Value above 5 may appear distorted
 
 
 
 
 
 
 
 
135
 
136
  de: # German
137
  Language: Sprache
@@ -159,6 +199,14 @@ de: # German
159
  OnlyEyes: OnlyEyes
160
  All: All
161
  Value above 5 may appear distorted: Value above 5 may appear distorted
 
 
 
 
 
 
 
 
162
 
163
  zh: # Chinese
164
  Language: 语言
@@ -186,6 +234,14 @@ zh: # Chinese
186
  OnlyEyes: OnlyEyes
187
  All: All
188
  Value above 5 may appear distorted: Value above 5 may appear distorted
 
 
 
 
 
 
 
 
189
 
190
  uk: # Ukrainian
191
  Language: Мова
@@ -213,6 +269,14 @@ uk: # Ukrainian
213
  OnlyEyes: OnlyEyes
214
  All: All
215
  Value above 5 may appear distorted: Value above 5 may appear distorted
 
 
 
 
 
 
 
 
216
 
217
  ru: # Russian
218
  Language: Язык
@@ -240,6 +304,14 @@ ru: # Russian
240
  OnlyEyes: OnlyEyes
241
  All: All
242
  Value above 5 may appear distorted: Value above 5 may appear distorted
 
 
 
 
 
 
 
 
243
 
244
  tr: # Turkish
245
  Language: Dil
@@ -267,3 +339,11 @@ tr: # Turkish
267
  OnlyEyes: OnlyEyes
268
  All: All
269
  Value above 5 may appear distorted: Value above 5 may appear distorted
 
 
 
 
 
 
 
 
 
24
  OnlyEyes: OnlyEyes
25
  All: All
26
  Value above 5 may appear distorted: Value above 5 may appear distorted
27
+ Expression Editor: Expression Editor
28
+ Video Driven: Video Driven
29
+ Expression Video: Expression Video
30
+ GENERATE: GENERATE
31
+ Output Video: Output Video
32
+ First frame mouth alignment factor: First frame mouth alignment factor
33
+ First frame eyes alignment factor: First frame eyes alignment factor
34
+ Face Crop Factor: Face Crop Factor
35
 
36
  ko: # Korean
37
  Language: 언어
 
59
  OnlyEyes: 눈만
60
  All: 전부
61
  Value above 5 may appear distorted: 5 이상은 왜곡돼 보일 수 있습니다.
62
+ Expression Editor: 표정 편집기
63
+ Video Driven: 영상 변환
64
+ Expression Video: 표정 영상
65
+ GENERATE: 생성
66
+ Output Video: 결과 영상
67
+ First frame mouth alignment factor: 첫 프레임 입 반영 비율
68
+ First frame eyes alignment factor: 첫 프레임 눈 반영 비율
69
+ Face Crop Factor: 얼굴 크롭 비율
70
 
71
  ja: # Japanese
72
  Language: 言語
 
94
  OnlyEyes: OnlyEyes
95
  All: All
96
  Value above 5 may appear distorted: Value above 5 may appear distorted
97
+ Expression Editor: Expression Editor
98
+ Video Driven: Video Driven
99
+ Expression Video: Expression Video
100
+ GENERATE: GENERATE
101
+ Output Video: Output Video
102
+ First frame mouth alignment factor: First frame mouth alignment factor
103
+ First frame eyes alignment factor: First frame eyes alignment factor
104
+ Face Crop Factor: Face Crop Factor
105
 
106
  es: # Spanish
107
  Language: Idioma
 
129
  OnlyEyes: OnlyEyes
130
  All: All
131
  Value above 5 may appear distorted: Value above 5 may appear distorted
132
+ Expression Editor: Expression Editor
133
+ Video Driven: Video Driven
134
+ Expression Video: Expression Video
135
+ GENERATE: GENERATE
136
+ Output Video: Output Video
137
+ First frame mouth alignment factor: First frame mouth alignment factor
138
+ First frame eyes alignment factor: First frame eyes alignment factor
139
+ Face Crop Factor: Face Crop Factor
140
 
141
  fr: # French
142
  Language: Langue
 
164
  OnlyEyes: OnlyEyes
165
  All: All
166
  Value above 5 may appear distorted: Value above 5 may appear distorted
167
+ Expression Editor: Expression Editor
168
+ Video Driven: Video Driven
169
+ Expression Video: Expression Video
170
+ GENERATE: GENERATE
171
+ Output Video: Output Video
172
+ First frame mouth alignment factor: First frame mouth alignment factor
173
+ First frame eyes alignment factor: First frame eyes alignment factor
174
+ Face Crop Factor: Face Crop Factor
175
 
176
  de: # German
177
  Language: Sprache
 
199
  OnlyEyes: OnlyEyes
200
  All: All
201
  Value above 5 may appear distorted: Value above 5 may appear distorted
202
+ Expression Editor: Expression Editor
203
+ Video Driven: Video Driven
204
+ Expression Video: Expression Video
205
+ GENERATE: GENERATE
206
+ Output Video: Output Video
207
+ First frame mouth alignment factor: First frame mouth alignment factor
208
+ First frame eyes alignment factor: First frame eyes alignment factor
209
+ Face Crop Factor: Face Crop Factor
210
 
211
  zh: # Chinese
212
  Language: 语言
 
234
  OnlyEyes: OnlyEyes
235
  All: All
236
  Value above 5 may appear distorted: Value above 5 may appear distorted
237
+ Expression Editor: Expression Editor
238
+ Video Driven: Video Driven
239
+ Expression Video: Expression Video
240
+ GENERATE: GENERATE
241
+ Output Video: Output Video
242
+ First frame mouth alignment factor: First frame mouth alignment factor
243
+ First frame eyes alignment factor: First frame eyes alignment factor
244
+ Face Crop Factor: Face Crop Factor
245
 
246
  uk: # Ukrainian
247
  Language: Мова
 
269
  OnlyEyes: OnlyEyes
270
  All: All
271
  Value above 5 may appear distorted: Value above 5 may appear distorted
272
+ Expression Editor: Expression Editor
273
+ Video Driven: Video Driven
274
+ Expression Video: Expression Video
275
+ GENERATE: GENERATE
276
+ Output Video: Output Video
277
+ First frame mouth alignment factor: First frame mouth alignment factor
278
+ First frame eyes alignment factor: First frame eyes alignment factor
279
+ Face Crop Factor: Face Crop Factor
280
 
281
  ru: # Russian
282
  Language: Язык
 
304
  OnlyEyes: OnlyEyes
305
  All: All
306
  Value above 5 may appear distorted: Value above 5 may appear distorted
307
+ Expression Editor: Expression Editor
308
+ Video Driven: Video Driven
309
+ Expression Video: Expression Video
310
+ GENERATE: GENERATE
311
+ Output Video: Output Video
312
+ First frame mouth alignment factor: First frame mouth alignment factor
313
+ First frame eyes alignment factor: First frame eyes alignment factor
314
+ Face Crop Factor: Face Crop Factor
315
 
316
  tr: # Turkish
317
  Language: Dil
 
339
  OnlyEyes: OnlyEyes
340
  All: All
341
  Value above 5 may appear distorted: Value above 5 may appear distorted
342
+ Expression Editor: Expression Editor
343
+ Video Driven: Video Driven
344
+ Expression Video: Expression Video
345
+ GENERATE: GENERATE
346
+ Output Video: Output Video
347
+ First frame mouth alignment factor: First frame mouth alignment factor
348
+ First frame eyes alignment factor: First frame eyes alignment factor
349
+ Face Crop Factor: Face Crop Factor
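Every label used through `gettext as _` in `app.py` has to appear under each language block of `i18n/translation.yaml`, which is exactly what the additions above do. A small standalone check (not part of the app; assumes PyYAML is installed) can verify that the new keys are present for every language:

```python
# Sketch: confirm the newly added UI strings exist under every language block.
import yaml

NEW_KEYS = [
    "Expression Editor", "Video Driven", "Expression Video", "GENERATE",
    "Output Video", "First frame mouth alignment factor",
    "First frame eyes alignment factor", "Face Crop Factor",
]

with open("i18n/translation.yaml", "r", encoding="utf-8") as f:
    translations = yaml.safe_load(f)

for lang, table in translations.items():
    missing = [key for key in NEW_KEYS if key not in table]
    print(f"{lang}: Video Driven -> {table.get('Video Driven')!r}, missing: {missing}")
```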
modules/live_portrait/live_portrait_inferencer.py CHANGED
@@ -4,16 +4,18 @@ import cv2
4
  import time
5
  import copy
6
  import dill
 
7
  from ultralytics import YOLO
8
  import safetensors.torch
9
  import gradio as gr
10
  from gradio_i18n import Translate, gettext as _
11
  from ultralytics.utils import LOGGER as ultralytics_logger
12
  from enum import Enum
13
- from typing import Union
14
 
15
  from modules.utils.paths import *
16
  from modules.utils.image_helper import *
 
17
  from modules.live_portrait.model_downloader import *
18
  from modules.live_portrait.live_portrait_wrapper import LivePortraitWrapper
19
  from modules.utils.camera import get_rotation_matrix
@@ -32,8 +34,17 @@ class LivePortraitInferencer:
32
  model_dir: str = MODELS_DIR,
33
  output_dir: str = OUTPUTS_DIR):
34
  self.model_dir = model_dir
35
- os.makedirs(os.path.join(self.model_dir, "animal"), exist_ok=True)
36
  self.output_dir = output_dir
 
 
 
 
 
 
 
 
 
 
37
  self.model_config = load_yaml(MODEL_CONFIG)["model_params"]
38
 
39
  self.appearance_feature_extractor = None
@@ -119,7 +130,7 @@ class LivePortraitInferencer:
119
  )
120
  self.stitching_retargeting_module = {"stitching": self.stitching_retargeting_module}
121
 
122
- if self.pipeline is None:
123
  self.pipeline = LivePortraitWrapper(
124
  InferenceConfig(),
125
  self.appearance_feature_extractor,
@@ -134,26 +145,24 @@ class LivePortraitInferencer:
134
 
135
  def edit_expression(self,
136
  model_type: str = ModelType.HUMAN.value,
137
- rotate_pitch=0,
138
- rotate_yaw=0,
139
- rotate_roll=0,
140
- blink=0,
141
- eyebrow=0,
142
- wink=0,
143
- pupil_x=0,
144
- pupil_y=0,
145
- aaa=0,
146
- eee=0,
147
- woo=0,
148
- smile=0,
149
- src_ratio=1,
150
- sample_ratio=1,
151
- sample_parts="All",
152
- crop_factor=1.5,
153
- src_image=None,
154
- sample_image=None,
155
- motion_link=None,
156
- add_exp=None):
157
  if isinstance(model_type, ModelType):
158
  model_type = model_type.value
159
  if model_type not in [mode.value for mode in ModelType]:
@@ -165,199 +174,158 @@ class LivePortraitInferencer:
165
  )
166
 
167
  try:
168
- rotate_yaw = -rotate_yaw
 
 
 
 
 
 
 
 
 
169
 
170
- new_editor_link = None
171
- if isinstance(motion_link, np.ndarray) and motion_link:
172
- self.psi = motion_link[0]
173
- new_editor_link = motion_link.copy()
174
- elif src_image is not None:
175
- if id(src_image) != id(self.src_image) or self.crop_factor != crop_factor:
176
- self.crop_factor = crop_factor
177
- self.psi = self.prepare_source(src_image, crop_factor)
178
- self.src_image = src_image
179
- new_editor_link = []
180
- new_editor_link.append(self.psi)
181
- else:
182
- return None
183
-
184
- psi = self.psi
185
- s_info = psi.x_s_info
186
- #delta_new = copy.deepcopy()
187
- s_exp = s_info['exp'] * src_ratio
188
- s_exp[0, 5] = s_info['exp'][0, 5]
189
- s_exp += s_info['kp']
190
-
191
- es = ExpressionSet()
192
-
193
- if isinstance(sample_image, np.ndarray) and sample_image:
194
- if id(self.sample_image) != id(sample_image):
195
- self.sample_image = sample_image
196
- d_image_np = (sample_image * 255).byte().numpy()
197
- d_face = self.crop_face(d_image_np[0], 1.7)
198
- i_d = self.prepare_src_image(d_face)
199
- self.d_info = self.pipeline.get_kp_info(i_d)
200
- self.d_info['exp'][0, 5, 0] = 0
201
- self.d_info['exp'][0, 5, 1] = 0
202
-
203
- # "OnlyExpression", "OnlyRotation", "OnlyMouth", "OnlyEyes", "All"
204
- if sample_parts == SamplePart.ONLY_EXPRESSION.value or sample_parts == SamplePart.ONLY_EXPRESSION.ALL.value:
205
- es.e += self.d_info['exp'] * sample_ratio
206
- if sample_parts == SamplePart.ONLY_ROTATION.value or sample_parts == SamplePart.ONLY_ROTATION.ALL.value:
207
- rotate_pitch += self.d_info['pitch'] * sample_ratio
208
- rotate_yaw += self.d_info['yaw'] * sample_ratio
209
- rotate_roll += self.d_info['roll'] * sample_ratio
210
- elif sample_parts == SamplePart.ONLY_MOUTH.value:
211
- self.retargeting(es.e, self.d_info['exp'], sample_ratio, (14, 17, 19, 20))
212
- elif sample_parts == SamplePart.ONLY_EYES.value:
213
- self.retargeting(es.e, self.d_info['exp'], sample_ratio, (1, 2, 11, 13, 15, 16))
214
-
215
- es.r = self.calc_fe(es.e, blink, eyebrow, wink, pupil_x, pupil_y, aaa, eee, woo, smile,
216
- rotate_pitch, rotate_yaw, rotate_roll)
217
-
218
- if isinstance(add_exp, ExpressionSet):
219
- es.add(add_exp)
220
-
221
- new_rotate = get_rotation_matrix(s_info['pitch'] + es.r[0], s_info['yaw'] + es.r[1],
222
- s_info['roll'] + es.r[2])
223
- x_d_new = (s_info['scale'] * (1 + es.s)) * ((s_exp + es.e) @ new_rotate) + s_info['t']
224
-
225
- x_d_new = self.pipeline.stitching(psi.x_s_user, x_d_new)
226
-
227
- crop_out = self.pipeline.warp_decode(psi.f_s_user, psi.x_s_user, x_d_new)
228
- crop_out = self.pipeline.parse_output(crop_out['out'])[0]
229
-
230
- crop_with_fullsize = cv2.warpAffine(crop_out, psi.crop_trans_m, get_rgb_size(psi.src_rgb), cv2.INTER_LINEAR)
231
- out = np.clip(psi.mask_ori * crop_with_fullsize + (1 - psi.mask_ori) * psi.src_rgb, 0, 255).astype(np.uint8)
232
-
233
- temp_out_img_path, out_img_path = get_auto_incremental_file_path(TEMP_DIR, "png"), get_auto_incremental_file_path(OUTPUTS_DIR, "png")
234
- save_image(numpy_array=crop_out, output_path=temp_out_img_path)
235
- save_image(numpy_array=out, output_path=out_img_path)
236
-
237
- new_editor_link.append(es)
238
-
239
- return out
240
  except Exception as e:
241
  raise
242
 
243
  def create_video(self,
244
- retargeting_eyes,
245
- retargeting_mouth,
246
- turn_on,
247
- tracking_src_vid,
248
- animate_without_vid,
249
- command,
250
- crop_factor,
251
- src_images=None,
252
- driving_images=None,
253
- motion_link=None,
254
- progress=gr.Progress()):
255
- if not turn_on:
256
- return None, None
257
- src_length = 1
258
-
259
- if src_images is None:
260
- if motion_link is not None:
261
- self.psi_list = [motion_link[0]]
262
- else:
263
- return None, None
264
-
265
- if src_images is not None:
266
- src_length = len(src_images)
267
- if id(src_images) != id(self.src_images) or self.crop_factor != crop_factor:
268
- self.crop_factor = crop_factor
269
- self.src_images = src_images
270
- if 1 < src_length:
271
- self.psi_list = self.prepare_source(src_images, crop_factor, True, tracking_src_vid)
272
- else:
273
- self.psi_list = [self.prepare_source(src_images, crop_factor)]
274
 
275
- cmd_list, cmd_length = self.parsing_command(command, motion_link)
276
- if cmd_list is None:
277
- return None,None
278
- cmd_idx = 0
279
 
280
- driving_length = 0
281
- if driving_images is not None:
282
- if id(driving_images) != id(self.driving_images):
283
- self.driving_images = driving_images
284
- self.driving_values = self.prepare_driving_video(driving_images)
285
- driving_length = len(self.driving_values)
286
 
287
- total_length = max(driving_length, src_length)
288
 
289
- if animate_without_vid:
290
- total_length = max(total_length, cmd_length)
291
 
292
- c_i_es = ExpressionSet()
293
- c_o_es = ExpressionSet()
294
- d_0_es = None
295
- out_list = []
 
 
296
 
297
- psi = None
298
- for i in range(total_length):
299
 
300
- if i < src_length:
301
- psi = self.psi_list[i]
302
- s_info = psi.x_s_info
303
- s_es = ExpressionSet(erst=(s_info['kp'] + s_info['exp'], torch.Tensor([0, 0, 0]), s_info['scale'], s_info['t']))
304
 
305
- new_es = ExpressionSet(es=s_es)
 
 
306
 
307
- if i < cmd_length:
308
- cmd = cmd_list[cmd_idx]
309
- if 0 < cmd.change:
310
- cmd.change -= 1
311
- c_i_es.add(cmd.es)
312
- c_i_es.sub(c_o_es)
313
- elif 0 < cmd.keep:
314
- cmd.keep -= 1
315
 
316
- new_es.add(c_i_es)
317
 
318
- if cmd.change == 0 and cmd.keep == 0:
319
- cmd_idx += 1
320
- if cmd_idx < len(cmd_list):
321
- c_o_es = ExpressionSet(es=c_i_es)
322
- cmd = cmd_list[cmd_idx]
323
- c_o_es.div(cmd.change)
324
- elif 0 < cmd_length:
325
- new_es.add(c_i_es)
326
 
327
- if i < driving_length:
328
- d_i_info = self.driving_values[i]
329
- d_i_r = torch.Tensor([d_i_info['pitch'], d_i_info['yaw'], d_i_info['roll']])#.float().to(device="cuda:0")
330
 
331
- if d_0_es is None:
332
- d_0_es = ExpressionSet(erst = (d_i_info['exp'], d_i_r, d_i_info['scale'], d_i_info['t']))
333
 
334
- self.retargeting(s_es.e, d_0_es.e, retargeting_eyes, (11, 13, 15, 16))
335
- self.retargeting(s_es.e, d_0_es.e, retargeting_mouth, (14, 17, 19, 20))
 
336
 
337
- new_es.e += d_i_info['exp'] - d_0_es.e
338
- new_es.r += d_i_r - d_0_es.r
339
- new_es.t += d_i_info['t'] - d_0_es.t
 
 
 
340
 
341
- r_new = get_rotation_matrix(
342
- s_info['pitch'] + new_es.r[0], s_info['yaw'] + new_es.r[1], s_info['roll'] + new_es.r[2])
343
- d_new = new_es.s * (new_es.e @ r_new) + new_es.t
344
- d_new = self.pipeline.stitching(psi.x_s_user, d_new)
345
- crop_out = self.pipeline.warp_decode(psi.f_s_user, psi.x_s_user, d_new)
346
- crop_out = self.pipeline.parse_output(crop_out['out'])[0]
347
 
348
- crop_with_fullsize = cv2.warpAffine(crop_out, psi.crop_trans_m, get_rgb_size(psi.src_rgb),
349
- cv2.INTER_LINEAR)
350
- out = np.clip(psi.mask_ori * crop_with_fullsize + (1 - psi.mask_ori) * psi.src_rgb, 0, 255).astype(
351
- np.uint8)
352
- out_list.append(out)
353
 
354
- progress(i/total_length, "predicting..")
355
 
356
- if len(out_list) == 0:
357
- return None
358
 
359
- out_imgs = torch.cat([pil2tensor(img_rgb) for img_rgb in out_list])
360
- return out_imgs
 
361
 
362
  def download_if_no_models(self,
363
  model_type: str = ModelType.HUMAN.value,
@@ -528,7 +496,6 @@ class LivePortraitInferencer:
528
  @staticmethod
529
  def retargeting(delta_out, driving_exp, factor, idxes):
530
  for idx in idxes:
531
- # delta_out[0, idx] -= src_exp[0, idx] * factor
532
  delta_out[0, idx] += driving_exp[0, idx] * factor
533
 
534
  @staticmethod
@@ -552,8 +519,15 @@ class LivePortraitInferencer:
552
  return new_img
553
 
554
  def prepare_src_image(self, img):
555
- h, w = img.shape[:2]
556
- input_shape = [256,256]
 
 
 
 
 
 
 
557
  if h != input_shape[0] or w != input_shape[1]:
558
  if 256 < h: interpolation = cv2.INTER_AREA
559
  else: interpolation = cv2.INTER_LINEAR
@@ -624,11 +598,9 @@ class LivePortraitInferencer:
624
  return psi_list
625
 
626
  def prepare_driving_video(self, face_images):
627
- print("Prepare driving video...")
628
- f_img_np = (face_images * 255).byte().numpy()
629
-
630
  out_list = []
631
- for f_img in f_img_np:
632
  i_d = self.prepare_src_image(f_img)
633
  d_info = self.pipeline.get_kp_info(i_d)
634
  out_list.append(d_info)
 
4
  import time
5
  import copy
6
  import dill
7
+ import torch
8
  from ultralytics import YOLO
9
  import safetensors.torch
10
  import gradio as gr
11
  from gradio_i18n import Translate, gettext as _
12
  from ultralytics.utils import LOGGER as ultralytics_logger
13
  from enum import Enum
14
+ from typing import Union, List, Dict, Tuple
15
 
16
  from modules.utils.paths import *
17
  from modules.utils.image_helper import *
18
+ from modules.utils.video_helper import *
19
  from modules.live_portrait.model_downloader import *
20
  from modules.live_portrait.live_portrait_wrapper import LivePortraitWrapper
21
  from modules.utils.camera import get_rotation_matrix
 
34
  model_dir: str = MODELS_DIR,
35
  output_dir: str = OUTPUTS_DIR):
36
  self.model_dir = model_dir
 
37
  self.output_dir = output_dir
38
+ relative_dirs = [
39
+ os.path.join(self.model_dir, "animal"),
40
+ os.path.join(self.output_dir, "videos"),
41
+ os.path.join(self.output_dir, "temp"),
42
+ os.path.join(self.output_dir, "temp", "video_frames"),
43
+ os.path.join(self.output_dir, "temp", "video_frames", "out"),
44
+ ]
45
+ for dir_path in relative_dirs:
46
+ os.makedirs(dir_path, exist_ok=True)
47
+
48
  self.model_config = load_yaml(MODEL_CONFIG)["model_params"]
49
 
50
  self.appearance_feature_extractor = None
 
130
  )
131
  self.stitching_retargeting_module = {"stitching": self.stitching_retargeting_module}
132
 
133
+ if self.pipeline is None or model_type != self.model_type:
134
  self.pipeline = LivePortraitWrapper(
135
  InferenceConfig(),
136
  self.appearance_feature_extractor,
 
145
 
146
  def edit_expression(self,
147
  model_type: str = ModelType.HUMAN.value,
148
+ rotate_pitch: float = 0,
149
+ rotate_yaw: float = 0,
150
+ rotate_roll: float = 0,
151
+ blink: float = 0,
152
+ eyebrow: float = 0,
153
+ wink: float = 0,
154
+ pupil_x: float = 0,
155
+ pupil_y: float = 0,
156
+ aaa: float = 0,
157
+ eee: float = 0,
158
+ woo: float = 0,
159
+ smile: float = 0,
160
+ src_ratio: float = 1,
161
+ sample_ratio: float = 1,
162
+ sample_parts: str = SamplePart.ALL.value,
163
+ crop_factor: float = 2.3,
164
+ src_image: Optional[str] = None,
165
+ sample_image: Optional[str] = None,) -> None:
 
 
166
  if isinstance(model_type, ModelType):
167
  model_type = model_type.value
168
  if model_type not in [mode.value for mode in ModelType]:
 
174
  )
175
 
176
  try:
177
+ with torch.autocast(device_type=self.device, enabled=(self.device == "cuda")):
178
+ rotate_yaw = -rotate_yaw
179
+
180
+ if src_image is not None:
181
+ if id(src_image) != id(self.src_image) or self.crop_factor != crop_factor:
182
+ self.crop_factor = crop_factor
183
+ self.psi = self.prepare_source(src_image, crop_factor)
184
+ self.src_image = src_image
185
+ else:
186
+ return None
187
 
188
+ psi = self.psi
189
+ s_info = psi.x_s_info
190
+ #delta_new = copy.deepcopy()
191
+ s_exp = s_info['exp'] * src_ratio
192
+ s_exp[0, 5] = s_info['exp'][0, 5]
193
+ s_exp += s_info['kp']
194
+
195
+ es = ExpressionSet()
196
+
197
+ if isinstance(sample_image, np.ndarray) and sample_image:
198
+ if id(self.sample_image) != id(sample_image):
199
+ self.sample_image = sample_image
200
+ d_image_np = (sample_image * 255).byte().numpy()
201
+ d_face = self.crop_face(d_image_np[0], 1.7)
202
+ i_d = self.prepare_src_image(d_face)
203
+ self.d_info = self.pipeline.get_kp_info(i_d)
204
+ self.d_info['exp'][0, 5, 0] = 0
205
+ self.d_info['exp'][0, 5, 1] = 0
206
+
207
+ # "OnlyExpression", "OnlyRotation", "OnlyMouth", "OnlyEyes", "All"
208
+ if sample_parts == SamplePart.ONLY_EXPRESSION.value or sample_parts == SamplePart.ONLY_EXPRESSION.ALL.value:
209
+ es.e += self.d_info['exp'] * sample_ratio
210
+ if sample_parts == SamplePart.ONLY_ROTATION.value or sample_parts == SamplePart.ONLY_ROTATION.ALL.value:
211
+ rotate_pitch += self.d_info['pitch'] * sample_ratio
212
+ rotate_yaw += self.d_info['yaw'] * sample_ratio
213
+ rotate_roll += self.d_info['roll'] * sample_ratio
214
+ elif sample_parts == SamplePart.ONLY_MOUTH.value:
215
+ self.retargeting(es.e, self.d_info['exp'], sample_ratio, (14, 17, 19, 20))
216
+ elif sample_parts == SamplePart.ONLY_EYES.value:
217
+ self.retargeting(es.e, self.d_info['exp'], sample_ratio, (1, 2, 11, 13, 15, 16))
218
+
219
+ es.r = self.calc_fe(es.e, blink, eyebrow, wink, pupil_x, pupil_y, aaa, eee, woo, smile,
220
+ rotate_pitch, rotate_yaw, rotate_roll)
221
+
222
+ new_rotate = get_rotation_matrix(s_info['pitch'] + es.r[0], s_info['yaw'] + es.r[1],
223
+ s_info['roll'] + es.r[2])
224
+ x_d_new = (s_info['scale'] * (1 + es.s)) * ((s_exp + es.e) @ new_rotate) + s_info['t']
225
+
226
+ x_d_new = self.pipeline.stitching(psi.x_s_user, x_d_new)
227
+
228
+ crop_out = self.pipeline.warp_decode(psi.f_s_user, psi.x_s_user, x_d_new)
229
+ crop_out = self.pipeline.parse_output(crop_out['out'])[0]
230
+
231
+ crop_with_fullsize = cv2.warpAffine(crop_out, psi.crop_trans_m, get_rgb_size(psi.src_rgb), cv2.INTER_LINEAR)
232
+ out = np.clip(psi.mask_ori * crop_with_fullsize + (1 - psi.mask_ori) * psi.src_rgb, 0, 255).astype(np.uint8)
233
+
234
+ temp_out_img_path, out_img_path = get_auto_incremental_file_path(TEMP_DIR, "png"), get_auto_incremental_file_path(OUTPUTS_DIR, "png")
235
+ save_image(numpy_array=crop_out, output_path=temp_out_img_path)
236
+ save_image(numpy_array=out, output_path=out_img_path)
237
+
238
+ return out
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
239
  except Exception as e:
240
  raise
241
 
242
  def create_video(self,
243
+ model_type: str = ModelType.HUMAN.value,
244
+ retargeting_eyes: float = 1,
245
+ retargeting_mouth: float = 1,
246
+ crop_factor: float = 2.3,
247
+ src_image: Optional[str] = None,
248
+ driving_vid_path: Optional[str] = None,
249
+ progress: gr.Progress = gr.Progress()
250
+ ):
251
+ if self.pipeline is None or model_type != self.model_type:
252
+ self.load_models(
253
+ model_type=model_type
254
+ )
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
255
 
256
+ try:
257
+ vid_info = get_video_info(vid_input=driving_vid_path)
 
 
258
 
259
+ if src_image is not None:
260
+ if id(src_image) != id(self.src_image) or self.crop_factor != crop_factor:
261
+ self.crop_factor = crop_factor
262
+ self.src_image = src_image
 
 
263
 
264
+ self.psi_list = [self.prepare_source(src_image, crop_factor)]
265
 
266
+ progress(0, desc="Extracting frames from the video..")
267
+ driving_images, vid_sound = extract_frames(driving_vid_path, os.path.join(self.output_dir, "temp", "video_frames")), extract_sound(driving_vid_path)
268
 
269
+ driving_length = 0
270
+ if driving_images is not None:
271
+ if id(driving_images) != id(self.driving_images):
272
+ self.driving_images = driving_images
273
+ self.driving_values = self.prepare_driving_video(driving_images)
274
+ driving_length = len(self.driving_values)
275
 
276
+ total_length = len(driving_images)
 
277
 
278
+ c_i_es = ExpressionSet()
279
+ c_o_es = ExpressionSet()
280
+ d_0_es = None
 
281
 
282
+ psi = None
283
+ with torch.autocast(device_type=self.device, enabled=(self.device == "cuda")):
284
+ for i in range(total_length):
285
 
286
+ if i == 0:
287
+ psi = self.psi_list[i]
288
+ s_info = psi.x_s_info
289
+ s_es = ExpressionSet(erst=(s_info['kp'] + s_info['exp'], torch.Tensor([0, 0, 0]), s_info['scale'], s_info['t']))
 
 
 
 
290
 
291
+ new_es = ExpressionSet(es=s_es)
292
 
293
+ if i < driving_length:
294
+ d_i_info = self.driving_values[i]
295
+ d_i_r = torch.Tensor([d_i_info['pitch'], d_i_info['yaw'], d_i_info['roll']]) # .float().to(device="cuda:0")
 
 
 
 
 
296
 
297
+ if d_0_es is None:
298
+ d_0_es = ExpressionSet(erst = (d_i_info['exp'], d_i_r, d_i_info['scale'], d_i_info['t']))
 
299
 
300
+ self.retargeting(s_es.e, d_0_es.e, retargeting_eyes, (11, 13, 15, 16))
301
+ self.retargeting(s_es.e, d_0_es.e, retargeting_mouth, (14, 17, 19, 20))
302
 
303
+ new_es.e += d_i_info['exp'] - d_0_es.e
304
+ new_es.r += d_i_r - d_0_es.r
305
+ new_es.t += d_i_info['t'] - d_0_es.t
306
 
307
+ r_new = get_rotation_matrix(
308
+ s_info['pitch'] + new_es.r[0], s_info['yaw'] + new_es.r[1], s_info['roll'] + new_es.r[2])
309
+ d_new = new_es.s * (new_es.e @ r_new) + new_es.t
310
+ d_new = self.pipeline.stitching(psi.x_s_user, d_new)
311
+ crop_out = self.pipeline.warp_decode(psi.f_s_user, psi.x_s_user, d_new)
312
+ crop_out = self.pipeline.parse_output(crop_out['out'])[0]
313
 
314
+ crop_with_fullsize = cv2.warpAffine(crop_out, psi.crop_trans_m, get_rgb_size(psi.src_rgb),
315
+ cv2.INTER_LINEAR)
316
+ out = np.clip(psi.mask_ori * crop_with_fullsize + (1 - psi.mask_ori) * psi.src_rgb, 0, 255).astype(
317
+ np.uint8)
 
 
318
 
319
+ out_frame_path = get_auto_incremental_file_path(os.path.join(self.output_dir, "temp", "video_frames", "out"), "png")
320
+ save_image(out, out_frame_path)
 
 
 
321
 
322
+ progress(i/total_length, desc=f"Generating frames {i}/{total_length} ..")
323
 
324
+ video_path = create_video_from_frames(TEMP_VIDEO_OUT_FRAMES_DIR, frame_rate=vid_info.frame_rate, output_dir=os.path.join(self.output_dir, "videos"))
 
325
 
326
+ return video_path
327
+ except Exception as e:
328
+ raise
329
 
330
  def download_if_no_models(self,
331
  model_type: str = ModelType.HUMAN.value,
 
496
  @staticmethod
497
  def retargeting(delta_out, driving_exp, factor, idxes):
498
  for idx in idxes:
 
499
  delta_out[0, idx] += driving_exp[0, idx] * factor
500
 
501
  @staticmethod
 
519
  return new_img
520
 
521
  def prepare_src_image(self, img):
522
+ if isinstance(img, str):
523
+ img = image_path_to_array(img)
524
+
525
+ if len(img.shape) <= 3:
526
+ img = img[np.newaxis, ...]
527
+
528
+ d, h, w, c = img.shape
529
+ img = img[0] # Select first dimension
530
+ input_shape = [256, 256]
531
  if h != input_shape[0] or w != input_shape[1]:
532
  if 256 < h: interpolation = cv2.INTER_AREA
533
  else: interpolation = cv2.INTER_LINEAR
 
598
  return psi_list
599
 
600
  def prepare_driving_video(self, face_images):
601
+ # print("Prepare driving video...")
 
 
602
  out_list = []
603
+ for f_img in face_images:
604
  i_d = self.prepare_src_image(f_img)
605
  d_info = self.pipeline.get_kp_info(i_d)
606
  out_list.append(d_info)
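The new `create_video` entry point ties the pieces together: it extracts the driving video's frames and sound with FFmpeg, computes keypoint info per frame, retargets eyes and mouth against the first driving frame, renders each frame, and reassembles the result at the source video's frame rate. A minimal usage sketch follows; the import location of `ModelType` and the file paths are assumptions, not code from the repository:

```python
# Sketch: run the Video Driven pipeline directly, outside the Gradio UI.
# ModelType's import path is an assumption; the image/video paths are placeholders.
from modules.live_portrait.live_portrait_inferencer import LivePortraitInferencer, ModelType

inferencer = LivePortraitInferencer()

video_path = inferencer.create_video(
    model_type=ModelType.HUMAN.value,   # same default the hidden dropdown uses
    retargeting_eyes=1,                 # "First frame eyes alignment factor"
    retargeting_mouth=1,                # "First frame mouth alignment factor"
    crop_factor=2.3,                    # the signature's default
    src_image="reference.png",          # placeholder reference image path
    driving_vid_path="expression.mp4",  # placeholder driving video path
)
print(video_path)  # e.g. a file written under outputs/videos/
```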
modules/utils/constants.py CHANGED
@@ -33,4 +33,10 @@ GRADIO_CSS = """
 #blink_slider .md.svelte-7ddecg.chatbot.prose {
     font-size: 0.7em;
 }
-"""
+"""
+
+SOUND_FILE_EXT = ['.mp3', '.wav', '.aac', '.flac', '.ogg', '.m4a', '.wma']
+IMAGE_FILE_EXT = ['.jpg', '.jpeg', '.png', '.gif', '.bmp', '.tiff', '.webp']
+VIDEO_FILE_EXT = ['.mp4', '.avi', '.mov', '.wmv', '.flv', '.webm', '.mkv', '.mpeg', '.mpg', '.m4v', '.3gp', '.ts', '.vob', '.gif']
+TRANSPARENT_VIDEO_FILE_EXT = ['.webm', '.mov', '.gif']
+SUPPORTED_VIDEO_FILE_EXT = ['.mp4', '.mov', '.webm', '.gif']
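The new extension lists give a single place to decide which files the Video Driven feature will accept and which containers can carry transparency. A small illustrative check (not in the repository) built on these constants:

```python
# Sketch: classify an incoming file against the new extension constants.
from pathlib import Path

from modules.utils.constants import SUPPORTED_VIDEO_FILE_EXT, TRANSPARENT_VIDEO_FILE_EXT

def describe_video_input(path: str) -> str:
    ext = Path(path).suffix.lower()
    if ext not in SUPPORTED_VIDEO_FILE_EXT:
        raise ValueError(f"Unsupported video format {ext!r}; expected one of {SUPPORTED_VIDEO_FILE_EXT}")
    kind = "supports transparency" if ext in TRANSPARENT_VIDEO_FILE_EXT else "opaque only"
    return f"{path}: {kind}"

print(describe_video_input("driving.mp4"))   # driving.mp4: opaque only
print(describe_video_input("driving.webm"))  # driving.webm: supports transparency
```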
modules/utils/image_helper.py CHANGED
@@ -56,6 +56,7 @@ def calc_crop_limit(center, img_size, crop_size):
 def save_image(numpy_array: np.ndarray, output_path: str):
     out = Image.fromarray(numpy_array)
     out.save(output_path, compress_level=1, format="png")
+    return output_path
 
 
 def image_path_to_array(image_path: str) -> np.ndarray:
modules/utils/paths.py CHANGED
@@ -6,7 +6,10 @@ PROJECT_ROOT_DIR = os.path.join(os.path.abspath(os.path.dirname(__file__)), ".."
 MODELS_DIR = os.path.join(PROJECT_ROOT_DIR, "models")
 MODELS_ANIMAL_DIR = os.path.join(MODELS_DIR, "animal")
 OUTPUTS_DIR = os.path.join(PROJECT_ROOT_DIR, "outputs")
+OUTPUTS_VIDEOS_DIR = os.path.join(OUTPUTS_DIR, "videos")
 TEMP_DIR = os.path.join(OUTPUTS_DIR, "temp")
+TEMP_VIDEO_FRAMES_DIR = os.path.join(TEMP_DIR, "video_frames")
+TEMP_VIDEO_OUT_FRAMES_DIR = os.path.join(TEMP_VIDEO_FRAMES_DIR, "out")
 EXP_OUTPUT_DIR = os.path.join(OUTPUTS_DIR, "exp_data")
 MODEL_CONFIG = os.path.join(PROJECT_ROOT_DIR, "modules", "config", "models.yaml")
 MODEL_PATHS = {
@@ -31,7 +34,7 @@ I18N_YAML_PATH = os.path.join(PROJECT_ROOT_DIR, "i18n", "translation.yaml")
 
 
 def get_auto_incremental_file_path(dir_path: str, extension: str, prefix: str = ""):
-    counter = 0
+    counter = len(os.listdir(dir_path))
     while True:
         if prefix:
             filename = f"{prefix}_{counter:05d}.{extension}"
@@ -39,6 +42,7 @@ def get_auto_incremental_file_path(dir_path: str, extension: str, prefix: str =
             filename = f"{counter:05d}.{extension}"
         full_path = os.path.join(dir_path, filename)
         if not os.path.exists(full_path):
+            full_path = os.path.normpath(full_path)
             return full_path
         counter += 1
 
@@ -50,7 +54,10 @@ def init_dirs():
         MODELS_ANIMAL_DIR,
         OUTPUTS_DIR,
         EXP_OUTPUT_DIR,
-        TEMP_DIR
+        TEMP_DIR,
+        TEMP_VIDEO_FRAMES_DIR,
+        TEMP_VIDEO_OUT_FRAMES_DIR,
+        OUTPUTS_VIDEOS_DIR
     ]:
         os.makedirs(dir_path, exist_ok=True)
 
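`get_auto_incremental_file_path` now seeds its counter with the number of entries already in the target directory and normalizes the returned path. A throwaway-directory sketch (illustrative only, standard library plus the repo helper) shows the naming behaviour:

```python
# Sketch: the helper returns 00000.png for an empty directory, then 00001.png
# once that file exists, because the counter starts at len(os.listdir(dir_path)).
import tempfile

from modules.utils.paths import get_auto_incremental_file_path

with tempfile.TemporaryDirectory() as out_dir:
    first = get_auto_incremental_file_path(out_dir, "png")
    open(first, "wb").close()  # pretend a frame was saved here
    second = get_auto_incremental_file_path(out_dir, "png")
    print(first)   # e.g. <out_dir>/00000.png (normalized)
    print(second)  # e.g. <out_dir>/00001.png
```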
modules/utils/video_helper.py ADDED
@@ -0,0 +1,315 @@
1
+ import subprocess
2
+ import os
3
+ from typing import List, Optional, Union
4
+ import cv2
5
+ from PIL import Image
6
+ import numpy as np
7
+ from dataclasses import dataclass
8
+ import re
9
+ from pathlib import Path
10
+
11
+ from modules.utils.constants import SOUND_FILE_EXT, VIDEO_FILE_EXT, IMAGE_FILE_EXT
12
+ from modules.utils.paths import (TEMP_VIDEO_FRAMES_DIR, TEMP_VIDEO_OUT_FRAMES_DIR, OUTPUTS_VIDEOS_DIR,
13
+ get_auto_incremental_file_path)
14
+
15
+
16
+ @dataclass
17
+ class VideoInfo:
18
+ num_frames: Optional[int] = None
19
+ frame_rate: Optional[int] = None
20
+ duration: Optional[float] = None
21
+ has_sound: Optional[bool] = None
22
+ codec: Optional[str] = None
23
+
24
+
25
+ def extract_frames(
26
+ vid_input: str,
27
+ output_temp_dir: str = TEMP_VIDEO_FRAMES_DIR,
28
+ start_number: int = 0,
29
+ clean=True
30
+ ):
31
+ """
32
+ Extract frames as jpg files and save them into output_temp_dir. This needs FFmpeg installed.
33
+ """
34
+ if clean:
35
+ clean_temp_dir(temp_dir=output_temp_dir)
36
+
37
+ os.makedirs(output_temp_dir, exist_ok=True)
38
+ output_path = os.path.join(output_temp_dir, "%05d.jpg")
39
+
40
+ command = [
41
+ 'ffmpeg',
42
+ '-loglevel', 'error',
43
+ '-y', # Enable overwriting
44
+ '-i', vid_input,
45
+ '-qscale:v', '2',
46
+ '-vf', f'scale=iw:ih',
47
+ '-start_number', str(start_number),
48
+ f'{output_path}'
49
+ ]
50
+
51
+ try:
52
+ subprocess.run(command, check=True)
53
+ print(f"Video frames extracted to \"{os.path.normpath(output_temp_dir)}\"")
54
+ except subprocess.CalledProcessError as e:
55
+ print("Error occurred while extracting frames from the video")
56
+ raise RuntimeError(f"An error occurred: {str(e)}")
57
+
58
+ return get_frames_from_dir(output_temp_dir)
59
+
60
+
61
+ def extract_sound(
62
+ vid_input: str,
63
+ output_temp_dir: str = TEMP_VIDEO_FRAMES_DIR,
64
+ ):
65
+ """
66
+ Extract audio from a video file and save it as a separate sound file. This needs FFmpeg installed.
67
+ """
68
+ if Path(vid_input).suffix == ".gif":
69
+ print("Sound extracting process has passed because gif has no sound")
70
+ return None
71
+
72
+ os.makedirs(output_temp_dir, exist_ok=True)
73
+ output_path = os.path.join(output_temp_dir, "sound.mp3")
74
+
75
+ command = [
76
+ 'ffmpeg',
77
+ '-loglevel', 'error',
78
+ '-y', # Enable overwriting
79
+ '-i', vid_input,
80
+ '-vn',
81
+ output_path
82
+ ]
83
+
84
+ try:
85
+ subprocess.run(command, check=True)
86
+ except subprocess.CalledProcessError as e:
87
+ print(f"Warning: Failed to extract sound from the video: {e}")
88
+
89
+ return output_path
90
+
91
+
92
+ def get_video_info(vid_input: str) -> VideoInfo:
+     """
+     Extract video information using ffmpeg.
+     """
+     command = [
+         'ffmpeg',
+         '-i', vid_input,
+         '-map', '0:v:0',
+         '-c', 'copy',
+         '-f', 'null',
+         '-'
+     ]
+
+     try:
+         result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
+                                 encoding='utf-8', errors='replace', check=True)
+         output = result.stderr
+
+         num_frames = None
+         frame_rate = None
+         duration = None
+         has_sound = False
+         codec = None
+
+         for line in output.splitlines():
+             if 'Stream #0:0' in line and 'Video:' in line:
+                 fps_match = re.search(r'(\d+(?:\.\d+)?) fps', line)
+                 if fps_match:
+                     frame_rate = float(fps_match.group(1))
+
+                 codec_match = re.search(r'Video: (\w+)', line)
+                 if codec_match:
+                     codec = codec_match.group(1)
+
+             elif 'Duration:' in line:
+                 duration_match = re.search(r'Duration: (\d{2}):(\d{2}):(\d{2}\.\d{2})', line)
+                 if duration_match:
+                     h, m, s = map(float, duration_match.groups())
+                     duration = h * 3600 + m * 60 + s
+
+             elif 'Stream' in line and 'Audio:' in line:
+                 has_sound = True
+
+         if frame_rate and duration:
+             num_frames = int(frame_rate * duration)
+
+         print(f"Video info - frame_rate: {frame_rate}, duration: {duration}, total frames: {num_frames}")
+         return VideoInfo(
+             num_frames=num_frames,
+             frame_rate=frame_rate,
+             duration=duration,
+             has_sound=has_sound,
+             codec=codec
+         )
+
+     except subprocess.CalledProcessError as e:
+         print(f"Error occurred while getting info from the video: {e}")
+         return VideoInfo()
+
+
+ def create_video_from_frames(
+     frames_dir: str,
+     frame_rate: Optional[int] = None,
+     sound_path: Optional[str] = None,
+     output_dir: Optional[str] = None,
+     output_mime_type: Optional[str] = None,
+ ):
+     """
+     Create a video from frames and save it to the output_path. This needs FFmpeg installed.
+     """
+     if not os.path.exists(frames_dir):
+         raise FileNotFoundError(f"frames_dir does not exist: {frames_dir}")
+     frames_dir = os.path.normpath(frames_dir)
+
+     if output_dir is None:
+         output_dir = OUTPUTS_VIDEOS_DIR
+     os.makedirs(output_dir, exist_ok=True)
+
+     frame_img_mime_type = ".png"
+     pix_format = "yuv420p"
+     vid_codec, audio_codec = "libx264", "aac"
+
+     if output_mime_type is None:
+         output_mime_type = ".mp4"
+
+     output_mime_type = output_mime_type.lower()
+     if output_mime_type == ".mov":
+         pix_format = "yuva444p10le"
+         vid_codec, audio_codec = "prores_ks", "aac"
+
+     elif output_mime_type == ".webm":
+         pix_format = "yuva420p"
+         vid_codec, audio_codec = "libvpx-vp9", "libvorbis"
+
+     elif output_mime_type == ".gif":
+         pix_format = None
+         vid_codec, audio_codec = "gif", None
+
+     output_path = get_auto_incremental_file_path(output_dir, output_mime_type.replace(".", ""))
+
+     if sound_path is None:
+         temp_sound = os.path.normpath(os.path.join(TEMP_VIDEO_FRAMES_DIR, "sound.mp3"))
+         if os.path.exists(temp_sound):
+             sound_path = temp_sound
+
+     if frame_rate is None:
+         frame_rate = 25  # Default frame rate for ffmpeg
+
+     command = [
+         'ffmpeg',
+         '-loglevel', 'error',
+         '-y',  # Enable overwriting
+         '-framerate', str(frame_rate),
+         '-i', os.path.join(frames_dir, f"%05d{frame_img_mime_type}"),
+     ]
+
+     # The audio input must be declared before the output path, otherwise ffmpeg never muxes it in.
+     include_sound = output_mime_type != ".gif" and sound_path is not None
+     if include_sound:
+         command += ['-i', sound_path]
+
+     command += ['-c:v', vid_codec]
+
+     if output_mime_type == ".gif":
+         command += [
+             "-filter_complex", "[0:v] palettegen=reserve_transparent=on [p]; [0:v][p] paletteuse",
+             "-loop", "0"
+         ]
+     else:
+         command += [
+             '-vf', 'crop=trunc(iw/2)*2:trunc(ih/2)*2',  # Crop to even dimensions for yuv pixel formats
+             '-pix_fmt', pix_format
+         ]
+
+     if include_sound:
+         command += [
+             '-c:a', audio_codec,
+             '-strict', 'experimental',
+             '-b:a', '192k',
+             '-shortest'
+         ]
+
+     command += [output_path]
+
+     try:
+         subprocess.run(command, check=True)
+     except subprocess.CalledProcessError as e:
+         print(f"Error occurred while creating video from frames: {e}")
+         raise
+     return output_path
+
+
+ def create_video_from_numpy_list(frame_list: List[np.ndarray],
+                                  frame_rate: Optional[int] = None,
+                                  sound_path: Optional[str] = None,
+                                  output_dir: Optional[str] = None
+                                  ):
+     if output_dir is None:
+         output_dir = OUTPUTS_VIDEOS_DIR
+     os.makedirs(output_dir, exist_ok=True)
+     output_path = get_auto_incremental_file_path(output_dir, "mp4")
+
+     if frame_rate is None:
+         frame_rate = 25
+
+     if sound_path is None:
+         temp_sound = os.path.join(TEMP_VIDEO_FRAMES_DIR, "sound.mp3")
+         if os.path.exists(temp_sound):
+             sound_path = temp_sound
+
+     height, width, layers = frame_list[0].shape
+     fourcc = cv2.VideoWriter.fourcc(*'mp4v')
+
+     out = cv2.VideoWriter(output_path, fourcc, frame_rate, (width, height))
+
+     # Frames are RGB numpy arrays; OpenCV expects BGR when writing.
+     for frame in frame_list:
+         out.write(cv2.cvtColor(frame, cv2.COLOR_RGB2BGR))
+
+     out.release()
+     return output_path
+
+
+ def get_frames_from_dir(vid_dir: str,
+                         available_extensions: Optional[Union[List, str]] = None,
+                         as_numpy: bool = False) -> List:
+     """Get image file paths list from the dir"""
+     if available_extensions is None:
+         available_extensions = [".jpg", ".jpeg", ".JPG", ".JPEG"]
+
+     if isinstance(available_extensions, str):
+         available_extensions = [available_extensions]
+
+     frame_names = [
+         p for p in os.listdir(vid_dir)
+         if os.path.splitext(p)[-1] in available_extensions
+     ]
+     if not frame_names:
+         return []
+     frame_names.sort(key=lambda x: int(os.path.splitext(x)[0]))
+
+     frames = [os.path.join(vid_dir, name) for name in frame_names]
+     if as_numpy:
+         frames = [np.array(Image.open(frame)) for frame in frames]
+
+     return frames
+
+
+ def clean_temp_dir(temp_dir: Optional[str] = None):
+     """Removes media files from the video frames directory."""
+     if temp_dir is None:
+         temp_dir = TEMP_VIDEO_FRAMES_DIR
+         temp_out_dir = TEMP_VIDEO_OUT_FRAMES_DIR
+     else:
+         temp_out_dir = os.path.join(temp_dir, "out")
+
+     clean_files_with_extension(temp_dir, SOUND_FILE_EXT)
+     clean_files_with_extension(temp_dir, IMAGE_FILE_EXT)
+
+     if os.path.exists(temp_out_dir):
+         clean_files_with_extension(temp_out_dir, IMAGE_FILE_EXT)
+
+
+ def clean_files_with_extension(dir_path: str, extensions: List):
+     """Remove files with the given extensions from the directory."""
+     if not os.path.exists(dir_path):
+         return
+     for filename in os.listdir(dir_path):
+         if filename.lower().endswith(tuple(extensions)):
+             file_path = os.path.join(dir_path, filename)
+             try:
+                 os.remove(file_path)
+             except Exception as e:
+                 print(f"Error while removing \"{file_path}\": {e}")
requirements.txt CHANGED
@@ -13,4 +13,10 @@ ultralytics
  tyro
  dill
  gradio
- gradio-i18n
+ gradio-i18n
+
+
+ # Tests
+ # pytest
+ # scikit-image
+ # moviepy
tests/test_config.py CHANGED
@@ -4,13 +4,18 @@ import os
  import torch
  import functools
  import numpy as np
+ import cv2
+ from skimage.metrics import structural_similarity as compare_ssim
+ from moviepy.editor import VideoFileClip
 
  from modules.utils.paths import *
 
 
  TEST_IMAGE_URL = "https://github.com/microsoft/onnxjs-demo/raw/master/src/assets/EmotionSampleImages/sad_baby.jpg"
- TEST_IMAGE_PATH = os.path.join(PROJECT_ROOT_DIR, "tests", "test.png")
- TEST_EXPRESSION_OUTPUT_PATH = os.path.join(PROJECT_ROOT_DIR, "tests", "edited_expression.png")
+ TEST_VIDEO_URL = "https://github.com/jhj0517/sample-medias/raw/master/vids/human-face/expression01_short.mp4"
+ TEST_IMAGE_PATH = os.path.normpath(os.path.join(PROJECT_ROOT_DIR, "tests", "test.png"))
+ TEST_VIDEO_PATH = os.path.normpath(os.path.join(PROJECT_ROOT_DIR, "tests", "test_expression.mp4"))
+ TEST_EXPRESSION_OUTPUT_PATH = os.path.normpath(os.path.join(PROJECT_ROOT_DIR, "tests", "edited_expression.png"))
  TEST_EXPRESSION_AAA = 100
 
 
@@ -40,6 +45,62 @@ def are_images_different(image1_path: str, image2_path: str):
      return True
 
 
+ def are_videos_different(video1_path: str, video2_path: str):
+     cap1 = cv2.VideoCapture(video1_path)
+     cap2 = cv2.VideoCapture(video2_path)
+
+     while True:
+         ret1, frame1 = cap1.read()
+         ret2, frame2 = cap2.read()
+
+         if not ret1 or not ret2:
+             if ret1 != ret2:
+                 return True
+             break
+
+         if frame1.shape != frame2.shape:
+             frame1 = cv2.resize(frame1, (frame2.shape[1], frame2.shape[0]))
+
+         score, _ = compare_ssim(frame1, frame2, full=True, multichannel=True)
+
+         if score < 0.99:
+             return True
+
+     cap1.release()
+     cap2.release()
+     return False
+
+
+ def validate_video(video_path):
+     cap = cv2.VideoCapture(video_path)
+     if not cap.isOpened():
+         print("Could not open video file.")
+         return False
+
+     frame_count = 0
+     while True:
+         ret, frame = cap.read()
+         if not ret:
+             break
+         frame_count += 1
+
+     cap.release()
+
+     if frame_count == 0:
+         print("No frames found in video file.")
+         return False
+
+     return True
+
+
+ def has_sound(video_path: str):
+     try:
+         video = VideoFileClip(video_path)
+         return video.audio is not None
+     except Exception as e:
+         return False
+
+
  @functools.lru_cache
  def is_cuda_available():
      return torch.cuda.is_available()
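
These helpers give the tests three checks: validate_video confirms at least one decodable frame, has_sound probes for an audio track via moviepy, and are_videos_different flags any frame pair whose SSIM drops below 0.99. Note that newer scikit-image releases expect channel_axis=-1 in place of the deprecated multichannel=True argument used here. A minimal sketch of how they might be called from a test (the file names are placeholders, and it assumes the helpers are in scope via "from test_config import *"):

# Hypothetical illustration of the helpers above; "a.mp4" and "b.mp4" are placeholders.
assert validate_video("a.mp4")              # at least one readable frame
assert has_sound("a.mp4")                   # an audio track is present
if are_videos_different("a.mp4", "b.mp4"):  # some frame pair has SSIM < 0.99
    print("The two clips differ visually")
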
tests/test_video_creation.py ADDED
@@ -0,0 +1,39 @@
+ import os
+ import pytest
+
+ from test_config import *
+ from modules.live_portrait.live_portrait_inferencer import LivePortraitInferencer
+ from modules.utils.image_helper import save_image
+
+
+ @pytest.mark.parametrize(
+     "input_image,expression_video",
+     [
+         (TEST_IMAGE_PATH, TEST_VIDEO_PATH),
+     ]
+ )
+ def test_video_creation(
+     input_image: str,
+     expression_video: str
+ ):
+     if not os.path.exists(TEST_IMAGE_PATH):
+         download_image(
+             TEST_IMAGE_URL,
+             TEST_IMAGE_PATH
+         )
+     if not os.path.exists(TEST_VIDEO_PATH):
+         download_image(
+             TEST_VIDEO_URL,
+             TEST_VIDEO_PATH
+         )
+
+     inferencer = LivePortraitInferencer()
+
+     output_video_path = inferencer.create_video(
+         driving_vid_path=expression_video,
+         src_image=input_image,
+     )
+
+     assert os.path.exists(output_video_path)
+     assert validate_video(output_video_path)
+     assert has_sound(output_video_path)
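
The new test downloads the sample image and driving video on first run, animates the image with LivePortraitInferencer.create_video, and asserts that the result exists, decodes, and carries audio. A minimal sketch for running just this test locally, assuming ffmpeg and the commented-out test dependencies (pytest, scikit-image, moviepy) are installed:

# Hypothetical local invocation of only the new test file.
import pytest

pytest.main(["-rs", "tests/test_video_creation.py"])
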