aikenml commited on
Commit
7d1aa12
·
1 Parent(s): 92bea3e

Upload folder using huggingface_hub

Browse files
Files changed (12) hide show
  1. .DS_Store +0 -0
  2. LICENSE.txt +661 -0
  3. SegTracker.py +264 -0
  4. aiken +1 -0
  5. aot_tracker.py +186 -0
  6. app.py +782 -0
  7. demo.ipynb +334 -0
  8. demo_instseg.ipynb +353 -0
  9. img2vid.py +26 -0
  10. licenses.md +660 -0
  11. model_args.py +28 -0
  12. seg_track_anything.py +300 -0
.DS_Store CHANGED
Binary files a/.DS_Store and b/.DS_Store differ
 
LICENSE.txt ADDED
@@ -0,0 +1,661 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ GNU AFFERO GENERAL PUBLIC LICENSE
2
+ Version 3, 19 November 2007
3
+
4
+ Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
5
+ Everyone is permitted to copy and distribute verbatim copies
6
+ of this license document, but changing it is not allowed.
7
+
8
+ Preamble
9
+
10
+ The GNU Affero General Public License is a free, copyleft license for
11
+ software and other kinds of works, specifically designed to ensure
12
+ cooperation with the community in the case of network server software.
13
+
14
+ The licenses for most software and other practical works are designed
15
+ to take away your freedom to share and change the works. By contrast,
16
+ our General Public Licenses are intended to guarantee your freedom to
17
+ share and change all versions of a program--to make sure it remains free
18
+ software for all its users.
19
+
20
+ When we speak of free software, we are referring to freedom, not
21
+ price. Our General Public Licenses are designed to make sure that you
22
+ have the freedom to distribute copies of free software (and charge for
23
+ them if you wish), that you receive source code or can get it if you
24
+ want it, that you can change the software or use pieces of it in new
25
+ free programs, and that you know you can do these things.
26
+
27
+ Developers that use our General Public Licenses protect your rights
28
+ with two steps: (1) assert copyright on the software, and (2) offer
29
+ you this License which gives you legal permission to copy, distribute
30
+ and/or modify the software.
31
+
32
+ A secondary benefit of defending all users' freedom is that
33
+ improvements made in alternate versions of the program, if they
34
+ receive widespread use, become available for other developers to
35
+ incorporate. Many developers of free software are heartened and
36
+ encouraged by the resulting cooperation. However, in the case of
37
+ software used on network servers, this result may fail to come about.
38
+ The GNU General Public License permits making a modified version and
39
+ letting the public access it on a server without ever releasing its
40
+ source code to the public.
41
+
42
+ The GNU Affero General Public License is designed specifically to
43
+ ensure that, in such cases, the modified source code becomes available
44
+ to the community. It requires the operator of a network server to
45
+ provide the source code of the modified version running there to the
46
+ users of that server. Therefore, public use of a modified version, on
47
+ a publicly accessible server, gives the public access to the source
48
+ code of the modified version.
49
+
50
+ An older license, called the Affero General Public License and
51
+ published by Affero, was designed to accomplish similar goals. This is
52
+ a different license, not a version of the Affero GPL, but Affero has
53
+ released a new version of the Affero GPL which permits relicensing under
54
+ this license.
55
+
56
+ The precise terms and conditions for copying, distribution and
57
+ modification follow.
58
+
59
+ TERMS AND CONDITIONS
60
+
61
+ 0. Definitions.
62
+
63
+ "This License" refers to version 3 of the GNU Affero General Public License.
64
+
65
+ "Copyright" also means copyright-like laws that apply to other kinds of
66
+ works, such as semiconductor masks.
67
+
68
+ "The Program" refers to any copyrightable work licensed under this
69
+ License. Each licensee is addressed as "you". "Licensees" and
70
+ "recipients" may be individuals or organizations.
71
+
72
+ To "modify" a work means to copy from or adapt all or part of the work
73
+ in a fashion requiring copyright permission, other than the making of an
74
+ exact copy. The resulting work is called a "modified version" of the
75
+ earlier work or a work "based on" the earlier work.
76
+
77
+ A "covered work" means either the unmodified Program or a work based
78
+ on the Program.
79
+
80
+ To "propagate" a work means to do anything with it that, without
81
+ permission, would make you directly or secondarily liable for
82
+ infringement under applicable copyright law, except executing it on a
83
+ computer or modifying a private copy. Propagation includes copying,
84
+ distribution (with or without modification), making available to the
85
+ public, and in some countries other activities as well.
86
+
87
+ To "convey" a work means any kind of propagation that enables other
88
+ parties to make or receive copies. Mere interaction with a user through
89
+ a computer network, with no transfer of a copy, is not conveying.
90
+
91
+ An interactive user interface displays "Appropriate Legal Notices"
92
+ to the extent that it includes a convenient and prominently visible
93
+ feature that (1) displays an appropriate copyright notice, and (2)
94
+ tells the user that there is no warranty for the work (except to the
95
+ extent that warranties are provided), that licensees may convey the
96
+ work under this License, and how to view a copy of this License. If
97
+ the interface presents a list of user commands or options, such as a
98
+ menu, a prominent item in the list meets this criterion.
99
+
100
+ 1. Source Code.
101
+
102
+ The "source code" for a work means the preferred form of the work
103
+ for making modifications to it. "Object code" means any non-source
104
+ form of a work.
105
+
106
+ A "Standard Interface" means an interface that either is an official
107
+ standard defined by a recognized standards body, or, in the case of
108
+ interfaces specified for a particular programming language, one that
109
+ is widely used among developers working in that language.
110
+
111
+ The "System Libraries" of an executable work include anything, other
112
+ than the work as a whole, that (a) is included in the normal form of
113
+ packaging a Major Component, but which is not part of that Major
114
+ Component, and (b) serves only to enable use of the work with that
115
+ Major Component, or to implement a Standard Interface for which an
116
+ implementation is available to the public in source code form. A
117
+ "Major Component", in this context, means a major essential component
118
+ (kernel, window system, and so on) of the specific operating system
119
+ (if any) on which the executable work runs, or a compiler used to
120
+ produce the work, or an object code interpreter used to run it.
121
+
122
+ The "Corresponding Source" for a work in object code form means all
123
+ the source code needed to generate, install, and (for an executable
124
+ work) run the object code and to modify the work, including scripts to
125
+ control those activities. However, it does not include the work's
126
+ System Libraries, or general-purpose tools or generally available free
127
+ programs which are used unmodified in performing those activities but
128
+ which are not part of the work. For example, Corresponding Source
129
+ includes interface definition files associated with source files for
130
+ the work, and the source code for shared libraries and dynamically
131
+ linked subprograms that the work is specifically designed to require,
132
+ such as by intimate data communication or control flow between those
133
+ subprograms and other parts of the work.
134
+
135
+ The Corresponding Source need not include anything that users
136
+ can regenerate automatically from other parts of the Corresponding
137
+ Source.
138
+
139
+ The Corresponding Source for a work in source code form is that
140
+ same work.
141
+
142
+ 2. Basic Permissions.
143
+
144
+ All rights granted under this License are granted for the term of
145
+ copyright on the Program, and are irrevocable provided the stated
146
+ conditions are met. This License explicitly affirms your unlimited
147
+ permission to run the unmodified Program. The output from running a
148
+ covered work is covered by this License only if the output, given its
149
+ content, constitutes a covered work. This License acknowledges your
150
+ rights of fair use or other equivalent, as provided by copyright law.
151
+
152
+ You may make, run and propagate covered works that you do not
153
+ convey, without conditions so long as your license otherwise remains
154
+ in force. You may convey covered works to others for the sole purpose
155
+ of having them make modifications exclusively for you, or provide you
156
+ with facilities for running those works, provided that you comply with
157
+ the terms of this License in conveying all material for which you do
158
+ not control copyright. Those thus making or running the covered works
159
+ for you must do so exclusively on your behalf, under your direction
160
+ and control, on terms that prohibit them from making any copies of
161
+ your copyrighted material outside their relationship with you.
162
+
163
+ Conveying under any other circumstances is permitted solely under
164
+ the conditions stated below. Sublicensing is not allowed; section 10
165
+ makes it unnecessary.
166
+
167
+ 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
168
+
169
+ No covered work shall be deemed part of an effective technological
170
+ measure under any applicable law fulfilling obligations under article
171
+ 11 of the WIPO copyright treaty adopted on 20 December 1996, or
172
+ similar laws prohibiting or restricting circumvention of such
173
+ measures.
174
+
175
+ When you convey a covered work, you waive any legal power to forbid
176
+ circumvention of technological measures to the extent such circumvention
177
+ is effected by exercising rights under this License with respect to
178
+ the covered work, and you disclaim any intention to limit operation or
179
+ modification of the work as a means of enforcing, against the work's
180
+ users, your or third parties' legal rights to forbid circumvention of
181
+ technological measures.
182
+
183
+ 4. Conveying Verbatim Copies.
184
+
185
+ You may convey verbatim copies of the Program's source code as you
186
+ receive it, in any medium, provided that you conspicuously and
187
+ appropriately publish on each copy an appropriate copyright notice;
188
+ keep intact all notices stating that this License and any
189
+ non-permissive terms added in accord with section 7 apply to the code;
190
+ keep intact all notices of the absence of any warranty; and give all
191
+ recipients a copy of this License along with the Program.
192
+
193
+ You may charge any price or no price for each copy that you convey,
194
+ and you may offer support or warranty protection for a fee.
195
+
196
+ 5. Conveying Modified Source Versions.
197
+
198
+ You may convey a work based on the Program, or the modifications to
199
+ produce it from the Program, in the form of source code under the
200
+ terms of section 4, provided that you also meet all of these conditions:
201
+
202
+ a) The work must carry prominent notices stating that you modified
203
+ it, and giving a relevant date.
204
+
205
+ b) The work must carry prominent notices stating that it is
206
+ released under this License and any conditions added under section
207
+ 7. This requirement modifies the requirement in section 4 to
208
+ "keep intact all notices".
209
+
210
+ c) You must license the entire work, as a whole, under this
211
+ License to anyone who comes into possession of a copy. This
212
+ License will therefore apply, along with any applicable section 7
213
+ additional terms, to the whole of the work, and all its parts,
214
+ regardless of how they are packaged. This License gives no
215
+ permission to license the work in any other way, but it does not
216
+ invalidate such permission if you have separately received it.
217
+
218
+ d) If the work has interactive user interfaces, each must display
219
+ Appropriate Legal Notices; however, if the Program has interactive
220
+ interfaces that do not display Appropriate Legal Notices, your
221
+ work need not make them do so.
222
+
223
+ A compilation of a covered work with other separate and independent
224
+ works, which are not by their nature extensions of the covered work,
225
+ and which are not combined with it such as to form a larger program,
226
+ in or on a volume of a storage or distribution medium, is called an
227
+ "aggregate" if the compilation and its resulting copyright are not
228
+ used to limit the access or legal rights of the compilation's users
229
+ beyond what the individual works permit. Inclusion of a covered work
230
+ in an aggregate does not cause this License to apply to the other
231
+ parts of the aggregate.
232
+
233
+ 6. Conveying Non-Source Forms.
234
+
235
+ You may convey a covered work in object code form under the terms
236
+ of sections 4 and 5, provided that you also convey the
237
+ machine-readable Corresponding Source under the terms of this License,
238
+ in one of these ways:
239
+
240
+ a) Convey the object code in, or embodied in, a physical product
241
+ (including a physical distribution medium), accompanied by the
242
+ Corresponding Source fixed on a durable physical medium
243
+ customarily used for software interchange.
244
+
245
+ b) Convey the object code in, or embodied in, a physical product
246
+ (including a physical distribution medium), accompanied by a
247
+ written offer, valid for at least three years and valid for as
248
+ long as you offer spare parts or customer support for that product
249
+ model, to give anyone who possesses the object code either (1) a
250
+ copy of the Corresponding Source for all the software in the
251
+ product that is covered by this License, on a durable physical
252
+ medium customarily used for software interchange, for a price no
253
+ more than your reasonable cost of physically performing this
254
+ conveying of source, or (2) access to copy the
255
+ Corresponding Source from a network server at no charge.
256
+
257
+ c) Convey individual copies of the object code with a copy of the
258
+ written offer to provide the Corresponding Source. This
259
+ alternative is allowed only occasionally and noncommercially, and
260
+ only if you received the object code with such an offer, in accord
261
+ with subsection 6b.
262
+
263
+ d) Convey the object code by offering access from a designated
264
+ place (gratis or for a charge), and offer equivalent access to the
265
+ Corresponding Source in the same way through the same place at no
266
+ further charge. You need not require recipients to copy the
267
+ Corresponding Source along with the object code. If the place to
268
+ copy the object code is a network server, the Corresponding Source
269
+ may be on a different server (operated by you or a third party)
270
+ that supports equivalent copying facilities, provided you maintain
271
+ clear directions next to the object code saying where to find the
272
+ Corresponding Source. Regardless of what server hosts the
273
+ Corresponding Source, you remain obligated to ensure that it is
274
+ available for as long as needed to satisfy these requirements.
275
+
276
+ e) Convey the object code using peer-to-peer transmission, provided
277
+ you inform other peers where the object code and Corresponding
278
+ Source of the work are being offered to the general public at no
279
+ charge under subsection 6d.
280
+
281
+ A separable portion of the object code, whose source code is excluded
282
+ from the Corresponding Source as a System Library, need not be
283
+ included in conveying the object code work.
284
+
285
+ A "User Product" is either (1) a "consumer product", which means any
286
+ tangible personal property which is normally used for personal, family,
287
+ or household purposes, or (2) anything designed or sold for incorporation
288
+ into a dwelling. In determining whether a product is a consumer product,
289
+ doubtful cases shall be resolved in favor of coverage. For a particular
290
+ product received by a particular user, "normally used" refers to a
291
+ typical or common use of that class of product, regardless of the status
292
+ of the particular user or of the way in which the particular user
293
+ actually uses, or expects or is expected to use, the product. A product
294
+ is a consumer product regardless of whether the product has substantial
295
+ commercial, industrial or non-consumer uses, unless such uses represent
296
+ the only significant mode of use of the product.
297
+
298
+ "Installation Information" for a User Product means any methods,
299
+ procedures, authorization keys, or other information required to install
300
+ and execute modified versions of a covered work in that User Product from
301
+ a modified version of its Corresponding Source. The information must
302
+ suffice to ensure that the continued functioning of the modified object
303
+ code is in no case prevented or interfered with solely because
304
+ modification has been made.
305
+
306
+ If you convey an object code work under this section in, or with, or
307
+ specifically for use in, a User Product, and the conveying occurs as
308
+ part of a transaction in which the right of possession and use of the
309
+ User Product is transferred to the recipient in perpetuity or for a
310
+ fixed term (regardless of how the transaction is characterized), the
311
+ Corresponding Source conveyed under this section must be accompanied
312
+ by the Installation Information. But this requirement does not apply
313
+ if neither you nor any third party retains the ability to install
314
+ modified object code on the User Product (for example, the work has
315
+ been installed in ROM).
316
+
317
+ The requirement to provide Installation Information does not include a
318
+ requirement to continue to provide support service, warranty, or updates
319
+ for a work that has been modified or installed by the recipient, or for
320
+ the User Product in which it has been modified or installed. Access to a
321
+ network may be denied when the modification itself materially and
322
+ adversely affects the operation of the network or violates the rules and
323
+ protocols for communication across the network.
324
+
325
+ Corresponding Source conveyed, and Installation Information provided,
326
+ in accord with this section must be in a format that is publicly
327
+ documented (and with an implementation available to the public in
328
+ source code form), and must require no special password or key for
329
+ unpacking, reading or copying.
330
+
331
+ 7. Additional Terms.
332
+
333
+ "Additional permissions" are terms that supplement the terms of this
334
+ License by making exceptions from one or more of its conditions.
335
+ Additional permissions that are applicable to the entire Program shall
336
+ be treated as though they were included in this License, to the extent
337
+ that they are valid under applicable law. If additional permissions
338
+ apply only to part of the Program, that part may be used separately
339
+ under those permissions, but the entire Program remains governed by
340
+ this License without regard to the additional permissions.
341
+
342
+ When you convey a copy of a covered work, you may at your option
343
+ remove any additional permissions from that copy, or from any part of
344
+ it. (Additional permissions may be written to require their own
345
+ removal in certain cases when you modify the work.) You may place
346
+ additional permissions on material, added by you to a covered work,
347
+ for which you have or can give appropriate copyright permission.
348
+
349
+ Notwithstanding any other provision of this License, for material you
350
+ add to a covered work, you may (if authorized by the copyright holders of
351
+ that material) supplement the terms of this License with terms:
352
+
353
+ a) Disclaiming warranty or limiting liability differently from the
354
+ terms of sections 15 and 16 of this License; or
355
+
356
+ b) Requiring preservation of specified reasonable legal notices or
357
+ author attributions in that material or in the Appropriate Legal
358
+ Notices displayed by works containing it; or
359
+
360
+ c) Prohibiting misrepresentation of the origin of that material, or
361
+ requiring that modified versions of such material be marked in
362
+ reasonable ways as different from the original version; or
363
+
364
+ d) Limiting the use for publicity purposes of names of licensors or
365
+ authors of the material; or
366
+
367
+ e) Declining to grant rights under trademark law for use of some
368
+ trade names, trademarks, or service marks; or
369
+
370
+ f) Requiring indemnification of licensors and authors of that
371
+ material by anyone who conveys the material (or modified versions of
372
+ it) with contractual assumptions of liability to the recipient, for
373
+ any liability that these contractual assumptions directly impose on
374
+ those licensors and authors.
375
+
376
+ All other non-permissive additional terms are considered "further
377
+ restrictions" within the meaning of section 10. If the Program as you
378
+ received it, or any part of it, contains a notice stating that it is
379
+ governed by this License along with a term that is a further
380
+ restriction, you may remove that term. If a license document contains
381
+ a further restriction but permits relicensing or conveying under this
382
+ License, you may add to a covered work material governed by the terms
383
+ of that license document, provided that the further restriction does
384
+ not survive such relicensing or conveying.
385
+
386
+ If you add terms to a covered work in accord with this section, you
387
+ must place, in the relevant source files, a statement of the
388
+ additional terms that apply to those files, or a notice indicating
389
+ where to find the applicable terms.
390
+
391
+ Additional terms, permissive or non-permissive, may be stated in the
392
+ form of a separately written license, or stated as exceptions;
393
+ the above requirements apply either way.
394
+
395
+ 8. Termination.
396
+
397
+ You may not propagate or modify a covered work except as expressly
398
+ provided under this License. Any attempt otherwise to propagate or
399
+ modify it is void, and will automatically terminate your rights under
400
+ this License (including any patent licenses granted under the third
401
+ paragraph of section 11).
402
+
403
+ However, if you cease all violation of this License, then your
404
+ license from a particular copyright holder is reinstated (a)
405
+ provisionally, unless and until the copyright holder explicitly and
406
+ finally terminates your license, and (b) permanently, if the copyright
407
+ holder fails to notify you of the violation by some reasonable means
408
+ prior to 60 days after the cessation.
409
+
410
+ Moreover, your license from a particular copyright holder is
411
+ reinstated permanently if the copyright holder notifies you of the
412
+ violation by some reasonable means, this is the first time you have
413
+ received notice of violation of this License (for any work) from that
414
+ copyright holder, and you cure the violation prior to 30 days after
415
+ your receipt of the notice.
416
+
417
+ Termination of your rights under this section does not terminate the
418
+ licenses of parties who have received copies or rights from you under
419
+ this License. If your rights have been terminated and not permanently
420
+ reinstated, you do not qualify to receive new licenses for the same
421
+ material under section 10.
422
+
423
+ 9. Acceptance Not Required for Having Copies.
424
+
425
+ You are not required to accept this License in order to receive or
426
+ run a copy of the Program. Ancillary propagation of a covered work
427
+ occurring solely as a consequence of using peer-to-peer transmission
428
+ to receive a copy likewise does not require acceptance. However,
429
+ nothing other than this License grants you permission to propagate or
430
+ modify any covered work. These actions infringe copyright if you do
431
+ not accept this License. Therefore, by modifying or propagating a
432
+ covered work, you indicate your acceptance of this License to do so.
433
+
434
+ 10. Automatic Licensing of Downstream Recipients.
435
+
436
+ Each time you convey a covered work, the recipient automatically
437
+ receives a license from the original licensors, to run, modify and
438
+ propagate that work, subject to this License. You are not responsible
439
+ for enforcing compliance by third parties with this License.
440
+
441
+ An "entity transaction" is a transaction transferring control of an
442
+ organization, or substantially all assets of one, or subdividing an
443
+ organization, or merging organizations. If propagation of a covered
444
+ work results from an entity transaction, each party to that
445
+ transaction who receives a copy of the work also receives whatever
446
+ licenses to the work the party's predecessor in interest had or could
447
+ give under the previous paragraph, plus a right to possession of the
448
+ Corresponding Source of the work from the predecessor in interest, if
449
+ the predecessor has it or can get it with reasonable efforts.
450
+
451
+ You may not impose any further restrictions on the exercise of the
452
+ rights granted or affirmed under this License. For example, you may
453
+ not impose a license fee, royalty, or other charge for exercise of
454
+ rights granted under this License, and you may not initiate litigation
455
+ (including a cross-claim or counterclaim in a lawsuit) alleging that
456
+ any patent claim is infringed by making, using, selling, offering for
457
+ sale, or importing the Program or any portion of it.
458
+
459
+ 11. Patents.
460
+
461
+ A "contributor" is a copyright holder who authorizes use under this
462
+ License of the Program or a work on which the Program is based. The
463
+ work thus licensed is called the contributor's "contributor version".
464
+
465
+ A contributor's "essential patent claims" are all patent claims
466
+ owned or controlled by the contributor, whether already acquired or
467
+ hereafter acquired, that would be infringed by some manner, permitted
468
+ by this License, of making, using, or selling its contributor version,
469
+ but do not include claims that would be infringed only as a
470
+ consequence of further modification of the contributor version. For
471
+ purposes of this definition, "control" includes the right to grant
472
+ patent sublicenses in a manner consistent with the requirements of
473
+ this License.
474
+
475
+ Each contributor grants you a non-exclusive, worldwide, royalty-free
476
+ patent license under the contributor's essential patent claims, to
477
+ make, use, sell, offer for sale, import and otherwise run, modify and
478
+ propagate the contents of its contributor version.
479
+
480
+ In the following three paragraphs, a "patent license" is any express
481
+ agreement or commitment, however denominated, not to enforce a patent
482
+ (such as an express permission to practice a patent or covenant not to
483
+ sue for patent infringement). To "grant" such a patent license to a
484
+ party means to make such an agreement or commitment not to enforce a
485
+ patent against the party.
486
+
487
+ If you convey a covered work, knowingly relying on a patent license,
488
+ and the Corresponding Source of the work is not available for anyone
489
+ to copy, free of charge and under the terms of this License, through a
490
+ publicly available network server or other readily accessible means,
491
+ then you must either (1) cause the Corresponding Source to be so
492
+ available, or (2) arrange to deprive yourself of the benefit of the
493
+ patent license for this particular work, or (3) arrange, in a manner
494
+ consistent with the requirements of this License, to extend the patent
495
+ license to downstream recipients. "Knowingly relying" means you have
496
+ actual knowledge that, but for the patent license, your conveying the
497
+ covered work in a country, or your recipient's use of the covered work
498
+ in a country, would infringe one or more identifiable patents in that
499
+ country that you have reason to believe are valid.
500
+
501
+ If, pursuant to or in connection with a single transaction or
502
+ arrangement, you convey, or propagate by procuring conveyance of, a
503
+ covered work, and grant a patent license to some of the parties
504
+ receiving the covered work authorizing them to use, propagate, modify
505
+ or convey a specific copy of the covered work, then the patent license
506
+ you grant is automatically extended to all recipients of the covered
507
+ work and works based on it.
508
+
509
+ A patent license is "discriminatory" if it does not include within
510
+ the scope of its coverage, prohibits the exercise of, or is
511
+ conditioned on the non-exercise of one or more of the rights that are
512
+ specifically granted under this License. You may not convey a covered
513
+ work if you are a party to an arrangement with a third party that is
514
+ in the business of distributing software, under which you make payment
515
+ to the third party based on the extent of your activity of conveying
516
+ the work, and under which the third party grants, to any of the
517
+ parties who would receive the covered work from you, a discriminatory
518
+ patent license (a) in connection with copies of the covered work
519
+ conveyed by you (or copies made from those copies), or (b) primarily
520
+ for and in connection with specific products or compilations that
521
+ contain the covered work, unless you entered into that arrangement,
522
+ or that patent license was granted, prior to 28 March 2007.
523
+
524
+ Nothing in this License shall be construed as excluding or limiting
525
+ any implied license or other defenses to infringement that may
526
+ otherwise be available to you under applicable patent law.
527
+
528
+ 12. No Surrender of Others' Freedom.
529
+
530
+ If conditions are imposed on you (whether by court order, agreement or
531
+ otherwise) that contradict the conditions of this License, they do not
532
+ excuse you from the conditions of this License. If you cannot convey a
533
+ covered work so as to satisfy simultaneously your obligations under this
534
+ License and any other pertinent obligations, then as a consequence you may
535
+ not convey it at all. For example, if you agree to terms that obligate you
536
+ to collect a royalty for further conveying from those to whom you convey
537
+ the Program, the only way you could satisfy both those terms and this
538
+ License would be to refrain entirely from conveying the Program.
539
+
540
+ 13. Remote Network Interaction; Use with the GNU General Public License.
541
+
542
+ Notwithstanding any other provision of this License, if you modify the
543
+ Program, your modified version must prominently offer all users
544
+ interacting with it remotely through a computer network (if your version
545
+ supports such interaction) an opportunity to receive the Corresponding
546
+ Source of your version by providing access to the Corresponding Source
547
+ from a network server at no charge, through some standard or customary
548
+ means of facilitating copying of software. This Corresponding Source
549
+ shall include the Corresponding Source for any work covered by version 3
550
+ of the GNU General Public License that is incorporated pursuant to the
551
+ following paragraph.
552
+
553
+ Notwithstanding any other provision of this License, you have
554
+ permission to link or combine any covered work with a work licensed
555
+ under version 3 of the GNU General Public License into a single
556
+ combined work, and to convey the resulting work. The terms of this
557
+ License will continue to apply to the part which is the covered work,
558
+ but the work with which it is combined will remain governed by version
559
+ 3 of the GNU General Public License.
560
+
561
+ 14. Revised Versions of this License.
562
+
563
+ The Free Software Foundation may publish revised and/or new versions of
564
+ the GNU Affero General Public License from time to time. Such new versions
565
+ will be similar in spirit to the present version, but may differ in detail to
566
+ address new problems or concerns.
567
+
568
+ Each version is given a distinguishing version number. If the
569
+ Program specifies that a certain numbered version of the GNU Affero General
570
+ Public License "or any later version" applies to it, you have the
571
+ option of following the terms and conditions either of that numbered
572
+ version or of any later version published by the Free Software
573
+ Foundation. If the Program does not specify a version number of the
574
+ GNU Affero General Public License, you may choose any version ever published
575
+ by the Free Software Foundation.
576
+
577
+ If the Program specifies that a proxy can decide which future
578
+ versions of the GNU Affero General Public License can be used, that proxy's
579
+ public statement of acceptance of a version permanently authorizes you
580
+ to choose that version for the Program.
581
+
582
+ Later license versions may give you additional or different
583
+ permissions. However, no additional obligations are imposed on any
584
+ author or copyright holder as a result of your choosing to follow a
585
+ later version.
586
+
587
+ 15. Disclaimer of Warranty.
588
+
589
+ THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
590
+ APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
591
+ HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
592
+ OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
593
+ THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
594
+ PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
595
+ IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
596
+ ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
597
+
598
+ 16. Limitation of Liability.
599
+
600
+ IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
601
+ WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
602
+ THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
603
+ GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
604
+ USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
605
+ DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
606
+ PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
607
+ EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
608
+ SUCH DAMAGES.
609
+
610
+ 17. Interpretation of Sections 15 and 16.
611
+
612
+ If the disclaimer of warranty and limitation of liability provided
613
+ above cannot be given local legal effect according to their terms,
614
+ reviewing courts shall apply local law that most closely approximates
615
+ an absolute waiver of all civil liability in connection with the
616
+ Program, unless a warranty or assumption of liability accompanies a
617
+ copy of the Program in return for a fee.
618
+
619
+ END OF TERMS AND CONDITIONS
620
+
621
+ How to Apply These Terms to Your New Programs
622
+
623
+ If you develop a new program, and you want it to be of the greatest
624
+ possible use to the public, the best way to achieve this is to make it
625
+ free software which everyone can redistribute and change under these terms.
626
+
627
+ To do so, attach the following notices to the program. It is safest
628
+ to attach them to the start of each source file to most effectively
629
+ state the exclusion of warranty; and each file should have at least
630
+ the "copyright" line and a pointer to where the full notice is found.
631
+
632
+ <one line to give the program's name and a brief idea of what it does.>
633
+ Copyright (C) <year> <name of author>
634
+
635
+ This program is free software: you can redistribute it and/or modify
636
+ it under the terms of the GNU Affero General Public License as published
637
+ by the Free Software Foundation, either version 3 of the License, or
638
+ (at your option) any later version.
639
+
640
+ This program is distributed in the hope that it will be useful,
641
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
642
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
643
+ GNU Affero General Public License for more details.
644
+
645
+ You should have received a copy of the GNU Affero General Public License
646
+ along with this program. If not, see <https://www.gnu.org/licenses/>.
647
+
648
+ Also add information on how to contact you by electronic and paper mail.
649
+
650
+ If your software can interact with users remotely through a computer
651
+ network, you should also make sure that it provides a way for users to
652
+ get its source. For example, if your program is a web application, its
653
+ interface could display a "Source" link that leads users to an archive
654
+ of the code. There are many ways you could offer source, and different
655
+ solutions will be better for different programs; see section 13 for the
656
+ specific requirements.
657
+
658
+ You should also get your employer (if you work as a programmer) or school,
659
+ if any, to sign a "copyright disclaimer" for the program, if necessary.
660
+ For more information on this, and how to apply and follow the GNU AGPL, see
661
+ <https://www.gnu.org/licenses/>.
SegTracker.py ADDED
@@ -0,0 +1,264 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import sys
2
+ sys.path.append("..")
3
+ sys.path.append("./sam")
4
+ from sam.segment_anything import sam_model_registry, SamAutomaticMaskGenerator
5
+ from aot_tracker import get_aot
6
+ import numpy as np
7
+ from tool.segmentor import Segmentor
8
+ from tool.detector import Detector
9
+ from tool.transfer_tools import draw_outline, draw_points
10
+ import cv2
11
+ from seg_track_anything import draw_mask
12
+
13
+
14
+ class SegTracker():
15
+ def __init__(self,segtracker_args, sam_args, aot_args) -> None:
16
+ """
17
+ Initialize SAM and AOT.
18
+ """
19
+ self.sam = Segmentor(sam_args)
20
+ self.tracker = get_aot(aot_args)
21
+ self.detector = Detector(self.sam.device)
22
+ self.sam_gap = segtracker_args['sam_gap']
23
+ self.min_area = segtracker_args['min_area']
24
+ self.max_obj_num = segtracker_args['max_obj_num']
25
+ self.min_new_obj_iou = segtracker_args['min_new_obj_iou']
26
+ self.reference_objs_list = []
27
+ self.object_idx = 1
28
+ self.curr_idx = 1
29
+ self.origin_merged_mask = None # init by segment-everything or update
30
+ self.first_frame_mask = None
31
+
32
+ # debug
33
+ self.everything_points = []
34
+ self.everything_labels = []
35
+ print("SegTracker has been initialized")
36
+
37
+ def seg(self,frame):
38
+ '''
39
+ Arguments:
40
+ frame: numpy array (h,w,3)
41
+ Return:
42
+ origin_merged_mask: numpy array (h,w)
43
+ '''
44
+ frame = frame[:, :, ::-1]
45
+ anns = self.sam.everything_generator.generate(frame)
46
+
47
+ # anns is a list recording all predictions in an image
48
+ if len(anns) == 0:
49
+ return
50
+ # merge all predictions into one mask (h,w)
51
+ # note that the merged mask may lost some objects due to the overlapping
52
+ self.origin_merged_mask = np.zeros(anns[0]['segmentation'].shape,dtype=np.uint8)
53
+ idx = 1
54
+ for ann in anns:
55
+ if ann['area'] > self.min_area:
56
+ m = ann['segmentation']
57
+ self.origin_merged_mask[m==1] = idx
58
+ idx += 1
59
+ self.everything_points.append(ann["point_coords"][0])
60
+ self.everything_labels.append(1)
61
+
62
+ obj_ids = np.unique(self.origin_merged_mask)
63
+ obj_ids = obj_ids[obj_ids!=0]
64
+
65
+ self.object_idx = 1
66
+ for id in obj_ids:
67
+ if np.sum(self.origin_merged_mask==id) < self.min_area or self.object_idx > self.max_obj_num:
68
+ self.origin_merged_mask[self.origin_merged_mask==id] = 0
69
+ else:
70
+ self.origin_merged_mask[self.origin_merged_mask==id] = self.object_idx
71
+ self.object_idx += 1
72
+
73
+ self.first_frame_mask = self.origin_merged_mask
74
+ return self.origin_merged_mask
75
+
76
+ def update_origin_merged_mask(self, updated_merged_mask):
77
+ self.origin_merged_mask = updated_merged_mask
78
+ # obj_ids = np.unique(updated_merged_mask)
79
+ # obj_ids = obj_ids[obj_ids!=0]
80
+ # self.object_idx = int(max(obj_ids)) + 1
81
+
82
+ def reset_origin_merged_mask(self, mask, id):
83
+ self.origin_merged_mask = mask
84
+ self.curr_idx = id
85
+
86
+ def add_reference(self,frame,mask,frame_step=0):
87
+ '''
88
+ Add objects in a mask for tracking.
89
+ Arguments:
90
+ frame: numpy array (h,w,3)
91
+ mask: numpy array (h,w)
92
+ '''
93
+ self.reference_objs_list.append(np.unique(mask))
94
+ self.curr_idx = self.get_obj_num() + 1
95
+ self.tracker.add_reference_frame(frame,mask, self.curr_idx - 1, frame_step)
96
+
97
+ def track(self,frame,update_memory=False):
98
+ '''
99
+ Track all known objects.
100
+ Arguments:
101
+ frame: numpy array (h,w,3)
102
+ Return:
103
+ origin_merged_mask: numpy array (h,w)
104
+ '''
105
+ pred_mask = self.tracker.track(frame)
106
+ if update_memory:
107
+ self.tracker.update_memory(pred_mask)
108
+ return pred_mask.squeeze(0).squeeze(0).detach().cpu().numpy().astype(np.uint8)
109
+
110
+ def get_tracking_objs(self):
111
+ objs = set()
112
+ for ref in self.reference_objs_list:
113
+ objs.update(set(ref))
114
+ objs = list(sorted(list(objs)))
115
+ objs = [i for i in objs if i!=0]
116
+ return objs
117
+
118
+ def get_obj_num(self):
119
+ objs = self.get_tracking_objs()
120
+ if len(objs) == 0: return 0
121
+ return int(max(objs))
122
+
123
+ def find_new_objs(self, track_mask, seg_mask):
124
+ '''
125
+ Compare tracked results from AOT with segmented results from SAM. Select objects from background if they are not tracked.
126
+ Arguments:
127
+ track_mask: numpy array (h,w)
128
+ seg_mask: numpy array (h,w)
129
+ Return:
130
+ new_obj_mask: numpy array (h,w)
131
+ '''
132
+ new_obj_mask = (track_mask==0) * seg_mask
133
+ new_obj_ids = np.unique(new_obj_mask)
134
+ new_obj_ids = new_obj_ids[new_obj_ids!=0]
135
+ # obj_num = self.get_obj_num() + 1
136
+ obj_num = self.curr_idx
137
+ for idx in new_obj_ids:
138
+ new_obj_area = np.sum(new_obj_mask==idx)
139
+ obj_area = np.sum(seg_mask==idx)
140
+ if new_obj_area/obj_area < self.min_new_obj_iou or new_obj_area < self.min_area\
141
+ or obj_num > self.max_obj_num:
142
+ new_obj_mask[new_obj_mask==idx] = 0
143
+ else:
144
+ new_obj_mask[new_obj_mask==idx] = obj_num
145
+ obj_num += 1
146
+ return new_obj_mask
147
+
148
+ def restart_tracker(self):
149
+ self.tracker.restart()
150
+
151
+ def seg_acc_bbox(self, origin_frame: np.ndarray, bbox: np.ndarray,):
152
+ ''''
153
+ Use bbox-prompt to get mask
154
+ Parameters:
155
+ origin_frame: H, W, C
156
+ bbox: [[x0, y0], [x1, y1]]
157
+ Return:
158
+ refined_merged_mask: numpy array (h, w)
159
+ masked_frame: numpy array (h, w, c)
160
+ '''
161
+ # get interactive_mask
162
+ interactive_mask = self.sam.segment_with_box(origin_frame, bbox)[0]
163
+ refined_merged_mask = self.add_mask(interactive_mask)
164
+
165
+ # draw mask
166
+ masked_frame = draw_mask(origin_frame.copy(), refined_merged_mask)
167
+
168
+ # draw bbox
169
+ masked_frame = cv2.rectangle(masked_frame, bbox[0], bbox[1], (0, 0, 255))
170
+
171
+ return refined_merged_mask, masked_frame
172
+
173
+ def seg_acc_click(self, origin_frame: np.ndarray, coords: np.ndarray, modes: np.ndarray, multimask=True):
174
+ '''
175
+ Use point-prompt to get mask
176
+ Parameters:
177
+ origin_frame: H, W, C
178
+ coords: nd.array [[x, y]]
179
+ modes: nd.array [[1]]
180
+ Return:
181
+ refined_merged_mask: numpy array (h, w)
182
+ masked_frame: numpy array (h, w, c)
183
+ '''
184
+ # get interactive_mask
185
+ interactive_mask = self.sam.segment_with_click(origin_frame, coords, modes, multimask)
186
+
187
+ refined_merged_mask = self.add_mask(interactive_mask)
188
+
189
+ # draw mask
190
+ masked_frame = draw_mask(origin_frame.copy(), refined_merged_mask)
191
+
192
+ # draw points
193
+ # self.everything_labels = np.array(self.everything_labels).astype(np.int64)
194
+ # self.everything_points = np.array(self.everything_points).astype(np.int64)
195
+
196
+ masked_frame = draw_points(coords, modes, masked_frame)
197
+
198
+ # draw outline
199
+ masked_frame = draw_outline(interactive_mask, masked_frame)
200
+
201
+ return refined_merged_mask, masked_frame
202
+
203
+ def add_mask(self, interactive_mask: np.ndarray):
204
+ '''
205
+ Merge interactive mask with self.origin_merged_mask
206
+ Parameters:
207
+ interactive_mask: numpy array (h, w)
208
+ Return:
209
+ refined_merged_mask: numpy array (h, w)
210
+ '''
211
+ if self.origin_merged_mask is None:
212
+ self.origin_merged_mask = np.zeros(interactive_mask.shape,dtype=np.uint8)
213
+
214
+ refined_merged_mask = self.origin_merged_mask.copy()
215
+ refined_merged_mask[interactive_mask > 0] = self.curr_idx
216
+
217
+ return refined_merged_mask
218
+
219
+ def detect_and_seg(self, origin_frame: np.ndarray, grounding_caption, box_threshold, text_threshold, box_size_threshold=1, reset_image=False):
220
+ '''
221
+ Using Grounding-DINO to detect object acc Text-prompts
222
+ Retrun:
223
+ refined_merged_mask: numpy array (h, w)
224
+ annotated_frame: numpy array (h, w, 3)
225
+ '''
226
+ # backup id and origin-merged-mask
227
+ bc_id = self.curr_idx
228
+ bc_mask = self.origin_merged_mask
229
+
230
+ # get annotated_frame and boxes
231
+ annotated_frame, boxes = self.detector.run_grounding(origin_frame, grounding_caption, box_threshold, text_threshold)
232
+ for i in range(len(boxes)):
233
+ bbox = boxes[i]
234
+ if (bbox[1][0] - bbox[0][0]) * (bbox[1][1] - bbox[0][1]) > annotated_frame.shape[0] * annotated_frame.shape[1] * box_size_threshold:
235
+ continue
236
+ interactive_mask = self.sam.segment_with_box(origin_frame, bbox, reset_image)[0]
237
+ refined_merged_mask = self.add_mask(interactive_mask)
238
+ self.update_origin_merged_mask(refined_merged_mask)
239
+ self.curr_idx += 1
240
+
241
+ # reset origin_mask
242
+ self.reset_origin_merged_mask(bc_mask, bc_id)
243
+
244
+ return refined_merged_mask, annotated_frame
245
+
246
+ if __name__ == '__main__':
247
+ from model_args import segtracker_args,sam_args,aot_args
248
+
249
+ Seg_Tracker = SegTracker(segtracker_args, sam_args, aot_args)
250
+
251
+ # ------------------ detect test ----------------------
252
+
253
+ origin_frame = cv2.imread('/data2/cym/Seg_Tra_any/Segment-and-Track-Anything/debug/point.png')
254
+ origin_frame = cv2.cvtColor(origin_frame, cv2.COLOR_BGR2RGB)
255
+ grounding_caption = "swan.water"
256
+ box_threshold = 0.25
257
+ text_threshold = 0.25
258
+
259
+ predicted_mask, annotated_frame = Seg_Tracker.detect_and_seg(origin_frame, grounding_caption, box_threshold, text_threshold)
260
+ masked_frame = draw_mask(annotated_frame, predicted_mask)
261
+ origin_frame = cv2.cvtColor(origin_frame, cv2.COLOR_RGB2BGR)
262
+
263
+ cv2.imwrite('./debug/masked_frame.png', masked_frame)
264
+ cv2.imwrite('./debug/x.png', annotated_frame)
aiken ADDED
@@ -0,0 +1 @@
 
 
1
+ hahahahahahahahaha aiken lox
aot_tracker.py ADDED
@@ -0,0 +1,186 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from statistics import mode
2
+ import torch
3
+ import torch.nn.functional as F
4
+ import os
5
+ import sys
6
+ sys.path.append("./aot")
7
+ from aot.networks.engines.aot_engine import AOTEngine,AOTInferEngine
8
+ from aot.networks.engines.deaot_engine import DeAOTEngine,DeAOTInferEngine
9
+ import importlib
10
+ import numpy as np
11
+ from PIL import Image
12
+ from skimage.morphology.binary import binary_dilation
13
+
14
+
15
+ np.random.seed(200)
16
+ _palette = ((np.random.random((3*255))*0.7+0.3)*255).astype(np.uint8).tolist()
17
+ _palette = [0,0,0]+_palette
18
+
19
+ import aot.dataloaders.video_transforms as tr
20
+ from aot.utils.checkpoint import load_network
21
+ from aot.networks.models import build_vos_model
22
+ from aot.networks.engines import build_engine
23
+ from torchvision import transforms
24
+
25
+ class AOTTracker(object):
26
+ def __init__(self, cfg, gpu_id=0):
27
+ self.gpu_id = gpu_id
28
+ self.model = build_vos_model(cfg.MODEL_VOS, cfg).cuda(gpu_id)
29
+ self.model, _ = load_network(self.model, cfg.TEST_CKPT_PATH, gpu_id)
30
+ # self.engine = self.build_tracker_engine(cfg.MODEL_ENGINE,
31
+ # aot_model=self.model,
32
+ # gpu_id=gpu_id,
33
+ # short_term_mem_skip=4,
34
+ # long_term_mem_gap=cfg.TEST_LONG_TERM_MEM_GAP)
35
+ self.engine = build_engine(cfg.MODEL_ENGINE,
36
+ phase='eval',
37
+ aot_model=self.model,
38
+ gpu_id=gpu_id,
39
+ short_term_mem_skip=1,
40
+ long_term_mem_gap=cfg.TEST_LONG_TERM_MEM_GAP,
41
+ max_len_long_term=cfg.MAX_LEN_LONG_TERM)
42
+
43
+ self.transform = transforms.Compose([
44
+ tr.MultiRestrictSize(cfg.TEST_MAX_SHORT_EDGE,
45
+ cfg.TEST_MAX_LONG_EDGE, cfg.TEST_FLIP,
46
+ cfg.TEST_MULTISCALE, cfg.MODEL_ALIGN_CORNERS),
47
+ tr.MultiToTensor()
48
+ ])
49
+
50
+ self.model.eval()
51
+
52
+ @torch.no_grad()
53
+ def add_reference_frame(self, frame, mask, obj_nums, frame_step, incremental=False):
54
+ # mask = cv2.resize(mask, frame.shape[:2][::-1], interpolation = cv2.INTER_NEAREST)
55
+
56
+ sample = {
57
+ 'current_img': frame,
58
+ 'current_label': mask,
59
+ }
60
+
61
+ sample = self.transform(sample)
62
+ frame = sample[0]['current_img'].unsqueeze(0).float().cuda(self.gpu_id)
63
+ mask = sample[0]['current_label'].unsqueeze(0).float().cuda(self.gpu_id)
64
+ _mask = F.interpolate(mask,size=frame.shape[-2:],mode='nearest')
65
+
66
+ if incremental:
67
+ self.engine.add_reference_frame_incremental(frame, _mask, obj_nums=obj_nums, frame_step=frame_step)
68
+ else:
69
+ self.engine.add_reference_frame(frame, _mask, obj_nums=obj_nums, frame_step=frame_step)
70
+
71
+
72
+
73
+ @torch.no_grad()
74
+ def track(self, image):
75
+ output_height, output_width = image.shape[0], image.shape[1]
76
+ sample = {'current_img': image}
77
+ sample = self.transform(sample)
78
+ image = sample[0]['current_img'].unsqueeze(0).float().cuda(self.gpu_id)
79
+ self.engine.match_propogate_one_frame(image)
80
+ pred_logit = self.engine.decode_current_logits((output_height, output_width))
81
+
82
+ # pred_prob = torch.softmax(pred_logit, dim=1)
83
+ pred_label = torch.argmax(pred_logit, dim=1,
84
+ keepdim=True).float()
85
+
86
+ return pred_label
87
+
88
+ @torch.no_grad()
89
+ def update_memory(self, pred_label):
90
+ self.engine.update_memory(pred_label)
91
+
92
+ @torch.no_grad()
93
+ def restart(self):
94
+ self.engine.restart_engine()
95
+
96
+ @torch.no_grad()
97
+ def build_tracker_engine(self, name, **kwargs):
98
+ if name == 'aotengine':
99
+ return AOTTrackerInferEngine(**kwargs)
100
+ elif name == 'deaotengine':
101
+ return DeAOTTrackerInferEngine(**kwargs)
102
+ else:
103
+ raise NotImplementedError
104
+
105
+
106
+ class AOTTrackerInferEngine(AOTInferEngine):
107
+ def __init__(self, aot_model, gpu_id=0, long_term_mem_gap=9999, short_term_mem_skip=1, max_aot_obj_num=None):
108
+ super().__init__(aot_model, gpu_id, long_term_mem_gap, short_term_mem_skip, max_aot_obj_num)
109
+ def add_reference_frame_incremental(self, img, mask, obj_nums, frame_step=-1):
110
+ if isinstance(obj_nums, list):
111
+ obj_nums = obj_nums[0]
112
+ self.obj_nums = obj_nums
113
+ aot_num = max(np.ceil(obj_nums / self.max_aot_obj_num), 1)
114
+ while (aot_num > len(self.aot_engines)):
115
+ new_engine = AOTEngine(self.AOT, self.gpu_id,
116
+ self.long_term_mem_gap,
117
+ self.short_term_mem_skip)
118
+ new_engine.eval()
119
+ self.aot_engines.append(new_engine)
120
+
121
+ separated_masks, separated_obj_nums = self.separate_mask(
122
+ mask, obj_nums)
123
+ img_embs = None
124
+ for aot_engine, separated_mask, separated_obj_num in zip(
125
+ self.aot_engines, separated_masks, separated_obj_nums):
126
+ if aot_engine.obj_nums is None or aot_engine.obj_nums[0] < separated_obj_num:
127
+ aot_engine.add_reference_frame(img,
128
+ separated_mask,
129
+ obj_nums=[separated_obj_num],
130
+ frame_step=frame_step,
131
+ img_embs=img_embs)
132
+ else:
133
+ aot_engine.update_short_term_memory(separated_mask)
134
+
135
+ if img_embs is None: # reuse image embeddings
136
+ img_embs = aot_engine.curr_enc_embs
137
+
138
+ self.update_size()
139
+
140
+
141
+
142
+ class DeAOTTrackerInferEngine(DeAOTInferEngine):
143
+ def __init__(self, aot_model, gpu_id=0, long_term_mem_gap=9999, short_term_mem_skip=1, max_aot_obj_num=None):
144
+ super().__init__(aot_model, gpu_id, long_term_mem_gap, short_term_mem_skip, max_aot_obj_num)
145
+ def add_reference_frame_incremental(self, img, mask, obj_nums, frame_step=-1):
146
+ if isinstance(obj_nums, list):
147
+ obj_nums = obj_nums[0]
148
+ self.obj_nums = obj_nums
149
+ aot_num = max(np.ceil(obj_nums / self.max_aot_obj_num), 1)
150
+ while (aot_num > len(self.aot_engines)):
151
+ new_engine = DeAOTEngine(self.AOT, self.gpu_id,
152
+ self.long_term_mem_gap,
153
+ self.short_term_mem_skip)
154
+ new_engine.eval()
155
+ self.aot_engines.append(new_engine)
156
+
157
+ separated_masks, separated_obj_nums = self.separate_mask(
158
+ mask, obj_nums)
159
+ img_embs = None
160
+ for aot_engine, separated_mask, separated_obj_num in zip(
161
+ self.aot_engines, separated_masks, separated_obj_nums):
162
+ if aot_engine.obj_nums is None or aot_engine.obj_nums[0] < separated_obj_num:
163
+ aot_engine.add_reference_frame(img,
164
+ separated_mask,
165
+ obj_nums=[separated_obj_num],
166
+ frame_step=frame_step,
167
+ img_embs=img_embs)
168
+ else:
169
+ aot_engine.update_short_term_memory(separated_mask)
170
+
171
+ if img_embs is None: # reuse image embeddings
172
+ img_embs = aot_engine.curr_enc_embs
173
+
174
+ self.update_size()
175
+
176
+
177
+ def get_aot(args):
178
+ # build vos engine
179
+ engine_config = importlib.import_module('configs.' + 'pre_ytb_dav')
180
+ cfg = engine_config.EngineConfig(args['phase'], args['model'])
181
+ cfg.TEST_CKPT_PATH = args['model_path']
182
+ cfg.TEST_LONG_TERM_MEM_GAP = args['long_term_mem_gap']
183
+ cfg.MAX_LEN_LONG_TERM = args['max_len_long_term']
184
+ # init AOTTracker
185
+ tracker = AOTTracker(cfg, args['gpu_id'])
186
+ return tracker
app.py ADDED
@@ -0,0 +1,782 @@
1
+ from PIL.ImageOps import colorize, scale
2
+ import gradio as gr
3
+ import importlib
4
+ import sys
5
+ import os
6
+
7
+ from matplotlib.pyplot import step
8
+
9
+ from model_args import segtracker_args,sam_args,aot_args
10
+ from SegTracker import SegTracker
11
+
12
+ # sys.path.append('.')
13
+ # sys.path.append('..')
14
+
15
+ import cv2
16
+ from PIL import Image
17
+ from skimage.morphology.binary import binary_dilation
18
+ import argparse
19
+ import torch
20
+ import time
21
+ from seg_track_anything import aot_model2ckpt, tracking_objects_in_video, draw_mask
22
+ import gc
23
+ import numpy as np
24
+ import json
25
+ from tool.transfer_tools import mask2bbox
26
+
27
+ def clean():
28
+ return None, None, None, None, None, None, [[], []]
29
+
30
+ def get_click_prompt(click_stack, point):
31
+
32
+ click_stack[0].append(point["coord"])
33
+ click_stack[1].append(point["mode"])
35
+
36
+ prompt = {
37
+ "points_coord":click_stack[0],
38
+ "points_mode":click_stack[1],
39
+ "multimask":"True",
40
+ }
41
+
42
+ return prompt
43
+
44
+ def get_meta_from_video(input_video):
45
+ if input_video is None:
46
+ return None, None, None, ""
47
+
48
+ print("get meta information of input video")
49
+ cap = cv2.VideoCapture(input_video)
50
+
51
+ _, first_frame = cap.read()
52
+ cap.release()
53
+
54
+ first_frame = cv2.cvtColor(first_frame, cv2.COLOR_BGR2RGB)
55
+
56
+ return first_frame, first_frame, first_frame, ""
57
+
58
+ def get_meta_from_img_seq(input_img_seq):
59
+ if input_img_seq is None:
60
+ return None, None, None, ""
61
+
62
+ print("get meta information of img seq")
63
+ # Create dir
64
+ file_name = input_img_seq.name.split('/')[-1].split('.')[0]
65
+ file_path = f'./assets/{file_name}'
66
+ if os.path.isdir(file_path):
67
+ os.system(f'rm -r {file_path}')
68
+ os.makedirs(file_path)
69
+ # Unzip file
70
+ os.system(f'unzip {input_img_seq.name} -d ./assets ')
71
+
72
+ imgs_path = sorted([os.path.join(file_path, img_name) for img_name in os.listdir(file_path)])
73
+ first_frame = imgs_path[0]
74
+ first_frame = cv2.imread(first_frame)
75
+ first_frame = cv2.cvtColor(first_frame, cv2.COLOR_BGR2RGB)
76
+
77
+ return first_frame, first_frame, first_frame, ""
78
+
79
+ def SegTracker_add_first_frame(Seg_Tracker, origin_frame, predicted_mask):
80
+ with torch.cuda.amp.autocast():
81
+ # Reset the first frame's mask
82
+ frame_idx = 0
83
+ Seg_Tracker.restart_tracker()
84
+ Seg_Tracker.add_reference(origin_frame, predicted_mask, frame_idx)
85
+ Seg_Tracker.first_frame_mask = predicted_mask
86
+
87
+ return Seg_Tracker
88
+
89
+ def init_SegTracker(aot_model, long_term_mem, max_len_long_term, sam_gap, max_obj_num, points_per_side, origin_frame):
90
+
91
+ if origin_frame is None:
92
+ return None, origin_frame, [[], []], ""
93
+
94
+ # reset aot args
95
+ aot_args["model"] = aot_model
96
+ aot_args["model_path"] = aot_model2ckpt[aot_model]
97
+ aot_args["long_term_mem_gap"] = long_term_mem
98
+ aot_args["max_len_long_term"] = max_len_long_term
99
+ # reset sam args
100
+ segtracker_args["sam_gap"] = sam_gap
101
+ segtracker_args["max_obj_num"] = max_obj_num
102
+ sam_args["generator_args"]["points_per_side"] = points_per_side
103
+
104
+ Seg_Tracker = SegTracker(segtracker_args, sam_args, aot_args)
105
+ Seg_Tracker.restart_tracker()
106
+
107
+ return Seg_Tracker, origin_frame, [[], []], ""
108
+
109
+ def init_SegTracker_Stroke(aot_model, long_term_mem, max_len_long_term, sam_gap, max_obj_num, points_per_side, origin_frame):
110
+
111
+ if origin_frame is None:
112
+ return None, origin_frame, [[], []], origin_frame
113
+
114
+ # reset aot args
115
+ aot_args["model"] = aot_model
116
+ aot_args["model_path"] = aot_model2ckpt[aot_model]
117
+ aot_args["long_term_mem_gap"] = long_term_mem
118
+ aot_args["max_len_long_term"] = max_len_long_term
119
+
120
+ # reset sam args
121
+ segtracker_args["sam_gap"] = sam_gap
122
+ segtracker_args["max_obj_num"] = max_obj_num
123
+ sam_args["generator_args"]["points_per_side"] = points_per_side
124
+
125
+ Seg_Tracker = SegTracker(segtracker_args, sam_args, aot_args)
126
+ Seg_Tracker.restart_tracker()
127
+ return Seg_Tracker, origin_frame, [[], []], origin_frame
128
+
129
+ def undo_click_stack_and_refine_seg(Seg_Tracker, origin_frame, click_stack, aot_model, long_term_mem, max_len_long_term, sam_gap, max_obj_num, points_per_side):
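+ # Pop the most recent click from the stack and re-run SAM on the remaining prompts;
+ # if no clicks remain, restore the original frame.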
130
+
131
+ if Seg_Tracker is None:
132
+ return Seg_Tracker, origin_frame, [[], []]
133
+
134
+ print("Undo!")
135
+ if len(click_stack[0]) > 0:
136
+ click_stack[0] = click_stack[0][: -1]
137
+ click_stack[1] = click_stack[1][: -1]
138
+
139
+ if len(click_stack[0]) > 0:
140
+ prompt = {
141
+ "points_coord":click_stack[0],
142
+ "points_mode":click_stack[1],
143
+ "multimask":"True",
144
+ }
145
+
146
+ masked_frame = seg_acc_click(Seg_Tracker, prompt, origin_frame)
147
+ return Seg_Tracker, masked_frame, click_stack
148
+ else:
149
+ return Seg_Tracker, origin_frame, [[], []]
150
+
151
+
152
+ def seg_acc_click(Seg_Tracker, prompt, origin_frame):
153
+ # segment according to the accumulated click prompts
154
+ predicted_mask, masked_frame = Seg_Tracker.seg_acc_click(
155
+ origin_frame=origin_frame,
156
+ coords=np.array(prompt["points_coord"]),
157
+ modes=np.array(prompt["points_mode"]),
158
+ multimask=prompt["multimask"],
159
+ )
160
+
161
+ Seg_Tracker = SegTracker_add_first_frame(Seg_Tracker, origin_frame, predicted_mask)
162
+
163
+ return masked_frame
164
+
165
+ def sam_click(Seg_Tracker, origin_frame, point_mode, click_stack, aot_model, long_term_mem, max_len_long_term, sam_gap, max_obj_num, points_per_side, evt:gr.SelectData):
166
+ """
167
+ Args:
168
+ origin_frame: nd.array
169
+ click_stack: [[coordinate], [point_mode]]
170
+ """
171
+
172
+ print("Click")
173
+
174
+ if point_mode == "Positive":
175
+ point = {"coord": [evt.index[0], evt.index[1]], "mode": 1}
176
+ else:
177
+ # TODO:add everything positive points
178
+ point = {"coord": [evt.index[0], evt.index[1]], "mode": 0}
179
+
180
+ if Seg_Tracker is None:
181
+ Seg_Tracker, _, _, _ = init_SegTracker(aot_model, long_term_mem, max_len_long_term, sam_gap, max_obj_num, points_per_side, origin_frame)
182
+
183
+ # get click prompts for sam to predict mask
184
+ click_prompt = get_click_prompt(click_stack, point)
185
+
186
+ # Refine acc to prompt
187
+ masked_frame = seg_acc_click(Seg_Tracker, click_prompt, origin_frame)
188
+
189
+ return Seg_Tracker, masked_frame, click_stack
190
+
191
+ def sam_stroke(Seg_Tracker, origin_frame, drawing_board, aot_model, long_term_mem, max_len_long_term, sam_gap, max_obj_num, points_per_side):
192
+
193
+ if Seg_Tracker is None:
194
+ Seg_Tracker, _ , _, _ = init_SegTracker(aot_model, long_term_mem, max_len_long_term, sam_gap, max_obj_num, points_per_side, origin_frame)
195
+
196
+ print("Stroke")
197
+ mask = drawing_board["mask"]
198
+ bbox = mask2bbox(mask[:, :, 0]) # bbox: [[x0, y0], [x1, y1]]
199
+ predicted_mask, masked_frame = Seg_Tracker.seg_acc_bbox(origin_frame, bbox)
200
+
201
+ Seg_Tracker = SegTracker_add_first_frame(Seg_Tracker, origin_frame, predicted_mask)
202
+
203
+ return Seg_Tracker, masked_frame, origin_frame
204
+
205
+ def gd_detect(Seg_Tracker, origin_frame, grounding_caption, box_threshold, text_threshold, aot_model, long_term_mem, max_len_long_term, sam_gap, max_obj_num, points_per_side):
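+ # Text-prompted initialization: Grounding-DINO detects boxes for the caption and SAM
+ # segments them; the resulting mask is registered as the tracker's first-frame reference.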
206
+ if Seg_Tracker is None:
207
+ Seg_Tracker, _ , _, _ = init_SegTracker(aot_model, long_term_mem, max_len_long_term, sam_gap, max_obj_num, points_per_side, origin_frame)
208
+
209
+ print("Detect")
210
+ predicted_mask, annotated_frame = Seg_Tracker.detect_and_seg(origin_frame, grounding_caption, box_threshold, text_threshold)
211
+
212
+ Seg_Tracker = SegTracker_add_first_frame(Seg_Tracker, origin_frame, predicted_mask)
213
+
214
+
215
+ masked_frame = draw_mask(annotated_frame, predicted_mask)
216
+
217
+ return Seg_Tracker, masked_frame, origin_frame
218
+
219
+ def segment_everything(Seg_Tracker, aot_model, long_term_mem, max_len_long_term, origin_frame, sam_gap, max_obj_num, points_per_side):
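+ # Run SAM's automatic ("everything") mask generator on the first frame and register
+ # the result as the tracker's reference mask.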
220
+
221
+ if Seg_Tracker is None:
222
+ Seg_Tracker, _ , _, _ = init_SegTracker(aot_model, long_term_mem, max_len_long_term, sam_gap, max_obj_num, points_per_side, origin_frame)
223
+
224
+ print("Everything")
225
+
226
+ frame_idx = 0
227
+
228
+ with torch.cuda.amp.autocast():
229
+ pred_mask = Seg_Tracker.seg(origin_frame)
230
+ torch.cuda.empty_cache()
231
+ gc.collect()
232
+ Seg_Tracker.add_reference(origin_frame, pred_mask, frame_idx)
233
+ Seg_Tracker.first_frame_mask = pred_mask
234
+
235
+ masked_frame = draw_mask(origin_frame.copy(), pred_mask)
236
+
237
+ return Seg_Tracker, masked_frame
238
+
239
+ def add_new_object(Seg_Tracker):
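+ # Keep the current first-frame mask as the merged base and bump curr_idx so the next
+ # interactive prompt creates a new object ID instead of refining the previous one.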
240
+
241
+ prev_mask = Seg_Tracker.first_frame_mask
242
+ Seg_Tracker.update_origin_merged_mask(prev_mask)
243
+ Seg_Tracker.curr_idx += 1
244
+
245
+ print("Ready to add new object!")
246
+
247
+ return Seg_Tracker, [[], []]
248
+
249
+ def tracking_objects(Seg_Tracker, input_video, input_img_seq, fps):
250
+ print("Start tracking !")
251
+ return tracking_objects_in_video(Seg_Tracker, input_video, input_img_seq, fps)
252
+
253
+ def seg_track_app():
254
+
255
+ ##########################################################
256
+ ###################### Front-end ########################
257
+ ##########################################################
258
+ app = gr.Blocks()
259
+
260
+ with app:
261
+ gr.Markdown(
262
+ '''
263
+ <div style="text-align:center;">
264
+ <span style="font-size:3em; font-weight:bold;">Segment and Track Anything (SAM-Track)</span>
265
+ </div>
266
+ '''
267
+ )
268
+
269
+ click_stack = gr.State([[],[]]) # Stores click status: [[coordinates], [modes]]
270
+ origin_frame = gr.State(None)
271
+ Seg_Tracker = gr.State(None)
272
+
273
+ aot_model = gr.State(None)
274
+ sam_gap = gr.State(None)
275
+ points_per_side = gr.State(None)
276
+ max_obj_num = gr.State(None)
277
+
278
+ with gr.Row():
279
+ # video input
280
+ with gr.Column(scale=0.5):
281
+
282
+ tab_video_input = gr.Tab(label="Video type input")
283
+ with tab_video_input:
284
+ input_video = gr.Video(label='Input video').style(height=550)
285
+
286
+ tab_img_seq_input = gr.Tab(label="Image-Seq type input")
287
+ with tab_img_seq_input:
288
+ with gr.Row():
289
+ input_img_seq = gr.File(label='Input Image-Seq').style(height=550)
290
+ with gr.Column(scale=0.25):
291
+ extract_button = gr.Button(value="extract")
292
+ fps = gr.Slider(label='fps', minimum=5, maximum=50, value=8, step=1)
293
+
294
+ input_first_frame = gr.Image(label='Segment result of first frame',interactive=True).style(height=550)
295
+
296
+
297
+ tab_everything = gr.Tab(label="Everything")
298
+ with tab_everything:
299
+ with gr.Row():
300
+ seg_every_first_frame = gr.Button(value="Segment everything for first frame", interactive=True)
301
+ point_mode = gr.Radio(
302
+ choices=["Positive"],
303
+ value="Positive",
304
+ label="Point Prompt",
305
+ interactive=True)
306
+
307
+ every_undo_but = gr.Button(
308
+ value="Undo",
309
+ interactive=True
310
+ )
311
+
312
+ # every_reset_but = gr.Button(
313
+ # value="Reset",
314
+ # interactive=True
315
+ # )
316
+
317
+ tab_click = gr.Tab(label="Click")
318
+ with tab_click:
319
+ with gr.Row():
320
+ point_mode = gr.Radio(
321
+ choices=["Positive", "Negative"],
322
+ value="Positive",
323
+ label="Point Prompt",
324
+ interactive=True)
325
+
326
+ # args for modify and tracking
327
+ click_undo_but = gr.Button(
328
+ value="Undo",
329
+ interactive=True
330
+ )
331
+ # click_reset_but = gr.Button(
332
+ # value="Reset",
333
+ # interactive=True
334
+ # )
335
+
336
+ tab_stroke = gr.Tab(label="Stroke")
337
+ with tab_stroke:
338
+ drawing_board = gr.Image(label='Drawing Board', tool="sketch", brush_radius=10, interactive=True)
339
+ with gr.Row():
340
+ seg_acc_stroke = gr.Button(value="Segment", interactive=True)
341
+ # stroke_reset_but = gr.Button(
342
+ # value="Reset",
343
+ # interactive=True
344
+ # )
345
+
346
+ tab_text = gr.Tab(label="Text")
347
+ with tab_text:
348
+ grounding_caption = gr.Textbox(label="Detection Prompt")
349
+ detect_button = gr.Button(value="Detect")
350
+ with gr.Accordion("Advanced options", open=False):
351
+ with gr.Row():
352
+ with gr.Column(scale=0.5):
353
+ box_threshold = gr.Slider(
354
+ label="Box Threshold", minimum=0.0, maximum=1.0, value=0.25, step=0.001
355
+ )
356
+ with gr.Column(scale=0.5):
357
+ text_threshold = gr.Slider(
358
+ label="Text Threshold", minimum=0.0, maximum=1.0, value=0.25, step=0.001
359
+ )
360
+
361
+ with gr.Row():
362
+ with gr.Column(scale=0.5):
363
+ with gr.Tab(label="SegTracker Args"):
364
+ # args for running segment-everything during video tracking
365
+ points_per_side = gr.Slider(
366
+ label = "points_per_side",
367
+ minimum= 1,
368
+ step = 1,
369
+ maximum=100,
370
+ value=16,
371
+ interactive=True
372
+ )
373
+
374
+ sam_gap = gr.Slider(
375
+ label='sam_gap',
376
+ minimum = 1,
377
+ step=1,
378
+ maximum = 9999,
379
+ value=100,
380
+ interactive=True,
381
+ )
382
+
383
+ max_obj_num = gr.Slider(
384
+ label='max_obj_num',
385
+ minimum = 50,
386
+ step=1,
387
+ maximum = 300,
388
+ value=255,
389
+ interactive=True
390
+ )
391
+ with gr.Accordion("aot advanced options", open=False):
392
+ aot_model = gr.Dropdown(
393
+ label="aot_model",
394
+ choices = [
395
+ "deaotb",
396
+ "deaotl",
397
+ "r50_deaotl"
398
+ ],
399
+ value = "r50_deaotl",
400
+ interactive=True,
401
+ )
402
+ long_term_mem = gr.Slider(label="long term memory gap", minimum=1, maximum=9999, value=9999, step=1)
403
+ max_len_long_term = gr.Slider(label="max len of long term memory", minimum=1, maximum=9999, value=9999, step=1)
404
+
405
+ with gr.Column():
406
+ new_object_button = gr.Button(
407
+ value="Add new object",
408
+ interactive=True
409
+ )
410
+ reset_button = gr.Button(
411
+ value="Reset",
412
+ interactive=True,
413
+ )
414
+ track_for_video = gr.Button(
415
+ value="Start Tracking",
416
+ interactive=True,
417
+ )
418
+
419
+ with gr.Column(scale=0.5):
420
+ output_video = gr.Video(label='Output video').style(height=550)
421
+ output_mask = gr.File(label="Predicted masks")
422
+
423
+ ##########################################################
424
+ ###################### back-end #########################
425
+ ##########################################################
426
+
427
+ # listen to the input_video to get the first frame of video
428
+ input_video.change(
429
+ fn=get_meta_from_video,
430
+ inputs=[
431
+ input_video
432
+ ],
433
+ outputs=[
434
+ input_first_frame, origin_frame, drawing_board, grounding_caption
435
+ ]
436
+ )
437
+
438
+ # listen to the input_img_seq to get the first frame of video
439
+ input_img_seq.change(
440
+ fn=get_meta_from_img_seq,
441
+ inputs=[
442
+ input_img_seq
443
+ ],
444
+ outputs=[
445
+ input_first_frame, origin_frame, drawing_board, grounding_caption
446
+ ]
447
+ )
448
+
449
+ #-------------- Input component -------------
450
+ tab_video_input.select(
451
+ fn = clean,
452
+ inputs=[],
453
+ outputs=[
454
+ input_video,
455
+ input_img_seq,
456
+ Seg_Tracker,
457
+ input_first_frame,
458
+ origin_frame,
459
+ drawing_board,
460
+ click_stack,
461
+ ]
462
+ )
463
+
464
+ tab_img_seq_input.select(
465
+ fn = clean,
466
+ inputs=[],
467
+ outputs=[
468
+ input_video,
469
+ input_img_seq,
470
+ Seg_Tracker,
471
+ input_first_frame,
472
+ origin_frame,
473
+ drawing_board,
474
+ click_stack,
475
+ ]
476
+ )
477
+
478
+ extract_button.click(
479
+ fn=get_meta_from_img_seq,
480
+ inputs=[
481
+ input_img_seq
482
+ ],
483
+ outputs=[
484
+ input_first_frame, origin_frame, drawing_board, grounding_caption
485
+ ]
486
+ )
487
+
488
+
489
+ # ------------------- Interactive component -----------------
490
+
491
+ # listen to the tab to init SegTracker
492
+ tab_everything.select(
493
+ fn=init_SegTracker,
494
+ inputs=[
495
+ aot_model,
496
+ long_term_mem,
497
+ max_len_long_term,
498
+ sam_gap,
499
+ max_obj_num,
500
+ points_per_side,
501
+ origin_frame
502
+ ],
503
+ outputs=[
504
+ Seg_Tracker, input_first_frame, click_stack, grounding_caption
505
+ ],
506
+ queue=False,
507
+
508
+ )
509
+
510
+ tab_click.select(
511
+ fn=init_SegTracker,
512
+ inputs=[
513
+ aot_model,
514
+ long_term_mem,
515
+ max_len_long_term,
516
+ sam_gap,
517
+ max_obj_num,
518
+ points_per_side,
519
+ origin_frame
520
+ ],
521
+ outputs=[
522
+ Seg_Tracker, input_first_frame, click_stack, grounding_caption
523
+ ],
524
+ queue=False,
525
+ )
526
+
527
+ tab_stroke.select(
528
+ fn=init_SegTracker_Stroke,
529
+ inputs=[
530
+ aot_model,
531
+ long_term_mem,
532
+ max_len_long_term,
533
+ sam_gap,
534
+ max_obj_num,
535
+ points_per_side,
536
+ origin_frame,
537
+ ],
538
+ outputs=[
539
+ Seg_Tracker, input_first_frame, click_stack, drawing_board
540
+ ],
541
+ queue=False,
542
+ )
543
+
544
+ tab_text.select(
545
+ fn=init_SegTracker,
546
+ inputs=[
547
+ aot_model,
548
+ long_term_mem,
549
+ max_len_long_term,
550
+ sam_gap,
551
+ max_obj_num,
552
+ points_per_side,
553
+ origin_frame
554
+ ],
555
+ outputs=[
556
+ Seg_Tracker, input_first_frame, click_stack, grounding_caption
557
+ ],
558
+ queue=False,
559
+ )
560
+
561
+ # Use SAM to segment everything for the first frame of video
562
+ seg_every_first_frame.click(
563
+ fn=segment_everything,
564
+ inputs=[
565
+ Seg_Tracker,
566
+ aot_model,
567
+ long_term_mem,
568
+ max_len_long_term,
569
+ origin_frame,
570
+ sam_gap,
571
+ max_obj_num,
572
+ points_per_side,
573
+
574
+ ],
575
+ outputs=[
576
+ Seg_Tracker,
577
+ input_first_frame,
578
+ ],
579
+ )
580
+
581
+ # Interactively modify the mask acc click
582
+ input_first_frame.select(
583
+ fn=sam_click,
584
+ inputs=[
585
+ Seg_Tracker, origin_frame, point_mode, click_stack,
586
+ aot_model,
587
+ long_term_mem,
588
+ max_len_long_term,
589
+ sam_gap,
590
+ max_obj_num,
591
+ points_per_side,
592
+ ],
593
+ outputs=[
594
+ Seg_Tracker, input_first_frame, click_stack
595
+ ]
596
+ )
597
+
598
+ # Interactively segment acc stroke
599
+ seg_acc_stroke.click(
600
+ fn=sam_stroke,
601
+ inputs=[
602
+ Seg_Tracker, origin_frame, drawing_board,
603
+ aot_model,
604
+ long_term_mem,
605
+ max_len_long_term,
606
+ sam_gap,
607
+ max_obj_num,
608
+ points_per_side,
609
+ ],
610
+ outputs=[
611
+ Seg_Tracker, input_first_frame, drawing_board
612
+ ]
613
+ )
614
+
615
+ # Use grounding-dino to detect object
616
+ detect_button.click(
617
+ fn=gd_detect,
618
+ inputs=[
619
+ Seg_Tracker, origin_frame, grounding_caption, box_threshold, text_threshold,
620
+ aot_model, long_term_mem, max_len_long_term, sam_gap, max_obj_num, points_per_side
621
+ ],
622
+ outputs=[
623
+ Seg_Tracker, input_first_frame
624
+ ]
625
+ )
626
+
627
+ # Add new object
628
+ new_object_button.click(
629
+ fn=add_new_object,
630
+ inputs=
631
+ [
632
+ Seg_Tracker
633
+ ],
634
+ outputs=
635
+ [
636
+ Seg_Tracker, click_stack
637
+ ]
638
+ )
639
+
640
+ # Track object in video
641
+ track_for_video.click(
642
+ fn=tracking_objects,
643
+ inputs=[
644
+ Seg_Tracker,
645
+ input_video,
646
+ input_img_seq,
647
+ fps,
648
+ ],
649
+ outputs=[
650
+ output_video, output_mask
651
+ ]
652
+ )
653
+
654
+ # ----------------- Reset and Undo ---------------------------
655
+
656
+ # Reset
657
+ reset_button.click(
658
+ fn=init_SegTracker,
659
+ inputs=[
660
+ aot_model,
661
+ long_term_mem,
662
+ max_len_long_term,
663
+ sam_gap,
664
+ max_obj_num,
665
+ points_per_side,
666
+ origin_frame
667
+ ],
668
+ outputs=[
669
+ Seg_Tracker, input_first_frame, click_stack, grounding_caption
670
+ ],
671
+ queue=False,
672
+ show_progress=False
673
+ )
674
+
675
+ # every_reset_but.click(
676
+ # fn=init_SegTracker,
677
+ # inputs=[
678
+ # aot_model,
679
+ # sam_gap,
680
+ # max_obj_num,
681
+ # points_per_side,
682
+ # origin_frame
683
+ # ],
684
+ # outputs=[
685
+ # Seg_Tracker, input_first_frame, click_stack, grounding_caption
686
+ # ],
687
+ # queue=False,
688
+ # show_progress=False
689
+ # )
690
+
691
+ # click_reset_but.click(
692
+ # fn=init_SegTracker,
693
+ # inputs=[
694
+ # aot_model,
695
+ # sam_gap,
696
+ # max_obj_num,
697
+ # points_per_side,
698
+ # origin_frame
699
+ # ],
700
+ # outputs=[
701
+ # Seg_Tracker, input_first_frame, click_stack, grounding_caption
702
+ # ],
703
+ # queue=False,
704
+ # show_progress=False
705
+ # )
706
+
707
+ # stroke_reset_but.click(
708
+ # fn=init_SegTracker_Stroke,
709
+ # inputs=[
710
+ # aot_model,
711
+ # sam_gap,
712
+ # max_obj_num,
713
+ # points_per_side,
714
+ # origin_frame,
715
+ # ],
716
+ # outputs=[
717
+ # Seg_Tracker, input_first_frame, click_stack, drawing_board
718
+ # ],
719
+ # queue=False,
720
+ # show_progress=False
721
+ # )
722
+
723
+ # Undo click
724
+ click_undo_but.click(
725
+ fn = undo_click_stack_and_refine_seg,
726
+ inputs=[
727
+ Seg_Tracker, origin_frame, click_stack,
728
+ aot_model,
729
+ long_term_mem,
730
+ max_len_long_term,
731
+ sam_gap,
732
+ max_obj_num,
733
+ points_per_side,
734
+ ],
735
+ outputs=[
736
+ Seg_Tracker, input_first_frame, click_stack
737
+ ]
738
+ )
739
+
740
+ every_undo_but.click(
741
+ fn = undo_click_stack_and_refine_seg,
742
+ inputs=[
743
+ Seg_Tracker, origin_frame, click_stack,
744
+ aot_model,
745
+ long_term_mem,
746
+ max_len_long_term,
747
+ sam_gap,
748
+ max_obj_num,
749
+ points_per_side,
750
+ ],
751
+ outputs=[
752
+ Seg_Tracker, input_first_frame, click_stack
753
+ ]
754
+ )
755
+
756
+ with gr.Tab(label='Video example'):
757
+ gr.Examples(
758
+ examples=[
759
+ # os.path.join(os.path.dirname(__file__), "assets", "840_iSXIa0hE8Ek.mp4"),
760
+ os.path.join(os.path.dirname(__file__), "assets", "blackswan.mp4"),
761
+ # os.path.join(os.path.dirname(__file__), "assets", "bear.mp4"),
762
+ # os.path.join(os.path.dirname(__file__), "assets", "camel.mp4"),
763
+ # os.path.join(os.path.dirname(__file__), "assets", "skate-park.mp4"),
764
+ # os.path.join(os.path.dirname(__file__), "assets", "swing.mp4"),
765
+ ],
766
+ inputs=[input_video],
767
+ )
768
+
769
+ with gr.Tab(label='Image-seq example'):
770
+ gr.Examples(
771
+ examples=[
772
+ os.path.join(os.path.dirname(__file__), "assets", "840_iSXIa0hE8Ek.zip"),
773
+ ],
774
+ inputs=[input_img_seq],
775
+ )
776
+
777
+ app.queue(concurrency_count=1)
778
+ app.launch(debug=True, enable_queue=True, share=True)
779
+
780
+
781
+ if __name__ == "__main__":
782
+ seg_track_app()
demo.ipynb ADDED
@@ -0,0 +1,334 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 1,
6
+ "metadata": {},
7
+ "outputs": [],
8
+ "source": [
9
+ "import os\n",
10
+ "import cv2\n",
11
+ "from SegTracker import SegTracker\n",
12
+ "from model_args import aot_args,sam_args,segtracker_args\n",
13
+ "from PIL import Image\n",
14
+ "from aot_tracker import _palette\n",
15
+ "import numpy as np\n",
16
+ "import torch\n",
17
+ "import imageio\n",
18
+ "import matplotlib.pyplot as plt\n",
19
+ "from scipy.ndimage import binary_dilation\n",
20
+ "import gc\n",
21
+ "def save_prediction(pred_mask,output_dir,file_name):\n",
22
+ " save_mask = Image.fromarray(pred_mask.astype(np.uint8))\n",
23
+ " save_mask = save_mask.convert(mode='P')\n",
24
+ " save_mask.putpalette(_palette)\n",
25
+ " save_mask.save(os.path.join(output_dir,file_name))\n",
26
+ "def colorize_mask(pred_mask):\n",
27
+ " save_mask = Image.fromarray(pred_mask.astype(np.uint8))\n",
28
+ " save_mask = save_mask.convert(mode='P')\n",
29
+ " save_mask.putpalette(_palette)\n",
30
+ " save_mask = save_mask.convert(mode='RGB')\n",
31
+ " return np.array(save_mask)\n",
32
+ "def draw_mask(img, mask, alpha=0.5, id_countour=False):\n",
33
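+ " # overlay mask colors on the image; contours (binary_dilation XOR mask) are set to black\n",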
+ " img_mask = np.zeros_like(img)\n",
34
+ " img_mask = img\n",
35
+ " if id_countour:\n",
36
+ " # very slow ~ 1s per image\n",
37
+ " obj_ids = np.unique(mask)\n",
38
+ " obj_ids = obj_ids[obj_ids!=0]\n",
39
+ "\n",
40
+ " for id in obj_ids:\n",
41
+ " # Overlay color on binary mask\n",
42
+ " if id <= 255:\n",
43
+ " color = _palette[id*3:id*3+3]\n",
44
+ " else:\n",
45
+ " color = [0,0,0]\n",
46
+ " foreground = img * (1-alpha) + np.ones_like(img) * alpha * np.array(color)\n",
47
+ " binary_mask = (mask == id)\n",
48
+ "\n",
49
+ " # Compose image\n",
50
+ " img_mask[binary_mask] = foreground[binary_mask]\n",
51
+ "\n",
52
+ " countours = binary_dilation(binary_mask,iterations=1) ^ binary_mask\n",
53
+ " img_mask[countours, :] = 0\n",
54
+ " else:\n",
55
+ " binary_mask = (mask!=0)\n",
56
+ " countours = binary_dilation(binary_mask,iterations=1) ^ binary_mask\n",
57
+ " foreground = img*(1-alpha)+colorize_mask(mask)*alpha\n",
58
+ " img_mask[binary_mask] = foreground[binary_mask]\n",
59
+ " img_mask[countours,:] = 0\n",
60
+ " \n",
61
+ " return img_mask.astype(img.dtype)"
62
+ ]
63
+ },
64
+ {
65
+ "attachments": {},
66
+ "cell_type": "markdown",
67
+ "metadata": {},
68
+ "source": [
69
+ "### Set parameters for input and output"
70
+ ]
71
+ },
72
+ {
73
+ "cell_type": "code",
74
+ "execution_count": 2,
75
+ "metadata": {},
76
+ "outputs": [],
77
+ "source": [
78
+ "video_name = 'cell'\n",
79
+ "io_args = {\n",
80
+ " 'input_video': f'./assets/{video_name}.mp4',\n",
81
+ " 'output_mask_dir': f'./assets/{video_name}_masks', # save pred masks\n",
82
+ " 'output_video': f'./assets/{video_name}_seg.mp4', # mask+frame visualization; mp4 or avi, else the same codec as the input video\n",
83
+ " 'output_gif': f'./assets/{video_name}_seg.gif', # mask visualization\n",
84
+ "}"
85
+ ]
86
+ },
87
+ {
88
+ "attachments": {},
89
+ "cell_type": "markdown",
90
+ "metadata": {},
91
+ "source": [
92
+ "### Tuning SAM on the First Frame for Good Initialization"
93
+ ]
94
+ },
95
+ {
96
+ "cell_type": "code",
97
+ "execution_count": null,
98
+ "metadata": {},
99
+ "outputs": [],
100
+ "source": [
101
+ "# choose good parameters in sam_args based on the first frame segmentation result\n",
102
+ "# other arguments can be modified in model_args.py\n",
103
+ "# note the object number limit is 255 by default, which requires < 10GB GPU memory with amp\n",
104
+ "sam_args['generator_args'] = {\n",
105
+ " 'points_per_side': 30,\n",
106
+ " 'pred_iou_thresh': 0.8,\n",
107
+ " 'stability_score_thresh': 0.9,\n",
108
+ " 'crop_n_layers': 1,\n",
109
+ " 'crop_n_points_downscale_factor': 2,\n",
110
+ " 'min_mask_region_area': 200,\n",
111
+ " }\n",
112
+ "cap = cv2.VideoCapture(io_args['input_video'])\n",
113
+ "frame_idx = 0\n",
114
+ "segtracker = SegTracker(segtracker_args,sam_args,aot_args)\n",
115
+ "segtracker.restart_tracker()\n",
116
+ "with torch.cuda.amp.autocast():\n",
117
+ " while cap.isOpened():\n",
118
+ " ret, frame = cap.read()\n",
119
+ " frame = cv2.cvtColor(frame,cv2.COLOR_BGR2RGB)\n",
120
+ " pred_mask = segtracker.seg(frame)\n",
121
+ " torch.cuda.empty_cache()\n",
122
+ " obj_ids = np.unique(pred_mask)\n",
123
+ " obj_ids = obj_ids[obj_ids!=0]\n",
124
+ " print(\"processed frame {}, obj_num {}\".format(frame_idx,len(obj_ids)),end='\\n')\n",
125
+ " break\n",
126
+ " cap.release()\n",
127
+ " init_res = draw_mask(frame,pred_mask,id_countour=False)\n",
128
+ " plt.figure(figsize=(10,10))\n",
129
+ " plt.axis('off')\n",
130
+ " plt.imshow(init_res)\n",
131
+ " plt.show()\n",
132
+ " plt.figure(figsize=(10,10))\n",
133
+ " plt.axis('off')\n",
134
+ " plt.imshow(colorize_mask(pred_mask))\n",
135
+ " plt.show()\n",
136
+ "\n",
137
+ " del segtracker\n",
138
+ " torch.cuda.empty_cache()\n",
139
+ " gc.collect()"
140
+ ]
141
+ },
142
+ {
143
+ "attachments": {},
144
+ "cell_type": "markdown",
145
+ "metadata": {},
146
+ "source": [
147
+ "### Generate Results for the Whole Video"
148
+ ]
149
+ },
150
+ {
151
+ "cell_type": "code",
152
+ "execution_count": null,
153
+ "metadata": {},
154
+ "outputs": [],
155
+ "source": [
156
+ "# For every sam_gap frames, we use SAM to find new objects and add them for tracking\n",
157
+ "# larger sam_gap is faster but may not spot new objects in time\n",
158
+ "segtracker_args = {\n",
159
+ " 'sam_gap': 5, # the interval to run sam to segment new objects\n",
160
+ " 'min_area': 200, # minimal mask area to add a new mask as a new object\n",
161
+ " 'max_obj_num': 255, # maximal object number to track in a video\n",
162
+ " 'min_new_obj_iou': 0.8, # a new object is added only if > 80% of its area lies in the untracked background\n",
163
+ "}\n",
164
+ "\n",
165
+ "# source video to segment\n",
166
+ "cap = cv2.VideoCapture(io_args['input_video'])\n",
167
+ "fps = cap.get(cv2.CAP_PROP_FPS)\n",
168
+ "# output masks\n",
169
+ "output_dir = io_args['output_mask_dir']\n",
170
+ "if not os.path.exists(output_dir):\n",
171
+ " os.makedirs(output_dir)\n",
172
+ "pred_list = []\n",
173
+ "masked_pred_list = []\n",
174
+ "\n",
175
+ "torch.cuda.empty_cache()\n",
176
+ "gc.collect()\n",
177
+ "sam_gap = segtracker_args['sam_gap']\n",
178
+ "frame_idx = 0\n",
179
+ "segtracker = SegTracker(segtracker_args,sam_args,aot_args)\n",
180
+ "segtracker.restart_tracker()\n",
181
+ "\n",
182
+ "with torch.cuda.amp.autocast():\n",
183
+ " while cap.isOpened():\n",
184
+ " ret, frame = cap.read()\n",
185
+ " if not ret:\n",
186
+ " break\n",
187
+ " frame = cv2.cvtColor(frame,cv2.COLOR_BGR2RGB)\n",
188
+ " if frame_idx == 0:\n",
189
+ " pred_mask = segtracker.seg(frame)\n",
190
+ " torch.cuda.empty_cache()\n",
191
+ " gc.collect()\n",
192
+ " segtracker.add_reference(frame, pred_mask)\n",
193
+ " elif (frame_idx % sam_gap) == 0:\n",
194
+ " seg_mask = segtracker.seg(frame)\n",
195
+ " torch.cuda.empty_cache()\n",
196
+ " gc.collect()\n",
197
+ " track_mask = segtracker.track(frame)\n",
198
+ " # find new objects, and update tracker with new objects\n",
199
+ " new_obj_mask = segtracker.find_new_objs(track_mask,seg_mask)\n",
200
+ " save_prediction(new_obj_mask,output_dir,str(frame_idx)+'_new.png')\n",
201
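+ " # merge tracked objects with the newly found ones before refreshing the reference\n",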
+ " pred_mask = track_mask + new_obj_mask\n",
202
+ " # segtracker.restart_tracker()\n",
203
+ " segtracker.add_reference(frame, pred_mask)\n",
204
+ " else:\n",
205
+ " pred_mask = segtracker.track(frame,update_memory=True)\n",
206
+ " torch.cuda.empty_cache()\n",
207
+ " gc.collect()\n",
208
+ " save_prediction(pred_mask,output_dir,str(frame_idx)+'.png')\n",
209
+ " # masked_frame = draw_mask(frame,pred_mask)\n",
210
+ " # masked_pred_list.append(masked_frame)\n",
211
+ " # plt.imshow(masked_frame)\n",
212
+ " # plt.show() \n",
213
+ " \n",
214
+ " pred_list.append(pred_mask)\n",
215
+ " \n",
216
+ " \n",
217
+ " print(\"processed frame {}, obj_num {}\".format(frame_idx,segtracker.get_obj_num()),end='\\r')\n",
218
+ " frame_idx += 1\n",
219
+ " cap.release()\n",
220
+ " print('\\nfinished')"
221
+ ]
222
+ },
223
+ {
224
+ "attachments": {},
225
+ "cell_type": "markdown",
226
+ "metadata": {},
227
+ "source": [
228
+ "### Save results for visualization"
229
+ ]
230
+ },
231
+ {
232
+ "cell_type": "code",
233
+ "execution_count": null,
234
+ "metadata": {},
235
+ "outputs": [],
236
+ "source": [
237
+ "# draw pred mask on frame and save as a video\n",
238
+ "cap = cv2.VideoCapture(io_args['input_video'])\n",
239
+ "fps = cap.get(cv2.CAP_PROP_FPS)\n",
240
+ "width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))\n",
241
+ "height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))\n",
242
+ "num_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))\n",
243
+ "\n",
244
+ "if io_args['input_video'][-3:]=='mp4':\n",
245
+ " fourcc = cv2.VideoWriter_fourcc(*\"mp4v\")\n",
246
+ "elif io_args['input_video'][-3:] == 'avi':\n",
247
+ " fourcc = cv2.VideoWriter_fourcc(*\"MJPG\")\n",
248
+ " # fourcc = cv2.VideoWriter_fourcc(*\"XVID\")\n",
249
+ "else:\n",
250
+ " fourcc = int(cap.get(cv2.CAP_PROP_FOURCC))\n",
251
+ "out = cv2.VideoWriter(io_args['output_video'], fourcc, fps, (width, height))\n",
252
+ "\n",
253
+ "frame_idx = 0\n",
254
+ "while cap.isOpened():\n",
255
+ " ret, frame = cap.read()\n",
256
+ " if not ret:\n",
257
+ " break\n",
258
+ " frame = cv2.cvtColor(frame,cv2.COLOR_BGR2RGB)\n",
259
+ " pred_mask = pred_list[frame_idx]\n",
260
+ " masked_frame = draw_mask(frame,pred_mask)\n",
261
+ " # masked_frame = masked_pred_list[frame_idx]\n",
262
+ " masked_frame = cv2.cvtColor(masked_frame,cv2.COLOR_RGB2BGR)\n",
263
+ " out.write(masked_frame)\n",
264
+ " print('frame {} written'.format(frame_idx),end='\\r')\n",
265
+ " frame_idx += 1\n",
266
+ "out.release()\n",
267
+ "cap.release()\n",
268
+ "print(\"\\n{} saved\".format(io_args['output_video']))\n",
269
+ "print('\\nfinished')"
270
+ ]
271
+ },
272
+ {
273
+ "cell_type": "code",
274
+ "execution_count": null,
275
+ "metadata": {},
276
+ "outputs": [],
277
+ "source": [
278
+ "# save predicted masks as a gif\n",
279
+ "imageio.mimsave(io_args['output_gif'],pred_list,fps=fps)\n",
280
+ "print(\"{} saved\".format(io_args['output_gif']))"
281
+ ]
282
+ },
283
+ {
284
+ "cell_type": "code",
285
+ "execution_count": 6,
286
+ "metadata": {},
287
+ "outputs": [
288
+ {
289
+ "data": {
290
+ "text/plain": [
291
+ "301"
292
+ ]
293
+ },
294
+ "execution_count": 6,
295
+ "metadata": {},
296
+ "output_type": "execute_result"
297
+ }
298
+ ],
299
+ "source": [
300
+ "# manually release memory (after cuda out of memory)\n",
301
+ "del segtracker\n",
302
+ "torch.cuda.empty_cache()\n",
303
+ "gc.collect()"
304
+ ]
305
+ }
306
+ ],
307
+ "metadata": {
308
+ "kernelspec": {
309
+ "display_name": "Python 3.8.5 64-bit ('ldm': conda)",
310
+ "language": "python",
311
+ "name": "python3"
312
+ },
313
+ "language_info": {
314
+ "codemirror_mode": {
315
+ "name": "ipython",
316
+ "version": 3
317
+ },
318
+ "file_extension": ".py",
319
+ "mimetype": "text/x-python",
320
+ "name": "python",
321
+ "nbconvert_exporter": "python",
322
+ "pygments_lexer": "ipython3",
323
+ "version": "3.8.5"
324
+ },
325
+ "orig_nbformat": 4,
326
+ "vscode": {
327
+ "interpreter": {
328
+ "hash": "536611da043600e50719c9460971b5220bad26cd4a87e5994bfd4c9e9e5e7fb0"
329
+ }
330
+ }
331
+ },
332
+ "nbformat": 4,
333
+ "nbformat_minor": 2
334
+ }
demo_instseg.ipynb ADDED
@@ -0,0 +1,353 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 1,
6
+ "metadata": {},
7
+ "outputs": [],
8
+ "source": [
9
+ "import os\n",
10
+ "import cv2\n",
11
+ "from SegTracker import SegTracker\n",
12
+ "from model_args import aot_args,sam_args,segtracker_args\n",
13
+ "from PIL import Image\n",
14
+ "from aot_tracker import _palette\n",
15
+ "import numpy as np\n",
16
+ "import torch\n",
17
+ "import imageio\n",
18
+ "import matplotlib.pyplot as plt\n",
19
+ "from scipy.ndimage import binary_dilation\n",
20
+ "import gc\n",
21
+ "def save_prediction(pred_mask,output_dir,file_name):\n",
22
+ " save_mask = Image.fromarray(pred_mask.astype(np.uint8))\n",
23
+ " save_mask = save_mask.convert(mode='P')\n",
24
+ " save_mask.putpalette(_palette)\n",
25
+ " save_mask.save(os.path.join(output_dir,file_name))\n",
26
+ "def colorize_mask(pred_mask):\n",
27
+ " save_mask = Image.fromarray(pred_mask.astype(np.uint8))\n",
28
+ " save_mask = save_mask.convert(mode='P')\n",
29
+ " save_mask.putpalette(_palette)\n",
30
+ " save_mask = save_mask.convert(mode='RGB')\n",
31
+ " return np.array(save_mask)\n",
32
+ "def draw_mask(img, mask, alpha=0.7, id_countour=False):\n",
33
+ " img_mask = np.zeros_like(img)\n",
34
+ " img_mask = img\n",
35
+ " if id_countour:\n",
36
+ " # very slow ~ 1s per image\n",
37
+ " obj_ids = np.unique(mask)\n",
38
+ " obj_ids = obj_ids[obj_ids!=0]\n",
39
+ "\n",
40
+ " for id in obj_ids:\n",
41
+ " # Overlay color on binary mask\n",
42
+ " if id <= 255:\n",
43
+ " color = _palette[id*3:id*3+3]\n",
44
+ " else:\n",
45
+ " color = [0,0,0]\n",
46
+ " foreground = img * (1-alpha) + np.ones_like(img) * alpha * np.array(color)\n",
47
+ " binary_mask = (mask == id)\n",
48
+ "\n",
49
+ " # Compose image\n",
50
+ " img_mask[binary_mask] = foreground[binary_mask]\n",
51
+ "\n",
52
+ " countours = binary_dilation(binary_mask,iterations=1) ^ binary_mask\n",
53
+ " img_mask[countours, :] = 0\n",
54
+ " else:\n",
55
+ " binary_mask = (mask!=0)\n",
56
+ " countours = binary_dilation(binary_mask,iterations=1) ^ binary_mask\n",
57
+ " foreground = img*(1-alpha)+colorize_mask(mask)*alpha\n",
58
+ " img_mask[binary_mask] = foreground[binary_mask]\n",
59
+ " img_mask[countours,:] = 0\n",
60
+ " \n",
61
+ " return img_mask.astype(img.dtype)"
62
+ ]
63
+ },
64
+ {
65
+ "attachments": {},
66
+ "cell_type": "markdown",
67
+ "metadata": {},
68
+ "source": [
69
+ "### Set parameters for input and output"
70
+ ]
71
+ },
72
+ {
73
+ "cell_type": "code",
74
+ "execution_count": 2,
75
+ "metadata": {},
76
+ "outputs": [],
77
+ "source": [
78
+ "video_name = 'cars'\n",
79
+ "io_args = {\n",
80
+ " 'input_video': f'./assets/{video_name}.mp4',\n",
81
+ " 'output_mask_dir': f'./assets/{video_name}_masks', # save pred masks\n",
82
+ " 'output_video': f'./assets/{video_name}_seg.mp4', # mask+frame visualization; mp4 or avi, else the same codec as the input video\n",
83
+ " 'output_gif': f'./assets/{video_name}_seg.gif', # mask visualization\n",
84
+ "}"
85
+ ]
86
+ },
87
+ {
88
+ "attachments": {},
89
+ "cell_type": "markdown",
90
+ "metadata": {},
91
+ "source": [
92
+ "### Tuning Grounding-DINO and SAM on the First Frame for Good Initialization"
93
+ ]
94
+ },
95
+ {
96
+ "cell_type": "code",
97
+ "execution_count": null,
98
+ "metadata": {},
99
+ "outputs": [],
100
+ "source": [
101
+ "# choose good parameters in sam_args based on the first frame segmentation result\n",
102
+ "# other arguments can be modified in model_args.py\n",
103
+ "# note the object number limit is 255 by default, which requires < 10GB GPU memory with amp\n",
104
+ "sam_args['generator_args'] = {\n",
105
+ " 'points_per_side': 30,\n",
106
+ " 'pred_iou_thresh': 0.8,\n",
107
+ " 'stability_score_thresh': 0.9,\n",
108
+ " 'crop_n_layers': 1,\n",
109
+ " 'crop_n_points_downscale_factor': 2,\n",
110
+ " 'min_mask_region_area': 200,\n",
111
+ " }\n",
112
+ "\n",
113
+ "# Set Text args\n",
114
+ "'''\n",
115
+ "parameter:\n",
116
+ " grounding_caption: Text prompt to detect objects in key-frames\n",
117
+ " box_threshold: threshold for box \n",
118
+ " text_threshold: threshold for label(text)\n",
119
+ " box_size_threshold: If the size ratio between the box and the frame is larger than the box_size_threshold, the box will be ignored. This is used to filter out large boxes.\n",
120
+ " reset_image: reset the image embeddings for SAM\n",
121
+ "'''\n",
122
+ "grounding_caption = \"car.suv\"\n",
123
+ "box_threshold, text_threshold, box_size_threshold, reset_image = 0.35, 0.5, 0.5, True\n",
124
+ "\n",
125
+ "cap = cv2.VideoCapture(io_args['input_video'])\n",
126
+ "frame_idx = 0\n",
127
+ "segtracker = SegTracker(segtracker_args,sam_args,aot_args)\n",
128
+ "segtracker.restart_tracker()\n",
129
+ "with torch.cuda.amp.autocast():\n",
130
+ " while cap.isOpened():\n",
131
+ " ret, frame = cap.read()\n",
132
+ " frame = cv2.cvtColor(frame,cv2.COLOR_BGR2RGB)\n",
133
+ " pred_mask, annotated_frame = segtracker.detect_and_seg(frame, grounding_caption, box_threshold, text_threshold, box_size_threshold)\n",
134
+ " torch.cuda.empty_cache()\n",
135
+ " obj_ids = np.unique(pred_mask)\n",
136
+ " obj_ids = obj_ids[obj_ids!=0]\n",
137
+ " print(\"processed frame {}, obj_num {}\".format(frame_idx,len(obj_ids)),end='\\n')\n",
138
+ " break\n",
139
+ " cap.release()\n",
140
+ " init_res = draw_mask(annotated_frame, pred_mask,id_countour=False)\n",
141
+ " plt.figure(figsize=(10,10))\n",
142
+ " plt.axis('off')\n",
143
+ " plt.imshow(init_res)\n",
144
+ " plt.show()\n",
145
+ " plt.figure(figsize=(10,10))\n",
146
+ " plt.axis('off')\n",
147
+ " plt.imshow(colorize_mask(pred_mask))\n",
148
+ " plt.show()\n",
149
+ "\n",
150
+ " del segtracker\n",
151
+ " torch.cuda.empty_cache()\n",
152
+ " gc.collect()"
153
+ ]
154
+ },
155
+ {
156
+ "attachments": {},
157
+ "cell_type": "markdown",
158
+ "metadata": {},
159
+ "source": [
160
+ "### Generate Results for the Whole Video"
161
+ ]
162
+ },
163
+ {
164
+ "cell_type": "code",
165
+ "execution_count": null,
166
+ "metadata": {},
167
+ "outputs": [],
168
+ "source": [
169
+ "# For every sam_gap frames, we use SAM to find new objects and add them for tracking\n",
170
+ "# larger sam_gap is faster but may not spot new objects in time\n",
171
+ "segtracker_args = {\n",
172
+ " 'sam_gap': 49, # the interval to run sam to segment new objects\n",
173
+ " 'min_area': 200, # minimal mask area to add a new mask as a new object\n",
174
+ " 'max_obj_num': 255, # maximal object number to track in a video\n",
175
+ " 'min_new_obj_iou': 0.8, # a new object is added only if > 80% of its area lies in the untracked background\n",
176
+ "}\n",
177
+ "\n",
178
+ "# source video to segment\n",
179
+ "cap = cv2.VideoCapture(io_args['input_video'])\n",
180
+ "fps = cap.get(cv2.CAP_PROP_FPS)\n",
181
+ "# output masks\n",
182
+ "output_dir = io_args['output_mask_dir']\n",
183
+ "if not os.path.exists(output_dir):\n",
184
+ " os.makedirs(output_dir)\n",
185
+ "pred_list = []\n",
186
+ "masked_pred_list = []\n",
187
+ "\n",
188
+ "torch.cuda.empty_cache()\n",
189
+ "gc.collect()\n",
190
+ "sam_gap = segtracker_args['sam_gap']\n",
191
+ "frame_idx = 0\n",
192
+ "segtracker = SegTracker(segtracker_args, sam_args, aot_args)\n",
193
+ "segtracker.restart_tracker()\n",
194
+ "\n",
195
+ "with torch.cuda.amp.autocast():\n",
196
+ " while cap.isOpened():\n",
197
+ " ret, frame = cap.read()\n",
198
+ " if not ret:\n",
199
+ " break\n",
200
+ " frame = cv2.cvtColor(frame,cv2.COLOR_BGR2RGB)\n",
201
+ " if frame_idx == 0:\n",
202
+ " pred_mask, _ = segtracker.detect_and_seg(frame, grounding_caption, box_threshold, text_threshold, box_size_threshold, reset_image)\n",
203
+ " # pred_mask = cv2.imread('./debug/first_frame_mask.png', 0)\n",
204
+ " torch.cuda.empty_cache()\n",
205
+ " gc.collect()\n",
206
+ " segtracker.add_reference(frame, pred_mask)\n",
207
+ " elif (frame_idx % sam_gap) == 0:\n",
208
+ " seg_mask, _ = segtracker.detect_and_seg(frame, grounding_caption, box_threshold, text_threshold, box_size_threshold, reset_image)\n",
209
+ " save_prediction(seg_mask, './debug/seg_result', str(frame_idx)+'.png')\n",
210
+ " torch.cuda.empty_cache()\n",
211
+ " gc.collect()\n",
212
+ " track_mask = segtracker.track(frame)\n",
213
+ " save_prediction(track_mask, './debug/aot_result', str(frame_idx)+'.png')\n",
214
+ " # find new objects, and update tracker with new objects\n",
215
+ " new_obj_mask = segtracker.find_new_objs(track_mask, seg_mask)\n",
216
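+ " # skip new detections that cover more than 40% of the frame (likely spurious)\n",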
+ " if np.sum(new_obj_mask > 0) > frame.shape[0] * frame.shape[1] * 0.4:\n",
217
+ " new_obj_mask = np.zeros_like(new_obj_mask)\n",
218
+ " save_prediction(new_obj_mask,output_dir,str(frame_idx)+'_new.png')\n",
219
+ " pred_mask = track_mask + new_obj_mask\n",
220
+ " # segtracker.restart_tracker()\n",
221
+ " segtracker.add_reference(frame, pred_mask)\n",
222
+ " else:\n",
223
+ " pred_mask = segtracker.track(frame,update_memory=True)\n",
224
+ " torch.cuda.empty_cache()\n",
225
+ " gc.collect()\n",
226
+ " \n",
227
+ " save_prediction(pred_mask,output_dir,str(frame_idx)+'.png')\n",
228
+ " # masked_frame = draw_mask(frame,pred_mask)\n",
229
+ " # masked_pred_list.append(masked_frame)\n",
230
+ " # plt.imshow(masked_frame)\n",
231
+ " # plt.show() \n",
232
+ " \n",
233
+ " pred_list.append(pred_mask)\n",
234
+ " \n",
235
+ " \n",
236
+ " print(\"processed frame {}, obj_num {}\".format(frame_idx,segtracker.get_obj_num()),end='\\r')\n",
237
+ " frame_idx += 1\n",
238
+ " cap.release()\n",
239
+ " print('\\nfinished')"
240
+ ]
241
+ },
242
+ {
243
+ "attachments": {},
244
+ "cell_type": "markdown",
245
+ "metadata": {},
246
+ "source": [
247
+ "### Save results for visualization"
248
+ ]
249
+ },
250
+ {
251
+ "cell_type": "code",
252
+ "execution_count": null,
253
+ "metadata": {},
254
+ "outputs": [],
255
+ "source": [
256
+ "# draw pred mask on frame and save as a video\n",
257
+ "cap = cv2.VideoCapture(io_args['input_video'])\n",
258
+ "fps = cap.get(cv2.CAP_PROP_FPS)\n",
259
+ "width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))\n",
260
+ "height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))\n",
261
+ "num_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))\n",
262
+ "\n",
263
+ "if io_args['input_video'][-3:]=='mp4':\n",
264
+ " fourcc = cv2.VideoWriter_fourcc(*\"mp4v\")\n",
265
+ "elif io_args['input_video'][-3:] == 'avi':\n",
266
+ " fourcc = cv2.VideoWriter_fourcc(*\"MJPG\")\n",
267
+ " # fourcc = cv2.VideoWriter_fourcc(*\"XVID\")\n",
268
+ "else:\n",
269
+ " fourcc = int(cap.get(cv2.CAP_PROP_FOURCC))\n",
270
+ "out = cv2.VideoWriter(io_args['output_video'], fourcc, fps, (width, height))\n",
271
+ "\n",
272
+ "frame_idx = 0\n",
273
+ "while cap.isOpened():\n",
274
+ " ret, frame = cap.read()\n",
275
+ " if not ret:\n",
276
+ " break\n",
277
+ " frame = cv2.cvtColor(frame,cv2.COLOR_BGR2RGB)\n",
278
+ " pred_mask = pred_list[frame_idx]\n",
279
+ " masked_frame = draw_mask(frame,pred_mask)\n",
280
+ " # masked_frame = masked_pred_list[frame_idx]\n",
281
+ " masked_frame = cv2.cvtColor(masked_frame,cv2.COLOR_RGB2BGR)\n",
282
+ " out.write(masked_frame)\n",
283
+ " print('frame {} written'.format(frame_idx),end='\\r')\n",
284
+ " frame_idx += 1\n",
285
+ "out.release()\n",
286
+ "cap.release()\n",
287
+ "print(\"\\n{} saved\".format(io_args['output_video']))\n",
288
+ "print('\\nfinished')"
289
+ ]
290
+ },
291
+ {
292
+ "cell_type": "code",
293
+ "execution_count": null,
294
+ "metadata": {},
295
+ "outputs": [],
296
+ "source": [
297
+ "# save predicted masks as a gif\n",
298
+ "imageio.mimsave(io_args['output_gif'],pred_list,fps=fps)\n",
299
+ "print(\"{} saved\".format(io_args['output_gif']))"
300
+ ]
301
+ },
302
+ {
303
+ "cell_type": "code",
304
+ "execution_count": 13,
305
+ "metadata": {},
306
+ "outputs": [
307
+ {
308
+ "data": {
309
+ "text/plain": [
310
+ "21"
311
+ ]
312
+ },
313
+ "execution_count": 13,
314
+ "metadata": {},
315
+ "output_type": "execute_result"
316
+ }
317
+ ],
318
+ "source": [
319
+ "# manually release memory (after cuda out of memory)\n",
320
+ "del segtracker\n",
321
+ "torch.cuda.empty_cache()\n",
322
+ "gc.collect()"
323
+ ]
324
+ }
325
+ ],
326
+ "metadata": {
327
+ "kernelspec": {
328
+ "display_name": "Python 3.8.5 64-bit ('ldm': conda)",
329
+ "language": "python",
330
+ "name": "python3"
331
+ },
332
+ "language_info": {
333
+ "codemirror_mode": {
334
+ "name": "ipython",
335
+ "version": 3
336
+ },
337
+ "file_extension": ".py",
338
+ "mimetype": "text/x-python",
339
+ "name": "python",
340
+ "nbconvert_exporter": "python",
341
+ "pygments_lexer": "ipython3",
342
+ "version": "3.8.5-final"
343
+ },
344
+ "orig_nbformat": 4,
345
+ "vscode": {
346
+ "interpreter": {
347
+ "hash": "536611da043600e50719c9460971b5220bad26cd4a87e5994bfd4c9e9e5e7fb0"
348
+ }
349
+ }
350
+ },
351
+ "nbformat": 4,
352
+ "nbformat_minor": 2
353
+ }
img2vid.py ADDED
@@ -0,0 +1,26 @@
1
+ import cv2
2
+ import os
3
+
4
+ # set the directory containing the images
5
+ img_dir = './assets/840_iSXIa0hE8Ek'
6
+
7
+ # set the output video file name and codec
8
+ out_file = './assets/840_iSXIa0hE8Ek.mp4'
9
+ fourcc = cv2.VideoWriter_fourcc(*'mp4v')
10
+
11
+ # get the dimensions of the first image
12
+ img_path = os.path.join(img_dir, os.listdir(img_dir)[0])
13
+ img = cv2.imread(img_path)
14
+ height, width, channels = img.shape
15
+
16
+ # create the VideoWriter object
17
+ out = cv2.VideoWriter(out_file, fourcc, 10, (width, height))
18
+
19
+ # loop through the images and write them to the video
20
+ for img_name in sorted(os.listdir(img_dir)):
21
+ img_path = os.path.join(img_dir, img_name)
22
+ img = cv2.imread(img_path)
23
+ out.write(img)
24
+
25
+ # release the VideoWriter object and close the video file
26
+ out.release()
licenses.md ADDED
@@ -0,0 +1,660 @@
1
+ ## <center> [SAM](https://github.com/facebookresearch/segment-anything/blob/main/LICENSE) </center>
2
+ <div align=center>
3
+
4
+ Apache License
5
+
6
+ Version 2.0, January 2004
7
+
8
+ http://www.apache.org/licenses/
9
+ </div>
10
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
11
+
12
+ 1. Definitions.
13
+
14
+ "License" shall mean the terms and conditions for use, reproduction,
15
+ and distribution as defined by Sections 1 through 9 of this document.
16
+
17
+ "Licensor" shall mean the copyright owner or entity authorized by
18
+ the copyright owner that is granting the License.
19
+
20
+ "Legal Entity" shall mean the union of the acting entity and all
21
+ other entities that control, are controlled by, or are under common
22
+ control with that entity. For the purposes of this definition,
23
+ "control" means (i) the power, direct or indirect, to cause the
24
+ direction or management of such entity, whether by contract or
25
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
26
+ outstanding shares, or (iii) beneficial ownership of such entity.
27
+
28
+ "You" (or "Your") shall mean an individual or Legal Entity
29
+ exercising permissions granted by this License.
30
+
31
+ "Source" form shall mean the preferred form for making modifications,
32
+ including but not limited to software source code, documentation
33
+ source, and configuration files.
34
+
35
+ "Object" form shall mean any form resulting from mechanical
36
+ transformation or translation of a Source form, including but
37
+ not limited to compiled object code, generated documentation,
38
+ and conversions to other media types.
39
+
40
+ "Work" shall mean the work of authorship, whether in Source or
41
+ Object form, made available under the License, as indicated by a
42
+ copyright notice that is included in or attached to the work
43
+ (an example is provided in the Appendix below).
44
+
45
+ "Derivative Works" shall mean any work, whether in Source or Object
46
+ form, that is based on (or derived from) the Work and for which the
47
+ editorial revisions, annotations, elaborations, or other modifications
48
+ represent, as a whole, an original work of authorship. For the purposes
49
+ of this License, Derivative Works shall not include works that remain
50
+ separable from, or merely link (or bind by name) to the interfaces of,
51
+ the Work and Derivative Works thereof.
52
+
53
+ "Contribution" shall mean any work of authorship, including
54
+ the original version of the Work and any modifications or additions
55
+ to that Work or Derivative Works thereof, that is intentionally
56
+ submitted to Licensor for inclusion in the Work by the copyright owner
57
+ or by an individual or Legal Entity authorized to submit on behalf of
58
+ the copyright owner. For the purposes of this definition, "submitted"
59
+ means any form of electronic, verbal, or written communication sent
60
+ to the Licensor or its representatives, including but not limited to
61
+ communication on electronic mailing lists, source code control systems,
62
+ and issue tracking systems that are managed by, or on behalf of, the
63
+ Licensor for the purpose of discussing and improving the Work, but
64
+ excluding communication that is conspicuously marked or otherwise
65
+ designated in writing by the copyright owner as "Not a Contribution."
66
+
67
+ "Contributor" shall mean Licensor and any individual or Legal Entity
68
+ on behalf of whom a Contribution has been received by Licensor and
69
+ subsequently incorporated within the Work.
70
+
71
+ 2. Grant of Copyright License. Subject to the terms and conditions of
72
+ this License, each Contributor hereby grants to You a perpetual,
73
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
74
+ copyright license to reproduce, prepare Derivative Works of,
75
+ publicly display, publicly perform, sublicense, and distribute the
76
+ Work and such Derivative Works in Source or Object form.
77
+
78
+ 3. Grant of Patent License. Subject to the terms and conditions of
79
+ this License, each Contributor hereby grants to You a perpetual,
80
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
81
+ (except as stated in this section) patent license to make, have made,
82
+ use, offer to sell, sell, import, and otherwise transfer the Work,
83
+ where such license applies only to those patent claims licensable
84
+ by such Contributor that are necessarily infringed by their
85
+ Contribution(s) alone or by combination of their Contribution(s)
86
+ with the Work to which such Contribution(s) was submitted. If You
87
+ institute patent litigation against any entity (including a
88
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
89
+ or a Contribution incorporated within the Work constitutes direct
90
+ or contributory patent infringement, then any patent licenses
91
+ granted to You under this License for that Work shall terminate
92
+ as of the date such litigation is filed.
93
+
94
+ 4. Redistribution. You may reproduce and distribute copies of the
95
+ Work or Derivative Works thereof in any medium, with or without
96
+ modifications, and in Source or Object form, provided that You
97
+ meet the following conditions:
98
+
99
+ (a) You must give any other recipients of the Work or
100
+ Derivative Works a copy of this License; and
101
+
102
+ (b) You must cause any modified files to carry prominent notices
103
+ stating that You changed the files; and
104
+
105
+ (c) You must retain, in the Source form of any Derivative Works
106
+ that You distribute, all copyright, patent, trademark, and
107
+ attribution notices from the Source form of the Work,
108
+ excluding those notices that do not pertain to any part of
109
+ the Derivative Works; and
110
+
111
+ (d) If the Work includes a "NOTICE" text file as part of its
112
+ distribution, then any Derivative Works that You distribute must
113
+ include a readable copy of the attribution notices contained
114
+ within such NOTICE file, excluding those notices that do not
115
+ pertain to any part of the Derivative Works, in at least one
116
+ of the following places: within a NOTICE text file distributed
117
+ as part of the Derivative Works; within the Source form or
118
+ documentation, if provided along with the Derivative Works; or,
119
+ within a display generated by the Derivative Works, if and
120
+ wherever such third-party notices normally appear. The contents
121
+ of the NOTICE file are for informational purposes only and
122
+ do not modify the License. You may add Your own attribution
123
+ notices within Derivative Works that You distribute, alongside
124
+ or as an addendum to the NOTICE text from the Work, provided
125
+ that such additional attribution notices cannot be construed
126
+ as modifying the License.
127
+
128
+ You may add Your own copyright statement to Your modifications and
129
+ may provide additional or different license terms and conditions
130
+ for use, reproduction, or distribution of Your modifications, or
131
+ for any such Derivative Works as a whole, provided Your use,
132
+ reproduction, and distribution of the Work otherwise complies with
133
+ the conditions stated in this License.
134
+
135
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
136
+ any Contribution intentionally submitted for inclusion in the Work
137
+ by You to the Licensor shall be under the terms and conditions of
138
+ this License, without any additional terms or conditions.
139
+ Notwithstanding the above, nothing herein shall supersede or modify
140
+ the terms of any separate license agreement you may have executed
141
+ with Licensor regarding such Contributions.
142
+
143
+ 6. Trademarks. This License does not grant permission to use the trade
144
+ names, trademarks, service marks, or product names of the Licensor,
145
+ except as required for reasonable and customary use in describing the
146
+ origin of the Work and reproducing the content of the NOTICE file.
147
+
148
+ 7. Disclaimer of Warranty. Unless required by applicable law or
149
+ agreed to in writing, Licensor provides the Work (and each
150
+ Contributor provides its Contributions) on an "AS IS" BASIS,
151
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
152
+ implied, including, without limitation, any warranties or conditions
153
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
154
+ PARTICULAR PURPOSE. You are solely responsible for determining the
155
+ appropriateness of using or redistributing the Work and assume any
156
+ risks associated with Your exercise of permissions under this License.
157
+
158
+ 8. Limitation of Liability. In no event and under no legal theory,
159
+ whether in tort (including negligence), contract, or otherwise,
160
+ unless required by applicable law (such as deliberate and grossly
161
+ negligent acts) or agreed to in writing, shall any Contributor be
162
+ liable to You for damages, including any direct, indirect, special,
163
+ incidental, or consequential damages of any character arising as a
164
+ result of this License or out of the use or inability to use the
165
+ Work (including but not limited to damages for loss of goodwill,
166
+ work stoppage, computer failure or malfunction, or any and all
167
+ other commercial damages or losses), even if such Contributor
168
+ has been advised of the possibility of such damages.
169
+
170
+ 9. Accepting Warranty or Additional Liability. While redistributing
171
+ the Work or Derivative Works thereof, You may choose to offer,
172
+ and charge a fee for, acceptance of support, warranty, indemnity,
173
+ or other liability obligations and/or rights consistent with this
174
+ License. However, in accepting such obligations, You may act only
175
+ on Your own behalf and on Your sole responsibility, not on behalf
176
+ of any other Contributor, and only if You agree to indemnify,
177
+ defend, and hold each Contributor harmless for any liability
178
+ incurred by, or claims asserted against, such Contributor by reason
179
+ of your accepting any such warranty or additional liability.
180
+
181
+ END OF TERMS AND CONDITIONS
182
+
183
+ APPENDIX: How to apply the Apache License to your work.
184
+
185
+ To apply the Apache License to your work, attach the following
186
+ boilerplate notice, with the fields enclosed by brackets "[]"
187
+ replaced with your own identifying information. (Don't include
188
+ the brackets!) The text should be enclosed in the appropriate
189
+ comment syntax for the file format. We also recommend that a
190
+ file or class name and description of purpose be included on the
191
+ same "printed page" as the copyright notice for easier
192
+ identification within third-party archives.
193
+
194
+ Copyright [yyyy] [name of copyright owner]
195
+
196
+ Licensed under the Apache License, Version 2.0 (the "License");
197
+ you may not use this file except in compliance with the License.
198
+ You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
199
+
200
+ Unless required by applicable law or agreed to in writing, software
201
+ distributed under the License is distributed on an "AS IS" BASIS,
202
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
203
+ See the License for the specific language governing permissions and
204
+ limitations under the License.
205
+
206
+
207
+
208
+
209
+ ## <center> [AOT](https://github.com/yoxu515/aot-benchmark/blob/main/LICENSE) </center>
210
+ <div align=center>
211
+ BSD 3-Clause License
212
+
213
+ Copyright (c) 2020, z-x-yang
214
+
215
+ Copyright (c) All rights reserved.
216
+ </div>
217
+
218
+ Redistribution and use in source and binary forms, with or without
219
+ modification, are permitted provided that the following conditions are met:
220
+
221
+ 1. Redistributions of source code must retain the above copyright notice, this
222
+ list of conditions and the following disclaimer.
223
+
224
+ 2. Redistributions in binary form must reproduce the above copyright notice,
225
+ this list of conditions and the following disclaimer in the documentation
226
+ and/or other materials provided with the distribution.
227
+
228
+ 3. Neither the name of the copyright holder nor the names of its
229
+ contributors may be used to endorse or promote products derived from
230
+ this software without specific prior written permission.
231
+
232
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
233
+ AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
234
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
235
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
236
+ FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
237
+ DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
238
+ SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
239
+ CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
240
+ OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
241
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
242
+
243
+
244
+ ## <center> [Gradio](https://github.com/gradio-app/gradio/blob/main/LICENSE) </center>
245
+
246
+ <div align=center>
247
+ Apache License
248
+
249
+ Version 2.0, January 2004
250
+
251
+ http://www.apache.org/licenses/
252
+ </div>
253
+
254
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
255
+
256
+ 1. Definitions.
257
+
258
+ "License" shall mean the terms and conditions for use, reproduction,
259
+ and distribution as defined by Sections 1 through 9 of this document.
260
+
261
+ "Licensor" shall mean the copyright owner or entity authorized by
262
+ the copyright owner that is granting the License.
263
+
264
+ "Legal Entity" shall mean the union of the acting entity and all
265
+ other entities that control, are controlled by, or are under common
266
+ control with that entity. For the purposes of this definition,
267
+ "control" means (i) the power, direct or indirect, to cause the
268
+ direction or management of such entity, whether by contract or
269
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
270
+ outstanding shares, or (iii) beneficial ownership of such entity.
271
+
272
+ "You" (or "Your") shall mean an individual or Legal Entity
273
+ exercising permissions granted by this License.
274
+
275
+ "Source" form shall mean the preferred form for making modifications,
276
+ including but not limited to software source code, documentation
277
+ source, and configuration files.
278
+
279
+ "Object" form shall mean any form resulting from mechanical
280
+ transformation or translation of a Source form, including but
281
+ not limited to compiled object code, generated documentation,
282
+ and conversions to other media types.
283
+
284
+ "Work" shall mean the work of authorship, whether in Source or
285
+ Object form, made available under the License, as indicated by a
286
+ copyright notice that is included in or attached to the work
287
+ (an example is provided in the Appendix below).
288
+
289
+ "Derivative Works" shall mean any work, whether in Source or Object
290
+ form, that is based on (or derived from) the Work and for which the
291
+ editorial revisions, annotations, elaborations, or other modifications
292
+ represent, as a whole, an original work of authorship. For the purposes
293
+ of this License, Derivative Works shall not include works that remain
294
+ separable from, or merely link (or bind by name) to the interfaces of,
295
+ the Work and Derivative Works thereof.
296
+
297
+ "Contribution" shall mean any work of authorship, including
298
+ the original version of the Work and any modifications or additions
299
+ to that Work or Derivative Works thereof, that is intentionally
300
+ submitted to Licensor for inclusion in the Work by the copyright owner
301
+ or by an individual or Legal Entity authorized to submit on behalf of
302
+ the copyright owner. For the purposes of this definition, "submitted"
303
+ means any form of electronic, verbal, or written communication sent
304
+ to the Licensor or its representatives, including but not limited to
305
+ communication on electronic mailing lists, source code control systems,
306
+ and issue tracking systems that are managed by, or on behalf of, the
307
+ Licensor for the purpose of discussing and improving the Work, but
308
+ excluding communication that is conspicuously marked or otherwise
309
+ designated in writing by the copyright owner as "Not a Contribution."
310
+
311
+ "Contributor" shall mean Licensor and any individual or Legal Entity
312
+ on behalf of whom a Contribution has been received by Licensor and
313
+ subsequently incorporated within the Work.
314
+
315
+ 2. Grant of Copyright License. Subject to the terms and conditions of
316
+ this License, each Contributor hereby grants to You a perpetual,
317
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
318
+ copyright license to reproduce, prepare Derivative Works of,
319
+ publicly display, publicly perform, sublicense, and distribute the
320
+ Work and such Derivative Works in Source or Object form.
321
+
322
+ 3. Grant of Patent License. Subject to the terms and conditions of
323
+ this License, each Contributor hereby grants to You a perpetual,
324
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
325
+ (except as stated in this section) patent license to make, have made,
326
+ use, offer to sell, sell, import, and otherwise transfer the Work,
327
+ where such license applies only to those patent claims licensable
328
+ by such Contributor that are necessarily infringed by their
329
+ Contribution(s) alone or by combination of their Contribution(s)
330
+ with the Work to which such Contribution(s) was submitted. If You
331
+ institute patent litigation against any entity (including a
332
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
333
+ or a Contribution incorporated within the Work constitutes direct
334
+ or contributory patent infringement, then any patent licenses
335
+ granted to You under this License for that Work shall terminate
336
+ as of the date such litigation is filed.
337
+
338
+ 4. Redistribution. You may reproduce and distribute copies of the
339
+ Work or Derivative Works thereof in any medium, with or without
340
+ modifications, and in Source or Object form, provided that You
341
+ meet the following conditions:
342
+
343
+ (a) You must give any other recipients of the Work or
344
+ Derivative Works a copy of this License; and
345
+
346
+ (b) You must cause any modified files to carry prominent notices
347
+ stating that You changed the files; and
348
+
349
+ (c) You must retain, in the Source form of any Derivative Works
350
+ that You distribute, all copyright, patent, trademark, and
351
+ attribution notices from the Source form of the Work,
352
+ excluding those notices that do not pertain to any part of
353
+ the Derivative Works; and
354
+
355
+ (d) If the Work includes a "NOTICE" text file as part of its
356
+ distribution, then any Derivative Works that You distribute must
357
+ include a readable copy of the attribution notices contained
358
+ within such NOTICE file, excluding those notices that do not
359
+ pertain to any part of the Derivative Works, in at least one
360
+ of the following places: within a NOTICE text file distributed
361
+ as part of the Derivative Works; within the Source form or
362
+ documentation, if provided along with the Derivative Works; or,
363
+ within a display generated by the Derivative Works, if and
364
+ wherever such third-party notices normally appear. The contents
365
+ of the NOTICE file are for informational purposes only and
366
+ do not modify the License. You may add Your own attribution
367
+ notices within Derivative Works that You distribute, alongside
368
+ or as an addendum to the NOTICE text from the Work, provided
369
+ that such additional attribution notices cannot be construed
370
+ as modifying the License.
371
+
372
+ You may add Your own copyright statement to Your modifications and
373
+ may provide additional or different license terms and conditions
374
+ for use, reproduction, or distribution of Your modifications, or
375
+ for any such Derivative Works as a whole, provided Your use,
376
+ reproduction, and distribution of the Work otherwise complies with
377
+ the conditions stated in this License.
378
+
379
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
380
+ any Contribution intentionally submitted for inclusion in the Work
381
+ by You to the Licensor shall be under the terms and conditions of
382
+ this License, without any additional terms or conditions.
383
+ Notwithstanding the above, nothing herein shall supersede or modify
384
+ the terms of any separate license agreement you may have executed
385
+ with Licensor regarding such Contributions.
386
+
387
+ 6. Trademarks. This License does not grant permission to use the trade
388
+ names, trademarks, service marks, or product names of the Licensor,
389
+ except as required for reasonable and customary use in describing the
390
+ origin of the Work and reproducing the content of the NOTICE file.
391
+
392
+ 7. Disclaimer of Warranty. Unless required by applicable law or
393
+ agreed to in writing, Licensor provides the Work (and each
394
+ Contributor provides its Contributions) on an "AS IS" BASIS,
395
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
396
+ implied, including, without limitation, any warranties or conditions
397
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
398
+ PARTICULAR PURPOSE. You are solely responsible for determining the
399
+ appropriateness of using or redistributing the Work and assume any
400
+ risks associated with Your exercise of permissions under this License.
401
+
402
+ 8. Limitation of Liability. In no event and under no legal theory,
403
+ whether in tort (including negligence), contract, or otherwise,
404
+ unless required by applicable law (such as deliberate and grossly
405
+ negligent acts) or agreed to in writing, shall any Contributor be
406
+ liable to You for damages, including any direct, indirect, special,
407
+ incidental, or consequential damages of any character arising as a
408
+ result of this License or out of the use or inability to use the
409
+ Work (including but not limited to damages for loss of goodwill,
410
+ work stoppage, computer failure or malfunction, or any and all
411
+ other commercial damages or losses), even if such Contributor
412
+ has been advised of the possibility of such damages.
413
+
414
+ 9. Accepting Warranty or Additional Liability. While redistributing
415
+ the Work or Derivative Works thereof, You may choose to offer,
416
+ and charge a fee for, acceptance of support, warranty, indemnity,
417
+ or other liability obligations and/or rights consistent with this
418
+ License. However, in accepting such obligations, You may act only
419
+ on Your own behalf and on Your sole responsibility, not on behalf
420
+ of any other Contributor, and only if You agree to indemnify,
421
+ defend, and hold each Contributor harmless for any liability
422
+ incurred by, or claims asserted against, such Contributor by reason
423
+ of your accepting any such warranty or additional liability.
424
+
425
+ END OF TERMS AND CONDITIONS
426
+
427
+ APPENDIX: How to apply the Apache License to your work.
428
+
429
+ To apply the Apache License to your work, attach the following
430
+ boilerplate notice, with the fields enclosed by brackets "[]"
431
+ replaced with your own identifying information. (Don't include
432
+ the brackets!) The text should be enclosed in the appropriate
433
+ comment syntax for the file format. We also recommend that a
434
+ file or class name and description of purpose be included on the
435
+ same "printed page" as the copyright notice for easier
436
+ identification within third-party archives.
437
+
438
+ Copyright [yyyy] [name of copyright owner]
439
+
440
+ Licensed under the Apache License, Version 2.0 (the "License");
441
+ you may not use this file except in compliance with the License.
442
+ You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
443
+
444
+ Unless required by applicable law or agreed to in writing, software
445
+ distributed under the License is distributed on an "AS IS" BASIS,
446
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
447
+ See the License for the specific language governing permissions and
448
+ limitations under the License.
449
+
450
+
451
+
452
+ ## <center> [GroundingDINO](https://github.com/yamy-cheng/GroundingDINO/blob/main/LICENSE) </center>
453
+
454
+ <div align="center">
455
+ Apache License
456
+
457
+ Version 2.0, January 2004
458
+
459
+ http://www.apache.org/licenses/
460
+
461
+ </div>
462
+
463
+
464
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
465
+
466
+ 1. Definitions.
467
+
468
+ "License" shall mean the terms and conditions for use, reproduction,
469
+ and distribution as defined by Sections 1 through 9 of this document.
470
+
471
+ "Licensor" shall mean the copyright owner or entity authorized by
472
+ the copyright owner that is granting the License.
473
+
474
+ "Legal Entity" shall mean the union of the acting entity and all
475
+ other entities that control, are controlled by, or are under common
476
+ control with that entity. For the purposes of this definition,
477
+ "control" means (i) the power, direct or indirect, to cause the
478
+ direction or management of such entity, whether by contract or
479
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
480
+ outstanding shares, or (iii) beneficial ownership of such entity.
481
+
482
+ "You" (or "Your") shall mean an individual or Legal Entity
483
+ exercising permissions granted by this License.
484
+
485
+ "Source" form shall mean the preferred form for making modifications,
486
+ including but not limited to software source code, documentation
487
+ source, and configuration files.
488
+
489
+ "Object" form shall mean any form resulting from mechanical
490
+ transformation or translation of a Source form, including but
491
+ not limited to compiled object code, generated documentation,
492
+ and conversions to other media types.
493
+
494
+ "Work" shall mean the work of authorship, whether in Source or
495
+ Object form, made available under the License, as indicated by a
496
+ copyright notice that is included in or attached to the work
497
+ (an example is provided in the Appendix below).
498
+
499
+ "Derivative Works" shall mean any work, whether in Source or Object
500
+ form, that is based on (or derived from) the Work and for which the
501
+ editorial revisions, annotations, elaborations, or other modifications
502
+ represent, as a whole, an original work of authorship. For the purposes
503
+ of this License, Derivative Works shall not include works that remain
504
+ separable from, or merely link (or bind by name) to the interfaces of,
505
+ the Work and Derivative Works thereof.
506
+
507
+ "Contribution" shall mean any work of authorship, including
508
+ the original version of the Work and any modifications or additions
509
+ to that Work or Derivative Works thereof, that is intentionally
510
+ submitted to Licensor for inclusion in the Work by the copyright owner
511
+ or by an individual or Legal Entity authorized to submit on behalf of
512
+ the copyright owner. For the purposes of this definition, "submitted"
513
+ means any form of electronic, verbal, or written communication sent
514
+ to the Licensor or its representatives, including but not limited to
515
+ communication on electronic mailing lists, source code control systems,
516
+ and issue tracking systems that are managed by, or on behalf of, the
517
+ Licensor for the purpose of discussing and improving the Work, but
518
+ excluding communication that is conspicuously marked or otherwise
519
+ designated in writing by the copyright owner as "Not a Contribution."
520
+
521
+ "Contributor" shall mean Licensor and any individual or Legal Entity
522
+ on behalf of whom a Contribution has been received by Licensor and
523
+ subsequently incorporated within the Work.
524
+
525
+ 2. Grant of Copyright License. Subject to the terms and conditions of
526
+ this License, each Contributor hereby grants to You a perpetual,
527
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
528
+ copyright license to reproduce, prepare Derivative Works of,
529
+ publicly display, publicly perform, sublicense, and distribute the
530
+ Work and such Derivative Works in Source or Object form.
531
+
532
+ 3. Grant of Patent License. Subject to the terms and conditions of
533
+ this License, each Contributor hereby grants to You a perpetual,
534
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
535
+ (except as stated in this section) patent license to make, have made,
536
+ use, offer to sell, sell, import, and otherwise transfer the Work,
537
+ where such license applies only to those patent claims licensable
538
+ by such Contributor that are necessarily infringed by their
539
+ Contribution(s) alone or by combination of their Contribution(s)
540
+ with the Work to which such Contribution(s) was submitted. If You
541
+ institute patent litigation against any entity (including a
542
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
543
+ or a Contribution incorporated within the Work constitutes direct
544
+ or contributory patent infringement, then any patent licenses
545
+ granted to You under this License for that Work shall terminate
546
+ as of the date such litigation is filed.
547
+
548
+ 4. Redistribution. You may reproduce and distribute copies of the
549
+ Work or Derivative Works thereof in any medium, with or without
550
+ modifications, and in Source or Object form, provided that You
551
+ meet the following conditions:
552
+
553
+ (a) You must give any other recipients of the Work or
554
+ Derivative Works a copy of this License; and
555
+
556
+ (b) You must cause any modified files to carry prominent notices
557
+ stating that You changed the files; and
558
+
559
+ (c) You must retain, in the Source form of any Derivative Works
560
+ that You distribute, all copyright, patent, trademark, and
561
+ attribution notices from the Source form of the Work,
562
+ excluding those notices that do not pertain to any part of
563
+ the Derivative Works; and
564
+
565
+ (d) If the Work includes a "NOTICE" text file as part of its
566
+ distribution, then any Derivative Works that You distribute must
567
+ include a readable copy of the attribution notices contained
568
+ within such NOTICE file, excluding those notices that do not
569
+ pertain to any part of the Derivative Works, in at least one
570
+ of the following places: within a NOTICE text file distributed
571
+ as part of the Derivative Works; within the Source form or
572
+ documentation, if provided along with the Derivative Works; or,
573
+ within a display generated by the Derivative Works, if and
574
+ wherever such third-party notices normally appear. The contents
575
+ of the NOTICE file are for informational purposes only and
576
+ do not modify the License. You may add Your own attribution
577
+ notices within Derivative Works that You distribute, alongside
578
+ or as an addendum to the NOTICE text from the Work, provided
579
+ that such additional attribution notices cannot be construed
580
+ as modifying the License.
581
+
582
+ You may add Your own copyright statement to Your modifications and
583
+ may provide additional or different license terms and conditions
584
+ for use, reproduction, or distribution of Your modifications, or
585
+ for any such Derivative Works as a whole, provided Your use,
586
+ reproduction, and distribution of the Work otherwise complies with
587
+ the conditions stated in this License.
588
+
589
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
590
+ any Contribution intentionally submitted for inclusion in the Work
591
+ by You to the Licensor shall be under the terms and conditions of
592
+ this License, without any additional terms or conditions.
593
+ Notwithstanding the above, nothing herein shall supersede or modify
594
+ the terms of any separate license agreement you may have executed
595
+ with Licensor regarding such Contributions.
596
+
597
+ 6. Trademarks. This License does not grant permission to use the trade
598
+ names, trademarks, service marks, or product names of the Licensor,
599
+ except as required for reasonable and customary use in describing the
600
+ origin of the Work and reproducing the content of the NOTICE file.
601
+
602
+ 7. Disclaimer of Warranty. Unless required by applicable law or
603
+ agreed to in writing, Licensor provides the Work (and each
604
+ Contributor provides its Contributions) on an "AS IS" BASIS,
605
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
606
+ implied, including, without limitation, any warranties or conditions
607
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
608
+ PARTICULAR PURPOSE. You are solely responsible for determining the
609
+ appropriateness of using or redistributing the Work and assume any
610
+ risks associated with Your exercise of permissions under this License.
611
+
612
+ 8. Limitation of Liability. In no event and under no legal theory,
613
+ whether in tort (including negligence), contract, or otherwise,
614
+ unless required by applicable law (such as deliberate and grossly
615
+ negligent acts) or agreed to in writing, shall any Contributor be
616
+ liable to You for damages, including any direct, indirect, special,
617
+ incidental, or consequential damages of any character arising as a
618
+ result of this License or out of the use or inability to use the
619
+ Work (including but not limited to damages for loss of goodwill,
620
+ work stoppage, computer failure or malfunction, or any and all
621
+ other commercial damages or losses), even if such Contributor
622
+ has been advised of the possibility of such damages.
623
+
624
+ 9. Accepting Warranty or Additional Liability. While redistributing
625
+ the Work or Derivative Works thereof, You may choose to offer,
626
+ and charge a fee for, acceptance of support, warranty, indemnity,
627
+ or other liability obligations and/or rights consistent with this
628
+ License. However, in accepting such obligations, You may act only
629
+ on Your own behalf and on Your sole responsibility, not on behalf
630
+ of any other Contributor, and only if You agree to indemnify,
631
+ defend, and hold each Contributor harmless for any liability
632
+ incurred by, or claims asserted against, such Contributor by reason
633
+ of your accepting any such warranty or additional liability.
634
+
635
+ END OF TERMS AND CONDITIONS
636
+
637
+ APPENDIX: How to apply the Apache License to your work.
638
+
639
+ To apply the Apache License to your work, attach the following
640
+ boilerplate notice, with the fields enclosed by brackets "[]"
641
+ replaced with your own identifying information. (Don't include
642
+ the brackets!) The text should be enclosed in the appropriate
643
+ comment syntax for the file format. We also recommend that a
644
+ file or class name and description of purpose be included on the
645
+ same "printed page" as the copyright notice for easier
646
+ identification within third-party archives.
647
+
648
+ Copyright 2020 - present, Facebook, Inc
649
+
650
+ Licensed under the Apache License, Version 2.0 (the "License");
651
+ you may not use this file except in compliance with the License.
652
+ You may obtain a copy of the License at
653
+
654
+ http://www.apache.org/licenses/LICENSE-2.0
655
+
656
+ Unless required by applicable law or agreed to in writing, software
657
+ distributed under the License is distributed on an "AS IS" BASIS,
658
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
659
+ See the License for the specific language governing permissions and
660
+ limitations under the License.
model_args.py ADDED
@@ -0,0 +1,28 @@
1
+ # Explanation of generator_args is in sam/segment_anything/automatic_mask_generator.py: SamAutomaticMaskGenerator
2
+ sam_args = {
3
+ 'sam_checkpoint': "ckpt/sam_vit_b_01ec64.pth",
4
+ 'model_type': "vit_b",
5
+ 'generator_args':{
6
+ 'points_per_side': 16,
7
+ 'pred_iou_thresh': 0.8,
8
+ 'stability_score_thresh': 0.9,
9
+ 'crop_n_layers': 1,
10
+ 'crop_n_points_downscale_factor': 2,
11
+ 'min_mask_region_area': 200,
12
+ },
13
+ 'gpu_id': 0,
14
+ }
15
+ aot_args = {
16
+ 'phase': 'PRE_YTB_DAV',
17
+ 'model': 'r50_deaotl',
18
+ 'model_path': 'ckpt/R50_DeAOTL_PRE_YTB_DAV.pth',
19
+ 'long_term_mem_gap': 9999,
20
+ 'max_len_long_term': 9999,
21
+ 'gpu_id': 0,
22
+ }
23
+ segtracker_args = {
24
+ 'sam_gap': 10, # interval (in frames) at which SAM is rerun to segment newly appearing objects
25
+ 'min_area': 200, # minimum mask area (in pixels) required to add a mask as a new object
26
+ 'max_obj_num': 255, # maximum number of objects to track in a video
27
+ 'min_new_obj_iou': 0.8, # a mask is added as a new object only if more than 80% of its area lies on untracked (background) regions
28
+ }
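
For orientation, a minimal usage sketch of how these three dictionaries might be wired together. The `SegTracker(segtracker_args, sam_args, aot_args)` constructor order, the `from SegTracker import SegTracker` import, and the directly assigned `first_frame_mask` attribute are assumptions based on how `seg_track_anything.py` below consumes the tracker; the checkpoints must already be downloaded into `ckpt/`:

```python
# Hedged usage sketch, not the repo's official entry point.
import numpy as np
from model_args import sam_args, aot_args, segtracker_args
from SegTracker import SegTracker  # constructor order assumed

tracker = SegTracker(segtracker_args, sam_args, aot_args)
tracker.restart_tracker()

# Segment everything in the first RGB frame and register it as the reference.
first_frame = np.zeros((480, 854, 3), dtype=np.uint8)  # placeholder frame (H, W, 3)
pred_mask = tracker.seg(first_frame)                   # SAM proposals -> per-object id mask
tracker.add_reference(first_frame, pred_mask)          # seed the AOT tracker
tracker.first_frame_mask = pred_mask                   # read back for frame 0 in seg_track_anything.py
```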
seg_track_anything.py ADDED
@@ -0,0 +1,300 @@
1
+ import os
2
+ import cv2
3
+ from model_args import segtracker_args,sam_args,aot_args
4
+ from PIL import Image
5
+ from aot_tracker import _palette
6
+ import numpy as np
7
+ import torch
8
+ import gc
9
+ import imageio
10
+ from scipy.ndimage import binary_dilation
11
+
12
+ def save_prediction(pred_mask,output_dir,file_name):
13
+ save_mask = Image.fromarray(pred_mask.astype(np.uint8))
14
+ save_mask = save_mask.convert(mode='P')
15
+ save_mask.putpalette(_palette)
16
+ save_mask.save(os.path.join(output_dir,file_name))
17
+
18
+ def colorize_mask(pred_mask):
19
+ save_mask = Image.fromarray(pred_mask.astype(np.uint8))
20
+ save_mask = save_mask.convert(mode='P')
21
+ save_mask.putpalette(_palette)
22
+ save_mask = save_mask.convert(mode='RGB')
23
+ return np.array(save_mask)
24
+
25
+ def draw_mask(img, mask, alpha=0.5, id_countour=False):
26
+ # the overlay is drawn directly on the input frame (no copy is made)
27
+ img_mask = img
28
+ if id_countour:
29
+ # very slow ~ 1s per image
30
+ obj_ids = np.unique(mask)
31
+ obj_ids = obj_ids[obj_ids!=0]
32
+
33
+ for id in obj_ids:
34
+ # Overlay color on binary mask
35
+ if id <= 255:
36
+ color = _palette[id*3:id*3+3]
37
+ else:
38
+ color = [0,0,0]
39
+ foreground = img * (1-alpha) + np.ones_like(img) * alpha * np.array(color)
40
+ binary_mask = (mask == id)
41
+
42
+ # Compose image
43
+ img_mask[binary_mask] = foreground[binary_mask]
44
+
45
+ countours = binary_dilation(binary_mask,iterations=1) ^ binary_mask
46
+ img_mask[countours, :] = 0
47
+ else:
48
+ binary_mask = (mask!=0)
49
+ countours = binary_dilation(binary_mask,iterations=1) ^ binary_mask
50
+ foreground = img*(1-alpha)+colorize_mask(mask)*alpha
51
+ img_mask[binary_mask] = foreground[binary_mask]
52
+ img_mask[countours,:] = 0
53
+
54
+ return img_mask.astype(img.dtype)
55
+
56
+ def create_dir(dir_path):
57
+ if os.path.isdir(dir_path):
58
+ os.system(f"rm -r {dir_path}")
59
+
60
+ os.makedirs(dir_path)
61
+
62
+ aot_model2ckpt = {
63
+ "deaotb": "./ckpt/DeAOTB_PRE_YTB_DAV.pth",
64
+ "deaotl": "./ckpt/DeAOTL_PRE_YTB_DAV",
65
+ "r50_deaotl": "./ckpt/R50_DeAOTL_PRE_YTB_DAV.pth",
66
+ }
67
+
68
+
69
+ def tracking_objects_in_video(SegTracker, input_video, input_img_seq, fps):
70
+
71
+ if input_video is not None:
72
+ video_name = os.path.basename(input_video).split('.')[0]
73
+ elif input_img_seq is not None:
74
+ file_name = input_img_seq.name.split('/')[-1].split('.')[0]
75
+ file_path = f'./assets/{file_name}'
76
+ imgs_path = sorted([os.path.join(file_path, img_name) for img_name in os.listdir(file_path)])
77
+ video_name = file_name
78
+ else:
79
+ return None, None
80
+
81
+ # create dir to save result
82
+ tracking_result_dir = os.path.join(os.path.dirname(__file__), "tracking_results", video_name)
83
+ create_dir(tracking_result_dir)
84
+
85
+ io_args = {
86
+ 'tracking_result_dir': tracking_result_dir,
87
+ 'output_mask_dir': f'{tracking_result_dir}/{video_name}_masks',
88
+ 'output_masked_frame_dir': f'{tracking_result_dir}/{video_name}_masked_frames',
89
+ 'output_video': f'{tracking_result_dir}/{video_name}_seg.mp4', # output video is always written as .mp4 (mp4v)
90
+ 'output_gif': f'{tracking_result_dir}/{video_name}_seg.gif',
91
+ }
92
+
93
+ if input_video is not None:
94
+ return video_type_input_tracking(SegTracker, input_video, io_args, video_name)
95
+ elif input_img_seq is not None:
96
+ return img_seq_type_input_tracking(SegTracker, io_args, video_name, imgs_path, fps)
97
+
98
+
99
+ def video_type_input_tracking(SegTracker, input_video, io_args, video_name):
100
+
101
+ # source video to segment
102
+ cap = cv2.VideoCapture(input_video)
103
+ fps = cap.get(cv2.CAP_PROP_FPS)
104
+
105
+ # create dir to save predicted mask and masked frame
106
+ output_mask_dir = io_args['output_mask_dir']
107
+ create_dir(io_args['output_mask_dir'])
108
+ create_dir(io_args['output_masked_frame_dir'])
109
+
110
+ pred_list = []
111
+ masked_pred_list = []
112
+
113
+ torch.cuda.empty_cache()
114
+ gc.collect()
115
+ sam_gap = SegTracker.sam_gap
116
+ frame_idx = 0
117
+
118
+ with torch.cuda.amp.autocast():
119
+ while cap.isOpened():
120
+ ret, frame = cap.read()
121
+ if not ret:
122
+ break
123
+ frame = cv2.cvtColor(frame,cv2.COLOR_BGR2RGB)
124
+
125
+ if frame_idx == 0:
126
+ pred_mask = SegTracker.first_frame_mask
127
+ torch.cuda.empty_cache()
128
+ gc.collect()
129
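+ # every sam_gap frames: re-run SAM on the frame, track existing objects, and register newly detected objects as references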
+ elif (frame_idx % sam_gap) == 0:
130
+ seg_mask = SegTracker.seg(frame)
131
+ torch.cuda.empty_cache()
132
+ gc.collect()
133
+ track_mask = SegTracker.track(frame)
134
+ # find new objects, and update tracker with new objects
135
+ new_obj_mask = SegTracker.find_new_objs(track_mask,seg_mask)
136
+ save_prediction(new_obj_mask, output_mask_dir, str(frame_idx).zfill(5) + '_new.png')
137
+ pred_mask = track_mask + new_obj_mask
138
+ # segtracker.restart_tracker()
139
+ SegTracker.add_reference(frame, pred_mask)
140
+ else:
141
+ pred_mask = SegTracker.track(frame,update_memory=True)
142
+ torch.cuda.empty_cache()
143
+ gc.collect()
144
+
145
+ save_prediction(pred_mask, output_mask_dir, str(frame_idx).zfill(5) + '.png')
146
+ pred_list.append(pred_mask)
147
+
148
+ print("processed frame {}, obj_num {}".format(frame_idx, SegTracker.get_obj_num()),end='\r')
149
+ frame_idx += 1
150
+ cap.release()
151
+ print('\nfinished')
152
+
153
+ ##################
154
+ # Visualization
155
+ ##################
156
+
157
+ # draw pred mask on frame and save as a video
158
+ cap = cv2.VideoCapture(input_video)
159
+ fps = cap.get(cv2.CAP_PROP_FPS)
160
+ width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
161
+ height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
162
+ num_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
163
+
164
+ fourcc = cv2.VideoWriter_fourcc(*"mp4v")
165
+ # if input_video[-3:]=='mp4':
166
+ # fourcc = cv2.VideoWriter_fourcc(*"mp4v")
167
+ # elif input_video[-3:] == 'avi':
168
+ # fourcc = cv2.VideoWriter_fourcc(*"MJPG")
169
+ # # fourcc = cv2.VideoWriter_fourcc(*"XVID")
170
+ # else:
171
+ # fourcc = int(cap.get(cv2.CAP_PROP_FOURCC))
172
+ out = cv2.VideoWriter(io_args['output_video'], fourcc, fps, (width, height))
173
+
174
+ frame_idx = 0
175
+ while cap.isOpened():
176
+ ret, frame = cap.read()
177
+ if not ret:
178
+ break
179
+
180
+ frame = cv2.cvtColor(frame,cv2.COLOR_BGR2RGB)
181
+ pred_mask = pred_list[frame_idx]
182
+ masked_frame = draw_mask(frame, pred_mask)
183
+ cv2.imwrite(f"{io_args['output_masked_frame_dir']}/{str(frame_idx).zfill(5)}.png", masked_frame[:, :, ::-1])
184
+
185
+ masked_pred_list.append(masked_frame)
186
+ masked_frame = cv2.cvtColor(masked_frame,cv2.COLOR_RGB2BGR)
187
+ out.write(masked_frame)
188
+ print('frame {} written'.format(frame_idx),end='\r')
189
+ frame_idx += 1
190
+ out.release()
191
+ cap.release()
192
+ print("\n{} saved".format(io_args['output_video']))
193
+ print('\nfinished')
194
+
195
+ # save colorized masks as a gif
196
+ imageio.mimsave(io_args['output_gif'], masked_pred_list, fps=fps)
197
+ print("{} saved".format(io_args['output_gif']))
198
+
199
+ # zip predicted mask
200
+ os.system(f"zip -r {io_args['tracking_result_dir']}/{video_name}_pred_mask.zip {io_args['output_mask_dir']}")
201
+
202
+ # manually release GPU memory to avoid CUDA out-of-memory errors on later runs
203
+ del SegTracker
204
+ torch.cuda.empty_cache()
205
+ gc.collect()
206
+
207
+ return io_args['output_video'], f"{io_args['tracking_result_dir']}/{video_name}_pred_mask.zip"
208
+
209
+
210
+ def img_seq_type_input_tracking(SegTracker, io_args, video_name, imgs_path, fps):
211
+
212
+ # create dir to save predicted mask and masked frame
213
+ output_mask_dir = io_args['output_mask_dir']
214
+ create_dir(io_args['output_mask_dir'])
215
+ create_dir(io_args['output_masked_frame_dir'])
216
+
217
+ pred_list = []
218
+ masked_pred_list = []
219
+
220
+ torch.cuda.empty_cache()
221
+ gc.collect()
222
+ sam_gap = SegTracker.sam_gap
223
+ frame_idx = 0
224
+
225
+ with torch.cuda.amp.autocast():
226
+ for img_path in imgs_path:
227
+ frame_name = os.path.basename(img_path).split('.')[0]
228
+ frame = cv2.imread(img_path)
229
+ frame = cv2.cvtColor(frame,cv2.COLOR_BGR2RGB)
230
+
231
+ if frame_idx == 0:
232
+ pred_mask = SegTracker.first_frame_mask
233
+ torch.cuda.empty_cache()
234
+ gc.collect()
235
+ elif (frame_idx % sam_gap) == 0:
236
+ seg_mask = SegTracker.seg(frame)
237
+ torch.cuda.empty_cache()
238
+ gc.collect()
239
+ track_mask = SegTracker.track(frame)
240
+ # find new objects, and update tracker with new objects
241
+ new_obj_mask = SegTracker.find_new_objs(track_mask,seg_mask)
242
+ save_prediction(new_obj_mask, output_mask_dir, f'{frame_name}_new.png')
243
+ pred_mask = track_mask + new_obj_mask
244
+ # segtracker.restart_tracker()
245
+ SegTracker.add_reference(frame, pred_mask)
246
+ else:
247
+ pred_mask = SegTracker.track(frame,update_memory=True)
248
+ torch.cuda.empty_cache()
249
+ gc.collect()
250
+
251
+ save_prediction(pred_mask, output_mask_dir, f'{frame_name}.png')
252
+ pred_list.append(pred_mask)
253
+
254
+ print("processed frame {}, obj_num {}".format(frame_idx, SegTracker.get_obj_num()),end='\r')
255
+ frame_idx += 1
256
+ print('\nfinished')
257
+
258
+ ##################
259
+ # Visualization
260
+ ##################
261
+
262
+ # draw pred mask on frame and save as a video
263
+ height, width = pred_list[0].shape
264
+ fourcc = cv2.VideoWriter_fourcc(*"mp4v")
265
+
266
+ out = cv2.VideoWriter(io_args['output_video'], fourcc, fps, (width, height))
267
+
268
+ frame_idx = 0
269
+ for img_path in imgs_path:
270
+ frame_name = os.path.basename(img_path).split('.')[0]
271
+ frame = cv2.imread(img_path)
272
+ frame = cv2.cvtColor(frame,cv2.COLOR_BGR2RGB)
273
+
274
+ pred_mask = pred_list[frame_idx]
275
+ masked_frame = draw_mask(frame, pred_mask)
276
+ masked_pred_list.append(masked_frame)
277
+ cv2.imwrite(f"{io_args['output_masked_frame_dir']}/{frame_name}.png", masked_frame[:, :, ::-1])
278
+
279
+ masked_frame = cv2.cvtColor(masked_frame,cv2.COLOR_RGB2BGR)
280
+ out.write(masked_frame)
281
+ print('frame {} written'.format(frame_name),end='\r')
282
+ frame_idx += 1
283
+ out.release()
284
+ print("\n{} saved".format(io_args['output_video']))
285
+ print('\nfinished')
286
+
287
+ # save colorized masks as a gif
288
+ imageio.mimsave(io_args['output_gif'], masked_pred_list, fps=fps)
289
+ print("{} saved".format(io_args['output_gif']))
290
+
291
+ # zip predicted mask
292
+ os.system(f"zip -r {io_args['tracking_result_dir']}/{video_name}_pred_mask.zip {io_args['output_mask_dir']}")
293
+
294
+ # manually release GPU memory to avoid CUDA out-of-memory errors on later runs
295
+ del SegTracker
296
+ torch.cuda.empty_cache()
297
+ gc.collect()
298
+
299
+
300
+ return io_args['output_video'], f"{io_args['tracking_result_dir']}/{video_name}_pred_mask.zip"
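
Taken together, a hedged sketch of how the entry point above is meant to be called once a tracker already holds a first-frame mask; the video path and fps below are illustrative, not taken from the repo:

```python
# Illustrative call; `tracker` is assumed to be a SegTracker prepared with a
# first-frame mask (frame 0 reads SegTracker.first_frame_mask).
from seg_track_anything import tracking_objects_in_video

output_video, pred_mask_zip = tracking_objects_in_video(
    SegTracker=tracker,
    input_video="assets/cars.mp4",   # hypothetical demo clip
    input_img_seq=None,              # or an uploaded image-sequence archive instead of a video
    fps=30,                          # only used for image-sequence input; videos reuse their own fps
)
print(output_video)    # .../tracking_results/cars/cars_seg.mp4
print(pred_mask_zip)   # .../tracking_results/cars/cars_pred_mask.zip
```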