Ayaan Sharif

Ayaan-Sharif

AI & ML interests

NLP, LLM, TEXT, Languages

Recent Activity

liked a model 11 days ago
MiniMaxAI/MiniMax-VL-01
liked a dataset 14 days ago
DAMO-NLP-SG/multimodal_textbook

Organizations

Hugging Face Discord Community

Ayaan-Sharif's activity

replied to sanchit-gandhi's post 24 days ago

What if we segment the audio first and then transcribe? It's some extra compute to throw in, but IMO it would give a better result!
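
The segment-then-transcribe idea could be sketched roughly as below. This is only an illustration under assumptions: it uses plain fixed-length windows rather than any smarter (e.g. silence-based) segmentation, and `transcribe_segment` is a hypothetical placeholder for whatever speech-to-text call is used, not a real library API.

```python
from typing import Callable, List, Sequence


def segment_audio(samples: Sequence[float], sample_rate: int,
                  window_s: float = 30.0) -> List[Sequence[float]]:
    """Split samples into consecutive fixed-length windows.

    The last window may be shorter than the others.
    """
    if sample_rate <= 0 or window_s <= 0:
        raise ValueError("sample_rate and window_s must be positive")
    step = max(1, int(sample_rate * window_s))  # samples per window
    return [samples[i:i + step] for i in range(0, len(samples), step)]


def transcribe_long(samples: Sequence[float], sample_rate: int,
                    transcribe_segment: Callable[[Sequence[float]], str],
                    window_s: float = 30.0) -> str:
    """Transcribe each window separately and join the non-empty pieces.

    `transcribe_segment` is a hypothetical callable supplied by the caller.
    """
    parts = [transcribe_segment(seg)
             for seg in segment_audio(samples, sample_rate, window_s)]
    return " ".join(p.strip() for p in parts if p.strip())
```

The extra compute mentioned above shows up as one transcription call per window instead of a single call over the whole recording.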

reacted to vladbogo's post with 👍 about 1 month ago
Panda-70M is a new large-scale video dataset comprising 70 million high-quality video clips, each paired with textual captions, designed to be used as pre-training for video understanding tasks.

Key Points:
* Automatic Caption Generation: Utilizes an automatic pipeline with multiple cross-modality teacher models to generate captions for video clips.
* Fine-tuned Caption Selection: Employs a fine-tuned retrieval model to select the most appropriate caption from multiple candidates for each video clip.
* Improved Performance: Pre-training on Panda-70M shows significant performance gains in video captioning, text-video retrieval, and text-driven video generation.
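
The caption-selection step described in the key points can be illustrated with a toy sketch. It is not the authors' implementation: the `score` callable stands in for the fine-tuned retrieval model, and `toy_score` with its hand-written tag table is a made-up stand-in used only so the example runs.

```python
from typing import Callable, Dict, Sequence, Set


def select_caption(video_id: str, candidates: Sequence[str],
                   score: Callable[[str, str], float]) -> str:
    """Return the candidate caption with the highest score for this video.

    `score` is a hypothetical stand-in for a video-text retrieval model.
    """
    if not candidates:
        raise ValueError("need at least one candidate caption")
    return max(candidates, key=lambda caption: score(video_id, caption))


# Hypothetical per-video tags, used only by the toy scorer below.
TOY_TAGS: Dict[str, Set[str]] = {"v1": {"dog", "beach", "running"}}


def toy_score(video_id: str, caption: str) -> float:
    """Count how many caption words match the video's tags (illustrative only)."""
    words = set(caption.lower().split())
    return float(len(words & TOY_TAGS.get(video_id, set())))
```

With several candidates from different teacher models, `select_caption("v1", candidates, toy_score)` picks the caption sharing the most words with the toy tags; a real pipeline would swap in a learned similarity score.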

Paper: Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers (2402.19479)
Project page: https://snap-research.github.io/Panda-70M/
Code: https://github.com/snap-research/Panda-70M

Congrats to the authors @tschen, @aliaksandr-siarohin et al. for their work!
New activity in tencent/HunyuanVideo about 2 months ago

Multi-GPU setup when?

2
#5 opened about 2 months ago by
Ayaan-Sharif