Stanford AI

university

https://www.ai.stanford.edu

Activity Feed Request to join this org

AI & ML interests

None defined yet.

Recent Activity

orrzohar authored a paper 17 days ago

Learnings from Scaling Visual Tokenizers for Reconstruction and Generation

nicholswang authored a paper 18 days ago

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature

Asap7772 authored a paper 25 days ago

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Though

View all activity

Stanford's activity

nicholswang

authored a paper 10 days ago

Temporal Preference Optimization for Long-Form Video Understanding

Paper • 2501.13919 • Published 11 days ago • 21

nicholswang

authored a paper 18 days ago

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature

Paper • 2501.07171 • Published 21 days ago • 49

nicholswang

authored 10 papers about 2 months ago

Action Sensitivity Learning for Temporal Action Localization

Paper • 2305.15701 • Published May 25, 2023

Whitening-based Contrastive Learning of Sentence Embeddings

Paper • 2305.17746 • Published May 28, 2023

Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models

Paper • 2305.18010 • Published May 29, 2023

Describing Differences in Image Sets with Natural Language

Paper • 2312.02974 • Published Dec 5, 2023 • 14

Clustering based Point Cloud Representation Learning for 3D Analysis

Paper • 2307.14605 • Published Jul 27, 2023

JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery

Paper • 2307.16377 • Published Jul 31, 2023

Taylor658

posted an update 2 months ago

Post

553

🌐 The Stanford Institute for Human-Centered AI (https://aiindex.stanford.edu/vibrancy/) has released its 2024 Global AI Vibrancy Tool, a way to explore and compare AI progress across 36 countries.

📊 It measures progress across the 8 broad pillars of R&D, Responsible AI, Economy, Education, Diversity, Policy and Governance, Public Opinion and Infrastructure. (Each of these pillars have a number of Sub Indices)

📈 As a whole it is not surprising that the USA was at the top in terms of overall score as of 2023 (AI investment activity is a large part of the economic pillar for example and that is a large part of the overall USA ranking) but drilling in to more STRATEGIC Macro pillars like Education, Infrastructure or R&D reveal interesting growth patterns in Asia (particularly China) and Western Europe that I suspect the 2024 metrics will bear out.

🤖 Hopefully the 2024 Global Vibrancy ranking will break out AI and ML verticals like Computer Vision or NLP and or the AI Agent space as that may also from a global macro level give indications of what is to come globally for AI in 2025.

Taylor658

posted an update 2 months ago

Post

806

🤖💻 Function Calling is a key component of Agent workflows. To call functions, an LLM needs a way to interact with other systems and run code. This usually means connecting it to a runtime environment that can handle function calls, data, and security.

Per the Berkeley Function-Calling Leaderboard there are only 2 fully open source models (The other 2 in the top 20 that are not closed source have cc-by-nc-4.0 licenses) out of the top 20 models that currently have function calling built in as of 17 Nov 2024.
https://gorilla.cs.berkeley.edu/leaderboard.html

The 2 Open Source Models out of the top 20 that currently support function calling are:

meetkai/functionary-medium-v3.1
Team-ACE/ToolACE-8B

This is a both a huge disadvantage AND an opportunity for the Open Source community as Enterprises, Small Business, Government Agencies etc. quickly adopt Agents and Agent workflows over the next few months. Open Source will have a lot of catching up to do as Enterprises will be hesitant to switch from the closed source models that they may initially build their Agent workflows on in the next few months to an open source alternative later.

Hopefully more open source models will support function calling in the near future.

leamhadzic

authored a paper 2 months ago

HourVideo: 1-Hour Video-Language Understanding

Paper • 2411.04998 • Published Nov 7, 2024 • 1

Taylor658

posted an update 3 months ago

Post

2277

The Mystery Bot 🕵️‍♂️ saga I posted about from earlier this week has been solved...🤗

Cohere for AI has just announced its open source Aya Expanse multilingual model. The Initial release supports 23 languages with more on the way soon.🌌 🌍

You can also try Aya Expanse via SMS on your mobile phone using the global WhatsApp number or one of the initial set of country specific numbers listed below.⬇️

🌍WhatsApp - +14313028498
Germany - (+49) 1771786365
USA – +18332746219
United Kingdom — (+44) 7418373332
Canada – (+1) 2044107115
Netherlands – (+31) 97006520757
Brazil — (+55) 11950110169
Portugal – (+351) 923249773
Italy – (+39) 3399950813
Poland - (+48) 459050281

1 reply

Taylor658

posted an update 4 months ago

Post

2520

Spent the weekend testing out some prompts with 🕵️‍♂️Mystery Bot🕵️‍♂️ on my mobile... exciting things are coming soon for the following languages:

🌐Arabic, Chinese, Czech, Dutch, English French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian, and Vietnamese!🌐

Taylor658

posted an update 5 months ago

Post

1391

📢 2024 CVPR Videos Are Now Available! 🎥

CVPR conference keynotes, panels, posters, workshops, and other content are now available.

⬇️
https://cvpr.thecvf.com/Conferences/2024/Videos

Taylor658

posted an update 5 months ago

Post

2351

💡Andrew Ng recently gave a strong defense of Open Source AI models and the need to slow down legislative efforts in the US and the EU to restrict innovation in Open Source AI at Stanford GSB.

🎥See video below
https://youtu.be/yzUdmwlh1sQ?si=bZc690p8iubolXm_

4 replies

Taylor658

posted an update 7 months ago

Post

810

Researchers from Auburn University and the University of Alberta have explored the limitations of Vision Language Models (VLMs) in their recently published paper titled "Vision language models are blind." ( Vision language models are blind (2407.06581))

Key Findings:🔍
VLMs, including GPT-4o, Gemini-1.5 Pro, Claude-3 Sonnet, and Claude-3.5 Sonnet, struggle with basic visual tasks.
Tasks such as identifying where lines intersect or counting basic shapes are challenging for these models.
The authors noted, "The shockingly poor performance of four state-of-the-art VLMs suggests their vision is, at best, like of a person with myopia seeing fine details as blurry, and at worst, like an intelligent person that is blind making educated guesses"(Vision Language Models Are Blind; 2024).

Human-like Myopia? 👓
VLMs may have a blind spot similar to human myopia.
This limitation makes it difficult for VLMs to perceive details.
Suggests a potential parallel between human and machine vision limitations.

Technical Details: 🔧
The researchers created a new benchmark called BlindTest.
BlindTest consists of simple visual tasks to evaluate VLMs low-level vision capabilities.
Four VLMs were assessed using BlindTest.
Many shortcomings were revealed in the models ability to process basic visual information.

Learn More: 🖼️
For a deeper dive into this research, check out the project page: https://vlmsareblind.github.io/

AI & ML interests

Recent Activity

Team members 365

Stanford's activity