Webis Group

university

https://webis.de/

webis_de

webis-de


AI & ML interests

Information Retrieval, Natural Language Processing, Text Mining, Argumentation

Recent Activity

verAPPelt updated a dataset about 1 month ago

webis/tip-of-my-tongue-known-item-search

verAPPelt updated a dataset about 1 month ago

webis/tip-of-my-tongue-known-item-search-triplets

christopher new activity about 2 months ago

webis/tip-of-my-tongue-known-item-search-triplets:[bot] Conversion to Parquet


webis's activity

cschroeder

posted an update about 3 hours ago

Post

113

🔥 𝐅𝐢𝐧𝐚𝐥 𝐂𝐚𝐥𝐥 𝐚𝐧𝐝 𝐃𝐞𝐚𝐝𝐥𝐢𝐧𝐞 𝐄𝐱𝐭𝐞𝐧𝐬𝐢𝐨𝐧: Survey on Data Annotation and Active Learning

Short summary: We need your support for a web survey in which we investigate how recent advancements in natural language processing, particularly LLMs, have influenced the need for labeled data in supervised machine learning — with a focus on, but not limited to, active learning. See the original post for details.

➡️ Extended Deadline: January 26th, 2025.
Please consider participating or sharing our survey! (If you have any experience with supervised learning in natural language processing, you are eligible to participate in our survey.)

Survey: https://bildungsportal.sachsen.de/umfragen/limesurvey/index.php/538271

cschroeder

posted an update 15 days ago

Post

384

Here’s just one of the many exciting questions from our survey. If these topics resonate with you and you have experience working on supervised learning with text (i.e., supervised learning in Natural Language Processing), we warmly invite you to participate!

Survey: https://bildungsportal.sachsen.de/umfragen/limesurvey/index.php/538271
Estimated time required: 5–15 minutes
Deadline for participation: January 12, 2025

—

❤️ We’re seeking responses from across the globe! If you know 1–3 people who might qualify for this survey—particularly those in different regions—please share it with them. We’d really appreciate it!

#NLProc #ActiveLearning #ML

2 replies

cschroeder

posted an update 26 days ago

Post

360

💡𝗟𝗼𝗼𝗸𝗶𝗻𝗴 𝗳𝗼𝗿 𝘀𝘂𝗽𝗽𝗼𝗿𝘁: 𝗛𝗮𝘃𝗲 𝘆𝗼𝘂 𝗲𝘃𝗲𝗿 𝗵𝗮𝗱 𝘁𝗼 𝗼𝘃𝗲𝗿𝗰𝗼𝗺𝗲 𝗮 𝗹𝗮𝗰𝗸 𝗼𝗳 𝗹𝗮𝗯𝗲𝗹𝗲𝗱 𝗱𝗮𝘁𝗮 𝘁𝗼 𝗱𝗲𝗮𝗹 𝘄𝗶𝘁𝗵 𝗮𝗻 𝗡𝗟𝗣 𝘁𝗮𝘀𝗸?

Are you working on Natural Language Processing tasks, and have you faced a lack of labeled data before? 𝗪𝗲 𝗮𝗿𝗲 𝗰𝘂𝗿𝗿𝗲𝗻𝘁𝗹𝘆 𝗰𝗼𝗻𝗱𝘂𝗰𝘁𝗶𝗻𝗴 𝗮 𝘀𝘂𝗿𝘃𝗲𝘆 to explore the strategies used to address this bottleneck, especially in the context of recent advancements, including but not limited to large language models.

The survey is non-commercial and conducted solely for academic research purposes. The results will contribute to an open-access publication that also benefits the community.

👉 With only 5–15 minutes of your time, you would greatly help to investigate which strategies are used by the #NLP community to overcome a lack of labeled data.

❤️How you can help even more: If you know others working on supervised learning and NLP, please share this survey with them—we’d really appreciate it!

Survey: https://bildungsportal.sachsen.de/umfragen/limesurvey/index.php/538271
Estimated time required: 5–15 minutes
Deadline for participation: January 12, 2025

#NLP #ML

christopher

posted an update about 1 month ago

Post

1601

The folks at Foursquare released a dataset of 104.5 million places of interest (foursquare/fsq-os-places) and here's all of them on a plot.

3 replies

christopher

posted an update about 1 month ago

Post

2352

The Lichess database of games, puzzles, and engine evaluations is now on the Hub: https://huggingface.co/Lichess

Billions of chess data points to download, query, and stream, and we're excited to see what you'll build with them! ♟️ 🤗

- Lichess/positions-datasets-66f50837db5cd3287d60d489
- Lichess/games-datasets-66f508df78f4b43e1bb2d353

verAPPelt

updated 2 datasets about 1 month ago

webis/tip-of-my-tongue-known-item-search

Viewer • Updated Nov 30, 2024 • 1.28M • 42 • 1

webis/tip-of-my-tongue-known-item-search-triplets

Viewer • Updated Nov 30, 2024 • 32.6k • 57 • 2

cschroeder

posted an update about 2 months ago

Post

1089

🐣 New release: small-text v2.0.0.dev1

With small language models on the rise, the new version of small-text has been long overdue! Despite the generative AI hype, many real-world tasks still rely on supervised learning, which in turn depends on labeled data.

Highlights:
- Four new query strategies: Try even more combinations than before.
- Vector indices integration: HNSW and KNN indices are now available via a unified interface and can easily be used within your code.
- Simplified installation: We dropped the torchtext dependency and cleaned up a lot of interfaces.
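
For readers new to the idea, here is a minimal sketch of what a query strategy does inside a pool-based active learning loop. It deliberately does not use the small-text API: the toy 1D centroid classifier, the `oracle` annotator, and all function names are assumptions made for illustration only. The strategy shown is plain least-confidence sampling, not one of the new strategies from the release.

```python
import math


def predict_proba(x, centroids):
    """Toy classifier: softmax over negative distances to the class centroids."""
    scores = [-abs(x - c) for c in centroids]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_centroids(labeled, pool, num_classes=2):
    """Average the labeled points of each class (assumes every class has a label)."""
    centroids = []
    for k in range(num_classes):
        members = [pool[i] for i, y in labeled.items() if y == k]
        centroids.append(sum(members) / len(members))
    return centroids


def least_confidence(pool, unlabeled, centroids, n):
    """Query strategy: pick the n instances with the lowest top-class probability."""
    return sorted(unlabeled, key=lambda i: max(predict_proba(pool[i], centroids)))[:n]


def oracle(x):
    """Hypothetical annotator for this toy task: class 1 if x > 0, else class 0."""
    return int(x > 0)


def active_learning_loop(pool, initial_idx, rounds=5, batch_size=2):
    """Alternate between training on the labeled set and querying new labels."""
    labeled = {i: oracle(pool[i]) for i in initial_idx}
    for _ in range(rounds):
        centroids = fit_centroids(labeled, pool)
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        for i in least_confidence(pool, unlabeled, centroids, batch_size):
            labeled[i] = oracle(pool[i])
    return labeled, fit_centroids(labeled, pool)


if __name__ == "__main__":
    pool = [x / 10 for x in range(-20, 21)]  # 41 points from -2.0 to 2.0
    labeled, centroids = active_learning_loop(pool, initial_idx=[0, 40])
    print(len(labeled), "labels collected; centroids:", centroids)
```

The point of the sketch is that the annotator is only asked about instances near the current decision boundary, which is where an extra label is most likely to change the model.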

GitHub: https://github.com/webis-de/small-text

👂 Try it out for yourself! We are eager to hear your feedback.
🔧 Share your small-text applications and experiments in the newly added showcase section.
🌟 Support the project by leaving a star on the repo!

#activelearning #nlproc #machinelearning

christopher

in webis/tip-of-my-tongue-known-item-search-triplets about 2 months ago

[bot] Conversion to Parquet

#2 opened about 2 months ago by

parquet-converter

Librarian Bot: Add language metadata for dataset

#1 opened about 2 months ago by

librarian-bot

mam10eks

updated a dataset about 2 months ago

webis/tip-of-my-tongue-known-item-search-triplets

Viewer • Updated Nov 30, 2024 • 32.6k • 57 • 2

christopher

in webis/tip-of-my-tongue-known-item-search 2 months ago

[bot] Conversion to Parquet

#1 opened 2 months ago by

parquet-converter

cschroeder

posted an update 2 months ago

Post

697

#EMNLP2024 is happening soon! Unfortunately, I will not be on site, but I will present our poster virtually on Wednesday, Nov 13 (7:45 EST / 13:45 CET) in Virtual Poster Session 2.

In this work, we leverage self-training in an active learning loop in order to train small language models with even less data. Hope to see you there!
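
As a rough illustration of the general idea (not the method from the poster), the sketch below combines the two ingredients in one loop: the annotator labels the instances the model is least sure about, while the model pseudo-labels the instances it is most sure about. The toy 1D classifier, the `min_confidence` threshold, and every function name here are assumptions made up for this example.

```python
import math


def fit_boundary(examples):
    """Toy 1D classifier: the decision boundary is halfway between the class means.

    examples: list of (x, y) pairs with y in {0, 1}; both classes must occur.
    """
    xs0 = [x for x, y in examples if y == 0]
    xs1 = [x for x, y in examples if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2


def prob_positive(x, boundary):
    """Logistic confidence that x belongs to class 1."""
    return 1.0 / (1.0 + math.exp(-(x - boundary)))


def active_learning_with_self_training(pool, seed_labels, annotate,
                                       rounds=3, n_query=1, min_confidence=0.8):
    """Return (human labels, pseudo-labels, final boundary) after the given rounds."""
    human = dict(seed_labels)  # index -> label provided by the annotator
    pseudo = {}                # index -> label assigned by the model itself
    for _ in range(rounds):
        # Train on human labels plus the pseudo-labels from the previous round.
        merged = {**pseudo, **human}
        boundary = fit_boundary([(pool[i], y) for i, y in merged.items()])
        unlabeled = [i for i in range(len(pool)) if i not in human]
        # Active learning step: ask the annotator about the least certain instances.
        ranked = sorted(unlabeled, key=lambda i: abs(prob_positive(pool[i], boundary) - 0.5))
        for i in ranked[:n_query]:
            human[i] = annotate(pool[i])
        # Self-training step: pseudo-label the remaining confident instances.
        pseudo = {}
        for i in unlabeled:
            if i in human:
                continue
            p = prob_positive(pool[i], boundary)
            if max(p, 1.0 - p) >= min_confidence:
                pseudo[i] = int(p >= 0.5)
    merged = {**pseudo, **human}
    return human, pseudo, fit_boundary([(pool[i], y) for i, y in merged.items()])


if __name__ == "__main__":
    pool = [x / 10 for x in range(-20, 21)]
    human, pseudo, boundary = active_learning_with_self_training(
        pool, {0: 0, 40: 1}, annotate=lambda x: int(x > 0))
    print(len(human), "human labels,", len(pseudo), "pseudo-labels, boundary", boundary)
```

In this toy setup only a handful of instances receive human labels, while the pseudo-labels supply additional training signal at no annotation cost; whether that helps in practice depends on how reliable the confident predictions are.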

1 reply

mam10eks

updated a dataset 2 months ago

webis/tip-of-my-tongue-known-item-search

Viewer • Updated Nov 30, 2024 • 1.28M • 42 • 1

mspl

updated 2 models 4 months ago

webis/acl2024-aae-dialect-classification

Updated Sep 10, 2024 • 1

webis/acl2024-social-bias-classification

Updated Sep 10, 2024

cschroeder

posted an update 4 months ago

Post

401

⚖️ 𝐀𝐈 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐢𝐬 𝐂𝐨𝐩𝐲𝐫𝐢𝐠𝐡𝐭 𝐈𝐧𝐟𝐫𝐢𝐧𝐠𝐞𝐦𝐞𝐧𝐭

This bold claim is not my opinion; it was made in a recent "report" by a group whose stance is recognizable from its name, which roughly translates to "Authors' Rights Initiative". According to the LinkedIn post below, the report was also presented before the EU Parliament.

I am not really interested in politics, but as an EU citizen I am of course somewhat interested in a reasonable and practical version of the EU AI Act. Not saying there should not be rules around data and AI, but this report is obviously very biased towards one side.

While I think the report itself does not deserve attention, I am posting it in the hope that you will find more examples where the authors did not address the issue adequately. Feel free to add them to my LinkedIn post (where the original authors will see them) or here.

[en] Executive summary: https://urheber.info/media/pages/diskurs/ai-training-is-copyright-infringement/3b900058e6-1725460935/executive-summary_engl_final_29-08-2024.pdf
[de] Full report: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4946214

LinkedIn: https://www.linkedin.com/posts/activity-7238912869268959232-6cFx

fschlatt

updated 3 models 4 months ago


Team members 29
