---
language:
- en
tags:
- roberta
- marketing mix
- multi-label
- classification
- microblog
- tweets

widget:
- text: "Best cushioning ever!!! 🤗🤗🤗 my zoom vomeros are the bomb🏃🏽‍♀️💨!!! @nike #run #training"
- text: "Why is @BestBuy always sold-out of Apple's new airpods in their online shop 🤯😡?"
- text: "They’re closing the @Aldo at the Lehigh Vally Mall and KOP 😭"
- text: "@Sony’s XM3’s ain’t as sweet as my bro’s airpod pros but got a real steal 🤑 the other day #deal #headphonez" 
- text: "Nike needs to sponsor more e-sports atheletes with Air Jordans! #nike #esports"
- text: "Say what you want about @Abercrombie's 90s shirtless males ads, they made dang good woll sweaters back in the day. This is one of 3 I have from the late 90s."
- text: "To celebrate this New Year, @Nordstrom is DOUBLING all donations up to $25,000! 🎉 Your donation will help us answer 2X the calls, texts, and chats that come in, and allow us to train 2X more volunteers!"
- text: "It's inspiring to see religious leaders speaking up for workers' rights and fair wages. Every voice matters in the #FightFor15! 💪🏽✊🏼 #Solidarity #WorkersRights"
---
# Model Card for: mmx_classifier_microblog_ENv02
Multi-label classifier that identifies which marketing mix variable(s) a microblog post pertains to.

Version: 0.2 from August 16, 2023

## Model Details
You can use this classifier to determine which of the 4Ps of marketing, also known as marketing mix variables, a microblog post (e.g., a Tweet) pertains to:

1. Product
2. Place
3. Price
4. Promotion

### Model Description
This classifier is a fine-tuned checkpoint of [cardiffnlp/twitter-roberta-large-2022-154m](https://huggingface.co./cardiffnlp/twitter-roberta-large-2022-154m).
It was trained on 15K Tweets that mentioned at least one of 699 brands. The Tweets were first cleaned and then labeled using OpenAI's GPT-4.

Because this is a multi-label classification problem, fine-tuning uses binary cross-entropy (BCE) with logits loss (PyTorch's `BCEWithLogitsLoss`), which combines a sigmoid layer and `BCELoss` in a single class.
To obtain the probability of each label (i.e., marketing mix variable), pass the model's logits through a sigmoid function. This is already done in the accompanying Python notebook and in the Quickstart below.
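
For intuition, here is a minimal sketch of that loss on made-up logits and multi-hot targets (the tensor values below are purely illustrative, not taken from the actual training run):

```python
import torch
from torch.nn import BCEWithLogitsLoss

# Hypothetical logits for a batch of 2 tweets over the 4 labels
# (Product, Place, Price, Promotion); targets are multi-hot vectors.
logits = torch.tensor([[2.1, -1.3, 0.4, 3.0],
                       [-0.8, 0.2, -2.5, 1.7]])
targets = torch.tensor([[1., 0., 0., 1.],
                        [0., 1., 0., 1.]])

loss = BCEWithLogitsLoss()(logits, targets)  # sigmoid + BCE in one numerically stable step
probs = torch.sigmoid(logits)                # per-label probabilities at inference time
```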

***IMPORTANT*** At the time of writing this description, Hugging Face's `pipeline` did not support multi-label classifiers; use the direct model call shown in the Quickstart below instead.

### Working Paper
Download the working paper from SSRN: ["Creating Synthetic Experts with Generative AI"](https://ssrn.com/abstract=4542949)

### Quickstart
```python
# Imports
import re
import warnings

import torch
from bs4 import BeautifulSoup
from transformers import AutoModelForSequenceClassification, AutoTokenizer

warnings.filterwarnings("ignore", category=UserWarning, module="bs4")

# Helper Functions
def clean_and_parse_tweet(tweet):
    """Replace URLs with a placeholder, strip HTML, and normalize whitespace."""
    tweet = re.sub(r"https?://\S+|www\.\S+", " URL ", tweet)
    soup = BeautifulSoup(tweet, "html.parser")
    if "filename" in str(soup):  # skip posts that are only a file attachment
        return None
    text = soup.get_text()
    if not text:
        return None
    text = re.sub(r"\\n+|\n+", " ", text)       # collapse escaped and real newlines
    text = re.sub(r"^[.:]+", "", text).strip()  # drop leading dots/colons
    return re.sub(r" +", " ", text)             # squeeze repeated spaces

def predict_tweet(tweet, model, tokenizer, device, threshold=0.5):
    """Return per-label probabilities and the labels at or above the threshold."""
    inputs = tokenizer(tweet, return_tensors="pt", padding=True,
                       truncation=True, max_length=128).to(device)
    probs = torch.sigmoid(model(**inputs).logits).detach().cpu().numpy()[0]
    labels = [id2label[i] for i, p in enumerate(probs)
              if id2label[i] in {"Product", "Place", "Price", "Promotion"} and p >= threshold]
    return probs, labels

# Setup
device = "mps" if torch.backends.mps.is_built() and torch.backends.mps.is_available() else "cuda" if torch.cuda.is_available() else "cpu"
synxp = "dmr76/mmx_classifier_microblog_ENv02"
model = AutoModelForSequenceClassification.from_pretrained(synxp).to(device)
tokenizer = AutoTokenizer.from_pretrained(synxp)
id2label = model.config.id2label

# ---->>> Define your Tweet  <<<----
tweet = "Best cushioning ever!!! 🤗🤗🤗  my zoom vomeros are the bomb🏃🏽‍♀️💨!!!  \n @nike #run #training https://randomurl.ai"

# Clean and Predict
cleaned_tweet = clean_and_parse_tweet(tweet)
probs, labels = predict_tweet(cleaned_tweet, model, tokenizer, device)

# Print Labels and Probabilities
print("Please don't forget to cite the paper: https://ssrn.com/abstract=4542949")
print(labels, probs)
```
*Predict thousands of tweets with the* ***batch processing Python notebook***, *available in my* [GitHub Repository](https://github.com/dringel/Synthetic-Experts)
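
If you only need a quick loop rather than the notebook, a minimal batched-inference sketch could look like the following. It reuses `model`, `tokenizer`, `device`, `id2label`, and `clean_and_parse_tweet` from the Quickstart; the example tweets and batch size are illustrative assumptions:

```python
# Batched inference sketch (illustrative; the notebook in the GitHub
# repository is the reference implementation).
tweets = [
    "Best cushioning ever!!! my zoom vomeros are the bomb!!! @nike #run",
    "Why is @BestBuy always sold-out of Apple's new airpods?",
]

batch_size = 32  # assumption: tune to your hardware
all_probs = []
for start in range(0, len(tweets), batch_size):
    batch = [clean_and_parse_tweet(t) or "" for t in tweets[start:start + batch_size]]
    inputs = tokenizer(batch, return_tensors="pt", padding=True,
                       truncation=True, max_length=128).to(device)
    with torch.no_grad():
        logits = model(**inputs).logits
    all_probs.extend(torch.sigmoid(logits).cpu().numpy())

for tweet, probs in zip(tweets, all_probs):
    labels = [id2label[i] for i, p in enumerate(probs) if p >= 0.5]
    print(labels, tweet)
```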

### Citation
Please cite the following reference if you use synthetic experts in your work:
```
Ringel, Daniel, Creating Synthetic Experts with Generative Artificial Intelligence (July 15, 2023). Available at SSRN: https://ssrn.com/abstract=4542949
```

### Additional Resources
[www.synthetic-experts.ai](http://www.synthetic-experts.ai)  
[GitHub Repository](https://github.com/dringel/Synthetic-Experts)