2024-07-05
{
  // Pie chart data: one slice per data source, values in percent
  const data = [{
    values: [3, 8, 7, 22, 60],
    labels: ["Wikipedia", "Books1", "Books2", "WebText2", "Common Crawl"],
    textinfo: "label+percent",
    type: "pie",
    marker: {
      colors: ["lightcyan", "cyan", "royalblue", "darkblue", "gold"]
    }
  }];
  // Transparent backgrounds and large white text so the chart sits on the slide
  const layout = {
    template: 'plotly_light',
    paper_bgcolor: "rgba(0,0,0,0)",
    plot_bgcolor: "rgba(0,0,0,0)",
    font: {
      size: 26,
      color: "white"
    },
    margin: {"t": 0, "b": 0, "l": 0, "r": 0},
    showlegend: false
  };
  // Render into a detached div and return it as the cell's output
  const div = document.createElement('div');
  Plotly.newPlot(div, data, layout, {displayModeBar: false});
  return div;
}
Underrepresentation on the web means less accuracy and more hallucinations!
\(K\) classes; worker \(j\), with weight \(w_j>0\), answers label \(y_i^j\) for task \(i\)
\[ \hat{y}_i^{\mathrm{WMV}} = \underset{k\in[K]}{\arg\max}\sum_{j} w_j \mathbf{1}(y_i^j=k) \]
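The weighted majority vote can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `weightedMajorityVote`, the encoding of classes as integers `0..K-1`, and the tie-breaking rule (smallest class index wins) are assumptions made for the example.

```javascript
// Weighted majority vote (WMV) for a single task i.
// answers[j] is the label y_i^j (an integer in 0..K-1) given by worker j,
// weights[j] is the matching positive weight w_j. Both are hypothetical inputs.
function weightedMajorityVote(answers, weights, K) {
  if (answers.length !== weights.length) {
    throw new Error("answers and weights must have the same length");
  }
  // scores[k] accumulates sum_j w_j * 1(y_i^j = k)
  const scores = new Array(K).fill(0);
  answers.forEach((label, j) => {
    scores[label] += weights[j];
  });
  // arg max over k; ties go to the smallest class index (an assumption)
  let best = 0;
  for (let k = 1; k < K; k++) {
    if (scores[k] > scores[best]) best = k;
  }
  return best;
}

// Three workers answer 0, 1, 1 with weights 3, 1, 1:
// class 0 scores 3 and class 1 scores 2, so the weighted vote picks 0,
// while equal weights would pick 1.
const wmvLabel = weightedMajorityVote([0, 1, 1], [3, 1, 1], 2);
```

With all weights equal, the same function reduces to the plain majority vote.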
| Method | MV | NS | DS | GLAD |
|---|---|---|---|---|
| Label Recovery Accuracy | 0.75 | 0.75 | 0.89 | 0.72 |
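As a reading aid for the table, here is a small sketch of how a label recovery accuracy could be computed, assuming it is the fraction of tasks whose aggregated label matches the ground-truth label. The function name `labelRecoveryAccuracy` and the toy arrays are hypothetical and unrelated to the reported numbers.

```javascript
// Fraction of tasks for which the aggregated label equals the true label.
// predicted[i] and truth[i] are class indices for task i (toy data below).
function labelRecoveryAccuracy(predicted, truth) {
  if (predicted.length !== truth.length || truth.length === 0) {
    throw new Error("inputs must be non-empty and of equal length");
  }
  let correct = 0;
  for (let i = 0; i < truth.length; i++) {
    if (predicted[i] === truth[i]) correct++;
  }
  return correct / truth.length;
}

// 4 of the 5 toy tasks are recovered correctly.
const toyAccuracy = labelRecoveryAccuracy([0, 1, 1, 2, 0], [0, 1, 2, 2, 0]);
```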
Angelova (2004): “Difficult examples are those which obstruct the learning process or mislead the learning algorithm or those which are impossible to reconcile with the rest of the examples”
top-3:
Also: boredom (0.12) and confusion (0.05)
top-3:
Also: confusion (0.12) and anxiety (0.03)
top-3:
Also: surprise (0.1) and boredom (0.04)
User-user recommendation system including sentiment analysis on customer reviews (ratings + text)
Recommend items based on user-user similarity (adjusted cosine)
\[ \mathrm{Adjcos}(u_i, u_j) = \frac{(u_i-\mu_i)^\top (u_j-\mu_j)}{\|u_i-\mu_i\|\|u_j-\mu_j\|} \]
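The adjusted cosine can be sketched as follows, assuming \(u_i\) is user \(i\)'s rating vector over a common set of items and \(\mu_i\) is that user's mean rating. The function names, the absence of missing ratings, and the choice to return 0 when a centered vector has zero norm are assumptions made for this illustration.

```javascript
// Mean of a numeric array.
function mean(v) {
  return v.reduce((sum, x) => sum + x, 0) / v.length;
}

// Adjusted cosine similarity between two users' rating vectors.
// Assumes ui and uj are equal-length arrays over the same items, no missing ratings.
function adjustedCosine(ui, uj) {
  if (ui.length !== uj.length || ui.length === 0) {
    throw new Error("rating vectors must be non-empty and of equal length");
  }
  const mi = mean(ui);
  const mj = mean(uj);
  let dot = 0, normI = 0, normJ = 0;
  for (let t = 0; t < ui.length; t++) {
    const a = ui[t] - mi; // centered rating of user i
    const b = uj[t] - mj; // centered rating of user j
    dot += a * b;
    normI += a * a;
    normJ += b * b;
  }
  const denom = Math.sqrt(normI) * Math.sqrt(normJ);
  // A user who rates everything identically has zero norm; return 0 by convention (assumption).
  return denom === 0 ? 0 : dot / denom;
}

// A harsh rater and a generous rater with the same preferences are close to 1,
// opposite preferences are close to -1.
const simSameTaste = adjustedCosine([5, 3, 1], [4, 2, 0]);
const simOppositeTaste = adjustedCosine([5, 3, 1], [1, 3, 5]);
```

Centering by each user's mean is what lets two users with different rating scales, but the same ordering of items, still count as similar.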
Imbalance: 80% of observations are represented by 10% of total votes
\[Thank\ you\]