Amazon Releases Dataset To Detect Counterfactual Phrases For Products
Product retrieval systems, like the one in the Amazon Store, often use the text of product reviews to improve the results of queries. But such systems can be misled by counterfactual statements, which describe events that did not or cannot take place.
Counterfactual statements in reviews are rare, but they can lead to frustrating experiences for customers — as when, for instance, a search for “red shirt” pulls up a product whose reviews make clear that it is not available in red.
To mitigate such problems, Amazon has publicly released a new dataset for training machine learning models to recognize counterfactual statements.
At the time this project was started, there were no large-scale datasets that covered counterfactual statements in product reviews in multiple languages. Amazon decided to annotate sentences selected from product reviews for three languages: English, German, and Japanese. Sentences that express counterfactuals are rare in natural-language texts — only 1-2% of sentences, according to one study. Therefore, simply annotating a randomly selected set of sentences would yield a highly imbalanced dataset with a sparse training signal.
Counterfactual statements can be broken into two parts: a statement about the event (if it were available in red), also referred to as the antecedent, and the consequence of the event (I would have bought this shirt), referred to as the consequent.
To identify counterfactual statements, Amazon specified certain relationships between antecedent and consequent in the presence of certain clue words. With the help of professional linguists for all the languages under consideration, the team compiled a set of such specifications covering conjunctive normal sentences, conjunctive converse sentences, modal propositional sentences, and sentences with clue words such as “wished” and “hoped”.
However, not all sentences that contain counterfactual clues express counterfactuals. So, professional linguists also reviewed the selected sentences to determine whether they truly expressed counterfactuals.
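The clue-based selection step described above can be sketched as a simple pattern filter. This is only an illustration: the patterns in `CLUE_PATTERNS` and the helper names `has_clue` and `select_candidates` are assumptions made for this example, not the specifications compiled by Amazon's linguists.

```python
import re

# Illustrative English clue patterns only (an assumption for this sketch);
# the actual specifications compiled by the linguists are not given here.
CLUE_PATTERNS = [
    re.compile(r"\bif\b.*\b(would|could|might)\b", re.IGNORECASE),
    re.compile(r"\b(wish|wished|hoped)\b", re.IGNORECASE),
    re.compile(r"\b(would|could|should) have\b", re.IGNORECASE),
]


def has_clue(sentence: str) -> bool:
    """Return True if any clue pattern matches the sentence."""
    return any(p.search(sentence) for p in CLUE_PATTERNS)


def select_candidates(sentences):
    """Keep sentences containing a clue; these still need human review."""
    return [s for s in sentences if has_clue(s)]


reviews = [
    "If it were available in red, I would have bought this shirt.",
    "I wished for a red shirt and that is exactly what arrived.",
    "The shirt fits well and the color is great.",
]
candidates = select_candidates(reviews)
for sentence in candidates:
    print(sentence)
```

The second review contains the clue word “wished” but does not express a counterfactual, which is the kind of false candidate that manual review by linguists is meant to catch.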
Selecting sentences based on precompiled clue-word lists could, however, bias the data. Hence, the team also selected sentences that do not contain clue words but are highly similar to sentences that do. As a measure of similarity, Amazon used the proximity of sentence embeddings (vector representations of the sentences) computed by a pretrained BERT model.
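A minimal sketch of similarity-based selection follows. It assumes cosine similarity as the proximity measure, which the article does not specify, and uses toy three-dimensional vectors in place of real BERT sentence embeddings. The function names and the example sentences are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(query_vec, pool, top_k=1):
    """Return the top_k (sentence, score) pairs from pool, best first."""
    scored = [(sent, cosine_similarity(query_vec, vec)) for sent, vec in pool.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for embeddings from a pretrained model (assumption).
clue_sentence_vec = [0.9, 0.1, 0.3]  # e.g. "I wish it came in red."
clue_free_pool = {
    "Too bad there is no red option.": [0.8, 0.2, 0.4],
    "Arrived two days early.": [0.1, 0.9, 0.0],
}
neighbours = most_similar(clue_sentence_vec, clue_free_pool)
print(neighbours[0][0])
```

In a real pipeline the vectors would come from the embedding model, and the clue-free sentences nearest to each clue-bearing sentence would be added to the annotation pool.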
Counterfactual detection can be modelled as a binary classification task: given a sentence, classify it as positive if it expresses a counterfactual statement and negative otherwise. The research team experimented with different methods for representing sentences, such as bag-of-words representations, static word-embedding-based representations, and contextualized word-embedding-based representations. Different classification algorithms were also evaluated, ranging from logistic regression and support vector machines to multilayer perceptrons. The team found that a cross-lingual language model (XLM) based on the RoBERTa model and fine-tuned on the counterfactually annotated sentences performed best overall.
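To make the binary classification framing concrete, here is a minimal sketch of the simplest configuration mentioned above: a bag-of-words representation fed to logistic regression, written in plain Python. The four training sentences, their labels, and the hyperparameters are invented for illustration. This is not the team's code, and it is far simpler than the fine-tuned XLM model that performed best.

```python
import math
import re


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def build_vocab(sentences):
    """Map each distinct token to a feature index."""
    vocab = {}
    for s in sentences:
        for tok in tokenize(s):
            vocab.setdefault(tok, len(vocab))
    return vocab


def featurize(sentence, vocab):
    """Bag-of-words count vector; unknown tokens are ignored."""
    vec = [0.0] * len(vocab)
    for tok in tokenize(sentence):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, epochs=200, lr=0.1):
    """Logistic regression fitted with plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(sentence, vocab, w, b):
    """Return 1 (counterfactual) or 0 (not counterfactual)."""
    x = featurize(sentence, vocab)
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Invented toy data: 1 = counterfactual, 0 = not counterfactual.
train_sentences = [
    "If it were available in red, I would have bought it.",
    "I wish this shirt came in a larger size.",
    "The shirt is red and fits well.",
    "Fast delivery and great quality.",
]
labels = [1, 1, 0, 0]

vocab = build_vocab(train_sentences)
w, b = train([featurize(s, vocab) for s in train_sentences], labels)
predictions = [predict(s, vocab, w, b) for s in train_sentences]
```

With only four sentences the model just memorizes its training data. The point is the shape of the task (sentence in, binary label out), not the accuracy of this toy classifier.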
To study the relationship between this dataset and existing datasets, the team trained a counterfactual detection model on it and evaluated the model on the public dataset for a counterfactual-detection competition, which contains counterfactual statements from news articles. Models trained on Amazon’s dataset performed poorly on the competition dataset, indicating that the counterfactual statements in product reviews (the focus of Amazon’s dataset) are significantly different from those in news articles.
As a simple baseline for cross-lingual transfer, Amazon trained a model on English training data and then applied it to German and Japanese test data that had been translated into English by a machine translation system. This baseline performed poorly, indicating that counterfactuals are highly language-specific and that more principled approaches will be needed for cross-lingual transfer.
The team is still investigating filtering by other types of linguistic constructions besides counterfactuals, and is working to extend the detection models to other languages.