Datasets and code for the paper 'A personal model of trumpery: Linguistic deception detection in a real-world high-stakes setting'
Paper abstract
Language use differs between truthful and deceptive statements, but not all differences are consistent across people and contexts, complicating the identification of deceit in individuals. By relying on fact-checked tweets, we show in three studies (Study 1: 469 tweets; Study 2: 484 tweets; Study 3: 24 models) how well personalized linguistic deception detection performs by developing the first deception model tailored to an individual: the 45th US president. First, we found substantial linguistic differences between factually correct and incorrect tweets. We developed a quantitative model and achieved 73% overall accuracy. Second, we tested out-of-sample prediction and achieved 74% overall accuracy. Third, we compared our personalized model to linguistic models previously reported in the literature. Our model outperformed existing models by 5pp, demonstrating the added value of personalized linguistic analysis in real-world settings. Our results indicate that factually incorrect tweets by the US president are not random mistakes of the sender.
Additional details
The paper is published in Psychological Science (DOI 10.1177/09567976211015941).
Datasets and R code are provided.
Explanations of how to use the datasets and R code are provided in the methods section of the paper and in the supplementary materials.
Funded by the European Research Council (ERC) Starting Grant 638408 (Bayesian Markets). For more details, see https://cordis.europa.eu/project/id/638408
For a website with more background information on this paper, please see https://apersonalmodeloftrumpery.com/