Using context, not content, to improve information quality in algorithmic newsfeeds
- Stephan Lewandowsky
Abstract
How can we determine the quality of information online? Conventionally, this is done by inspecting the content itself, such as the storyline or the actors and their relations. Modern search engines use natural-language-processing tools that analyse content. We call such content-based signals “endogenous” cues to information quality. Although endogenous cues are valuable, they have limitations, such as the inability to differentiate between extremist content and counterextremist content because both types of messages tend to be tagged with similar keywords. Relying on content also makes endogenous cues potentially prone to abuse for censorship purposes. By contrast, “exogenous” cues rely on the context, not the content, of information to assess quality. A famous example of the use of exogenous cues is Google’s PageRank algorithm, which takes network centrality as a key indicator of quality: Well-connected websites appear higher up in search results, irrespective of their content. We explore a number of possible exogenous cues using two large Twitter/X datasets. We use NewsGuard scores to provide estimates of the quality of domains being shared by users and develop an ensemble of exogenous cues, such as cognitive centrality and the skew of distributions of shares, that can predict quality without any analysis of content. We then embed those cues into a standard newsfeed recommender system based on collaborative filtering, boosting its recommendations according to the quality signal provided by the ensemble of exogenous cues. Using the same Twitter/X data, we show that the modified recommender system can continue to satisfy user preferences while enhancing the quality of recommendations. The results provide an existence proof for the design of newsfeed algorithms that provide users with higher-quality information without getting entangled in content analysis.
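The idea of boosting collaborative-filtering recommendations with an exogenous quality signal can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy share matrix, the per-domain `quality` values (standing in for the output of an ensemble of exogenous cues), the multiplicative boost `1 + alpha * quality`, and all function names are hypothetical.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two share vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cf_scores(shares, user):
    """User-based collaborative filtering: each domain is scored by the
    similarity-weighted shares of all other users."""
    n_items = len(shares[user])
    scores = [0.0] * n_items
    for other, row in enumerate(shares):
        if other == user:
            continue
        sim = cosine(shares[user], row)
        for j in range(n_items):
            scores[j] += sim * row[j]
    return scores

def boosted_ranking(shares, user, quality, alpha=1.0):
    """Rank domains the user has not yet shared, after multiplying each
    collaborative-filtering score by (1 + alpha * quality).
    alpha = 0 recovers the unmodified recommender (assumed boost form)."""
    scores = cf_scores(shares, user)
    boosted = [s * (1.0 + alpha * q) for s, q in zip(scores, quality)]
    unseen = [j for j in range(len(boosted)) if shares[user][j] == 0]
    return sorted(unseen, key=lambda j: boosted[j], reverse=True)

# Toy data (made up): rows are users, columns are domains, 1 = shared.
shares = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
]
# Hypothetical quality signal in [0, 1], one value per domain.
quality = [0.9, 0.8, 0.1, 0.9]

print(boosted_ranking(shares, 0, quality, alpha=0.0))
print(boosted_ranking(shares, 0, quality, alpha=1.0))
```

In this toy case, user 0 is most similar to user 1, so the unboosted recommender ranks the low-quality domain 2 first. With `alpha=1.0`, the higher-quality domain 3 moves ahead, while the ranking still starts from the user's collaborative-filtering scores and never inspects content.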