TSPA: Detecting Coordinated Information Operations

Overview

Coordinated information operations (IO) on social media are most commonly detected through platform-dependent behavioural traces such as shared retweets, URLs, or hashtags. But those signals vary across platforms and across the differing playbooks of state actors, and they overlook the core objective of an operation: propagating a narrative. My thesis argues that greater emphasis should be placed on what is communicated and when, rather than on *how* users interact.

The answer is Temporal Semantic Proximity Analysis (TSPA): a language-independent, platform-agnostic method that links accounts solely through the semantic and temporal proximity of their posts, using multilingual sentence embeddings, sliding-window cosine similarity, and Leiden community detection. The thesis was written at Maastricht University's Department of Advanced Computing Sciences under the supervision of Dr. Adriana Iamnitchi.

Why It Matters

By the end of 2022, Meta reported disrupting 200+ global influence networks originating in 68 countries and operating in at least 42 languages. False stories are about 70% more likely to be retweeted than true ones, and a 2023 UNESCO-Ipsos survey found 87% of respondents believe disinformation has already impacted their country's politics. Tools that flag coordinated inauthentic behaviour help keep social media transparent, yet the behavioural signatures they rely on today don't transfer between platforms. Content does.

The Data

Experiments run on a public labelled dataset of tweets from X (formerly Twitter) attributed to information operations across 16 distinct state actors, from state governments (Russia, China, Iran) to sub-national political movements. I selected 14 campaigns spanning diverse languages, durations, and IO concentration, from 7,057 tweets (Armenia) to 500,000 tweets (China, Cuba), each paired with topically and temporally matched control accounts for objective evaluation.

The Pipeline

Tweets are cleaned (mentions, URLs, retweet prefixes, and hashtags stripped), filtered, and encoded with the paraphrase-multilingual-mpnet-base-v2 sentence transformer into 768-d vectors: a paraphrase-aware space where the same narrative, reworded or translated across 50+ languages, still lands close together. Within each calendar-aligned monthly window, the pipeline computes the full within-window similarity matrix M = EEᵀ, where the rows of E are the window's embeddings, L2-normalised so that every entry of M is a cosine similarity.
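The cleaning and windowing steps can be sketched as below. This is a minimal illustration, not the thesis code: the regular expressions, the choice to drop whole hashtag tokens rather than only the `#` sign, and the helper names `clean_tweet` and `monthly_windows` are all assumptions.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical patterns; the thesis' exact cleaning rules are not given here.
RT_PREFIX = re.compile(r"^RT\s+@\w+:?\s*", flags=re.IGNORECASE)
URL = re.compile(r"https?://\S+|www\.\S+")
MENTION = re.compile(r"@\w+")
HASHTAG = re.compile(r"#\w+")
WHITESPACE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Strip the retweet prefix, URLs, mentions and hashtags, then squeeze spaces."""
    text = RT_PREFIX.sub("", text)
    for pattern in (URL, MENTION, HASHTAG):
        text = pattern.sub(" ", text)
    return WHITESPACE.sub(" ", text).strip()


def monthly_windows(tweets):
    """Group (timestamp, text) pairs into calendar-aligned (year, month) buckets."""
    windows = defaultdict(list)
    for timestamp, text in tweets:
        cleaned = clean_tweet(text)
        if cleaned:  # drop tweets that are empty after cleaning
            windows[(timestamp.year, timestamp.month)].append(cleaned)
    return dict(windows)
```

Each bucket's texts would then be passed to the sentence transformer, so that similarities are only ever computed between posts from the same month.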

A full N×N similarity matrix at N = 100,000 tweets would need about 40 GB of RAM in float32 (10¹⁰ entries × 4 bytes). TSPA instead slices the embedding matrix into row chunks and extracts each tweet's top-K neighbours with numpy.argpartition, a linear-time selection per row (plus O(K log K) to order the K hits) rather than an O(N log N) full sort. This keeps campaign archives between 1.8 MB and 121 MB and lets the whole computation run on modest hardware.
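A rough sketch of chunked top-K extraction follows. The function name `chunked_top_k`, the default chunk size, and the masking of self-similarity are illustrative assumptions; only the use of row chunks and `numpy.argpartition` comes from the description above.

```python
import numpy as np


def chunked_top_k(embeddings: np.ndarray, k: int, chunk_size: int = 1024):
    """Return (indices, scores) of each row's k most similar other rows."""
    # L2-normalise so that dot products equal cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    n = unit.shape[0]
    k = min(k, n - 1)
    if k < 1:
        raise ValueError("need at least two rows and k >= 1")

    top_idx = np.empty((n, k), dtype=np.int64)
    top_sim = np.empty((n, k), dtype=unit.dtype)

    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        sims = unit[start:stop] @ unit.T  # only a (chunk, n) block is in memory
        rows = np.arange(stop - start)
        sims[rows, np.arange(start, stop)] = -np.inf  # mask self-similarity
        # Unordered selection of the k largest entries per row.
        part = np.argpartition(sims, -k, axis=1)[:, -k:]
        part_sim = sims[rows[:, None], part]
        # Order only the k survivors, highest similarity first.
        order = np.argsort(-part_sim, axis=1)
        top_idx[start:stop] = part[rows[:, None], order]
        top_sim[start:stop] = part_sim[rows[:, None], order]
    return top_idx, top_sim
```

Peak memory is governed by `chunk_size × N` rather than `N × N`, so the chunk size can be tuned to the available RAM.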

Per-window similarity hits are collapsed into one score per account pair using three aggregation strategies: maximum, average, and weighted (a Bayesian-style shrinkage that suppresses spontaneous one-off matches while letting heavily-supported pairs converge to their empirical mean). Each campaign becomes a weighted user-to-user graph keeping every account's K′ strongest ties.
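The three aggregation strategies might look like the sketch below. The exact weighting formula is not stated here, so the shrinkage form `(n·mean + m·prior) / (n + m)`, the `pseudo_count` and `prior` parameters, and the function name `aggregate_pairs` are assumptions chosen to reproduce the described behaviour.

```python
from collections import defaultdict


def aggregate_pairs(hits, pseudo_count=5.0, prior=0.0):
    """Collapse (user_a, user_b, similarity) hits into one score per account pair."""
    by_pair = defaultdict(list)
    for user_a, user_b, sim in hits:
        if user_a == user_b:
            continue  # ignore an account matching its own posts
        by_pair[tuple(sorted((user_a, user_b)))].append(sim)

    scores = {}
    for pair, sims in by_pair.items():
        n = len(sims)
        mean = sum(sims) / n
        scores[pair] = {
            "max": max(sims),
            "average": mean,
            # Assumed shrinkage: few hits pull the score toward `prior`,
            # many hits let it approach the empirical mean.
            "weighted": (n * mean + pseudo_count * prior) / (n + pseudo_count),
        }
    return scores
```

Under this form, a pair with a single 0.95 hit scores well below a pair with dozens of 0.8 hits, which is the intended suppression of one-off matches.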

From Similarity to Communities

On those graphs I run weighted Leiden community detection, chosen over Louvain for its guaranteed well-connected partitions on Twitter-scale networks. The results are striking: on the *Egypt_UAE* campaign the largest connected component resolves into clean clusters with modularity Q ≈ 0.863 and purity 0.929 against a 0.563 chance level. On *Russia_1*, one detected community groups 1,115 users, 94% of which are IO accounts, and a neighbouring cluster of 982 members reaches 100% intra-community IO purity.
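For readers unfamiliar with the purity metric, one common way to compute it from a community assignment is sketched here. The definitions are assumptions: purity is taken as the share of accounts carrying their community's majority label, and the chance level as the majority-class share of the whole graph. The names `community_purity` and `chance_level` are illustrative.

```python
from collections import Counter


def community_purity(membership, labels):
    """Share of accounts that carry the majority label of their own community."""
    per_community = {}
    for account, community in membership.items():
        per_community.setdefault(community, Counter())[labels[account]] += 1
    majority_total = sum(max(c.values()) for c in per_community.values())
    return majority_total / len(membership)


def chance_level(labels):
    """Purity obtained by placing every account in one single community."""
    counts = Counter(labels.values())
    return max(counts.values()) / len(labels)
```

Here `membership` maps each account to the community id returned by the detection step, and `labels` maps each account to `"io"` or `"control"`.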

TSPA vs. Behavioural Traces

I benchmarked TSPA against five classic behavioural traces (co-retweet, co-URL, hashtag sequences, co-mention, and fast retweet) plus their fused union, scoring accounts by weighted eigenvector centrality and comparing AUC-ROC. TSPA consistently separates IO–IO pairs from mixed pairs, and on several campaigns matches or exceeds every individual behavioural trace. Temporal semantic proximity works as a cross-platform backbone for surfacing coordinated communities, complementing rather than replacing behavioural approaches.
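The scoring step can be illustrated with a small NumPy sketch: weighted eigenvector centrality by power iteration, followed by a rank-based AUC-ROC. The function names, the identity shift used for stable convergence, and the pairwise AUC formulation are assumptions for illustration; a production run would more likely rely on the graph and metrics libraries listed in the tech stack.

```python
import numpy as np


def eigenvector_centrality(weights, iterations=200, tol=1e-10):
    """Power iteration on a symmetric, non-negative weighted adjacency matrix."""
    weights = np.asarray(weights, dtype=float)
    n = weights.shape[0]
    scores = np.full(n, 1.0 / np.sqrt(n))
    for _ in range(iterations):
        # Adding the previous vector (A + I) keeps the same eigenvectors
        # but avoids oscillation on bipartite-like graphs.
        updated = weights @ scores + scores
        updated /= np.linalg.norm(updated)
        converged = np.linalg.norm(updated - scores) < tol
        scores = updated
        if converged:
            break
    return scores


def auc_roc(scores, is_io):
    """Probability that a random IO account outranks a random control account."""
    scores = np.asarray(scores, dtype=float)
    is_io = np.asarray(is_io, dtype=bool)
    pos, neg = scores[is_io], scores[~is_io]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

The pairwise comparison in `auc_roc` uses memory proportional to the number of IO accounts times the number of controls, which is fine for a sketch but would need a sort-based variant for very large campaigns.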

Results & Impact

High-modularity communities with IO purities of up to 100% across 14 real campaigns
Matches or exceeds every individual behavioural trace on several campaigns (AUC-ROC)
Language-independent and platform-agnostic: no retweet graphs or hashtags required
500k-tweet campaigns processed on modest hardware via chunked BLAS + top-K extraction

Tech Stack

Core: Python, NumPy, pandas, scikit-learn, SciPy
NLP & graphs: sentence-transformers (MPNet), NetworkX, igraph, leidenalg
Presentation: Jupyter, VIS-Network.js interactive dashboards, UMAP projections