https://arxiv.org/abs/2511.11532

A new preprint quantifies what we’ve suspected: Trump’s Truth Social posting becomes measurably more erratic after Epstein coverage spikes on Fox News.

  • porcoesphino@mander.xyz · 1 hour ago

    Probably the only interesting part of that study to me is how they measure “erratic”, which they do with a measure they call “novelty”. It’s in appendix A.1:

    A.1 Embedding and Novelty Measurement

    To quantify content novelty, we first convert the text of each post into a high-dimensional vector representation (embedding). This process begins by cleaning the raw post content (e.g., stripping HTML tags) and feeding the text into a pre-trained SentenceTransformer model, specifically all-MiniLM-L6-v2. This model maps each post to a 384-dimensional vector. From the full corpus of N posts, we obtain a matrix of “raw” embeddings, E_raw.

    These raw embeddings are known to suffer from anisotropy (a non-uniform distribution in the vector space), which can make distance metrics unreliable [li2020sentence]. To correct this, we apply a standard decorrelation step. We fit a Principal Component Analysis model with whitening to the entire matrix E_raw. This transformation de-correlates the features and scales them to have unit variance, yielding a matrix of “whitened” embeddings, E_white [su2021whitening]. These whitened vectors are used for all novelty calculations.

    There is a decent primer on the transformer here:

    https://medium.com/@rahultiwari065/unlocking-the-power-of-sentence-embeddings-with-all-minilm-l6-v2-7d6589a5f0aa
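    For the embedding step itself, here’s a minimal sketch (the post data and the HTML-stripping regex are stand-ins I made up; the model call is the real sentence-transformers API):

    ```python
    import re
    from sentence_transformers import SentenceTransformer

    # Stand-in corpus; the real input is the scraped Truth Social posts
    posts = ["<p>Example post one</p>", "<p>Example post two</p>"]

    # Crude HTML stripping; the paper just says it cleans the raw content
    cleaned = [re.sub(r"<[^>]+>", " ", p).strip() for p in posts]

    # Map each post to a 384-dimensional vector
    model = SentenceTransformer("all-MiniLM-L6-v2")
    E_raw = model.encode(cleaned)  # shape (N, 384)
    ```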

    I’m not sure of a great primer on PCA; roughly, it finds the dominant directions of variance in a set of vectors, and the whitening step then rescales those directions so each has unit variance.
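    The whitening they describe maps pretty directly onto scikit-learn’s PCA with whiten=True; a sketch on random stand-in data:

    ```python
    import numpy as np
    from sklearn.decomposition import PCA

    E_raw = np.random.randn(1000, 384)  # stand-in for the real embedding matrix

    # Fit PCA with whitening: de-correlate features, scale to unit variance
    pca = PCA(whiten=True)
    E_white = pca.fit_transform(E_raw)

    # Sanity check: the covariance of the whitened matrix is ~identity
    cov = np.cov(E_white, rowvar=False)
    print(np.allclose(cov, np.eye(cov.shape[0]), atol=1e-6))  # True
    ```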

    With that novelty measurement, the erraticness seems to come from averaging the whitened embeddings over a seven-day window and then measuring the Euclidean distance from that average.
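    If I’ve read that right, the score for a post would look something like the sketch below; the windowing details (trailing vs. centred, whether the post itself is excluded) are my guesses:

    ```python
    import numpy as np
    import pandas as pd

    # Stand-in data: one whitened embedding per post, with timestamps
    times = pd.date_range("2025-01-01", periods=100, freq="6h")
    E_white = np.random.randn(100, 384)

    novelty = np.full(len(times), np.nan)
    for i, t in enumerate(times):
        # Posts in the trailing seven-day window, excluding the current post
        window = (times >= t - pd.Timedelta(days=7)) & (times < t)
        if window.any():
            baseline = E_white[window].mean(axis=0)             # 7-day average vector
            novelty[i] = np.linalg.norm(E_white[i] - baseline)  # Euclidean distance
    ```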

    I did have a pint just before reading and writing this, so there are probably some mistakes here.