Metadata Stripping Across Distribution Platforms

Every major social media and messaging platform processes uploaded media through compression, resizing, and format conversion pipelines. These pipelines routinely strip embedded metadata including EXIF data, IPTC records, XMP properties, and C2PA content credentials. This research documents the specific behavior of major platforms and examines the implications for metadata-dependent provenance systems.

Testing Methodology

AFIP prepared test images with four embedded metadata layers: full EXIF data (camera model, GPS coordinates, timestamps), IPTC records (creator, copyright, description), XMP properties (editing history, tool information), and a C2PA manifest (a signed content credential with an identity assertion). Each test image was uploaded to each platform, downloaded again, and then examined to see which metadata layers had been preserved.

Findings

C2PA manifests were fully stripped on five of the six platforms tested. Facebook showed partial preservation under specific conditions (images uploaded through creator-designated flows), but consumer-facing re-shares stripped the credentials.

Metadata stripping is not incidental to platform operation; it serves legitimate technical and privacy purposes. Platforms strip GPS coordinates to protect user location, remove camera serial numbers to prevent tracking, and recompress images to reduce storage and bandwidth costs. The same processing that protects user privacy also destroys provenance signals.

This creates a fundamental tension in the metadata-based provenance model: the transformations that strip provenance data are not adversarial attacks but standard platform behavior serving legitimate user interests.

Perceptual Fingerprinting Comparison

In parallel testing, AFIP generated perceptual fingerprints for each test image before upload and attempted fingerprint recovery after platform processing. Perceptual fingerprints, which are derived from the visual content itself rather than metadata containers, showed substantially higher survival rates across all platforms.
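To illustrate why a content-derived fingerprint can survive reprocessing, here is a minimal difference-hash (dHash) sketch in pure Python. It is not AFIP's implementation: the function names are hypothetical, the input is assumed to be a grayscale image given as a list of pixel rows at least 9 pixels wide and 8 pixels high, and a simple box average stands in for whatever resampling a production library would use.

```python
def shrink(pixels, width, height):
    """Box-average a 2D grayscale grid down to width x height cells."""
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for y in range(height):
        y0, y1 = y * src_h // height, (y + 1) * src_h // height
        row = []
        for x in range(width):
            x0, x1 = x * src_w // width, (x + 1) * src_w // width
            block = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels):
    """64-bit difference hash: one bit per horizontally adjacent cell pair."""
    small = shrink(pixels, 9, 8)  # 9 columns give 8 comparisons per row
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | int(left > right)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# A horizontal gradient, a dimmed copy, and a mirrored copy.
original = [[255 - 3 * x for x in range(72)] for _ in range(64)]
dimmed = [[0.5 * p + 20 for p in row] for row in original]
mirrored = [row[::-1] for row in original]

same = hamming(dhash(original), dhash(dimmed))       # brightness change only
different = hamming(dhash(original), dhash(mirrored))  # content changed
```

Because each bit records only whether one region is brighter than its neighbor, a uniform brightness or contrast change leaves the hash untouched, while a change to the picture's structure flips bits. Real platform processing also introduces compression noise and resizing, which is why matching is normally done with a Hamming-distance tolerance rather than exact equality.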

Platform      C2PA Survival   pHash Survival   dHash Survival   Wavelet Survival
Instagram     0%              97.3%            98.1%            96.8%
X (Twitter)   0%              98.7%            99.2%            97.4%
TikTok        0%              95.1%            96.4%            94.2%
Facebook      ~40%            98.9%            99.1%            98.3%
WhatsApp      0%              93.6%            94.8%            92.1%
LinkedIn      0%              97.8%            98.5%            96.9%

Survival rate is the percentage of test images for which the fingerprint (or, in the C2PA column, the manifest) remained matchable after a platform upload/download cycle.
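The survival-rate definition can be written out as a short calculation. The sketch below assumes 64-bit hashes and treats a pair as "matchable" when at most `max_distance` bits differ. The source does not state the threshold that was used, so the default of 10 is purely a placeholder, and the function name is hypothetical.

```python
def survival_rate(hash_pairs, max_distance=10):
    """Percentage of (before, after) hash pairs that still match.

    max_distance is an assumed Hamming-distance tolerance, not a value
    taken from the study.
    """
    if not hash_pairs:
        raise ValueError("no test images supplied")
    survived = sum(
        1
        for before, after in hash_pairs
        if bin(before ^ after).count("1") <= max_distance
    )
    return 100.0 * survived / len(hash_pairs)


# Four illustrative pairs: two unchanged, one with a single flipped bit,
# and one where every bit differs.
pairs = [(0b1111, 0b1111), (0b1111, 0b1110), (0, 2**64 - 1), (5, 5)]
rate = survival_rate(pairs)
```

The chosen tolerance trades false matches against missed matches, so reported survival rates are only comparable when the same threshold and hash length are used throughout.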

Implications

These results demonstrate that content-derived fingerprinting provides a viable provenance mechanism where metadata-based approaches fail. Perceptual fingerprints are not subject to metadata stripping because they are computed from the visual content itself, not from a metadata container attached to the file.

A practical provenance system must account for the reality that the majority of content distribution occurs through channels that strip embedded metadata. Fingerprint-based approaches offer a complementary path that works within this constraint rather than against it.
