Research

Digital Forensics Database

The original AFIP maintained MEDUSA (the Medical Diagnostic Ultimate System Architecture), a comprehensive database of pathological specimens that served as the world's largest reference repository for diagnostic pathology. The new AFIP Digital Forensics Database applies the same institutional approach to synthetic media, creating a curated, documented repository of analyzed AI-generated content.

Sample Repository

The database catalogs analyzed samples across four forensic domains: AI-generated text (GPT-4, GPT-5, Claude, Gemini, Llama, and other models), synthetic audio (ElevenLabs, VALL-E, Bark, Tortoise TTS, and custom voice cloning systems), synthetic and manipulated images (Midjourney, DALL-E, Stable Diffusion, and GAN-generated faces), and deepfake video (face-swapped, lip-synced, and fully synthetic).

Each sample is cataloged with full provenance documentation: the generative model and version used, generation parameters and prompts where available, detection results across multiple AFIP methodologies, confidence scores, false-positive and false-negative analysis, and forensic examiner notes. This level of documentation supports both detection research and legal evidentiary use.
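
As an illustration only, the provenance fields listed above could be modeled as a simple record. The sketch below is not AFIP's actual schema: the class names, field names, and example values are all hypothetical, chosen to mirror the documentation items in the preceding paragraph.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class DetectionResult:
    """One detector's verdict on a sample (hypothetical structure)."""
    methodology: str         # name of the detection method, assumed free text
    flagged_synthetic: bool  # True if the method judged the sample synthetic
    confidence: float        # score in the range 0.0 to 1.0

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


@dataclass
class SampleRecord:
    """Provenance record for one cataloged sample (hypothetical structure)."""
    sample_id: str
    domain: str                        # "text", "audio", "image", or "video"
    model_name: str
    model_version: Optional[str] = None
    generation_params: dict = field(default_factory=dict)
    prompt: Optional[str] = None       # None when the prompt is unavailable
    detections: list = field(default_factory=list)
    examiner_notes: str = ""

    def to_json(self) -> str:
        # asdict() converts nested dataclasses, so detections serialize too.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


record = SampleRecord(
    sample_id="example-0001",          # made-up identifier
    domain="image",
    model_name="Stable Diffusion",
    generation_params={"steps": 30},
    detections=[DetectionResult("example-method", True, 0.97)],
    examiner_notes="Illustrative entry only.",
)
```

Keeping the prompt and model version optional reflects the "where available" qualifier: a record remains valid when those details cannot be recovered.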

Detection Benchmarks

The database supports AFIP's quarterly benchmark program by providing standardized test corpora for evaluating detection methodologies. Benchmark datasets are versioned, documented, and frozen at the time of testing, so results can be compared reproducibly across evaluation periods. As new generative models emerge, corresponding benchmark corpora are generated and added to the testing pipeline.
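
One common way to make a "frozen" dataset verifiable is to record a cryptographic digest of every file at freeze time. The sketch below shows that general technique using SHA-256 from Python's standard library; it is an assumption about how freezing might be checked, not a description of AFIP's pipeline, and the function names and manifest layout are hypothetical.

```python
import hashlib
import json


def build_manifest(version: str, files: dict) -> dict:
    """Build a manifest for a corpus given as {relative_path: file_bytes}."""
    entries = {
        path: hashlib.sha256(data).hexdigest()
        for path, data in sorted(files.items())
    }
    # A single digest over the sorted per-file digests identifies the corpus.
    canonical = json.dumps(entries, sort_keys=True).encode("utf-8")
    return {
        "version": version,
        "files": entries,
        "corpus_digest": hashlib.sha256(canonical).hexdigest(),
    }


def verify(manifest: dict, files: dict) -> bool:
    """Return True only if the files match the frozen manifest exactly."""
    return build_manifest(manifest["version"], files) == manifest


corpus = {"a.txt": b"sample one", "b.txt": b"sample two"}
frozen = build_manifest("2025-Q1", corpus)  # version label is made up
```

Any added, removed, or altered file changes the manifest, so a later evaluation can confirm it ran against the same corpus before comparing results.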

Current benchmark metrics track detection accuracy, false-positive rates, false-negative rates, processing latency, and cross-model generalization (how well a detection method performs on output from generators it was not tuned on). All results are published with full methodology documentation, including known limitations and failure modes.
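
For reference, accuracy and the two error rates follow from a standard confusion matrix. The sketch below uses the usual definitions and assumes "synthetic" is the positive class; the function name and the example labels are illustrative and do not represent AFIP benchmark data.

```python
def detection_metrics(labels, predictions):
    """Compute accuracy, false-positive rate, and false-negative rate.

    labels and predictions are equal-length sequences of booleans (or 0/1),
    where a truthy value means "synthetic" (assumed positive class).
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    tn = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)

    return {
        "accuracy": (tp + tn) / len(labels),
        # Share of authentic samples wrongly flagged as synthetic.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Share of synthetic samples the detector missed.
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Made-up example: 4 synthetic and 4 authentic samples.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
guess = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = detection_metrics(truth, guess)
# accuracy 0.75, false_positive_rate 0.25, false_negative_rate 0.25
```

Reporting both error rates alongside accuracy matters because a detector can score high accuracy on an imbalanced corpus while still missing most synthetic samples.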

Classification Taxonomy

Every sample in the database is classified using AFIP's forensic taxonomy, a hierarchical classification system modeled on the diagnostic coding frameworks used in the original AFIP's pathology consultation service. The taxonomy covers generation method, content domain, manipulation type, detection difficulty, and forensic significance. This structured approach enables systematic research into detection patterns, model-specific artifacts, and the evolution of generative capability over time.
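
To show how a hierarchical, multi-axis classification can support systematic queries, here is a minimal sketch. The five axis names come from the paragraph above; the slash-separated code format and every code value are hypothetical and are not AFIP's actual taxonomy.

```python
AXES = (
    "generation_method",
    "content_domain",
    "manipulation_type",
    "detection_difficulty",
    "forensic_significance",
)


def make_classification(**paths):
    """Build a classification from one slash-separated path per axis."""
    missing = set(AXES) - set(paths)
    unknown = set(paths) - set(AXES)
    if missing or unknown:
        raise ValueError(
            f"missing axes: {sorted(missing)}, unknown axes: {sorted(unknown)}"
        )
    # Store each path as a tuple so parent/child checks compare whole levels.
    return {axis: tuple(paths[axis].split("/")) for axis in AXES}


def falls_under(classification, axis, prefix):
    """True if the sample's code on this axis sits at or below the prefix."""
    want = tuple(prefix.split("/"))
    return classification[axis][: len(want)] == want


sample = make_classification(
    generation_method="diffusion/latent",  # hypothetical codes throughout
    content_domain="image/face",
    manipulation_type="fully_synthetic",
    detection_difficulty="moderate",
    forensic_significance="reference",
)
```

Because each code is a path, a query for `diffusion` on the generation-method axis matches every more specific child such as `diffusion/latent`, which is what makes model-family comparisons straightforward.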

Forensic Case Studies

Selected entries include detailed forensic case studies documenting the analysis process from initial examination through final determination. These narratives follow the same structure used in AFIP's historic pathology consultation reports: clinical history (context of discovery), gross examination (surface-level analysis), microscopic examination (deep forensic analysis), diagnosis (determination with confidence scoring), and discussion (methodology notes and limitations). They serve as educational resources, methodology documentation, and templates for forensic reporting standards.
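
The five-part report structure described above can be enforced with a small template. This is a sketch under the assumption that reports are rendered as plain text; the function name, keyword names, and example wording are hypothetical.

```python
SECTIONS = (
    ("clinical_history", "Clinical History"),
    ("gross_examination", "Gross Examination"),
    ("microscopic_examination", "Microscopic Examination"),
    ("diagnosis", "Diagnosis"),
    ("discussion", "Discussion"),
)


def render_case_study(title, **sections):
    """Render a case study, requiring all five sections in fixed order."""
    missing = [key for key, _ in SECTIONS if not sections.get(key)]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    lines = [title, ""]
    for key, heading in SECTIONS:
        lines += [heading, "", sections[key].strip(), ""]
    return "\n".join(lines).rstrip() + "\n"


report = render_case_study(
    "Example Case",  # all content below is placeholder text
    clinical_history="Context in which the sample was discovered.",
    gross_examination="Surface-level observations.",
    microscopic_examination="Deep forensic analysis findings.",
    diagnosis="Determination with a confidence score.",
    discussion="Methodology notes and limitations.",
)
```

Rejecting incomplete reports at render time keeps every published case study comparable section by section.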

Database Access

The Digital Forensics Database is available to qualified researchers, institutional partners, and law enforcement through the Access Portal. Programmatic access via API is available for institutions requiring integration with existing detection workflows.