Fundamentals of Image Forensics
Concepts
Image File Format Structures
Digital images are stored in structured files with distinct sections:
- Header: Contains format identifiers (e.g., FFD8 for JPEG), dimensions, color mode, and version information.
- Metadata Blocks: Store EXIF, XMP, or IPTC data (camera settings, creation dates, author).
- Pixel Data: Pixel values stored in encoded form, e.g., DCT-based compression (JPEG) or per-row filtering followed by DEFLATE compression (PNG).
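The layout above can be made concrete by walking a JPEG's marker segments. The sketch below uses only the standard library; the names list_jpeg_segments and MARKER_NAMES are illustrative, the marker table is deliberately incomplete, and the parser assumes a well-formed file with no fill bytes or length-less markers before the scan data.

```python
import struct

# Names for a few common JPEG markers (illustrative, not exhaustive).
MARKER_NAMES = {
    0xD8: "SOI", 0xE0: "APP0", 0xE1: "APP1", 0xDB: "DQT",
    0xC0: "SOF0", 0xC2: "SOF2", 0xC4: "DHT", 0xDA: "SOS", 0xD9: "EOI",
}

def list_jpeg_segments(data: bytes):
    """Return (marker_name, payload_length) pairs up to the start of scan data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing JPEG SOI marker (FF D8)")
    segments = [("SOI", 0)]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        # The segment length is big-endian and counts its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        name = MARKER_NAMES.get(marker, f"0x{marker:02X}")
        segments.append((name, length - 2))
        if marker == 0xDA:  # SOS: entropy-coded pixel data follows
            break
        pos += 2 + length
    return segments
```

Calling list_jpeg_segments(open("sample.jpg", "rb").read()) on a typical camera file would be expected to show APP1 (where EXIF usually lives) ahead of the quantization and Huffman tables, with SOS marking where the pixel data begins.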
Key Structure Analogy
Think of an image file as a book:
- Header = Title page and table of contents
- Metadata = Author/publisher details embedded in margins
- Pixel data = Pages of text (image content)
Metadata Analysis
Metadata can reveal:
- Creation/Modification Timestamps: Original capture vs. edit dates
- Camera/Software Fingerprint: The camera model that captured the photo, or the software that last edited or saved it
- Geolocation Data: GPS coordinates (if enabled)
Critical Insight
Manipulators often zero out or alter metadata to mask edits. Inconsistent metadata (for example, a modification date earlier than the capture date) is a red flag.
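A consistency check of this kind can be sketched in a few lines. The example assumes the tags have already been read into a dict of strings keyed the way the exifread library names them ("EXIF DateTimeOriginal", "Image DateTime", "Image Software"). The function name metadata_red_flags and the EDITOR_HINTS list are hypothetical and only illustrate the idea; malformed timestamps would raise ValueError in this sketch.

```python
from datetime import datetime

EXIF_TIME_FORMAT = "%Y:%m:%d %H:%M:%S"
# Hypothetical, non-exhaustive list of editor names used for illustration.
EDITOR_HINTS = ("photoshop", "gimp", "lightroom")

def metadata_red_flags(tags: dict) -> list:
    """Return human-readable warnings for suspicious tag combinations."""
    flags = []
    original = tags.get("EXIF DateTimeOriginal")
    modified = tags.get("Image DateTime")
    if original and modified:
        t_orig = datetime.strptime(original, EXIF_TIME_FORMAT)
        t_mod = datetime.strptime(modified, EXIF_TIME_FORMAT)
        if t_mod < t_orig:
            flags.append("modification time precedes capture time")
        elif t_mod > t_orig:
            flags.append("file was saved again after capture")
    elif not original and not modified:
        flags.append("no timestamps present")
    software = tags.get("Image Software", "").lower()
    if any(hint in software for hint in EDITOR_HINTS):
        flags.append("editing software recorded")
    return flags
```

None of these flags proves tampering on its own; they mark fields worth cross-checking against each other and against the image content.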
Common Manipulation Indicators
Signs of tampering include:
- Histogram Anomalies: Sudden spikes/drops in color distribution
- Error Level Analysis (ELA): Regions whose recompression error differs from the rest of the image suggest they were edited or pasted in after the last save
- Duplicate Regions: Cloned areas with identical pixel patterns
- Border Artifacts: JPEG compression artifacts at edges of edits
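Of the indicators above, ELA is the easiest to prototype. The following is a minimal sketch that assumes Pillow is installed; the function name error_level_image and the default quality of 90 are illustrative choices, not a standard. It re-saves the image as JPEG in memory, takes the per-pixel absolute difference, and stretches it so that the largest difference becomes full brightness.

```python
import io
from PIL import Image, ImageChops

def error_level_image(source, quality=90):
    """Re-save as JPEG at a known quality and return the amplified difference."""
    original = Image.open(source).convert("RGB")
    # Recompress in memory so no temporary file is needed.
    buffer = io.BytesIO()
    original.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    resaved = Image.open(buffer).convert("RGB")
    # Per-pixel, per-channel absolute difference between the two versions.
    diff = ImageChops.difference(original, resaved)
    # Stretch the result so the largest difference maps to 255.
    max_diff = max(high for _, high in diff.getextrema()) or 1
    scale = 255.0 / max_diff
    return diff.point(lambda value: min(255, int(value * scale)))
```

A call such as error_level_image("edited.jpg").show() displays the result. Areas that stand out from their surroundings deserve a closer look, but sharp edges and high-contrast textures also appear bright, so the output needs human interpretation.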
Examples
1. Parsing JPEG Structure with Python
from PIL import Image

def analyze_jpeg_structure(file_path):
    """Return basic image properties and whether the file starts with the JPEG SOI marker."""
    # Pillow reads format, mode, and size from the header
    with Image.open(file_path) as img:
        headers = {
            "format": img.format,
            "mode": img.mode,
            "size": img.size,
        }
    # Extract raw header bytes (first 256 bytes) to check the signature directly
    with open(file_path, "rb") as f:
        header_bytes = f.read(256)
    return headers, header_bytes.startswith(b"\xff\xd8")

print(analyze_jpeg_structure("sample.jpg"))
2. Extracting EXIF Metadata
import exifread

def get_exif_data(file_path):
    """Collect the EXIF tags that relate to timestamps and editing software."""
    with open(file_path, "rb") as f:
        tags = exifread.process_file(f)
    exif_data = {}
    for tag, value in tags.items():
        # Keep only date-related and software-related tags
        if "Date" in str(tag) or "Software" in str(tag):
            exif_data[str(tag)] = str(value)
    return exif_data

print(get_exif_data("photo.jpeg"))
3. Detecting Duplicate Regions
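import cv2
import numpy as np

def find_duplicate_regions(image_path, threshold=0.8, min_offset=20):
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(image_path)
    sift = cv2.SIFT_create()
    kp, des = sift.detectAndCompute(img, None)
    if des is None or len(kp) < 3:
        return False
    # Brute-force matcher; k=3 because each descriptor's best match is itself
    bf = cv2.BFMatcher()
    matches = bf.knnMatch(des, des, k=3)
    good_matches = []
    for neighbors in matches:
        others = [x for x in neighbors if x.trainIdx != x.queryIdx]
        if len(others) < 2:
            continue
        m, n = others[:2]
        # Ratio test on non-self neighbors, ignoring nearly coincident keypoints
        p1, p2 = np.array(kp[m.queryIdx].pt), np.array(kp[m.trainIdx].pt)
        if m.distance < threshold * n.distance and np.linalg.norm(p1 - p2) > min_offset:
            good_matches.append(m)
    return len(good_matches) > 50  # Heuristic threshold for duplicates

print(find_duplicate_regions("suspicious.png"))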
4. Basic Histogram Analysis
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np

def plot_color_histogram(file_path):
    img = Image.open(file_path).convert("RGB")
    rgb = np.array(img)
    plt.figure(figsize=(10, 4))
    colors = ("r", "g", "b")
    # One 256-bin histogram per color channel
    for i, color in enumerate(colors):
        hist = np.histogram(rgb[:, :, i], bins=256, range=(0, 256))[0]
        plt.plot(hist, color=color)
    plt.xlim([0, 256])
    plt.show()

plot_color_histogram("edited.jpg")
Key Notes
- JPEG Compression Artifacts: JPEG compresses in 8x8 pixel blocks, which leaves a regular grid of artifacts. Edited regions often show an inconsistent or misaligned artifact distribution.
- Metadata Mutation: A common manipulation tactic is to strip EXIF data or insert fake values. Always cross-reference multiple metadata fields.
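- ELA Implementation: Re-save the image as JPEG at a known, fixed quality (values around 90-95% are common) and compute the per-pixel difference from the original. Regions edited after the last save tend to show a different error level and appear brighter in the difference image.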
- Byte Signature Importance: Known file signatures (e.g., 89 50 4E 47 0D 0A 1A 0A for PNG) act as first-level authenticity checks.
- Limits of Automated Tools: No algorithm is foolproof. Human analysis remains critical for context interpretation.
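The byte-signature check mentioned above can be sketched with the standard library alone. The SIGNATURES table and the function name sniff_format are illustrative and cover only three formats; a mismatch between the detected format and the file extension is a reason for a closer look, not proof of tampering.

```python
# Leading bytes ("magic numbers") of a few common image formats.
SIGNATURES = {
    "JPEG": b"\xff\xd8\xff",
    "PNG": b"\x89PNG\r\n\x1a\n",  # 89 50 4E 47 0D 0A 1A 0A
    "GIF": b"GIF8",               # covers both GIF87a and GIF89a
}

def sniff_format(header: bytes):
    """Return the format whose signature matches the leading bytes, or None."""
    for name, magic in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None
```

In practice the first few bytes of a file would be read with open(path, "rb").read(8) and passed to sniff_format, then compared with the extension the file claims.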