// mod-osint · architecture document · v1.1

MOD-OSINT

Development Lifecycle · Team Structure · Architecture · Handoff Protocol
Phase 0 ✓ Complete · Phase 1 ◈ Active · Phases 2–6 ○ Planned · 6 Roles Defined · Scrum · Context Snapshots · Handoff Schema
Lifecycle Overview
// Full development arc — visible to all roles at all times
Phase 0 · Foundation
Core Infrastructure
Deliverables
DONE run.py — unified CLI
DONE orchestrator.py — headless runner
DONE module_registry.json
DONE profile schema (JSON)
DONE SQLite schema v1
DONE install_modules.py
DONE 3 ProfileIntel modules
DONE venv-safe file loader
DONE BLUEPRINT.md + README
Phase 1 · Intelligence Modules
OSS Wrapper Layer — recon_discovery phase
Sub-phases
1A · USERNAME/EMAIL
Maigret wrapper
Holehe wrapper
New: username_hits table
New: email_service_hits
1B · DNS/DOMAIN
dnspython wrapper
theHarvester wrapper
New: dns_records table
1C · GEO/EXIF
piexif wrapper
ipinfo SDK
New: geo_hits table
New: ip_geo_hits
1D · NLP/NER
spaCy en_core_web_sm
Runs on existing snippets
New: extracted_entities
GATE: all --dry-run pass · 6 new tables · regression suite green
Phase 2 · Correlation
Entity Resolution Engine — canonical subject records
entity_resolver.py · ConfidenceScorer · subjects + timeline_events tables · cross-profile linking
Phase 3 · Reporting
Output Formats — JSON · HTML · PDF · STIX 2.1 · CSV Matrix
Jinja2 templates · WeasyPrint PDF · STIX bundle export · interactive HTML timeline
Phase 4 · API + Auth
FastAPI REST Layer + JWT RBAC
Full endpoint design · 4-tier role hierarchy · env-based secrets · OpenAPI docs · fixes existing api/ debt
Phase 5 · Hardening
Security · Compliance · Packaging · Docker
SQLCipher · PII retention policy · bandit · detect-secrets · pyproject.toml · pip install mod-osint
Phase 6 · Extension Framework
Community Module Standard
MODULE_META contract · cookiecutter template · registry CLI · version policy · CI compliance check
North Star — Done When
python run.py --profile ./subjects/jane_doe.json --output report.pdf

→ Where she appears online
→ What data has been exposed in breaches
→ What platforms she is registered on
→ Where she has been geographically
→ Who she is associated with
→ Confidence-rated canonical identity record
→ All sourced · all timestamped · all traceable to raw DB

Sprint Cadence

Sprint Structure
DURATION   1-week sprints
STANDUP    Async daily — snapshot format
REVIEW     End of each sub-phase
GATE       Validation suite before merge
RETRO      End of each numbered phase
Artifact Ownership
BLUEPRINT    Architect (SA)
REGISTRY     Lead Dev (LD)
SCHEMA       Backend Dev (BD)
TESTS        QA Engineer (QA)
SNAPSHOTS    Scrum Master (SM)
REPORTS/UI   Full Stack (FS)
Regression Rules
  • run.py --dry-run exits 0 on every commit
  • All existing modules keep run() contract
  • SQLite schema changes are additive only
  • No new hard-exits in module code
  • install_modules.py stays current
  • BLUEPRINT.md updated each phase
Development Team
// 6 roles · skills · phase assignments · responsibilities
SA · Systems Architect
@arch · owns: BLUEPRINT.md, schema contracts, phase gates
  • Distributed systems design · modular architecture patterns
  • Schema design (relational, graph, document)
  • UML · C4 model · ADR authoring
  • Security architecture · threat modeling
  • API contract design (OpenAPI, STIX)
  • Cross-phase dependency mapping
  • Technology selection with justification
Responsibilities
  • Owns BLUEPRINT.md — updates after every phase gate
  • Defines all table schemas before BD implements
  • Approves any change to module contract (run() signature)
  • Writes ADR (Architecture Decision Records) for all key decisions
  • Produces context snapshots at phase boundaries
ALL PHASES · SCHEMA GATE · ADR OWNER
SM · Scrum Master
@scrum · owns: snapshots, handoffs, sprint board, blockers
  • Agile/Scrum facilitation · sprint planning · retrospectives
  • Technical project management · risk tracking
  • Cross-role communication · async coordination
  • Context compression · handoff documentation
  • Blocker identification and escalation
  • Velocity tracking · burn-down management
Responsibilities
  • Issues SNAP-xxx context snapshots at each handoff point
  • Maintains the signal registry (REF, BLOCK, GATE, MERGE tags)
  • Runs sprint planning — maps tasks to roles
  • Owns the branch strategy document
  • Writes the HANDOFF.md at each phase transition
ALL SPRINTS · HANDOFF OWNER · SIGNAL REGISTRY
LD · Lead Developer
@lead · owns: module_registry.json, run.py, orchestrator.py
  • Python 3.9+ · importlib · subprocess · asyncio
  • Module system design · plugin architecture
  • Git workflow · branch strategy · PR review
  • OSS tool evaluation and integration patterns
  • CI/CD pipeline design (GitHub Actions)
  • Code review · mentoring junior developers
  • Performance profiling · regression testing
Responsibilities
  • Owns run.py and orchestrator.py — no changes without LD review
  • Maintains module_registry.json
  • Reviews all new module PRs for contract compliance
  • Writes and maintains install_modules.py
  • Sets coding standards + linting config
PHASE 0–6 · PR GATEKEEPER · MODULE CONTRACT
BD · Backend Developer
@backend · owns: SQLite schema, module logic, OSS wrappers
  • Python · SQLite · SQL (schema design, migrations)
  • REST API consumption · OAuth · API key management
  • subprocess management · JSON normalization
  • OSS OSINT tool operation (Maigret, Holehe, spaCy, etc.)
  • Data pipeline design · ETL patterns
  • Unit + integration testing (pytest)
  • Rate limiting · retry logic · error handling
Responsibilities
  • Implements all module logic in modules/
  • Writes CREATE TABLE SQL approved by SA
  • Wraps each OSS tool per the Tier 1/2/3 strategy
  • Writes pytest unit tests for all public functions
  • Maintains requirements.txt and OSS version pins
PHASE 1–4 · MODULE IMPL · SCHEMA IMPL
QA · QA Engineer
@qa · owns: tests/, regression corpus, validation gates
  • pytest · pytest-cov · fixtures · parametrize
  • Regression testing · golden file testing
  • Security testing (bandit, detect-secrets)
  • API testing (httpx, Postman/Newman)
  • CI/CD integration · GitHub Actions
  • Test data management · synthetic profile corpus
  • Performance benchmarking · load testing
Responsibilities
  • Owns tests/ directory — all test files
  • Maintains 10-profile regression corpus
  • Writes and enforces validation gate checklists
  • Runs bandit + detect-secrets on each phase
  • Signs off phase completion with test results
ALL PHASES · GATE OWNER · CI PIPELINE
FS · Full Stack Designer
@fullstack · owns: report templates, API routes, HTML/CLI UX
  • Python FastAPI · Pydantic · async routes
  • Jinja2 templating · HTML/CSS · JavaScript
  • PDF generation (WeasyPrint, ReportLab)
  • STIX 2.1 spec · JSON-LD · data serialization
  • JWT auth flows · RBAC implementation
  • CLI UX design · argparse · click
  • Data visualization (charts, timelines, graphs)
Responsibilities
  • Implements all report output formats (Phase 3)
  • Builds FastAPI routes and auth layer (Phase 4)
  • Owns Jinja2 templates in modules/report_gen/templates/
  • Designs CLI UX improvements to run.py
  • Implements HTML status dashboard
PHASE 3–5 · API LAYER · REPORT OUTPUT

RACI Matrix — Key Artifacts

Artifact                        SA    SM    LD    BD    QA    FS
BLUEPRINT.md                    R/A   C     C     I     I     I
module_registry.json            A     I     R     C     I     I
SQLite schema (CREATE TABLE)    A     I     C     R     C     I
Module code (modules/)          C     I     A     R     C     C
run.py / orchestrator.py        C     I     R/A   I     C     I
tests/                          C     I     C     C     R/A   I
Context Snapshots (SNAP-xxx)    C     R/A   I     I     I     I
Report templates                I     I     C     I     C     R/A
FastAPI routes / auth           A     I     C     I     C     R
requirements.txt                C     I     A     R     C     C
R=Responsible · A=Accountable · C=Consulted · I=Informed
UML — Architecture
// Structural components · behavioral attributes · class relationships

Class Diagram — Core System

«abstract» ModuleBase
  + MODULE_META: dict
  + DB_PATH: str
  + PROFILES_DIR: str
  + run(dry_run: bool) → dict «abstract»
  + validate() → bool
  + schema() → str

PassiveIdentityRecon «implements» ModuleBase · «writes» results
  - PROVIDER: str
  - ALLOWED_DOMAINS: list[str]
  - WAIT: float
  + load_profiles(dir) → Generator
  + build_queries(vars) → list
  + call_google(q, start, num) → dict
  + call_bing(q, count, offset) → dict
  - _store(conn, ...) → int
  + run(dry_run) → dict

BreachDataLookup «implements» ModuleBase · «writes» breach_hits
  - HIBP_API_KEY: str
  - DEHASHED_EMAIL: str
  - DEHASHED_API_KEY: str
  - _query_targets(profile) → dict
  - _hibp_breaches(email) → list
  - _hibp_pastes(email) → list
  - _dehashed_search(type, val) → dict
  - _store_breach(conn, ...) → None
  + run(dry_run) → dict

ProfileMerge «implements» ModuleBase · «reads» results, breach_hits · «writes» merged JSON files
  - DB_PATH: str
  - MERGED_DIR: str
  + merge_profiles(db, out) → dict
  + export_to_csv(db, path) → int
  - _fetch_recon(conn) → dict
  - _fetch_breaches(conn) → dict
  - _build_summary(r, b) → dict
  + run(dry_run) → dict

Orchestrator
  - PIPELINE_PHASES: list[str]
  - pipeline_status: dict
  + run_phase(name, registry) → None
  + _load_module(name, path) → Module
  + _save_status() → None
  + main() → None

«database» SQLiteStore
  results · breach_hits
  username_hits, email_service_hits    ← Phase 1a
  dns_records                          ← Phase 1b
  geo_hits, ip_geo_hits                ← Phase 1c
  extracted_entities                   ← Phase 1d
  subjects, timeline_events            ← Phase 2

Relationships: inheritance (ModuleBase → concrete modules) · Orchestrator calls modules · modules read/write SQLiteStore
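
The abstract base in the diagram can be sketched as a Python ABC. Note that the actual codebase loads modules as plain files exposing a module-level run() (per the venv-safe loader), so this class is illustrative of the contract rather than required plumbing; the default paths are placeholders.

```python
from abc import ABC, abstractmethod


class ModuleBase(ABC):
    """Sketch of the module contract from the class diagram."""

    MODULE_META: dict = {}
    DB_PATH: str = "osint.sqlite3"        # placeholder default
    PROFILES_DIR: str = "osint_profiles"  # placeholder default

    @abstractmethod
    def run(self, dry_run: bool = False) -> dict:
        """Execute the module. Must return a summary dict, never hard-exit."""

    def validate(self) -> bool:
        """Check dependencies/credentials; modules override as needed."""
        return True

    def schema(self) -> str:
        """CREATE TABLE SQL for output tables, if the module writes any."""
        return ""
```

Instantiating ModuleBase directly raises TypeError, which is the point: concrete modules must supply run().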

Sequence Diagram — Pipeline Execution

Participants: :User · run.py · Orchestrator · passive_recon · breach_lookup · profile_merge · SQLite

:User         → run.py:         run --phase profile_intel
run.py        → Orchestrator:   run_phase("profile_intel", registry)
Orchestrator  → passive_recon:  _load_module + run()
passive_recon → SQLite:         INSERT INTO results (...) → n rows committed
passive_recon → Orchestrator:   {hits: n}
Orchestrator  → breach_lookup:  _load_module + run()
breach_lookup → SQLite:         INSERT INTO breach_hits (...)
breach_lookup → Orchestrator:   {hibp: n, dehashed: n}
Orchestrator  → profile_merge:  _load_module + run()
profile_merge → SQLite:         SELECT + write JSON
profile_merge → Orchestrator:   {profile: merged_path, ...}
Orchestrator  → :User:          Pipeline complete
Flowcharts
// Function-level flow · decision trees · module lifecycle

Module Execution Flow (every module)

run(dry_run)
  ├─ dry_run == True?  YES → print dry-run messages, count profiles, return
  ├─ credentials ready? NO → print [!] warning → return {}
  ├─ init_db(DB_PATH) → open/create SQLite, ensure tables exist
  ├─ FOR each profile in load_profiles(PROFILES_DIR):
  │    build_queries(profile_vars) → list of (query_str, name, loc)
  │      → skip if name fields empty
  │    call_api(query) → paginate → normalize → _store(conn, ...) → hits
  │      → HTTPError caught per query, logs warning, continues loop
  └─ return summary dict

Profile Validation Flow (run.py --validate-profiles)

validate_profiles(profiles_dir)
  ├─ isfile(path) or not isdir(path)? YES → [!] path is file, not dir → print correction hint, exit 1
  ├─ FOR each *.json file in directory:
  │    valid JSON + required fields?
  │      NO  → [FAIL] print error, errors += 1
  │      YES → [ ok ] print name
  └─ return errors count
Branch Strategy
// Git workflow · phase branches · module branches · merge rules
main:     v0.1 foundation → v0.2 install fix → GATE Phase 1 → v1.0
develop:  integration branch, cut from main
  feat/phase-1a-username-email:  maigret wrapper · holehe wrapper · schema + tests → merge 1a
  feat/phase-1b-dns-domain:      dnspython · harvester → merge 1b
  feat/phase-1c-geo-exif:        piexif · ipinfo
  feat/phase-1d-spacy-ner
hotfix/venv-import-fix:  hotfix → merged to main + develop
Legend: main = release · develop = integration · feat/* = feature branch · hotfix · validation gate (QA sign-off required)
Branch Naming Convention
main                        production releases
develop                     integration branch
feat/phase-{N}{x}-{desc}    feature work
  feat/phase-1a-maigret-wrapper
  feat/phase-2-correlation-engine
  feat/phase-3-pdf-report

hotfix/{issue-desc}         production bug fixes
  hotfix/venv-import-fix

chore/{desc}                non-feature work
  chore/update-requirements
  chore/blueprint-update
Merge Rules
  • feat/* → develop requires: LD review + QA gate pass
  • develop → main requires: SA review + full regression suite
  • hotfix/* → main + develop simultaneously
  • No direct commits to main or develop
  • Every merge includes updated BLUEPRINT.md
  • Every merge updates install_modules.py if modules changed
  • Squash merges for feat/* — preserves clean develop history
Parallel Branch Coordination — Phase 1

Phase 1 has four sub-branches that can run in parallel after the develop branch is established. The SM issues a SNAP (context snapshot) at the start of each sub-phase so developers branching from the same develop commit have identical context. Merge order is gated — 1a must merge before 1c can begin (email_service_hits table is a dependency of the geo correlation in 1c).

Branch          Owner  Depends On                                  Can Parallel With
feat/phase-1a   BD     develop base                                feat/phase-1b
feat/phase-1b   BD     develop base                                feat/phase-1a
feat/phase-1c   BD     feat/phase-1a merged                        feat/phase-1b
feat/phase-1d   BD     feat/phase-1a merged (needs results table)  feat/phase-1b, 1c
Phase 1 — Deep Dive
// Sub-phase 1a · Maigret + Holehe · schema · module stubs

1a Module File Structure

modules/
├── username_enum/
│   ├── __init__.py
│   ├── maigret_wrapper.py      ← BD implements, LD reviews
│   └── holehe_wrapper.py       ← BD implements, LD reviews
├── dns_intel/
│   ├── __init__.py
│   ├── dnspython_wrapper.py    ← Phase 1b
│   └── harvester_wrapper.py   ← Phase 1b
├── geo_intel/
│   ├── __init__.py
│   ├── exif_extractor.py      ← Phase 1c
│   └── ip_locator.py          ← Phase 1c
└── nlp_intel/
    ├── __init__.py
    └── spacy_ner.py           ← Phase 1d

Phase 1a — New SQLite Tables (SA-approved schema)

CREATE TABLE IF NOT EXISTS username_hits (
    id           INTEGER PRIMARY KEY,
    profile_file TEXT,
    username     TEXT,
    site         TEXT,
    url          TEXT,
    status       TEXT,  -- found|not_found|error
    response_ms  INTEGER,
    raw_json     TEXT,
    retrieved_at TEXT
);
CREATE TABLE IF NOT EXISTS email_service_hits (
    id           INTEGER PRIMARY KEY,
    profile_file TEXT,
    email        TEXT,
    service      TEXT,
    domain       TEXT,
    registered   INTEGER, -- 1=yes 0=no
    rate_limited INTEGER, -- 1=yes
    raw_json     TEXT,
    retrieved_at TEXT
);
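
The stubs below reference an _init_db() helper that is not shown. A minimal sketch applying the SA-approved DDL above — CREATE TABLE IF NOT EXISTS keeps the change additive, per the regression rules:

```python
import sqlite3

# DDL copied verbatim from the SA-approved Phase 1a schema
PHASE_1A_DDL = """
CREATE TABLE IF NOT EXISTS username_hits (
    id           INTEGER PRIMARY KEY,
    profile_file TEXT,
    username     TEXT,
    site         TEXT,
    url          TEXT,
    status       TEXT,
    response_ms  INTEGER,
    raw_json     TEXT,
    retrieved_at TEXT
);
CREATE TABLE IF NOT EXISTS email_service_hits (
    id           INTEGER PRIMARY KEY,
    profile_file TEXT,
    email        TEXT,
    service      TEXT,
    domain       TEXT,
    registered   INTEGER,
    rate_limited INTEGER,
    raw_json     TEXT,
    retrieved_at TEXT
);
"""


def init_db(db_path: str) -> sqlite3.Connection:
    """Open (or create) the SQLite store and ensure Phase 1a tables exist."""
    conn = sqlite3.connect(db_path)
    conn.executescript(PHASE_1A_DDL)  # idempotent: safe to run on every start
    return conn
```
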

maigret_wrapper.py — Stub with Pseudocode

MODULE_META = {
    "name": "maigret_wrapper",
    "version": "1.0.0",
    "phase": "recon_discovery",
    "requires": ["maigret>=0.4.4"],
    "license": "MIT",
    "input_tables": [],
    "output_tables": ["username_hits"],
}

FUNCTION validate() → bool:
    // Check maigret is importable
    TRY import maigret → return True
    EXCEPT ImportError → print install hint → return False

FUNCTION _run_maigret(username: str) → list[dict]:
    // Option A: subprocess (most reliable across maigret versions)
    result = subprocess.run(
        ["maigret", username, "--json", "-"],
        capture_output=True, timeout=120
    )
    PARSE result.stdout as JSON → extract site hits
    RETURN [{site, url, status, response_ms, raw}]
    // Option B: direct library call (faster, version-dependent)
    // from maigret import MaigretSite — use if API is stable

FUNCTION _normalize(username, raw_hits, profile_file) → list[tuple]:
    FOR EACH hit in raw_hits:
        status = "found" if hit.status == "Claimed" else "not_found"
        YIELD (profile_file, username, hit.site, hit.url, status,
               hit.response_ms, json.dumps(hit.raw), now_utc())

FUNCTION run(dry_run=False) → dict:
    IF dry_run:
        validate() → print status → RETURN {"dry_run": True}
    IF NOT validate(): RETURN {}
    conn = _init_db(DB_PATH)  // creates username_hits if not exists
    summary = {}
    FOR EACH profile_file, profile in load_profiles(PROFILES_DIR):
        usernames = _extract_usernames(profile)  // profile.username + aliases
        hits = 0
        FOR EACH username in usernames:
            raw = _run_maigret(username)
            rows = _normalize(username, raw, profile_file)
            conn.executemany(INSERT_SQL, rows)
            hits += len(rows)
            sleep(WAIT)
        summary[basename(profile_file)] = hits
    conn.close()
    RETURN summary
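
The _normalize step is a pure function, so it can be written and tested without Maigret installed. The field names (site, url, status, response_ms) and the "Claimed" status string follow the stub and are assumptions about Maigret's JSON shape:

```python
import json
from datetime import datetime, timezone


def now_utc() -> str:
    """ISO-8601 UTC timestamp for the retrieved_at column."""
    return datetime.now(timezone.utc).isoformat()


def normalize_hits(username: str, raw_hits: list, profile_file: str) -> list:
    """Map raw Maigret site results onto username_hits row tuples.

    Assumption from the stub: claimed accounts arrive with status "Claimed".
    """
    rows = []
    for hit in raw_hits:
        status = "found" if hit.get("status") == "Claimed" else "not_found"
        rows.append((
            profile_file,
            username,
            hit.get("site"),
            hit.get("url"),
            status,
            hit.get("response_ms"),
            json.dumps(hit),  # keep the full raw record for traceability
            now_utc(),
        ))
    return rows
```
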

holehe_wrapper.py — Stub

FUNCTION _run_holehe(email: str) → list[dict]:
    // Holehe exposes a clean async Python API
    import holehe.core as holehe_core
    results = asyncio.run(holehe_core.holehe_check_email(email))
    RETURN [{service, domain, registered: bool, rate_limited: bool, raw}]

FUNCTION run(dry_run=False) → dict:
    // same contract as all modules
    IF dry_run → validate, count profiles, RETURN {dry_run: True}
    FOR EACH profile:
        emails = _extract_emails(profile)
        FOR EACH email:
            hits = _run_holehe(email)
            _store_email_hits(conn, profile_file, email, hits)
    RETURN summary
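
The asyncio concern flagged in the handoff (holehe called from inside an already-running event loop) can be handled with a guard. This sketch uses a stand-in coroutine rather than holehe's real API, which varies by version:

```python
import asyncio
import concurrent.futures


def run_coro_safely(coro):
    """Run an async check from synchronous module code.

    asyncio.run() raises RuntimeError if a loop is already running (e.g. a
    future FastAPI handler); in that case, run the coroutine on a worker
    thread that owns its own fresh event loop.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return asyncio.run(coro)  # normal CLI path: no loop running
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def fake_check_email(email: str) -> list:
    """Stand-in for holehe's per-service checks (hypothetical result shape)."""
    return [{"service": "example", "domain": "example.com",
             "registered": email.endswith("@example.com"),
             "rate_limited": False}]
```
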

Phase 1 — Regression Test Requirements

tests/test_phase1.py — QA Owns
class TestMaigretWrapper:
    test_dry_run_returns_dict()
    test_validate_checks_import()
    test_normalize_maps_status()      # "Claimed" → "found"
    test_store_inserts_to_db()         # uses tmp sqlite fixture
    test_run_skips_on_no_usernames()   # empty profile
    test_run_handles_subprocess_error() # maigret not installed

class TestHolehe:
    test_dry_run_returns_dict()
    test_store_registered_flag()
    test_rate_limited_flag_stored()

class TestSchemaPhase1:
    test_username_hits_table_exists()
    test_email_service_hits_table_exists()
    test_schema_is_additive()           # phase 0 tables unchanged

module_registry.json — Phase 1a Additions

{
  "name": "maigret_wrapper",
  "path": "modules/username_enum/maigret_wrapper.py",
  "phase": "recon_discovery",
  "enabled": true,
  "description": "Username enumeration via Maigret across 3000+ sites",
  "requires": ["maigret>=0.4.4"],
  "output_tables": ["username_hits"]
},
{
  "name": "holehe_wrapper",
  "path": "modules/username_enum/holehe_wrapper.py",
  "phase": "recon_discovery",
  "enabled": true,
  "description": "Email service registration check via Holehe (120+ services)",
  "requires": ["holehe>=1.6.1"],
  "output_tables": ["email_service_hits"]
}
Handoff Schema
// Role-to-role transfer protocol · signal tags · HANDOFF.md format
Signal Tag Reference
REF:SNAP-xxx Reference a context snapshot
BLOCK:reason Work cannot proceed — escalate
DONE:artifact Artifact complete and tested
GATE:phase-N Validation gate — QA must sign off
MERGE:branch Ready to merge into develop
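
The five tags are regular enough to extract mechanically, e.g. when the SM scans commits and handoff docs for the signal registry. A small sketch — the allowed tag-value character set is an assumption:

```python
import re

# Matches the five handoff signal tags: REF, BLOCK, DONE, GATE, MERGE
SIGNAL_RE = re.compile(r"\b(REF|BLOCK|DONE|GATE|MERGE):([\w./-]+)")


def extract_signals(text: str) -> list:
    """Pull (tag, value) pairs out of a commit message or HANDOFF.md body."""
    return SIGNAL_RE.findall(text)
```
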

HANDOFF.md Template

FROM:          @backend (BD)
TO:            @lead (LD)
ID:            HANDOFF-2026-P1A-001
PHASE:         1a — username_enum modules
BRANCH:        feat/phase-1a-username-email
SNAPSHOT REF:  REF:SNAP-P1A-001
STATUS:        DONE:maigret_wrapper.py DONE:holehe_wrapper.py

WHAT WAS DONE:
  - modules/username_enum/maigret_wrapper.py — full implementation, run() contract
  - modules/username_enum/holehe_wrapper.py  — full implementation, run() contract
  - modules/username_enum/__init__.py        — package init
  - tests/test_phase1a.py                   — 12 tests, all passing
  - username_hits + email_service_hits tables created via _init_db()

WHAT LD NEEDS TO DO:
  1. Review modules/username_enum/*.py for contract compliance
  2. Add entries to module_registry.json (schema in SNAP-P1A-001)
  3. Update install_modules.py to include new files
  4. Run: python run.py --list-modules → verify [ready]
  5. Approve PR for merge into develop

KNOWN ISSUES:
  - Maigret subprocess timeout: currently hardcoded 120s — LD decision: make configurable via env?
  - Holehe asyncio: may conflict with existing event loops in future async contexts — flagged

BLOCKS:       None
GATE NEEDED:  GATE:phase-1a — QA must run test suite before LD approves PR

FILES CHANGED:
  modules/username_enum/maigret_wrapper.py   NEW
  modules/username_enum/holehe_wrapper.py    NEW
  modules/username_enum/__init__.py          NEW
  tests/test_phase1a.py                     NEW
  requirements.txt                          MODIFIED (added maigret, holehe pins)
    

Handoff Flows by Role Pair

SA → BD
Delivers: Table schema SQL, architecture decisions, module interface spec
Format: ADR-xxx.md in docs/adr/ + updated BLUEPRINT.md section
Trigger: Phase kickoff or schema change request
Signal: REF:ADR-xxx in all related files
BD → QA
Delivers: Implemented module + unit test stubs + test fixtures
Format: PR description with HANDOFF.md block
Trigger: Module passes BD's own smoke test
Signal: DONE:module_name GATE:phase-Nx
QA → LD
Delivers: Test results report + coverage % + security scan output
Format: QA-REPORT-xxx.md in tests/reports/
Trigger: All tests pass + bandit clean
Signal: MERGE:feat/phase-Nx
LD → SM
Delivers: Merged feature + updated registry + install_modules.py
Format: Merge commit message following convention
Trigger: PR approved and merged to develop
Signal: SM issues new SNAP snapshot for next phase
Context Snapshots
// Compressed history · calibration artifacts · multi-role alignment
What a Snapshot Is

A context snapshot (SNAP-xxx) is a machine-readable + human-readable document issued by the SM at every significant handoff point. Its purpose is to give any team member — or a new LLM context — enough compressed history to resume work correctly without reading the entire project history. Every snapshot is tagged, versioned, and stored in docs/snapshots/.

SNAP-P0-FINAL issued: 2026-03-30 · @scrum · closing Phase 0
PROJECT: mod-osint · OSINT orchestration platform
PHASE_COMPLETE: 0 — Foundation
PHASE_ACTIVE: 1a — username/email recon wrappers
REPO_ROOT: ~/mod-osint/
PYTHON: 3.9+ · venv · miniconda confirmed working
CANONICAL_STRUCTURE:
run.py · orchestrator.py · module_registry.json · requirements.txt
.osint_keys.example · BLUEPRINT.md · README.md · install_modules.py
modules/__init__.py
modules/profile_intel/__init__.py
modules/profile_intel/passive_identity_recon.py DONE
modules/profile_intel/breach_data_lookup.py DONE
modules/profile_intel/profile_merge.py DONE
osint_profiles/example.json DONE
SQLITE_SCHEMA_V1: results · breach_hits (both exist after first run)
MODULE_CONTRACT: run(dry_run: bool = False) → dict · MUST NOT hard-exit · MUST load config from env
LOADER: importlib.util.spec_from_file_location — venv-safe — NO sys.path games
KEY_DECISIONS:
ADR-001: modules/ (lowercase) — PEP8, case-sensitive FS safe
ADR-002: File-based module loading — eliminates venv import failures
ADR-003: Wrapper-first strategy — OSS tools over original reimplementation
ADR-004: Soft failures — modules warn+skip, never hard-exit pipeline
KNOWN_DEBT: api/auth.py · api/rbac.py — hardcoded secrets (Phase 4 fix)
REGRESSION_GATE: python run.py --dry-run exits 0 · always · on every commit
NEXT_TASK: BD: implement modules/username_enum/maigret_wrapper.py per ADR-003
REF: REF:SNAP-P0-FINAL in all Phase 1a work
SNAP-P1A-001 issued: [sprint 2 kickoff] · @scrum · Phase 1a start
INHERITS: REF:SNAP-P0-FINAL — all prior context applies
SPRINT_GOAL: Two working OSS wrappers in recon_discovery phase with schema + tests
BRANCH: feat/phase-1a-username-email (cut from develop)
NEW_TABLES_THIS_SPRINT:
username_hits (id, profile_file, username, site, url, status, response_ms, raw_json, retrieved_at)
email_service_hits (id, profile_file, email, service, domain, registered, rate_limited, raw_json, retrieved_at)
TOOLS_TO_WRAP:
maigret>=0.4.4 — subprocess preferred (json output flag) — fallback: library
holehe>=1.6.1 — async library call — asyncio.run() wrapper
ASSIGNMENTS:
BD → implement both wrappers · write unit test stubs
QA → write integration tests · fixture: tmp sqlite · fixture: mock subprocess
LD → review PRs · update registry + install_modules
SA → review schema before BD starts (no schema changes after BD starts)
SM → issue SNAP-P1A-002 on merge to develop
GATE_CRITERIA:
python run.py --list-modules → maigret_wrapper [ready]
python run.py --list-modules → holehe_wrapper [ready]
python run.py --dry-run → phase recon_discovery has 2 modules, both complete
pytest tests/test_phase1a.py -v → all pass
bandit modules/username_enum/ → no HIGH findings
BLOCKS: None at sprint start
SIGNAL: Use REF:SNAP-P1A-001 in all commits, PRs, and handoff docs this sprint

Snapshot Naming Convention

SNAP-{PHASE}-{SEQUENCE}

SNAP-P0-FINAL       Phase 0 closing snapshot
SNAP-P1A-001        Phase 1a sprint 1 kickoff
SNAP-P1A-002        Phase 1a closing (post-merge)
SNAP-P1B-001        Phase 1b kickoff
SNAP-P2-001         Phase 2 kickoff
SNAP-HOTFIX-001     Emergency hotfix context

docs/snapshots/
├── SNAP-P0-FINAL.md
├── SNAP-P1A-001.md
├── SNAP-P1A-002.md       ← SM writes this after 1a merges
└── ...

Every snapshot references its parent:
  INHERITS: REF:SNAP-P0-FINAL
Using Snapshots for LLM Context Resumption

When development requires LLM assistance across session boundaries, paste the most recent SNAP document as the first message. The snapshot provides: current phase, completed artifacts, active branch, module contract, key decisions (ADRs), known debt, and next task. This replaces the need to re-read the entire conversation history and prevents regression to stale assumptions.

// Meta-prompt template for LLM context resumption:

"You are continuing development on mod-osint.
Current context snapshot: [paste SNAP-P1A-001]
Task: Implement modules/username_enum/maigret_wrapper.py
Contract: run(dry_run=False) → dict, no hard exits, env config only
Schema: [paste username_hits CREATE TABLE]
Style: match existing modules in modules/profile_intel/
Reference implementation: modules/profile_intel/passive_identity_recon.py"
Pseudocode & Stubs
// Phase 2 correlation engine · Phase 3 report gen · Phase 4 API

Phase 2 — Correlation Engine (entity_resolver.py)

CLASS EntityResolver:
    // Reads all intelligence tables, builds canonical subject records
    FUNCTION resolve_all(conn) → list[Subject]:
        // Step 1: Collect all unique identifiers across all source tables
        identifiers = {}
        identifiers["email"]    = SELECT DISTINCT query_value FROM breach_hits WHERE query_type='email'
        identifiers["username"] = SELECT DISTINCT username FROM username_hits WHERE status='found'
        identifiers["name"]     = SELECT DISTINCT name FROM results WHERE name IS NOT NULL
        identifiers["phone"]    = SELECT DISTINCT query_value FROM breach_hits WHERE query_type='phone'
        identifiers["ip"]       = SELECT DISTINCT ip FROM ip_geo_hits
        // Step 2: Group identifiers by profile_file (known subject)
        by_profile = GROUP all identifiers BY profile_file
        // Step 3: For each profile, create/update a Subject record
        subjects = []
        FOR EACH profile_file, id_set in by_profile:
            subject = get_or_create_subject(conn, profile_file)
            upsert_identifiers(conn, subject.id, id_set)
            subject.confidence = ConfidenceScorer.score(subject, id_set)
            subjects.append(subject)
        // Step 4: Cross-subject linking (same email in two profiles?)
        detect_cross_profile_links(conn, subjects)
        RETURN subjects

CLASS ConfidenceScorer:
    WEIGHTS = {
        "email_exact":    1.0,  // exact email match = strongest signal
        "username_found": 0.7,
        "name_exact":     0.6,
        "name_fuzzy":     0.3,
        "phone_exact":    0.9,
        "ip_corroborate": 0.4,
        "corroboration":  0.1,  // +0.1 per additional source confirming
    }
    FUNCTION score(subject, id_set) → float:
        score = 0.0
        sources_confirming = COUNT DISTINCT source_tables for this subject
        score += WEIGHTS[each matched id type]
        score += WEIGHTS["corroboration"] * (sources_confirming - 1)
        RETURN min(1.0, score)  // clamp to [0.0, 1.0]
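
The scorer's arithmetic is concrete enough to implement directly. This sketch takes the matched identifier types and the corroborating-source count as plain arguments rather than a Subject object; the weights are copied from the pseudocode:

```python
class ConfidenceScorer:
    """Sketch of the Phase 2 confidence scorer."""

    WEIGHTS = {
        "email_exact": 1.0,      # exact email match = strongest signal
        "username_found": 0.7,
        "name_exact": 0.6,
        "name_fuzzy": 0.3,
        "phone_exact": 0.9,
        "ip_corroborate": 0.4,
        "corroboration": 0.1,    # per additional corroborating source table
    }

    @classmethod
    def score(cls, matched_id_types: set, sources_confirming: int) -> float:
        """Sum per-identifier weights, add a corroboration bonus, clamp to [0, 1]."""
        total = sum(cls.WEIGHTS[t] for t in matched_id_types if t in cls.WEIGHTS)
        total += cls.WEIGHTS["corroboration"] * max(0, sources_confirming - 1)
        return min(1.0, total)
```
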

Phase 3 — HTML Report Generator (html_report.py)

FUNCTION generate_html_report(subject_id: int, output_path: str) → str:
    // Fetch all data for this subject
    subject     = fetch_subject(conn, subject_id)
    recon_hits  = fetch_recon(conn, subject.profile_file)
    breach_hits = fetch_breaches(conn, subject.profile_file)
    usernames   = fetch_username_hits(conn, subject.profile_file)
    timeline    = fetch_timeline(conn, subject_id)
    geo_points  = fetch_geo(conn, subject.profile_file)
    // Load Jinja2 template
    env = jinja2.Environment(loader=FileSystemLoader("modules/report_gen/templates"))
    template = env.get_template("report.html.j2")
    // Render with context
    html = template.render(
        subject=subject, recon_hits=recon_hits, breach_hits=breach_hits,
        usernames=usernames, timeline=timeline, geo_points=geo_points,
        generated_at=utcnow(), confidence_pct=int(subject.confidence * 100)
    )
    WRITE html to output_path
    RETURN output_path

FUNCTION generate_pdf(subject_id, output_path) → str:
    html_path = generate_html_report(subject_id, "/tmp/report_tmp.html")
    weasyprint.HTML(html_path).write_pdf(output_path)
    RETURN output_path
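
The render step can be demonstrated without Jinja2 or WeasyPrint installed. This dependency-free sketch swaps in stdlib string.Template purely for illustration — the real implementation uses Jinja2 templates as described above, and the markup here is hypothetical:

```python
from string import Template

# Stand-in for report.html.j2 — the real template is Jinja2
REPORT_TMPL = Template("""<html><body>
<h1>Subject: $name</h1>
<p>Confidence: $confidence_pct%</p>
<p>Generated: $generated_at</p>
</body></html>""")


def render_report_html(name: str, confidence: float, generated_at: str) -> str:
    """Render a minimal report page; confidence is stored as 0.0-1.0."""
    return REPORT_TMPL.substitute(
        name=name,
        confidence_pct=round(confidence * 100),  # display as a percentage
        generated_at=generated_at,
    )
```
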

Phase 4 — FastAPI Route Stubs (api/routes/profiles.py)

ROUTER = APIRouter(prefix="/api/v1/profiles", tags=["profiles"])

@ROUTER.post("/", response_model=ProfileResponse, status_code=201)
ASYNC FUNCTION create_profile(
    data: ProfileCreate,  // Pydantic model — validates all fields
    current_user = require_role("analyst")
) → ProfileResponse:
    validate_profile_data(data)
    path = save_profile_json(data, PROFILES_DIR)
    RETURN ProfileResponse(id=uuid(), path=path, status="created")

@ROUTER.post("/{profile_id}/run")
ASYNC FUNCTION trigger_run(
    profile_id: str,
    phases: list[str] = ["profile_intel"],
    current_user = require_role("analyst")
) → RunResponse:
    run_id = uuid()
    // Dispatch to background task — don't block the HTTP response
    background_tasks.add_task(execute_pipeline, run_id, profile_id, phases)
    RETURN RunResponse(run_id=run_id, status="queued")

@ROUTER.get("/{profile_id}/report")
ASYNC FUNCTION get_report(
    profile_id: str,
    fmt: str = "json",  // json | html | pdf | stix | csv
    current_user = require_role("viewer")
) → FileResponse | JSONResponse:
    MATCH fmt:
        "pdf"  → generate_pdf(profile_id)    → FileResponse
        "html" → generate_html(profile_id)   → FileResponse
        "stix" → generate_stix(profile_id)   → JSONResponse
        "csv"  → export_csv(profile_id)      → FileResponse
        _      → get_merged_json(profile_id) → JSONResponse

Module Contract Compliance Checker (Phase 6 — Extension Framework)

FUNCTION validate_module_contract(module_path: str) → ValidationResult:
    // Load module without executing run()
    mod = spec_from_file_location(name, path)
    checks = []
    // Check 1: has MODULE_META
    checks.append(hasattr(mod, "MODULE_META"))
    // Check 2: has run() with correct signature
    sig = inspect.signature(mod.run)
    checks.append("dry_run" in sig.parameters)
    // Check 3: run() return annotation is dict
    checks.append(sig.return_annotation == dict)
    // Check 4: has validate()
    checks.append(hasattr(mod, "validate"))
    // Check 5: has schema() if output_tables is non-empty
    has_outputs = len(mod.MODULE_META.get("output_tables", [])) > 0
    IF has_outputs:
        checks.append(hasattr(mod, "schema"))
    // Check 6: dry_run=True returns without side effects
    result = mod.run(dry_run=True)
    checks.append(isinstance(result, dict))
    checks.append(result.get("dry_run") == True)
    RETURN ValidationResult(passed=all(checks), checks=checks)