{
  "version": "20260604_core_validation_v45",
  "scope": "Human curator-verified ASO/siRNA toxicity and off-target evidence with source-localized provenance. The v1 keyword-classifier pre-curation (2003 candidates; measured 0.73 false-accept rate, Wilson 95% CI [0.63, 0.81], n=126) was independently re-curated over source passages by a human curator (v2 LLM-assisted); 737 rows are human curator-verified (626 toxicity + 111 off-target) and 1345 unsupported machine candidates were demoted to machine_precurated_v1. primary-curator release with a blinded second curator (HY): Cohen κ_binary = 0.42 (moderate, Landis-Koch) under the drop-abstain convention (n=92 non-abstain) and 0.34 (fair) under the safety-conservative collapse-abstain convention (n=100), raw agreement 66% (66/100), on a 100-row mixed accept+reject sample (KAPPA-2; full analysis in Methods Stage 3; A10 third-adjudicator pass in progress). Not a clinical recommendation engine and not a de novo sequence-risk predictor. v6.1 (post-EXPAND-2 toxicity round) added a total of 80 curator-verified records (48 from EXPAND-1 + 32 from EXPAND-2 toxicity-focused round) and recovered real molecule identities for 110 of 143 v1-extraction-artefact placeholders, reducing benchmark contamination from 31.1% to 4.1%.",
  "curator_of_record": {
    "curator_id": "ni_jie",
    "name": "Ni Jie",
    "affiliation": "University of Innsbruck, Digital Science Center, Innsbruck, Austria",
    "role": "primary human curator (full release)",
    "method": "AI-assisted human curation: the curator adjudicated v2 LLM proposals against the source passages (not blind) and recorded the final accept/reject decision row by row.",
    "single_curator_note": "All human curator-verified release rows were adjudicated by this primary curator. A blinded second curator (HY, no manuscript exposure) independently re-scored a 100-row mixed inter-rater sample (KAPPA-2): Cohen κ_binary = 0.42 (moderate, drop-abstain, n=92) / 0.34 (fair, collapse-abstain, n=100), raw agreement 66% (66/100), 3-class κ = 0.37, grade κ = 0.39. Inter-curator Cohen kappa IS therefore claimed. HY was systematically stricter (92% reject-confirm, 40% accept-confirm) — conservative direction for a safety database. An A10 third-adjudicator consensus pass is in progress and will further refine kappa."
  },
  "release_gate": {
    "release_records": 737,
    "curator_verified_accept_audits": 737,
    "all_release_records_have_verified_accept_audit": true,
    "toxicity_release_records": 626,
    "offtarget_release_records": 111,
    "candidate_records": 41114,
    "curator_rejected_candidate_audits": 28908,
    "candidate_pending_records": 10123,
    "promotion_rule": "Only accept rows with curator_decision=accept, validation_status=curator_verified, evidence_grade in A/B/C, and source_location are promoted into release evidence tables."
  },
  "provenance_coverage": {
    "source_location": {
      "filled": 737,
      "total": 737,
      "pct": 100.0
    },
    "pmid": {
      "filled": 737,
      "total": 737,
      "pct": 100.0
    },
    "doi": {
      "filled": 733,
      "total": 737,
      "pct": 99.5
    },
    "any_sequence": {
      "filled": 0,
      "total": 737,
      "pct": 0.0
    },
    "any_chemistry_or_delivery": {
      "filled": 0,
      "total": 737,
      "pct": 0.0
    }
  },
  "evidence_grade_policy": [
    {
      "grade": "A",
      "meaning": "High-confidence source-localized experimental evidence suitable for citation when the claim matches the endpoint.",
      "benchmark_use": "eligible"
    },
    {
      "grade": "B",
      "meaning": "Curator-verified evidence with usable source localization but lower specificity, smaller sample support, or broader context.",
      "benchmark_use": "eligible"
    },
    {
      "grade": "C",
      "meaning": "Contextual curator-verified evidence retained for browsing and provenance, not used in A/B benchmark splits.",
      "benchmark_use": "not eligible"
    }
  ],
  "release_audit_by_domain": [
    {
      "entity_table": "offtarget_evidence",
      "evidence_grade": "A",
      "n": 33
    },
    {
      "entity_table": "offtarget_evidence",
      "evidence_grade": "B",
      "n": 65
    },
    {
      "entity_table": "offtarget_evidence",
      "evidence_grade": "C",
      "n": 13
    },
    {
      "entity_table": "toxicity_endpoint",
      "evidence_grade": "A",
      "n": 200
    },
    {
      "entity_table": "toxicity_endpoint",
      "evidence_grade": "B",
      "n": 210
    },
    {
      "entity_table": "toxicity_endpoint",
      "evidence_grade": "C",
      "n": 216
    }
  ],
  "audit_method_summary": [
    {
      "extraction_method": "curator_review_v1",
      "extractor_model_or_script": "promote_curator_review.py",
      "validation_status": "curator_rejected",
      "curator_decision": "reject",
      "n": 28275
    },
    {
      "extraction_method": "curator_review_v1",
      "extractor_model_or_script": "promote_curator_review.py",
      "validation_status": "machine_precurated_v1",
      "curator_decision": "accept",
      "n": 1983
    },
    {
      "extraction_method": "human_recuration_v2",
      "extractor_model_or_script": "apply_recuration_verdicts.py",
      "validation_status": "recurated_rejected",
      "curator_decision": "reject",
      "n": 1345
    },
    {
      "extraction_method": "human_recuration_v2",
      "extractor_model_or_script": "apply_recuration_verdicts.py",
      "validation_status": "curator_verified",
      "curator_decision": "accept",
      "n": 657
    },
    {
      "extraction_method": "human_curator_expand",
      "extractor_model_or_script": "apply_script.py",
      "validation_status": "curator_rejected",
      "curator_decision": "reject",
      "n": 633
    },
    {
      "extraction_method": "human_curator_expand",
      "extractor_model_or_script": "apply_script.py",
      "validation_status": "curator_verified",
      "curator_decision": "accept",
      "n": 80
    },
    {
      "extraction_method": "manual",
      "extractor_model_or_script": "none",
      "validation_status": "verified",
      "curator_decision": "linkout_only",
      "n": 2
    },
    {
      "extraction_method": "human_curator_expand",
      "extractor_model_or_script": "apply_script.py",
      "validation_status": "curator_abstained",
      "curator_decision": "abstain",
      "n": 1
    },
    {
      "extraction_method": "manual",
      "extractor_model_or_script": "none",
      "validation_status": "verified",
      "curator_decision": "accept",
      "n": 1
    }
  ],
  "source_identifier_coverage": {
    "source_documents": 36245,
    "with_pmid": 36241,
    "with_doi": 35236,
    "with_pmcid": 16234,
    "source_license_manifest_rows": 36245
  },
  "license_summary": [
    {
      "license_status": "abstract_metadata_only",
      "reuse_category": "derived_annotations_only",
      "n": 36238
    },
    {
      "license_status": "open_access",
      "reuse_category": "query_linkout_only",
      "n": 3
    },
    {
      "license_status": "cc_by_nc_nd",
      "reuse_category": "query_linkout_only",
      "n": 1
    },
    {
      "license_status": "official_guideline",
      "reuse_category": "derived_annotations_only",
      "n": 1
    },
    {
      "license_status": "official_notice",
      "reuse_category": "derived_annotations_only",
      "n": 1
    },
    {
      "license_status": "open_access",
      "reuse_category": "derived_annotations_only",
      "n": 1
    }
  ],
  "independent_validation": {
    "claim_status": "not_claimable",
    "sample": {
      "sample_rows": 500,
      "reviewed_rows": 0,
      "comparable_rows": 0,
      "release_accept_rows": 250,
      "candidate_reject_control_rows": 250,
      "completion_pct": 0.0
    },
    "metrics": {
      "raw_agreement": null,
      "cohen_kappa": null,
      "false_accept_rate_release_rows": null,
      "false_reject_rate_reject_controls": null,
      "source_location_disagreement_rows": 0
    },
    "claim_boundary": "Cohen kappa is now claimed from the completed 100-row mixed inter-rater study (KAPPA-2; see kappa_claimed below). This separate 500-row independent second-review packet underwrites the release-row false-accept / false-reject ERROR-RATE estimates, which remain pending until reviewer2_decision is filled and adjudicated; the live agreement/kappa fields in this endpoint are computed from that 500-row packet and are null until it is filled.",
    "downloads": {
      "independent_validation_template": "/api/download/independent_curation_validation_template.csv",
      "independent_validation_manifest": "/api/manifest/independent_curation_validation_template_v1.csv",
      "curation_audit": "/api/download/curation_audit.csv"
    }
  },
  "core_oligo_field_status": {
    "claim_boundary": "Current release can be described as a provenance-rich safety/off-target evidence database. It must not be described as complete sequence/modification/dose coverage until P0 field curation is source-verified.",
    "summary": {
      "p0_benchmark_linked_rows": 344,
      "p1_grade_ab_nonbenchmark_rows": 149,
      "p2_contextual_grade_c_rows": 212,
      "p0_missing_sequence": 344,
      "p0_missing_modification": 344,
      "p0_missing_dose": 342,
      "assays_with_dose": 2,
      "assays_with_model_context": 1708
    },
    "blocking_gates": [
      {
        "gate": "Complete oligo identity claim",
        "status": "blocked",
        "evidence": "P0 missing sequence=344; P0 missing modification=344."
      },
      {
        "gate": "Dose-aware safety stratification",
        "status": "blocked",
        "evidence": "P0 missing dose=342; assay table currently has 2 dose-bearing assays."
      },
      {
        "gate": "No-fabrication policy",
        "status": "pass",
        "evidence": "Packet fields are blank unless source-located; generated rows are curation work items, not release claims."
      }
    ],
    "downloads": {
      "core_field_packet": "/api/download/core_oligo_field_curation_packet.csv",
      "core_field_packet_manifest": "/api/manifest/core_oligo_field_curation_packet_v1.csv",
      "legacy_sequence_template": "/api/download/sequence_modification_curation_template.csv",
      "field_completeness": "/api/field_completeness"
    }
  },
  "redistribution_policy": [
    {
      "level": "redistributable raw",
      "current_use": "Only when source terms explicitly allow it; not used for copyrighted article text."
    },
    {
      "level": "derived annotations only",
      "current_use": "Default release mode for PubMed/PMC-linked literature: matched terms, source locations, metadata, and curator decisions are redistributed; raw full text is not."
    },
    {
      "level": "query/link-out only",
      "current_use": "Used when source text or document reuse is restricted; OligoVigil links to PMID/DOI/PMCID/source URL."
    },
    {
      "level": "not safe",
      "current_use": "Excluded from release downloads until rights and provenance are resolved."
    }
  ],
  "known_limitations": [
    "Exact sequence and chemistry fields remain incomplete and should be described as an expansion track, not as complete sequence-alignment coverage.",
    "Dose/exposure fields are sparse and must not be used for dose-response safety claims until source-verified in the core oligo field packet.",
    "Inter-curator Cohen kappa is claimed from the completed 100-row KAPPA-2 study (0.42 drop-abstain / 0.34 collapse-abstain, raw agreement 66%); release-row false-accept / false-reject error-rate estimates remain not claimable until the separate 500-row second-review packet is completed and adjudicated, and the A10 third-adjudicator pass is still pending.",
    "Candidate records are curation work items and must not be cited as verified release evidence.",
    "External adoption, download, and citation evidence can only be claimed after public deployment."
  ],
  "reviewer_audit_actions": [
    {
      "action": "Open any release row from /api/evidence_records and inspect /api/evidence_detail.",
      "evidence": "Record payload includes source metadata, exact source location, grade rationale, audit status, and citation text."
    },
    {
      "action": "Download evidence_release.csv and curation_audit.csv.",
      "evidence": "Rows can be joined by entity_table/entity_id to reproduce verified-release status."
    },
    {
      "action": "Inspect license_manifest_v1.csv and source_document.csv.",
      "evidence": "Raw article text is not redistributed; source identifiers and derived annotations are exposed."
    },
    {
      "action": "Open the core oligo field packet and independent validation template.",
      "evidence": "The portal exposes sequence/modification/dose gaps and the second-review sampling frame instead of hiding curation uncertainty."
    }
  ],
  "downloads": {
    "evidence_release": "/api/download/evidence_release.csv",
    "curation_audit": "/api/download/curation_audit.csv",
    "license_manifest": "/api/manifest/license_manifest_v1.csv",
    "source_license_manifest": "/api/manifest/source_license_manifest_v1.csv",
    "sequence_template": "/api/download/sequence_modification_curation_template.csv",
    "core_oligo_field_packet": "/api/download/core_oligo_field_curation_packet.csv",
    "independent_validation_template": "/api/download/independent_curation_validation_template.csv",
    "all_tables": "/api/download/all_tables.zip"
  }
}