Packet Schemas
Pydantic models for all five packet types.
transcript_packet.v1
Primary audio intelligence output.
- segments[]: utterances with speaker labels
- asr_model: defaults to voxtral
- confidence: per-segment, in [0.0, 1.0]
- word_timestamps_available
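As a rough illustration of how these fields might map onto a Pydantic model, here is a minimal sketch. Only `segments`, `asr_model`, `confidence`, and `word_timestamps_available` come from the list above. The class names, the `speaker` and `text` fields, placing `confidence` on the segment, and the `False` default are assumptions made for this example, not the actual schema.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class TranscriptSegment(BaseModel):
    # `speaker` and `text` are hypothetical field names for illustration.
    speaker: str
    text: str
    # Per-segment confidence, constrained to the closed range [0.0, 1.0].
    confidence: float = Field(ge=0.0, le=1.0)


class TranscriptPacket(BaseModel):
    segments: List[TranscriptSegment] = Field(default_factory=list)
    # The source states that asr_model defaults to voxtral.
    asr_model: str = "voxtral"
    # Assumed default; the source does not state one.
    word_timestamps_available: bool = False


packet = TranscriptPacket(
    segments=[TranscriptSegment(speaker="S1", text="hello", confidence=0.92)]
)

# A confidence outside [0.0, 1.0] should fail validation.
try:
    TranscriptSegment(speaker="S1", text="hi", confidence=1.5)
    out_of_range_rejected = False
except ValidationError:
    out_of_range_rejected = True
```

Bounding `confidence` with `ge`/`le` means a bad score is rejected when the packet is constructed rather than surfacing later in a consumer.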
diarization_packet.v1
Speaker turn segmentation.
- segments[]: speaker turns
- overlap_events[]: simultaneous speakers
- confidence: per-turn, in [0.0, 1.0]
acoustic_packet.v1
Acoustic feature extraction.
- observations[]: pause, speech_rate, pitch_shift, etc.
- baseline_comparison: none / above / below / unknown
- interpretation_scope: contextual_signal_only
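The enumerated values above lend themselves to `typing.Literal` fields. The sketch below shows one possible encoding. The class names, the `kind` field, attaching `baseline_comparison` to each observation, and the `"unknown"` default are assumptions for illustration; the real models may be organized differently.

```python
from typing import List, Literal

from pydantic import BaseModel, Field, ValidationError


class AcousticObservation(BaseModel):
    # Hypothetical field name. Kept as a plain str because the source
    # list of observation kinds ends with "etc." and is not exhaustive.
    kind: str
    # Allowed values are taken from the list above; the default is assumed.
    baseline_comparison: Literal["none", "above", "below", "unknown"] = "unknown"


class AcousticPacket(BaseModel):
    observations: List[AcousticObservation] = Field(default_factory=list)
    # A single-value Literal pins the scope so it cannot be widened.
    interpretation_scope: Literal["contextual_signal_only"] = "contextual_signal_only"


acoustic = AcousticPacket(
    observations=[AcousticObservation(kind="pause", baseline_comparison="above")]
)

# A value outside the enumerated set should fail validation.
try:
    AcousticObservation(kind="pause", baseline_comparison="higher")
    bad_value_rejected = False
except ValidationError:
    bad_value_rejected = True
```

The same single-value `Literal` pattern would apply to `interpretation_scope` on the visual packet, with `observable_behavior_only` as the only allowed value.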
visual_packet.v1
Visual behavior observation.
- observations[]: posture, gaze, gesture, etc.
- claim: observable behavior claim
- interpretation_scope: observable_behavior_only
merged_evidence_packet.v1
Unified output — no interpretation.
- timeline[]: modality-sorted events
- subjects[]: speaker-to-subject mapping
- merge_metadata.validation_status: passed / passed_with_warnings / failed
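The dotted name `merge_metadata.validation_status` suggests a nested model. A minimal sketch of that nesting follows. Every class name and all fields other than `timeline`, `subjects`, `merge_metadata`, and `validation_status` are hypothetical, chosen only to make the example self-contained.

```python
from typing import List, Literal

from pydantic import BaseModel, Field


class TimelineEvent(BaseModel):
    # Hypothetical fields; the source only says events are modality-sorted.
    modality: str
    start_s: float


class SubjectMapping(BaseModel):
    # Hypothetical field names for the speaker-to-subject mapping.
    speaker_id: str
    subject_id: str


class MergeMetadata(BaseModel):
    # Allowed values are taken from the list above.
    validation_status: Literal["passed", "passed_with_warnings", "failed"]


class MergedEvidencePacket(BaseModel):
    timeline: List[TimelineEvent] = Field(default_factory=list)
    subjects: List[SubjectMapping] = Field(default_factory=list)
    merge_metadata: MergeMetadata


# Nested dicts are coerced into the nested models during validation.
merged = MergedEvidencePacket(
    timeline=[{"modality": "audio", "start_s": 0.0}],
    subjects=[{"speaker_id": "S1", "subject_id": "subject_a"}],
    merge_metadata={"validation_status": "passed_with_warnings"},
)

# A consumer might gate on the status before using the packet.
usable = merged.merge_metadata.validation_status != "failed"
```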