Five service families, five modalities. We author the goldens, rubrics, verifiers, and annotated datasets — not just labels — that AI teams use to train, evaluate, and benchmark their models.
Expand each to see what it covers and what you receive. Schemas shown alongside are illustrative examples — not client deliverables.
{
  "criterion": "Calls get_weather before answering",
  "weight": "mandatory",
  "hli": "Answer Correctness",
  "justification": "User asked for current temp; tool call required.",
  "evidence": "messages[3].toolCall.name == 'get_weather'",
  "verdict": "PASS"
}

{
  "meta_info": { "task": "personal_agent", "difficulty": "multi_app_complex" },
  "messages": [
    { "role": "user", "content": "Reschedule my 3pm and tell Sam." },
    { "role": "assistant", "thinking": "Need calendar + messaging tools…" },
    { "role": "assistant", "toolCall": { "name": "calendar.move", "args": {} } },
    { "role": "tool", "toolResult": { "status": "ok" } }
  ]
}

We craft and optimize prompts that improve response accuracy, reasoning, and interaction quality, spanning code-generation prompts, multi-turn conversation prompts, and instruction-tuning prompts. Each prompt is written to a clear output spec with explicit edge cases and acceptance criteria.
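To make the rubric and trajectory samples above more concrete, here is a minimal Python sketch of how a criterion of the form "Calls X before answering" might be checked mechanically against a trajectory. It is an illustration under stated assumptions, not a description of any delivered verifier: the function names `check_tool_called_before_answer` and `_record`, the `"FAIL"` verdict value, and the rule that an assistant message carrying `content` counts as the answer are all hypothetical. The human-authored `hli` and `justification` fields are left out because they cannot be derived from the trajectory alone.

```python
from typing import Any, Dict, List

# Trajectory copied from the illustrative schema above.
trajectory: Dict[str, Any] = {
    "meta_info": {"task": "personal_agent", "difficulty": "multi_app_complex"},
    "messages": [
        {"role": "user", "content": "Reschedule my 3pm and tell Sam."},
        {"role": "assistant", "thinking": "Need calendar + messaging tools…"},
        {"role": "assistant", "toolCall": {"name": "calendar.move", "args": {}}},
        {"role": "tool", "toolResult": {"status": "ok"}},
    ],
}


def _record(tool_name: str, evidence: str, verdict: str) -> Dict[str, str]:
    """Build a rubric-style record (hypothetical helper; field names follow the sample)."""
    return {
        "criterion": f"Calls {tool_name} before answering",
        "weight": "mandatory",
        "evidence": evidence,
        "verdict": verdict,  # "FAIL" is an assumed counterpart to the sample's "PASS"
    }


def check_tool_called_before_answer(
    messages: List[Dict[str, Any]], tool_name: str
) -> Dict[str, str]:
    """Pass if an assistant message calls tool_name before any assistant answer.

    Assumption: an assistant message with a "content" key is treated as the answer.
    """
    for index, message in enumerate(messages):
        if message.get("role") != "assistant":
            continue
        call = message.get("toolCall")
        if isinstance(call, dict) and call.get("name") == tool_name:
            evidence = f"messages[{index}].toolCall.name == '{tool_name}'"
            return _record(tool_name, evidence, "PASS")
        if "content" in message:
            evidence = f"messages[{index}] answers before any '{tool_name}' call"
            return _record(tool_name, evidence, "FAIL")
    return _record(tool_name, f"no toolCall named '{tool_name}' in messages", "FAIL")


result = check_tool_called_before_answer(trajectory["messages"], "calendar.move")
print(result["verdict"], result["evidence"])
# prints: PASS messages[2].toolCall.name == 'calendar.move'
```

Returning the evidence as a path into `messages` mirrors the `evidence` field in the sample rubric, so a reviewer can trace each verdict back to a specific turn.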
What you get
Where each capability applies across the five modalities we cover.
| Capability | Text | Code | Image | Video | Audio |
|---|---|---|---|---|---|
| **LLM Training & Alignment** | | | | | |
| Prompt engineering | · | · | · | | |
| SFT data | · | · | · | | |
| RLHF / preference data | · | · | · | | |
| Agent trajectories | · | · | · | | |
| **Evaluation & Benchmarking** | | | | | |
| Response evaluation | · | · | · | | |
| RLVR & verifiers | · | · | · | | |
| Rubrics & verifier authoring | · | · | | | |
| Benchmarking | · | · | | | |
| QA & data validation | · | | | | |
| Linguistic & content quality | · | · | · | · | |
| **Image & Video Annotation** | | | | | |
| Image & video annotation | · | · | · | | |
| Domain annotation | · | · | · | | |
| Brand & location annotation | · | · | · | | |
| **Audio, Speech & Transcription** | | | | | |
| Audio annotation & speech evaluation | · | · | · | | |
| Transcription services | · | · | | | |
| Voice assistants & conversational AI | · | · | · | | |
| **Multimodal & Generative** | | | | | |
| Multimodal evaluation | | | | | |
| Prompt-to-image & editing eval | · | · | · | | |
| Derendering & data collection | · | · | | | |
Two ways in — whether you have work to ship or want to contribute to it.