Research

Data quality makes all the difference.

We're driven by the conviction that model performance is fundamentally bounded by training data quality. Through collaboration with experts, rigorous curation methodologies, and deep domain knowledge, we research the datasets that will power tomorrow's models.

Research and Blog

How We Improved Terminal-Bench 2.0 Scores by Over 5x Using Tinker and Harbor

Michael E., Spencer M.·March 31, 2026

Technical Blog

IDE-Bench: Evaluating Large Language Models as IDE Agents on Real-World Software Engineering Tasks

Spencer M., Jeff Y., Tiana C. +3 more·January 20, 2026

Research Paper and Benchmark

Joining AfterQuery

Kishan G.·December 26, 2025

Blog

Market-Bench: Evaluating LLMs on Introductory Quantitative Trading

Abhay S., Sam J., Spencer M.·December 13, 2025

Research Paper and Benchmark

App-Bench: Evaluating Coding Agents on Generating Economically Useful Web-Apps

Andrew Z., Sam J., Spencer M.·October 25, 2025

Benchmark

The AfterQuery Thesis

Spencer M.·October 20, 2025

Blog

UI-Bench: A Benchmark for Evaluating User Interface Understanding

Sam J., Agustin G., Spencer M.·August 28, 2025

Research Paper and Benchmark

LeetBench: A Benchmark for Competitive Programming & Algorithmic Reasoning

Spencer M., Tiana C.·July 21, 2025

Benchmark

VADER: Vulnerability Assessment, Detection, Explanation, and Remediation

Ethan L., Carlos G., Spencer M. +2 more·May 26, 2025

Research Paper and Benchmark

FinanceQA: A Question Answering Benchmark for Financial Data

Spencer M., Carlos G., Danny T.·January 30, 2025

Research Paper and Benchmark

Core Research Areas

1. AI Safety and Security

Our research focuses on developing novel approaches to AI training that help models understand and respect human values without sacrificing capability.

2. Multimodal Learning

We advance AI's ability to understand and reason across visual, audio, and textual modalities simultaneously.

3. Computer Use & Automation

We've created training data that teaches AI agents to understand context, anticipate user needs, and execute complex multi-step workflows across diverse software environments.

4. Data Quality & Curation

We've developed rigorous methodologies for identifying, filtering, and enhancing training data quality to drive superior model performance.

5. Model Evaluation

Our evaluation frameworks go beyond traditional benchmarks to rigorously assess AI performance across diverse real-world scenarios.
