Reduced dependence on manual iPhone QA by building an AI agent that turns natural-language instructions into simulated user actions, reaching a 70%+ pass@k success rate against real test suites where several frontier models scored 0%. A small sketch of the pass@k metric follows below.
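For reference, a minimal sketch of the standard unbiased pass@k estimator used in this kind of evaluation (n attempts per test case, c of which pass); the function name and example numbers are illustrative, not taken from the evaluation itself:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples drawn
    from n independent attempts passes, given c of the n attempts passed."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers only: 10 attempts per test case, 3 of them passing.
print(f"pass@5 = {pass_at_k(n=10, c=3, k=5):.3f}")
```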
Improved Swift XCTest generation by creating APIs that let agents observe and interact with iPhones, and by adding multi-agent validation to reduce semantic errors; a sketch of this interface follows below.
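A hedged sketch of what the agent-facing observation/interaction surface and validation step could look like; the protocol names, method names, and reviewer-agent interface here are hypothetical placeholders, not the actual API:

```python
from typing import Protocol

class DeviceSession(Protocol):
    """Hypothetical surface an agent uses to observe and drive an iPhone."""
    def view_hierarchy(self) -> str: ...   # accessibility tree as text
    def screenshot(self) -> bytes: ...     # raw pixels for vision models
    def tap(self, element_id: str) -> None: ...
    def type_text(self, element_id: str, text: str) -> None: ...

class ReviewerAgent(Protocol):
    """Hypothetical validator agent in the multi-agent check."""
    def approves(self, xctest_snippet: str, view_hierarchy: str) -> bool: ...

def validate_step(xctest_snippet: str, session: DeviceSession,
                  reviewers: list[ReviewerAgent]) -> bool:
    """Accept a generated XCTest step only if every reviewer agrees it
    references elements that actually exist in the current view hierarchy."""
    hierarchy = session.view_hierarchy()
    return all(r.approves(xctest_snippet, hierarchy) for r in reviewers)
```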
Generated iPhone test cases spanning 35+ actions by applying context engineering techniques such as multi-agent delegation, Qdrant-based RAG, and token condensing; a retrieval sketch follows below.
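An illustrative sketch of the Qdrant retrieval step, assuming the Python qdrant-client library; the collection name, payload key, and embed() helper are placeholders rather than the actual pipeline:

```python
from qdrant_client import QdrantClient

def embed(text: str) -> list[float]:
    """Placeholder for whichever embedding model the pipeline uses."""
    raise NotImplementedError

client = QdrantClient(url="http://localhost:6333")

def retrieve_context(step_description: str, k: int = 5) -> str:
    """Fetch the k most similar prior test-case snippets to ground generation."""
    hits = client.search(
        collection_name="test_case_snippets",   # placeholder collection name
        query_vector=embed(step_description),
        limit=k,
    )
    return "\n".join(hit.payload["snippet"] for hit in hits)  # placeholder payload key
```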