JetBrains: IDE-native search makes coding agents 8% faster and 5% cheaper
Source: https://blog.jetbrains.com/ai/2026/05/what-happens-when-you-give-agents-ide-native-seach-tools/
Summary
The JetBrains AI team describes an eval pipeline that runs the same coding tasks with and without extra MCP tooling. The approach is eval-driven development: nothing becomes a default until the eval shows an improvement. Results:
- median latency: 83.11 s → 79.03 s (−8.33%)
- P95 latency: 268.71 s → 213.17 s (−16.44%)
- total cost: $44.17 → $41.67 (−5.60%)
- share of tasks exceeding the $0.50 per-task cap: 6.67% → 4.44% (−33.28%)
Quality (the share of tasks where all tests pass) showed no statistically significant change. The percentages are paired task-level deltas that passed the p < 0.05 threshold; the absolute values are aggregates shown for orientation, so dividing one by the other does not reproduce every percentage.
Motivation: when agents search code, they default to the shell utilities grep and find. These tools are blind to project structure, symbol boundaries, and language semantics, so the agent burns tokens on noisy output and follow-up calls. The team built a "prebundled skill": a search prompt paired with a single MCP tool that has four modes (file, text, regex, symbol) and a universal router that dispatches each call to the right IDE-backed search. Four configurations were tested before the final one was chosen; the winner had the best latency/cost tradeoff. The result was re-checked on other models (including GPT 5.4) and languages (Java, Kotlin); Kotlin with GPT 5.4 showed the largest cost reduction (−13.48%).
The post shows two comparative traces. The first task updates the service and controller layers for comments and replies: the baseline takes 472 s (dozens of list/search/jar inspect/javap/curl steps), while the tooled run takes 127 s (read SKILL.md → a few searches → targeted reads and edits). The second, a Jackson key deserializer fix, drops from 150 s to 34 s thanks to targeted search.
Example
# Pseudo-description of the prebundled MCP search skill.
# Illustrative only: the command syntax is hypothetical; the post does not publish the tool schema.
mcp_tool: ide_search
modes:
  file: "search-file '<glob>'"
  text: "search-text '<query>' --file-mask '<glob>'"
  regex: "search-regex '<pattern>' --file-mask '<glob>'"
  symbol: "find-symbol '<name>' --kind <class|fun|var>"
skill_prompt: |
  Use ide_search instead of shell grep/find. Prefer:
  - 'symbol' for finding declarations / refs
  - 'text' for plain phrase lookup
  - 'regex' only if 'text' is insufficient
Why it matters
This is a rare case of AI tooling publishing not marketing "SOTA" numbers but a reproducible eval study with paired deltas and a significance threshold (p < 0.05). The "agent-native context" trend (the same thesis as the Sber AI post about the data paradigm shifting toward agents) gets quantitative support: structured IDE context saves time and money without a measurable loss in quality. A 16% drop in P95 latency and a 33% drop in budget overruns are practically meaningful numbers for teams running agents in CI/CD.
🧾 Transcript
We Gave Agents IDE-Native Search Tools. They Got Faster and Cheaper.
We ran the same coding tasks with and without prebundled tooling, across multiple models and languages. Here’s what changed.
Eval-driven development
IDE-native search reduced latency, cost, and budget overruns. The comparison below uses paired task-level deltas. Aggregate medians and totals are shown for orientation. Budget overruns are tasks that exceeded the USD 0.50 per-task cap.
- Median latency: reduced 8.33% (83.11s → 79.03s)
- P95 latency: reduced 16.44% (268.71s → 213.17s)
- Total cost: reduced 5.60% (USD 44.17 → USD 41.67)
- Budget overruns: reduced 33.28% (6.67% → 4.44%)
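A quick arithmetic check shows why the post treats the aggregates as orientation only. Dividing the aggregate numbers above gives noticeably different percentages for latency than the headline deltas, which are paired task-level figures. The sketch below only reproduces that naive division; it does not try to recover how the paired deltas were actually computed.

```python
# Aggregates as published: (baseline, with tooling).
AGGREGATES = {
    "median_latency_s": (83.11, 79.03),
    "p95_latency_s": (268.71, 213.17),
    "total_cost_usd": (44.17, 41.67),
    "budget_overrun_pct": (6.67, 4.44),
}

# Headline reductions as published (paired task-level deltas), in percent.
REPORTED = {
    "median_latency_s": 8.33,
    "p95_latency_s": 16.44,
    "total_cost_usd": 5.60,
    "budget_overrun_pct": 33.28,
}


def naive_reduction(before: float, after: float) -> float:
    """Percent reduction computed from the two aggregates alone (unpaired)."""
    return (before - after) / before * 100


NAIVE = {name: naive_reduction(b, a) for name, (b, a) in AGGREGATES.items()}

if __name__ == "__main__":
    for name, value in NAIVE.items():
        print(f"{name:20s} naive {value:6.2f}%   reported {REPORTED[name]:6.2f}%")
```

Cost and overruns roughly agree (about 5.66% vs 5.60% and 33.43% vs 33.28%), but median latency comes out near 4.91% instead of 8.33%, and P95 near 20.67% instead of 16.44%. A gap like this is expected when the reported figure summarizes per-task changes rather than comparing two separately computed aggregates.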
Why We Built This
When coding agents search code, they default to shell tools. grep and find work, but they’re blind to project structure, symbol boundaries, and language semantics. The agent burns tokens sifting through noisy output and making follow-up calls to narrow things down.
So we tried something obvious: what if the agent could use the IDE’s own search instead?
We built a prebundled skill that pairs a search prompt with a unified MCP tool. One tool, four modes: file search, text search, regex, and symbol lookup. A universal router dispatches calls to the right backend.
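The post does not publish the router's code. As a rough sketch of the "one tool, four modes" shape, the Python below dispatches each mode to its own backend. Everything here is hypothetical: `SearchRouter`, the mode strings, and the in-memory backends stand in for IDE indexes, and `symbol_search` is a crude regex, not real symbol resolution.

```python
import fnmatch
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical stand-in for the IDE's project model: path -> file contents.
Project = Dict[str, str]


class Mode(Enum):
    FILE = "file"
    TEXT = "text"
    REGEX = "regex"
    SYMBOL = "symbol"


@dataclass(frozen=True)
class Hit:
    path: str
    line: int      # 1-based line number; 0 for file-name hits
    snippet: str


def _lines(project: Project, mask: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_no, line) for every file whose path matches the glob mask."""
    for path in sorted(project):
        if fnmatch.fnmatch(path, mask):
            for no, line in enumerate(project[path].splitlines(), start=1):
                yield path, no, line


def file_search(project: Project, query: str, mask: str) -> List[Hit]:
    # In file mode the query itself is the glob; mask is ignored.
    return [Hit(p, 0, p) for p in sorted(project) if fnmatch.fnmatch(p, query)]


def text_search(project: Project, query: str, mask: str) -> List[Hit]:
    return [Hit(p, n, s.strip()) for p, n, s in _lines(project, mask) if query in s]


def regex_search(project: Project, query: str, mask: str) -> List[Hit]:
    pattern = re.compile(query)
    return [Hit(p, n, s.strip()) for p, n, s in _lines(project, mask) if pattern.search(s)]


def symbol_search(project: Project, query: str, mask: str) -> List[Hit]:
    # Toy substitute for a symbol index: only finds class/interface declarations
    # and Kotlin fun/val/var declarations with exactly this name.
    pattern = re.compile(r"\b(?:class|interface|fun|val|var)\s+" + re.escape(query) + r"\b")
    return [Hit(p, n, s.strip()) for p, n, s in _lines(project, mask) if pattern.search(s)]


class SearchRouter:
    """One tool entry point; each mode is dispatched to its own backend."""

    def __init__(self, project: Project) -> None:
        self.project = project
        self.backends: Dict[Mode, Callable[[Project, str, str], List[Hit]]] = {
            Mode.FILE: file_search,
            Mode.TEXT: text_search,
            Mode.REGEX: regex_search,
            Mode.SYMBOL: symbol_search,
        }

    def search(self, mode: str, query: str, file_mask: str = "*") -> List[Hit]:
        backend = self.backends[Mode(mode)]  # Mode(...) raises ValueError for unknown modes
        return backend(self.project, query, file_mask)


if __name__ == "__main__":
    demo = {
        "src/FeatureController.java": "public class FeatureController {\n  // replies\n}\n",
        "src/CommentService.kt": "class CommentService {\n  fun addReply(id: Long) = id\n}\n",
    }
    router = SearchRouter(demo)
    print(router.search("symbol", "addReply", "*.kt"))
    print(router.search("file", "*Controller*"))
```

The design point is that the agent sees a single tool with a `mode` argument, while each backend can be swapped (for example, replacing the toy `symbol_search` with a call into a real index) without changing the tool surface the agent has to learn.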
MCP tools: functions the agent calls via an MCP server during task execution. IDE-native tools can tap into indices, ASTs, and project models that shell tools cannot see.
Skills: packaged agent behaviors, a prompt plus orchestration logic. A skill can work on its own, use tools, or ship bundled with the tools it needs.
Nothing ships by default until the eval says it should. We tested four different configurations of this tooling before picking one.
Methodology
The eval pipeline spins up an MCP server alongside the IDE so the agent has access to the configured tools and skills. We run identical coding tasks with and without tooling, then compare with paired delta analysis.
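The post says results are compared with paired delta analysis, but not how per-task deltas become a single percentage. One plausible reading, shown here purely as an assumption, is the median of per-task relative reductions, computed only over tasks that ran in both conditions. The task IDs and timings in the demo are made up; they only show that a paired statistic can differ sharply from comparing the two medians.

```python
from statistics import median
from typing import Dict

# Hypothetical layout: task_id -> measured value (e.g. seconds) for one condition.
Runs = Dict[str, float]


def paired_reductions(baseline: Runs, tooling: Runs) -> Dict[str, float]:
    """Per-task percent reduction (positive = tooling was better), paired by task ID."""
    shared = sorted(baseline.keys() & tooling.keys())
    return {t: (baseline[t] - tooling[t]) / baseline[t] * 100 for t in shared}


def median_paired_reduction(baseline: Runs, tooling: Runs) -> float:
    """Median of the per-task reductions (one possible paired summary)."""
    return median(paired_reductions(baseline, tooling).values())


def ratio_of_medians_reduction(baseline: Runs, tooling: Runs) -> float:
    """Unpaired comparison of the two aggregate medians, for contrast."""
    b, t = median(baseline.values()), median(tooling.values())
    return (b - t) / b * 100


if __name__ == "__main__":
    baseline = {"task-1": 100.0, "task-2": 50.0, "task-3": 200.0}  # hypothetical
    tooling = {"task-1": 80.0, "task-2": 50.0, "task-3": 190.0}    # hypothetical
    print(median_paired_reduction(baseline, tooling))     # per-task: 20%, 0%, 5%
    print(ratio_of_medians_reduction(baseline, tooling))  # medians: 100 vs 80
```

In this toy case the paired median is a 5% reduction while the ratio of medians says 20%. Pairing by task ID also drops tasks that are missing from either run, so one flaky task cannot skew the comparison.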
We track four things: quality, latency, cost, and budget discipline. Quality asks whether all tests passed. Latency tracks median and P95 task time. Cost converts token consumption into dollars. Budget discipline tracks how often a single task exceeds the USD 0.50 budget cap.
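The four metrics are simple to compute once each task run is recorded. A minimal sketch, assuming a hypothetical `TaskRun` record (pass/fail, wall time, dollar cost). The post does not say which percentile definition it uses for P95, so nearest-rank is an assumption here.

```python
import math
from dataclasses import dataclass
from statistics import median
from typing import Dict, List

BUDGET_CAP_USD = 0.50  # per-task cap stated in the post


@dataclass
class TaskRun:  # hypothetical record shape
    task_id: str
    all_tests_passed: bool
    seconds: float
    cost_usd: float  # token usage already converted to dollars


def nearest_rank(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(runs: List[TaskRun]) -> Dict[str, float]:
    n = len(runs)
    times = [r.seconds for r in runs]
    return {
        "quality_pass_rate": sum(r.all_tests_passed for r in runs) / n,
        "median_latency_s": median(times),
        "p95_latency_s": nearest_rank(times, 95),
        "total_cost_usd": sum(r.cost_usd for r in runs),
        # "exceeded the cap" is read as strictly greater than USD 0.50
        "budget_overrun_rate": sum(r.cost_usd > BUDGET_CAP_USD for r in runs) / n,
    }
```

Running `summarize` once on the baseline runs and once on the tooled runs yields the aggregate figures; the paired comparison then works on per-task values rather than on these summaries.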
We report improvement deltas only when they pass our significance threshold: p < 0.05, paired test with 95% confidence intervals. Metrics without a significant change are either omitted from the charts or called out explicitly. We tried four configuration variants, selected the one with the best latency and cost tradeoff, then re-ran it on different models and languages to check that the results held.
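The post names the threshold (p < 0.05, paired test, 95% confidence intervals) but not the specific test. As an assumption, the sketch below uses a sign-flip permutation test on per-task deltas (baseline minus tooling) for the p-value and a percentile bootstrap for the 95% CI of the mean delta. A paired t-test or a Wilcoxon signed-rank test would be common alternatives.

```python
import random
from typing import Sequence, Tuple


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def sign_flip_p_value(deltas: Sequence[float], n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided paired permutation test.

    Under H0 (tooling has no effect) each per-task delta is equally likely to be
    positive or negative, so randomly flipping signs simulates the null distribution.
    """
    rng = random.Random(seed)
    observed = abs(_mean(deltas))
    extreme = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in deltas]
        if abs(_mean(flipped)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # add-one so p is never exactly 0


def bootstrap_ci(deltas: Sequence[float], level: float = 0.95,
                 n_boot: int = 10_000, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean per-task delta."""
    rng = random.Random(seed)
    n = len(deltas)
    means = sorted(_mean(rng.choices(deltas, k=n)) for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return means[lo_idx], means[hi_idx]


def report(deltas: Sequence[float], alpha: float = 0.05) -> dict:
    """Only a delta with p < alpha would be reported as an improvement."""
    p = sign_flip_p_value(deltas)
    lo, hi = bootstrap_ci(deltas, level=1 - alpha)
    return {"mean_delta": _mean(deltas), "p_value": p, "ci": (lo, hi), "significant": p < alpha}
```

Fixed seeds keep the p-value and interval reproducible between eval runs, which matters when a shipping decision hinges on crossing the threshold.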
Eval frame
Same tasks, same grading, one controlled difference.
- Quality: all-tests-passed rate, checked before performance claims.
- Latency: median and P95 task duration, compared with paired deltas.
- Cost: token use converted to dollars across the task set.
- Budget discipline: share of tasks exceeding the USD 0.50 single-task cap.
Results
The selected configuration was a prebundled search skill plus a unified IDE-native tool and universal router. Compared with the no-tooling baseline, it reduced latency and cost without producing a statistically significant quality change.
Baseline vs. tooling
Absolute metrics moved in the right direction.
- Median latency: baseline 83.11s, with tooling 79.03s (reduced 8.33%)
- P95 latency: baseline 268.71s, with tooling 213.17s (reduced 16.44%)
- Total cost: baseline USD 44.17, with tooling USD 41.67 (reduced 5.60%)
- Budget overruns: baseline 6.67%, with tooling 4.44% (reduced 33.28%)
No statistically significant change in quality. All shown deltas passed the significance threshold.
Trace snapshots
The difference is visible in the agent’s path through the project. These are shortened traces from cases that improved in both time and cost. The baseline spends more steps discovering context; the prebundled setup gets to the relevant files faster.
Service comments and replies
Prompt: Update service and controller layers for comments and replies.
Before (no prebundled IDE search):
agent> list files -> search x2 -> list files x2
agent> jar inspect x5 -> javap -> jar inspect -> javap x5
agent> curl download -> decompile -> search -> find files x2
agent> read 9 files -> edit file x8 -> respond
time: 472s
After (prebundled skill and unified search):
agent> read SKILL.md -> search x3 -> read 5 files
agent> read FeatureController.java -> read 4 files
agent> edit file x2 -> respond
time: 127s
Jackson key deserializer
Prompt: Preserve detailed error messages from a custom key deserializer.
Before (broad code walk):
agent> list files -> search x2 -> read README.md
agent> search x5 -> read DeserializationContext.java
agent> search x4 -> read StdDeserializer.java
agent> search -> read DeserializerCache.java
agent> read MapEntryDeserializer.java -> read JsonMappingException.java
agent> edit file -> respond
time: 150s
After (targeted search):
agent> read SKILL.md -> search x3
agent> read MapDeserializer.java
agent> read StdKeyDeserializer.java
agent> read DeserializationContext.java
agent> edit file -> respond
time: 34s
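The shortened traces use a compact notation: `->` separates steps and `xN` marks repeats. To put a number on "more steps discovering context", the sketch below counts steps. It assumes (the post does not define the notation) that `xN` means N calls and `read N files` means N reads; `count_steps` is a hypothetical helper.

```python
import re
from typing import List

_REPEATED = re.compile(r"^.+?\s+x(?P<n>\d+)$")          # e.g. "search x3"
_READ_MANY = re.compile(r"^read\s+(?P<n>\d+)\s+files$")  # e.g. "read 9 files"


def count_steps(trace: List[str]) -> int:
    """Count tool calls in a shortened trace given as 'agent> a -> b xN' lines."""
    total = 0
    for line in trace:
        for step in line.removeprefix("agent>").split("->"):
            step = step.strip()
            if m := _REPEATED.match(step):
                total += int(m["n"])
            elif m := _READ_MANY.match(step):
                total += int(m["n"])
            elif step:
                total += 1
    return total


# First trace ("Service comments and replies"), copied from the post.
BEFORE = [
    "agent> list files -> search x2 -> list files x2",
    "agent> jar inspect x5 -> javap -> jar inspect -> javap x5",
    "agent> curl download -> decompile -> search -> find files x2",
    "agent> read 9 files -> edit file x8 -> respond",
]
AFTER = [
    "agent> read SKILL.md -> search x3 -> read 5 files",
    "agent> read FeatureController.java -> read 4 files",
    "agent> edit file x2 -> respond",
]

if __name__ == "__main__":
    print(count_steps(BEFORE), count_steps(AFTER))
```

Under that reading the first task drops from 40 steps to 17, and the same count on the Jackson traces gives 21 steps before and 9 after.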
Configuration Explorer
We tested four tool configurations before choosing the final shape. Lower latency and lower total cost are better, so the lower-left corner of the plot is the target.
Configuration search
The selected option had the best latency while preserving cost reduction. The plot shows median latency (78s to 84s) against total cost (USD 39.50 to USD 45.00) for five setups: Baseline, 4 Search Tools, Unified Search Tool, 4 Tools + Router, and Unified Tool + Router.
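The selection rule described here is informal: lower-left is better, and the winner had the best latency while still reducing cost. A sketch of one way to encode it, with a Pareto filter for the "lower-left corner" intuition. `Config`, `select`, and all numbers in the demo are hypothetical and not read off the JetBrains chart.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Config:
    name: str
    median_latency_s: float
    total_cost_usd: float


def dominates(a: Config, b: Config) -> bool:
    """a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.median_latency_s <= b.median_latency_s and a.total_cost_usd <= b.total_cost_usd
    better = a.median_latency_s < b.median_latency_s or a.total_cost_usd < b.total_cost_usd
    return no_worse and better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Configurations nearest the lower-left corner: not dominated by any other."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


def select(baseline: Config, candidates: List[Config]) -> Config:
    """Lowest median latency among candidates that still cut total cost vs. baseline."""
    cheaper = [c for c in candidates if c.total_cost_usd < baseline.total_cost_usd]
    if not cheaper:
        raise ValueError("no candidate reduces total cost")
    return min(cheaper, key=lambda c: c.median_latency_s)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    base = Config("baseline", 83.0, 44.0)
    cands = [
        Config("config-a", 80.0, 45.0),  # faster but more expensive than baseline
        Config("config-b", 79.0, 42.0),
        Config("config-c", 81.0, 40.0),
    ]
    print(select(base, cands).name)
    print([c.name for c in pareto_front([base, *cands])])
```

The Pareto filter keeps every configuration that is not beaten on both axes; the selection rule then breaks the tie by favouring latency, which matches the stated preference for the best latency among cost-reducing options.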
Cross-Model Validation
We re-ran the experiment with GPT 5.4 on Java and Kotlin codebases. The pattern holds: latency and cost both drop. Kotlin saw the biggest cost improvement, with total cost falling 13.48%.
Cross-model check
The effect held beyond the original run. Reductions with tooling:
- Codex 5.2: median latency 8.33%, total cost 5.60%, P95 latency 16.44%
- GPT 5.4, Java: median latency 3.75%, total cost 4.07%, P95 latency 13.00%
- GPT 5.4, Kotlin: median latency 6.92%, total cost 13.48%, P95 latency not significant
Missing bars mean that metric was not statistically significant for that model and language.
How Models Adopt Tooling
Codex sends 91% of its search calls through the new IDE-native tool. Claude is a different story: Opus uses it for about half its searches, and Haiku only 28%, preferring grep and find instead.
This makes sense. Claude already has strong built-in code search, so it leans on what it knows. Codex doesn’t, so it grabs the better tool when one is available. The takeaway: prebundled tooling fills gaps. Where the model already has good search, it adds less. Where search is weak, it makes a real difference.
Tool adoption
Models do not use new tools at the same rate. Share of search calls by tool:
- Codex: IDE Search 91%, grep 8%, find 1%
- Claude Opus: IDE Search 53%, grep 28%, find 19%
- Claude Haiku: IDE Search 28%, grep 33%, find 39%
What’s Next
The eval pipeline works. Now we’re using it.
We’re running the same experiment on smaller models next. Our hunch is that they’ll benefit even more, since they have less built-in search capability to fall back on.
The current results are strongest on Java and Kotlin. We’re expanding to Python, .NET, and TypeScript with bigger sample sizes.
Meanwhile, the winning configuration is being prepared for the integrated IntelliJ IDEA MCP Server, so agent sessions can use IDE-native tooling when the server is enabled.
The next step is to turn this feature on by default in upcoming AI Assistant plugin updates.
Want to try it before the default rollout? Set these registry keys to true:
- llm.chat.agent.codex.mcp.idea
- llm.chat.agent.skills.settings.enabled
- llm.agents.contrib.bundled.skills.sync.enabled
In AI Assistant, choose Codex for the best results. Ask the agent to find something across the current project.
Measure first, ship second, keep measuring after. That’s the whole approach.