merged · closed beta · 2026

Technical screening — without interviews.

Instead of Leetcode — one calibrated task in a real repo. The candidate opens a pull request. The system scores it automatically: tests, diff shape, commit quality, responses to review.

for HR managers

~2 min · PR scoring time
87% · rubric accuracy
0 hrs · of senior time
pull request · #42 · open
@@ src/billing/invoice.ts @@
   const amount = base * qty;
−  const tax = amount * 0.2;
+  const tax = calcTax(amount, country);

+  // edge case: UA VAT exemption
+  if (country === 'UA' && isExempt(plan)) {
+    return amount;
+  }
   return amount + tax;
CI passed · 87/87
Diff focus · 3 files, +24 −4
Rubric (LLM judge) · 4.6 / 5.0
senior · legacy-invoice · PASS

Problem

Technical interviews are broken. Everyone knows it, and everyone keeps doing them.

Leetcode measures Leetcode prep. System design — the skill of drawing boxes. Behavioral — the skill of telling STAR stories. None of them show how a person actually works day to day.

And in 2026 even that illusion of signal is gone: Copilot and Cursor close a typical task in 10 minutes. Your seniors run dozens of screening calls a month and dread how much time this stage burns.

Method · Cost
Leetcode screen · 2–4 hrs / candidate
System design interview · 1–2 hrs / candidate
Behavioral (STAR) · 1 hr / candidate
merged PR screen · ~2 min, automatic

* Cost estimate — screening one candidate, including engineer time

How it works

Four steps. Zero hours of engineering time.

  1. Recruiter picks a task · 30 sec to set up

     From the bank — matched to the candidate's level (Junior / Middle / Senior) and stack. No call, no whiteboard. 30 seconds in the panel.

  2. Candidate opens a pull request · 45–120 min of candidate time

     Gets a private repo with real context. AI is allowed — tasks are designed so it's necessary but not sufficient.

  3. System scores it automatically · ~2 min after submit

     CI tests, diff focus, commit quality, responses to auto-review. An LLM judge reads the entire PR against a structured rubric.

  4. Recruiter sees a ranked report · ready instantly

     Rubric scores, a link to the PR, strengths and weaknesses. After that — just a final interview with the team for culture fit.
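To make the last step concrete, here is a minimal sketch of what a ranked report could look like. The shape and field names (`CandidateReport`, `rubricScore`, and so on) are illustrative assumptions, not the product's actual API.

```typescript
// Hypothetical shape of the report a recruiter sees after step 4.
// Field names are illustrative assumptions, not the product's API.
interface CandidateReport {
  candidate: string;
  prUrl: string;                          // link to the scored pull request
  level: "junior" | "middle" | "senior";
  rubricScore: number;                    // overall 0–5, e.g. 4.6
  ciPassed: boolean;                      // deterministic CI gate
  strengths: string[];
  weaknesses: string[];
}

// Ranking as a plain sort: CI failures sink to the bottom,
// then the rubric score decides the order.
function rank(reports: CandidateReport[]): CandidateReport[] {
  return [...reports].sort(
    (a, b) =>
      Number(b.ciPassed) - Number(a.ciPassed) || b.rubricScore - a.rubricScore
  );
}
```

Keeping the report a plain, sortable record is what lets the recruiter compare candidates without reading every PR.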

Levels

The task is calibrated to the level.

The product's moat is task design. We don't fight AI — we design tasks so that without understanding the system, AI is just a typewriter. Every task is hand-calibrated on live candidates.

Junior · 45 min

Add a feature to a clean repo

A small project with its own conventions. You have to read the README, avoid breaking anything else, and write a test. Cursor can handle this — we're filtering out people who can't.

Key signals

  • Reads instructions · 30%
  • Doesn't break existing code · 40%
  • Writes a test · 30%

Expected score · 2.0–3.5 / 5.0
Middle · 90 min

Reproduce a bug and fix it

A larger repo, vaguely phrased task: "users complain that Y behaves oddly in case Z." AI doesn't know what to fix — you have to localize the root cause.

Key signals

  • Decomposition · 35%
  • Choice of fix layer · 35%
  • Rationale in PR · 30%

Expected score · 3.0–4.5 / 5.0
Senior · 120 min

Legacy with architectural debt

Task: "ship this feature so that in six months it can be extended to W without a rewrite." A design doc in the PR is required — AI will write the code, but it won't make the decisions for you.

Key signals

  • Trade-offs · 40%
  • Extensibility · 35%
  • Rationale quality · 25%

Expected score · 3.5–5.0 / 5.0
NOTE

AI is allowed and expected. A "blind" Claude solution scores 30–40/100: tests fail on edge cases, the PR description is empty, commits are one big blob. The rubric measures understanding of the system, not the fact that code got written.

What we actually measure

"Green CI" is only 20% of the signal.

The other 80% is what an LLM judge does better than a human interviewer when it has a structured rubric: reads the whole PR, the description, the commit history, and the responses to comments. No fatigue, no bias.

Rubric weights · 100% total

Automated CI · 45%

  • Tests pass · 20%
    Deterministic, no sampling error
  • Diff focus and size · 15%
    Minimal changes, no unrelated edits
  • Commit quality · 10%
    Atomic commits, Conventional Commits

LLM judge · 55%

  • Rationale in the PR description · 20%
    Whether the "why" is explained, not just the "what"
  • Task decomposition · 20%
    Train of thought, solution steps
  • Trade-offs and architecture · 15%
    Alternatives considered, choice justified

* Weights are configurable at the task template level
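As a rough illustration of how these weights could combine, here is a sketch in TypeScript. The signal names and the 0–5 scale per signal are assumptions based on the table above, not the actual scoring code.

```typescript
// Hypothetical signal names; the weights mirror the rubric table above.
interface RubricSignals {
  ciTests: number;        // tests pass
  diffFocus: number;      // minimal, focused diff
  commitQuality: number;  // atomic, Conventional Commits
  rationale: number;      // the "why" in the PR description
  decomposition: number;  // train of thought, solution steps
  tradeoffs: number;      // alternatives considered, choice justified
}

// Per-signal weights from the rubric; they sum to 1.0.
const WEIGHTS: Record<keyof RubricSignals, number> = {
  ciTests: 0.20,
  diffFocus: 0.15,
  commitQuality: 0.10,
  rationale: 0.20,
  decomposition: 0.20,
  tradeoffs: 0.15,
};

// Weighted average of per-signal scores, each on a 0–5 scale,
// so the total stays on the same 0–5 scale as the report.
function rubricScore(signals: RubricSignals): number {
  return (Object.keys(WEIGHTS) as (keyof RubricSignals)[]).reduce(
    (sum, key) => sum + WEIGHTS[key] * signals[key],
    0
  );
}
```

Because the weights sum to 1.0, a candidate who maxes every signal lands exactly at 5.0, and changing a template's weights reshapes the score without touching the scale.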

Who it's for

The developer is the user. The hiring manager is the buyer.

Each role gets what matters: developers get a fair async evaluation without live coding; managers get their team's time back and a sharper final round.

Developers

0 · live coding on camera
~2 min · from submit to report
  • You're not coding under stress for 45 minutes — you take the task and work at your own pace.
  • AI assistants aren't banned — they're expected. Your usual Cursor / Claude / Copilot setup just works.
  • You see the rubric up front: what's being scored, which signals matter. No vibe check.

Engineering leaders

40+ hrs · returned to the team each month
1 · meeting instead of 4–6 rounds
  • Your team doesn't burn 40 hours a month on screening calls.
  • The final meeting is about the person, not technical basics.
  • You see how the candidate thinks and justifies decisions — not just what they wrote.

Honest objections

Five things you're thinking right now.

01 · AI-resistance
What if the candidate just hands the task to Claude?
Let them. Tasks are designed so a "blind" AI solution scores 30–40 out of 100: tests fail on edge cases, the PR description is empty, commits are one big blob, responses to auto-review are generic. We don't measure "wrote code" — we measure "understood the system."
02 · Security
Tasks will leak onto the internet and into model training data.
Every task is parameterized: one template yields dozens of variants with different seeds, names, and emphases in the requirements. A public solution to one specific version won't pass another. Plus — an option for private tasks on your own code on the enterprise plan.
03 · Format
A senior won't do a three-hour take-home.
Agreed. For seniors — a 45-minute paired session: the candidate shares their screen, solves alongside AI, and the system records telemetry (how long they thought, what they searched, how much they rewrote). It fits on a calendar and measures more than a classic interview.
04 · Process
This doesn't replace the final interview with the team, does it?
We're not removing the interview. We're removing the stage where you burn 40 engineering hours filtering out people who simply can't code. Culture fit and "do I want to work with them for five years" remain a live conversation.
05 · Competitors
How is this better than HackerRank / CodeSignal / Codility?
They measure Leetcode — in 2026. We measure real work: a PR into a real repo with real context, scored by an LLM judge against a rubric. A different product category — work-sample assessment for the AI era.
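The task parameterization described in the Security answer (one template, many seeded variants) can be sketched roughly like this. The template fields and the mulberry32 PRNG are illustrative assumptions, not the product's implementation.

```typescript
// Illustrative sketch only: fields and PRNG choice are assumptions.
interface TaskTemplate {
  domains: string[];   // e.g. billing, inventory, notifications
  entities: string[];  // names substituted into the repo
  twists: string[];    // shifted emphasis in the requirements
}

// Tiny deterministic PRNG (mulberry32): the same seed always
// produces the same sequence, so every variant is reproducible.
function mulberry32(seed: number): () => number {
  return () => {
    seed |= 0;
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// One template + one seed → one concrete task variant.
function variant(tpl: TaskTemplate, seed: number) {
  const rnd = mulberry32(seed);
  const pick = (xs: string[]) => xs[Math.floor(rnd() * xs.length)];
  return {
    domain: pick(tpl.domains),
    entity: pick(tpl.entities),
    twist: pick(tpl.twists),
  };
}
```

The point of determinism is auditability: a recruiter can regenerate exactly the variant a candidate saw, while a leaked solution to seed 17 says nothing about seed 18.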

Blog

Essays on hiring in the AI era.

Screening practice, LLM-judge rubrics, open reports from the closed beta, guides for recruiters and candidates — no marketing, no "book a demo."

All articles

Demo

We'll show it on your stack.

15 minutes. You tell us who you're hiring. We'll show what screening in merged would look like instead of calls. Sample tasks for your stack — in your inbox the same day.

  • Closed beta, Ukraine market
  • No prepayment, no contract
  • Response within 24 hours

Alternative

Don't want to fill out a form? Email us directly: [email protected]


We don't share data with third parties and we don't send spam. Unsubscribe in one click, any time.