How do I choose a software development company when proposals all look the same?

Stop reading proposals as documents. AI tooling collapsed the cost of producing fluent proposals to near zero, so the document itself stopped containing the signal. Filter candidates with questions whose answers AI cannot fake: ask for a specific failure mode in your system, what part of the brief they would reject, a public production URL of similar work, a specific architectural decision plus what would change their mind, willingness to take a paid scoping engagement, the name of who actually writes the code, and their biggest concern about the project. Then offer the top two or three a paid two-week scoping engagement at $1,700 to $2,600. Whoever produces the strongest plan wins the build.

Why do software development quotes range from $4,400 to $44,000 for the same project?

AI tooling made it cheap to write a fluent proposal. The $4,400 bid usually comes from a subcontracted team or from someone who plans to deliver something close enough and call it done. The $44,000 bid usually comes from senior engineers who read your brief and priced it as real engineering. The proposals look similar on paper because the document is now the cheap artifact. A paid scoping engagement (ours, the Discovery Sprint, costs $2,600) is how you find out which bid is real before committing to the build.

What is a paid scoping engagement and why does it work?

A paid scoping engagement is a small fixed-price commitment (typically $1,700 to $2,600) where the candidate produces a written plan, architecture decision, milestone breakdown, and binding fixed-price quote for the build. The fee is credited toward the build if you hire them. It works as a filter because candidates who refuse paid scoping are signaling something: they cannot do the work without writing speculative code, or they run an AI workflow that collapses when asked for binding commitments, or they are subcontracted bidders with no authority over engineering decisions. Three paid scopings cost less than one failed engagement.

What are red flags in a software development proposal?

Six red flags signal an AI-generated or unserious proposal: it does not name a single specific element of your system or business; the technical sections describe approaches in textbook language without committing to one ('we could use either X or Y depending on requirements'); the proposal lists all common best practices in the relevant area as differentiators; the pricing is broken into round-number percentages ('design 20%, development 60%, QA 20%') rather than tied to specific milestones; the candidate's stated past work cannot be verified through a public URL or open-source contributions; or the proposal is unusually long for the scope. Two or more in the same proposal usually means the document was written without substantive engagement with your project.

How long should I take to evaluate software development proposals?

Two to three weeks total, structured as: (1) discard proposals that do not name a specific element of your system; (2) schedule 30-minute calls with the top three remaining candidates and ask three filter questions; (3) offer the top two finalists a paid scoping engagement at $1,700 to $2,600 each; (4) compare the actual scoping plans and pick. Total cost: about the price of two scoping engagements, or roughly $3,400 to $5,200. You will know more about the finalists than you would have learned from reading thirty proposals.

How to Choose a Software Development Company in 2026: 7 Questions Whose Answers AI Cannot Fake

A founder running a real-estate-adjacent platform, evaluating Upwork bids: "All these guys overseas. What's going to separate one from another?" A co-founder of a healthcare EHR startup, after his AI search query returned five agencies he had never heard of: "Three of them quoted similar numbers. The fourth was a third the price. The fifth was double." An owner of a $5 million service business, comparing our $170,000 build quote to his anchor of $80,000: "That's a lot more than I was anticipating."

These are not three different problems. They are the same problem viewed from three angles. The proposal in your inbox stopped being a useful filter. The price spread between your candidates is now real and large. Your traditional evaluation tools were calibrated for a market that no longer exists.

Here is what changed, why the spread got bigger, and seven questions you can ask candidates that produce answers AI tooling cannot fake.

Why the spread got bigger

In 2023 a software development proposal was a meaningful filter. Writing a coherent technical proposal took a real engineer roughly four to eight hours of substantive work. The act of producing the document was itself proof of capacity. Bad candidates self-selected out because they could not write the document at all.

In 2026 that filter is broken. AI tooling collapsed the cost of producing a fluent, well-structured proposal to near zero. A real engineer who spent four hours reading your brief and a contractor running ChatGPT for free now produce documents that read identically. The artifact stopped containing the signal.

What you are seeing in the spread between your candidates is the result. The bottom of the range is bait pricing from sellers who know the proposal will look the same as the more expensive options on paper, and who plan to either subcontract the work, run a junior team unsupervised, or simply produce something close enough and call it done. The top is somebody who actually read your brief, will assign senior engineers to the work, and priced it like real engineering. The middle is everything in between. The proposals in your inbox cannot tell you which is which.

The traditional evaluation tools (read all five proposals, compare technical depth, check references) do not produce a clear winner anymore. Technical depth reads similar across the pile. References are a narrow signal in a market with low repeat rates. Buyers default either to the lowest price or to vibes-based picking, and the failure rates speak for themselves.

We have one buyer in our recent pipeline whose decision sequence we can reconstruct from website analytics. He arrived through an AI search recommendation. He spent five minutes and forty-three seconds on our website before booking a call. His path: pricing first. Team page next. LinkedIn profiles of named team members. Case studies. Pricing again. Booking. He moved from "is this real" to "are these real people" to "have they shipped this before" to "I trust the price enough to talk." That is not a man reading proposals. That is a man looking for evidence that proposals cannot manufacture.

This is the lens you need. Stop reading proposals as documents. Look for the evidence that AI cannot produce.

The filter that still works

You cannot evaluate proposals as documents anymore. You can evaluate the people who produce them, but only with questions whose answers AI tooling cannot fake.

The pattern across the seven questions below is the same: ask for specificity that requires the candidate to have already done the work, not predicted what the work might look like. AI is fluent at predicting plausible answers. AI is poor at producing answers that match a real engineer's actual judgment on a specific problem.

Question 1: "Walk me through a specific failure mode in this system that worries you."

A good candidate names a specific failure mode in your system, not in software in general. We had a recent inbound conversation with a non-technical owner of a price-comparison site who shared a five-layer audit document of his own platform. The audit had found that one of his thirteen retailers was silently returning the wrong product across roughly one third of his verified catalog. He asked us about it. The right answer was specific: contain that retailer first by disabling its parser, build a parallel parser, run both for a full scrape pass, retire the broken path. The wrong answer was "we have rigorous testing practices" or "we will build robust error handling."
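The containment plan in that answer can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the client's code: the retailer name, the CONTAINED_RETAILERS set, the page fields, and both parser functions are hypothetical stand-ins. What it shows is the order of operations: the suspect retailer is blocked from the catalog first, then the old and new parsers run over the same pages and every disagreement is collected for review.

```python
from typing import Callable, Dict, List, Tuple

Page = Dict[str, str]
Parser = Callable[[Page], str]  # hypothetical interface: page -> product ID

# Hypothetical kill switch: retailers whose output must not reach the catalog.
CONTAINED_RETAILERS = {"retailer_07"}


def is_publishable(retailer: str) -> bool:
    """Step 1, contain: skip catalog writes for a retailer under suspicion."""
    return retailer not in CONTAINED_RETAILERS


def shadow_pass(pages: List[Page], legacy: Parser,
                candidate: Parser) -> List[Tuple[str, str, str]]:
    """Steps 2 and 3, build and run both: parse every page with both parsers
    and return (url, legacy_id, candidate_id) wherever they disagree."""
    disagreements = []
    for page in pages:
        old_id, new_id = legacy(page), candidate(page)
        if old_id != new_id:
            disagreements.append((page["url"], old_id, new_id))
    return disagreements


# Made-up parsers for illustration: the legacy one guesses the ID from the
# title's first word, the candidate reads an explicit SKU field.
def legacy_parser(page: Page) -> str:
    return page["title"].split()[0]


def candidate_parser(page: Page) -> str:
    return page["sku"]


if __name__ == "__main__":
    pages = [
        {"url": "/p/1", "title": "A100 blender", "sku": "A100"},
        {"url": "/p/2", "title": "Bundle A200 kettle", "sku": "A200"},
    ]
    print(is_publishable("retailer_07"))  # False while contained
    print(shadow_pass(pages, legacy_parser, candidate_parser))
    # [('/p/2', 'Bundle', 'A200')]
```

Step 4, retiring the broken path, stays a human decision in this sketch: it happens once every disagreement from a full pass has been explained, after which the retailer leaves the contained set and is served by the new parser only.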

A bad candidate produces generic language about "robustness" and "edge cases." Generic answers come from someone who has not engaged with your specific system. The point of this question is not the answer itself. The point is that a real engineer who read your brief will have already located one or two real failure modes and can name them on the spot. AI-generated answers tend to abstract away from your specific situation and reach for general best practices, which read as plausible but contain no actual diagnosis of your project.

Question 2: "What part of this brief did you reject and why?"

A good candidate disagrees with at least one thing you wrote.

A buyer we worked with recently was evaluating five vendors for a healthcare EHR rebuild. His existing codebase was working but limited. Three of the five vendors recommended a full rewrite. He told us in his post-shortlist email: "Considerable time and effort went into this, so a factory reset would be a tough pill to swallow." The vendors who pushed the rewrite were not engineering. They were performing certainty. We did the opposite. We told him the right starting point was a paid code audit, and that the audit might conclude he should keep most of it. He told us that openness to existing code was the differentiator that put us on the shortlist.

A bad candidate accepts the entire brief uncritically. That is either a candidate who will deliver exactly what the brief says (often wrong) or a candidate who has not read it carefully. This question is also a strong test of seniority. Junior engineers and AI-augmented contractors tend to take the brief as given. Senior engineers push back on details that do not make sense. Almost every brief is wrong somewhere. If nobody is pushing back on yours, nobody is going to tell you where until you are paying $44,000 to find out.

Question 3: "Show me a public production URL of work in this shape."

A good candidate hands you a URL, not a portfolio screenshot. The URL is to live software you can use, not to a marketing case study. They will tell you specifically what part of that system is structurally similar to yours.

The buyer who arrived from AI search and spent five minutes and forty-three seconds on our site did not read our blog. He went to the team page and clicked the LinkedIn profile of every named engineer. He was checking that the people existed. He cared about that more than the technical content of any proposal. His exact words on the call: "I appreciate you not handing me to some junior person." A public URL plus a verifiable engineer is the modern version of a portfolio. AI cannot manufacture a working URL with five years of accumulated user behavior. It cannot manufacture a LinkedIn profile with a real career history. Real engineers have both.

A bad candidate offers vague references, NDA excuses for why they cannot show work, or screenshots that may or may not represent something they personally built. NDAs are real, but they are also a common cover. If a candidate cannot produce a single public URL across their entire portfolio, treat it as a signal. Most experienced engineers have at least one open-source contribution, one published demo, or one client willing to reference them on a call.

Question 4: "What is your decision about [a specific architectural choice in the brief] and what would change your mind?"

A good candidate names a specific decision they would make and the strongest counter-argument against their own decision.

The price-comparison-site owner asked us about scrape progress storage. The right answer was specific: progress lives outside the catalog, in two separate tables, one with a row per scrape run and one with a row per item per run, because you want history for the audit and you want to be able to resume mid-run without mutating catalog rows. The strongest counter-argument is that this requires an extra join on every page load. We named both. We also told him the threshold that would push us to a different design: the join no longer keeping reads under 50ms at 100K rows. That tells the buyer how we think, not just what we conclude.
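A minimal sketch of that two-table design, using Python's built-in sqlite3 module. The table and column names (scrape_runs, scrape_run_items, status) are assumptions for illustration, not the client's schema. The point it shows is the one in the answer: progress and history live outside the catalog, so an interrupted run resumes from its own item rows without touching catalog rows.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE catalog (
    product_id TEXT PRIMARY KEY,
    name       TEXT NOT NULL,
    price      REAL
);
-- One row per scrape run: kept as audit history.
CREATE TABLE scrape_runs (
    run_id      INTEGER PRIMARY KEY,
    retailer    TEXT NOT NULL,
    started_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    finished_at TEXT
);
-- One row per item per run: the status column is what makes resuming possible.
CREATE TABLE scrape_run_items (
    run_id  INTEGER NOT NULL REFERENCES scrape_runs(run_id),
    url     TEXT NOT NULL,
    status  TEXT NOT NULL DEFAULT 'pending',  -- 'pending', 'done' or 'failed'
    PRIMARY KEY (run_id, url)
);
"""


def start_run(conn: sqlite3.Connection, retailer: str, urls: list) -> int:
    """Open a run and enqueue every URL it must visit."""
    cur = conn.execute("INSERT INTO scrape_runs (retailer) VALUES (?)", (retailer,))
    run_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO scrape_run_items (run_id, url) VALUES (?, ?)",
        [(run_id, u) for u in urls],
    )
    return run_id


def mark_done(conn: sqlite3.Connection, run_id: int, url: str) -> None:
    conn.execute(
        "UPDATE scrape_run_items SET status = 'done' WHERE run_id = ? AND url = ?",
        (run_id, url),
    )


def remaining(conn: sqlite3.Connection, run_id: int) -> list:
    """Resume point: items not yet done. Catalog rows are never read or written here."""
    rows = conn.execute(
        "SELECT url FROM scrape_run_items WHERE run_id = ? AND status != 'done' ORDER BY url",
        (run_id,),
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    run = start_run(conn, "retailer_07", ["/p/1", "/p/2", "/p/3"])
    mark_done(conn, run, "/p/1")
    # Simulated crash here; on restart, only the unfinished items are picked up.
    print(remaining(conn, run))  # ['/p/2', '/p/3']
```

The trade-off named in the answer appears as soon as a product page needs to show scrape freshness: that means joining the catalog against these tables, which is the cost being weighed against audit history and clean resumes.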

A bad candidate offers menu-style "we could do A or B or C" without committing. Real engineers commit. AI tooling tends to enumerate options because committing to one path requires judgment that the model has no real basis for. The "what would change your mind" half of the question is the harder filter. Ask it explicitly. A candidate who can name the specific evidence that would push them to a different decision is showing you the shape of their thinking. A candidate who insists their decision is correct under all conditions is selling, not engineering.

Question 5: "Will you do a paid two-week scoping engagement before the build?"

This is the strongest filter in the list. Paid scoping is a small fixed-price engagement (typically in the range of $1,700 to $2,600) with a defined deliverable: written plan, architecture decision, milestone breakdown, and a binding fixed-price quote for the build. The fee is credited toward the build if you hire them.

The owner of the $5M service business we mentioned in the opening came to us with a Claude-generated mockup of his ideal platform and an $80,000 anchor in his head. We told him the build, including a client-facing portal his largest restaurant-chain customer needed, would land closer to $170,000. The number was a shock. The right way to manage that gap was not negotiation. It was a paid Discovery Sprint at $2,600 that would scope the actual work, identify the parts that could be deferred to Phase 2, and produce a binding fixed-price quote for what we agreed to build. Buyers who say yes to this are buyers who are serious. Buyers who refuse it are buyers who are still hoping that one of the $4,400 bidders is real.

A good candidate accepts and tells you exactly what the deliverable will look like. A bad candidate refuses, redirects, or tries to negotiate the fee away. They are telling you something. Either they cannot do the work without writing speculative code, or they are running an AI workflow that collapses when asked for binding commitments, or they are a subcontracted bidder with no authority over the engineering call. None of those should be working on your project.

The math also favors paid scoping. Paying three candidates the standard scoping fee ($5,100 to $7,800 in total at $1,700 to $2,600 each) costs less than one failed engagement. The failed engagement happens when you pick the lowest bid because the proposals all looked the same. The paid filter is cheaper than the free filter once you account for the failure rate of the free filter.

Question 6: "Who specifically writes the code, and can I have a 15-minute call with them?"

A good candidate names a specific engineer, sets up the call, and shows up to it.

The healthcare EHR co-founder explicitly asked us a version of this in his shortlist email: "Will the full team work on our applications, or will it be a smaller subset?" He was benchmarking us against vendors who had quoted "two to four people." He did not want a number. He wanted to know that the people he met on the discovery call would be the same people writing the code. Both of our co-founders were on his discovery call. He told us afterward, by name: "I appreciate you not handing me to some junior person."

A bad candidate evades. The most common evasion is "we have a senior team" without naming who. The next most common is agreeing to a call and then sending someone different. If the person on the sales call is not the person on the engineering call, your project will be written by people you have not evaluated. This matters most if you need a single point of accountability: if one person is going to own technical decisions, you need to know who that person is on Day 1, not Day 30.

Larger agencies will tell you they cannot guarantee specific assignments because of "team scheduling." That is true at agencies that rotate engineers across projects to optimize utilization. It is also a strong signal that the engineer who wrote the proposal is not the engineer who will write your code.

Question 7: "What concerns you most about this project after reading the brief?"

A good candidate names a specific technical or operational concern and explains why.

The price-comparison site owner had a comprehensive audit of his system. His matching logic was duplicated across four different files in his codebase. We told him that consolidating those four into one shared module was the riskiest part of his Phase 2 work, because the matching logic touches the live revenue path and if it goes wrong it goes wrong silently across the catalog. We asked for a feature flag, a parallel run of both code paths for one full scrape pass before retiring the old code, and an explicit rollback procedure. We named that as our largest technical concern. We also told him those safeguards were not negotiable.
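Those three safeguards map onto a small amount of code. In this hypothetical Python sketch, MatcherMode, FLAGS, legacy_match, and shared_match are illustrative names, the module-level dict stands in for whatever configuration or flag service a real system would use, and legacy_match represents whichever of the four duplicated copies a given call site uses today. Shadow mode serves the legacy answer and only logs disagreements, so the live revenue path is unchanged during the parallel pass, and rollback is a single flag flip rather than a redeploy.

```python
import logging
from enum import Enum
from typing import Callable, Dict

log = logging.getLogger("matcher_rollout")


class MatcherMode(Enum):
    LEGACY = "legacy"  # today's duplicated code paths; also the rollback target
    SHADOW = "shadow"  # run both, serve legacy, log every disagreement
    NEW = "new"        # serve the consolidated shared module


# Hypothetical flag store; a real system would read this from config or a flag service.
FLAGS: Dict[str, MatcherMode] = {"matcher_mode": MatcherMode.LEGACY}

Matcher = Callable[[dict], str]  # listing -> matched product ID


def match(listing: dict, legacy_match: Matcher, shared_match: Matcher) -> str:
    mode = FLAGS["matcher_mode"]
    if mode is MatcherMode.LEGACY:
        return legacy_match(listing)
    if mode is MatcherMode.NEW:
        return shared_match(listing)
    # SHADOW: the revenue path keeps the legacy answer; the new one is only compared.
    old, new = legacy_match(listing), shared_match(listing)
    if old != new:
        log.warning("matcher disagreement on %s: legacy=%s shared=%s",
                    listing.get("id"), old, new)
    return old


def rollback() -> None:
    """Explicit rollback: flip back to the legacy path without a redeploy."""
    FLAGS["matcher_mode"] = MatcherMode.LEGACY


if __name__ == "__main__":
    FLAGS["matcher_mode"] = MatcherMode.SHADOW
    print(match({"id": 42}, lambda l: "sku-1", lambda l: "sku-2"))  # sku-1, plus a logged warning
    rollback()
```

The ordering is the design choice: legacy, then shadow for one full scrape pass, then new, with the legacy path kept callable until the shadow log is clean.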

A bad candidate either has no concerns ("we are confident we can deliver") or names generic concerns ("scope might creep"). Confidence without concerns is either inexperience or sales theater. Real engineers worry about specific things on specific projects, and they can name those things in the first conversation.

This question also surfaces whether a candidate has actually read the brief. Generic concerns ("communication", "scope", "timeline") apply to every project. Specific concerns (a named matcher consolidation on a live revenue path, a named broken-retailer pollution issue across one third of the catalog, a named sunk-cost risk in the existing codebase) apply to your project.

The paid scoping path

Question 5 above is the structural change worth implementing if you are evaluating multiple candidates right now. The mechanics:

Shortlist three candidates from your inbound pile based on first impressions, references, and any public production work they showed you.
Offer each of them a paid scoping engagement at the same fixed price (commonly in the range of $1,700 to $2,600) and the same deliverable. Two weeks of work. Written plan, architecture, milestones, binding quote.
Compare the three plans on substance. The winning plan is the one where the engineering judgment is most aligned with your business reality. Whichever candidate produces that plan wins the build, with the scoping fee credited.
The two candidates who did not win still produced a real artifact. You paid for it. You can use the documents internally to inform the build, or simply walk away from the relationship without burning months of bad faith on either side.

This structure converts the discovery work that traditionally bleeds both sides into billable, comparable artifacts. It filters out the candidates who would have ghosted or under-delivered before they get anywhere near your codebase.

The objection most buyers raise is that paid scoping feels like paying for what should be free. The honest reply is that you are already paying for it. You are just paying with your time, attention, and the cost of picking wrong. Three paid scopings cost less than the time you would have spent comparing thirty AI-generated proposals. They cost a small fraction of one failed $44,000 engagement. The paid filter is cheaper than the free filter once you account for failure rates.

If you want to see how this works on our side, our paid Discovery Sprint is a $2,600 two-week engagement that produces exactly the deliverable described above. The fee is credited toward the build.

Red flags in AI-generated proposals

If you read a proposal and any of the following are true, treat it as a signal:

The proposal does not name a single specific element of your system or business. It could have been written for any project of similar size.
The technical sections describe approaches in textbook language without committing to one ("we could use either X or Y depending on requirements").
The proposal includes a list of all common best practices in the relevant area, presented as differentiators.
The pricing is broken into round-number percentages of the total ("design 20 percent, development 60 percent, QA 20 percent") rather than tied to specific milestones.
The candidate's stated past work cannot be verified through a public URL, an introduction to a former client, or open-source contributions.
The proposal is unusually long for the scope. Real proposals tend to be tight because the engineer's time is valuable. AI-generated proposals tend toward thoroughness because there is no marginal cost to adding words.

None of these alone is conclusive. Two or more in the same proposal usually means the document was written without substantive engagement with your specific project.
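If you are screening a large pile, the checklist can be recorded consistently with something as small as the Python sketch below. The field names are our shorthand for the six flags listed above, and the thresholds mirror the rule in the text: one flag alone is inconclusive, two or more is a strong signal. The judgment on each flag is still yours; the code only keeps the tally consistent across proposals.

```python
from dataclasses import dataclass, fields


@dataclass
class ProposalReview:
    """One reader's yes/no answers to the six red-flag checks."""
    names_nothing_specific: bool = False
    no_committed_approach: bool = False
    best_practices_as_differentiators: bool = False
    round_percentage_pricing: bool = False
    unverifiable_past_work: bool = False
    unusually_long_for_scope: bool = False

    def red_flags(self) -> list:
        # Names of every check answered True, in declaration order.
        return [f.name for f in fields(self) if getattr(self, f.name)]

    def verdict(self) -> str:
        count = len(self.red_flags())
        if count >= 2:
            return "likely written without engaging your project"
        if count == 1:
            return "inconclusive: probe on the call"
        return "no red flags"


if __name__ == "__main__":
    review = ProposalReview(round_percentage_pricing=True, unusually_long_for_scope=True)
    print(review.red_flags())  # ['round_percentage_pricing', 'unusually_long_for_scope']
    print(review.verdict())    # likely written without engaging your project
```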

What to do this week if you have proposals on your desk

If you are reading five proposals right now and feel paralyzed, do this:

Stop reading them as documents. Treat them as filters for who is worth talking to.
Pick the three candidates whose proposals showed at least one specific element of your system. Discard the rest.
Schedule a 30-minute call with each of the three. On the call, ask Questions 1, 4, and 7 from the list above.
Pick two finalists. Offer each a paid scoping engagement in the range of $1,700 to $2,600. Whoever delivers the stronger plan wins.
Total elapsed time: two to three weeks. Total cost: the price of two scoping engagements, roughly $3,400 to $5,200. You will know more about the two finalists by the end of the scoping engagements than you would have learned from reading thirty proposals.

The proposal-as-filter market is broken. The questions above and the paid scoping structure are how serious buyers are filtering candidates in 2026. The candidates who survive these filters are the ones who can deliver. The candidates who do not survive them were going to fail your project anyway, just three months later and at greater cost.

If you are inside a buying process right now and the spread between candidates feels wrong, it is. The filter that still works is the one that costs the candidates real effort and the buyers real commitment. Free proposals stopped being information. Paid scoping is information. The buyers who figured this out first will pick better vendors. The vendors who figured this out first will sign better clients.

If you are evaluating software development companies for a regulated-industry project (legal tech, healthcare, or compliance-heavy operations), our 30-minute strategy call is the place to start. Book one here. No pitch. Honest read on whether we are a fit.