stephantul · 2026-06-17T12:32:11 1781699531

Ha thanks, it was pink a while ago

stephantul · 2026-06-17T11:48:36 1781696916

Extreme programming in a nutshell. I like doing this to features: build it, then take it down and rebuild but better.

jpitz · 2026-06-17T13:58:02 1781704682

"Plan to throw one away. You will anyway."

stephantul · 2026-06-17T09:31:36 1781688696

Thanks! This is very similar indeed. Related: I see a lot of “drive-by” PRs by agents, who obviously have no intent of ever maintaining the code they wrote.

api · 2026-06-17T11:20:36 1781695236

Puppy cannons. Puppy carpet bombing.

ChrisMarshallNY · 2026-06-17T11:39:38 1781696378

A new definition of "puppy mill."

stephantul · 2026-06-17T11:38:26 1781696306

Puppy slush automatically pushed through vents into your codebase

stephantul · 2026-06-17T09:29:43 1781688583

I’m not sure I share your view of PRs. I still see submitting PRs as something that puts pressure on maintainers. Even incorrect PRs take time to verify and review.

I also don’t see how this differs between the “gap” and the “fence” part of the metaphor. Whether someone submits a rewrite/removal (fence) or a new feature (gap) for PR review, it’s still going to cost me attention.

joshka · 2026-06-17T10:25:58 1781691958

It's only pressure if you believe the social contract in a PR is that everything that is written is something you're obligated to read and respond to. If you flip that a bit, to a PR being the first step in a way of saying "I tried a thing and it worked, what's necessary to make that an actual thing that others can use", then you will sort of land here.

Previously I wrote https://news.ycombinator.com/item?id=48517931

wnoise · 2026-06-17T12:39:43 1781699983

Saying your quoted words is a request for someone to read and respond; it's still pressure and a burden.

stephantul · 2026-06-11T05:38:33 1781156313

It’s an interesting question: I’d say this is more of a vulnerability creator than the actual vulnerability.

Similar to how using very difficult technologies makes you more likely to write code with vulnerabilities: the technologies are not the vulnerability, but they make it easier to introduce one.

stephantul · 2026-06-09T17:26:10 1781025970

This paper oversells in its title. Like, what is Chronos, which embedding model was used, which reranker, how was the reranking done, and why is Chronos so much better than Claude Code?

stephantul · 2026-05-31T15:46:48 1780242408

Because it destroys the economics of scraping. It’s too expensive with proof of work, or at least not as economically viable

gruez · 2026-05-31T15:52:49 1780242769

Depends on what type of scraping you're trying to stop. For the dumb scrapers that would try to scrape every page on a git forge (of which there are a bazillion for even a modest project, because of how the site works), yeah, it might deter them enough to stop. For anything high value (e.g. Reddit comments or retail prices), 10 s of CPU time isn't going to stop them.
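To put rough numbers on the "every page on a git forge" case, here is a back-of-the-envelope sketch. The page count and the price per CPU-hour are made-up illustrative values, not figures from this thread; only the 10 s per page comes from the comment above.

```python
def scraping_cpu_hours(pages: int, pow_seconds_per_page: float) -> float:
    """Total CPU time, in hours, spent on proof of work to fetch `pages` pages."""
    return pages * pow_seconds_per_page / 3600


def scraping_cost(pages: int, pow_seconds_per_page: float,
                  price_per_cpu_hour: float) -> float:
    """Rough cost of that CPU time at an assumed (hypothetical) price."""
    return scraping_cpu_hours(pages, pow_seconds_per_page) * price_per_cpu_hour


if __name__ == "__main__":
    pages = 1_000_000   # hypothetical number of pages on one forge
    price = 0.05        # hypothetical price per CPU-hour
    hours = scraping_cpu_hours(pages, 10)
    print(f"{hours:.0f} CPU-hours, about {scraping_cost(pages, 10, price):.2f} in compute")
```

Whether that total deters anyone depends entirely on what the scraped data is worth to the operator, which is the point being made above.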

pmontra · 2026-05-31T15:59:08 1780243148

It will not scare away bots but 10 seconds of wait (CPU or only a sleep) will turn away many real users. "This site is so slow, I'll use something else." A kind of reverse captcha.

Hnrobert42 · 2026-05-31T16:33:14 1780245194

Maybe the proof of work can run in the background.

btown · 2026-05-31T17:28:12 1780248492

Or it can run as part of a checkout wizard's "verifying your browser and processing your payment, don't close your tab" step.

mattstir · 2026-06-01T03:29:42 1780284582

At which point it exists solely to punish real human users? What scraper bot is going through checkout?

BenjiWiebe · 2026-06-01T17:37:21 1780335441

The credit card tester bots go through the checkout process.

PoW wouldn't be a big issue for them though since their volume is much lower.

stephantul · 2026-05-31T17:03:12 1780246992

Sure, the whole premise is exactly that proof of work reduces the value of scraping while having negligible impact on users. If the data is so valuable that bot operators are willing to pay 10 s of CPU time, then other measures are necessary.

Nevertheless, even for these high-value cases, you can still argue that it disincentivizes the business model, because scraping becomes less efficient.
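For readers who haven't seen one, a minimal hashcash-style sketch of the kind of proof of work being discussed. This is an illustration under stated assumptions, not how any particular anti-scraping tool implements it: the function names, the `challenge:nonce` format, and the choice of SHA-256 are all hypothetical.

```python
import hashlib
import itertools


def _meets_target(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """True if SHA-256(challenge:nonce) has at least `difficulty_bits` leading zero bits."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce. Expected work is about 2**difficulty_bits hashes."""
    for nonce in itertools.count():
        if _meets_target(challenge, nonce, difficulty_bits):
            return nonce


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: a single hash, so checking is cheap while solving is expensive."""
    return _meets_target(challenge, nonce, difficulty_bits)


if __name__ == "__main__":
    nonce = solve("example-challenge", 16)
    print(nonce, verify("example-challenge", nonce, 16))
```

The asymmetry is the whole design: the server pays one hash per request, the client pays many, and the cost to the client scales with the number of pages requested.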

thayne · 2026-05-31T17:44:49 1780249489

If it's high value, there isn't really much you can do that will be completely effective. Traditional captchas can often be beaten by AI, or by "captcha farms" where impoverished people are paid pennies to complete captchas. Fingerprinting can be beaten by using a full browser to make the requests. Basically anything you do is just a matter of making it more expensive for bots to access it.

arbol · 2026-05-31T19:46:39 1780256799

Beating fingerprinting and beating traditional captchas is far more expensive than solving PoW. PoW doesn't stop anyone, not even the most novice bot operators.

tobyhinloopen · 2026-06-01T05:25:43 1780291543

You can just download all of Reddit from torrent sites

ranger_danger · 2026-05-31T23:43:17 1780270997

A 5 W load for 2 seconds is 10 J, or roughly 0.003 Wh. I think we'll be fine.
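The arithmetic behind that estimate, as a small sketch: energy is power times time, and one watt-hour is 3600 joules. The 5 W and 2 s inputs are the figures from the comment above; the function name is just for illustration.

```python
def energy_wh(power_watts: float, seconds: float) -> float:
    """Energy in watt-hours for a constant load: W * s gives joules, and 3600 J = 1 Wh."""
    return power_watts * seconds / 3600


if __name__ == "__main__":
    # 5 W for 2 s is 10 J, which is a little under 0.003 Wh.
    print(f"{energy_wh(5, 2):.4f} Wh")
```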

arbol · 2026-05-31T19:44:47 1780256687

Except it doesn't

stephantul · 2026-05-29T20:06:58 1780085218

I was also at the event and was pretty disappointed. Most of the talks were pretty low on information. I was at the “build” stage, which supposedly was the technical stage, but the talks there didn’t really go into technical specifics.

The papyrus talk was awesome though.

stephantul · 2026-05-18T04:35:50 1779078950

It's not probabilistic, and exact matches will always be preferred over non-exact. So if you search for a function name this will surface it.
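One simple way to get "exact matches always beat non-exact ones" is a two-part sort key. The sketch below only illustrates that idea and is not the project's actual implementation: the `Hit` type, the `score` field, and the substring test for exactness are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    text: str
    score: float  # hypothetical similarity score, higher is better


def rank(query: str, hits: list[Hit]) -> list[Hit]:
    """Order hits so that any exact match outranks every non-exact match.

    The key is a tuple: False sorts before True, so hits containing the query
    verbatim come first; within each group, higher scores come first.
    """
    return sorted(hits, key=lambda h: (query not in h.text, -h.score))


if __name__ == "__main__":
    hits = [
        Hit("def load_settings(path):", 0.9),
        Hit("def parse_config(path):", 0.2),
    ]
    for hit in rank("parse_config", hits):
        print(hit.text)
```

Because the ordering is deterministic, searching for a function name puts the line that contains it first, however the fuzzy scores fall.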

stephantul · 2026-05-18T04:19:25 1779077965

This is a bit rude.

We didn't generate this project, we wrote it, a lot of it manually, and trained custom models. We'd been working in the real-time retrieval space for a while, and we thought coding was a good fit for this specific technology.

esperent · 2026-05-18T05:00:37 1779080437

My comment above wasn't meant to be rude. And you do have extensive benchmarks against grep etc so it's clear you understand the importance of that.

But I still think you're missing the harder but more important proof which is agent evals. Have you done any of that?

I would personally love to find tools in this space which can make agents more efficient, and I do believe there's scope for massive improvements compared to default workflows. But my evals with RTK and Headroom have made me wary that a tool can look like it should work, conceptually make sense, pass non-agentic benchmarks, and still make an actual agentic workflow worse.

stephantul · 2026-05-18T06:11:48 1779084708

It was directed at the parent who implied that we didn’t think about this.

I agree with your point about the evals and how you can get discontinuities: good search can be worse than bad search when agents can do many searches. We’re working on it

esperent · 2026-05-18T12:18:32 1779106712

When you share them, please also share the setup so people can easily rerun them. Nearly every eval I've seen shares the LLM session transcript but not the actual harness setup etc. that they used.