

The system prompt discovered in the leak explicitly warns the model: “You are operating UNDERCOVER… Your commit messages… MUST NOT contain ANY Anthropic-internal information. Do not blow your cover.”
This is so incredibly stupid.
You’ve tried security.
You’ve tried security through obscurity.
Now try security through giving instructions to an LLM via a system prompt to not blow its cover.

To me, what’s wild about it is that it’s completely filled with houses, and the houses all seem to respect the orientation of the nearest street.
You’d think they’d say, “OK, in this section we have two roads meeting at a narrow angle, so let’s just make this a park,” or do something else to make the places where the two grids join a little less ugly.