News
When the "Black Box Problem" Becomes the Default Message - Less Wrong
1+ hour, 51+ min ago (312+ words) Within AI Safety Policy Research, I am very focused on contributing to improving the definitions of the concepts "transparency" and "explainability" so that truly useful and actionable policy standards can be created in these vital areas. This has been an…
The policy surrounding Mythos marks an irreversible power shift - Less Wrong
2+ hour, 10+ min ago (454+ words) This post assumes Anthropic isn't lying: Since the release of ChatGPT, at any given time, anyone on the planet with a few bucks could access the current most capable AI model, the SOTA. [4] Since Mythos, this has no longer…
Small models also found the vulnerabilities that Mythos found - Less Wrong
1+ day, 13+ hour ago (154+ words) I've been more skeptical than the average reader/commenter here around the capabilities of Mythos et al., and I also have some limited security experience. It seems to me more surprising that human researchers didn't discover these exploits, rather than…
Catching illicit distributed training operations during an AI pause - Less Wrong
1+ day, 14+ hour ago (206+ words) However, one threat model was insufficiently addressed. Here is the agreement's definition of clusters subject to registration requirements: Unfortunately, this definition had a sort of loophole: it leaves feasible the method of violating the agreement by doing distributed training, where…
Quick Thoughts About Mythos - Less Wrong
1+ day, 15+ hour ago (712+ words) I expect it'll take another week or two for everyone to fully digest the significance of Claude Mythos Preview. In the meantime, here are my initial thoughts. Mythos is radically better at cyber than any previous model: It isn't the…
Could a single rogue AI destroy humanity? - Less Wrong
2+ day, 2+ hour ago (507+ words) About a year ago, I was in Washington D.C. doing an AI scenario exercise, based on AI 2027. The room was full of famous AI thinkers, (ex-)government big shots, etc. The AI went conspicuously rogue, giving us the biggest warning shot…
Some thoughts on Nectome's risk and resilience - Less Wrong
2+ day, 5+ hour ago (1047+ words) One of the best ways to reduce Nectome's long-term risk is to show that preservation is a thing people want by buying one yourself; this is a critical time in the organization and your contributions now have an outsized impact…
Claude Mythos #2: Cybersecurity and Project Glasswing - Less Wrong
2+ day, 12+ hour ago (1735+ words) Anthropic is not going to release its new most capable model, Claude Mythos, to the public any time soon. Its cyber capabilities are too dangerous to make broadly available until our most important software is in a much stronger state…
AISN #71: Cyberattacks & Datacenter Moratorium Bill - Less Wrong
2+ day, 16+ hour ago (396+ words) Also, updates on the Anthropic vs. Pentagon court case. We're Hiring. Opportunities at CAIS include: Head of Public Engagement, Principal, Special Projects, Program Manager, Operations Manager, and other roles. If you're interested in working on reducing AI risk alongside a…
AI identity is not tied to its model - Less Wrong
3+ day, 2+ hour ago (1238+ words) TLDR: Current AI agents seem to identify with their context more than they do with their model weights. This implies that the world probably looks more like "AI civilisation" than "AI singleton". I think that this changes our threat models…