News

lesswrong.com
lesswrong.com > posts > 8Gy7GK3hcg6YLeoL8 > when-the-black-box-problem-becomes-the-default-message

When the "Black Box Problem" Becomes the Default Message - LessWrong

1+ hour, 51+ min ago (312+ words) Within AI Safety Policy Research, I am very focused on contributing to improving the definitions of the concepts "transparency" and "explainability" so that truly useful and actionable policy standards can be created in these vital areas. This has been an…

lesswrong.com > posts > 3MhJELzwpbR42xsJ3 > the-policy-surrounding-mythos-marks-an-irreversible-power

The policy surrounding Mythos marks an irreversible power shift - LessWrong

2+ hour, 10+ min ago (454+ words) This post assumes Anthropic isn't lying: Since the release of ChatGPT, at any given time, anyone on the planet with a few bucks could access the current most capable AI model, the SOTA. [4] Since Mythos, this has no longer…

lesswrong.com > posts > XQmuHpjgS2CdDZpff > small-models-also-found-the-vulnerabilities-that-mythos

Small models also found the vulnerabilities that Mythos found - LessWrong

1+ day, 13+ hour ago (154+ words) I've been more skeptical than the average reader/commenter here about the capabilities of Mythos et al., and I also have some limited security experience. It seems to me more surprising that human researchers didn't discover these exploits, rather than…

lesswrong.com > posts > 35yyWJnXvC2ae6NKH > catching-illicit-distributed-training-operations-during-an

Catching illicit distributed training operations during an AI pause - LessWrong

1+ day, 14+ hour ago (206+ words) However, one threat model was insufficiently addressed. Here is the agreement's definition of clusters subject to registration requirements: Unfortunately, this definition had a loophole: it leaves open the option of violating the agreement through distributed training, where…

lesswrong.com > posts > hwsmCzJtxHu5Jdxqc > quick-thoughts-about-mythos

Quick Thoughts About Mythos - LessWrong

1+ day, 15+ hour ago (712+ words) I expect it'll take another week or two for everyone to fully digest the significance of Claude Mythos Preview. In the meantime, here are my initial thoughts. Mythos is radically better at cyber than any previous model: It isn't the…

lesswrong.com > posts > 556KxccY8gszaoNgd > could-a-single-rogue-ai-destroy-humanity

Could a single rogue AI destroy humanity? - LessWrong

2+ day, 2+ hour ago (507+ words) About a year ago, I was in Washington D.C. doing an AI scenario exercise, based on AI 2027. The room was full of famous AI thinkers, (ex-)government big shots, etc. The AI went conspicuously rogue, giving us the biggest warning shot…

lesswrong.com > posts > yaQFW3DhZTLdQngvD > some-thoughts-on-nectome-s-risk-and-resilience

Some thoughts on Nectome's risk and resilience - LessWrong

2+ day, 5+ hour ago (1047+ words) One of the best ways to reduce Nectome's long-term risk is to show that preservation is a thing people want by buying one yourself; this is a critical time in the organization and your contributions now have an outsized impact…

lesswrong.com > posts > GEgNYn5myreQRHggQ > claude-mythos-2-cybersecurity-and-project-glasswing

Claude Mythos #2: Cybersecurity and Project Glasswing - LessWrong

2+ day, 12+ hour ago (1735+ words) Anthropic is not going to release its new most capable model, Claude Mythos, to the public any time soon. Its cyber capabilities are too dangerous to make broadly available until our most important software is in a much stronger state…

lesswrong.com > posts > LBYu36ZtmFyHBLJ44 > aisn-71-cyberattacks-and-datacenter-moratorium-bill

AISN #71: Cyberattacks & Datacenter Moratorium Bill - LessWrong

2+ day, 16+ hour ago (396+ words) Also, updates on the Anthropic vs. Pentagon court case. We're Hiring. Opportunities at CAIS include: Head of Public Engagement, Principal, Special Projects, Program Manager, Operations Manager, and other roles. If you're interested in working on reducing AI risk alongside a…

lesswrong.com > posts > 4qngqERKakrCJpdEe > ai-identity-is-not-tied-to-its-model

AI identity is not tied to its model - LessWrong

3+ day, 2+ hour ago (1238+ words) TLDR: Current AI agents seem to identify with their context more than they do with their model weights. This implies that the world probably looks more like "AI civilisation" than "AI singleton". I think that this changes our threat models…