Tags: Anthropic Claude natural language autoencoders

Anthropic Publishes Research on Natural Language Autoencoders

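Anthropic has published new research in which Claude models translate their internal activations (their numerical "thoughts") into human-readable text, a step forward for AI interpretability. [[10]](https://x.com/AnthropicAI/status/2052435436157452769)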
