AI Just SHOCKED Everyone: It’s Officially Self-Aware!?

Anthropic just showed that Claude can notice its own internal “thoughts.” Using concept injection, researchers perturbed Claude’s activations with vectors like “all caps” or “ocean,” and Claude 4/4.1 correctly flagged and named the injected concept at a sweet-spot strength—around 20% success with zero false positives on production models, peaking roughly two-thirds through the network’s layers. They also proved prefilling tricks can be made to look “intentional” by retroactively injecting evidence into prior activations—implying the model consults its previously computed intentions, not just surface text. This is early, unreliable, but real machine introspection.

Credit to : AI Revolution

Please support our Sponsors here :