Incredibly fascinating paper out of Truthful AI, University College London, Warsaw University of Technology, University of Toronto, AIS, Independent, UC Berkeley
Musing 133: Emergent Misalignment: Narrow…