
Microsoft researchers have demonstrated a single-prompt jailbreak that bypasses the safety guardrails of 15 language models, a finding that raises concerns about the robustness of current AI safety measures.

A new approach in AI interpretability, termed introspective interpretability, aims to improve how AI models explain their own internal processes and decisions, potentially strengthening user understanding and trust.

Fundamental, a San Francisco startup, has launched Nexus, a large tabular model built to analyze structured data, and has secured $255 million in funding to support its development.

A landmark trial has begun in Los Angeles, where Meta and Google face accusations that they deliberately designed their platforms to addict children, a case that could reshape accountability across the tech industry.

A recent study replicates Gao et al.'s findings on weight-sparse transformers, showing that while these models yield interpretable circuits, those circuits may not faithfully capture the models' true computations.
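
For context on what "weight-sparse" means here, below is a minimal sketch of one simple way to enforce weight sparsity during training: projecting each linear layer's weights onto their top-k entries by magnitude after every optimizer step. This is an illustrative PyTorch approximation with assumed details (the `density` value and the per-step projection schedule are hypothetical), not the procedure used by Gao et al. or the replication.

```python
import torch
import torch.nn as nn

def project_to_topk_(module: nn.Module, density: float = 0.01) -> None:
    """Zero out all but the largest-magnitude weights in each Linear layer,
    keeping a fixed fraction (`density`) of entries nonzero."""
    with torch.no_grad():
        for m in module.modules():
            if isinstance(m, nn.Linear):
                w = m.weight
                k = max(1, int(density * w.numel()))
                # The k-th largest absolute value is the (numel - k + 1)-th
                # smallest, which kthvalue() computes directly.
                threshold = w.abs().flatten().kthvalue(w.numel() - k + 1).values
                # Keep weights at or above the threshold; zero the rest.
                w.mul_((w.abs() >= threshold).to(w.dtype))

# Hypothetical usage inside a training loop: re-project after each update
# so the model stays weight-sparse throughout training.
# optimizer.step()
# project_to_topk_(model, density=0.01)
```

Magnitude projection of this kind is the core idea behind iterative magnitude pruning; real weight-sparse training runs typically pair it with more careful sparsity schedules, which is part of why the faithfulness of the resulting circuits is under scrutiny.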