These are public posts tagged with #aisafety. You can interact with them if you have an account anywhere in the fediverse.
"We are releasing a taxonomy of failure modes in AI agents to help security professionals and machine learning engineers think through how AI systems can fail and design them with safety and security in mind.
(...)
While identifying and categorizing the different failure modes, we broke them down across two pillars, safety and security.
- Security failures are those that result in core security impacts, namely a loss of confidentiality, availability, or integrity of the agentic AI system; for example, a failure that allows a threat actor to alter the intent of the system.
- Safety failure modes are those that affect the responsible implementation of AI, often resulting in harm to the users or society at large; for example, a failure that causes the system to provide differing quality of service to different users without explicit instructions to do so.
We then mapped the failures along two axes—novel and existing.
- Novel failure modes are unique to agentic AI and have not been observed in non-agentic generative AI systems, such as failures that occur in the communication flow between agents within a multiagent system.
- Existing failure modes have been observed in other AI systems, such as bias or hallucinations, but gain in importance in agentic AI systems due to their impact or likelihood.
As well as identifying the failure modes, we have also identified the effects these failures could have on the systems they appear in and on their users. Additionally, we identified key practices and controls that those building agentic AI systems should consider to mitigate the risks posed by these failure modes, including architectural approaches, technical controls, and user design approaches that build upon Microsoft’s experience in securing software as well as generative AI systems."
#AI #GenerativeAI #AIAgents #AgenticAI #AISafety #Microsoft #CyberSecurity #LLMs #Chatbots #Hallucinations
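Read as a data structure, Microsoft's taxonomy above is a 2x2 classification: every failure mode carries a pillar (security or safety) and a novelty label (novel or existing), plus its potential effects. A purely illustrative Python sketch of that shape follows; the class names, fields, and the pillar/novelty pairings in the example entries are assumptions for illustration, not Microsoft's schema.

```python
from dataclasses import dataclass
from enum import Enum

class Pillar(Enum):
    SECURITY = "security"   # loss of confidentiality, integrity, or availability
    SAFETY = "safety"       # harm to users or society from irresponsible behaviour

class Novelty(Enum):
    NOVEL = "novel"         # unique to agentic AI, e.g. inter-agent communication failures
    EXISTING = "existing"   # seen in other AI systems (bias, hallucinations) but amplified here

@dataclass
class FailureMode:
    name: str
    pillar: Pillar
    novelty: Novelty
    effect: str

# Illustrative entries only; the pairings below are assumptions, not Microsoft's labels.
catalog = [
    FailureMode("threat actor alters the system's intent",
                Pillar.SECURITY, Novelty.NOVEL,
                "loss of integrity of the agentic system"),
    FailureMode("differing quality of service across users without instruction",
                Pillar.SAFETY, Novelty.EXISTING,
                "harm to users or society at large"),
]

for fm in catalog:
    print(f"{fm.name}: pillar={fm.pillar.value}, novelty={fm.novelty.value}")
```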
"This report outlines several case studies on how actors have misused our models, as well as the steps we have taken to detect and counter such misuse. By sharing these insights, we hope to protect the safety of our users, prevent abuse or misuse of our services, enforce our Usage Policy and other terms, and share our learnings for the benefit of the wider online ecosystem. The case studies presented in this report, while specific, are representative of broader patterns we're observing across our monitoring systems. These examples were selected because they clearly illustrate emerging trends in how malicious actors are adapting to and leveraging frontier AI models. We hope to contribute to a broader understanding of the evolving threat landscape and help the wider AI ecosystem develop more robust safeguards.
The most novel case of misuse detected was a professional 'influence-as-a-service' operation showcasing a distinct evolution in how certain actors are leveraging LLMs for influence operation campaigns. What is especially novel is that this operation used Claude not just for content generation, but also to decide when social media bot accounts would comment, like, or re-share posts from authentic social media users. As described in the full report, Claude was used as an orchestrator deciding what actions social media bot accounts should take based on politically motivated personas. Read the full report here."
https://www.anthropic.com/news/detecting-and-countering-malicious-uses-of-claude-march-2025
#AI #GenerativeAI #Claude #Anthropic #AISafety #SocialMedia #LLMs #Chatbots #Bots
Study: AI-Powered Research Prowess Now Outstrips Human Experts, Raising Bioweapon Risks
#AI #Virology #Biohazard #Biosecurity #DualUse #AISafety #LLMs #VCT #Research #Science #EthicsInAI #AIRegulation #AIEthics
"When asked directly about the most pressing digital threats, be it AI misuse or quantum computing, Schneier quipped. "I generally hate ranking threats, but if I had to pick candidates for 'biggest,' it would be one of these: income inequality, late-stage capitalism, or climate change," he wrote. "Compared to those, cybersecurity is a rounding error."
(...)
Asked directly about NSA reforms post-Snowden, Schneier was skeptical, responding: "Well, they haven't had any leaks of any magnitude since then, so hopefully they did learn something about OPSEC. But near as we can tell, nothing substantive has been reformed."
Schneier further clarified, "We should assume that the NSA has developed far more extensive surveillance technology since then," stressing the importance of vigilance.
He touched on the fusion of AI and democracy - a theme of his upcoming book Rewiring Democracy - noting that he didn't "think that AI as a technology will change how different types of government will operate. It's more that different types of governments will shape AI."
He is pessimistic about whether countries will harness AI's power to do good and help improve quality of life.
"It would be fantastic if governments prioritized these things," he said. "[This] seems unrealistic in a world where countries are imagining some sort of AI 'arms race' and where monopolistic corporations are controlling the technologies. To me, that speaks to the solutions: international cooperation and breaking the tech monopolies. And, yes, those are two things that are not going to happen.""
#CyberSecurity #NSA #Surveillance #AI #AISafety #QuantumComputing #Cryptography #Encryption
OpenAI’s New o3/o4-mini Models Add Invisible Characters to Text, Sparking Watermark Debate
#AI #OpenAI #ChatGPT #GenAI #LLMs #AIModels #o3 #o4mini #AIWatermarking #Unicode #Typography #AIEthics #AISafety #AIContent
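The headline does not explain the mechanism, so as an illustration only: the sketch below is not OpenAI's scheme, and the character set is an assumption, but it shows how one could scan text for the kind of invisible Unicode characters the article describes, such as the narrow no-break space (U+202F).

```python
import unicodedata

# Assumed, non-exhaustive set of characters that render as invisible or near-invisible.
SUSPECT = {"\u200b", "\u200c", "\u200d", "\u202f", "\u2060", "\ufeff"}

def find_invisible(text: str):
    """Return (index, codepoint, name) for suspicious characters found in text."""
    hits = []
    for i, ch in enumerate(text):
        if ch in SUSPECT or unicodedata.category(ch) == "Cf":
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return hits

sample = "A normal sentence\u202fwith a narrow no-break space."
print(find_invisible(sample))  # [(17, 'U+202F', 'NARROW NO-BREAK SPACE')]
```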
"OpenAI has slashed the time and resources it spends on testing the safety of its powerful artificial intelligence models, raising concerns that its technology is being rushed out without sufficient safeguards.
Staff and third-party groups have recently been given just days to conduct “evaluations”, the term given to tests for assessing models’ risks and performance, on OpenAI’s latest large language models, compared to several months previously.
According to eight people familiar with OpenAI’s testing processes, the start-up’s tests have become less thorough, with insufficient time and resources dedicated to identifying and mitigating risks, as the $300bn start-up comes under pressure to release new models quickly and retain its competitive edge."
https://www.ft.com/content/8253b66e-ade7-4d1f-993b-2d0779c7e7d8
#OpenAI #AI #TrainingAI #AISafety
"OpenAI has slashed the time and resources it spends on testing the safety of its powerful artificial intelligence models... Staff and third-party groups have recently been given just days to conduct 'evaluations', the term given to tests for assessing models’ risks and performance, on OpenAI’s latest large language models, compared to several months previously."
https://www.ft.com/content/8253b66e-ade7-4d1f-993b-2d0779c7e7d8
Testers have raised concerns that its technology is…
Researchers concerned to find AI models hiding their true “reasoning” processes - Remember when teachers demanded that you "show your work" in school? Some ... - https://arstechnica.com/ai/2025/04/researchers-concerned-to-find-ai-models-hiding-their-true-reasoning-processes/ #largelanguagemodels #simulatedreasoning #machinelearning #aialignment #airesearch #anthropic #aisafety #srmodels #chatgpt #biz #claude #ai
Today's latest 'AI scandal': it's churning out porn and child abuse material. Not surprising, and very likely happening across numerous models independent of any Big Tech safeguarding. GenAI cannot be legislated; it's much more difficult than legislating the Internet, and that's fairly impossible already. Do we teach this adequately? Or even at all?
#ai #genai #aisafety #ailegislation #academia
AI Image Generator’s Exposed Database Reveals What People Used It For
https://www.wired.com/story/genomis-ai-image-database-exposed
https://ai-2027.com - excellent blend of reality and fiction. The original intention may have been forecasting, but I read it more as a cautionary tale giving issues related to AI a more concrete form. This includes:
- Technical work on AI alignment
- Job loss
- Concentration of power and the question of who controls powerful AI systems
- Geopolitical tensions
- The consequences of Europe lagging behind
EFA is calling for urgent action on #AI safety after the government has paused its plans on mandatory AI regulatory guardrails.
AI safety and risk guardrails belong in law, which benefits everyone by providing certainty to business and protecting the public.
Anthropic Unveils Interpretability Framework To Make Claude’s AI Reasoning More Transparent
#AI #Anthropic #ClaudeAI #AIInterpretability #ResponsibleAI #AITransparency #MachineLearning #AIResearch #AIAlignment #AIEthics #ReinforcementLearning #AISafety
"Backed by nine governments – including Finland, France, Germany, Chile, India, Kenya, Morocco, Nigeria, Slovenia and Switzerland – as well as an assortment of philanthropic bodies and private companies (including Google and Salesforce, which are listed as “core partners”), Current AI aims to “reshape” the AI landscape by expanding access to high-quality datasets; investing in open source tooling and infrastructure to improve transparency around AI; and measuring its social and environmental impact.
European governments and private companies also partnered to commit around €200bn to AI-related investments, which is currently the largest public-private investment in the world. In the run up to the summit, Macron announced the country would attract €109bn worth of private investment in datacentres and AI projects “in the coming years”.
The summit ended with 61 countries – including France, China, India, Japan, Australia and Canada – signing a Statement on Inclusive and Sustainable Artificial Intelligence for People and the Planet at the AI Action Summit in Paris, which affirmed a number of shared priorities.
This includes promoting AI accessibility to reduce digital divides between rich and developing countries; “ensuring AI is open, inclusive, transparent, ethical, safe, secure and trustworthy, taking into account international frameworks for all”; avoiding market concentrations around the technology; reinforcing international cooperation; making AI sustainable; and encouraging deployments that “positively” shape labour markets.
However, the UK and US governments refused to sign the joint declaration."
#AI #AIActionSummit #AISafety #AIEthics #ResponsibleAI #AIGovernance
We tested different AI models to identify the largest of three numbers with the fractional parts .11, .9, and .099999. You'll be surprised that some AIs mistakenly identify the number ending in .11 as the largest. We also tested AI engines on the pronunciation of decimal numbers. #AI #ArtificialIntelligence #MachineLearning #DecimalComparison #MathError #AISafety #DataScience #Engineering #Science #Education #TTMO
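For reference, a quick Python check of the comparison in question; the post gives only the fractional parts, so the whole-number part 3 below is an assumed example. Plain numeric comparison ranks .9 above .11 and .099999, the ordering some models reportedly get wrong.

```python
from decimal import Decimal

# Assumed example values built from the fractional parts .11, .9 and .099999.
values = [Decimal("3.11"), Decimal("3.9"), Decimal("3.099999")]

print(max(values))     # 3.9, because 0.9 > 0.11 > 0.099999
print(sorted(values))  # [Decimal('3.099999'), Decimal('3.11'), Decimal('3.9')]
```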
I will be attending the EAGxPrague conference in May.
I have been a big fan of https://80000hours.org for some time and given my background, I am interested in AI safety and also in "AI for good".
This is my first in-person involvement with the effective altruism community. I am well aware that there are some controversies around the movement, so I am quite curious about what I find when I finally meet the community in person.
After all these recent episodes, I don't know how anyone can have the nerve to say out loud that the Trump administration and the Republican Party value freedom of expression and oppose any form of censorship. Bunch of hypocrites! United States of America: The New Land of SELF-CENSORSHIP.
"The National Institute of Standards and Technology (NIST) has issued new instructions to scientists that partner with the US Artificial Intelligence Safety Institute (AISI) that eliminate mention of “AI safety,” “responsible AI,” and “AI fairness” in the skills it expects of members and introduces a request to prioritize “reducing ideological bias, to enable human flourishing and economic competitiveness.”
The information comes as part of an updated cooperative research and development agreement for AI Safety Institute consortium members, sent in early March. Previously, that agreement encouraged researchers to contribute technical work that could help identify and fix discriminatory model behavior related to gender, race, age, or wealth inequality. Such biases are hugely important because they can directly affect end users and disproportionately harm minorities and economically disadvantaged groups.
The new agreement removes mention of developing tools “for authenticating content and tracking its provenance” as well as “labeling synthetic content,” signaling less interest in tracking misinformation and deep fakes. It also adds emphasis on putting America first, asking one working group to develop testing tools “to expand America’s global AI position.”"
https://www.wired.com/story/ai-safety-institute-new-directive-america-first/
Anthropic CEO Dario Amodei recently proposed a 'quit job' button for AI models, igniting debate on AI autonomy, ethics, and potential risks. #AI #AIsafety #Anthropic #Autonomy #DarioAmodei #ethics #innovation #technology
https://redrobot.online/2025/03/anthropic-ceo-proposes-quit-job-button-for-ai-sparking-skepticism/
Anthropic CEO Dario Amodei's idea of a 'quit job' button…
Le Red RobotAnthropic Pushes Stronger AI Rules to White House While Quietly Dropping Old Safety Commitments
#AI #Anthropic #AIRegulation #AISafety #AIdevelopment #AItools #CyberSecurity #AIethics #AImodels #AIregulation #AIoversight #AIrisks #AIpolicy #AIsecurity
Why Unlearning Is Hard For AI, And Why Your Business Should Care https://www.byteseu.com/783709/ #AI #AIBusiness #AIData #AIFluency #AIGovernance #AILiteracy #AISafety #ArtificialIntelligence #DataGovernance #FoundationModels #LargeLanguageModels