• Not_mikey@lemmy.dbzer0.com
    8 hours ago

    As I was reading, I was wondering why they weren’t using the top-of-the-line models: they used Sonnet instead of Opus, GPT mini, Gemini Flash, etc. They really buried the lede on this one. Last sentence:

    They recommend “formally verified safety architectures” as a solution. You’ll be shocked to learn that Emergence happens to offer just such a thing!

    So this company set up the test so that the AI would fail, so they could sell you on their guardrail software. Even then, the article says Sonnet did pretty well.