• TrackinDaKraken@lemmy.world
    6 days ago

    Because the font doesn’t matter: computers don’t read the shapes, except in the case of OCR, where the text is a raster image.

    • unwarlikeExtortion@lemmy.ml
      6 days ago

      It isn’t about the font, it’s about (Unicode) characters. Lucky for the model, most are named for the “normal” letters they resemble, so it’s similar to a font problem.

      Lots of edgy teens use these “fonts” (characters) in their Instagram bios as well, so these things surely made the cut for training data.
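
      To make the "characters, not fonts" point concrete, here is a rough sketch using Python's standard `unicodedata` module. The sample string (bold script letters spelling "Hello") is just an illustration I picked, not anything from the original post. It prints each character's official Unicode name and then folds the text back to plain letters with NFKC normalization.

```python
import unicodedata

# "Hello" written with MATHEMATICAL BOLD SCRIPT letters (illustrative sample).
# Escapes are used so the example does not depend on the editor's font support.
fancy = "\U0001D4D7\U0001D4EE\U0001D4F5\U0001D4F5\U0001D4F8"

# Each "fancy letter" is its own code point, named after the letter it resembles.
for ch in fancy:
    print(f"U+{ord(ch):05X}  {unicodedata.name(ch)}")

# NFKC compatibility normalization maps these symbols back to ordinary letters.
plain = unicodedata.normalize("NFKC", fancy)
print(plain)
```

      NFKC only covers characters that have a compatibility mapping, so some lookalikes (for example letters borrowed from other scripts) would pass through unchanged.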

    • nixukty@lemmy.zip
      6 days ago

      well it’s not just a font. doing this does degrade performance: instead of each word being a common single token, each fancy cursed letter is processed as its own really uncommon token. it will still answer correctly most of the time, since the two sentences end up similar in embedding space, but it will probably answer worse (unless it rewrites your prompt correctly while thinking).
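
      A very rough way to see the cost without a real tokenizer: many LLM tokenizers operate on UTF-8 bytes, so comparing byte counts gives a crude proxy for how much more input the fancy text produces. This is only a sketch under that assumption. The helper name `utf8_cost` and the sample string are made up for illustration, and actual token counts depend entirely on the tokenizer in use.

```python
import unicodedata

def utf8_cost(text: str) -> int:
    """Number of UTF-8 bytes in text (a crude stand-in for token count)."""
    return len(text.encode("utf-8"))

# "Hello" in MATHEMATICAL BOLD SCRIPT letters (illustrative sample).
fancy = "\U0001D4D7\U0001D4EE\U0001D4F5\U0001D4F5\U0001D4F8"
# The same text folded back to ordinary letters.
plain = unicodedata.normalize("NFKC", fancy)

print(plain, utf8_cost(plain))   # ASCII letters take 1 byte each
print(fancy, utf8_cost(fancy))   # these symbols take 4 bytes each
```

      Normalizing the prompt like this before sending it is roughly the same cleanup the model would otherwise have to do on its own while thinking.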