• poVoq@slrpnk.netM
    7 days ago

    Not only wikis, sadly. Anything with public-facing deep links that trigger expensive database operations is being hammered by these bots, and few servers can take the load.

  • thatsnothowyoudoit@lemmy.ca
    7 days ago

    We use NGINX’s 444 response A LOT.

    In coordination with careful rate-limiting, it’s been a dramatic improvement.

    The worst of the bots don’t advertise their User Agent (or worse, pretend to be a normal user while making 100s of requests a second), but there’s lots of low-hanging fruit.

  • Tiresia@slrpnk.net
    7 days ago

    On the plus side, this isn’t a problem with AI, this is a problem with AI companies having more investment money than they know what to do with. The moment the hype fades and they no longer want to hemorrhage money scraping every wiki on the internet thousands of times per day, this traffic will go back to a far saner level.

    • poVoq@slrpnk.netM
      7 days ago

      It will probably go down, but the process itself is kind of unavoidable for training LLMs, so I doubt things will go back to how they were before.
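
For anyone wondering what the "444 plus rate-limiting" approach mentioned upthread boils down to, here is a rough Python sketch of the decision logic. It is not anyone's actual setup: in NGINX itself this would normally be done with `return 444` and the `limit_req` module, and the names here (`Gatekeeper`, `TokenBucket`, `BLOCKED_UA_SUBSTRINGS`) and the numbers are made up for illustration. It also uses a simple token bucket, which is only similar in spirit to NGINX's leaky-bucket limiter.

```python
import time
from dataclasses import dataclass, field

# Hypothetical crawler User-Agent substrings; use whatever shows up in your own logs.
BLOCKED_UA_SUBSTRINGS = ("gptbot", "bytespider", "ccbot")

DROP = 444      # NGINX-specific code: close the connection without sending a response
TOO_MANY = 429  # standard "Too Many Requests"
OK = 200


@dataclass
class TokenBucket:
    rate: float                  # tokens refilled per second
    burst: float                 # maximum tokens the bucket can hold
    tokens: float = field(init=False)
    updated: float = 0.0         # timestamp of the last refill

    def __post_init__(self):
        self.tokens = self.burst  # start full so a fresh client gets its burst

    def allow(self, now: float) -> bool:
        # Refill according to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class Gatekeeper:
    """Silently drop obvious bots, rate-limit everyone else per IP."""

    def __init__(self, rate: float = 5.0, burst: float = 10.0):
        self.rate = rate
        self.burst = burst
        self.buckets = {}  # ip -> TokenBucket (unbounded here; a real setup would evict)

    def check(self, ip, user_agent, now=None):
        now = time.monotonic() if now is None else now
        ua = (user_agent or "").strip().lower()
        # Low-hanging fruit: missing UA or a UA that openly names a known crawler.
        if not ua or any(s in ua for s in BLOCKED_UA_SUBSTRINGS):
            return DROP
        bucket = self.buckets.setdefault(ip, TokenBucket(self.rate, self.burst))
        return OK if bucket.allow(now) else TOO_MANY
```

The User-Agent check only catches bots that identify themselves (or send nothing at all); the ones pretending to be normal browsers are only slowed down by the per-IP limit, which is why the two are used together.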