An Archaeological Site in the AI Jungle

Image: Cloudflare Indus stats, May-June 2026

What are the AI platforms doing to websites like Harappa.com which have been around since the dawn of the Internet?

Like colonial powers, they happily claim territory through extra-legal procedures (although they are said to be far more intelligent than human beings, they can't understand our terms of service, which kindly ask them not to crawl!). In the way they appropriate content researched and spun up by others, they are no different from the European colonial powers of the 16th through 19th centuries, claiming lands and peoples with whom they had nothing in common. With server farms and data centers instead of guns and cannon, they force themselves on the web and on human knowledge, taking as much value as they can, in a way that would make the Dutch and British East India Companies proud.

Cloudflare statistics for the seven days from May 29 to June 4, 2026 cast some light on this continuously evolving terrain, one that matters a great deal to how people learn and how information circulates in the world. Are there differences in rapaciousness amongst AI platforms? Are they helping or hindering awareness and understanding of the ancient Indus civilization? With about 30,000 visitors a day, Harappa.com is a medium-sized content site, and many sites like ours complain that they are losing traffic and visitors to AI platforms sucking the air out of the web.

The number in the Cloudflare stats above that jumps out is OpenAI's (ChatGPT's) enormous use of content from Harappa.com: almost 30 GB of data downloaded in 45K sessions. Don't worry: this trillion-dollar company is not on the hook for, and has never offered to pay for, the data it consumes. Their knowledge theft is continuous, and they send little traffic back to Harappa.com. "One side takes all" is their motto.

Google, thanks to its search engine (which sends some 65% of the traffic to Harappa.com), is more magnanimous. It sends over 50K sessions to Harappa.com, two-and-a-half times its own use of site content. How this is intertwined with search is unclear, but it shows how interested Google users are in learning more, and Google's AI interface (currently) facilitates that. Our data show that about half our visitors are students of all ages, a good share are interested amateurs, and almost ten percent are scholars, so it makes sense that they want more.

As with any frenzy, there are actors who give nothing back to those they take from: Amazon, Apple, Huawei and ByteDance. It really isn't that hard to block these agents, and maybe I should. Interestingly, Grok does not appear. Is its agent cloaked? Does it not care about the ancient Indus civilization?
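For readers curious what blocking these agents involves, here is a minimal Python sketch of the simplest approach: checking a request's User-Agent header against a short blocklist. The crawler names used (Amazonbot for Amazon, Applebot for Apple, PetalBot for Huawei, Bytespider for ByteDance) are assumptions based on names these companies have published for their bots and should be checked against each operator's current documentation. The function `blocked_operator` and the sample strings are hypothetical, for illustration only.

```python
from typing import Optional

# Hypothetical blocklist: lowercase crawler tokens mapped to the company behind them.
# The tokens are an assumption; verify them against each operator's current docs.
BLOCKED_AGENTS = {
    "amazonbot": "Amazon",
    "applebot": "Apple",
    "petalbot": "Huawei",
    "bytespider": "ByteDance",
}


def blocked_operator(user_agent: str) -> Optional[str]:
    """Return the company whose crawler token appears in the User-Agent, or None."""
    ua = user_agent.lower()  # compare case-insensitively
    for token, operator in BLOCKED_AGENTS.items():
        if token in ua:
            return operator
    return None


if __name__ == "__main__":
    # Illustrative User-Agent strings, made up for this example.
    samples = [
        "Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    ]
    for ua in samples:
        operator = blocked_operator(ua)
        print("block" if operator else "allow", operator or "-", "|", ua)
```

User-Agent strings are easy to fake, so a check like this only stops crawlers that identify themselves honestly. On Cloudflare, the same header is exposed to custom rules as the `http.user_agent` field, so an equivalent rule could live there rather than in the site's own code.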

Meta and Microsoft offer crumbs. DuckDuckGo is perhaps the fairest, but its numbers are tiny. Anthropic's Claude is surprising for its limited referrals. I use their agent Claude and find it exceptional, as do many in the San Francisco Bay Area, where even I, a former CTO (Chief Technology Officer), can no longer understand the language on billboards, so completely do AI and driverless cars dominate the Zeitgeist. In fact, I use AI to fight AI: Claude helps write the Cloudflare server rules needed to block bots, usually from East and Southeast Asia, that occasionally pummel the site with useless traffic.

I am not sure what to think about all this in the larger scheme of things. Our mission is educational, so it is great that our content is informing a wider audience. ChatGPT helps that mission by taking quality content to so many people, and so does Gemini (Google), so I have mixed feelings about their theft: is it for good? Yet they are feasting on many years' work of curation and, beneath that, on the more important real scholarship of many archaeologists and other scholars. They are not funding research or synthesis; they are largely and simply living off it.

Sadly, they don't all see the need to feed the sources of their current product advantage. Will that ever change? Is this a new kind of technological colonialism that knows few bounds? Only Anthropic has recently settled with authors for using their books without permission (the case has not concluded). Still, as others have pointed out (see Ted Chiang, "No, Artificial Intelligence is Not Conscious"), their ethical policies are disingenuous: they don't hold themselves responsible for any of the failings of their creation, Claude, and responsibility is the very basis of an ethical framework.

Is there a model, perhaps like AdSense once was, that brings some of the value created back to the originators of that value? How long can an archaeological site in the AI jungle last when the looters and antiquities traffickers are running about?

- Omar Khan, June 8, 2026

Image: Cloudflare AI Panel. On the top right are designs based on two ancient Indus merchants as they appeared on a Mesopotamian seal from about 2020 BCE.

© Harappa.com 1995-2026