# robots.txt mostly grouped by company.
# Last Updated: 10/29/2025
#

User-agent: AI2Bot
User-agent: Ai2Bot-Dolma
Disallow: /

User-agent: Amazonbot
User-agent: contxbot
User-agent: AmazonAdBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: FriendlyCrawler
Disallow: /

User-agent: Google-CloudVertexBot
User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: iaskspider/2.0
Disallow: /

User-agent: ICC-Crawler
Disallow: /

User-agent: ImagesiftBot
Disallow: /

User-agent: img2dataset
Disallow: /

User-agent: ISSCyberRiskCrawler
Disallow: /

User-agent: Kangaroo Bot
Disallow: /

User-agent: LinkedInBot
Disallow: /

User-agent: FacebookBot
User-agent: FacebookExternalHit
User-agent: Meta-WebIndexer
User-agent: Meta-ExternalAds
User-agent: Meta-ExternalAgent
User-agent: Meta-ExternalFetcher
Disallow: /

User-agent: PanguBot
Disallow: /

User-agent: Sidetrade indexer bot
Disallow: /

User-agent: TikTokSpider
Disallow: /

User-agent: Timpibot
Disallow: /

User-agent: VelenPublicWebCrawler
Disallow: /

User-agent: Webzio-Extended
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: Arquivo-web-crawler
User-agent: bch-web-crawler
Disallow: /

User-agent: CriteoBot
Disallow: /

User-agent: DnBCrawler
User-agent: DnBCrawler-Analytics
Disallow: /

User-agent: FAST-WebCrawler
Disallow: /

User-agent: FyndSearchEngine-Crawler
User-agent: FyndSearchEngine-ReCrawler
Disallow: /

User-agent: MiniWebCrawler
Disallow: /

User-agent: Bravebot
Disallow: /

User-agent: AdkernelTopicCrawler
Disallow: /

User-agent: Spider
Disallow: /

User-agent: DataForSeoBot
Disallow: /

User-agent: Thinkbot
Disallow: /

User-agent: SeekportBot
Disallow: /

User-agent: Brightbot
Disallow: /

User-agent: Discordbot
Disallow: /

User-agent: Twitterbot
Disallow: /

User-agent: Leikibot
Disallow: /

User-agent: Flyriverbot
Disallow: /

User-agent: coccocbot-web
User-agent: coccocbot-image
Disallow: /

User-agent: GenomeCrawlerd
Disallow: /

User-agent: Qwantbot
Disallow: /

User-agent: DVbot
Disallow: /

User-agent: HaloBot
Disallow: /

User-agent: Website-info.net
Disallow: /

User-agent: ShapBot
Disallow: /

User-agent: Slackbot-LinkExpanding
Disallow: /

User-agent: PATHspider
Disallow: /

User-agent: BitSightBot
Disallow: /

User-agent: UGAResearchAgent
Disallow: /

User-agent: LinkupBot
Disallow: /

User-agent: bitlybot
Disallow: /

User-agent: StartmeBot
Disallow: /

User-agent: Pleroma
Disallow: /

User-agent: SeobilityBot
Disallow: /

User-agent: Yeti
Disallow: /

# Category: Chinese Search Engine
# URL: https://www.so.com/
User-agent: 360Spider
Disallow: /

# Category: Commercial Web Scrape / Web Crawl Company Crap
# URL: http://80legs.com/the-80legs-web-crawler/
User-agent: 80legs
User-agent: voltron
Disallow: /

# Category: Commercial Advertising Crap
# URL: https://www.adbeat.com/operation_policy
User-agent: adbeat_bot
Disallow: /

# Category: Commercial Advertising Crap
# URL: https://www.admantx.com/
User-agent: admantx
Disallow: /

# Category: Commercial SEO Crap
# URL: https://seostar.co/robot/
User-agent: adsbot
Disallow: /

# Category: Unknown
# URL: Unknown
User-agent: AdsrvrBot
Disallow: /

# Category: Commercial Company Aggregating ads.txt
# URL: https://www.adstxt.com/
User-agent: adstxt.com
Disallow: /

# Category: Unknown Sites Scanning ads.txt
# URL: https://github.com/InteractiveAdvertisingBureau/adstxtcrawler
User-agent: AdsTxtCrawler
User-agent: AdsTxtCrawler-CyberAgent
User-agent: AppNexusAdsTxtCrawler
User-agent: gumgumAdsTxtCrawler
Disallow: /

# Category: Commercial Marketing Link Crap
# URL: https://ahrefs.com/
User-agent: AhrefsBot
Disallow: /

# Category: Distributed JVM App
# URL: https://akka.io/
User-agent: akka-http
Disallow: /

# Category: Commercial SEO Crap
# URL: http://alphaseobot.com/bot.html
# AKA: AlphaBot
User-agent: AlphaSeoBot
Disallow: /

# Category: Huawei Web Crawler
# URL: https://aspiegel.com/
User-agent: AspiegelBot
Disallow: /

# Category: Commercial Social Media monitoring
# URL: https://awario.com/bots.html
User-agent: AwarioBot
User-agent: AwarioRssBot
User-agent: AwarioSmartBot
Disallow: /

# Category: Chinese Search Engine
User-agent: Baiduspider
Disallow: /

# Category: Commercial Data Mining
# URL: https://www.exensa.com/
User-agent: Barkrowler
User-agent: BUbiNG
Disallow: /

# Category: Commercial Advertising Crap
# URL: https://www.bidswitch.com/
User-agent: bidswitchbot
Disallow: /

# Category: Commercial Advertising Crap
# URL: https://bidtellect.com/
User-agent: Bidtellect
Disallow: /

# Category: Commercial SEO Backlink Crap
# URL: http://webmeup-crawler.com/
User-agent: BLEXBot
Disallow: /

# Category: Commercial Brand Protection Crap
# URL: https://www.brandverity.com/why-is-brandverity-visiting-me
User-agent: BrandVerity
Disallow: /

# Category: Commercial Pinterest Wannabe
# URL: https://www.bublup.com/bublup-bot
User-agent: BublupBot
Disallow: /

# Category: Lists what technologies it finds sites built with
# URL: https://builtwith.com/
User-agent: BuiltWith
Disallow: /

# Category: Non-Profit Data Harvesting
# URL: http://commoncrawl.org/big-picture/frequently-asked-questions/
User-agent: CCBot
Disallow: /

# Category: Commercial Advertising Crap
# URL: https://www.centro.net/
User-agent: Centro Ads.txt Crawler
Disallow: /

# Category: Commercial Brand monitoring
# URL: https://www.checkmarknetwork.com/
User-agent: CheckMarkNetwork
Disallow: /

# Category: Commercial Data Mining Crap
# URL: https://www.clickagy.com/
User-agent: Clickagy Intelligence Bot v2
Disallow: /

# Category: Commercial German Browser / Search Engine
# URL: https://cliqz.com/en/cliqzbot
User-agent: Cliqzbot
Disallow: /

# Category: Shady Vulnerability Scanner
# URL: https://commonscan.org/
User-agent: commonscan
Disallow: /

User-agent: Cotoyogi
Disallow: /

# Category: SEO Crap
# URL: https://dataforseo.com/dataforseo-bot
User-agent: DataForSeoBot
Disallow: /

# Category: Commercial Analytics Company
# URL: https://www.dataprovider.com/
User-agent: Dataprovider
Disallow: /

# Category: Korean Search Engine
# URL: https://www.daum.net/
User-agent: DAUM
Disallow: /

# Category: Domain Harvester
# URL: https://github.com/kgretzky/dcrawl
User-agent: dcrawl
Disallow: /

# Category: Commercial SEO Marketing Crap
# URL: https://www.deepcrawl.com/bot/
User-agent: deepcrawl
Disallow: /

# Category: Commercial SEO Harvesting
# URL: http://www.domaincrawler.com/
User-agent: domaincrawler
Disallow: /

# Category: Commercial Backlink, Metrics, Rankings, etc...
# URL: https://domainstats.com/
User-agent: DomainStatsBot
Disallow: /

# Category: Expired Domain Bot?
# URL: https://www.domcop.com/bot
User-agent: DomCopBot
Disallow: /

# Category: Commercial Backlink Crap
# URL: https://moz.com/
User-agent: DotBot
User-agent: rogerbot
Disallow: /

# Category: Commercial Marketing Crap
# URL: https://www.exalead.com
User-agent: Exabot
Disallow: /

# Category: Unknown (Website Down)
# URL: https://extlinks.com/Bot.html
User-agent: ExtLinksBot
Disallow: /

# Category: "New" Search Engine for Maximum Privacy
# URL: http://femtosearch.com/
User-agent: FemtosearchBot
Disallow: /

# Category:
# URL: https://garlik.com/
User-agent: Garlik
Disallow: /

# Category: Commercial Ad Network
# URL: https://getintent.com/bot.html
User-agent: GetIntent Crawler
Disallow: /

# Category: Gigablast Search Engine
# URL: https://www.gigablast.com/
User-agent: Gigabot
User-agent: G-i-g-a-b-o-t
Disallow: /

# Category: Crawling Project
# URL: http://glutenfreepleasure.com/
User-agent: Gluten Free Crawler
Disallow: /

# Category: Spell Checker - Indexing?
# URL: https://www.grammarly.com/
User-agent: Grammarly
Disallow: /

# Category: Commercial Contextual Intelligence Crap
# URL: https://www.grapeshot.com/crawler/
User-agent: grapeshot
Disallow: /

# Category: Commercial Japanese Marketing Firm
# URL: http://hatenaantenna.g.hatena.ne.jp/
User-agent: Hatena Antenna
Disallow: /

# Category: Commercial Website Audit & Monitoring
# URL: https://hexometer.com/
User-agent: Hexometer
Disallow: /

# Category: Chinese GeoIP Wannabe
# URL: https://en.ipip.net/
User-agent: HTTP Banner Detection
Disallow: /

# Category: Random Blogs
# URL: https://hubpages.com/
User-agent: HubPages
Disallow: /

# Category: Commercial Advertising Crap
# URL: https://integralads.com/site-indexing-policy/
User-agent: ias_crawler
User-agent: ias_wombles
Disallow: /

User-agent: IbouBot
Disallow: /

# Category: German Search Engine
# URL: https://infotiger.com/bot
User-agent: InfoTigerBot
Disallow: /

# Category: Italian ISP
# URL: https://www.tiscali.it/
User-agent: IstellaBot
Disallow: /

# Category: Java based HTTP client
# URL: https://docs.oracle.com/javase/6/docs/api/java/net/HttpURLConnection.html
User-agent: Jersey
Disallow: /

# Category: Chinese Translation Site
# URL: https://www.keybot.com/
User-agent: Keybot
Disallow: /

# Category: Crap
# URL: https://line.me/en/
User-agent: Linespider
Disallow: /

# Category: Translation Bot
# URL: https://www.linguee.com/
User-agent: Linguee
Disallow: /

# Category: Commercial Link Indexer
# URL: https://www.linkdex.com/en-us/about/bots/
User-agent: linkdex
User-agent: linkdexbot
Disallow: /

# Category: Unknown - "security research purposes"
# URL: http://ltx71.com/
User-agent: ltx71
Disallow: /

# MaCoCu - Some BS student project
# URL: https://www.clarin.si/
User-agent: MaCoCu
Disallow: /

# Category: Commercial Social Media Monitoring
# URL: https://www.brandwatch.com/legal/magpie-crawler/
User-agent: magpie-crawler
Disallow: /

# Category: Russian Mail / Social / Other Crap
# URL: http://go.mail.ru/help/robots
User-agent: Mail.Ru
User-agent: Mail.RU_Bot
Disallow: /

# Category: Unknown
# URL: Unknown
User-agent: MauiBot
Disallow: /

# Category: Commercial Backlinks Crawler
# URL: https://monitorbacklinks.com
User-agent: MBCrawler
Disallow: /

# Category: Commercial Russian SEO Crap
# URL: https://megaindex.com/
User-agent: MegaIndex.ru
User-agent: MegaIndex.com
Disallow: /

# Category: Commercial SEO Marketing Crap
# URL: https://mj12bot.com/
User-agent: MJ12bot
Disallow: /

# Category: Commercial Analytics Crap
# URL: https://moat.com/
User-agent: moatbot
Disallow: /

# Category: UK Search Engine
# URL: https://www.mojeek.com/bot.html
User-agent: MojeekBot
Disallow: /

# Category: SEO Crap
# URL: https://metrics-tools.de/robot.html
User-agent: MTRobot
Disallow: /

# Category: Commercial Metrics Crap
# URL: https://www.netcraft.com/
User-agent: NetcraftSurveyAgent
Disallow: /

# Category: A HTTP client for Android, Kotlin, and Java
# URL: https://square.github.io/okhttp/
User-agent: okhttp
Disallow: /

# Category: A vertical search engine
# URL: http://omgili.com/Crawler.html
User-agent: omgilibot
User-agent: omgili
Disallow: /

# Category: Commercial Data Mining Crap
# URL: https://panscient.com/faq.htm
User-agent: panscient
Disallow: /

# Category: Huawei Search Engine
# URL: https://aspiegel.com/
User-agent: PetalBot
Disallow: /

# Category: Commercial Site
# URL: http://www.pinterest.com/bot.html
User-agent: Pinterestbot
Disallow: /

# Category: Commercial Data Mining Crap
# URL: https://pipl.com/bot/
User-agent: PiplBot
Disallow: /

# Category: Commercial Advertising
# URL: https://www.comscore.com/
User-agent: proximic
Disallow: /

# Category: Commercial Pic Search Indexer
# URL: https://www.picsearch.com/bot.html
User-agent: psbot
Disallow: /

# Category: Advertising
User-agent: Quantcastbot
Disallow: /

# Category: F-Secure Research Crap
# URL: http://riddler.io/about
User-agent: Riddler
Disallow: /

# Category: Commercial web scraper for hire.
# URL: https://scrapinghub.com/
User-agent: Quick-Crawler
User-agent: Scrapy
Disallow: /

# Category: Commercial SEO Spider Software
# URL: https://www.screamingfrog.co.uk/
User-agent: Screaming Frog SEO Spider
Disallow: /

# Category: Commercial Media Intelligence Crap
# URL: http://www.carma.com
User-agent: ScooperBot
Disallow: /

# Category: German Search Engine?
# URL: http://seekport.com/
User-agent: Seekport Crawler
Disallow: /

# Semantic Scholar - Looking for academic PDFs
User-agent: SemanticScholarBot
Disallow: /

# Category: Commercial Marketing Crap
# URL: https://www.semrush.com/bot/
User-agent: SiteAuditBot
User-agent: SemrushBot
User-agent: SemrushBot-BA
Disallow: /

# Category: Commercial SEO Garbage
# URL: https://www.seobility.net/en/bot/
User-agent: Seobility
Disallow: /

# Category: Commercial Backlink Checker
# URL: https://en.seokicks.de/
User-agent: SEOkicks
Disallow: /

# Category: Commercial SEO Crap
# URL: https://serpstat.com/
User-agent: serpstatbot
Disallow: /

# Category: Czech Portal / Search Engine
# URL: https://napoveda.seznam.cz/en/seznamcz-web-search/
User-agent: SeznamBot
Disallow: /

# Category: Unknown (Website Down) - Backlink Checker
# URL: https://siteexplorer.info
User-agent: SiteExplorer
Disallow: /

# Category: Commercial Advertising Marketing Crap
# URL: http://www.similartech.com/smtbot
User-agent: SMTBot
Disallow: /

# Category: Chinese Search Engine
User-agent: Sogou Spider
Disallow: /

# Category: Commercial SEO Solution Crap
# URL: https://www.seoprofiler.com/
User-agent: spbot
Disallow: /

# Category: Commercial Language Processing
# URL: https://nlp.fi.muni.cz/projects/biwec/
User-agent: SpiderLing
Disallow: /

# Category: Some Commercial Crap
# URL: http://sur.ly/bot.html
User-agent: SurdotlyBot
Disallow: /

# Category: Unknown
# URL: Unknown
User-agent: The Knowledge AI
Disallow: /

# Category: Commercial Social media monitoring & analytics
# URL: http://www.trendiction.com/en/publisher/bot
User-agent: trendictionbot
Disallow: /

# Category: Commercial Advertising Crap
# URL: https://www.thetradedesk.com/us/ttd-content
User-agent: TTD-Content
Disallow: /

# Category: Helps edu prevent plagiarism
# URL: https://turnitin.com/robot/crawlerinfo.html
User-agent: TurnitinBot
Disallow: /

# Category: Commercial Machine Learning Text Classifier
# URL: https://www.uclassify.com/
User-agent: uclassify
Disallow: /

# Category: Commercial Russian CMS Detector Crap
# URL: https://webdatastats.com/policy.html
User-agent: WebDataStats
Disallow: /

# https://well-known.dev/about/
User-agent: WellKnownBot
Disallow: /

# Category: Russian Search Engine
User-agent: Yandex
User-agent: YandexBot
Disallow: /

# Category: Backlink Checker
# URL: http://www.zombiedomain.net/robot/
User-agent: Zombiebot
Disallow: /

# Category: Commercial Italian SEO Crap
# URL: https://suite.seozoom.it/
User-agent: ZoomBot
User-agent: Linkbot
Disallow: /

# Category: Commercial Advertising Crap
# URL: https://www.zoominfo.com/
User-agent: ZoominfoBot
Disallow: /

User-agent: *
Disallow: /wp-login.php
Disallow: /xmlrpc.php
Crawl-Delay: 10

Sitemap: https://www.extremeoverclocking.com/sitemap.xml.gz