psychothumbs@lemmy.world to Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ@lemmy.dbzer0.com · English · 1 year ago
The New York Times tried to block the Internet Archive: another reason to value the latter
walledculture.org
68 comments · cross-posted to: technology@lemmy.world
pootriarch@poptalk.scrubbles.tech · English · 1 year ago

> It exists, it's called a robots.txt file that the developers can put into place, and then bots like the webarchive crawler will ignore the content.

the internet archive doesn't respect robots.txt:

> Over time we have observed that the robots.txt files that are geared toward search engine crawlers do not necessarily serve our archival purposes.

the only way to stay out of the internet archive is to follow the process they created and hope they agree to remove you. or firewall them.

https://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/
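
For anyone unsure why robots.txt only works on crawlers that choose to honor it, here is a minimal sketch using Python's standard `urllib.robotparser`. The bot name `ExampleArchiveBot`, the `example.com` URL, and the `may_fetch` helper are made up for illustration; this is not the Internet Archive's real user agent or code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one made-up archive bot, allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a *compliant* crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"
    print(may_fetch("ExampleArchiveBot", page))  # blocked by the first rule group
    print(may_fetch("SomeSearchBot", page))      # falls through to the * group
```

The check happens entirely on the crawler's side. A crawler that never calls something like `may_fetch` can still request the page, which is why the remaining options are a removal request or a firewall rule.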