
# imgslipstream

## what is it

a simple script that scans an html file and replaces every img tag's src with a base64-encoded copy of the image it points to, i.e. it embeds the image in the file. (back when i used to play with windows xp installers, adding things such as updates to the install disk was known as 'slipstreaming', hence the name.)

## why

because imgur announced in apr 2023 that they would be deleting a bunch of images. i wanted to keep the pictures in some twine games that i've been playing recently, and i didn't want to do that by hand. (i am also something of a data hoarder ;)

## how

the python script doesn't pretend to understand html. it reads the file into memory, looks for the `<img` opening tag, then looks for a `src=` attribute following that, and tries to determine whether it contains a url. if it does, it downloads the url, base64-encodes the result, and jams it into the src attribute. it also looks for `&lt;img` and `src=&quot;`, which crop up in twine files a fair bit. it then writes the results to a new file.
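the steps above can be sketched roughly as below. this is a minimal illustration, not the script's actual code: it uses a regex where the script may well use plain string searches, it assumes the embedded copy is written as a `data:` uri, and the names `IMG_SRC`, `guess_mime`, `to_data_uri`, `embed_images` and the `fetch` callback are all made up for the example.

```python
import base64
import re

# hypothetical pattern: finds <img ... src="URL" in plain form and in the
# entity-escaped form (&lt;img ... src=&quot;URL&quot;) seen in twine files
IMG_SRC = re.compile(
    r"""(?P<head>(?:<|&lt;)img\b[^>]*?\bsrc\s*=\s*(?:["']|&quot;))"""
    r"""\s*(?P<url>https?://.+?)(?=["'\s>]|&quot;)""",
    re.IGNORECASE,
)


def guess_mime(data: bytes) -> str:
    """guess a mime type from the first bytes; generic fallback otherwise."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    return "application/octet-stream"


def to_data_uri(data: bytes) -> str:
    """base64-encode the image bytes and wrap them in a data: uri."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{guess_mime(data)};base64,{encoded}"


def embed_images(html: str, fetch) -> str:
    """replace each remote img src with an embedded copy.

    `fetch` is any callable taking a url and returning bytes, or None
    to leave that src untouched (assumption made for this sketch).
    """
    def replace(match):
        data = fetch(match.group("url"))
        if data is None:
            return match.group(0)  # keep the original text as-is
        return match.group("head") + to_data_uri(data)

    return IMG_SRC.sub(replace, html)
```

to use it, read the source file into a string, call `embed_images(text, fetch)` with whatever download function you like, and write the returned string to a new file so the original stays untouched.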

## caveats

badly formatted html won't stop it, but it may make it behave unpredictably. it tries to strip unnecessary whitespace from the url, but that doesn't always work right. if a quote is missing, it may miss the url entirely, or it may grab half the file and think that is the url. to try and avoid sending garbage requests, i capped the url length at 40 characters (now 100, and adjustable on the command line). i have no idea what it does with urls that fail to load (it now throws a warning if the server doesn't send a 200) or that don't contain an image. i do not recommend removing the original file until you have thoroughly checked everything. it cannot handle images that are already local files, but it should throw a warning alerting you to them. it cannot handle javascript-loaded images at all unless the markup matches standard html; if the js uses img it should throw a warning, but no guarantees.
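a few of these guards can be sketched as a download helper. again this is only an illustration under assumptions: `fetch_image`, `max_url_len` and `timeout` are hypothetical names, the real command-line option for the length cap isn't shown here, and the content-type check is an extra idea that the script itself may not perform.

```python
import urllib.request
import warnings


def fetch_image(url, max_url_len=100, timeout=10):
    """return the image bytes for url, or None (with a warning) if unsure."""
    # a very long "url" usually means a missing quote swallowed part of the file
    if len(url) > max_url_len:
        warnings.warn(f"skipping src longer than {max_url_len} characters")
        return None
    # anything that isn't http(s) is treated as an already local file
    if not url.lower().startswith(("http://", "https://")):
        warnings.warn(f"skipping local or non-http src: {url}")
        return None
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                warnings.warn(f"server sent {resp.status} for {url}")
                return None
            # assumption: only embed responses that claim to be images
            if not resp.headers.get_content_type().startswith("image/"):
                warnings.warn(f"not an image: {url}")
                return None
            return resp.read()
    except OSError as exc:  # covers URLError, HTTPError and timeouts
        warnings.warn(f"failed to load {url}: {exc}")
        return None
```

returning None instead of raising means one bad url doesn't abort the whole run; the src is simply left as it was and the warning tells you which one to check by hand.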