a simple script to scan an html file and replace every img tag's src with an embedded copy of the base64-encoded image data, i.e. embedding the image directly in the file. (back when i used to play with windows xp installers, adding things to the disk such as updates used to be known as 'slipstreaming'.)
```
Parses a HTML file for <img> tags, downloads the linked URLs, and embeds them into the HTML file.
positional arguments:
source
options:
-h, --help show this help message and exit
--verbose, -v Print more logging to the console.
--files, -f Make local files instead of embedding.
--maxlen, -m MAXLEN Change the max length of URLs allowed (default 100).
```
## why
because imgur announced in apr 2023 that they would be deleting a bunch of images, and i wanted to keep the pictures in some twine games that i've been playing recently without doing that by hand (i am also something of a data hoarder ;)
## how
the python script doesn't pretend to understand html. it reads the file into memory, looks for each `<img` opening tag, looks for a `src=` attribute following it, tries to determine whether that contains a url, and if it does, downloads the url, base64-encodes the result, and jams it into the `src` attribute. it also looks for the escaped variants `\<img` and `src=\"`, which crop up in twine files a fair bit. it then writes the result to a new file.
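the scan-and-replace idea above can be sketched roughly like this. this is a minimal regex-based version, not the script itself: the `fetch` callable and the `embed_images` name are stand-ins for the real download and rewrite code, and it only handles the unescaped `src="..."` form.

```python
import base64
import mimetypes
import re

def embed_images(html: str, fetch, maxlen: int = 100) -> str:
    """Replace each <img src="http..."> URL with a base64 data: URI.

    `fetch` is any callable mapping a url to bytes (e.g. a urllib
    wrapper); passing it in keeps the rewriter testable offline.
    """
    # naive scan, in the spirit of "doesn't pretend to understand html":
    # find a src="..." attribute somewhere after each <img
    pattern = re.compile(r'(<img[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)

    def replace(m: re.Match) -> str:
        url = m.group(2)
        # only touch things that look like (reasonably short) urls
        if not url.startswith(("http://", "https://")) or len(url) > maxlen:
            return m.group(0)
        data = fetch(url)
        mime = mimetypes.guess_type(url)[0] or "application/octet-stream"
        b64 = base64.b64encode(data).decode("ascii")
        # jam the encoded bytes back into the src attribute
        return f"{m.group(1)}data:{mime};base64,{b64}{m.group(3)}"

    return pattern.sub(replace, html)
```

the injected `fetch` is just a convenience for testing; the real script would download with something like `urllib.request.urlopen` before encoding.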