Commit a5cb6001 authored by The Heavy 🚂

Finally fix those pesky HTML characters in the README

parent 7ff01c9b
+18 −1
@@ -4,13 +4,30 @@

a simple script to scan an html file and replace every img tag's src with a base64-encoded copy of the image data, i.e. embedding the image in the file. (back when i used to play with windows xp installers, adding things such as updates to the disk used to be known as 'slipstreaming'.)

## usage

```
usage: imgslipstream.py [-h] [--verbose] [--files] [--maxlen MAXLEN] source

Parses a HTML file for <img> tags, downloads the linked URLs, and embeds them into the HTML file.

positional arguments:
  source

options:
  -h, --help           show this help message and exit
  --verbose, -v        Print more logging to the console.
  --files, -f          Make local files instead of embedding.
  --maxlen, -m MAXLEN  Change the max length of URLs allowed (default 100).
```

## why

because imgur announced recently (apr 2023) that they would be deleting a bunch of images, and i wanted to keep the pictures with some twine games that i've been playing recently, and i also didn't want to do that by hand (i am also something of a data hoarder ;)

## how

the python script doesn't pretend to understand html, it reads the file into memory, then looks for the <img opening tag, looks for a src= attribute following that, tries to determine if that contains a url, and if it does, downloads the url, base64 encodes the result, and jams it in to the src attribute. it also looks for &lt;img and src=&quot; which crop up in twine files a fair bit. it then writes the results to a new file.
the python script doesn't pretend to understand html, it reads the file into memory, then looks for the \<img opening tag, looks for a src= attribute following that, tries to determine if that contains a url, and if it does, downloads the url, base64 encodes the result, and jams it in to the src attribute. it also looks for \&lt;img and src=\&quot; which crop up in twine files a fair bit. it then writes the results to a new file.
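
a rough sketch of the scan-and-replace loop described above, not the actual code from imgslipstream.py. the names `embed_images` and `fetch_url`, the `data:` uri layout, and the way `maxlen` is used to skip long urls are all assumptions for illustration. it only handles the plain `<img` / `src="` case and leaves out the entity-encoded variants.

```python
import base64
import mimetypes
import urllib.request


def fetch_url(url):
    """Download a URL and return its raw bytes (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def embed_images(html, fetch=fetch_url, maxlen=100):
    """Swap each <img> src URL for a base64 data URI using plain string search.

    No HTML parsing happens here: the code finds "<img", then the next
    'src="' after it, wherever that turns out to be.
    """
    out = []
    pos = 0
    while True:
        tag = html.find("<img", pos)
        if tag == -1:
            break
        start = html.find('src="', tag)
        if start == -1:
            break
        start += len('src="')
        end = html.find('"', start)
        if end == -1:
            break
        url = html[start:end]
        # copy everything up to and including the opening quote unchanged
        out.append(html[pos:start])
        # assumption: only http(s) urls no longer than maxlen get embedded
        if url.startswith(("http://", "https://")) and len(url) <= maxlen:
            mime = mimetypes.guess_type(url)[0] or "application/octet-stream"
            data = base64.b64encode(fetch(url)).decode("ascii")
            out.append(f"data:{mime};base64,{data}")
        else:
            out.append(url)
        pos = end
    # copy whatever is left after the last processed src value
    out.append(html[pos:])
    return "".join(out)
```

passing a different `fetch` callable (for example one that reads from a local cache) makes it possible to try the loop without touching the network.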

## caveats