#+TITLE: HTML to Org

I've been trying to maintain a book list for the past several years,
in various ways.

For 2015, I duct-taped together some scripts to run on a private
server. They watched a Dropbox folder for changes, processed the raw
markdown files in it and stitched them together, so adding a book
was as simple as adding a markdown text file to the right folder and
it would show up on my site.

With my recent move to org-mode and GitHub Pages, I basically copy-pasted the
generated HTML into a =#+BEGIN_HTML= ... =#+END_HTML= section in the books.org
document, and that worked reasonably well. However, I really wanted to
normalize the contents to make them easier to parse and explore, so I
ended up writing some CHICKEN Scheme to convert HTML to Org.

I was pleasantly surprised by how easy the html-parser API made it to
handle HTML. I initially misunderstood what the seed was supposed to
do, but it was a breeze after I cleared that up.
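
To illustrate the seed idea, here is a minimal sketch in Python rather
than CHICKEN Scheme. It uses the standard library's =html.parser=
instead of the html-parser egg, so it is not the code described above,
and the names =OrgFolder= and =html_to_org= are made up for this
example. It assumes the seed acts as an accumulator that each handler
receives and extends, here a list of Org lines.

#+BEGIN_SRC python
from html.parser import HTMLParser

# Map HTML heading tags to Org heading depth (h1 -> "*", h2 -> "**", ...).
HEADINGS = {"h1": 1, "h2": 2, "h3": 3}


class OrgFolder(HTMLParser):
    """Hypothetical helper: folds HTML parse events into Org lines."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.seed = []   # the accumulator ("seed"): Org lines so far
        self.stack = []  # currently open tags, innermost last

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open tag.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if not text:
            return
        current = self.stack[-1] if self.stack else None
        if current in HEADINGS:
            line = "*" * HEADINGS[current] + " " + text
        elif current == "li":
            line = "- " + text
        else:
            line = text
        # Build a new seed from the old one instead of mutating it,
        # mirroring how a fold threads its accumulator along.
        self.seed = self.seed + [line]


def html_to_org(html):
    """Convert a small HTML fragment to Org text (sketch only)."""
    parser = OrgFolder()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.seed)
#+END_SRC

Calling =html_to_org= on a fragment such as
=<h2>2015</h2><ul><li>First book</li></ul>= gives a second-level Org
heading followed by a list item. Inline markup inside a list item
(emphasis, links) is not handled by this sketch.
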

At the other end of the spectrum, I also spent a non-trivial amount of
time today typing out a handwritten book list I'd kept in a notebook
from 2012 to 2013. Sadly, I didn’t have the energy to type out the notes I
had, and decided to just record the titles.