HTML to Org
I've tried to maintain a book list in various ways over the past several years.
For 2015, I'd duct-taped together some scripts that ran on a private server, watched a Dropbox folder for changes, processed the raw Markdown files in it, and stitched them together. Adding a book was as simple as dropping a Markdown file into the right folder, and it would show up on my site.
With my recent move to org-mode and GitHub Pages, I basically copy-pasted the
generated HTML into a #+BEGIN_HTML / #+END_HTML block in the books.org
document, and that worked reasonably well. However, I really wanted to
normalize the contents to make them easier to parse and explore, so I
ended up writing some CHICKEN Scheme to convert HTML to Org.
I was pleasantly surprised by how easy the html-parser API made it to
handle HTML. I initially misunderstood what the seed was supposed to
do, but it was a breeze once I cleared that up.
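For anyone else tripped up by the seed, here is a minimal sketch of how it appears to work, assuming CHICKEN 4 with the html-parser egg installed. The seed behaves like the accumulator of a fold: every callback receives the current seed and returns the next one. In the end: callback, parent-seed should be the seed from before the element opened and seed the one built up by its children. The names collect-text and fragments are made up for this illustration.

```scheme
(use html-parser)

;; Collect only the text nodes of a document.
;; Each callback takes the current seed and returns the next one.
(define collect-text
  (make-html-parser
   ;; Pass the seed through unchanged when a tag opens.
   'start: (lambda (tag attrs seed virtual?) seed)
   ;; Push each text node onto the accumulated list.
   'text:  (lambda (text seed) (cons text seed))
   ;; Keep whatever the element's children accumulated.
   'end:   (lambda (tag attrs parent-seed seed virtual?) seed)))

;; The seed starts as the empty list; results come back in reverse
;; order because they were consed on, so reverse them at the end.
(define fragments
  (reverse (collect-text '() (open-input-string "<p>Hello <em>world</em></p>"))))

(display (apply string-append fragments))
(newline)
```

Returning seed from end: keeps one flat list running through the whole document, which suits a linear text conversion. Building a tree instead would mean starting a fresh seed in start: and combining it with parent-seed in end:.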
(use html-parser)
(use srfi-1)
(use srfi-13)
(use utils)
;; Quickly read stdin
(define input-file
  (let loop ((line (read-line))
             (contents '()))
    (if (not (eof-object? line))
        (loop (read-line) (cons line contents))
        (string-intersperse (reverse contents) "\n"))))
;; Utility counter for lists
(define (make-counter initval)
  (let ((counter initval))
    (lambda (action)
      (case action
        ((get) counter)
        ((inc) (set! counter (+ counter 1)) counter)
        ((dec) (set! counter (- counter 1)) counter)))))
;; The actual parser to convert tags to org markup
(define parse
  (let ((list-counter (make-counter 2)))
    (make-html-parser
     'start: (lambda (tag attrs seed virtual?)
               (case tag
                 ((h3) (cons "** " seed))
                 ((ul) (list-counter 'inc) seed)
                 ((li) (cons " " (cons (string-concatenate (make-list (list-counter 'get) "*"))
                                       seed)))
                 ((a) (cons (string-concatenate `("[[" ,(cadr (assoc 'href attrs)) "]["))
                            seed))
                 ((em) (cons "/" seed))
                 ((hr) (cons "\n-----\n" seed))
                 ((sup) (cons "^" seed))
                 ((blockquote) (cons "\n#+BEGIN_QUOTE\n" seed))
                 ((strong) (cons "*" seed))
                 ((p) seed)
                 (else seed)))
     'text: (lambda (text seed)
              (cons text seed))
     'end: (lambda (tag attrs parent-seed seed virtual?)
             (case tag
               ((ul) (list-counter 'dec) seed)
               ((a) (cons "]]" seed))
               ((p) (cons "\n" seed))
               ((strong) (cons "*" seed))
               ((em) (cons "/" seed))
               ((blockquote) (cons "\n#+END_QUOTE\n" seed))
               (else seed))))))
;; Run the parser on the input, reverse because it cons's the results
(display
 (string-concatenate (reverse (parse '() input-file))))
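The counter is what turns list nesting into Org heading depth. It starts at 2, matching the "** " emitted for h3, so items in a top-level list come out one level deeper. The sketch below repeats make-counter so that it runs on its own. The item-prefix helper is hypothetical, a stand-in for the inline expression in the li case.

```scheme
;; Copy of the counter from the listing above, repeated so this runs alone.
(define (make-counter initval)
  (let ((counter initval))
    (lambda (action)
      (case action
        ((get) counter)
        ((inc) (set! counter (+ counter 1)) counter)
        ((dec) (set! counter (- counter 1)) counter)))))

;; Hypothetical helper: the Org prefix for a list item at the current depth.
(define (item-prefix counter)
  (string-append (make-string (counter 'get) #\*) " "))

(define depth (make-counter 2))

(depth 'inc)                        ; entering an outer <ul>
(define outer (item-prefix depth))  ; "*** "

(depth 'inc)                        ; entering a nested <ul>
(define inner (item-prefix depth))  ; "**** "

(depth 'dec)                        ; leaving the nested <ul>
(define back (item-prefix depth))   ; "*** " again
```

Because the counter lives in a closure shared by the start: and end: callbacks, the depth survives between calls without a global variable.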
At the other end of the spectrum, I also spent a non-trivial amount of time today typing up a book list I'd kept by hand in a notebook from 2012 to 2013. Sadly, I didn't have the energy to type out my notes, so I just recorded the titles.