HTML to Org
I've been trying to maintain a book list in various ways for the past several years.
For 2015, I'd duct-taped together some scripts that ran on a private server, watched a Dropbox folder for changes, processed the raw Markdown files in it, and stitched them together. Adding a book was as simple as dropping a Markdown text file into the right folder, and it would show up on my site.
With my recent move to org-mode and GitHub Pages, I basically copy-pasted the generated HTML into a #+BEGIN_HTML ... #+END_HTML section in the books.org document, and that worked reasonably well. However, I really wanted to normalize the contents to make them easier to parse and explore, so I ended up writing some CHICKEN Scheme to convert HTML to Org.
I was pleasantly surprised by how easy the html-parser API made it to handle HTML. I initially misunderstood what the seed was supposed to do, but it was a breeze once I cleared that up.
(use html-parser)
(use srfi-1)
(use srfi-13)
(use utils)

;; Quickly read stdin
(define input-file
  (let loop ((line (read-line))
             (contents '()))
    (if (not (eof-object? line))
        (loop (read-line) (cons line contents))
        (string-intersperse (reverse contents) "\n"))))

;; Utility counter for lists
(define (make-counter initval)
  (let ((counter initval))
    (lambda (action)
      (case action
        ((get) counter)
        ((inc) (set! counter (+ counter 1)) counter)
        ((dec) (set! counter (- counter 1)) counter)))))

;; The actual parser to convert tags to org markup
(define parse
  (let ((list-counter (make-counter 2)))
    (make-html-parser
     'start: (lambda (tag attrs seed virtual?)
               (case tag
                 ((h3) (cons "** " seed))
                 ((ul) (list-counter 'inc) seed)
                 ((li) (cons " " (cons (string-concatenate (make-list (list-counter 'get) "*")) seed)))
                 ((a) (cons (string-concatenate `("[[" ,(cadr (assoc 'href attrs)) "][")) seed))
                 ((em) (cons "/" seed))
                 ((hr) (cons "\n-----\n" seed))
                 ((sup) (cons "^" seed))
                 ((blockquote) (cons "\n#+BEGIN_QUOTE\n" seed))
                 ((strong) (cons "*" seed))
                 ((p) seed)
                 (else seed)))
     'text: (lambda (text seed) (cons text seed))
     'end: (lambda (tag attrs parent-seed seed virtual?)
             (case tag
               ((ul) (list-counter 'dec) seed)
               ((a) (cons "]]" seed))
               ((p) (cons "\n" seed))
               ((strong) (cons "*" seed))
               ((em) (cons "/" seed))
               ((blockquote) (cons "\n#+END_QUOTE\n" seed))
               (else seed))))))

;; Run the parser on the input, reverse because it cons's the results
(display (string-concatenate (reverse (parse '() input-file))))
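For anyone else puzzled by the seed, here is a rough way to picture it, assuming the flat accumulate-then-reverse style the script above uses. This is only a plain-Scheme sketch and does not call html-parser at all: the event list and the names on-start, on-end, on-text, run-events and events->org are hypothetical stand-ins, not part of the egg's API. Each handler receives the current seed and returns the next one, so the seed is simply the list of output fragments collected so far, newest first.

```scheme
;; Plain-Scheme sketch of seed threading; no html-parser egg required.
;; The event list is a hypothetical stand-in for what a fold-style parser
;; might report while reading "<p>Some <em>text</em></p>".
(define events
  '((start p) (text "Some ") (start em) (text "text") (end em) (end p)))

;; Each handler takes the current seed and returns the next one.
(define (on-start tag seed)
  (case tag
    ((em) (cons "/" seed))
    (else seed)))

(define (on-end tag seed)
  (case tag
    ((em) (cons "/" seed))
    ((p) (cons "\n" seed))
    (else seed)))

(define (on-text text seed)
  (cons text seed))

;; Thread the seed through every event; the newest fragment ends up first.
(define (run-events events seed)
  (if (null? events)
      seed
      (let* ((event (car events))
             (kind (car event))
             (arg (cadr event)))
        (run-events
         (cdr events)
         (case kind
           ((start) (on-start arg seed))
           ((end) (on-end arg seed))
           ((text) (on-text arg seed))
           (else seed))))))

;; The fragments were consed on in reverse order, so flip them before joining.
(define (events->org events)
  (apply string-append (reverse (run-events events '()))))

(display (events->org events))
```

Because every handler conses onto the front of the seed, the final list has to be reversed once before it is joined, which is the same reason the script above calls reverse before string-concatenate.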
At the other end of the spectrum, I also spent a non-trivial amount of time today typing up a book list I'd kept by hand in a notebook from 2012 to 2013. Sadly, I didn't have the energy to type out my notes as well, so I just recorded the titles.