Blogged by Ujihisa. Standard methods of programming and thoughts including Clojure, Vim, LLVM, Haskell, Ruby and Mathematics written by a Japanese programmer. github/ujihisa

Friday, June 19, 2009

Pandoc the Ultimate Markdown Utility

I installed pandoc from the svn trunk source, because I failed to install its dependency library, haddock, with MacPorts.

Install (for OS X)

Assume you have already:

* installed cabal
* created your local bin directory ~/bin

cabal install utf8-string
cabal install zip-archive
svn checkout http://pandoc.googlecode.com/svn/trunk/ pandoc
cd pandoc
CABALOPTS=--user make
make test
PREFIX=~ make install-exec

Note the environment variable CABALOPTS=--user. I run cabal as a normal user, not as the superuser, so the option has to be set explicitly. Now you have the pandoc commands in your ~/bin.

$ ls ~/bin
hsmarkdown
html2markdown
markdown2pdf
pandoc
(and whatever else you already have...)
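
If the shell cannot find pandoc afterwards, ~/bin is probably not on your PATH. Here is a minimal sketch for a POSIX shell startup file; the helper name path_has_dir is made up for illustration and is not part of pandoc or cabal.

```shell
# Hypothetical helper: succeeds if directory $1 is already listed in $PATH.
path_has_dir() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prepend ~/bin only when it is missing, so sourcing this twice does not duplicate it.
if ! path_has_dir "$HOME/bin"; then
  PATH="$HOME/bin:$PATH"
  export PATH
fi
```

Wrapping PATH in colons before matching avoids a false hit on a longer directory such as ~/bin2.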

Usage

I don't think I need to explain how to use these commands, because their names are descriptive enough. Instead, I'll show some example input and output below.

$ pandoc -h
pandoc [OPTIONS] [FILES]
Input formats:  native, markdown, markdown+lhs, rst, rst+lhs, html, latex, latex+lhs
Output formats:  native, html, html+lhs, s5, docbook, opendocument, odt, latex, latex+lhs, context, texinfo, man, markdown, markdown+lhs, rst, rst+lhs, mediawiki, rtf
Options:
  -f FORMAT, -r FORMAT  --from=FORMAT, --read=FORMAT                    
  -t FORMAT, -w FORMAT  --to=FORMAT, --write=FORMAT                     
  -s                    --standalone                                    
  -o FILENAME           --output=FILENAME                               
  -p                    --preserve-tabs                                 
                        --tab-stop=TABSTOP                              
                        --strict                                        
                        --reference-links                               
  -R                    --parse-raw                                     
  -S                    --smart                                         
  -m[URL]               --latexmathml[=URL], --asciimathml[=URL]        
                        --mimetex[=URL]                                 
                        --jsmath[=URL]                                  
                        --gladtex                                       
  -i                    --incremental                                   
  -N                    --number-sections                               
                        --no-wrap                                       
                        --sanitize-html                                 
                        --email-obfuscation=none|javascript|references  
                        --toc, --table-of-contents                      
  -c CSS                --css=CSS                                       
  -H FILENAME           --include-in-header=FILENAME                    
  -B FILENAME           --include-before-body=FILENAME                  
  -A FILENAME           --include-after-body=FILENAME                   
  -C FILENAME           --custom-header=FILENAME                        
  -T STRING             --title-prefix=STRING                           
  -D FORMAT             --print-default-header=FORMAT                   
                        --dump-args                                     
                        --ignore-args                                   
  -v                    --version                                       
  -h                    --help                                          

Assume sample.md is as follows:

## This is a pen
Is this a pen? No, it is Nancy.

    def fib(n)
      rand(n) ** 2
    end

* hara y y hara y?
* y
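
To follow along, the file can be created straight from the shell. This is only a sketch using a quoted here-document; writing the same text in any editor works equally well.

```shell
# Write the sample input. The quoted 'EOF' stops the shell from expanding
# anything inside the here-document, so the text is stored verbatim.
cat > sample.md <<'EOF'
## This is a pen
Is this a pen? No, it is Nancy.

    def fib(n)
      rand(n) ** 2
    end

* hara y y hara y?
* y
EOF
```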

Then

$ pandoc -t html sample.md
<div id="this-is-a-pen"
><h2
  >This is a pen</h2
  ><p
  >Is this a pen? No, it is Nancy.</p
  ><pre
  ><code
    >def fib(n)
  rand(n) ** 2
end
</code
    ></pre
  ><ul
  ><li
    >hara y y hara y?</li
    ><li
    >y</li
    ></ul
  ></div
>

What crazy line breaks!

$ pandoc -t latex sample.md
\subsection{This is a pen}

Is this a pen? No, it is Nancy.

\begin{verbatim}
def fib(n)
  rand(n) ** 2
end
\end{verbatim}
\begin{itemize}
\item
  hara y y hara y?
\item
  y
\end{itemize}

Yes.

$ echo 'hi' | pandoc -t html
<p
>hi</p
>

Stdin works too.

$ pandoc -t html sample.md | html2markdown
## This is a pen

Is this a pen? No, it is Nancy.

    def fib(n)
      rand(n) ** 2
    end

-   hara y y hara y?
-   y

That's amazing! That's what I've wanted!

I used html2text, written in Python, for my blogger.vim. It can convert HTML to markdown, but I found a lot of misconversions, so I had to write a wrapper for my blogger.vim.

Now I am released from that burden. I have decided to switch to pandoc instead of the Python library html2text.
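
As a rough idea of what a script could call instead of html2text, here is a sketch of a shell function. The name html_to_markdown is hypothetical and this is not the actual blogger.vim code; it relies only on the -f and -t options and the html and markdown formats listed in the help output above.

```shell
# Hypothetical wrapper: read HTML on stdin, write markdown on stdout.
html_to_markdown() {
  # Fail clearly if pandoc is not installed or ~/bin is not on PATH.
  if ! command -v pandoc >/dev/null 2>&1; then
    echo "pandoc not found on PATH" >&2
    return 127
  fi
  pandoc -f html -t markdown
}
```

It would be used as a filter, for example: pandoc -t html sample.md | html_to_markdown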

3 comments:

  1. You might mention to your non-Haskeller readers that the cabal install package builder, as well as the Haddock machinery you mention, are all included in the new Haskell Platform along with the GHC compiler. There are easy-to-install binaries for all platforms. http://hackage.haskell.org/platform
    For pandoc, they should probably follow your method, since the documents should be read; there are some options they might like that the simple "cabal install pandoc" won't give them, like syntax highlighting for code in html, and bibliographic support that maps onto bibtex and so forth.

  2. I forgot to add that if they aren't interested in these other pandoc features, then for sure, once they finish the installer (a regular dmg, like the one for firefox), they need only type "cabal install pandoc" in the terminal, and it will be installed in ~/.cabal/bin (so that directory should be put in the $PATH; I don't think that is automated).
    I notice you had trouble installing Haddock via MacPorts. I think MacPorts is now a thing of the past for Haskell-related material on the Mac (and likewise for other platforms), since the Haskell Platform includes cabal install, which gives you all of hackage.haskell.org with a simple cabal update followed by cabal install pony.
    (There was always a tendency for Mac users to install the GHC via macports, rather than the binaries on the GHC page, which has led to some unjustified complaints.)

  3. I didn't know about the binary Haskell Platform. Installing from a dmg file instead of source or MacPorts seems difficult to me, but it is very good not to have to wait a long time to build ghc or other libraries. A lot of my friends gave up installing ghc because of the 12+ hour build.

    Yes, I had trouble installing Haddock via MacPorts back then. I'll try the Haskell Platform soon.

    Sorry it took so long to reply :-)
    Thanks!

