By a strange coincidence, while I was looking for text editing options for another project, the Stack Overflow guys released MarkdownSharp last week. It is a C# port of, and extension to, Markdown, which was originally written in Perl.
A couple of days later I have JMD (Java MarkDown), with the same extensions and unit tests. At this stage, and certainly while the code stabilizes and work continues on passing all the tests, it is an almost line-for-line translation of the C# source, as this makes it easier to apply patches. This isn’t the Java Way: in particular, Java favours a more DI-centric approach, typified by Spring, rather than static configuration.
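To make the static-versus-injected distinction concrete, here is a minimal sketch. Every name in it (`StaticConfig`, `Options`, `Processor`, `autoNewlines`) is hypothetical and chosen purely for illustration. It is not JMD's or MarkdownSharp's actual API, and the "processor" is a toy that only converts newlines.

```java
public class ConfigStyles {

    /** Static-configuration style (hypothetical): one process-wide flag. */
    static final class StaticConfig {
        static boolean autoNewlines = false;
    }

    /** Injected style (hypothetical): immutable options handed to each processor. */
    static final class Options {
        final boolean autoNewlines;

        Options(boolean autoNewlines) {
            this.autoNewlines = autoNewlines;
        }
    }

    /** Toy processor, not a Markdown implementation: it only rewrites newlines. */
    static final class Processor {
        private final Options options;

        // The dependency arrives through the constructor, so a DI container
        // such as Spring (or plain hand-wiring) can supply it.
        Processor(Options options) {
            this.options = options;
        }

        String transform(String text) {
            return options.autoNewlines ? text.replace("\n", "<br />\n") : text;
        }
    }

    public static void main(String[] args) {
        // Two differently configured processors can coexist in one JVM,
        // which a single static flag cannot offer.
        Processor plain = new Processor(new Options(false));
        Processor withBreaks = new Processor(new Options(true));
        System.out.println(plain.transform("a\nb"));
        System.out.println(withBreaks.transform("a\nb"));
    }
}
```

The trade-off is the one mentioned above: a static field maps directly onto the C# source and keeps patches easy to apply, while constructor injection makes each instance independently configurable and easier to test.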
Ugliness and architectural issues aside, it will do for now. You can:
- Download it from the Downloads page; or
- Retrieve it from GitHub at git://github.com/cletus/jmd.git.
It is built with Maven and should build out of the box (assuming Maven is correctly configured). I ran the tests on my machine:
- Intel Q9450 CPU (2.66GHz);
- 8GB DDR2 RAM;
- Windows 7 Ultimate 64; and
- Intel X25-M G2 80GB SSD.
The results are:
    JMD test run

    1  Amps_and_angle_encoding  OK
    2  Auto_links  OK
    3  Backslash_escapes  OK
    4  Blockquotes_with_code_blocks  OK
    5  Code_Blocks  OK
    6  Code_Spans  OK
    7  Hard_wrapped_paragraphs_with_list_like_lines  OK
    8  Horizontal_rules  OK
    9  Images  OK
    10  Inline_HTML_Advanced  Mismatch
    11  Inline_HTML_comments  OK
    12  Inline_HTML_Simple  OK
    13  Links_inline_style  OK
    14  Links_reference_style  OK
    15  Links_shortcut_references  OK
    16  Literal_quotes_in_titles  OK
    17  Markdown_Documentation_Basics  OK
    18  Markdown_Documentation_Syntax  OK
    19  Nested_blockquotes  OK
    20  Ordered_and_unordered_lists  Mismatch
    21  Strong_and_em_together  OK
    22  Tabs  OK
    23  Tidyness  OK^

    Tests    : 23
    OK       : 21 (^ 1 whitespace differences)
    Mismatch : 2

    input string length: 475
    4000 iterations in 6.301 seconds (1.575 ms per iteration)
    input string length: 2356
    1000 iterations in 6.390 seconds (6.390 ms per iteration)
    input string length: 27737
    100 iterations in 10.503 seconds (105.031 ms per iteration)
    input string length: 11075
    1 iteration in 0.037 seconds
    input string length: 88607
    1 iteration in 0.518 seconds
    input string length: 354431
    1 iteration in 4.992 seconds
To compare, on the same machine, these are the MarkdownSharp results in Visual Studio 2008:
    MarkdownSharp v1.006 test run on \mdtest-1.1

    001  Amps_and_angle_encoding  OK
    002  Auto_links  OK
    003  Backslash_escapes  OK^
    004  Blockquotes_with_code_blocks  OK
    005  Code_Blocks  OK
    006  Code_Spans  OK
    007  Hard_wrapped_paragraphs_with_list_like_lines  OK
    008  Horizontal_rules  OK
    009  Images  OK
    010  Inline_HTML_Advanced  Mismatch
    011  Inline_HTML_comments  OK
    012  Inline_HTML_Simple  OK
    013  Links_inline_style  OK
    014  Links_reference_style  OK
    015  Links_shortcut_references  OK
    016  Literal_quotes_in_titles  OK
    017  Markdown_Documentation_Basics  OK
    018  Markdown_Documentation_Syntax  OK
    019  Nested_blockquotes  OK
    020  Ordered_and_unordered_lists  Mismatch
    021  Strong_and_em_together  OK
    022  Tabs  OK
    023  Tidyness  OK^

    Tests    : 23
    OK       : 21 (^ 2 whitespace differences)
    Mismatch : 2

    MarkdownSharp v1.006 test run on \test-input

    001  markdown-readme  OK
    002  reality-check  OK

    Tests    : 2
    OK       : 2
    Mismatch : 0

    MarkdownSharp v1.006 benchmark, takes 10 ~ 30 seconds...

    input string length: 475
    4000 iterations in 3827 ms (0.95675 ms per iteration)
    input string length: 2356
    1000 iterations in 4205 ms (4.205 ms per iteration)
    input string length: 27737
    100 iterations in 4736 ms (47.36 ms per iteration)
    input string length: 11075
    1 iteration in 23 ms
    input string length: 88607
    1 iteration in 191 ms
    input string length: 354431
    1 iteration in 1025 ms
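So Java is roughly half the speed of C# in this regard: about 1.5 to 2.2 times slower on the repeated benchmarks, and almost 5 times slower on the largest single-iteration input (4.992 seconds versus 1.025 seconds). That is more of a difference than I’d expect for what is essentially the same code. At this preliminary stage I can only attribute this to the .NET Regex libraries being better.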
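For anyone wanting to reproduce this kind of measurement, here is a rough sketch of an iteration-based timing loop in Java. It is not the benchmark code from JMD or MarkdownSharp: the class name, the `transform` stand-in (a single precompiled regex replacement) and the warm-up pass are all assumptions made for illustration. The warm-up is there because JIT compilation may inflate cold timings on the JVM, particularly for single-iteration runs. That is a possible contributing factor, not something measured in this post.

```java
import java.util.regex.Pattern;

public class RegexBenchmarkSketch {

    // Stand-in for a Markdown transform (assumption): one precompiled regex pass.
    private static final Pattern BOLD = Pattern.compile("\\*\\*(.+?)\\*\\*");

    static String transform(String input) {
        return BOLD.matcher(input).replaceAll("<strong>$1</strong>");
    }

    /** Runs transform() the given number of times; returns average ms per iteration. */
    static double millisPerIteration(String input, int iterations) {
        long sink = 0; // consume results so the work cannot be optimised away
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += transform(input).length();
        }
        long elapsedNanos = System.nanoTime() - start;
        if (sink < 0) {
            throw new IllegalStateException("unreachable; keeps sink live");
        }
        return elapsedNanos / 1e6 / iterations;
    }

    /** Builds a test input by repeating a fragment n times. */
    static String repeat(String fragment, int n) {
        StringBuilder sb = new StringBuilder(fragment.length() * n);
        for (int i = 0; i < n; i++) {
            sb.append(fragment);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = repeat("some **bold** text\n", 25);

        // Warm-up pass (assumption): give the JIT a chance to compile hot paths.
        millisPerIteration(input, 1000);

        int iterations = 4000;
        double ms = millisPerIteration(input, iterations);
        System.out.printf("input string length: %d%n", input.length());
        System.out.printf("%d iterations, %.3f ms per iteration%n", iterations, ms);
    }
}
```

Timings from a toy transform like this say nothing about JMD itself. The point is only the shape of the loop: precompile the patterns, warm up, then time many iterations and divide.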
JMD is released under the same permissive MIT license as MarkdownSharp. Please feel free to use it, contribute, or let me know what you think.
6 comments:
Did you benchmark it against the existing MarkdownJ?
No I didn't benchmark against MarkdownJ.
Considering both are written in a similar fashion (regexes based on the original Perl Markdown) I'd be surprised if they were much different.
The regex-based version, however, is (for me) just a stopgap until the real version, which is currently being written (more on this later).
hi there,
did you build against JDK 6 or JDK 5?
also, did you use the 32-bit version or the 64-bit version?
thank you,
BR,
~A
The code was built and run against Java 6u17 (32 bit on Windows 7/64).
The code is written to the Java 5 level (which basically means no @Override annotations on interface implementations).
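As a small illustration of that restriction (the interface and class names here are made up for the example and are not taken from JMD): a Java 5 compiler rejects `@Override` on a method that merely implements an interface method, while Java 6 and later accept it. Code kept at the Java 5 level therefore leaves the annotation off in that position.

```java
import java.util.Locale;

public class OverrideLevelSketch {

    /** Hypothetical interface, for illustration only. */
    interface Transformer {
        String transform(String text);
    }

    static class UpperCaseTransformer implements Transformer {

        // No @Override here: at the Java 5 level the annotation is only
        // permitted on methods that override a superclass method.
        public String transform(String text) {
            return text.toUpperCase(Locale.ENGLISH);
        }

        // Fine at the Java 5 level: toString() overrides Object's method.
        @Override
        public String toString() {
            return "UpperCaseTransformer";
        }
    }

    public static void main(String[] args) {
        Transformer t = new UpperCaseTransformer();
        System.out.println(t + ": " + t.transform("jmd"));
    }
}
```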
Maybe you should redo it using Google's RE2 regex library, which guarantees linear-time matching:
http://google-opensource.blogspot.com/2010/03/re2-principled-approach-to-regular.html
I would also be interested in a comparison with the newly released "pegdown" (http://github.com/sirthias/pegdown), another pure-Java Markdown processor that takes a completely different approach. Rather than going the way of regular expressions, pegdown relies on a PEG parser. And it passes 100% of the original Markdown test suite (ignoring whitespace).