Announcing JMD: Java MarkDown (port of MarkdownSharp)

By a strange coincidence, just as I was looking for text-editing options for another project, the Stack Overflow guys released MarkdownSharp last week. It is a C# port of, and extension to, Markdown, which was originally written in Perl.

A couple of days later I have JMD (Java MarkDown) with the same extensions and unit tests. At this stage, and certainly while the code stabilizes and work progresses on passing all the tests, it is an almost line-for-line translation of the C# source, as this makes it easier to apply patches. This isn’t the Java Way: in particular, Java favours a more DI-centric approach, typified by Spring, rather than static configuration.
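To make the contrast between static configuration and a DI-friendly design concrete, here is a minimal sketch. All names in it (`StaticStyleMarkdown`, `MarkdownOptions`, `InjectedStyleMarkdown`, `autoNewlines`) are hypothetical and are not taken from JMD or MarkdownSharp, and the transform is a stand-in rather than real Markdown processing.

```java
// Hypothetical names throughout; this is not JMD's actual API.

// Static-configuration style: settings are global and shared by every caller.
final class StaticStyleMarkdown {
    static boolean autoNewlines = false; // global switch

    static String transform(String text) {
        return autoNewlines ? text.replace("\n", "<br />\n") : text;
    }
}

// Immutable options object, supplied by the caller or by a DI container.
final class MarkdownOptions {
    final boolean autoNewlines;

    MarkdownOptions(boolean autoNewlines) {
        this.autoNewlines = autoNewlines;
    }
}

// Injection style: each instance carries its own configuration.
final class InjectedStyleMarkdown {
    private final MarkdownOptions options;

    InjectedStyleMarkdown(MarkdownOptions options) {
        this.options = options;
    }

    String transform(String text) {
        return options.autoNewlines ? text.replace("\n", "<br />\n") : text;
    }
}
```

With the injected style, two differently configured processors can coexist in one JVM, which is awkward when the settings live in static fields.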

Ugliness and architectural issues aside, it will do for now. You can:

  1. Download it from the Downloads page; or
  2. Retrieve it from GitHub at git://github.com/cletus/jmd.git.

It is built with Maven and should build out of the box (assuming Maven is correctly configured). I ran the tests and benchmarks on my machine:

  • Intel Q9450 CPU (2.66GHz);
  • 8GB DDR2 RAM;
  • Windows 7 Ultimate 64; and
  • Intel X25-M G2 80GB SSD.

The results are:

JMD test run

1   Amps_and_angle_encoding                                 OK
2   Auto_links                                              OK
3   Backslash_escapes                                       OK
4   Blockquotes_with_code_blocks                            OK
5   Code_Blocks                                             OK
6   Code_Spans                                              OK
7   Hard_wrapped_paragraphs_with_list_like_lines            OK
8   Horizontal_rules                                        OK
9   Images                                                  OK
10  Inline_HTML_Advanced                                    Mismatch
11  Inline_HTML_comments                                    OK
12  Inline_HTML_Simple                                      OK
13  Links_inline_style                                      OK
14  Links_reference_style                                   OK
15  Links_shortcut_references                               OK
16  Literal_quotes_in_titles                                OK
17  Markdown_Documentation_Basics                           OK
18  Markdown_Documentation_Syntax                           OK
19  Nested_blockquotes                                      OK
20  Ordered_and_unordered_lists                             Mismatch
21  Strong_and_em_together                                  OK
22  Tabs                                                    OK
23  Tidyness                                                OK^

Tests      : 23
OK         : 21 (^ 1 whitespace differences)
Mismatch   : 2
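
The three outcomes in that run (OK, OK^ and Mismatch) could be produced by a comparison along the following lines. This is a sketch under an assumption: that "whitespace differences" means the expected and actual output are equal once all whitespace is removed. The real harness may normalise differently, and the class and enum names are hypothetical.

```java
// A sketch only: the real harness may normalise whitespace differently.
final class ResultClassifier {
    enum Outcome { OK, OK_WHITESPACE, MISMATCH }

    // Strip every whitespace character so only the visible content is compared.
    private static String stripWhitespace(String s) {
        return s.replaceAll("\\s+", "");
    }

    static Outcome classify(String expected, String actual) {
        if (expected.equals(actual)) {
            return Outcome.OK;
        }
        if (stripWhitespace(expected).equals(stripWhitespace(actual))) {
            return Outcome.OK_WHITESPACE; // reported as "OK^" in the test run
        }
        return Outcome.MISMATCH;
    }
}
```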

input string length: 475
4000 iterations in 6.301 seconds (1.575 ms per iteration)
input string length: 2356
1000 iterations in 6.390 seconds (6.390 ms per iteration)
input string length: 27737
100 iterations in 10.503 seconds (105.031 ms per iteration)
input string length: 11075
1 iteration in 0.037 seconds
input string length: 88607
1 iteration in 0.518 seconds
input string length: 354431
1 iteration in 4.992 seconds
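
For readers who want to reproduce figures of this kind, a timing loop that prints lines in the same shape might look like the sketch below. The `Transformer` interface and the upper-casing placeholder are assumptions made for illustration; they stand in for the Markdown processor and are not JMD's API.

```java
import java.util.Locale;

// A sketch of a timing loop; the transform below is a placeholder, not JMD.
final class BenchmarkSketch {
    // Hypothetical single-method interface standing in for the Markdown processor.
    interface Transformer {
        String transform(String input);
    }

    // Runs the transform the given number of times and returns elapsed seconds.
    static double time(Transformer t, String input, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            t.transform(input);
        }
        return (System.nanoTime() - start) / 1e9;
    }

    // Formats one result line: total seconds and milliseconds per iteration.
    static String report(int iterations, double seconds) {
        return String.format(Locale.ROOT,
                "%d iterations in %.3f seconds (%.3f ms per iteration)",
                iterations, seconds, seconds * 1000.0 / iterations);
    }

    public static void main(String[] args) {
        Transformer upper = new Transformer() {
            public String transform(String input) {
                return input.toUpperCase(Locale.ROOT);
            }
        };
        String input = "some *markdown* text";
        System.out.println("input string length: " + input.length());
        System.out.println(report(4000, time(upper, input, 4000)));
    }
}
```

A more careful benchmark would also run some untimed warm-up iterations first, since the JVM compiles hot code while it runs and single-iteration timings are especially sensitive to that.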

To compare, on the same machine, these are the MarkdownSharp results in Visual Studio 2008:

MarkdownSharp v1.006 test run on \mdtest-1.1

001 Amps_and_angle_encoding                                OK
002 Auto_links                                             OK
003 Backslash_escapes                                      OK^
004 Blockquotes_with_code_blocks                           OK
005 Code_Blocks                                            OK
006 Code_Spans                                             OK
007 Hard_wrapped_paragraphs_with_list_like_lines           OK
008 Horizontal_rules                                       OK
009 Images                                                 OK
010 Inline_HTML_Advanced                                   Mismatch
011 Inline_HTML_comments                                   OK
012 Inline_HTML_Simple                                     OK
013 Links_inline_style                                     OK
014 Links_reference_style                                  OK
015 Links_shortcut_references                              OK
016 Literal_quotes_in_titles                               OK
017 Markdown_Documentation_Basics                          OK
018 Markdown_Documentation_Syntax                          OK
019 Nested_blockquotes                                     OK
020 Ordered_and_unordered_lists                            Mismatch
021 Strong_and_em_together                                 OK
022 Tabs                                                   OK
023 Tidyness                                               OK^

Tests        : 23
OK           : 21 (^ 2 whitespace differences)
Mismatch     : 2

MarkdownSharp v1.006 test run on \test-input

001 markdown-readme                                        OK
002 reality-check                                          OK

Tests        : 2
OK           : 2
Mismatch     : 0


MarkdownSharp v1.006 benchmark, takes 10 ~ 30 seconds...

input string length: 475
4000 iterations in 3827 ms (0.95675 ms per iteration)
input string length: 2356
1000 iterations in 4205 ms (4.205 ms per iteration)
input string length: 27737
100 iterations in 4736 ms (47.36 ms per iteration)
input string length: 11075
1 iteration in 23 ms
input string length: 88607
1 iteration in 191 ms
input string length: 354431
1 iteration in 1025 ms

So on the iterated benchmarks Java takes roughly 1.5 to 2.2 times as long as C#, and nearly five times as long on the largest single-run input, which is more of a difference than I’d expect for what is essentially the same code. At this preliminary stage I can only attribute this to the .NET Regex libraries being better.
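
The per-input slowdown can be worked out directly from the two sets of figures quoted above. The sketch below does only that arithmetic; the class and field names are hypothetical, and the times are the per-iteration (or single-run) values from the output, converted to milliseconds.

```java
import java.util.Locale;

// Per-run times in milliseconds, copied from the two benchmark outputs.
final class SlowdownRatios {
    static final int[] INPUT_LENGTHS = {475, 2356, 27737, 11075, 88607, 354431};
    static final double[] JMD_MS = {1.575, 6.390, 105.031, 37, 518, 4992};
    static final double[] MARKDOWNSHARP_MS = {0.95675, 4.205, 47.36, 23, 191, 1025};

    // How many times longer JMD took than MarkdownSharp for input i.
    static double ratio(int i) {
        return JMD_MS[i] / MARKDOWNSHARP_MS[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < INPUT_LENGTHS.length; i++) {
            System.out.println(String.format(Locale.ROOT,
                    "length %6d: JMD took %.2fx as long", INPUT_LENGTHS[i], ratio(i)));
        }
    }
}
```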

JMD is released under the same permissive MIT license as MarkdownSharp. Please feel free to use it, contribute to it, or let me know what you think.

7 comments:

Dave Newton said...

Did you benchmark it against the existing MarkdownJ?

William Shields said...

No, I didn't benchmark against MarkdownJ.

Considering both are written in a similar fashion (regexes based on the original Perl Markdown) I'd be surprised if they were much different.

That regex-based version, however, is (for me) just a stopgap until the real version, which is currently being written (more on this later).

anjan said...

hi there,

Did you build against JDK 6 or JDK 5?
Also, did you use the 32-bit version or the 64-bit version?

thank you,

BR,
~A

William Shields said...

The code was built and run against Java 6u17 (32 bit on Windows 7/64).

The code is written to the Java 5 level (which basically means no @Override annotations on interface implementations).
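
As an illustration of the @Override point, here is a minimal sketch with hypothetical names (it is not code from JMD). At source level 5 the compiler rejects @Override on a method that only implements an interface method; from Java 6 onwards it is accepted.

```java
// Compiles at source level 5 and later because the interface method
// implementation carries no @Override annotation.
interface Greeter {
    String greet(String name);
}

final class PlainGreeter implements Greeter {
    // At -source 1.5, adding @Override here is a compile error;
    // from Java 6 onwards it is allowed.
    public String greet(String name) {
        return "Hello, " + name;
    }

    // Overriding a superclass method may be annotated at either level.
    @Override
    public String toString() {
        return "PlainGreeter";
    }
}
```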

Anonymous said...

Maybe you should redo it using Google's RE2 regex library, which runs in linear time:

http://google-opensource.blogspot.com/2010/03/re2-principled-approach-to-regular.html

Mathias said...

I would also be interested in a comparison with the newly released "pegdown" (http://github.com/sirthias/pegdown), another pure-Java Markdown processor that takes a completely different approach. Rather than going the way of regular expressions, pegdown relies on a PEG parser. It also passes 100% of the original Markdown test suite (ignoring whitespace).

Anonymous said...

I just downloaded it and ran the tests, and 5 of them fail.

-------------------------------------------------------
T E S T S
-------------------------------------------------------
Running TestSuite
Tests run: 46, Failures: 5, Errors: 0, Skipped: 0, Time elapsed: 0.905 sec <<< FAILURE!

Results :

Failed tests:
testMarkDown(com.cforcoding.jmd.MarkDownTest)
testMarkDown(com.cforcoding.jmd.MarkDownTest)
testMarkDown(com.cforcoding.jmd.MarkDownTest)
testMarkDown(com.cforcoding.jmd.MarkDownTest)
testMarkDown(com.cforcoding.jmd.MarkDownTest)

Tests run: 46, Failures: 5, Errors: 0, Skipped: 0
