<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'><id>tag:blogger.com,1999:blog-336308386934546555.post6693774793451367620..comments</id><updated>2010-01-23T19:17:29.255+08:00</updated><title type='text'>Comments on C for Coding: Markdown Headings, Grief and Unknown Elements to t...</title><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://www.cforcoding.com/feeds/6693774793451367620/comments/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/336308386934546555/6693774793451367620/comments/default'/><link rel='alternate' type='text/html' href='http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html'/><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>11</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-336308386934546555.post-7036827005617490211</id><published>2010-01-23T19:17:29.255+08:00</published><updated>2010-01-23T19:17:29.255+08:00</updated><title type='text'>Let me also include a pointer to Babelmark, a mark...</title><content type='html'>Let me also include a pointer to Babelmark, a markdown implementation testbed: http://babelmark.bobtfish.net/?markdown=x%3Cmax(a,b)%0D%0A&amp;amp;normalize=on&lt;br /&gt;&lt;br /&gt;You may find the grammar linked to for the &amp;quot;PEG Markdown&amp;quot; implementation useful.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/336308386934546555/6693774793451367620/comments/default/7036827005617490211'/><link rel='self' 
type='application/atom+xml' href='http://www.blogger.com/feeds/336308386934546555/6693774793451367620/comments/default/7036827005617490211'/><link rel='alternate' type='text/html' href='http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html?showComment=1264245449255#c7036827005617490211' title=''/><author><name>dtm</name><uri>http://dtm.livejournal.com/</uri><email>noreply@blogger.com</email></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html' ref='tag:blogger.com,1999:blog-336308386934546555.post-6693774793451367620' source='http://www.blogger.com/feeds/336308386934546555/posts/default/6693774793451367620' type='text/html'/></entry><entry><id>tag:blogger.com,1999:blog-336308386934546555.post-3625868028415273039</id><published>2010-01-23T18:46:38.073+08:00</published><updated>2010-01-23T18:46:38.073+08:00</updated><title type='text'>Right; Markdown seems full of fallback cases.

In ...</title><content type='html'>Right; Markdown seems full of fallback cases.&lt;br /&gt;&lt;br /&gt;In fact, I wonder if that isn&amp;#39;t the right way to specify it, as a series of less-specific constructs.  There was a parsing approach I read a paper on a while ago that I unfortunately can&amp;#39;t remember the name of - it was slightly heavy on memory usage, but maybe that&amp;#39;s OK.  Rather than the traditional lexer/parser split, this approach was based on one uniform approach to the whole document, which was a series of functions:&lt;br /&gt;&lt;br /&gt;String -&amp;gt; [(ParsedStructure, restOfString)]&lt;br /&gt;&lt;br /&gt;where I&amp;#39;m using Haskell notation there to mean a list of pairs.  The first element of the list would be the most preferred parse, then the second-most-preferred parse, etc.  So as an example.  The paper&amp;#39;s example implementation language was Haskell, so all these lists were being constructed lazily; in Java you&amp;#39;d probably have to construct something that returned Iterators&amp;gt;.  The paper showed how to chain together different definitions so that if you had a high-level construct that was something like:&lt;br /&gt;&lt;br /&gt;INLINESTUFF = &amp;#39;*&amp;#39; TEXTRUN &amp;#39;*&amp;#39; | TEXTRUN&lt;br /&gt;&lt;br /&gt;you could then chain together a &amp;quot;recognize &amp;#39;*&amp;#39;&amp;quot; function and a &amp;quot;recognize TEXTRUN&amp;quot; function into a &amp;quot;recognize INLINESTUFF&amp;quot; function.  I wish I could remember the name of this approach so that I could find it because the details of how you combined stuff so that the whole result was efficient escape me at the moment.  
Throwing seemingly related terms at Google isn&amp;#39;t helping.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/336308386934546555/6693774793451367620/comments/default/3625868028415273039'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/336308386934546555/6693774793451367620/comments/default/3625868028415273039'/><link rel='alternate' type='text/html' href='http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html?showComment=1264243598073#c3625868028415273039' title=''/><author><name>dtm</name><uri>http://dtm.livejournal.com/</uri><email>noreply@blogger.com</email></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html' ref='tag:blogger.com,1999:blog-336308386934546555.post-6693774793451367620' source='http://www.blogger.com/feeds/336308386934546555/posts/default/6693774793451367620' type='text/html'/></entry><entry><id>tag:blogger.com,1999:blog-336308386934546555.post-4309860515915086254</id><published>2010-01-23T09:25:52.992+08:00</published><updated>2010-01-23T09:25:52.992+08:00</updated><title type='text'>@dtm: I haven't seen the Markdown Extra one but ye...</title><content type='html'>@dtm: I haven&amp;#39;t seen the Markdown Extra one but yes I&amp;#39;ve seen the other. 
Markdown Extra is incomplete (and could probably use some examples).&lt;br /&gt;&lt;br /&gt;Reading the token definitions, it&amp;#39;s pretty much dead on what I&amp;#39;d derived anyway.&lt;br /&gt;&lt;br /&gt;Unfortunately Markdown is a really poor candidate for EBNF and the like because there are a lot of &amp;quot;else&amp;quot; clauses like:&lt;br /&gt;&lt;br /&gt;&amp;lt;http://example.com&amp;gt;&lt;br /&gt;&lt;br /&gt;is clearly a URL but to a lexer it could be:&lt;br /&gt;&lt;br /&gt;- a URL&lt;br /&gt;- a tag (although the rule for this would fail); or&lt;br /&gt;- a &amp;quot;textrun&amp;quot; (in MDE-speak)&lt;br /&gt;&lt;br /&gt;But the textrun is a fallback case. And there are lots of these fallback cases like:&lt;br /&gt;&lt;br /&gt;&amp;gt; blockquote&lt;br /&gt;&lt;br /&gt;which could reasonably be scanned as QUOTE TEXTRUN, but the fallback case of TEXTRUN, which is needed for&lt;br /&gt;&lt;br /&gt;sometext&lt;br /&gt;&lt;br /&gt;also covers the block quote. That leads to ambiguity unless you specifically enumerate all the possible start characters, which is possible, but I&amp;#39;ve had issues along that line too.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/336308386934546555/6693774793451367620/comments/default/4309860515915086254'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/336308386934546555/6693774793451367620/comments/default/4309860515915086254'/><link rel='alternate' type='text/html' href='http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html?showComment=1264209952992#c4309860515915086254' title=''/><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='07140129710674369084'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' 
href='http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html' ref='tag:blogger.com,1999:blog-336308386934546555.post-6693774793451367620' source='http://www.blogger.com/feeds/336308386934546555/posts/default/6693774793451367620' type='text/html'/></entry><entry><id>tag:blogger.com,1999:blog-336308386934546555.post-7118225236857324432</id><published>2010-01-23T09:00:21.527+08:00</published><updated>2010-01-23T09:00:21.527+08:00</updated><title type='text'>What are you using as the canonical source of what...</title><content type='html'>What are you using as the canonical source of what Markdown should be?  The original spec of John Gruber (found at http://daringfireball.net/projects/markdown/syntax) is known to be a bit ambiguous and troublesome in some details.  Are you aware of the &amp;quot;Markdown Extra&amp;quot; spec at http://michelf.com/specs/markdown-extra/ ?</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/336308386934546555/6693774793451367620/comments/default/7118225236857324432'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/336308386934546555/6693774793451367620/comments/default/7118225236857324432'/><link rel='alternate' type='text/html' href='http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html?showComment=1264208421527#c7118225236857324432' title=''/><author><name>dtm</name><uri>http://dtm.livejournal.com/</uri><email>noreply@blogger.com</email></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html' ref='tag:blogger.com,1999:blog-336308386934546555.post-6693774793451367620' source='http://www.blogger.com/feeds/336308386934546555/posts/default/6693774793451367620' 
type='text/html'/></entry><entry><id>tag:blogger.com,1999:blog-336308386934546555.post-5619014758116029861</id><published>2010-01-22T07:43:26.768+08:00</published><updated>2010-01-22T07:43:26.768+08:00</updated><title type='text'>Your four articles are well worth reading - thank ...</title><content type='html'>Your four articles are well worth reading - thank you!&lt;br /&gt;&lt;br /&gt;You have hinted along the way that there are certain portions of Markdown that are either troublesome, conflicting or in some way cause grief. I would love to see a wrap up  article with a summary of what you would like to see as a &amp;quot;Markdown Mark II&amp;quot;, complete with a sensible lexer/parser and spec.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/336308386934546555/6693774793451367620/comments/default/5619014758116029861'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/336308386934546555/6693774793451367620/comments/default/5619014758116029861'/><link rel='alternate' type='text/html' href='http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html?showComment=1264117406768#c5619014758116029861' title=''/><author><name>goyuix</name><uri>http://goyuix.myopenid.com/</uri><email>noreply@blogger.com</email></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html' ref='tag:blogger.com,1999:blog-336308386934546555.post-6693774793451367620' source='http://www.blogger.com/feeds/336308386934546555/posts/default/6693774793451367620' type='text/html'/></entry><entry><id>tag:blogger.com,1999:blog-336308386934546555.post-7399967617714087525</id><published>2010-01-22T06:54:25.881+08:00</published><updated>2010-01-22T06:54:25.881+08:00</updated><title type='text'>@Daniel: You're probably not going to get EBNF out...</title><content type='html'>@Daniel: You&amp;#39;re probably not going to get 
EBNF out of this. Context-sensitivity is the issue. Perhaps that can be unrolled to a CFG but I&amp;#39;m not entirely convinced of that. The code so far makes decisions and looks for tokens based on context.&lt;br /&gt;&lt;br /&gt;It&amp;#39;ll be something to discuss once I get the first cut out there.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/336308386934546555/6693774793451367620/comments/default/7399967617714087525'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/336308386934546555/6693774793451367620/comments/default/7399967617714087525'/><link rel='alternate' type='text/html' href='http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html?showComment=1264114465881#c7399967617714087525' title=''/><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='07140129710674369084'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html' ref='tag:blogger.com,1999:blog-336308386934546555.post-6693774793451367620' source='http://www.blogger.com/feeds/336308386934546555/posts/default/6693774793451367620' type='text/html'/></entry><entry><id>tag:blogger.com,1999:blog-336308386934546555.post-1885402510200979127</id><published>2010-01-22T02:22:37.415+08:00</published><updated>2010-01-22T02:22:37.415+08:00</updated><title type='text'>I think you are on the right track treating each l...</title><content type='html'>I think you are on the right track treating each line as a token from the lexical analysis. It seems to me that markdown &amp;quot;grammar&amp;quot; is much more concerned with vertical elements than horizontal elements. 
The horizontal elements -- emphasis, bold, code, hyperlinks -- are rather easily handled.&lt;br /&gt;&lt;br /&gt;On the vertical, however, you have normal blocks, quote blocks, code blocks, item blocks, numbered blocks, block separators and whatever combinations might be valid.&lt;br /&gt;&lt;br /&gt;My suggestion, in fact, is for you to do two grammars: one at the document level, and one at the block level. You get two lexers. The first will return block tokens. Depending on the type of block, a different second lexer will be applied. Likewise, two grammars.&lt;br /&gt;&lt;br /&gt;Once you have such a thing working, you can probably reduce it to a single complex lexer/grammar much more easily than doing it right from the start.&lt;br /&gt;&lt;br /&gt;As a side note, I&amp;#39;m very interested in this project, because I&amp;#39;d like to try doing it with Scala and either the backtracking parser or the packrat parser. And ADTs instead of visitor patterns. :-) So I have a vested interest in having an EBNF for it. 
:-)</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/336308386934546555/6693774793451367620/comments/default/1885402510200979127'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/336308386934546555/6693774793451367620/comments/default/1885402510200979127'/><link rel='alternate' type='text/html' href='http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html?showComment=1264098157415#c1885402510200979127' title=''/><author><name>Daniel</name><uri>http://www.blogger.com/profile/07505997833685327219</uri><email>noreply@blogger.com</email></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html' ref='tag:blogger.com,1999:blog-336308386934546555.post-6693774793451367620' source='http://www.blogger.com/feeds/336308386934546555/posts/default/6693774793451367620' type='text/html'/></entry><entry><id>tag:blogger.com,1999:blog-336308386934546555.post-9003337771345319161</id><published>2010-01-21T20:17:09.620+08:00</published><updated>2010-01-21T20:17:09.620+08:00</updated><title type='text'>Java is first. C# will be second.

Javascript is a...</title><content type='html'>Java is first. C# will be second.&lt;br /&gt;&lt;br /&gt;Javascript is an interesting one. Because the regex library might be faster than the hand-coded solution. We&amp;#39;ll see when it gets to that stage.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/336308386934546555/6693774793451367620/comments/default/9003337771345319161'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/336308386934546555/6693774793451367620/comments/default/9003337771345319161'/><link rel='alternate' type='text/html' href='http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html?showComment=1264076229620#c9003337771345319161' title=''/><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='07140129710674369084'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html' ref='tag:blogger.com,1999:blog-336308386934546555.post-6693774793451367620' source='http://www.blogger.com/feeds/336308386934546555/posts/default/6693774793451367620' type='text/html'/></entry><entry><id>tag:blogger.com,1999:blog-336308386934546555.post-8675714478682520677</id><published>2010-01-21T19:51:39.295+08:00</published><updated>2010-01-21T19:51:39.295+08:00</updated><title type='text'>Love this set of articles. Are you just working in...</title><content type='html'>Love this set of articles. 
Are you just working in Java, or will the final code be easily portable to Javascript, PHP, etc?</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/336308386934546555/6693774793451367620/comments/default/8675714478682520677'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/336308386934546555/6693774793451367620/comments/default/8675714478682520677'/><link rel='alternate' type='text/html' href='http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html?showComment=1264074699295#c8675714478682520677' title=''/><author><name>DisgruntledGoat</name><uri>http://svivian.myopenid.com/</uri><email>noreply@blogger.com</email></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html' ref='tag:blogger.com,1999:blog-336308386934546555.post-6693774793451367620' source='http://www.blogger.com/feeds/336308386934546555/posts/default/6693774793451367620' type='text/html'/></entry><entry><id>tag:blogger.com,1999:blog-336308386934546555.post-4908866681328006930</id><published>2010-01-18T06:26:54.775+08:00</published><updated>2010-01-18T06:26:54.775+08:00</updated><title type='text'>Part of the problem has been that obviously there'...</title><content type='html'>Part of the problem has been that obviously there&amp;#39;s no formal grammar to follow so I&amp;#39;ve had to play it by ear. Add to that that you have choices about how to handle certain cases (eg do you treat something as a lexical or parsing issue?) and I&amp;#39;ve had to stop and rethink my approach several times.&lt;br /&gt;&lt;br /&gt;Last night I came to the realization that I am going to have to have an extra line-level step that I was otherwise hoping to avoid. 
I started to realize my previous single-pass approach was going to end up with some hideous corner cases that were simply too complex to reasonably deal with, whereas an extra pass would deal with them much more reasonably and let me better handle the inline parsing to boot.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/336308386934546555/6693774793451367620/comments/default/4908866681328006930'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/336308386934546555/6693774793451367620/comments/default/4908866681328006930'/><link rel='alternate' type='text/html' href='http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html?showComment=1263767214775#c4908866681328006930' title=''/><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='07140129710674369084'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html' ref='tag:blogger.com,1999:blog-336308386934546555.post-6693774793451367620' source='http://www.blogger.com/feeds/336308386934546555/posts/default/6693774793451367620' type='text/html'/></entry><entry><id>tag:blogger.com,1999:blog-336308386934546555.post-5615488152165725522</id><published>2010-01-18T04:04:20.186+08:00</published><updated>2010-01-18T04:04:20.186+08:00</updated><title type='text'>To clarify, I think Markdown has 2 syntaxes: 1) A ...</title><content type='html'>To clarify, I think Markdown has 2 syntaxes: 1) A block-level syntax that should tokenize whole lines as particular kinds of lines; and, 2) an in-line syntax that can be parsed separately.&lt;br /&gt;&lt;br /&gt;I think you could reasonably do something like:&lt;br /&gt;&lt;br /&gt;  line-grammar:&lt;br /&gt;     
regular-line&lt;br /&gt;   | block-line&lt;br /&gt;   | empty-line&lt;br /&gt;   | heading-line-block&lt;br /&gt;   | heading-line-inline&lt;br /&gt;&lt;br /&gt;  regular-line: &amp;lt;&amp;lt; any line starting with regular characters &amp;gt;&amp;gt; &lt;br /&gt;  block-line: &amp;lt;&amp;lt; line starting with spaces &amp;gt;&amp;gt;&lt;br /&gt;  empty-line: &amp;lt;&amp;lt; &amp;gt;&amp;gt;&lt;br /&gt;  heading-line-block: &amp;lt;&amp;lt; full line of &amp;#39;###&amp;#39; heading characters &amp;gt;&amp;gt;&lt;br /&gt;  heading-line-inline: &amp;lt;&amp;lt; &amp;#39;##&amp;#39; with text &amp;gt;&amp;gt;&lt;br /&gt;&lt;br /&gt;With that, the inline-checking at the start of lines would be fairly minimal and could be easily put into lexing rules (or hand written).&lt;br /&gt;&lt;br /&gt;Once you have all of the lines, a block-level parser could collate and segment blocks, producing a line-oriented stream of tokens that could then be parsed by the in-line lexer/parser.&lt;br /&gt;&lt;br /&gt;It&amp;#39;s certainly more work, but I think it better reflects the syntax of the language.  
Indeed, I have a feeling that Markdown is unparseable in LL(k) where k is any sane number.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/336308386934546555/6693774793451367620/comments/default/5615488152165725522'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/336308386934546555/6693774793451367620/comments/default/5615488152165725522'/><link rel='alternate' type='text/html' href='http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html?showComment=1263758660186#c5615488152165725522' title=''/><author><name>gabe</name><uri>http://www.blogger.com/profile/04674905748219516290</uri><email>noreply@blogger.com</email></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html' ref='tag:blogger.com,1999:blog-336308386934546555.post-6693774793451367620' source='http://www.blogger.com/feeds/336308386934546555/posts/default/6693774793451367620' type='text/html'/></entry></feed>