A recent question on the markdown-discuss mailing list resulted in some suggestions for an extension to Python-Markdown. I was able to point Simon to the documentation for writing extensions, but it occurs to me that the document could be a little overwhelming for a first-timer, especially when all he needs is to alter the behavior of a few inline patterns.
So, without further ado, I present a tutorial which steps through creating a Python-Markdown Extension which incorporates something similar to Simon's suggestion.
First, we need to establish the syntax we will be implementing. While Simon's suggestion would work as is, I'm more inclined to implement a slight variation which follows the prior art of the txt2tags project. Interestingly, the CREOLE project more or less adopted this same syntax and has an explanation of the reasoning behind its community-based decision. While I may not agree with all of their reasoning, I do like the idea that in each instance double characters are used for markup. That way, there's less chance of a single character needing to be escaped, both for the machine and for the human reader. So, the syntax looks like this:
Two hyphens for --strike--.
Two underscores for __underline__.
Two asterisks for **bold**.
Two slashes for //italic//.
The first step is to create the boilerplate code that will be required by any Python-Markdown Extension.
import markdown

class MyExtension(markdown.Extension):
    def extendMarkdown(self, md, md_globals):
        # Insert code here to change markdown's behavior.
        pass

def makeExtension(configs=None):
    return MyExtension(configs=configs)
Save the above code as mdx_myextension.py. Now, obviously, that code doesn't
really do anything useful, but now that we have it in place, we can actually
start implementing our new syntax.
To start, let's implement the one part of that syntax that doesn't overlap with
Markdown's standard syntax: the --strike-- syntax. I'm actually going to call
it "del" (delete) rather than "strike", as the HTML generated will be the <del>
tag.
The first step is to write a regular expression to match the del syntax.
DEL_RE = r'(--)(.*?)--'
Now, there are probably a few things I should explain about that. First, you may
note that the first set of hyphens, (--), is grouped in parentheses. This is
because we will be using a generic pattern class provided by Python-Markdown:
specifically, SimpleTagPattern, which expects the text content to be found
in group(3) of the regular expression. Python-Markdown wraps each inline
pattern so that the text preceding the match lands in group(1), so we add the
extra group around the opening hyphens to force the content we want into
group(3).
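To make the group numbering concrete, here is a minimal sketch using only Python's re module. It assumes that Python-Markdown wraps an inline pattern roughly as ^(.*?)<pattern>(.*)$ before matching; that wrapper is my assumption for illustration and is not taken from the extension code in this tutorial.

```python
import re

DEL_RE = r'(--)(.*?)--'

# Assumption: imitate the wrapper applied around an inline pattern, which adds
# one group for the text before the match and one for the text after it.
wrapped = re.compile(r'^(.*?)%s(.*)$' % DEL_RE, re.DOTALL | re.UNICODE)

m = wrapped.match('foo --deleted-- bar')

before = m.group(1)   # text preceding the match
opening = m.group(2)  # the opening hyphens, captured by our extra group
content = m.group(3)  # the text between the hyphens
after = m.group(4)    # text following the match

print(repr(before), repr(opening), repr(content), repr(after))
```

Without the parentheses around the opening hyphens, the content would land in group(2) instead, which is not where SimpleTagPattern looks for it.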
Second, you may want to note that the content is matched using a non-greedy
match (.*?). Otherwise, everything between the first occurrence and the last
would all be placed inside one <del> tag, which we do not want.
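The difference between the greedy and non-greedy forms is easy to see with plain re.sub. This is only a sketch of the regular expression behavior, not of how Python-Markdown applies patterns; the sample text is made up for illustration.

```python
import re

text = 'keep --one-- and --two-- here'

# Greedy: .* runs from the first pair of hyphens to the very last pair.
greedy = re.sub(r'--(.*)--', r'<del>\1</del>', text)

# Non-greedy: .*? stops at the nearest closing pair of hyphens.
lazy = re.sub(r'--(.*?)--', r'<del>\1</del>', text)

print(greedy)
print(lazy)
```

The greedy version wraps everything from "one" through "two" in a single tag, while the non-greedy version produces two separate tags.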
So, let's incorporate our regular expression into Markdown:
DEL_RE = r'(--)(.*?)--'

class MyExtension(markdown.Extension):
    def extendMarkdown(self, md, md_globals):
        # Create the del pattern
        del_tag = markdown.inlinepatterns.SimpleTagPattern(DEL_RE, 'del')
        # Insert del pattern into markdown parser
        md.inlinepatterns.add('del', del_tag, '>not_strong')
As you may have noticed, we added two lines. The first line creates an instance
of a SimpleTagPattern. This generic pattern class takes two arguments: the
regular expression to match against (in this case DEL_RE), and the name of
the tag to insert the text of group(3) into ("del").
The second line adds our new pattern to the Markdown parser. In the event that
it is not obvious, the extendMarkdown method of any markdown.Extension class is
passed two arguments: "md" and "md_globals". "md" is the instance of the
Markdown class, which allows you to alter anything you want in that instance
from your extension. In this case, we are adding a new inline pattern named
"del", using our pattern instance del_tag, after the pattern named
"not_strong" (thus the '>not_strong').
Now let's test our new extension. Open a Python interpreter in the directory where you saved your file ("mdx_myextension.py") and try the following:
>>> import markdown
>>> markdown.markdown('foo --deleted-- bar', ['myextension'])
u'<p>foo <del>deleted</del> bar</p>'
Notice that we passed in "myextension" as an extension name. Markdown automatically prepended "mdx_" to the name and tried to import it. As long as the file is on your PYTHONPATH, Markdown will find it and load the extension.
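Before adding another pattern, it may help to picture what a position string such as '>not_strong' does. The sketch below is a toy model on a plain list, not Python-Markdown's own code: the helper add_name and the shortened list of pattern names are hypothetical, and the '<name' form is included only on the assumption that "before" works as the mirror image of "after".

```python
def add_name(names, key, location):
    """Insert key into the ordered list names, relative to an existing name.

    Hypothetical helper for illustration: '>name' means "right after name",
    '<name' means "right before name".
    """
    marker, target = location[0], location[1:]
    index = names.index(target)  # raises ValueError if target is unknown
    if marker == '>':
        names.insert(index + 1, key)
    elif marker == '<':
        names.insert(index, key)
    else:
        raise ValueError('location must start with "<" or ">"')
    return names

# A shortened, illustrative ordering of inline pattern names.
order = ['not_strong', 'strong', 'emphasis']
add_name(order, 'del', '>not_strong')
print(order)
```

The new name ends up directly after "not_strong" and ahead of "strong", which is the ordering the rest of this tutorial relies on.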
Let's add our syntax for underline, or, as I'm calling it, "ins" (for the <ins> tag).
DEL_RE = r'(--)(.*?)--'
INS_RE = r'(__)(.*?)__'

class MyExtension(markdown.Extension):
    def extendMarkdown(self, md, md_globals):
        del_tag = markdown.inlinepatterns.SimpleTagPattern(DEL_RE, 'del')
        md.inlinepatterns.add('del', del_tag, '>not_strong')
        ins_tag = markdown.inlinepatterns.SimpleTagPattern(INS_RE, 'ins')
        md.inlinepatterns.add('ins', ins_tag, '>del')
That should be self-explanatory. We simply created a new pattern which matches
our "ins" syntax and added it after the "del" pattern. What's interesting about
this is that we do not even need to alter the existing bold syntax (__bold__)
as our pattern has been inserted into the parser before the existing bold
pattern (named "strong"). Therefore, by the time that the "strong" pattern gets
to run, our extension has already identified the double underscores as inserts,
so there's no match against the "strong" pattern.
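The effect of pattern order can be imitated with a few lines of plain Python. This is a simplified simulation under my own assumptions: the render helper, the STRONG_RE expression, and the use of re.sub are stand-ins for illustration, and Python-Markdown's real inline processing works differently under the hood.

```python
import re

# Illustrative expressions; STRONG_RE is a stand-in for the built-in bold pattern.
INS_RE = r'__(.*?)__'
STRONG_RE = r'(\*\*|__)(.*?)\1'

def render(text, patterns):
    """Apply each (regex, replacement) pair in order; earlier patterns win."""
    for regex, replacement in patterns:
        text = re.sub(regex, replacement, text)
    return text

ins_first = [(INS_RE, r'<ins>\1</ins>'), (STRONG_RE, r'<strong>\2</strong>')]
strong_first = list(reversed(ins_first))

source = '__under__ and **bold**'
with_ins_first = render(source, ins_first)
with_strong_first = render(source, strong_first)

print(with_ins_first)
print(with_strong_first)
```

When the ins pattern runs first, the double underscores are already consumed by the time the bold pattern runs, so only the asterisks become bold. With the order reversed, the bold pattern claims the underscores and the ins pattern never matches.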
Therefore, if all we wanted to implement was the ins and del syntax, we are
done - well, except maybe for giving it a decent name. Go ahead and test it out.
We'll stop here and pick up with Part 2 (coming soon), where we implement the
new bold and italic syntax, which replaces Markdown's existing syntax.

