Parsing Japanese Text in Markdown-Python for Stylizing and Semantic Purposes

In category Technical

fonts, Japanese, markdown, pelican, python

Because of my studies, I (will) often use Japanese on my blog. When I gave some thought to typography and readability, I found that the default appearance of Japanese text contrasted starkly with the rest of my design.1 To target Japanese text specifically, I wrote a small Python-Markdown extension for static site generators such as Pelican (or pretty much anything that uses Python-Markdown to convert Markdown to HTML). It wraps such text in a span with the language attribute set to Japanese. An added bonus, and probably the more important one beyond styling and semantics, is that this method counters the negative effects of Han unification in the so-called CJK languages, since the lang attribute lets the browser choose Japanese glyphs for characters that share a code point with Chinese ones.


I’ve put the extension in its own repository on my GitHub for anyone interested, but as it serves its purpose for me as-is, I have no plans to maintain it further at the moment.2


Copy the script into your Python-Markdown extension directory.

If you’re using Pelican as your static site generator, open your project’s settings file and add 'japanese' to the MD_EXTENSIONS list:

    MD_EXTENSIONS = ['japanese']
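On newer Pelican releases, the MD_EXTENSIONS setting has, as far as I know, been deprecated in favour of a MARKDOWN dictionary (since Pelican 3.7). A minimal sketch of what the equivalent setting might look like, assuming the extension module is still importable under the name `japanese`:

```python
# Sketch for a Pelican settings file; assumes Pelican 3.7 or later.
# Assumption: the extension module is importable as "japanese".
MARKDOWN = {
    "extension_configs": {
        # Empty dict: the extension takes no options in this sketch.
        "japanese": {},
    },
    "output_format": "html5",
}
```

If you rely on other Markdown extensions, they would probably need to be listed in the same dictionary.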


Using a simple regular expression, (\{\{)(.+?)(\}\}), the extension treats text wrapped in double curly braces, {{ }}, as the content of a span tag with a lang="ja" attribute.


For example, {{読書クラブ}} will output

    <span lang="ja">読書クラブ</span>
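The substitution itself can be sketched with nothing but Python's standard `re` module. This is not the extension's actual code, only an illustration of what the regular expression does; the function name `wrap_japanese` is hypothetical.

```python
import re

# The pattern quoted in the post: "{{", a non-greedy body, "}}".
JA_PATTERN = re.compile(r"(\{\{)(.+?)(\}\})")


def wrap_japanese(text: str) -> str:
    """Replace each {{...}} run with a <span lang="ja"> element."""
    # Group 2 is the text between the braces; groups 1 and 3 are the delimiters.
    return JA_PATTERN.sub(
        lambda m: f'<span lang="ja">{m.group(2)}</span>', text
    )


print(wrap_japanese("I joined the {{読書クラブ}} last year."))
```

The non-greedy `.+?` matters here: with several marked-up runs on one line, each pair of braces becomes its own span instead of one span swallowing everything in between.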

Example 1 (fonts): just compare 読書(どくしょ)クラブ (custom) to 読書(どくしょ)クラブ (Meiryo) to 読書(どくしょ)クラブ (MS Gothic default).3

Example 2 (unihan): compare the Chinese to Japanese characters: (), (), ().4
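To make the Han unification point a little more concrete, here is a small sketch using only Python's standard library. The character 直 is chosen purely for illustration (it is not necessarily one of the examples above), and the helper name `codepoint_info` is hypothetical.

```python
import unicodedata


def codepoint_info(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Chinese and Japanese text both store this character as the same
# code point; nothing in the string says which glyph shape is meant.
print(codepoint_info("直"))  # U+76F4 CJK UNIFIED IDEOGRAPH-76F4
```

Because the text itself carries no language information, the lang attribute on the surrounding span is what gives the browser a hint about which regional glyph to draw.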


Although it’s a bit of a risk performance-wise, I’m quite a fan of Google’s free webfonts.5 Due to the complexity of the Japanese character set, development of these has been slow6, but Google’s Noto font is getting quite performant, and with the Japanese font set supporting nearly 7,000 characters, it should pose no problem for most web projects. Since it works better, typography-wise, with the rest of my fonts, I use it over fonts such as Meiryo that are more widely available across platforms.

Using the CSS below, I ensure maximum compatibility by listing Meiryo and others as fallbacks in case the page can’t connect to Google’s font API.

@import url(;

    [lang="ja"] {
      font-family: "Noto Sans Japanese", "メイリオ", "Meiryo", "ヒラギノ角ゴ Pro W3",
        "Hiragino Kaku Gothic Pro", "MS Pゴシック", "MS PGothic", sans-serif;
      font-weight: 100;
      font-size: 95%;
    }

Further reading

  1. This is less so on mobile devices. Most Windows web browsers default to MS Gothic, which lacks the anti-aliasing found in newer fonts such as Meiryo, so some manual adjustment is required. For maximum compatibility, I prefer to do this in-site. If no further customization is necessary, just adding Meiryo as a fallback font in the page’s font-family is sufficient, e.g. font-family: Arial, Helvetica, Meiryo, sans-serif;

  2. A possible extension would be to have different regular expressions match different languages and thus produce different lang attributes. If I find the need for that on my own blog (e.g. for Korean), I’ll update this.

  3. For furigana support I use a slightly edited version of an existing MD extension available at

  4. If Japanese is the only CJK language on a multilingual page, it’s sufficient to put a Japanese font as a fallback in the body’s font-family. If Chinese or Korean characters are occasionally used as well, this approach remains preferable, quite apart from its semantic benefits.

  5. I use Quicksand and Poiret One for all my Latin-based text on this page, for example.

  6. Adobe’s competing, subscription-based Typekit apparently offers a wider range of Japanese webfonts for anyone interested: