🎯 Master of Markdown - Debunking the Parsing Issue in GitHub Pages

📺 Context

  • You keep a GitHub repository for documentation purposes.
  • All the documents are written in Markdown [1] syntax.
  • You activate GitHub Pages [2] to turn the repo into a nice little website (because you’d rather not reinvent the wheel).
  • The site is published, and out of the blue, your beautiful Markdown formatting is breaking! 🤦‍♂️
  • Googling and going through Stack Overflow threads and GitHub Docs isn’t turning up anything useful.
  • At this point, life ain’t making sense anymore. ☹ But what if I told you there’s a way? Take the red pill 💊, and see how deep the rabbit hole goes! 👇

⏩ TL;DR

In a rush? Skip to The Fix 🔨 to get the solution.

🍦 Background

I’m an avid fan of cyber security engagements, and so I keep my solution writeups for CTF (Capture The Flag) [3] contests in my handy ctf-journal repo.

This is what the README.md looks like in VS Code [4] and on GitHub web:

GitHub View

After publishing through GitHub Pages, the Markdown-to-HTML conversion didn’t go the right way.

Pages View

See the scaled out middle section? Clearly, GitHub’s static site service is misinterpreting the markdown. Let’s investigate and sort it out.

🔎 Solution

📌 Markdown Has Flavors?

Markdown came into being in 2004 with the goal of staying readable to humans even in its raw markup form. This made it widely popular in blogging, documentation, software development and READMEs.

Markdown Courtesy: @developer_anand

But it had some ambiguities and inconsistencies, and so a lot of markdown flavors (syntax variations) started making their appearance.

💡 To solve this issue, a uniform open source specification called CommonMark [5] was born.

GitHub made their own dialect of markdown based on this spec by extending its features. This is called GitHub Flavored Markdown, often shortened as GFM. Now we know we are dealing with GFM whenever we use GitHub web.

In case you’re curious, check out the list of flavors and a comparison between the CommonMark-compliant ones.

📌 Strange Case of Dr Jekyll

When GitHub Pages is enabled for a repo, GitHub uses Jekyll [6] to build the site under the hood and then deploys it to GitHub Pages. Jekyll is natively supported by GitHub, so the folder structure follows a generic Jekyll site layout.

Jekyll

The configuration of a Jekyll site takes place entirely in a _config.yml file in the root directory.

It offers a lot of configuration options for ease of development, integration and maintenance.

Interestingly, there are options for markdown flavors as well. 🤔

📌 Connecting The Dots 🧵

Each Markdown variant is parsed by its corresponding Markdown processor for conversion into HTML. GitHub Pages supports two such processors:

  1. Kramdown
  2. GitHub’s own processor that renders GFM

Kramdown is the default Markdown renderer for Jekyll. In this regard, GitHub states:

You can use GFM with either processor, but only our GFM processor will always match the results you see on GitHub.

Cool! We found the stumbling block. 😃

Since GitHub was invoking Jekyll out of the box with default options, my site’s GFM was being rendered by Kramdown, and thus the HTML turned out different!
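To make the default concrete, here is a rough sketch of what the out-of-the-box setup amounts to if you spelled it out in _config.yml. This is my approximation of the defaults, not something copied from the repo or guaranteed to match every GitHub Pages version, so treat the exact keys as an assumption:

```yaml
# Approximate implicit defaults on GitHub Pages (assumed, not exhaustive)
markdown: kramdown   # Jekyll's default Markdown processor
kramdown:
  input: GFM         # kramdown's GFM-compatible parser, not GitHub's own renderer
```

Even with GFM as the input mode, Kramdown is still the one doing the rendering, which is why the result can differ from what GitHub web shows.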

Meme

📌 The Fix 🔨

Following the official docs, we can modify our repo’s _config.yml file (create one if nonexistent) in the root directory. Add the markdown: GFM option to override the default Kramdown parser.
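As a minimal sketch, assuming the repo has no other Jekyll settings to preserve, the whole file can be as small as this (the comments are mine, only the markdown key matters):

```yaml
# _config.yml (in the repository root)
# Render Markdown with GitHub's own GFM processor instead of kramdown
markdown: GFM
```

If a _config.yml already exists, just add or change the markdown line and leave the rest untouched.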

Fixing

Trigger the build and deployment workflow [7] again and review the site once it is deployed.

Pages Fixed

All fixed! ✔ Faith in life restored! ❤

That’s all for today, folks.



🧲 Terminologies


  1. Markdown: A simple markup language for formatting text that converts to HTML.

  2. GitHub Pages: A sweet static website service by GitHub that builds a site from the static contents of a repository. Such sites automatically bind to a github.io domain by default. 

  3. Capture The Flag: CTF is an information security competition which deals with forensics, cryptography, binary analysis, reverse engineering and many more cyber security topics.

  4. VS Code: Visual Studio Code is a code editor by Microsoft for building and debugging modern web and cloud applications. 

  5. CommonMark: A rationalized specification of Markdown that defines the language syntax and offers a test suite to validate Markdown implementations.

  6. Jekyll: A static site generator with a simplified build process involving markdown and HTML files. 

  7. Workflow: A configurable automated process that runs one or more jobs. The default pages build and deployment workflow by GitHub builds the source and deploys it to GitHub Pages.

This post is licensed under CC BY 4.0 by the author.