The goal of tinkr is to convert (R)Markdown files to XML and back, so that they can be edited with xml2 (XPath!) instead of numerous complicated regular expressions. If you are new to XPath, refer to this great intro. Possible applications are R scripts that use tinkr and XPath in xml2 to edit documents programmatically, for instance changing header levels as in the example further down.

Only the body of the (R)Markdown file is cast to XML, following the CommonMark specification via the commonmark package. YAML metadata could be edited using the yaml package, but that is not the goal of this package.

The current workflow I have in mind is:

  1. use to_xml to obtain XML from (R) Markdown (based on commonmark::markdown_xml and blogdown:::split_yaml_body).

  2. edit the XML using xml2.

  3. use to_md to save back the resulting (R) Markdown (this uses a XSLT stylesheet, and the xslt package).

Maybe there could be shortcut functions for some operations in step 2, maybe not.
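As an illustration only, such a shortcut might look like the sketch below. The helper name set_heading_level is hypothetical and not part of tinkr, and the XML is produced directly with commonmark::markdown_xml so that the snippet runs without tinkr installed.

```r
library(xml2)

# Hypothetical helper (not part of tinkr): change every heading of level
# `from` to level `to`. xml2 nodes are modified in place.
set_heading_level <- function(body, from, to) {
  ns <- xml_ns(body)
  xpath <- sprintf(".//d1:heading[@level='%s']", from)
  headings <- xml_find_all(body, xpath, ns)
  xml_set_attr(headings, "level", as.character(to))
  invisible(body)
}

md <- "# Title\n\n### Section\n\nSome text.\n\n### Another section\n"
body <- read_xml(commonmark::markdown_xml(md))

set_heading_level(body, from = 3, to = 1)

# Inspect the heading levels after the edit
heading_levels <- xml_attr(xml_find_all(body, ".//d1:heading", xml_ns(body)), "level")
heading_levels
```

Selecting with an XPath predicate (`[@level='3']`) keeps the filtering inside the query, so no subsetting on the R side is needed.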

Installation

Wanna try the package and tell me what doesn’t work?

remotes::install_github("ropenscilabs/tinkr")

Examples

This is a basic example. We read “example1.md”, change all level 3 headers to level 1 headers, and save the result back to md.

# From Markdown to XML
path <- system.file("extdata", "example1.md", package = "tinkr")
yaml_xml_list <- to_xml(path)

library("magrittr")
# transform level 3 headers into level 1 headers
body <- yaml_xml_list$body
body %>%
  xml2::xml_find_all(xpath = './/d1:heading',
                     xml2::xml_ns(.)) %>%
  .[xml2::xml_attr(., "level") == "3"] -> headers3

xml2::xml_set_attr(headers3, "level", 1)

yaml_xml_list$body <- body

# Back to Markdown
to_md(yaml_xml_list, "newmd.md")
file.edit("newmd.md")

For R Markdown files, to ease the editing of chunk labels and options, to_xml munges the chunk info into separate attributes. For example, code_block nodes can have language, name and echo attributes.
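A minimal sketch of how such attributes could be edited with xml2 follows. The XML is written by hand for illustration: the attribute names come from the paragraph above, but the exact shape of what to_xml returns is an assumption here.

```r
library(xml2)

# Hand-written XML for illustration; the real output of to_xml may differ.
xml_txt <- '<document xmlns="http://commonmark.org/xml/1.0">
  <code_block language="r" name="setup" echo="FALSE">library(ggplot2)
</code_block>
  <code_block language="r" name="plot">plot(1:10)
</code_block>
</document>'
body <- read_xml(xml_txt)
ns <- xml_ns(body)

# Select all R chunks and look at their labels
chunks <- xml_find_all(body, ".//d1:code_block[@language='r']", ns)
xml_attr(chunks, "name")

# Rename the first chunk and switch echo on for every chunk
xml_set_attr(chunks[[1]], "name", "load-packages")
xml_set_attr(chunks, "echo", "TRUE")

chunk_names <- xml_attr(chunks, "name")
chunk_echo <- xml_attr(chunks, "echo")
```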

Loss of Markdown style

General principles and solution

The (R)md to XML to (R)md loop on which tinkr is based (via to_xml and to_md) is slightly lossy, because Markdown often offers several syntaxes for the same element. For instance:

  • Lists can be created with either “+”, “-” or “*”. When using tinkr, the (R)md after editing will only use “-” for lists.

  • Links built like [word][smallref], with a reference definition [smallref]: URL at the bottom, become [word](URL).

  • Special characters are escaped (e.g. “[” when it is not part of a link).

  • Every line of a block quote gets a “>”, whereas in the input a “>” at the beginning of the first line alone is enough.

  • For tables see the next subsection.

Such losses make your (R)md different, and the git diff a bit harder to parse, but should not change the documents your (R)md is rendered to. If it does, report a bug in the issue tracker!
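The list-marker case can be checked directly with commonmark, independently of tinkr. The sketch below parses the same list written with the three markers and compares the resulting XML.

```r
library(commonmark)

plus  <- markdown_xml("+ apples\n+ pears\n")
minus <- markdown_xml("- apples\n- pears\n")
star  <- markdown_xml("* apples\n* pears\n")

# If the three strings are identical, the marker is not recorded in the XML
# and therefore cannot be restored when converting back to Markdown.
same_xml <- identical(plus, minus) && identical(minus, star)
same_xml
```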

A solution to not lose your Markdown style, e.g. if you prefer “*” over “-” for lists, is to tweak our XSL stylesheet and provide its file path as the stylesheet_path argument to to_md.
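To give an idea of what such a tweak involves, here is a toy stylesheet applied with the xslt package. It is not tinkr's stylesheet: it only handles flat bullet lists and is meant to show where the list marker would be decided.

```r
library(xml2)

# Toy stylesheet for illustration only; it is NOT tinkr's stylesheet and
# handles nothing but flat bullet lists, which it writes with "*".
style <- read_xml('<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:md="http://commonmark.org/xml/1.0">
  <xsl:output method="text" encoding="utf-8"/>
  <xsl:template match="md:item">
    <xsl:text>* </xsl:text>
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
  <xsl:template match="text()"/>
</xsl:stylesheet>')

# A list written with "-" in the input
doc <- read_xml(commonmark::markdown_xml("- apples\n- pears\n"))

# Apply the stylesheet; with a text output method the result is text
out <- xslt::xml_xslt(doc, style)
cat(out)
```

With tinkr itself, a tweaked copy of the real stylesheet would be passed along the lines of to_md(yaml_xml_list, "newmd.md", stylesheet_path = "path/to/my-stylesheet.xsl"), where the file path is a placeholder.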

The special case of tables

  • Tables are not pretty anymore (only three dashes for each cell, cf. the spec) after a full to_xml + to_md loop. If you’re an XSL wizard, feel free to help us prettify Markdown tables, i.e. make the number of dashes under each header depend on the longest string in the column; see this issue.
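To make the target concrete, the sketch below shows the intended layout rule in base R rather than XSL. The helper pretty_table is hypothetical and not part of tinkr; it only illustrates padding each column to its longest string.

```r
# Hypothetical base R helper, not part of tinkr and not XSL.
pretty_table <- function(header, rows) {
  cells <- rbind(header, rows)
  # Column width = longest string in the column, with a minimum of three dashes
  widths <- vapply(
    seq_len(ncol(cells)),
    function(j) max(nchar(cells[, j]), 3L),
    integer(1)
  )
  # Pad each cell to its column width and join the cells with pipes
  format_row <- function(x) {
    padded <- mapply(format, x, width = widths, USE.NAMES = FALSE)
    paste0("| ", paste(padded, collapse = " | "), " |")
  }
  c(
    format_row(header),
    format_row(strrep("-", widths)),
    apply(rows, 1, format_row)
  )
}

rows <- matrix(c("apple", "1", "watermelon", "12"), ncol = 2, byrow = TRUE)
table_lines <- pretty_table(c("Fruit", "N"), rows)
writeLines(table_lines)
```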

Meta

Please note that the ‘tinkr’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.