Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

The goal of tinkr is to convert (R)Markdown files to XML and back, so that you can edit them with xml2 (XPath!) instead of numerous complicated regular expressions. Would you like to know more? This is a great intro if you are new to XPath, and this is a good resource on XSLT for XML transformations.

Use-Cases

Possible applications are R scripts using this and XPath in xml2 to:

  • change levels of headers, cf this script and this pull request to roweb2
  • change chunk labels and options
  • extract all runnable code (including inline code)
  • insert arbitrary markdown elements
  • modify link URLs
  • your idea, feel free to suggest use cases!

Workflow

Only the body of the (R)Markdown file is cast to XML, following the CommonMark specification via the commonmark package. YAML metadata can be edited with the yaml package, but doing so is not a goal of this package.
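To give a feel for what this conversion produces, here is a minimal sketch that calls commonmark and xml2 directly on a made-up snippet. It illustrates the underlying idea and is not tinkr's actual implementation; the sample text and the variable names are assumptions for the example.

```r
library("commonmark")
library("xml2")

# A small, made-up Markdown body (no YAML header)
md <- paste(
  c("# A title", "", "Some text with a [link](https://example.org)."),
  collapse = "\n"
)

# Markdown text -> CommonMark XML string -> xml2 document
doc <- xml2::read_xml(commonmark::markdown_xml(md))

# The XML has a default namespace, so XPath needs a prefix bound to it;
# we bind "md" by hand here
ns <- c(md = "http://commonmark.org/xml/1.0")

# Query headings and links through the "md" prefix
headings <- xml2::xml_find_all(doc, ".//md:heading", ns = ns)
heading_levels <- xml2::xml_attr(headings, "level")

links <- xml2::xml_find_all(doc, ".//md:link", ns = ns)
link_urls <- xml2::xml_attr(links, "destination")
```

Attribute values such as the heading level come back as character strings, so convert them with as.integer() before doing arithmetic on them.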

We have created an R6 class called yarn that stores the representation of both the YAML and the XML data, accessible through the $yaml and $body elements respectively. In addition, the namespace prefix is set to “md” in the $ns element.

You can perform XPath queries using the $body and $ns elements:

library("tinkr")
library("xml2")
path <- system.file("extdata", "example1.md", package = "tinkr")
ex1 <- tinkr::yarn$new(path)
# find all ropensci.org blog links
xml_find_all(
  x = ex1$body, 
  xpath = ".//md:link[contains(@destination,'ropensci.org/blog')]", 
  ns = ex1$ns
)
#| {xml_nodeset (7)}
#| [1] <link destination="https://ropensci.org/blog/2018/08/21/birds-radolfzell/" title="" ...
#| [2] <link destination="https://ropensci.org/blog/2018/09/04/birds-taxo-traits/" title=" ...
#| [3] <link destination="https://ropensci.org/blog/2018/08/21/birds-radolfzell/" title="" ...
#| [4] <link destination="https://ropensci.org/blog/2018/08/14/where-to-bird/" title="">\n ...
#| [5] <link destination="https://ropensci.org/blog/2018/08/21/birds-radolfzell/" title="" ...
#| [6] <link destination="https://ropensci.org/blog/2018/08/28/birds-ocr/" title="">\n  <t ...
#| [7] <link destination="https://ropensci.org/blog/2018/09/04/birds-taxo-traits/" title=" ...
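Because the nodes are modified by reference, the same query can be used to rewrite link URLs in place. The following is a sketch under assumptions: the "/archive/" replacement path is hypothetical and only serves to show the mechanics.

```r
library("tinkr")
library("xml2")

path <- system.file("extdata", "example1.md", package = "tinkr")
ex1 <- tinkr::yarn$new(path)

# Same query as above: all ropensci.org blog links
links <- xml_find_all(
  x = ex1$body,
  xpath = ".//md:link[contains(@destination,'ropensci.org/blog')]",
  ns = ex1$ns
)

# Rewrite the destination attribute of every matched link;
# "/archive/" is a made-up target used only for illustration
old_urls <- xml_attr(links, "destination")
new_urls <- sub("/blog/", "/archive/", old_urls, fixed = TRUE)
xml_attr(links, "destination") <- new_urls
```

The change is visible in ex1$body straight away, so no reassignment of the document is needed before writing it back out.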

Installation

Wanna try the package and tell me what doesn’t work?

remotes::install_github("ropenscilabs/tinkr")

Examples

Markdown

This is a basic example. We read “example1.md”, change all level 3 headers to level 1 headers, and save the result back to Markdown. Because {xml2} objects are passed by reference, manipulating them does not require reassignment.

library("magrittr")
#| 
#| Attaching package: 'magrittr'
#| The following objects are masked from 'package:testthat':
#| 
#|     equals, is_less_than, not
library("tinkr")
# From Markdown to XML
path <- system.file("extdata", "example1.md", package = "tinkr")
# Level 3 header example:
cat(tail(readLines(path, 40)), sep = "\n")
#| ### Getting a list of 50 species from occurrence data
#| 
#| For more details about the following code, refer to the [previous post
#| of the series](https://ropensci.org/blog/2018/08/21/birds-radolfzell/).
#| The single difference is our adding a step to keep only data for the
#| most recent years.
ex1  <- tinkr::yarn$new(path)
# transform level 3 headers into level 1 headers
ex1$body %>%
  xml2::xml_find_all(xpath = ".//md:heading[@level='3']", ex1$ns) %>% 
  xml2::xml_set_attr("level", 1)

# Back to Markdown
tmp <- tempfile(fileext = ".md")
ex1$write(tmp)
# Level three headers are now Level one:
cat(tail(readLines(tmp, 40)), sep = "\n")
#| # Getting a list of 50 species from occurrence data
#| 
#| For more details about the following code, refer to the [previous post
#| of the series](https://ropensci.org/blog/2018/08/21/birds-radolfzell/).
#| The single difference is our adding a step to keep only data for the
#| most recent years.
unlink(tmp)
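The same approach extends to shifting every header at once, which is the "change levels of headers" use case. This sketch demotes all headers by one level; the cap at level 6 is our own choice for the example, based on Markdown having six heading levels.

```r
library("tinkr")
library("xml2")

path <- system.file("extdata", "example1.md", package = "tinkr")
ex1 <- tinkr::yarn$new(path)

# Select every heading, whatever its level
headings <- xml_find_all(ex1$body, ".//md:heading", ns = ex1$ns)

# Attributes are stored as text, so convert before doing arithmetic
old_levels <- as.integer(xml_attr(headings, "level"))

# Demote by one level, never going past level 6
new_levels <- pmin(old_levels + 1L, 6L)
xml_set_attr(headings, "level", as.character(new_levels))
```

Writing the result with ex1$write(), as in the example above, would then produce a file whose headers are all one level deeper.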

R Markdown

For R Markdown files, to ease editing of chunk labels and options, to_xml munges the chunk info into separate attributes. For example, below you can see that code_block nodes can have language, name and echo attributes.

path <- system.file("extdata", "example2.Rmd", package = "tinkr")
rmd <- tinkr::yarn$new(path)
rmd$body
#| {xml_document}
#| <document xmlns="http://commonmark.org/xml/1.0">
#|  [1] <code_block xml:space="preserve" language="r" name="setup" include="FALSE" eval="T ...
#|  [2] <heading level="2">\n  <text xml:space="preserve">R Markdown</text>\n</heading>
#|  [3] <paragraph>\n  <text xml:space="preserve">This is an </text>\n  <strikethrough>\n  ...
#|  [4] <paragraph>\n  <text xml:space="preserve">When you click the </text>\n  <strong>\n ...
#|  [5] <code_block xml:space="preserve" language="r" name="" eval="TRUE" echo="TRUE">summ ...
#|  [6] <heading level="2">\n  <text xml:space="preserve">Including Plots</text>\n</heading>
#|  [7] <paragraph>\n  <text xml:space="preserve">You can also embed plots, for example:</ ...
#|  [8] <code_block xml:space="preserve" language="python" name="" fig.cap="&quot;pretty p ...
#|  [9] <code_block xml:space="preserve" language="python" name="">plot(pressure)\n</code_ ...
#| [10] <paragraph>\n  <text xml:space="preserve">Non-RMarkdown blocks are also considered ...
#| [11] <code_block info="bash" xml:space="preserve" name="">echo "this is an unevaluted b ...
#| [12] <code_block xml:space="preserve" name="">This is an ambiguous code block\n</code_b ...
#| [13] <paragraph>\n  <text xml:space="preserve">Note that the </text>\n  <code xml:space ...
#| [14] <table>\n  <table_header>\n    <table_cell align="left">\n      <text xml:space="p ...
#| [15] <paragraph>\n  <text xml:space="preserve">blabla</text>\n</paragraph>

Note that all of the features in {tinkr} work for both Markdown and R Markdown.
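Since chunk labels and options are plain attributes, they can be edited like any other XML attribute. The sketch below labels unnamed R chunks, turns off echo, and collects the chunk code. The "chunk-N" naming scheme is an assumption for illustration, and inline code is not covered here.

```r
library("tinkr")
library("xml2")

path <- system.file("extdata", "example2.Rmd", package = "tinkr")
rmd <- tinkr::yarn$new(path)

# Select only the R chunks via the language attribute
r_chunks <- xml_find_all(rmd$body, ".//md:code_block[@language='r']", ns = rmd$ns)

# Give every unnamed R chunk a label ("chunk-N" is a made-up scheme)
unnamed <- r_chunks[xml_attr(r_chunks, "name") %in% ""]
xml_set_attr(unnamed, "name", paste0("chunk-", seq_along(unnamed)))

# Change a chunk option on all R chunks
xml_set_attr(r_chunks, "echo", "FALSE")

# Extract the code of the R chunks as a character vector
r_code <- xml_text(r_chunks)
```

A language other than R can be targeted by changing the value in the XPath predicate, for instance @language='python'.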

Inserting new markdown elements

Inserting new nodes into the AST is surprisingly difficult if there is a default namespace, so we have provided a method in the yarn object that will take plain markdown and translate it to XML nodes and insert them into the document for you. For example, you can add a new code block:

path <- system.file("extdata", "example2.Rmd", package = "tinkr")
rmd <- tinkr::yarn$new(path)
xml2::xml_find_first(rmd$body, ".//md:code_block", rmd$ns)
#| {xml_node}
#| <code_block space="preserve" language="r" name="setup" include="FALSE" eval="TRUE">
new_code <- c(
  "```{r xml-block, message = TRUE}",
  "message(\"this is a new chunk from {tinkr}\")",
  "```")
new_table <- data.frame(
  package = c("xml2", "xslt", "commonmark", "tinkr"),
  cool = TRUE
)
# Add chunk into document after the first chunk
rmd$add_md(new_code, where = 1L)
# Add a table after the second chunk:
rmd$add_md(knitr::kable(new_table), where = 2L)
# show the first 21 lines of modified document
rmd$head(21)
#| ---
#| title: "Untitled"
#| author: "M. Salmon"
#| date: "September 6, 2018"
#| output: html_document
#| ---
#| 
#| ```{r setup, include=FALSE, eval=TRUE}
#| knitr::opts_chunk$set(echo = TRUE)
#| ```
#| 
#| ```{r xml-block, message = TRUE}
#| message("this is a new chunk from {tinkr}")
#| ```
#| 
#| | package                    | cool                | 
#| | :------------------------- | :------------------ |
#| | xml2                       | TRUE                | 
#| | xslt                       | TRUE                | 
#| | commonmark                 | TRUE                | 
#| | tinkr                      | TRUE                |

Loss of Markdown style

General principles and solution

The (R)md to XML to (R)md loop on which tinkr is based (via to_xml and to_md) is slightly lossy because of Markdown syntax redundancy. For instance:

  • Lists can be created with “+”, “-” or “*”. After editing with tinkr, the (R)md will only use “-” for lists.

  • Links built like [word][smallref] and bottom [smallref]: URL become [word](URL).

  • Characters are escaped (e.g. “[” when not for a link).

  • Every line of a block quote gets a “>”, whereas the input may have a “>” only at the beginning of the first line.

  • For tables see the next subsection.
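The redundancy behind the first two items can be seen with the commonmark package alone: different spellings of the same construct parse to the same XML, so the original spelling cannot be recovered afterwards. This is a small sketch with made-up input strings, not part of tinkr itself.

```r
library("commonmark")

# Two spellings of the same bullet list
star_list <- "* one\n* two\n"
dash_list <- "- one\n- two\n"
same_list <- identical(markdown_xml(star_list), markdown_xml(dash_list))

# A reference-style link and its inline equivalent
ref_link <- "[word][smallref]\n\n[smallref]: https://example.org\n"
inline_link <- "[word](https://example.org)\n"
same_link <- identical(markdown_xml(ref_link), markdown_xml(inline_link))

# Both comparisons are TRUE: the XML keeps no trace of the bullet
# character or of the link style used in the source
c(list = same_list, link = same_link)
```

Any tool that writes Markdown back from this XML therefore has to pick one spelling, which is why the style of the output can differ from the input.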

Such losses make your (R)md different, and the git diff a bit harder to parse, but should not change the documents your (R)md is rendered to. If it does, report a bug in the issue tracker!

One way to keep your Markdown style, e.g. if you prefer “*” over “-” for lists, is to tweak our XSL stylesheet and provide its file path as the stylesheet_path argument to to_md.

The special case of tables

  • Tables are supposed to remain/become pretty after a full loop to_xml + to_md. If you notice something amiss, e.g. too much space compared to what you were expecting, please open an issue.

Meta

Please note that the ‘tinkr’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.