The goal of tinkr is to convert (R)Markdown files to XML and back, so that they can be edited with xml2 (XPath!) instead of numerous complicated regular expressions. Would you like to know more? This is a great intro if you are new to XPath, and this is a good resource on XSLT for XML transformations.
Possible applications are R scripts using this and XPath in xml2
to:
Only the body of the (R)Markdown file is cast to XML, using the CommonMark specification via the commonmark package. YAML metadata can be edited using the yaml package, but that is not the goal of this package.
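To get a feel for what that XML looks like, here is a minimal sketch that calls the commonmark package directly on a few lines of Markdown and queries the result with xml2. The sample text and the choice of "md" as a prefix are made up for illustration; tinkr itself does more than this, such as keeping the YAML header aside.

```r
library("commonmark")
library("xml2")

# A made-up Markdown snippet, one element per line
md <- c(
  "# A title",
  "",
  "Some *emphasised* text with a [link](https://example.org)."
)

# Convert the Markdown to a CommonMark XML string, then parse it
xml_string <- commonmark::markdown_xml(md)
doc <- xml2::read_xml(xml_string)

# The document uses a default namespace; give it a prefix for XPath
ns <- xml2::xml_ns_rename(xml2::xml_ns(doc), d1 = "md")

# Find every link node and read its destination attribute
links <- xml2::xml_find_all(doc, ".//md:link", ns)
xml2::xml_attr(links, "destination")
```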
We have created an R6 class object called yarn to store the representation of both the YAML and the XML data, which are accessible through the $yaml and $body elements, respectively. In addition, the namespace prefix is set to “md” in the $ns element.
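As a rough illustration, the sketch below creates a yarn object from the bundled example file and looks at its three elements. The YAML parsing step rests on an assumption that is not stated here: that $yaml holds the header as lines of text with the "---" delimiters included. The yaml package must be installed for that step.

```r
library("tinkr")

path <- system.file("extdata", "example1.md", package = "tinkr")
ex1 <- tinkr::yarn$new(path)

class(ex1$body)  # the XML representation of the Markdown body
ex1$ns           # the namespace, with the "md" prefix
ex1$yaml         # the YAML header

# Assumption: $yaml is a character vector of lines, delimiters included.
# Drop the "---" delimiter lines and parse the rest with the yaml package.
yaml_lines <- ex1$yaml[!grepl("^---\\s*$", ex1$yaml)]
meta <- yaml::yaml.load(paste(yaml_lines, collapse = "\n"))
names(meta)
```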
You can perform XPath queries using the $body and $ns elements:
library("tinkr")
library("xml2")
path <- system.file("extdata", "example1.md", package = "tinkr")
ex1 <- tinkr::yarn$new(path)
# find all ropensci.org blog links
xml_find_all(
x = ex1$body,
xpath = ".//md:link[contains(@destination,'ropensci.org/blog')]",
ns = ex1$ns
)
#| {xml_nodeset (7)}
#| [1] <link destination="https://ropensci.org/blog/2018/08/21/birds-radolfzell/" title="" ...
#| [2] <link destination="https://ropensci.org/blog/2018/09/04/birds-taxo-traits/" title=" ...
#| [3] <link destination="https://ropensci.org/blog/2018/08/21/birds-radolfzell/" title="" ...
#| [4] <link destination="https://ropensci.org/blog/2018/08/14/where-to-bird/" title="">\n ...
#| [5] <link destination="https://ropensci.org/blog/2018/08/21/birds-radolfzell/" title="" ...
#| [6] <link destination="https://ropensci.org/blog/2018/08/28/birds-ocr/" title="">\n <t ...
#| [7] <link destination="https://ropensci.org/blog/2018/09/04/birds-taxo-traits/" title=" ...
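A nodeset like this is usually only a first step. As a small sketch of what could come next, the code below pulls the URL and the visible text out of each matching link and counts how often each URL appears. The name link_table is made up for this example.

```r
library("tinkr")
library("xml2")

path <- system.file("extdata", "example1.md", package = "tinkr")
ex1 <- tinkr::yarn$new(path)

links <- xml_find_all(
  x = ex1$body,
  xpath = ".//md:link[contains(@destination,'ropensci.org/blog')]",
  ns = ex1$ns
)

# Pull out the URL and the visible text of each link
link_table <- data.frame(
  url = xml_attr(links, "destination"),
  text = xml_text(links),
  stringsAsFactors = FALSE
)

# Count how often each URL is linked
table(link_table$url)
```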
Wanna try the package and tell me what doesn’t work?
remotes::install_github("ropenscilabs/tinkr")
This is a basic example. We read “example1.md”, change all level 3 headers to level 1 headers, and save it back to Markdown. Because {xml2} objects are passed by reference, manipulating them does not require reassignment.
library("magrittr")
#|
#| Attaching package: 'magrittr'
#| The following objects are masked from 'package:testthat':
#|
#| equals, is_less_than, not
library("tinkr")
# From Markdown to XML
path <- system.file("extdata", "example1.md", package = "tinkr")
# Level 3 header example:
cat(tail(readLines(path, 40)), sep = "\n")
#| ### Getting a list of 50 species from occurrence data
#|
#| For more details about the following code, refer to the [previous post
#| of the series](https://ropensci.org/blog/2018/08/21/birds-radolfzell/).
#| The single difference is our adding a step to keep only data for the
#| most recent years.
ex1 <- tinkr::yarn$new(path)
# transform level 3 headers into level 1 headers
ex1$body %>%
xml2::xml_find_all(xpath = ".//md:heading[@level='3']", ex1$ns) %>%
xml2::xml_set_attr("level", 1)
# Back to Markdown
tmp <- tempfile(fileext = ".md")
ex1$write(tmp)
# Level three headers are now Level one:
cat(tail(readLines(tmp, 40)), sep = "\n")
#| # Getting a list of 50 species from occurrence data
#|
#| For more details about the following code, refer to the [previous post
#| of the series](https://ropensci.org/blog/2018/08/21/birds-radolfzell/).
#| The single difference is our adding a step to keep only data for the
#| most recent years.
unlink(tmp)
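The point about reference semantics can be checked directly. In the sketch below, the headers are modified through a separate variable and nothing is assigned back to the yarn object, yet the document itself changes. The helper count_h3() is hypothetical and only exists for this example.

```r
library("tinkr")
library("xml2")

path <- system.file("extdata", "example1.md", package = "tinkr")
ex1 <- tinkr::yarn$new(path)

# Hypothetical helper: number of level 3 headers in a yarn object
count_h3 <- function(doc) {
  length(xml_find_all(doc$body, ".//md:heading[@level='3']", doc$ns))
}

before <- count_h3(ex1)

# Modify the nodes through a separate variable, with no assignment back
h3 <- xml_find_all(ex1$body, ".//md:heading[@level='3']", ex1$ns)
xml_set_attr(h3, "level", "1")

after <- count_h3(ex1)
c(before = before, after = after)
```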
For R Markdown files, to ease editing of chunk labels and options, to_xml munges the chunk info into different attributes. For example, below you can see that code_block nodes can have language, name, and echo attributes.
path <- system.file("extdata", "example2.Rmd", package = "tinkr")
rmd <- tinkr::yarn$new(path)
rmd$body
#| {xml_document}
#| <document xmlns="http://commonmark.org/xml/1.0">
#| [1] <code_block xml:space="preserve" language="r" name="setup" include="FALSE" eval="T ...
#| [2] <heading level="2">\n <text xml:space="preserve">R Markdown</text>\n</heading>
#| [3] <paragraph>\n <text xml:space="preserve">This is an </text>\n <strikethrough>\n ...
#| [4] <paragraph>\n <text xml:space="preserve">When you click the </text>\n <strong>\n ...
#| [5] <code_block xml:space="preserve" language="r" name="" eval="TRUE" echo="TRUE">summ ...
#| [6] <heading level="2">\n <text xml:space="preserve">Including Plots</text>\n</heading>
#| [7] <paragraph>\n <text xml:space="preserve">You can also embed plots, for example:</ ...
#| [8] <code_block xml:space="preserve" language="python" name="" fig.cap=""pretty p ...
#| [9] <code_block xml:space="preserve" language="python" name="">plot(pressure)\n</code_ ...
#| [10] <paragraph>\n <text xml:space="preserve">Non-RMarkdown blocks are also considered ...
#| [11] <code_block info="bash" xml:space="preserve" name="">echo "this is an unevaluted b ...
#| [12] <code_block xml:space="preserve" name="">This is an ambiguous code block\n</code_b ...
#| [13] <paragraph>\n <text xml:space="preserve">Note that the </text>\n <code xml:space ...
#| [14] <table>\n <table_header>\n <table_cell align="left">\n <text xml:space="p ...
#| [15] <paragraph>\n <text xml:space="preserve">blabla</text>\n</paragraph>
Note that all of the features in {tinkr} work for both Markdown and R Markdown.
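Because chunk options are stored as attributes, they can be queried and changed like any other XML attribute. The following sketch selects the R chunks of the example document, inspects their labels, and switches evaluation off. It assumes that an edited attribute is written back into the chunk header when the document is converted to Markdown again.

```r
library("tinkr")
library("xml2")

path <- system.file("extdata", "example2.Rmd", package = "tinkr")
rmd <- tinkr::yarn$new(path)

# Select only the R chunks via the language attribute
r_chunks <- xml_find_all(rmd$body, ".//md:code_block[@language='r']", rmd$ns)

# Inspect chunk labels and the current eval option
xml_attr(r_chunks, "name")
xml_attr(r_chunks, "eval")

# Switch evaluation off for every R chunk (modified in place)
xml_set_attr(r_chunks, "eval", "FALSE")

# Preview the top of the document as Markdown
rmd$head(15)
```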
Inserting new nodes into the AST is surprisingly difficult if there is a default namespace, so we have provided a method in the yarn object that will take plain markdown and translate it to XML nodes and insert them into the document for you. For example, you can add a new code block:
path <- system.file("extdata", "example2.Rmd", package = "tinkr")
rmd <- tinkr::yarn$new(path)
xml2::xml_find_first(rmd$body, ".//md:code_block", rmd$ns)
#| {xml_node}
#| <code_block space="preserve" language="r" name="setup" include="FALSE" eval="TRUE">
new_code <- c(
"```{r xml-block, message = TRUE}",
"message(\"this is a new chunk from {tinkr}\")",
"```")
new_table <- data.frame(
package = c("xml2", "xslt", "commonmark", "tinkr"),
cool = TRUE
)
# Add chunk into document after the first chunk
rmd$add_md(new_code, where = 1L)
# Add a table after the second chunk:
rmd$add_md(knitr::kable(new_table), where = 2L)
# show the first 21 lines of modified document
rmd$head(21)
#| ---
#| title: "Untitled"
#| author: "M. Salmon"
#| date: "September 6, 2018"
#| output: html_document
#| ---
#|
#| ```{r setup, include=FALSE, eval=TRUE}
#| knitr::opts_chunk$set(echo = TRUE)
#| ```
#|
#| ```{r xml-block, message = TRUE}
#| message("this is a new chunk from {tinkr}")
#| ```
#|
#| | package | cool |
#| | :------------------------- | :------------------ |
#| | xml2 | TRUE |
#| | xslt | TRUE |
#| | commonmark | TRUE |
#| | tinkr | TRUE |
The (R)md to XML to (R)md loop on which tinkr is based, via to_xml and to_md, is slightly lossy because of Markdown syntax redundancy. For instance:
Lists can be created with either “+”, “-” or “*”. When using tinkr, the (R)md after editing will only use “-” for lists.
Links built like [word][smallref] with a bottom [smallref]: URL become [word](URL).
Characters are escaped (e.g. “[” when not used for a link).
Block quote lines all get “>”, whereas in the input only the first line could have had a “>” at its beginning.
For tables see the next subsection.
Such losses make your (R)md different, and the git diff a bit harder to parse, but should not change the documents your (R)md is rendered to. If they do, please report a bug in the issue tracker!
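To see this kind of loss on a small scale, the sketch below writes a made-up Markdown file that uses “*” as the list marker, sends it through a yarn object without any editing, and reads both versions back for comparison. The file contents are invented for illustration.

```r
library("tinkr")

src <- tempfile(fileext = ".md")
out <- tempfile(fileext = ".md")

# A made-up input that uses "*" as the list marker
writeLines(c("Some text.", "", "* first item", "* second item"), src)

# Read the file and write it straight back, without any editing
doc <- tinkr::yarn$new(src)
doc$write(out)

original <- readLines(src)
round_trip <- readLines(out)

# Compare the two versions
original
round_trip

unlink(c(src, out))
```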
One way to avoid losing your Markdown style, e.g. your preference for “*” over “-” in lists, is to tweak our XSL stylesheet and provide its file path as the stylesheet_path argument to to_md.
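The workflow could look roughly like the sketch below. It assumes that the installed version of tinkr exports a stylesheet() function returning the path of the default stylesheet, and it copies the whole stylesheet directory in case the file refers to neighbouring files. The actual XSL edits are left out, since they depend on the style you want.

```r
library("tinkr")

path <- system.file("extdata", "example1.md", package = "tinkr")

# Assumption: stylesheet() returns the path to the default XSL stylesheet
default_xsl <- tinkr::stylesheet()

# Copy the stylesheet and its sibling files to a working directory
work_dir <- tempfile("xsl-")
dir.create(work_dir)
file.copy(
  list.files(dirname(default_xsl), full.names = TRUE),
  work_dir,
  recursive = TRUE
)
my_xsl <- file.path(work_dir, basename(default_xsl))

# ... edit my_xsl by hand here to encode your preferred Markdown style ...

# Convert to XML and back, using the copied stylesheet
out <- tempfile(fileext = ".md")
yaml_xml_list <- tinkr::to_xml(path)
tinkr::to_md(yaml_xml_list, path = out, stylesheet_path = my_xsl)

result <- readLines(out)
head(result)

unlink(c(work_dir, out), recursive = TRUE)
```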
Please note that the ‘tinkr’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.