Convert PubMed Central table nodes into a list of tibbles

pmc_table(doc)

Arguments

doc

xml_document from PubMed Central

Value

a list of tibbles

Note

Saves the caption and footnotes as attributes, collapses multiline headers, expands all rowspan and colspan attributes, and adds subheadings to the first column.

Author

Chris Stubben

Examples

# doc <- pmc_xml("PMC2231364")
doc <- xml2::read_xml(system.file("extdata/PMC2231364.xml", package = "tidypmc"))
x <- pmc_table(doc)
#> Parsing 4 tables
#> Adding footnotes to Table 1
sapply(x, dim)
#>      Table 1 Table 2 Table 3 Table 4
#> [1,]      39      23       4      34
#> [2,]       5       5       4       4
x
#> $`Table 1`
#> # A tibble: 39 x 5
#>    subheading    `Potential operon… `Gene ID` `Putative or pred… `Reference (s)`
#>    <chr>         <chr>              <chr>     <chr>              <chr>
#>  1 Iron uptake … yfeABCD operon* (… YPO2439-… Transport/binding… yfeABCD [54]
#>  2 Iron uptake … hmuRSTUV operon (… YPO0279-… Transport/binding… hmuRSTUV [55]
#>  3 Iron uptake … ysuJIHG* (r > 0.9… YPO1529-… Iron uptake        -
#>  4 Iron uptake … sufABCDS* (r > 0.… YPO2400-… Iron-regulated Fe… -
#>  5 Iron uptake … YPO1854-1856* (r … YPO1854-… Iron uptake or he… -
#>  6 Sulfur metab… tauABCD operon (r… YPO0182-… Transport/binding… tauABCD [56]
#>  7 Sulfur metab… ssuEADCB operon (… YPO3623-… Sulphur metabolism ssu operon [57]
#>  8 Sulfur metab… cys operon (r > 0… YPO3010-… Cysteine synthesis -
#>  9 Sulfur metab… YPO1317-1319 (r >… YPO1317-… Sulfur metabolism? -
#> 10 Sulfur metab… YPO4109-4111 (r >… YPO4109-… Sulfur metabolism? -
#> # … with 29 more rows
#>
#> $`Table 2`
#> # A tibble: 23 x 5
#>    subheading    `Gene locus`  `Gene ID`    Description                reference
#>    <chr>         <chr>         <chr>        <chr>                      <chr>
#>  1 Category A: … yfeABCD       YPO2439-2442 Inorganic iron and mangan… [36]
#>  2 Category A: … yfuABC        YPO2958-2960 Inorganic iron transport … [37]
#>  3 Category A: … ybt locus     YPO1906-1916 Siderophore-dependent Yer… [74]
#>  4 Category A: … hmuRSTUV      YPO0279-0283 Heme transport system      [38]
#>  5 Category A: … TonB-exbB-ex… YPO2193, YP… TonB-ExbB-ExbD complex     [75]
#>  6 Category A: … yiuABCR       YPO1310-1313 Putative siderophore ABC-… [76]
#>  7 Category A: … ysuFJIHG      YPO1528-1532 Siderophore biosynthetic … [76]
#>  8 Category B: … fitABCD       YPO4022-4025 Putative iron ABC transpo… NA
#>  9 Category B: … Others        YPO0778-0776 putative siderophore bios… NA
#> 10 Category B: … NA            YPO1011-1012 putative TonB-dependent o… NA
#> # … with 13 more rows
#>
#> $`Table 3`
#> # A tibble: 4 x 4
#>   Cluster  `Genes or operons for … `Strict consensus of … `Hits of consensus`
#>   <chr>    <chr>                   <chr>                  <chr>
#> 1 Cluster… rps-rpm-rpl operon, rp… PurR-like box: 5' ACG… rps-rpm-rpl operon, p…
#> 2 Cluster… hmuRSTUV, YPO0682, YPO… Fur-like box: 5' TGAT… hmuRSTUV, YPO0682, YP…
#> 3 Cluster… cysB, ssuEADCB, cysK, … Fnr-like box: 5' TGAN… ssuEADCB, cysK, YPO15…
#> 4 Cluster… sdhCDAB-sucABCD, nuoA-… Fnr/Crp-like box: 5' … sdhCDAB-sucABCD, pta,…
#>
#> $`Table 4`
#> # A tibble: 34 x 4
#>    subheading    `Environmental pertur… Description              `Reference (s)`
#>    <chr>         <chr>                  <chr>                    <chr>
#>  1 Stimulon ana… Temperature shift      NA                       [8, 9]
#>  2 Stimulon ana… Vegetative growth tem… Shift from 26°C to 37°C… NA
#>  3 Stimulon ana… Heat shock             Shift from 37°C to 45°C… NA
#>  4 Stimulon ana… Cold shock             Shift from 37°C to 10°C… NA
#>  5 Stimulon ana… Osmotic stress         NA                       [10]
#>  6 Stimulon ana… High osmolarity        Treatment with 0.5 M so… NA
#>  7 Stimulon ana… High salinity          Treatment with 0.5 M Na… NA
#>  8 Stimulon ana… Oxidative stress       Treatment with 1 mM H2O… NA
#>  9 Stimulon ana… Mild acid stress       Shift from pH7.2 to pH … NA
#> 10 Stimulon ana… Low Mg2+               Growth under 10 μM Mg2+  [15]
#> # … with 24 more rows
#>
attributes(x[[1]])
#> $names
#> [1] "subheading"                     "Potential operon (r value)"
#> [3] "Gene ID"                        "Putative or predicted function"
#> [5] "Reference (s)"
#>
#> $row.names
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39
#>
#> $class
#> [1] "tbl_df"     "tbl"        "data.frame"
#>
#> $caption
#> [1] "Stress-responsive operons in Y. pestis predicted from microarray expression data"
#>
#> $footnotes
#> [1] "'r' represents the correlation coefficient of adjacent genes; '*' represent the defined operon has the similar expression pattern in two other published microarray datasets [7, 21]; '?' inferred functions of uncharacterized genes; '-' means the corresponding operons have not been experimentally validated in other bacteria."
#>
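
Because the caption and footnotes are stored as attributes on each table, they can be gathered into a single summary with base R. The following is a minimal sketch, not part of tidypmc: `table_notes()` is a hypothetical helper, and the two small data frames are stand-ins for real `pmc_table()` output so that the block runs on its own. It assumes that a table without footnotes simply lacks the `footnotes` attribute.

```r
# Hypothetical helper (not part of tidypmc): gather the "caption" and
# "footnotes" attributes from a named list of tables into one data frame.
table_notes <- function(tables) {
  get_attr <- function(tbl, attr_name) {
    value <- attr(tbl, attr_name, exact = TRUE)
    # attr() returns NULL when the attribute is absent; record NA instead
    if (is.null(value)) NA_character_ else paste(value, collapse = " ")
  }
  data.frame(
    table = names(tables),
    caption = vapply(tables, get_attr, character(1),
                     attr_name = "caption", USE.NAMES = FALSE),
    footnotes = vapply(tables, get_attr, character(1),
                       attr_name = "footnotes", USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Stand-in data (assumption): two small tables shaped like pmc_table() output,
# the second one without a footnotes attribute.
t1 <- data.frame(gene = c("yfeA", "yfeB"))
attr(t1, "caption") <- "First example caption"
attr(t1, "footnotes") <- "First example footnote"
t2 <- data.frame(gene = "hmuR")
attr(t2, "caption") <- "Second example caption"

tables <- list(`Table 1` = t1, `Table 2` = t2)
notes <- table_notes(tables)
print(notes)
```

With tidypmc installed, the same helper could presumably be called on the list `x` from the examples above, for instance `table_notes(x)`.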