Separate all matching text into multiple rows

separate_text(txt, pattern, column = "text")

Arguments

txt

a tibble, usually results from pmc_text

pattern

either a regular expression or a vector of words to find in text

column

column name, default "text"

Value

a tibble

Note

pattern is passed to grepl and str_extract_all
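
The filter-then-extract flow named in this note can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: `separate_text_sketch` and the toy `txt` data frame are hypothetical, `regmatches()` with `gregexpr()` stands in for `str_extract_all`, and collapsing a vector of words into a single word-bounded alternation is an assumption about how a word vector might be handled.

```r
# Toy stand-in for the table returned by pmc_text (hypothetical data)
txt <- data.frame(
  sentence = 1:3,
  text = c("The box 5'-TGATAATGATTATCA-3' was found.",
           "No sequence here.",
           "Sites TTGATN and NATCAA overlap."),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one output row per match of pattern in column
separate_text_sketch <- function(txt, pattern, column = "text") {
  # Assumption: a vector of words becomes one word-bounded alternation
  if (length(pattern) > 1) {
    pattern <- paste0("\\b(", paste(pattern, collapse = "|"), ")\\b")
  }
  # Keep only rows with at least one match (the grepl step)
  hits <- txt[grepl(pattern, txt[[column]]), , drop = FALSE]
  # Extract every match per row (base R stand-in for str_extract_all)
  found <- regmatches(hits[[column]], gregexpr(pattern, hits[[column]]))
  # Repeat each kept row once per match, then put the match first
  out <- hits[rep(seq_len(nrow(hits)), lengths(found)), , drop = FALSE]
  out <- data.frame(match = as.character(unlist(found)), out,
                    stringsAsFactors = FALSE)
  rownames(out) <- NULL
  out
}

res <- separate_text_sketch(txt, "[ATCGN]{5,}")
words <- separate_text_sketch(txt, c("box", "Sites"))
```

Here `res` has three rows (one match from the first sentence, two from the third), and the sentence without a match is dropped. Filtering with `grepl` before extracting avoids carrying empty match lists through the expansion step.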

Author

Chris Stubben

Examples

# doc <- pmc_xml("PMC2231364")
doc <- xml2::read_xml(system.file("extdata/PMC2231364.xml", package = "tidypmc"))
txt <- pmc_text(doc)
separate_text(txt, "[ATCGN]{5,}")
#> # A tibble: 9 x 5
#>   match      section                  paragraph sentence text
#>   <chr>      <chr>                        <int>    <int> <chr>
#> 1 ACGCAATCG… Results and Discussion;…         2        3 A 16 basepair (bp) box…
#> 2 AAACGTTTN… Results and Discussion;…         2        4 It is very similar to …
#> 3 TGATAATGA… Results and Discussion;…         2        5 A 21 bp box (5'-TGATAA…
#> 4 GATAATGAT… Results and Discussion;…         2        6 It is a 10-1-10 invert…
#> 5 TGANNNNNN… Results and Discussion;…         2        7 A 15 bp box (5'-TGANNN…
#> 6 TTGATN     Results and Discussion;…         2        8 It is a part of the E.…
#> 7 NATCAA     Results and Discussion;…         2        8 It is a part of the E.…
#> 8 GTTAATTAA  Results and Discussion;…         3        4 The ArcA regulator can…
#> 9 GTTAATTAA… Results and Discussion;…         3        5 An ArcA-box-like seque…
separate_text(txt, "\\([A-Z]{3,6}s?\\)")
#> # A tibble: 5 x 5
#>   match  section                paragraph sentence text
#>   <chr>  <chr>                      <int>    <int> <chr>
#> 1 (EMSA) Abstract                       2        5 Several regulatory DNA motif…
#> 2 (PMNs) Background                     1        8 Most of the organisms that i…
#> 3 (SOM)  Methods; Clustering a…         1        4 For the original and the fil…
#> 4 (MEME) Methods; Discovery of…         1        3 Collections of promoter sequ…
#> 5 (IPTG) Methods; Gel mobility…         1        3 Expression of His-Fur was in…
# pattern can be a vector of words
separate_text(txt, c("hmu", "ybt", "yfe", "yfu"))
#> # A tibble: 4 x 5
#>   match section                     paragraph sentence text
#>   <chr> <chr>                           <int>    <int> <chr>
#> 1 yfe   Results and Discussion; Cl…         3        4 Genes in category A (yfe…
#> 2 hmu   Results and Discussion; Cl…         3        4 Genes in category A (yfe…
#> 3 yfu   Results and Discussion; Cl…         3        4 Genes in category A (yfe…
#> 4 ybt   Results and Discussion; Cl…         3        4 Genes in category A (yfe…
# wrappers for separate_text with extra step to expand matched ranges
separate_refs(txt)
#> # A tibble: 93 x 6
#>       id match section   paragraph sentence text
#>    <dbl> <chr> <chr>         <int>    <int> <chr>
#>  1     1 [1]   Backgrou…         1        1 Yersinia pestis is the etiological …
#>  2     2 [2]   Backgrou…         1        3 To produce a transmissible infectio…
#>  3     3 [3]   Backgrou…         1        9 However, a few bacilli are taken up…
#>  4     4 [4,5] Backgrou…         1       10 Residence in this niche also facili…
#>  5     5 [4,5] Backgrou…         1       10 Residence in this niche also facili…
#>  6     6 [6]   Backgrou…         2        1 A DNA microarray is able to determi…
#>  7     7 [7-9] Backgrou…         2        2 We and others have measured the gen…
#>  8     8 [7-9] Backgrou…         2        2 We and others have measured the gen…
#>  9     9 [7-9] Backgrou…         2        2 We and others have measured the gen…
#> 10    10 [10]  Backgrou…         2        2 We and others have measured the gen…
#> # … with 83 more rows
separate_genes(txt)
#> # A tibble: 103 x 6
#>    gene  match  section                 paragraph sentence text
#>    <chr> <chr>  <chr>                       <int>    <int> <chr>
#>  1 purR  PurR   Abstract                        2        5 Several regulatory D…
#>  2 phoP  PhoP   Background                      2        3 We also identified t…
#>  3 ompR  OmpR   Background                      2        3 We also identified t…
#>  4 oxyR  OxyR   Background                      2        3 We also identified t…
#>  5 csrA  CsrA   Results and Discussion          1        3 After the determinat…
#>  6 slyA  SlyA   Results and Discussion          1        3 After the determinat…
#>  7 phoPQ PhoPQ  Results and Discussion          1        3 After the determinat…
#>  8 hmsH  hmsHF… Results and Discussion…         3        3 For example, the hem…
#>  9 hmsF  hmsHF… Results and Discussion…         3        3 For example, the hem…
#> 10 hmsR  hmsHF… Results and Discussion…         3        3 For example, the hem…
#> # … with 93 more rows
separate_tags(txt, "YPO")
#> # A tibble: 35 x 6
#>    id     match   section               paragraph sentence text
#>    <chr>  <chr>   <chr>                     <int>    <int> <chr>
#>  1 YPO19… YPO199… Results and Discussi…         1        4 For the operons, YPO…
#>  2 YPO19… YPO199… Results and Discussi…         1        4 For the operons, YPO…
#>  3 YPO19… YPO199… Results and Discussi…         1        4 For the operons, YPO…
#>  4 YPO10… YPO108… Results and Discussi…         1        4 For the operons, YPO…
#>  5 YPO10… YPO108… Results and Discussi…         1        4 For the operons, YPO…
#>  6 YPO08… YPO0881 Results and Discussi…         1        5 Microarray analysis …
#>  7 YPO08… YPO0882 Results and Discussi…         1        5 Microarray analysis …
#>  8 YPO08… YPO0883 Results and Discussi…         1        5 Microarray analysis …
#>  9 YPO08… YPO0884 Results and Discussi…         1        5 Microarray analysis …
#> 10 YPO08… YPO088… Results and Discussi…         2        4 However, only a smal…
#> # … with 25 more rows