Take a data frame of coded text documents and return a data frame of the codes captured within.
Value
If the data frame contains coded text in the document_text
column, output will be a data frame with three columns: "doc",
"qcode", and "text".
doc is the doc_id from the input data frame.
qcode is the code that the captured text was marked up with.
text is the text that was captured.
Details
This function takes a text document containing coded text of the form:
"stuff to ignore (QCODE) coded text we care about (/QCODE){#my_code}
more stuff to ignore"
and turns it into a data frame with one row per coded
item, with the columns: doc, qcode, text
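The markup-to-rows step described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: extract_codes_sketch is a hypothetical helper, it handles a single code per tagged span, and trimming the whitespace around the captured text is a choice made here for readability.

```r
# Hypothetical helper (not part of the package): pull spans of the form
# (QCODE)text(/QCODE){#code} out of one document string.
extract_codes_sketch <- function(doc_id, document_text) {
  # (?s) lets "." match newlines, so a coded span may cover several lines
  pattern <- "(?s)\\(QCODE\\)(.*?)\\(/QCODE\\)\\{#([^}]*)\\}"
  matches <- regmatches(
    document_text,
    gregexpr(pattern, document_text, perl = TRUE)
  )[[1]]
  if (length(matches) == 0) {
    # No coded spans: return a zero-row data frame with the same columns
    return(data.frame(doc = doc_id[0], qcode = character(0),
                      text = character(0), stringsAsFactors = FALSE))
  }
  data.frame(
    doc   = doc_id,
    qcode = sub(pattern, "\\2", matches, perl = TRUE),
    # Assumption: surrounding whitespace is trimmed in this sketch
    text  = trimws(sub(pattern, "\\1", matches, perl = TRUE)),
    stringsAsFactors = FALSE
  )
}

sample_text <- paste0(
  "stuff to ignore (QCODE) coded text we care about (/QCODE){#my_code}\n",
  "more stuff to ignore"
)
res <- extract_codes_sketch(1, sample_text)
res$qcode  # "my_code"
res$text   # "coded text we care about"
```

Text outside the tagged span is dropped, and a string with no tags yields a data frame with zero rows.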
parse_qcodes assumes that it is being passed a data frame. The
parse_one_document function is called to do the heavy lifting of
extracting the coded text from the document_text column.
Newline characters are replaced with an HTML <br> in the captured text.
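As a rough illustration of that replacement, the following base R snippet swaps each newline for a <br> tag. The use of gsub() here is an assumption made for the sketch; the package may perform the substitution differently.

```r
# Hypothetical captured text spanning two lines
captured <- "first line\nsecond line"

# Assumption: gsub() stands in for the package's internal replacement step
with_breaks <- gsub("\n", "<br>", captured, fixed = TRUE)
with_breaks  # "first line<br>second line"
```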
If no valid qcodes are found, parse_qcodes returns an empty data frame
(no rows).
Examples
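# my_documents is assumed to be a data frame with doc_id and
# document_text columns; a small hypothetical example:
my_documents <- data.frame(
  doc_id = 1,
  document_text = "ignored (QCODE)coded text(/QCODE){#my_code} ignored",
  stringsAsFactors = FALSE
)
parse_qcodes(my_documents)
# Data frames can be piped into this function (%>% comes from magrittr)
my_documents %>%
  parse_qcodes()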