Take a data frame of coded text documents and return a data frame of the codes captured within.
Arguments
- x
A data frame containing the text to be coded; requires columns "doc_id" and "document_text"
- ...
Other parameters optionally passed in
Value
If the data frame contains coded text in the document_text column, the output is a data frame with three columns: "doc", "qcode", and "text".
"doc" is the "doc_id" from the input data frame.
"qcode" is the code that the captured text was marked up with.
"text" is the text that was captured.
Details
This function takes a text document containing coded text of the form:
"stuff to ignore (QCODE) coded text we care about (/QCODE){#my_code} more stuff to ignore"
and turns it into a data frame with one row per coded item, of the form: doc, qcode, text.
parse_qcodes assumes that it is being passed a data frame. It calls the parse_one_document function to do the heavy lifting of extracting the coded text from the document_text column.
Newline characters in the captured text are replaced with an HTML <br>.
If no valid qcodes are found, parse_qcodes returns an empty data frame (no rows).
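The behaviour described above can be illustrated with a minimal base R sketch. This is not the package's implementation: sketch_parse_one and sketch_parse_qcodes are hypothetical names, the regular expression is an assumption about the markup, and the sketch assumes a single code tag per coded span.

```r
# Hypothetical helper (not part of the package): pull every
# "(QCODE)text(/QCODE){#code}" span out of one document.
sketch_parse_one <- function(doc_id, document_text) {
  # (?s) lets "." match newlines so multi-line spans are captured
  pattern <- "(?s)\\(QCODE\\)(.*?)\\(/QCODE\\)\\{#([^}]+)\\}"
  hits <- regmatches(document_text,
                     gregexpr(pattern, document_text, perl = TRUE))[[1]]
  if (length(hits) == 0) {
    # No valid codes: empty data frame with the expected columns
    return(data.frame(doc = doc_id[0], qcode = character(0),
                      text = character(0), stringsAsFactors = FALSE))
  }
  # Re-match each hit to get the capture groups: text, then code
  parts <- regmatches(hits, regexec(pattern, hits, perl = TRUE))
  text  <- vapply(parts, function(p) p[2], character(1))
  qcode <- vapply(parts, function(p) p[3], character(1))
  # Newlines in the captured text become HTML line breaks
  text <- gsub("\n", "<br>", text, fixed = TRUE)
  data.frame(doc = doc_id, qcode = qcode, text = text,
             stringsAsFactors = FALSE)
}

# Hypothetical wrapper: apply the helper to each row and stack the results.
# Assumes x has at least one row.
sketch_parse_qcodes <- function(x) {
  rows <- Map(sketch_parse_one, x$doc_id, x$document_text)
  out <- do.call(rbind, unname(rows))
  rownames(out) <- NULL
  out
}

# Made-up example data with the two required columns
docs <- data.frame(
  doc_id = c("d1", "d2"),
  document_text = c(
    "ignore (QCODE)line one\nline two(/QCODE){#my_code} ignore",
    "no codes here"
  ),
  stringsAsFactors = FALSE
)
res <- sketch_parse_qcodes(docs)
```

Here res holds one row per coded span, and a document without valid markup, such as "d2", contributes no rows.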
Examples
library(magrittr)  # provides %>%
# my_documents is a hypothetical data frame with the required columns
my_documents <- data.frame(
  doc_id = 1,
  document_text = "stuff to ignore (QCODE) coded text we care about (/QCODE){#my_code} more stuff to ignore"
)
parse_qcodes(my_documents)
# Data frames can be piped into this function
my_documents %>%
  parse_qcodes()