
Take a data frame of coded text documents and return a data frame of the codes captured within.

Usage

parse_qcodes(x, ...)

Arguments

x

A data frame containing the text to be coded; requires columns "doc_id" and "document_text"

...

Other parameters optionally passed in
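As a minimal sketch of the expected input, the following builds a data frame with the two required columns in base R. The document IDs and text are made-up illustration values, and the type of doc_id (numeric here) is an assumption, since this page does not specify it.

```r
# Hypothetical input: two short documents, one of which contains coded text.
my_documents <- data.frame(
  doc_id = c(1, 2),
  document_text = c(
    "intro (QCODE) a coded passage (/QCODE){#my_code} outro",
    "a document with no codes at all"
  ),
  stringsAsFactors = FALSE
)

# Both required columns are present.
names(my_documents)
```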

Value

If the data frame contains coded text in the document_text column, output will be a data frame with three columns: "doc", "qcode", and "text".

doc is the doc_id from the input data frame.

qcode is the code that the captured text was marked up with.

text is the text that was captured.

Details

This function takes a text document containing coded text of the form:

"stuff to ignore (QCODE) coded text we care about (/QCODE){#my_code}
more stuff to ignore"

and turns it into a data frame with one row per coded item, of the form: doc, qcode, text
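To make the markup concrete, here is a base R sketch that pulls coded spans out of a single string with a regular expression. It is an illustration of the format only, not the package's implementation: the helper name extract_codes_sketch is hypothetical, it handles one code per span, and trimming whitespace from the captured text is a choice made for this sketch.

```r
# Hypothetical helper (not part of the package): extract
# (QCODE)...(/QCODE){#code} spans from one document string.
extract_codes_sketch <- function(doc_id, document_text) {
  # (?s) lets "." match newlines so a coded span may cover several lines.
  pattern <- "(?s)\\(QCODE\\)(.*?)\\(/QCODE\\)\\{#([^}]*)\\}"
  hits <- regmatches(
    document_text,
    gregexpr(pattern, document_text, perl = TRUE)
  )[[1]]

  # No coded spans: return an empty data frame with the same columns.
  if (length(hits) == 0) {
    return(data.frame(
      doc = character(0), qcode = character(0), text = character(0),
      stringsAsFactors = FALSE
    ))
  }

  data.frame(
    doc = doc_id,
    qcode = sub(pattern, "\\2", hits, perl = TRUE),
    # Trimming surrounding whitespace is an assumption of this sketch.
    text = trimws(sub(pattern, "\\1", hits, perl = TRUE)),
    stringsAsFactors = FALSE
  )
}

txt <- "stuff to ignore (QCODE) coded text we care about (/QCODE){#my_code}\nmore stuff to ignore"
result <- extract_codes_sketch("doc_1", txt)
result
```

The result has one row, with doc "doc_1", qcode "my_code", and text "coded text we care about"; a string with no coded spans yields a zero-row data frame.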

parse_qcodes assumes that it is being passed a data frame. It calls the parse_one_document function to do the heavy lifting of extracting the coded text from the document_text column.

Newline characters are replaced with an HTML <br> in the captured text.
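The newline replacement can be illustrated in base R as follows. This only shows the effect on a captured string; whether the package uses gsub internally, and how it treats Windows-style "\r\n" line endings, is not stated here and is left as an assumption.

```r
# A captured passage that spans two lines (made-up example text).
captured <- "first line\nsecond line"

# Replace each newline with a literal HTML <br>.
with_breaks <- gsub("\n", "<br>", captured, fixed = TRUE)
with_breaks
#> [1] "first line<br>second line"
```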

If no valid qcodes are found, parse_qcodes returns an empty data frame (no rows).

Examples

# my_documents is assumed to be a data frame with "doc_id" and "document_text" columns
parse_qcodes(my_documents)

# Data frames can be piped into this function (%>% comes from magrittr)
my_documents %>%
  parse_qcodes()