Approach to Correcting Misspelt Local Government Areas
Source:vignettes/interactive.Rmd
interactive.Rmd
Motivation
Nigeria has 774 Local Government Areas (LGAs). There are a number of factors that can make working with them challenging:
- Some LGAs share the same name with their respective States e.g. Bauchi LGA in Bauchi State, and this can lead to wrong application of the data.
- Some LGAs bear the same name although they are in different States, e.g. Obi LGA in Benue and Nasarawa States.
- LGA spelling discrepancies are rife in the literature, and believe it or not, even in official government documents.
A proposed solution
The function fix_region
has methods that are designed to
address spelling errors LGAs1. These fixes can be carried out in 3
incremental phases; if the user cannot effect the corrections at a
particular level s/he can proceed to the next stage. The phases are as
follows:
- Automatic
- Interactive
- Manual
1. Automatic fixes
When the spelling error is slight and unambiguous, the function
automatically effects the repair. However, when there is only a single
misspelt LGA, and especially if it is supplied as a string,
fix_region
will signal an error.
library(naijR)
fix_region("Legos Island")
#> Error in `fix_region()`:
#> ! Incorrect region name(s); consider reconstructing 'x' with `states()`
#> or `lgas()` for a more reliable fix
The thinking is that an automated solution may not be necessary for a single value that was probably provided interactively.
When another LGA is added, such that the argument passed to the function is a character vector with more that one element:
fix_region(c("Legos Island", "Amuwo-Odofin"))
#> ℹ Successful fix(es):
#> -------------------
#> * Legos Island => Lagos Island
Sometimes when we use a character vector to perform this check, the
function fix_region
may find it difficult to decide on what
fixes to apply. In such instances, the best thing is to convert the
vector into an lgas
object.
chars <- c("Legos Island", "Amuwo Odofin")
fix_region(chars)
#> Error in `fix_region()`:
#> ! Incorrect region name(s); consider reconstructing 'x' with `states()`
#> or `lgas()` for a more reliable fix
# Create an `lgas` object
lgs <- lgas(c("Legos Island", "Amuwo Odofin"))
#> Warning: One or more items is not an LGA. Spelling error?
fix_region(lgs)
#> ℹ Successful fix(es):
#> -------------------
#> * Legos Island => Lagos Island
#> * Amuwo Odofin => Amuwo-Odofin
#> [1] "Lagos Island" "Amuwo-Odofin"
#> attr(,"misspelt")
#> character(0)
#> attr(,"regions.fixed")
#> Legos Island Amuwo Odofin
#> "Lagos Island" "Amuwo-Odofin"
Note that the constructor lgas()
, by default, signals a
warning when the vector we are supplying has misspelt LGAs. However, to
avoid the unnecessary verbosity whilst attempting the fix, this warning
is suppressed if lgas()
is nested in
fix_region()
.
fix_region(lgas(c("Legos Island", "Amuwo Odofin")))
#> ℹ Successful fix(es):
#> -------------------
#> * Legos Island => Lagos Island
#> * Amuwo Odofin => Amuwo-Odofin
#> [1] "Lagos Island" "Amuwo-Odofin"
#> attr(,"misspelt")
#> character(0)
#> attr(,"regions.fixed")
#> Legos Island Amuwo Odofin
#> "Lagos Island" "Amuwo-Odofin"
When the spelling mistakes depart far from the available LGA spellings, the user is shown a message stating what fixes could not be applied. To now carry out a fix, one can proceed to the next stage.
fix_region(lgas(c("Orange County", "Amuwo Odofin")))
#> ℹ Successful fix(es):
#> -------------------
#> * Amuwo Odofin => Amuwo-Odofin
#>
#> Fix(es) not applied:
#> --------------------
#> * Orange County
#> [1] "Orange County" "Amuwo-Odofin"
#> attr(,"misspelt")
#> [1] "Orange County"
#> attr(,"regions.fixed")
#> Amuwo Odofin
#> "Amuwo-Odofin"
2. Interactive fixes
When the automatic fix is not feasible, the user has the option of
doing it interactively by calling fix_region
and setting
its interactive
argument to TRUE
. By following
the prompts, the misspelt LGA as well as possible replacements are
presented. All the user needs to do is to select the desired replacement
value. This is particularly useful when the user is not sure of what the
correct spelling might be.
When a misspelt LGA has more than one match, the interactive approach is the viable option for effecting fixes. In the example below, not all the mispelt LGAs could be fixed automatically:
mispelt.adamawa <- c("Fufure", "Demsa", "Machika", "Fufure", "Ganye", "Hong")
# check for misspelt LGAs and, if necessary, attempt to fix
length(mispelt.adamawa)
#> [1] 6
sum(is_lga(mispelt.adamawa))
#> [1] 3
corrected.adamawa <- fix_region(mispelt.adamawa)
#> 'Machika' approximately matched more than one region - Michika, Machina
#> ℹ Successful fix(es):
#> -------------------
#> * Fufure => Fufore
#>
#> Fix(es) not applied:
#> --------------------
#> * Machika
sum(is_lga(corrected.adamawa))
#> [1] 5
We see that the string ‘Machika’ is not an actual LGA and there are
more than one possible candidate LGA that are considered as possible
replacement. Our original intent was to use “Michika LGA”. To address
this mistake, we can run fix_region
interactively. Note
that the interactive
option is only available for use with
objects of class lgas
.
adamawa <- fix_region(lgas(corrected.adamawa), interactive = TRUE)
Next, the user is provided with a prompt to provide a search item for
likely LGAs that would be an appropriate replacement for the misspelt
one. Possible replacements are presented and to select any value, the
appropriate number should be entered at the prompt. The option
RETRY
allows the user to restart the query and provide
another search term, SKIP
is to try out a different
misspelt item (where there are more than one) and to QUIT
is to exit the prompt. On Windows machines, the user can use the native
dialog boxes by setting graphic
to TRUE
.
3. Manual fixes
When spelling errors are identified and the correct one is already
known, then a manual fix can be applied. For this, we use the function
fix_region_manual
and its use is straightforward2:
adamawa <- fix_region_manual(corrected.adamawa,
wrong = "Machika",
correct = "Michika")
adamawa
#> [1] "Fufore" "Demsa" "Michika" "Fufore" "Ganye" "Hong"