slu-opengis / postmastr

R package for Processing and Parsing Untidy Street Addresses

Home Page: https://slu-opengis.github.io/postmastr/

License: GNU General Public License v3.0

R 99.86% Shell 0.14%

postmastr's Issues

Converting st names to directions

Before you open your issue:

All issues and contributions are covered by the Code of Conduct
Please check out the Contribution guidelines

Describe the bug
Hello,
Thanks for this. Let me know if you need more testers. I also have preliminary code set up for apt number matching if you would like that.

Anyway, back to the bug. The parser correctly converts North to N or East to E, and it keeps directionals in street addresses as directionals (E 28th St -> E 28th St). What it gets wrong is that it also converts single-letter street names to directions. I'm not sure why it does this, but it should not occur. For example:
E St -> East St
N Avenue -> North Ave
N N Avenue -> N North Ave

This should not occur because there are streets that are actually named N and E.

Expected behavior
It should keep letter names for streets.
E St -> E St.
N Avenue -> N Ave
N N Avenue -> N N Ave
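
One way to express the expected behavior is to treat a single directional letter as the street name whenever it is the token immediately before the street suffix. The sketch below uses base R only. `is_letter_street()` and its small `dirs` and `suffixes` vectors are hypothetical and are not part of postmastr.

```r
# Hypothetical helper (not a postmastr function): TRUE when the token directly
# before the trailing street suffix is a single directional letter, meaning the
# letter is the street name and should not be expanded.
is_letter_street <- function(address,
                             dirs = c("N", "S", "E", "W"),
                             suffixes = c("ST", "STREET", "AVE", "AVENUE")) {
  # upper-case, drop punctuation such as the period in "St.", split on whitespace
  clean <- gsub("[[:punct:]]", "", toupper(trimws(address)))
  tokens <- strsplit(clean, "\\s+")[[1]]
  n <- length(tokens)
  if (n < 2) {
    return(FALSE)
  }
  tokens[n] %in% suffixes && tokens[n - 1] %in% dirs
}

examples <- c("E St", "N Avenue", "N N Avenue", "E 28th St")
result <- vapply(examples, is_letter_street, logical(1))
print(result)
```

In "E 28th St" the token before the suffix is "28th", so the leading E is still free to be handled as a directional.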

To Reproduce
You can use the examples provided above.

cx2 <- pm_identify(cx2, var = address)



## NEW DATAFRAME CX3 WITH PARSED VARIABLES FOR ADDRESS TO EXTRACT ADDRESS INTO ITS CONSTITUENT PARTS
## (dirs and suff are the directional and suffix dictionaries created earlier in the session)
cx3 <- cx2 %>%
  pm_parse(input = "short", address = "address", output = "short", keep_parsed = "yes",
           dir_dict = dirs, suffix_dict = suff)

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):
Windows 11
R 4.2.2
R studio 2021.09.0 Build 351

Additional context
Add any other context about the problem here.

Some addresses that have had "issues"

I'm working through a large list of addresses supplied by the census bureau geocoder, and found a few special cases that the directional prefix/suffix parser has had problems with.
1441 EAST, HOUSTON, TX, 77007
7124 AVENUE E, HOUSTON, TX, 77011
9315 E AVE N, HOUSTON, TX, 77012
1630 DALLAS ST, SOUTH HOUSTON, TX, 77587 (this one I didn't have "South Houston" in the city dictionary, so it got confused at the city stage)

Geocoding in Mexico

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Provide a reprex
Please include a minimal reproducible example (AKA a reprex). If you've never
heard of a reprex before, start by reading
https://www.tidyverse.org/help/#reprex. When you create your reprex, please use a data set available in a package.

How can I help to work on the directions of Mexico?

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

in pm_prep(), limit forward-slash replacement to intersection-type addresses

Current problem:
Currently, pm_prep() replaces forward-slashes with the string "at" in all addresses, even though this logic seems intended for intersection-types only.

Desired solution:
Suppress the forward-slash replacement logic for "street"-type addresses.

reprex:
As shown below, the address "10/20 Main Street" gets changed to "10 at 20 Main Street":

library(postmastr)

df <- data.frame(my.address = "10/20 Main Street, Seattle, Washington")

df_ident <- pm_identify(df, 
                        var="my.address", 
                        locale="us")

df_min <- pm_prep(df_ident, 
                  var="my.address", 
                  type = "street")

str(df_min)
#> tibble [1 × 2] (S3: tbl_df/tbl/data.frame)
#>  $ pm.uid    : int 1
#>  $ pm.address: chr "10 at 20 Main Street Seattle Washington"

Describe alternatives you've considered
None, I have to admit.
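
The desired behavior could be sketched as follows. `prep_slashes()` is a hypothetical stand-in written in base R, not postmastr's actual internal code. It only replaces forward slashes when the address type is "intersection".

```r
# Hypothetical helper (not postmastr internals): replace "/" with " at "
# only for intersection-type addresses; street-type addresses pass through.
prep_slashes <- function(x, type = c("street", "intersection")) {
  type <- match.arg(type)
  if (type == "intersection") {
    # swallow any whitespace around the slash so the result is single-spaced
    x <- gsub("\\s*/\\s*", " at ", x)
  }
  x
}

street_out <- prep_slashes("10/20 Main Street", type = "street")
cross_out <- prep_slashes("Main St / Grand Blvd", type = "intersection")
print(street_out)
print(cross_out)
```

With this guard, "10/20 Main Street" is returned unchanged when `type = "street"`.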

"type = " missing with pm_prep()

Describe the bug
Running pm_prep() without the type argument raises an error.

Expected behavior
No error should occur.

To Reproduce
sushi1_min <- pm_prep(sushi1, var = "address")
#> Error in pm_prep(sushi1, var = "address"): argument "type" is missing, with no default

Desktop (please complete the following information):

OS: [WIN]
Version of R [e.g. v3.60]
Version of RStudio [1.2.1335]

Additional context
Adding type = "street" resolves the issue. The other option is "intersection", I believe.

Bug in `pm_has_unit`

Describe the bug
working_data is referenced in two places in the function, but I think .data is the proper argument.

Expected behavior
pm_has_unit gives me the error message "Error 2." even though pm_has_uid==TRUE

To Reproduce

library(postmastr)
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.2.2
#> Warning: package 'ggplot2' was built under R version 4.2.2
#> Warning: package 'tibble' was built under R version 4.2.2
#> Warning: package 'readr' was built under R version 4.2.2
#> Warning: package 'purrr' was built under R version 4.2.2
#> Warning: package 'dplyr' was built under R version 4.2.2
#> Warning: package 'stringr' was built under R version 4.2.2
#> Warning: package 'forcats' was built under R version 4.2.2

sushi1 %>%
  filter(name != "Drunken Fish - Ballpark Village") %>% 
  pm_identify(var = address) %>% 
  pm_prep(var = address, type = "address") %>% 
  pm_has_unit()
#> Error in pm_has_unit(.): Error 2.

Desktop (please complete the following information):

OS: Windows 10 x64
Version of R: v4.2.1
Version of RStudio [RStudio v2022.07.1+554.pro3 Spotted Wakerobin]

pm_streetSuf_parse() fails to identify many street suffixes

Describe the bug
pm_streetSuf_parse() does not identify many street suffixes as illustrated in the vignette.

I suspect this failure is possibly related to the current inability of the package to identify unit numbers.

Specific example: the pm_streetSuf_parse() method does not identify the street suffix "Drive" in the address, "310 Westline Drive, APT. 201B"

Expected Behavior
I expected the street suffixes "DRIVE", "HWY", "RD", and "ROAD" from the example below to be identified and parsed.

I have verified that these string values are present in the street Suffix dictionary.

To Reproduce

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(postmastr)
library(stringr)
eidl_addresses <- c("98-199 KAMEHAMEHA HWY. E1","8928 S. LACLEDE STATION RD.","9785 MACKENZIE ROAD, SUITE 100","29805 MARLIS ST",
                    "310 WESTLINE DRIVE, APT. 201B")
addresses <- data.frame(eidl_addresses)
addresses <- addresses %>% pm_identify(var = eidl_addresses)
addresses <- addresses %>% pm_prep(var=eidl_addresses,type="short")
addresses <- addresses %>% pm_house_parse()
addresses
#> # A tibble: 5 x 3
#>   pm.uid pm.address               pm.house
#>    <int> <chr>                    <chr>   
#> 1      1 KAMEHAMEHA HWY. E1       98-199  
#> 2      2 S. LACLEDE STATION RD.   8928    
#> 3      3 MACKENZIE ROAD SUITE 100 9785    
#> 4      4 MARLIS ST                29805   
#> 5      5 WESTLINE DRIVE APT. 201B 310
addresses <- addresses %>% pm_streetDir_parse()
addresses
#> # A tibble: 5 x 4
#>   pm.uid pm.address               pm.house pm.preDir
#>    <int> <chr>                    <chr>    <chr>    
#> 1      1 KAMEHAMEHA HWY. E1       98-199   <NA>     
#> 2      2 LACLEDE STATION RD.      8928     S        
#> 3      3 MACKENZIE ROAD SUITE 100 9785     <NA>     
#> 4      4 MARLIS ST                29805    <NA>     
#> 5      5 WESTLINE DRIVE APT. 201B 310      <NA>
addresses <- addresses %>% pm_streetSuf_parse()
addresses
#> # A tibble: 5 x 5
#>   pm.uid pm.address               pm.house pm.preDir pm.streetSuf
#>    <int> <chr>                    <chr>    <chr>     <chr>       
#> 1      1 KAMEHAMEHA HWY. E1       98-199   <NA>      <NA>        
#> 2      2 LACLEDE STATION RD.      8928     S         <NA>        
#> 3      3 MACKENZIE ROAD SUITE 100 9785     <NA>      <NA>        
#> 4      4 MARLIS                   29805    <NA>      St          
#> 5      5 WESTLINE DRIVE APT. 201B 310      <NA>      <NA>
addresses <- addresses %>% pm_street_parse()
#> Error in get(genname, envir = envir) : object 'testthat_print' not found
addresses
#> # A tibble: 5 x 5
#>   pm.uid pm.house pm.preDir pm.street                pm.streetSuf
#>    <int> <chr>    <chr>     <chr>                    <chr>       
#> 1      1 98-199   <NA>      Kamehameha Hwy E1        <NA>        
#> 2      2 8928     S         Laclede Station Rd       <NA>        
#> 3      3 9785     <NA>      Mackenzie Road Suite 100 <NA>        
#> 4      4 29805    <NA>      Marlis                   St          
#> 5      5 310      <NA>      Westline Drive Apt 201b  <NA>
pm_dictionary(type="suffix")[str_detect("HWY",pm_dictionary(type="suffix")$suf.input),]
#> # A tibble: 2 x 3
#>   suf.type suf.input suf.output
#>   <chr>    <chr>     <chr>     
#> 1 Highway  HWY       Hwy       
#> 2 Way      WY        Way

Note that if the Apartment Number is removed from the 5th entry, the street suffix is identified:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(postmastr)
library(stringr)
eidl_addresses <- c("98-199 KAMEHAMEHA HWY. E1","8928 S. LACLEDE STATION RD.","9785 MACKENZIE ROAD, SUITE 100","29805 MARLIS ST",
                    "310 WESTLINE DRIVE")
addresses <- data.frame(eidl_addresses)
addresses <- addresses %>% pm_identify(var = eidl_addresses)
addresses <- addresses %>% pm_prep(var=eidl_addresses,type="short")
addresses <- addresses %>% pm_house_parse()
addresses
#> # A tibble: 5 x 3
#>   pm.uid pm.address               pm.house
#>    <int> <chr>                    <chr>   
#> 1      1 KAMEHAMEHA HWY. E1       98-199  
#> 2      2 S. LACLEDE STATION RD.   8928    
#> 3      3 MACKENZIE ROAD SUITE 100 9785    
#> 4      4 MARLIS ST                29805   
#> 5      5 WESTLINE DRIVE           310
addresses <- addresses %>% pm_streetDir_parse()
addresses
#> # A tibble: 5 x 4
#>   pm.uid pm.address               pm.house pm.preDir
#>    <int> <chr>                    <chr>    <chr>    
#> 1      1 KAMEHAMEHA HWY. E1       98-199   <NA>     
#> 2      2 LACLEDE STATION RD.      8928     S        
#> 3      3 MACKENZIE ROAD SUITE 100 9785     <NA>     
#> 4      4 MARLIS ST                29805    <NA>     
#> 5      5 WESTLINE DRIVE           310      <NA>
addresses <- addresses %>% pm_streetSuf_parse()
addresses
#> # A tibble: 5 x 5
#>   pm.uid pm.address               pm.house pm.preDir pm.streetSuf
#>    <int> <chr>                    <chr>    <chr>     <chr>       
#> 1      1 KAMEHAMEHA HWY. E1       98-199   <NA>      <NA>        
#> 2      2 LACLEDE STATION RD.      8928     S         <NA>        
#> 3      3 MACKENZIE ROAD SUITE 100 9785     <NA>      <NA>        
#> 4      4 MARLIS                   29805    <NA>      St          
#> 5      5 WESTLINE                 310      <NA>      Dr
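
Until unit parsing is available, one possible workaround is to strip periods and trailing unit designators before calling the suffix parser. This is a base R sketch under stated assumptions: `strip_unit()` is a hypothetical helper, and its short list of designators (APT, SUITE, STE, UNIT) is illustrative rather than complete. It does not handle bare units such as the "E1" in the first address.

```r
# Hypothetical pre-cleaning step (not part of postmastr): remove periods and
# a trailing "<designator> <value>" unit so the street suffix ends the string.
strip_unit <- function(x) {
  x <- gsub("\\.", "", x)  # "RD." -> "RD", "APT." -> "APT"
  x <- sub("[, ]+(APT|SUITE|STE|UNIT)\\s+\\S+$", "", x, ignore.case = TRUE)
  trimws(x)
}

raw <- c("310 WESTLINE DRIVE, APT. 201B",
         "9785 MACKENZIE ROAD, SUITE 100",
         "8928 S. LACLEDE STATION RD.")
cleaned <- strip_unit(raw)
print(cleaned)
```

The removed unit text is discarded here. A fuller version would capture it in its own column before dropping it.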

Desktop (please complete the following information):

sessionInfo()
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19043)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_4.0.2    magrittr_2.0.1    tools_4.0.2       htmltools_0.5.1.1
#>  [5] yaml_2.2.1        stringi_1.5.3     rmarkdown_2.4     highr_0.8        
#>  [9] knitr_1.30        stringr_1.4.0     xfun_0.18         digest_0.6.27    
#> [13] rlang_0.4.10      evaluate_0.14

pm_rebuild with zip4

Need to add a dash between zip5 and zip4 on pm_rebuild()
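
As a rough illustration of the requested output, the sketch below joins a five-digit ZIP and an optional ZIP+4 extension with a dash. `join_zip()` is a hypothetical base R helper, not the pm_rebuild() implementation, and the sample values are made up.

```r
# Hypothetical helper (not part of postmastr): build "zip5-zip4" when an
# extension exists, otherwise return the five-digit code unchanged.
join_zip <- function(zip5, zip4) {
  has_ext <- !is.na(zip4) & zip4 != ""
  ifelse(has_ext, paste(zip5, zip4, sep = "-"), zip5)
}

zips <- join_zip(c("23225", "63110", "77002"), c("1724", NA, ""))
print(zips)
```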

Spain dictionaries, adapted tool

Hi,

Thank you for this great tool.
I'm interested in normalizing Spanish addresses. Do you know of anyone who has built the dictionaries for Spain?

Thank you in advance.
Rafa

pm_parse with PO Boxes

When calling pm_parse with an address that only has a PO Box and no street, the following error is returned:
Error in x:y : argument of length 0

Expected behavior
Address formatted to PO Box Number, City, State, Zip

To Reproduce
library(postmastr)
library(tibble)

# a single address that has a PO Box but no street
addr1b <- tibble(pm.id = 1, pm.uid = 1, pm.type = "full", address = "PO Box 111, Los Angeles, CA 90027")
pm_parse(addr1b,
         input = "full",
         address = "address",
         output = "short")

Desktop (please complete the following information):
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Additional context
postmastr:0.1.0.9000
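
A possible interim workaround is to flag PO Box addresses and route them around the street parser. The sketch below is base R only. `is_po_box()` is a hypothetical helper, its pattern is an assumption about how PO Boxes are written in the data, and the second sample address is made up.

```r
# Hypothetical helper (not part of postmastr): TRUE for addresses that start
# with "PO Box", "P.O. Box", "P O Box" and similar spellings.
is_po_box <- function(x) {
  grepl("^\\s*P\\.?\\s*O\\.?\\s*BOX\\s+\\S+", x, ignore.case = TRUE, perl = TRUE)
}

addresses <- c("PO Box 111, Los Angeles, CA 90027",
               "P.O. Box 42, Clayton, MO 63105",
               "1230 TRAVIS ST, HOUSTON, TX, 77002")
flags <- is_po_box(addresses)

# rows that can go to the street parser, and rows that need separate handling
street_rows <- addresses[!flags]
po_box_rows <- addresses[flags]
print(flags)
```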

Case sensitivity

My test data has both census addresses and google addresses so that the street and city names can either be all caps, or mixed case. Some options or tools for handling case would be nice. Perhaps something as simple as an option for the city dictionary to automagically add upper case equivalents to entered mixed-case city names? I expect this may become an issue when I start playing with street names too.
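
Until such an option exists, the upper-case variants can be generated before the dictionary is built. The sketch below prepares the two vectors in base R. It assumes they would then be passed to pm_append(type = "city", input = ..., output = ...), where an NA output means the input is kept as entered.

```r
# Mixed-case city names as entered by the user
cities <- c("Houston", "Katy", "Sugar Land")

# Inputs: the original spellings plus their upper-case equivalents
city_input <- c(cities, toupper(cities))

# Outputs: NA keeps the mixed-case entries as they are; the upper-case
# entries are mapped back to their mixed-case spelling
city_output <- c(rep(NA_character_, length(cities)), cities)

# Assumed follow-up call, left commented out so the sketch runs without postmastr:
# city_dict <- pm_append(type = "city", input = city_input, output = city_output)
print(data.frame(city_input, city_output))
```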

Documentation issues

Many commands evidently had their names updated without corresponding documentation updates:
pm_any_postal(sushi1_min) -> pm_postal_any
pm_all_postal(sushi1_min) -> pm_postal_all
pm_parse_postal(sushi1_min) -> pm_postal_parse
pm_any_state -> pm_state_any
pm_all_state -> pm_state_all
pm_no_state -> pm_state_none
There are probably more, but that is as far as I have gotten.
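
For anyone updating older scripts, the renames listed above can be collected in a lookup vector. The sketch below is base R only and `update_calls()` is a hypothetical convenience helper. It covers only the six renames reported in this issue.

```r
# Old name -> new name, taken from the list in this issue
renamed <- c(
  pm_any_postal   = "pm_postal_any",
  pm_all_postal   = "pm_postal_all",
  pm_parse_postal = "pm_postal_parse",
  pm_any_state    = "pm_state_any",
  pm_all_state    = "pm_state_all",
  pm_no_state     = "pm_state_none"
)

# Hypothetical helper: rewrite old function names found in lines of code
update_calls <- function(code) {
  for (old in names(renamed)) {
    code <- gsub(paste0("\\b", old, "\\b"), renamed[[old]], code, perl = TRUE)
  }
  code
}

updated <- update_calls(c("pm_any_postal(sushi1_min)", "pm_no_state(sushi1_min)"))
print(updated)
```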

Dealing with Country Abbreviations at End of Address

Per @alankjackson - Google's geocoder returns data with USA affixed to the end of each address. We don't currently have functionality to remove countries, so as a workaround, the following code could be used to modify the sushi1 data:

# dependencies
library(postmastr)
library(dplyr)
library(stringr)

# add USA to the address data ala Google geocoder results
sushi <- mutate(sushi1, address = str_c(address, "USA", sep = " "))

# remove USA
sushi <- sushi %>%
  mutate(address = str_replace(string = address, 
                               pattern = "\\bUSA\\b$",
                               replacement = "")) %>%
  mutate(address = str_trim(address))
  
# create dictionaries
mo <- pm_dictionary(type = "state", filter = "MO", case = c("title", "upper"), locale = "us")
cities <- pm_append(type = "city",
                    input = c("Brentwood", "Clayton", "CLAYTON", "Maplewood", 
                              "St. Louis", "SAINT LOUIS", "Webster Groves"),
                    output = c(NA, NA, "Clayton", NA, NA, "St. Louis", NA))

# parse
sushi %>%
  filter(name != "Drunken Fish - Ballpark Village") %>%
  pm_parse(input = "full", address = address, output = "short", 
           keep_parsed = "no", city_dict = cities, state_dict = mo)

We'll have to figure out a long-term solution for these types of edits. pm_mutate doesn't make sense here since it's designed to edit observation by observation.

Example does not work

Describe the bug
Running example(pm_parse) results in the following output:

> example(pm_parse)

pm_prs> # construct dictionaries
pm_prs> dirs <- pm_dictionary(type = "directional", filter = c("N", "S", "E", "W"), locale = "us")

pm_prs> sufs <- pm_dictionary(type = "suffix", locale = "us")

pm_prs> mo <- pm_dictionary(type = "state", filter = "MO", case = c("title", "upper"), locale = "us")

pm_prs> cities <- pm_append(type = "city",
pm_prs+     input = c("Brentwood", "Clayton", "CLAYTON", "Maplewood", "St. Louis",
pm_prs+               "SAINT LOUIS", "Webster Groves"),
pm_prs+     output = c(NA, NA, "Clayton", NA, NA, "St. Louis", NA))

pm_prs> # add example data
pm_prs> df <- sushi1

pm_prs> # identify
pm_prs> df <- pm_identify(df, var = address)

pm_prs> # temporary code to subset unit
pm_prs> df <- dplyr::filter(df, name != "Drunken Fish - Ballpark Village")

pm_prs> # parse, full output
pm_prs> pm_parse(df, input = "full", address = address, output = "full", keep_parsed = "no",
pm_prs+     dir_dict = dirs, suffix_dict = sufs, city_dict = cities, state_dict = mo)
Error: Can't combine `..1$...street` <character> and `..2$...street` <logical>.
Run `rlang::last_error()` to see where the error occurred.

Expected behavior
I expected the example to complete without an error.

To Reproduce
Run example(pm_parse)

Screenshots

> rlang::last_trace()
<error/vctrs_error_incompatible_type>
Can't combine `..1$...street` <character> and `..2$...street` <logical>.
Backtrace:
     █
  1. ├─utils::example(pm_parse)
  2. │ └─base::source(...)
  3. │   ├─base::withVisible(eval(ei, envir))
  4. │   └─base::eval(ei, envir)
  5. │     └─base::eval(ei, envir)
  6. ├─postmastr::pm_parse(...)
  7. │ └─`%>%`(...)
  8. │   ├─base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
  9. │   └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
 10. │     └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
 11. │       └─postmastr:::`_fseq`(`_lhs`)
 12. │         └─magrittr::freduce(value, `_function_list`)
 13. │           ├─base::withVisible(function_list[[k]](value))
 14. │           └─function_list[[k]](value)
 15. │             └─postmastr:::pm_parse_street(...)
 16. │               └─`%>%`(...)
 17. │                 ├─base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
 18. │                 └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
 19. │                   └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
 20. │                     └─postmastr:::`_fseq`(`_lhs`)
 21. │                       └─magrittr::freduce(value, `_function_list`)
 22. │                         ├─base::withVisible(function_list[[k]](value))
 23. │                         └─function_list[[k]](value)
 24. │                           └─postmastr::pm_street_parse(., dictionary = street_dict, ordinal = ordinal)
 25. │                             └─postmastr::pm_street_std(...)
 26. │                               └─postmastr:::pm_street_ord(.data, var = !!varQ, locale = locale)
 27. │                                 └─postmastr:::pm_street_ord_us(.data, var = !!varQ)
 28. │                                   └─dplyr::bind_rows(noOrd, yesOrd)
 29. │                                     └─vctrs::vec_rbind(!!!dots, .names_to = .id)
 30. └─vctrs::vec_default_ptype2(...)
 31.   └─vctrs::stop_incompatible_type(...)
 32.     └─vctrs:::stop_incompatible(...)
 33.       └─vctrs:::stop_vctrs(...)

Desktop (please complete the following information):

OS: Darwin MacBook-Pro-2.local 19.5.0 Darwin Kernel Version 19.5.0: Tue May 26 20:41:44 PDT 2020; root:xnu-6153.121.2~2/RELEASE_X86_64 x86_64
Version of R: 4.0.1
Version of RStudio: NA

Some notes from a real use case

I just used the package to help turn a 200,000 record permit database into a geocoding file (https://cohgis-mycity.opendata.arcgis.com/datasets/permits-wm-structural?geometry=-97.509%2C29.379%2C-93.263%2C30.213)

A few things I noted.

A handful of records were missing the zip, and I didn't see a filter to remove those records, so I used:

geo2 <- pm_postal_detect(geo2) %>% 
  filter(pm.hasZip==TRUE) %>% 
  select(-pm.hasZip)

A few records had unit numbers attached directly to the house number (with no space in between), for example:

7301#1 AVE K, 77011
5146#2 LONGMONT DR, 77056
1212A W DREW ST, 77006
2002C GENESEE ST, 77006

so I cleaned them up with

geo2$pm.house <- str_remove(geo2$pm.house, "[A-Z#][0-9]*")
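
If the unit should be kept rather than discarded, the combined value can be split into two pieces instead. This is a base R sketch on a standalone vector. The `house_raw`, `house` and `unit` names are illustrative and nothing here is postmastr API.

```r
# House numbers as they appear in the records above, plus one clean value
house_raw <- c("7301#1", "5146#2", "1212A", "2002C", "1230")

# Leading digits are the house number
house <- sub("^([0-9]+).*$", "\\1", house_raw)

# Whatever follows the digits (minus an optional "#") is the unit
unit <- sub("^[0-9]+#?", "", house_raw)
unit[unit == ""] <- NA

print(data.frame(house_raw, house, unit))
```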

In general the package performed very well. Thanks for your efforts.

Output option "full" for pm_rebuild() not working

When the output argument of pm_rebuild() is set to "full", the call fails with the error message: Error: object 'endQ' not found

library(tidyverse)
library(postmastr)

dirs <- pm_dictionary(type = "directional", filter = c("N", "S", "E", "W"), locale = "us")

original_df <- postmastr::sushi2 %>%
  pm_identify(var = address) 

parsed_df <- original_df %>%
  pm_prep(var = "address", type = "street") %>%
  pm_house_parse() %>%
  pm_streetDir_parse(dictionary = dirs) %>%
  pm_streetSuf_parse() %>%
  pm_street_parse(ordinal = TRUE, drop = TRUE)

merged_df <- pm_replace(parsed_df, source = original_df)

final_df_short <- pm_rebuild(merged_df, "short", keep_parsed = "no")
final_df_full <- pm_rebuild(merged_df, "full", keep_parsed = "no") # Error: object 'endQ' not found

Desktop:

OS: Windows
R Version 4.0.2 (2020-06-22)
RStudio Version 1.3.959

Street names like "three pines" get turned into "3Rd Pines" when normalized

It seems like the street dictionary should handle this, and maybe I am not using it correctly, but it seems to make no difference.

reprexdata <- tribble(
  ~address,
  "5330 THREE OAKS CIR, HOUSTON, TX, 77069",
  "3240 THREE PINES DR, HUMBLE, TX, 77339"
)
TX_dict <- pm_dictionary(type = "state", filter = "TX", 
                         case = "title", locale = "us")
cityDict <- pm_append(type = "city",
                      input = 
                        c("Houston", "Katy", "Pasadena", "Bellaire", 
                          "Humble", "Meadows Place", "Sugar Land",
                          "HOUSTON", "KATY", "PASADENA", "BELLAIRE",
                          "HUMBLE", "MEADOWS PLACE", "SUGAR LAND"
                          ))

dirs <- pm_dictionary(type = "directional", filter = c("N", "S", "E", "W"), locale = "us")

streetDict <- pm_append(type="street",
                        input=c("THREE OAKS", "THREE PINES", 
                                "FOUR PINES", "FOUR RIVERS", 
                                "FOUR WINDS", "SEVEN MAPLES", 
                                "SEVEN MILE", "SEVEN OAKS", 
                                "EIGHT WILLOWS"),
                        output=c("Three Oaks", "THREE PINES", 
                                "FOUR PINES", "FOUR RIVERS", 
                                "FOUR WINDS", "SEVEN MAPLES", 
                                "SEVEN MILE", "SEVEN OAKS", 
                                "EIGHT WILLOWS"))

dftest <- pm_identify(reprexdata, var = "address")

dftest <- pm_prep(dftest, var = "address")

dftest <- pm_postal_parse(dftest)

dftest <- pm_state_parse(dftest)

dftest <- pm_city_parse(dftest, dictionary=cityDict)

dftest <- pm_house_parse(dftest)

dftest <- pm_streetDir_parse(dftest, dictionary=dirs)

dftest <- pm_streetSuf_parse(dftest) 

dftest <- pm_street_parse(dftest, dictionary = streetDict)

dftest

1 5330 3rd Oaks Cir HOUSTON TX 77069
2 3240 3rd Pines Dr HUMBLE TX 77339
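
The behavior I expected from the street dictionary is roughly the following: names listed in the dictionary are protected from ordinal conversion. The sketch is base R only. `ordinalize()` and its tiny word table are hypothetical and only imitate the conversion seen above. They are not postmastr's implementation.

```r
# Hypothetical imitation of number-word to ordinal conversion, with a
# "protect" list of street names that must be left untouched.
ordinalize <- function(street, protect = character()) {
  words <- c(THREE = "3rd", FOUR = "4th", SEVEN = "7th", EIGHT = "8th")
  if (toupper(street) %in% toupper(protect)) {
    return(street)
  }
  tokens <- strsplit(street, " ")[[1]]
  hit <- toupper(tokens) %in% names(words)
  tokens[hit] <- words[toupper(tokens[hit])]
  paste(tokens, collapse = " ")
}

unprotected <- ordinalize("THREE PINES")
protected <- ordinalize("THREE PINES", protect = c("THREE OAKS", "THREE PINES"))
print(c(unprotected, protected))
```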

documentation error - pm_any_postal

pm_any_postal in documentation should be pm_postal_any

Unit Parsing

Hello,
Are there plans to continue the development of this package? Specifically, will you release a function to parse units? I find the package incredibly useful for standardizing addresses and would love to see this feature. Thanks.

Any plans to update the postmastr Workflow instructions?

The postmastr Workflow instructions include functions that no longer exist. For example, pm_all_postal is now pm_postal_all. As a new user, I find this a pretty big hindrance to learning the package.

Willing to help from Canada!

Good day. I am in Quebec and doing research on companies here. The data is a mess and your package looks like a godsend. If I can help, I am here.

Best!

Need unit functionality help?

Love the functionality that already exists in postmastr, and it looks like there's a start on the issue of units. I've got a dataset with a lot of suites, boxes, floors, etc., and I'd love to be able to use this feature with the package, although it appears it's not functional yet.

Is there any way I can contribute to extend this functionality?

reprex

library(tidyverse)
library(postmastr)
reprex_address <- tibble(address = c('188 E CAPITOL STREET 300 ONE JACKSON PLACE, JACKSON, MS, 39201',
  '160 MINE LAKE CT STE 200, RALEIGH, NC, 27615',
  '7491 N FEDERAL HWY SUITE C 5 275, BOCA RATON, FL, 33487',
  '176 MINE LAKE COURT SUITE 100, RALEIGH, NC, 27615',
  'GENERAL SERVICES CORPORATION 2922 HATHAWAY ROAD, RICHMOND, VA, 23225-1724',
  'ATTN JENNY BELOTE CORPORATE OFFICE 16 CONSULTANT PLACE SUITE 104, DURHAM, NC, 27707-6313'))

reprex_cities <- pm_append(type = 'city',
                           input = c('JACKSON', 'RALEIGH', 'BOCA RATON', 'RICHMOND', 'DURHAM'),
                           output = c(NA, NA, NA, NA, NA))

reprex_pm_address <- reprex_address %>% 
  pm_identify(var = 'address')

reprex_pm_address %>%  
  pm_parse(input = 'full',
           address = 'address',
           output = 'short',
           keep_parsed = 'yes',
           city_dict = reprex_cities,
           include_units = TRUE
  )
#> # A tibble: 6 × 10
#>   address  pm.address pm.house pm.preDir pm.street pm.streetSuf pm.city pm.state
#>   <chr>    <chr>      <chr>    <chr>     <chr>     <chr>        <chr>   <chr>   
#> 1 188 E C… 188 E Cap… 188      E         Capitol … <NA>         JACKSON MS      
#> 2 160 MIN… 160 Mine … 160      <NA>      Mine Lak… <NA>         RALEIGH NC      
#> 3 7491 N … 7491 N Fe… 7491     N         Federal … <NA>         BOCA R… FL      
#> 4 176 MIN… 176 Mine … 176      <NA>      Mine Lak… <NA>         RALEIGH NC      
#> 5 GENERAL… General S… <NA>     <NA>      General … Rd           RICHMO… VA      
#> 6 ATTN JE… Attn Jenn… <NA>     <NA>      Attn Jen… <NA>         DURHAM  NC      
#> # … with 2 more variables: pm.zip <chr>, pm.zip4 <chr>

reprex_pm_address %>% 
  pm_has_unit()
#> Error in pm_has_unit(.): Error 2.
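
As a starting point for discussion, unit detection could begin with a simple designator match like the base R sketch below. `has_unit_designator()` and its designator list are hypothetical and not part of postmastr. Bare units with no designator, such as the "300" in the first address, are not caught.

```r
# Hypothetical helper (not part of postmastr): TRUE when an address contains
# a whole-word unit designator such as STE or SUITE.
has_unit_designator <- function(x,
                                designators = c("STE", "SUITE", "APT", "UNIT", "FLOOR")) {
  pattern <- paste0("\\b(", paste(designators, collapse = "|"), ")\\b")
  grepl(pattern, x, ignore.case = TRUE, perl = TRUE)
}

addresses <- c("188 E CAPITOL STREET 300 ONE JACKSON PLACE, JACKSON, MS, 39201",
               "160 MINE LAKE CT STE 200, RALEIGH, NC, 27615",
               "7491 N FEDERAL HWY SUITE C 5 275, BOCA RATON, FL, 33487",
               "176 MINE LAKE COURT SUITE 100, RALEIGH, NC, 27615",
               "GENERAL SERVICES CORPORATION 2922 HATHAWAY ROAD, RICHMOND, VA, 23225-1724",
               "ATTN JENNY BELOTE CORPORATE OFFICE 16 CONSULTANT PLACE SUITE 104, DURHAM, NC, 27707-6313")
unit_flags <- has_unit_designator(addresses)
print(unit_flags)
```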

pm_prep not found

reprexdata <- tribble(
~address,
"1230 TRAVIS ST, HOUSTON, TX, 77002",
"3830 RICHMOND AVE, HOUSTON, TX, 77027"
)

df <- pm_identify(reprexdata, var = "address")
df
df <- pm_prep(df, var = "address")

Error in pm_prep(df, var = "address") : could not find function "pm_prep"

pm_street_parse with ordinal = TRUE

This is currently failing
