aus_gbif_clean's Issues
Flag extreme outliers based on distance
When 1-2 points lie a long way from the rest of the points, they are likely candidates for bad data: possibly a misidentification or a horticultural planting.
Here's an attempt to identify outliers based on distance from the centroid of the points for each species.
- Calculate the centroid using the median of latitude and longitude
- Calculate the distance of every point from the centroid
- Estimate a reasonable edge distance, e.g. the 99% quantile of the measured distances
- Anything > 2 * this edge distance can be considered an outlier.
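# Euclidean distance (in degrees) of each point from the median centre
distance <- function(decimalLongitude, decimalLatitude) {
  sqrt(
    (decimalLongitude - median(decimalLongitude))^2 +
      (decimalLatitude - median(decimalLatitude))^2
  )
}

# Flag points lying more than f times the 99% quantile of the distances
is_outlier <- function(dist, f = 2) {
  edge <- f * quantile(dist, probs = 0.99)[1]
  dist > edge
}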
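The `distance()` function above works in raw degrees, and a degree of longitude covers less ground than a degree of latitude away from the equator. As a possible alternative, here is a minimal sketch of a great-circle version based on the haversine formula. The name `distance_km`, the `radius_km` argument and the mean Earth radius of 6371 km are assumptions for illustration, not part of the original code.

```r
# Hypothetical alternative to distance(): great-circle distance in km
# from each point to the median centre, using the haversine formula.
distance_km <- function(decimalLongitude, decimalLatitude, radius_km = 6371) {
  to_rad <- pi / 180
  lon0 <- median(decimalLongitude) * to_rad
  lat0 <- median(decimalLatitude) * to_rad
  lon <- decimalLongitude * to_rad
  lat <- decimalLatitude * to_rad

  a <- sin((lat - lat0) / 2)^2 +
    cos(lat0) * cos(lat) * sin((lon - lon0) / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Three points on one meridian, one degree of latitude apart:
# the middle point is the median centre, the others are about 111 km away.
d <- distance_km(c(150, 150, 150), c(-36, -35, -34))
print(d)
```

Because `is_outlier()` only compares distances with their own quantile, it should accept the output of `distance_km()` unchanged.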
library(tidyverse)

data <- read_csv("processed_data/filt_aus_limited_columns.csv")

# Outline of Australia for the base map, dropping polygon vertices
# that are both south of 45°S and east of 155°E
map_australia <- map_data("world") %>%
  filter(region == "Australia") %>%
  filter(lat > -45 | long < 155)

# One map per species, with flagged outliers in a different colour
for (sp in unique(data$species)) {
  data_sp <-
    data %>%
    filter(species == sp) %>%
    mutate(
      dist_center = distance(decimalLongitude, decimalLatitude),
      outlier = is_outlier(dist_center)
    )

  p1 <-
    ggplot() +
    geom_polygon(data = map_australia, mapping = aes(x = long, y = lat, group = group), fill = "white", colour = "black") +
    geom_point(data = data_sp, aes(x = decimalLongitude, y = decimalLatitude, col = outlier)) +
    coord_fixed() +
    labs(title = sp)

  ggsave(sprintf("output2/%s.png", sp), p1, dpi = 150)
}
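If only the flags are needed and not the maps, the same calculation can be done for all species in one grouped `mutate()` instead of a loop. This is a minimal sketch on synthetic data: the `toy` table, the species names "Species a" and "Species b", and the coordinates are all made up for illustration, and the distance and threshold are written inline so that the block runs on its own.

```r
library(dplyr)

# Synthetic stand-in for the real data: a 20 x 10 grid of records per
# made-up species, plus one far-away record added to "Species a".
grid <- expand.grid(
  decimalLongitude = seq(145, 146, length.out = 20),
  decimalLatitude = seq(-35, -34, length.out = 10)
)
toy <- bind_rows(
  mutate(grid, species = "Species a"),
  tibble(decimalLongitude = 120, decimalLatitude = -20, species = "Species a"),
  mutate(grid, decimalLongitude = decimalLongitude - 10, species = "Species b")
)

flagged <- toy %>%
  group_by(species) %>%
  mutate(
    # Distance in degrees from the per-species median centre
    dist_center = sqrt(
      (decimalLongitude - median(decimalLongitude))^2 +
        (decimalLatitude - median(decimalLatitude))^2
    ),
    # Outlier if more than twice the per-species 99% quantile
    outlier = dist_center > 2 * quantile(dist_center, probs = 0.99, names = FALSE)
  ) %>%
  ungroup()

# Number of flagged and unflagged records per species
print(count(flagged, species, outlier))
```

One thing to keep in mind when choosing the quantile: with R's default quantile type, a species with 51 or fewer records can never have a single stray point flagged at `f = 2`, because the 99% quantile is then interpolated at least halfway towards that point's own distance. Species with few records may need a lower quantile or a different rule.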
Some example plots below, and many more here
use smaller buffer for land
as per @yangsophieee plot