ipfs-shipyard / ipfs-geoip
geoip lookup over DAG-CBOR dataset loaded from IPFS
License: MIT License
I've just uploaded the latest database (GeoLite2-City-CSV_20210928) to IPFS:
-> https://bafybeihw4fr6rdf5hokzyrejwckporynpqzdsrsop2v4w3jevboehfcsby.ipfs.ipfs.1-2.dev
-> https://bafybeihw4fr6rdf5hokzyrejwckporynpqzdsrsop2v4w3jevboehfcsby.ipfs.infura-ipfs.io
-> https://bafybeihw4fr6rdf5hokzyrejwckporynpqzdsrsop2v4w3jevboehfcsby.ipfs.dweb.link
The last one is a year old now.
Thanks to this great project, I came up with some new ideas!
I have been researching a geographic relation object model demo built with geohash and IPFS.
Here is some of my work; maybe there will be location-based service (LBS) use cases for IPFS: https://github.com/daijiale/ipfs-geo
I look forward to talking with you.
I think the current dataset is quite old. We could also use directory sharding if js-ipfs supports reading it.
While planning to move to the new dataset (see #63), I found some problems I need help with!
First of all, right now, we have the following data for each location:
{
"country_code": "US",
"country_name": "United States",
"region_code": "CA",
"city": "Mountain View",
"postal_code": "94040",
"latitude": 37.3860,
"longitude": -122.0838,
"metro_code": "807",
"area_code": "650",
"planet": "Earth"
}
The new datasets contain much more than that:
is_anonymous_proxy
is_satellite_provider
postal_code
latitude
longitude
accuracy_radius
locale_code
continent_code
continent_name
country_iso_code
country_name
subdivision_1_iso_code
subdivision_1_name
subdivision_2_iso_code
subdivision_2_name
city_name
metro_code
time_zone
is_in_european_union
I am pretty sure we don't need all of those fields, so the first goal of this issue is to define which information we want to provide through this package.
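One way to frame the field question is to write down how a row of the new dataset would map onto the record shape listed earlier. The sketch below is only an illustration: the function name `toLegacyRecord` is hypothetical, and which new column feeds which existing field is an assumption, not a decision made in this issue.

```javascript
// Hypothetical mapping from a row of the new CSV dataset to the record
// shape this package returns today. The pairing of columns is an assumption.
function toLegacyRecord (row) {
  return {
    country_code: row.country_iso_code,
    country_name: row.country_name,
    region_code: row.subdivision_1_iso_code,
    city: row.city_name,
    postal_code: row.postal_code,
    latitude: Number(row.latitude),
    longitude: Number(row.longitude),
    metro_code: row.metro_code,
    // area_code has no counterpart in the new column list, so it is left out
    planet: 'Earth'
  }
}

const sample = toLegacyRecord({
  country_iso_code: 'US',
  country_name: 'United States',
  subdivision_1_iso_code: 'CA',
  city_name: 'Mountain View',
  postal_code: '94040',
  latitude: '37.3860',
  longitude: '-122.0838',
  metro_code: '807'
})
console.log(sample.city, sample.latitude)
```

Fields such as accuracy_radius or time_zone could be added to the returned object the same way once it is agreed that they are wanted.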
The second issue is: how to support IPv6 (#60)? The newest dataset has an IPv6 table too! Just like with IPv4, we are provided with CIDR addresses that give us the range to check IPv6 addresses against. However, unlike IPv4, an IPv6 address is 128 bits wide, so it does not fit the "int long" form we use today and we can't keep the same structure as we have now for IPv4.
Knowing this, how would you suggest tackling this issue? How can we organize the information in such a way that we can fetch it quickly?
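One possible direction, offered as a sketch and not as the approach this project settled on: a 128-bit address does fit in a JavaScript BigInt, so a CIDR block can still be reduced to a numeric start/end pair and compared the way the IPv4 ranges are. The helper names `ipv6ToBigInt` and `cidrToRange` are hypothetical, and the parser assumes plain hex groups (no zone IDs, no embedded IPv4 suffix).

```javascript
// Parse an IPv6 address in text form into a 128-bit BigInt.
// Sketch only: zone IDs and the ::ffff:1.2.3.4 form are not handled.
function ipv6ToBigInt (address) {
  const [head, tail] = address.split('::')
  const headGroups = head ? head.split(':') : []
  const tailGroups = tail ? tail.split(':') : []
  // "::" stands for however many zero groups are needed to reach 8 groups
  const missing = Math.max(8 - headGroups.length - tailGroups.length, 0)
  const zeros = address.includes('::') ? new Array(missing).fill('0') : []
  const groups = [...headGroups, ...zeros, ...tailGroups]
  if (groups.length !== 8) throw new Error(`invalid IPv6 address: ${address}`)
  return groups.reduce((acc, g) => (acc << 16n) | BigInt(parseInt(g, 16)), 0n)
}

// Turn a CIDR block such as "2001:db8::/32" into inclusive numeric bounds.
function cidrToRange (cidr) {
  const [address, prefix] = cidr.split('/')
  const hostBits = 128n - BigInt(prefix)
  const start = (ipv6ToBigInt(address) >> hostBits) << hostBits
  const end = start | ((1n << hostBits) - 1n)
  return { start, end }
}

const range = cidrToRange('2001:db8::/32')
const ip = ipv6ToBigInt('2001:db8::1')
console.log(ip >= range.start && ip <= range.end)
```

BigInt values compare with the usual `<` and `>=` operators, so a sorted list of range starts could be searched the same way as the IPv4 one. How such values would be serialized into the stored tree is left open here.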
The new dataset provides translations for just some languages. Are they worth including, or shall we keep just the English ones for now?
Also, I am thinking about setting up a way of updating the geoip database automatically, since they update it every Tuesday. That way we wouldn't need to think much about it (perhaps just merging a PR with the newer CID).
Ping @lidel
the readme says to generate the tree from path/GeoLite-Blocks.csv, but it's not clear which one? the page http://dev.maxmind.com/geoip/legacy/geolite/ lists several possibilities?
Would be good to include a mapping (on the readme or another file) of:
that way we can make sure to back them all up as we increase versions.
maybe the list of refs can be -- itself -- an ipfs node. that way we can just back up that root to backup all versions ever. (( we need to come up with a good way of doing this that's friendly with git, github, and ipfs -- i've been using "published-version" files, but this isn't the best thing ever))
wonder how to reconcile json / protobuf dichotomies in ipfs. json is nice for the human readability, and ease of use. protobuf may be better for lookups in datastructures.
btw, @krl awesome work here
All geoip lookups in web ui are failing, as we've updated to the latest [email protected] but it fails when passed to ipfs-geoip as it needs to be updated to handle the new object
api changes in ipfs-inactive/js-ipfs-http-client#896
This relates to:
The way we're generating ESM right now transpiles src into ESM, which exports the required interfaces for performing geo-ip lookups. This works well for all agents that support module types and allow import/export syntax (e.g. browsers, node, etc.), except for the dependency issues in #100.
However, this takes away the ability to tree-shake the module when ipfs-geoip is included as a dependency of, say, ipfs-webui, because we're unable to bundle this properly, e.g. https://github.com/ipfs-shipyard/ipfs-geoip/actions/runs/3287072521/jobs/5415859864#step:5:124
AI:
npm run release does not work locally.
I made the release with npm version major + npm run build + npm publish.
We want the same or similar flow as in https://github.com/multiformats/js-multiformats/ where the GitHub repo has all secrets and publishing happens automatically.
(placeholder)
Parts of what ipfs-geoip does could be generalized and extracted into standalone search index generator / consumer libraries useful in other places (eg. wikipedia text search etc).
Or we could refactor it to use something that already exists.
Some prior art:
Take inspiration from https://github.com/whyrusleeping/js-mafmt/blob/master/test/index.spec.js#L9-L131
i mean, could we run something like the following command ?
node index.js ipfs.io
Is it supported ? ... is it even possible ? it would be nice to use it here https://github.com/ipfs/public-gateway-checker
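A minimal sketch of how a hostname argument could be handled, assuming it is first resolved to IP addresses with Node's built-in DNS module and each address is then handed to the package's lookup. The function name `resolveTarget` and the injectable `resolver` parameter are hypothetical and exist only for this illustration; nothing here is confirmed as supported by the project.

```javascript
const dns = require('node:dns').promises
const net = require('node:net')

// Turn a CLI argument (IP literal or hostname) into a list of IP addresses.
// `resolver` is injectable so the function can be exercised without network.
async function resolveTarget (target, resolver = dns) {
  // net.isIP returns 4 or 6 for IP literals and 0 for anything else
  if (net.isIP(target) !== 0) return [target]
  const records = await resolver.lookup(target, { all: true })
  return records.map((record) => record.address)
}

// Possible use from a script invoked as `node index.js ipfs.io`:
//   resolveTarget(process.argv[2]).then((addresses) => {
//     // pass each address to the geoip lookup here
//   })
```

A hostname can resolve to several addresses, possibly in different locations, so the caller would need to decide whether to report all of them or only the first.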
The new source dataset format introduced in #80 provides country and city names in languages other than English (at the time of writing, we have names in: de, en, es, fr, ja, pt-BR, ru and zh-CN).
We could add support for passing an optional language code to the lookup method.
Details of how to modify b-tree format remain TBD.
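Independent of the storage format, the selection logic on the consumer side could look roughly like the sketch below. It assumes a hypothetical record in which names are keyed by language code; that shape and the helper name `localizedName` are illustrations, not part of the current b-tree format.

```javascript
// Pick a name for the requested language code, falling back to English.
// `names` is a hypothetical object keyed by language code.
function localizedName (names, lang = 'en') {
  if (Object.prototype.hasOwnProperty.call(names, lang)) return names[lang]
  return names.en
}

const cityNames = { en: 'Munich', de: 'München', fr: 'Munich' }
console.log(localizedName(cityNames, 'de'))
console.log(localizedName(cityNames, 'ja'))
```

Falling back to English keeps existing callers working when no language code is passed or a translation is missing.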
Open questions:
Now we have an org to incubate projects that are not part of the core implementation of the protocol or discussion of the spec. That org is ipfs-shipyard, created from ipfs/team-mgmt#448
Short description:
IPFS Shipyard is a venue for the community to pursue and collaborate on research experiments, products, code libraries and more around the IPFS project. It is where innovation in userland happens and where we discover and form new primitives to push to the core of IPFS.
Anyone opposing?
Some content in the ipfs-geoip data in IPFS contains data:0 and should probably be removed.
From @lidel
we could probably remove it the next time the b-tree is generated.
if you have time, file an issue in https://github.com/ipfs-shipyard/ipfs-geoip so we remember to clean this up next time the b-tree format is revisited
discussion started in slack: https://filecoinproject.slack.com/archives/C03KQ8MC62Y/p1685570853644219
@SgtPooki noted that this library does not work with the latest version of https://www.npmjs.com/package/ipfs-http-client.
We need ipfs-geoip to work with the latest ipfs-http-client so we can use it in ipfs-webui and have no regressions on Peers screen.
Some thoughts:
Replace Buffer in JS libs with Uint8Array
This library is very old, and remembers the time before we had dag-cbor.
It uses stringified JSON put in the data field of dag-pb, which is not only inefficient and technical debt, but an antipattern these days.
Move from the ipfs.object API and JSON in dag-pb to ipfs.block (ipfs.dag may change, block won't) and space-efficient dag-cbor (https://www.npmjs.com/package/@ipld/dag-cbor)
There should be CI set up that pins both CIDs (DATA_HASH and GEOIP_ROOT) to at least two pinning services.
This way we avoid issues like ipfs/ipfs-webui#1992
@krl we need to put the data onto the 0.4 network, any pointers on the easiest way to do this?
cc @whyrusleeping @lgierth for ideas