ipfs-shipyard / ipfs-geoip



geoip lookup over DAG-CBOR dataset loaded from IPFS

License: MIT License

JavaScript 99.30% Shell 0.70%

geoip ipfs dag-cbor b-tree dag

ipfs-geoip's Issues

Previous Data

@jbenet here:

[geoip lookups] should work in previous releases too. we should be able to regenerate the exact data. the database has versions. so does the codebase.

Is there any way we can do this?

Update database

I've just uploaded the latest database (GeoLite2-City-CSV_20210928) to IPFS:

-> https://bafybeihw4fr6rdf5hokzyrejwckporynpqzdsrsop2v4w3jevboehfcsby.ipfs.ipfs.1-2.dev
-> https://bafybeihw4fr6rdf5hokzyrejwckporynpqzdsrsop2v4w3jevboehfcsby.ipfs.infura-ipfs.io
-> https://bafybeihw4fr6rdf5hokzyrejwckporynpqzdsrsop2v4w3jevboehfcsby.ipfs.dweb.link

The last one is a year old now.

Some new ideas about LBP by geohash and ipfs

Thanks to this excellent project, I came up with some new ideas!

I have been researching a geographic relation object model, with a demo built on geohash and IPFS.
Here is some of my work; there may be location-based service (LBP) use cases for IPFS: https://github.com/daijiale/ipfs-geo

I look forward to discussing this with you.

Update the dataset

I think the current dataset is quite old. We could also use directory sharding if js-ipfs supports reading it.

Dataset Updating Plan

While planning to move to the new dataset (see #63), I found some problems I need help with!

Field, information, field and more information

First of all, right now, we have the following data for each location:

{
  "country_code": "US",
  "country_name": "United States",
  "region_code": "CA",
  "city": "Mountain View",
  "postal_code": "94040",
  "latitude": 37.3860,
  "longitude": -122.0838,
  "metro_code": "807",
  "area_code": "650",
  "planet": "Earth"
}

The new datasets contain much more than that:

is_anonymous_proxy
is_satellite_provider
postal_code
latitude
longitude
accuracy_radius
locale_code
continent_code
continent_name
country_iso_code
country_name
subdivision_1_iso_code
subdivision_1_name
subdivision_2_iso_code
subdivision_2_name
city_name
metro_code
time_zone
is_in_european_union

I am pretty sure we don't need all of those fields, so the first goal of this issue is to define which information we want to provide through this package.
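
To make the field question concrete, here is a minimal sketch of mapping one row of the new dataset onto the shape returned today. The function name `toLegacyShape` and the column-to-field mapping are assumptions for illustration only, not a decision made in this issue. Note that `area_code` has no obvious counterpart in the field list quoted above, so the sketch omits it.

```javascript
// Hypothetical helper: maps a row keyed by the new dataset's column names
// onto the location object shape currently returned by this package.
// The mapping itself is an assumption, not the project's agreed format.
function toLegacyShape (row) {
  return {
    country_code: row.country_iso_code,
    country_name: row.country_name,
    region_code: row.subdivision_1_iso_code,
    city: row.city_name,
    postal_code: row.postal_code,
    // CSV values arrive as strings, so coordinates are converted to numbers.
    latitude: Number(row.latitude),
    longitude: Number(row.longitude),
    metro_code: row.metro_code,
    // area_code is intentionally missing: the new field list has no equivalent.
    planet: 'Earth'
  }
}
```

Any field left out of such a mapping (for example `accuracy_radius` or `time_zone`) would simply not be shipped in the generated tree.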

IPv6

The second issue is: how do we support IPv6 (#60)? The newest dataset has an IPv6 table too! Just as with IPv4, we are given CIDR blocks that tell us the address range each record covers. However, unlike IPv4, an IPv6 address is 128 bits wide, so there is no "int long" form that fits in a regular integer, and we can't keep the same structure as we have now for IPv4.

Knowing this, how would you suggest tackling this issue? How should we organize the information so that we can fetch it quickly?
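
One possible direction, sketched here under assumptions rather than proposed as the solution: JavaScript's built-in `BigInt` can hold a 128-bit value, so an IPv6 address could be turned into a single sortable integer in the same spirit as the IPv4 integer form. The helper names below (`ipv4ToInt`, `ipv6ToBigInt`, `ipv6CidrRange`) are hypothetical, and the IPv6 parser deliberately ignores embedded IPv4 suffixes and zone ids.

```javascript
// Convert a dotted-quad IPv4 address to an unsigned 32-bit number.
function ipv4ToInt (ip) {
  const parts = ip.split('.')
  if (parts.length !== 4 || parts.some(p => !/^\d{1,3}$/.test(p) || Number(p) > 255)) {
    throw new Error(`invalid IPv4 address: ${ip}`)
  }
  return parts.reduce((acc, p) => acc * 256 + Number(p), 0)
}

// Convert an IPv6 address (with optional "::" compression) to a 128-bit BigInt.
// Hypothetical helper: no support for embedded IPv4 suffixes or zone ids.
function ipv6ToBigInt (ip) {
  const halves = ip.split('::')
  if (halves.length > 2) throw new Error(`invalid IPv6 address: ${ip}`)
  const head = halves[0] ? halves[0].split(':') : []
  const tail = halves.length === 2 && halves[1] ? halves[1].split(':') : []
  const missing = 8 - head.length - tail.length
  // Without "::" all 8 groups must be present; with it, at least one is elided.
  if (halves.length === 1 ? missing !== 0 : missing < 1) {
    throw new Error(`invalid IPv6 address: ${ip}`)
  }
  const groups = [...head, ...Array(missing).fill('0'), ...tail]
  return groups.reduce((acc, g) => {
    if (!/^[0-9a-fA-F]{1,4}$/.test(g)) throw new Error(`invalid IPv6 group: ${g}`)
    return (acc << 16n) | BigInt(parseInt(g, 16))
  }, 0n)
}

// First and last address covered by an IPv6 CIDR block, as BigInts.
function ipv6CidrRange (cidr) {
  const [ip, prefix] = cidr.split('/')
  const bits = Number(prefix)
  if (!Number.isInteger(bits) || bits < 0 || bits > 128) {
    throw new Error(`invalid prefix length: ${prefix}`)
  }
  const hostBits = BigInt(128 - bits)
  // Clear the host bits to get the network base address.
  const first = (ipv6ToBigInt(ip) >> hostBits) << hostBits
  return { first, last: first + (1n << hostBits) - 1n }
}

ipv4ToInt('8.8.8.8') // 134744072
ipv6ToBigInt('::1') // 1n
```

BigInt values compare with `<` and `>` just like numbers, so range checks against `first` and `last` work the same way as for IPv4. How such keys would be serialized in the tree is left open here.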

Languages?

The new dataset provides translations for just some languages. Are they worth including, or shall we keep just the English ones for now?

Also, I am thinking about setting up a way of updating the geoip database automatically, since they update it every Tuesday. That way we wouldn't need to think much about it (perhaps just merging a PR with the newer CID).

Ping @lidel

which GeoLite dataset?

the readme says to generate the tree from path/GeoLite-Blocks.csv, but it's not clear which one? the page http://dev.maxmind.com/geoip/legacy/geolite/ lists several possibilities?

Would be good to include a mapping (on the readme or another file) of:

the original import csv (src url + IPFS url -- let's back it up!)
the generated geo-ip tree root

that way we can make sure to back them all up as we increase versions.

maybe the list of refs can be -- itself -- an ipfs node. that way we can just back up that root to backup all versions ever. (( we need to come up with a good way of doing this that's friendly with git, github, and ipfs -- i've been using "published-version" files, but this isn't the best thing ever))

formats

wonder how to reconcile json / protobuf dichotomies in ipfs. json is nice for the human readability, and ease of use. protobuf may be better for lookups in datastructures.

btw, @krl awesome work here

Lookup fails when provided ipfs-http-client instance >= v27.0.0

All geoip lookups in web ui are failing: we've updated to the latest ipfs-http-client, but it fails when passed to ipfs-geoip, which needs to be updated to handle the new object API changes in ipfs-inactive/js-ipfs-http-client#896

Cleanup Configs to Generate Tree-Shakable ESM

This relates to:

The way we're generating ESM right now transpiles src into ESM which exports the required interfaces for performing geo-ip lookups. This works well for all agents that support module types and allows import/export syntax (e.g. browsers, node, etc) (except for the dependency issues in #100).

However, this takes away the ability to tree-shake the module when ipfs-geoip is included as a dependency of, say, ipfs-webui, because we're unable to bundle it properly, e.g. https://github.com/ipfs-shipyard/ipfs-geoip/actions/runs/3287072521/jobs/5415859864#step:5:124

Action items:

Cleanup Configs to build ESM valid in both Node-like and browser context
Establish imports are tree-shakeable
Setup better defaults to check this in aegir.

Would love to hear thoughts on this @SgtPooki, @lidel

CI: set up automatic releases

Current state

npm run release does not work locally.
I made the release with npm version major + npm run build + npm publish

Desired state

We want the same or a similar flow as in https://github.com/multiformats/js-multiformats/
where the GitHub repo holds all the secrets and publishing happens automatically.

Ref. https://github.com/multiformats/js-multiformats/blob/master/.github/workflows/js-test-and-release.yml

Investigate use of search index library

(placeholder)

Parts of what ipfs-geoip does could be generalized and extracted into standalone search index generator / consumer libraries useful in other places (eg. wikipedia text search etc).
Or we could refactor it to use something that already exists.

Some prior art:

Create tests with a greater variety of multiaddrs

Take inspiration from https://github.com/whyrusleeping/js-mafmt/blob/master/test/index.spec.js#L9-L131

Support for domain names ?

I mean, could we run something like the following command?

node index.js ipfs.io

Is it supported? Is it even possible? It would be nice to use it here: https://github.com/ipfs/public-gateway-checker
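
A rough sketch of how this could work, assuming the domain is first resolved to an IP address and that address is then passed to the existing IP lookup. `lookupDomain` is a hypothetical helper, and both `resolve` and `lookupIp` are injected so the sketch does not assume any particular DNS or geoip API.

```javascript
// Hypothetical helper: resolve a domain name, then look up the first address.
// `resolve(domain)` must return a promise of an array of IP strings.
// `lookupIp(ip)` must return a promise of the geoip result for that IP.
async function lookupDomain (domain, resolve, lookupIp) {
  const addresses = await resolve(domain)
  if (!addresses || addresses.length === 0) {
    throw new Error(`no addresses found for ${domain}`)
  }
  // A domain may resolve to several addresses in different locations;
  // this sketch only looks up the first one.
  return lookupIp(addresses[0])
}
```

In Node.js, `resolve` could be `require('node:dns').promises.resolve4`, and `lookupIp` a small wrapper around this package's lookup function. A browser has no direct DNS API, so it would need something like a DNS-over-HTTPS resolver instead.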

Add locale (i18n) support

The new source dataset format introduced in #80 provides country and city names in languages other than English
(at the time of writing this, we have names in: de, en, es, fr, ja, pt-BR, ru and zh-CN).

We could add support for passing optional language code to the lookup method.

Details of how to modify b-tree format remain TBD.

Open questions:

should we have a separate b-tree for each language, or should we keep all translations in a single tree?
- if it is a single tree, how do we ensure clients are not fetching strings that they do not need?
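
Independent of how the tree ends up being laid out, the optional language code could behave roughly as sketched below. The `names` map (locale code to translated name) and the `localizedName` helper are hypothetical; this only illustrates falling back to English and says nothing about the b-tree format.

```javascript
// Hypothetical helper: pick a translated name by locale code, falling back
// to English when the requested language is not available.
function localizedName (names, lang = 'en') {
  const has = key => Object.prototype.hasOwnProperty.call(names, key)
  if (has(lang)) return names[lang]
  return has('en') ? names.en : undefined
}

const cityNames = { en: 'Munich', de: 'München' }
localizedName(cityNames, 'de') // 'München'
localizedName(cityNames, 'ja') // 'Munich' (falls back to English)
```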

Move to ipfs-shipyard?

Now we have an org to incubate projects that are not part of the core implementation of the protocol or discussion of the spec. That org is ipfs-shipyard, created from ipfs/team-mgmt#448

Short description:

IPFS Shipyard is a venue for the community to pursue and collaborate on research experiments, products, code libraries and more around the IPFS project. It is where innovation in userland happens and where we discover and form new primitives to push to the core of IPFS.

Anyone opposing?

Add IPv6 support

https://github.com/ipfs/ipfs-geoip/blob/master/src/pretty.js#L26-L29

fix: b-tree contains zero-ed data.

Description

Some content in the ipfs-geoip data on IPFS contains data:0 and should probably be removed.

What is needed?

From @lidel

we could probably remove it the next time the b-tree is generated.
if you have time, file an issue in https://github.com/ipfs-shipyard/ipfs-geoip so we remember to clean this up the next time the b-tree format is revisited

References

discussion started in slack: https://filecoinproject.slack.com/archives/C03KQ8MC62Y/p1685570853644219

Update to work with latest js-ipfs-http-client

@SgtPooki noted that this library does not work with the latest version of https://www.npmjs.com/package/ipfs-http-client.

👉 We need ipfs-geoip to work with the latest ipfs-http-client so we can use it in ipfs-webui and have no regressions on Peers screen.

Some thoughts:

I suspect the main issue is that we removed Buffer from the JS libs and now use Uint8Array
some useful libs:
- https://www.npmjs.com/package/uint8arrays
- https://github.com/alanshaw/it-awesome
modern JS API is documented at https://github.com/ipfs/js-ipfs/tree/master/docs/core-api
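
As an illustration of the Buffer to Uint8Array change suspected above, the sketch below decodes JSON from raw bytes without calling any Buffer-only method. `parseJsonBytes` and `toJsonBytes` are hypothetical names, and this shows the general pattern rather than the actual fix needed in this library.

```javascript
// Decode UTF-8 JSON from a Uint8Array. Node's Buffer is a Uint8Array
// subclass, so this accepts both old and new return types.
function parseJsonBytes (bytes) {
  return JSON.parse(new TextDecoder().decode(bytes))
}

// Encode a value as UTF-8 JSON bytes without relying on Buffer.
function toJsonBytes (value) {
  return new TextEncoder().encode(JSON.stringify(value))
}
```

`TextEncoder` and `TextDecoder` are globals in browsers and in current Node.js releases. Code that called `data.toString()` on a Buffer would typically get a comma-separated list of byte values when handed a plain Uint8Array, which is one way this kind of regression can show up.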

Switch to dag-cbor

Problem

This library is very old, and remembers the time before we had dag-cbor.
It uses stringified JSON put in the data field of dag-pb, which is not only inefficient and technical debt, but also an antipattern these days.

Solution

Remove use of ipfs.object API and JSON in dag-pb
- Use ipfs.block (ipfs.dag may change, block won't) and space-efficient dag-cbor (https://www.npmjs.com/package/@ipld/dag-cbor)

CI: pin CIDs on release

There should be CI set up that pins both CIDs (DATA_HASH and GEOIP_ROOT) to at least two pinning services.

This way we avoid issues like ipfs/ipfs-webui#1992

Copy to 0.4 network

@krl we need to put the data onto the 0.4 network, any pointers on the easiest way to do this?

cc @whyrusleeping @lgierth for ideas