
dvc-data's Introduction

DVC data


Features

  • TODO

Requirements

  • TODO

Installation

You can install DVC data via pip from PyPI:

$ pip install dvc-data

Usage

HashFile

Based on dvc-objects' Object, this is an object that has a particular hash that can be used to verify its contents. Similar to git's ShaFile.

from dvc_data.hashfile import HashFile
from dvc_data.hashfile.hash_info import HashInfo

# `fs` is a filesystem instance (e.g. a LocalFileSystem from dvc-objects)
obj = HashFile("/path/to/file", fs, HashInfo("md5", "36eba1e1e343279857ea7f69a597324e"))

HashFileDB

Based on dvc-objects' ObjectDB, but stores HashFile objects and so is able to verify their contents by their hash_info. Similar to git's ObjectStore.

from dvc_data.hashfile import HashFileDB

odb = HashFileDB(fs, "/path/to/odb")

Index

A trie-like structure that represents data files and directories.

from dvc_data.index import DataIndex, DataIndexEntry

index = DataIndex()
index[("foo",)] = DataIndexEntry(hash_info=hash_info, meta=meta)

Storage

A mapping that describes where to find data contents for index entries. Can be either ObjectStorage for HashFileDB-based storage or FileStorage for backup-like plain file storage.

index.storage_map[("foo",)] = ObjectStorage(...)

Contributing

Contributions are very welcome. To learn more, see the Contributor Guide.

License

Distributed under the terms of the Apache 2.0 license, DVC data is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.

dvc-data's People

Contributors

0x2b3bfa0, alexvcaron, casperdcl, daavoo, dberenbaum, dependabot[bot], dtrifiro, efiop, erotemic, github-actions[bot], hugo-marquez, jonburdo, pmrowla, pre-commit-ci[bot], rlamy, skshetry


dvc-data's Issues

index: introduce fetch

We currently have a junky version of fetch based on odb that is not used anywhere. It was part of early experiments (not dvc exp) and is no longer needed.

In dvc fetch we currently do two things:

  1. collect and transfer objects from regular outputs
  2. download files to a temp location using an index built out of imports

We need to take 2), make it dedup based on source fs/path and download stuff into a temporary location (note that we are not talking about reproducing the structure of indexes there, but purely stashing data somewhere). This will allow us to download stuff optimally across different indexes (e.g. across different git revisions), which also means that fetch should probably accept multiple indexes and not just one. It should probably also update storage_info.data as a result.

Related https://github.com/iterative/studio/issues/4782

hashfile: get rid of state

State should be replaced by using the data index, which is easier to work with and easier to update. Note that this is not a 1-to-1 replacement, but rather requires working with data from the index's point of view.

For example, in state.get we retrieve the entry for a particular path and then check if the recorded metadata matches the one from an actual stat(). With the index we should instead build a new index from the filesystem and then transfer md5s from the old index to the new index entries where the metadata matches. The latter is a pure SQL operation that could be done more efficiently.

This is also important for NFS, to reduce the number of sqlite databases that we have to deal with.
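
A minimal sketch of that idea, using plain dicts instead of the real index/state APIs (the function name and data shapes here are hypothetical):

def transfer_hashes(old_index, new_index):
    """old_index maps key -> (meta, md5); new_index maps key -> (meta, None)."""
    for key, (new_meta, _) in new_index.items():
        old = old_index.get(key)
        if old is not None and old[0] == new_meta:
            # metadata unchanged, so the previously computed md5 is still valid
            new_index[key] = (new_meta, old[1])
    return new_index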

db: built-in fs object versioning (s3, azure, etc)

Some filesystems like s3/azure/etc have built-in object versioning (e.g. you can always access the previous version of an s3 object by using its version-id), which means a lot of the work is already handled for us. From the odb perspective, the implementation will likely look a lot like the old refdb: dvc objects that reference a path with a version-id in it, but we won't need to validate them beyond ensuring that they exist, because the versions are immutable.

index: fetch: use index to cache collected tasks

We collect all the files we need to download in the form of an index, and it would be great to cache it so we don't have to recollect it every time. This will dramatically reduce dvc fetch time by skipping "cache collection" after the first run.

After #341 this became very straightforward and I already have a POC, which needs to be cleaned up and submitted.

refodb: provide raw odb view

Both ref objects and refodb are great: they allow us to read and write to this virtual odb as if we were dealing with regular HashFile objects. But the problem is that we can't actually work with ref objects as ref objects and do things like transfer them from memodb to localodb (e.g. if we want to make them persistent).

We should better separate refobj/refodb from the underlying rawobj/rawodb and provide easy access to them.

index: refactor checkout

Several different dvc cloud versioning/worktree behaviors are offloaded into index.checkout now (version-aware push, worktree push, worktree update/checkout), and the new flags controlling the behavior don't really belong in index.checkout. We should separate these behaviors properly, but we don't have time to do so right now before the initial cloud versioning release.

  • non-worktree push is really a version-aware transfer and not a checkout
  • index.checkout should not be modifying the input new/old indexes (right now we update meta in the "new" index to support worktree push)
    • if anything, checkout should probably return a new index containing the checkout result, since we currently do not actually account for deletes in the caller

index: add db-based implementation (e.g. sqlite)

We currently use an in-memory prefix trie, but for large enough indexes it would be much nicer to be able to use a proper db and also use it to do operations like diff more efficiently (e.g. directly in a SQL query instead of fetching and comparing in python).
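
As an illustration of the kind of thing a db-backed index would enable, here is a sketch of computing a diff directly in SQL with sqlite3; the (key, hash) schema is hypothetical and not the actual dvc-data layout:

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE old_index (key TEXT PRIMARY KEY, hash TEXT);
    CREATE TABLE new_index (key TEXT PRIMARY KEY, hash TEXT);
    """
)
con.executemany("INSERT INTO old_index VALUES (?, ?)", [("foo", "a1"), ("bar", "b2")])
con.executemany("INSERT INTO new_index VALUES (?, ?)", [("foo", "a1"), ("baz", "c3")])

# added/modified entries in a single query instead of loading both indexes into python
changes = con.execute(
    """
    SELECT n.key,
           CASE WHEN o.key IS NULL THEN 'added' ELSE 'modified' END AS status
    FROM new_index AS n
    LEFT JOIN old_index AS o ON o.key = n.key
    WHERE o.hash IS NULL OR o.hash != n.hash
    """
).fetchall()

deleted = con.execute(
    """
    SELECT o.key FROM old_index AS o
    LEFT JOIN new_index AS n ON n.key = o.key
    WHERE n.key IS NULL
    """
).fetchall()

print(changes)  # [('baz', 'added')]
print(deleted)  # [('bar',)]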

meta: capture nlink/ishardlink and islink/issymlink

Capturing *link info will allow us to be smarter about deciding whether we need to relink stuff (e.g. in a subsequent "noop" dvc add) and greatly improve performance there.

It is important not to write those fields to dvc files on the dvc side (at least for now, to preserve current behaviour).
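
A minimal sketch of capturing that information with os.lstat; the field names are hypothetical and not the actual Meta attributes:

import os
import stat


def link_meta(path):
    # capture link-related info so a later "noop" dvc add can tell whether
    # relinking is actually needed
    st = os.lstat(path)
    return {
        "is_symlink": stat.S_ISLNK(st.st_mode),
        "is_hardlink": st.st_nlink > 1,  # more than one name points at this inode
        "nlink": st.st_nlink,
    }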

dvc commit slow with many files

Hi,
I have a dvc repository with a total size of 1.2TB and about 300,000 files. I understand that with this many files, I cannot expect all dvc operations to be fast, but when I add one small file and perform dvc commit, it takes 3 minutes to finish. Furthermore, the console output during the commit seems a bit strange to me:

  • In the first minute, a progress bar appears saying that it is building data objects, building the cache and transferring.
  • In the next two minutes, there is no progress bar and no text displayed at all.

The empty output for two minutes confuses me, and the time it takes for whatever it is doing then seems a bit long to me.

To find out what it is doing during that time, I attached a debugger with pyrasite and obtained this stack trace:

  File "dvc/__main__.py", line 7, in <module>
  File "dvc/cli/__init__.py", line 185, in main
  File "dvc/cli/command.py", line 22, in do_run
  File "dvc/commands/commit.py", line 20, in run
  File "dvc/repo/__init__.py", line 48, in wrapper
  File "dvc/repo/commit.py", line 66, in commit
  File "funcy/decorators.py", line 45, in wrapper
  File "dvc/stage/decorators.py", line 43, in rwlocked
  File "funcy/decorators.py", line 66, in __call__
  File "dvc/stage/__init__.py", line 548, in commit
  File "dvc/output.py", line 713, in commit
  File "dvc/output.py", line 676, in _checkout
  File "dvc_data/hashfile/checkout.py", line 274, in checkout
  File "dvc_data/hashfile/checkout.py", line 221, in _checkout
  File "dvc_data/hashfile/checkout.py", line 115, in _checkout_file
  File "dvc_data/hashfile/state.py", line 107, in save
  File "diskcache/core.py", line 823, in __setitem__
  File "diskcache/core.py", line 796, in set
  File "contextlib.py", line 142, in __exit__
  File "diskcache/core.py", line 744, in _transact

Since dvc_data appears there, this is hopefully the right repository for this issue.

dvc version: 2.41.1
The hard drive is an SSD with xfs. Reflinks are enabled.

optimize `Tree.from_list()`

It'd be nice if we could figure out a way to optimize Tree.from_list(), which is taking more than 1s to load one .dir file.

        2    0.017    0.009    2.621    1.310 __init__.py:23(load)
        2    0.000    0.000    2.604    1.302 tree.py:175(load)
        2    0.563    0.281    2.452    1.226 tree.py:152(from_list)
   202605    0.090    0.000    1.433    0.000 <attrs generated init dvc_data.hashfile.diff.Change>:1(__init__)
   202605    0.228    0.000    1.343    0.000 diff.py:36(_)
   405207    1.044    0.000    1.191    0.000 meta.py:75(from_dict)
   202604    0.060    0.000    0.816    0.000 _make.py:1718(__ne__)
   202604    0.678    0.000    0.756    0.000 <attrs generated eq dvc_data.hashfile.diff.TreeEntry>:1(__eq__)
   405210    0.492    0.000    0.621    0.000 diff.py:103(_in_cache)
   405210    0.252    0.000    0.564    0.000 diff.py:94(_get)
       42    0.001    0.000    0.444    0.011 __init__.py:1(<module>)
   405207    0.231    0.000    0.326    0.000 hash_info.py:20(from_dict)
   810421    0.209    0.000    0.299    0.000 diff.py:26(__bool__)

transfer/checkout: reduce relink/transfer

At the moment, when users do dvc add data, we copy all of the files in that directory into the cache and then check them back out again. This is done as part of relinking, which is not necessary except for symlinks/hardlinks.

remote transfer slow for unversioned data

With a ThreadPoolExecutor and the `cats-dogs` dataset:

default remote:

time dvc push -r s3-unversioned
2801 files pushed
dvc push -r s3-unversioned  41.37s user 7.50s system 10% cpu 7:56.26 total
time dvc pull -r s3-unversioned
A       cats-dogs/
1 file added and 2800 files fetched
dvc pull -r s3-unversioned  12.03s user 4.40s system 21% cpu 1:14.68 total

version_aware = true remote:

time dvc push -r s3-versioned
2800 files pushed
dvc push -r s3-versioned  21.65s user 3.40s system 12% cpu 3:13.01 total
time dvc pull -r s3-versioned
A       cats-dogs/
1 file added and 2800 files fetched
dvc pull -r s3-versioned  11.19s user 4.03s system 20% cpu 1:15.42 total

Not sure why versioned remote push performs so much faster than unversioned on my machine after these changes; it may be due to the same listing performance problems noted in the gc issue iterative/dvc#5961 (comment). (We don't do a full remote listing for versioned remotes.)

Originally posted by @pmrowla in #246 (comment)

coarse status/diff

Similar to iterative/scmrepo#81 (comment), I was wondering if we could have a faster version of status/diff that’d return early if things are modified in the repo. This might be useful in non-granular status/diff.

Ideally it'd do staging and diffing together, in a generator, so that one can be piped to the other and iterated together.
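
A minimal sketch of the early-return idea, using plain dicts of key -> hash (the real implementation would stage and diff index entries instead):

def iter_changes(old, new):
    # yield differences lazily instead of building the full diff up front
    for key in old.keys() | new.keys():
        if key not in new:
            yield ("deleted", key)
        elif key not in old:
            yield ("added", key)
        elif old[key] != new[key]:
            yield ("modified", key)


def is_dirty(old, new):
    # a coarse check can stop at the very first change
    return next(iter_changes(old, new), None) is not None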

pygtrie: don't rely on _SENTINEL

We don't define a proper root node when using Trie, which makes us rely on obscure behaviour like `kwargs = {"prefix": prefix}`, where the only way to iterate from the root is to not specify the prefix at all (which internally in Trie results in _SENTINEL being used as a prefix). We are pretty much misusing Trie right now and should instead use some kind of root convention. We didn't use / before to avoid associating it with POSIX paths, but we might indeed want to use that unless there are better ideas in mind. Obviously we can just go with ROOT = "/" defined for now and rename it any time later if needed.
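
To illustrate the current situation and the proposed convention, here is a sketch with pygtrie directly (not the dvc-data wrapper):

import pygtrie

trie = pygtrie.Trie()
trie[("data", "foo")] = 1
trie[("data", "bar")] = 2

# iterating a subtree works by passing an explicit prefix...
print(list(trie.iteritems(prefix=("data",))))

# ...but the only way to iterate from the root is to omit the prefix entirely,
# which internally falls back to the private _SENTINEL value
print(list(trie.iteritems()))

# with a root convention, every key lives under ROOT and root iteration
# becomes an ordinary prefix query
ROOT = "/"
rooted = pygtrie.Trie()
rooted[(ROOT, "data", "foo")] = 1
print(list(rooted.iteritems(prefix=(ROOT,))))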

build: do we really need to raise if we have a `.dvcignore` inside a tracked directory?

if DefaultIgnoreFile in fnames:
    raise IgnoreInCollectedDirError(
        DefaultIgnoreFile, fs.path.join(root, DefaultIgnoreFile)
    )

Currently, dvc data status etc. might fail if we have a .dvcignore inside a tracked directory. Do we really need to raise?

I think it would be sufficient to skip the file or print a warning; failing is too strict.

dvc data status --untracked --unchanged  --granular            
ERROR: .dvcignore file should not be in collected dir path: '/home/saugat/projects/iterative/example-get-started/data/features/.dvcignore'
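
A sketch of the warn-and-skip alternative; the helper name is hypothetical and would live wherever build() walks the directory:

import logging
import posixpath

logger = logging.getLogger(__name__)

DefaultIgnoreFile = ".dvcignore"


def drop_ignore_file(root, fnames):
    # warn about the stray .dvcignore and skip it instead of raising
    if DefaultIgnoreFile in fnames:
        logger.warning(
            "skipping '%s' found inside a tracked directory",
            posixpath.join(root, DefaultIgnoreFile),
        )
        fnames = [name for name in fnames if name != DefaultIgnoreFile]
    return fnames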

Cached files get copied to destination instead of linked

OS: Ubuntu 20.04
Python: 3.10
dvc-data: 3.7.0 (but the bug is still present on the main branch)

I am using a DVCFileSystem object to get files from a remote repository. To make the process efficient, I added a local cache to prevent downloading the same md5 again. On that side, everything is good. However, after the file is downloaded into the cache, it gets copied into the final directory instead of symlinked, as it is supposed to be per the configuration.

While debugging, I found that there is an error for this use case in the fs.py module, more precisely in the get_files method of DataFileSystem. When an md5 is absent from the cache, it gets downloaded using the _cache_remote_file method, but then gets copied, since the later _transfer uses the storage options from the remote instead of the cache_storage options it should.

Steps to replicate:

  • Create a DVCFileSystem, passing a remote configuration with remote_config and a cache configuration with config.
    • The cache configuration must use symlink or another link cache type.
  • Use the get method to pull a file from remote storage to a location.
  • Inspect the file created at that location; it will be a copy.
  • Inspect the cache to discover that the file is present there as well.

index: checkout: add logging

At the moment, it seems that there is no logging for index.checkout, which makes it harder to find what's failing during checkout.

Cannot import name 'umask' from 'dvc_objects.fs.system'

Hello,
dvc-objects has just released version 1.4.x, which causes an import error (iterative/dvc-objects#241):

ERROR: unexpected error - cannot import name 'umask' from 'dvc_objects.fs.system' (/opt/hostedtoolcache/Python/3.10.4/x64/lib/python3.10/site-packages/dvc_objects/fs/system.py)

I think this is because umask was removed in the new version of dvc-objects.

index: diff: use hierarchical approach instead of flat

The current diff() is pretty naive: it just lists all keys and then generates differences. We need to walk and diff instead, so that we can propagate hierarchical status (e.g. unknown in dvc data status) and be able to stop early if we have dir hashes that match (e.g. imagine we have a dataset with the same .dir md5 in both indexes; it means that there is no point in walking into them and we can short-circuit quickly).

Needed to finish migrating dvc data status to index for iterative/dvc#8761
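
A minimal sketch of the hierarchical approach over a toy tree representation (each node holds a hash plus children); this is not the actual index API:

def diff_trees(old, new, prefix=()):
    # nodes are {"hash": str, "children": {name: node}}; identical hashes mean
    # identical subtrees, so we can short-circuit without walking the children
    if old["hash"] == new["hash"]:
        return
    names = old["children"].keys() | new["children"].keys()
    for name in sorted(names):
        old_child = old["children"].get(name)
        new_child = new["children"].get(name)
        if old_child is None:
            yield ("added", prefix + (name,))
        elif new_child is None:
            yield ("deleted", prefix + (name,))
        elif not old_child["children"] and not new_child["children"]:
            if old_child["hash"] != new_child["hash"]:
                yield ("modified", prefix + (name,))
        else:
            yield from diff_trees(old_child, new_child, prefix + (name,))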

odb: move corrupted files to /bad instead of deleting

Currently, if we detect that some file is corrupted, we completely delete it, which takes quite a bit of time for large files and is also lossy, as it might be the last copy of your useful data. We should just move the corrupted file instead (e.g. .dvc/cache/12/345 -> .dvc/cache/bad/12345) so one could recover it if needed.

For the record: bad is like git lfs's .git/lfs/bad.
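
A minimal sketch of the move, assuming a local odb laid out as .dvc/cache/<2-char prefix>/<rest of oid> (the helper itself is hypothetical):

import os


def quarantine_corrupted(cache_path, odb_root):
    # e.g. .dvc/cache/12/345... -> .dvc/cache/bad/12345...
    bad_dir = os.path.join(odb_root, "bad")
    os.makedirs(bad_dir, exist_ok=True)
    oid = os.path.basename(os.path.dirname(cache_path)) + os.path.basename(cache_path)
    dest = os.path.join(bad_dir, oid)
    os.replace(cache_path, dest)  # a rename is cheap compared to deleting a large file
    return dest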

index: support loading dirs from FileStorage

Currently we only support loading .dir objects from ObjectStorage, but directories can be stored in FileStorage too (e.g. in gitfs or in some backed-up location) and we should be able to load them as well.

Required for iterative/dvc#8789, because cloud versioning imports are FileStorage-based, and once you support chained imports you have to be able to dynamically load the index at each level of the chain from either ObjectStorage or FileStorage.

diff: confusing results when object not in cache

On example-get-started repo, if you delete the .dir file, it shows confusing results, sometimes reporting added vs modified.

repro

$ rm -rf $(dvc-data o2p 20b786b6e6f80e2b3fcf17827ad18597.dir)
$ dvc data status
Not in cache:                                                                        
  (use "dvc pull <file>..." to update your local storage)
        data/prepared/

DVC committed changes:
  (git commit the corresponding dvc files to update the repo)
        added: data.xml

DVC uncommitted changes:
  (use "dvc commit <file>..." to track changes)
        added: data/prepared/
(there are other changes not tracked by dvc, use "git status" to see)
$ dvc data status --granular
DVC committed changes:                                                               
  (git commit the corresponding dvc files to update the repo)
        added: data.xml

DVC uncommitted changes:
  (use "dvc commit <file>..." to track changes)
        added: data/prepared/test.tsv
        added: data/prepared/train.tsv

See iterative/dvc#7943 (comment).
Possibly related: iterative/dvc#7661

fs: don't rely on entry.odb/remote objects directly

Currently this is the only user of those objects and it requires manual assignments to every entry in a tree, which is very costly and rather pointless. We could assign fs and path instead, but that would only create a similar problem (which actually already exists too). We should probably just introduce some kind of factory/cb/map/etc that would generate fs/path pair from an entry.

Another way of approaching this could be to supply those factories to the index (it uses odb/remote to lazy-load directories anyway).

Related to iterative/dvc#8827

use data index to batch operations

  • load/dump all state when loading/dumping index (e.g. build/checkout should likely not interact with state at all anymore). Related #111, #125
  • batch makedirs during checkout, so that we don't call them for every file we check out (see the sketch below)
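
A minimal sketch of the makedirs batching (a hypothetical helper, not the actual checkout code):

import os


def batch_makedirs(paths):
    # create every parent directory once up front instead of per checked-out file
    parents = {os.path.dirname(path) for path in paths}
    for parent in sorted(parents):
        if parent:
            os.makedirs(parent, exist_ok=True)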

index: add filewatcher

With #208 implemented, a daemon could even keep writing the index so that we don't have to rebuild anything at all when we need to use it in dvc.

dvc migrate fails on 3.0 repos

This is non-critical, but it might be nice to fix it.

It seems that if you run dvc migrate on a repo that doesn't contain any 2.0 structure, then an error is raised in dvc_data/hashfile/db/migrate.py:

2023-09-07 17:26:54,828 ERROR: unexpected error - not enough values to unpack (expected 2, got 0)                                                                                   
Traceback (most recent call last):
  File "/home/joncrall/.pyenv/versions/3.11.2/envs/pyenv3.11.2/lib/python3.11/site-packages/dvc/cli/__init__.py", line 209, in main
    ret = cmd.do_run()
          ^^^^^^^^^^^^
  File "/home/joncrall/.pyenv/versions/3.11.2/envs/pyenv3.11.2/lib/python3.11/site-packages/dvc/cli/command.py", line 26, in do_run
    return self.run()
           ^^^^^^^^^^
  File "/home/joncrall/.pyenv/versions/3.11.2/envs/pyenv3.11.2/lib/python3.11/site-packages/dvc/commands/cache.py", line 44, in run
    migrate_2_to_3(self.repo, dry=self.args.dry)
  File "/home/joncrall/.pyenv/versions/3.11.2/envs/pyenv3.11.2/lib/python3.11/site-packages/dvc/cachemgr.py", line 135, in migrate_2_to_3
    migration = prepare(src, dest, callback=cb)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/joncrall/.pyenv/versions/3.11.2/envs/pyenv3.11.2/lib/python3.11/site-packages/dvc_data/hashfile/db/migrate.py", line 55, in prepare
    paths, oids = zip(*executor.imap_unordered(func, src_paths))
    ^^^^^^^^^^^
ValueError: not enough values to unpack (expected 2, got 0)

It looks like you probably just need to run:

items = list(executor.imap_unordered(func, src_paths))
if items:
    paths, oids = zip(*items)
else:
    paths, oids = [], []

instead

Here is a MWE to reproduce:

def simple_demo_repo(dvc_root):
    """
    Build a simple repo using only standard dvc commands for upstream MWEs
    """
    import ubelt as ub

    # Build in a staging area first
    assert not dvc_root.exists(), 'directory must not exist yet'
    dvc_root = dvc_root
    dvc_root.ensuredir()

    def cmd(command):
        return ub.cmd(command, cwd=dvc_root, verbose=2, system=True)

    cmd('git init')
    cmd('dvc init')

    cmd('dvc config core.autostage true')
    cmd('dvc config cache.type symlink,reflink,hardlink,copy')
    cmd('dvc config cache.protected true')
    cmd('dvc config core.analytics false')
    cmd('dvc config core.check_update false')
    cmd('dvc config core.check_update false')

    # Build basic data
    (dvc_root / 'test-set1').ensuredir()
    assets_dpath = (dvc_root / 'test-set1/assets').ensuredir()
    for idx in range(1, 21):
        fpath = assets_dpath / f'asset_{idx:03d}.data'
        fpath.write_text(str(idx) * 100)
    manifest_fpath = (dvc_root / 'test-set1/manifest.txt')
    manifest_fpath.write_text('pretend-data')

    root_fpath = dvc_root / 'root_file'
    root_fpath.write_text('----' * 100)

    cmd(f'dvc add {root_fpath}')
    cmd(f'dvc add {manifest_fpath}')
    cmd(f'dvc add {assets_dpath}')

    cmd('git commit -am "initial commit"')


def mwe():
    import ubelt as ub

    # Build a simple fresh dvc repo
    dvc_root = ub.Path.appdir('simpledvc', 'simple_demo')
    dvc_root.delete()
    simple_demo_repo(dvc_root)

    _ = ub.cmd('dvc cache migrate -vvv', cwd=dvc_root, verbose=3, system=True)

DVC doctor:

(pyenv3.11.2) joncrall@toothbrush:~/.cache/simpledvc/simple_demo$ dvc doctor
DVC version: 3.19.0 (pip)
-------------------------
Platform: Python 3.11.2 on Linux-6.2.0-32-generic-x86_64-with-glibc2.35
Subprojects:
	dvc_data = 2.16.0
	dvc_objects = 1.0.1
	dvc_render = 0.5.3
	dvc_task = 0.3.0
	scmrepo = 1.3.1
Supports:
	azure (adlfs = 2023.4.0, knack = 0.10.1, azure-identity = 1.12.0),
	gdrive (pydrive2 = 1.15.4),
	gs (gcsfs = 2023.6.0),
	hdfs (fsspec = 2023.6.0, pyarrow = 11.0.0),
	http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
	oss (ossfs = 2021.8.0),
	s3 (s3fs = 2023.6.0, boto3 = 1.26.76),
	ssh (sshfs = 2023.4.1),
	webdav (webdav4 = 0.9.8),
	webdavs (webdav4 = 0.9.8),
	webhdfs (fsspec = 2023.6.0)
Config:
	Global: /home/joncrall/.config/dvc
	System: /etc/xdg/xdg-ubuntu/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/mapper/vgubuntu-root
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/mapper/vgubuntu-root
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/cba64d0f7628d6e7cf6a9216093a7519

index/meta: separate meta handling per filesystem

Currently index entries contain a single meta instance that really holds meta information from several different filesystems, so entry.meta may end up holding a local md5/inode/mtime but can also contain a remote etag/version_id. Meta information should really be tracked per filesystem; the current behavior makes merging and comparing metadata a mess (especially at the DVC level), and it also essentially makes it impossible to use more than one cloud-versioned remote at a time in DVC.

index: consider replacing fs/path/odb/remote fields with factories to specific methods

E.g. checkout uses entry.fs and entry.path, which could be easily derived in a factory based on the corresponding outputs. E.g. if we have an output a, it means that the ("a",), ("a", "b"), ("a", "b", "c"), etc. entries in the index share the same output.fs/odb/remote, and the path is join(output.path, *key).

This will remove the annoying need to fill up all of those fields when creating an index and will make the index slimmer and easier to handle (e.g. serialize).

This is low priority, unless it gets in the way in particular scenarios. Just noting it down.
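
A minimal sketch of such a factory (hypothetical names; the real version would come from the output/storage layer):

import posixpath


def make_fs_path_factory(output_fs, output_path):
    # every entry under an output shares the same fs; the path is derived from the key
    def get_fs_path(key):
        return output_fs, posixpath.join(output_path, *key)

    return get_fs_path


get_fs_path = make_fs_path_factory("localfs", "/datasets/a")
print(get_fs_path(("b", "c")))  # ('localfs', '/datasets/a/b/c')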

Tracking fsspec related pending changes

Fsspec Compatibility

  • Make RepoFileSystem fsspec-compatible. (@efiop)
  • Make LocalFileSystem fsspec-compatible. (@skshetry) (lower priority)
  • Make GitFileSystem fsspec-compatible (@efiop)
  • Get rid of FileSystem. (@skshetry)
  • Get rid of CallbackMixin since callbacks are now generally supported. (@skshetry)
  • Get rid of NoDirectoriesMixin, merge with HTTPFileSystem. (@skshetry)
  • use to_json/from_json instead of config (@skshetry)
  • use configs instead of creating filesystems right away (@skshetry)
  • Plugins (lower priority)
  • Reorganize/get-rid of fs.utils/dvc.utils.fs/System (low priority)

Post-fsspec changes

  • Handle OSErrors instead of DvcExceptions. (low priority)
  • Rethink Cloud fixtures, so that they can work with any fsspec-compatible filesystems by default.
    Extract Cloud.get_url() and Cloud.config out of it. (low priority)

index: introduce restore method

restore is kind of the opposite of save: given an index with hashes, it needs to restore it using the odb into a virtual dataset. This is extremely useful for those in-between states where we don't yet have a real workspace to work with (e.g. we haven't checked out your dataset) but want to virtually reconstruct it using the cache. This allows us to operate on datasets no matter how they are actually stored (e.g. a real dataset on s3 vs a dvc-cached dataset).

Another example, admittedly a bit unrelated, is dvc's run-cache, which goes out-by-out trying to fetch everything one by one, but with restore functionality we could virtually build a dataset using the run-cache.

Needed for iterative/dvc#8761, because there we have to compare a virtually restored dataset with a real one on the cloud.

Think of restore as the index that we would build out of a dataset that we've actually tried to check out from the cache. E.g. if some cache files were missing, those files will be missing from the workspace.
