Comments (49)
I'm glad to hear it and thank you for tweaking directly.
My setup has evolved slightly. I've just uploaded a screencast to YouTube that might be interesting. It includes an actual notebook and execution, which is fun to watch.
In that video I lay out some questions about interactive use, particularly around launching many small jobs rather than a single large deployment. Having a herd of independent workers might be a decent way to handle interactive jobs within a batch infrastructure short term.
from pangeo.
You can use our new allocation for any jobs on Cheyenne.
Previously I was piggy-backing on @jhamman 's allocation. I take it it's preferred to use the new one?
Yes, use the new one.
I'm copying some notes that @jhamman sent by e-mail a while ago
Joe's e-mail
Here are a few details on running a Jupyter Notebook server on Cheyenne:
- UCAR's documentation: https://www2.cisl.ucar.edu/resources/computational-systems/cheyenne/software/jupyter-and-ipython#notebook
- I think I created my own copy of the start-notebook script so I could make sure I'm running my Anaconda install of Jupyter. My copy is in the notebooks directory below.
- Side note: you may need to manually unset your LD_LIBRARY_PATH to get your anaconda distribution to play nice on cheyenne.
I have started putting stuff up on the wiki of this repo. Not sure if this is a good long-term solution, but it's there and it works.
The supported way to use Python on Cheyenne, though of course other options work and might be a better fit for this project.
The supported way to use Jupyter on Cheyenne (and Yellowstone, though the latter is EOL'ed).
Wiki seems like a good idea
So far my approach is as follows:
Install Miniconda
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh
Create Development environment
conda create -n pangeo python=3.6 dask distributed xarray jupyterlab mpi4py -c conda-forge
Activate environment
source activate pangeo
Install development version of dask.distributed for MPI deployment support
pip install git+https://github.com/mrocklin/distributed.git@cli-logic --upgrade
Create a job script
Mine looks like the following
#!/bin/bash
#PBS -N sample
#PBS -q regular
#PBS -A UCLB0022
#PBS -l select=9:ncpus=4:mem=16G
#PBS -l walltime=00:20:00
#PBS -j oe
#PBS -m abe
rm -f scheduler.json
mpirun --np 9 dask-mpi --nthreads 4 --memory-limit 16e9
And submit
qsub myscript.sh
Connect from Python
This writes connection information into a local file, scheduler.json. We can use this to connect:
$ ipython
from dask.distributed import Client
client = Client(scheduler_file='scheduler.json')
>>> client
<Client: scheduler='tcp://10.148.3.189:8786' processes=8 cores=32>
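Once connected, a quick sanity check is to run a small computation. This is a sketch: the random-array workload below is arbitrary, but with the `Client` above active, `.compute()` will run on the cluster's workers:

```python
import dask.array as da

# Build a small task graph; with a connected Client, compute() runs on the cluster.
# Without one, it falls back to the local threaded scheduler.
x = da.random.random((2000, 2000), chunks=(500, 500))
result = x.mean().compute()
print(result)  # should be very close to 0.5
```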
Current challenges:
- How to clean up reliably. The job scheduler is sending SIGTERM to my processes, so normal cleanup processes fail to take over. Is there any way to get a polite SIGINT a few seconds before SIGTERM? Is there somewhere I can register cleanup code? Perhaps in my script? (this question might be for @davidedelvento , let me know if I should raise a ticket)
- Is the way that I specify cores-per-process with PBS -l select=9:ncpus=4:mem=16G appropriate? This is important because we will want to play with larger and smaller workers for benchmarking.
- What is the right way to set up a JupyterLab (or notebook) server and tunnel appropriately? It would be unfortunate to take over another interactive node just for this. Instead we should probably put this on the same node running the scheduler. Then we'll need to grab that hostname and issue an informative ssh tunneling message like the following to the user: ssh -N -l username -L 8888:HOSTNAME:8888 cheyenne.ucar.edu
- How do we want to handle the diagnostic dashboard? Presumably we add another port to the tunneling suggestion above. I wonder if we can get JupyterLab to embed an iframe for us.
- Cleanup: as far as I know, this is not possible and the PBS User's Guide does not mention anything in this regard. LSF does exactly as you said, but we found that no user took advantage of that feature. Maybe a workaround could be to start an "at" command to send a SIGINT a minute before the scheduled end time?
- cores-per-process: it depends what you are trying to achieve. See the documentation about it, which is all I know.
- tunneling: I agree there's no need to have an additional node just for that (however ssh may take a fair amount of CPU). You should be able to look into the default start-notebook script for a good starting point.
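One way to sketch that workaround inside the job script itself, without `at`: start a background timer that sends SIGINT shortly before the walltime expires. This is a hypothetical sketch, untested here; the WALLTIME_SECS value must be kept in sync with the `#PBS -l walltime` line, and signal forwarding behavior depends on the MPI launcher.

```shell
#!/bin/bash
# Hypothetical sketch: deliver a polite SIGINT ~60s before walltime expires,
# so dask's cleanup handlers can run before PBS sends SIGTERM.
WALLTIME_SECS=1200            # must match #PBS -l walltime=00:20:00

mpirun --np 9 dask-mpi --nthreads 4 &
MPIRUN_PID=$!

# Background timer: SIGINT one minute before the scheduled end.
( sleep $((WALLTIME_SECS - 60)) && kill -INT "$MPIRUN_PID" 2>/dev/null ) &

wait "$MPIRUN_PID"
```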
Historically I've always made a custom modulefile to manage a miniconda installation I curate in $HOME on yellowstone. We're writing up documentation on this approach for GCPy, since it applies to most clusters. You just extend @mrocklin's steps by creating a modulefile that you save locally, e.g. $HOME/modulefiles/miniconda:
#%Module -*- tcl -*-
# 'Real' name of package, appears in help,display message
set PKG_NAME miniconda
# Path to package
set PKG_ROOT $env(HOME)/miniconda
######################################################################
proc ModulesHelp { } {
    global PKG_ROOT
    global PKG_NAME
    puts stdout "Build: $PKG_NAME"
    puts stdout "URL: http://conda.pydata.org"
}
module-whatis "$PKG_NAME: streamlined conda-based Python package/env manager"
#
# Standard install locations
#
prepend-path PATH $PKG_ROOT/bin
prepend-path MANPATH $PKG_ROOT/share/man
prepend-path LD_LIBRARY_PATH $PKG_ROOT/lib
or for the newer versions of lmod:
help([[
Curated miniconda Python installation.
]])
whatis("Keywords: Python, analysis")
whatis("URL: http://conda.pydata.org")
whatis("Description: Simplified python environment and package manager")
local home = os.getenv("HOME")
prepend_path("PATH", home .. "/path/to/miniconda/bin")
prepend_path("MANPATH", home .. "/path/to/miniconda/share/man")
prepend_path("LD_LIBRARY_PATH", home .. "/path/to/miniconda/lib")
Then the path containing this (and other) modulefiles has to be made known to lmod in whatever startup scripts you have:
export MODULEPATH=/home/<username>/modulefiles:$MODULEPATH
Then you can module load miniconda as part of any set of scripts which start/deploy your distributed environment.
WRT tunneling for jupyter notebooks and the dask dashboard, are there issues tunneling through the login nodes to compute nodes on cheyenne? Some systems are hit or miss: if direct ssh access to compute nodes is disabled, you may have to tunnel from the compute node back to the login node instead. I've never found clear documentation on exactly how to do this, and it would be useful if someone with more knowledge could pitch in.
core-per-process: it depends what you are trying to achieve. See the documentation about it which is all I know
This is going to look something like what NCAR describes as a Hybrid OpenMP/MPI job. We want one MPI process per node and a dask-configurable number of tasks per node. We will certainly want to have access to all processors on each node.
Do you know if there is someone else at CISL who would have a better idea of how to do this? I had a short call with @mrocklin this morning where we made some rapid progress. However, the remaining tasks of figuring out how to get the PBS scheduler to work probably need to be addressed by a Sys Admin.
Yes, to be clear we would like each MPI rank to be allocated a set of cores. I might want 10 ranks, each with 4 cores. I don't particularly care if they are on the same physical nodes or not.
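For that shape, a select line along these lines might work (a sketch, untested here; exact placement and pinning behavior are site-dependent, so the worker count and thread count are assumptions to adjust):

```shell
# Hypothetical: 10 MPI ranks, each allocated 4 cores.
# PBS may pack several ranks onto the same physical node.
#PBS -l select=10:ncpus=4:mpiprocs=1:ompthreads=4

mpirun --np 10 dask-mpi --nthreads 4
```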
Cleanup: as far as I know, this is not possible and PBS User's Guide does not mention anything in this regard. LSF does exactly as you said, but we found that no user took advantage of that feature. Maybe a workaround could be to start an "at" command to send a SIGINT a minute before the scheduled end time?
Most of my cleanup is to delete temporary files. Is there somewhere I can write temporary data that I know will also be cleaned up by the job scheduler?
Also, just checking, the compute nodes on Cheyenne don't have attached local storage, correct? Is there a fast/insecure place to write temporary data?
I think you just use either or both of ncpus and mpiprocs, and that should work. Give it a try and if you encounter issues send email to cislhelp. See also the provided examples, including the section on pinning, which is important if your workload seriously depends on the cache for performance.
The scratch part of the distributed file system seems a natural choice to me. This will be cleaned automatically according to the purge policy.
If you want higher performance you can use the local tmp which is ramdisk, but very limited in size. This will be cleaned automatically by some PBS hooks.
If you need more (fast) disk and you don't need much memory (each node has 64 GB or 128 GB) we can probably set up a larger ramdisk, but I have to double check on that, and only after you present convincing arguments for the previous two not being adequate.
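Pointing worker spill files at scratch can be done with the `--local-directory` flag that dask-mpi exposes (the flag is real; the scratch path below is an assumption about the site's layout, so substitute your own):

```shell
# Hypothetical: put worker spill/temp files on the purged scratch file system
# rather than $HOME, so the purge policy cleans them up eventually.
mpirun --np 9 dask-mpi --nthreads 4 --local-directory /glade/scratch/$USER/dask-workers
```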
Regarding Jupyter servers (I prefer JLab, but I imagine that others may prefer the Jupyter notebook server) I do the following:
From the login node I quickly connect to the cluster, set up JLab to run on the scheduler process, get the hostname, and print out the appropriate ssh command
from dask.distributed import Client
client = Client(scheduler_file='scheduler.json')

def start_jlab(dask_scheduler):
    import subprocess
    proc = subprocess.Popen(['jupyter', 'lab', '--ip', '*', '--no-browser'])
    dask_scheduler.jlab_proc = proc

client.run_on_scheduler(start_jlab)

import socket
host = client.run_on_scheduler(socket.gethostname)

print("ssh -N -L 8787:%s:8787 -L 8888:%s:8888 cheyenne.ucar.edu" % (host, host))
print("Navigate to http://localhost:8787 for the dask diagnostic dashboard")
print("Navigate to http://localhost:8888 for a JupyterLab server")
Previously I have set up a password for Jupyter in my home directory by running
jupyter notebook --generate-config
Then I edit the file ~/.jupyter/jupyter_notebook_config.py to include a password (search for "password" and the instructions will be in the right place).
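For reference, the value that goes into the config file is a salted hash in the form algorithm:salt:hexdigest. The canonical helper is notebook.auth.passwd; the sketch below just mimics its format using only the standard library (the function name and salt length here are illustrative assumptions):

```python
import hashlib
import random

def notebook_passwd(passphrase, algorithm="sha1"):
    # Hypothetical stand-in for notebook.auth.passwd: produce a salted
    # 'algorithm:salt:hexdigest' string to paste into
    # c.NotebookApp.password in ~/.jupyter/jupyter_notebook_config.py
    salt = "%012x" % random.getrandbits(48)  # 12 hex characters of salt
    h = hashlib.new(algorithm)
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return ":".join((algorithm, salt, h.hexdigest()))

print(notebook_passwd("choose-a-strong-passphrase"))
```

When the notebook package is available, prefer its own passwd() helper so the hash scheme always matches what the server expects.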
My desire for fast ephemeral storage is for writing excess data to disk when workers run out of memory. On commodity systems this is standard practice, but obviously a bit less natural on an HPC system. We want to strongly discourage users from depending on this, but in the course of interactive workloads they'll inevitably push up against memory boundaries. Having some disk lying around that we can spill to, even at an extreme performance penalty, is nicer than OOM-killing their jobs. (The dashboards will also go all red when they do this, so they get good feedback.)
Given what you've said above it sounds like the scratch drive is the best choice that we have. It'll be interesting to see if this generates excessive junk data on that drive.
Regarding planned shutdown that's certainly possible, and we have good mechanisms to do this already. The challenge to interactive workloads is, I think, that people will likely overshoot their walltime significantly and then cancel jobs when they're done.
I'm going to write up what I have in the wiki, and see if I can consolidate some of this into a script or something.
First draft: https://github.com/pangeo-data/pangeo-discussion/wiki/Getting-Started-with-Dask-on-Cheyenne
I get the sense that everyone is busy, feedback and trial testers welcome :)
@mrocklin I walked through your wiki steps with a beginner-Python colleague just now. Worked great except for a small tweak to the ssh tunneling command (need to pass username or else my YubiKey tokens wouldn't work). I'll make an edit on the wiki.
The dask-mpi executable is now in the released version of dask.distributed, with packages on conda-forge.
I've updated the wiki with the workflow from the YouTube video.
OK, I think that the first basic pass here is complete. There is still almost certainly work to do here but I think that we're hopefully at a point where we can set up some feedback cycles from basic users.
Thank you @darothen for starting this up. I would encourage others to walk through things as well and report where things break. @rabernat if you can look things over and see if we're at a point where we can start engaging others at your institution that might be useful.
@darothen also please let me know if there is anything I can do to engage you and your group more effectively on this. It would be great to collaborate further.
(Edit: add another point on resources allocation an memory)
Thanks @mrocklin for pointing this discussion to me. I definitely need to try this on our cluster. I hope I will find the time soon. Just two quick remarks:
- On your PBS resource allocation, I feel there is something wrong in your example:
#PBS -l select=2:ncpus=72:mpiprocs=6:ompthreads=6
With that, you're actually reserving 2 nodes with 72 cores each. It is visible in the qstat output, which shows 144 tasks. The PBS way is to select 2 nodes, with ncpus, mpiprocs or any other options applied to each of them.
You should also probably limit the memory you intend to use by adding :mem=24G at the end of the select line, e.g. #PBS -l select=2:ncpus=36:mpiprocs=6:ompthreads=6:mem=24G. This way you ensure that you'll have enough memory, and also limit your use to 24 GB per selected resource.
- On the dynamic allocation: this can be very useful, but I don't believe this sentence:
However we seem to be able to get much faster response from the job scheduler if we launch many single-machine jobs. This allows us to get larger allocations faster (often immediately).
We will indeed be able to have a smaller cluster faster, and then increase its size, but I don't believe we can get larger (final?) allocations faster. Maybe you observed this because of the problem with PBS resource allocation I mentioned above?
@subhasisb could you confirm both remarks if you have some time?
And finally, I also definitely need to promote the pangeo initiative internally. I see the LEGOS lab is already involved! Thanks all for your work.
(Edit for result with complete nodes)
Not sure if it is the right place to post...
I gave this solution a try this morning. It looks promising, but I ran into a problem: the Scheduler seems to be started on several of my MPI procs. Maybe it is because I am selecting small resources which may run on the same host, but in that case I would expect to get the error for the Workers, not for the Scheduler. Our cluster has nodes with 24 cores and 128 GB. I am currently trying with a selection of complete nodes (ncpus=24), but I need to wait for some room on our cluster.
(Edit:) I am actually getting the same problem with fully reserved nodes. With resources #PBS -l select=2:ncpus=24:mpiprocs=4:ompthreads=6:mem=110G, I get two nodes, but the scheduler is launched 4 times on each node, so one correct start and 3 failures on each node. Dask distributed version is 1.18.3.
Here is my PBS script:
#!/bin/bash
#PBS -N sample_dask
#PBS -l select=4:ncpus=6:mpiprocs=1:ompthreads=6:mem=24G
#PBS -l walltime=01:00:00
# Qsub template for CNES HAL
# Scheduler: PBS
export PATH=/home/eh/eynardbg/miniconda3/bin:$PATH
source activate pangeo
module load openmpi/2.0.1
rm -f scheduler.json
mpirun --np 4 dask-mpi --nthreads 6 --memory-limit 24e9 --interface ib0
Here is the error stack.
distributed.scheduler - INFO - Scheduler at: tcp://10.135.36.24:8786
distributed.scheduler - INFO - bokeh at: 10.135.36.24:8787
distributed.scheduler - INFO - Scheduler at: tcp://10.135.36.23:8786
distributed.scheduler - INFO - bokeh at: 10.135.36.23:8787
Traceback (most recent call last):
File "/home/eh/eynardbg/miniconda3/envs/pangeo/bin/dask-mpi", line 6, in <module>
sys.exit(distributed.cli.dask_mpi.go())
File "/home/eh/eynardbg/miniconda3/envs/pangeo/lib/python3.6/site-packages/distributed/cli/dask_mpi.py", line 85, in go
main()
File "/home/eh/eynardbg/miniconda3/envs/pangeo/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/home/eh/eynardbg/miniconda3/envs/pangeo/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/eh/eynardbg/miniconda3/envs/pangeo/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/eh/eynardbg/miniconda3/envs/pangeo/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/home/eh/eynardbg/miniconda3/envs/pangeo/lib/python3.6/site-packages/distributed/cli/dask_mpi.py", line 48, in main
scheduler.start(addr)
File "/home/eh/eynardbg/miniconda3/envs/pangeo/lib/python3.6/site-packages/distributed/scheduler.py", line 459, in start
self.listen(addr_or_port, listen_args=self.listen_args)
File "/home/eh/eynardbg/miniconda3/envs/pangeo/lib/python3.6/site-packages/distributed/core.py", line 216, in listen
self.listener.start()
File "/home/eh/eynardbg/miniconda3/envs/pangeo/lib/python3.6/site-packages/distributed/comm/tcp.py", line 360, in start
backlog=backlog)
File "/home/eh/eynardbg/miniconda3/envs/pangeo/lib/python3.6/site-packages/tornado/netutil.py", line 199, in bind_sockets
sock.listen(backlog)
OSError: [Errno 98] Address already in use
You are probably calling the dask-mpi program four times somehow. The dask-mpi program in pseudocode looks like the following:
from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    start_scheduler()
else:
    start_worker()
This seems simple enough that I don't expect there to be a problem on the dask side. I suspect that the problem is in how you're calling the MPI program.
We will indeed be able to have a smaller cluster faster, and then increase its size, but I don't believe we can get larger (final?) allocations faster. Maybe you observed this because of the problem on PBS resource allocation I mentionned above?
I don't know enough about the scheduling policies of the job scheduler to comment intelligently here. I'm just reporting my experience.
Thanks @mrocklin for the pseudocode. I was able to test a simple python script, and indeed the problem comes from the MPI program. It didn't work with openmpi (the rank variable was always 0), but it worked with the intel version. After discussing with my coworkers, it seems mpi4py is tightly linked to an MPI implementation when it installs. So it's kind of a tricky issue here: be careful which MPI implementation is available in your environment when issuing:
conda create -n pangeo -c conda-forge \
python=3.6 dask distributed xarray jupyterlab mpi4py
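A quick way to check which behavior you get is to run a rank check under your cluster's mpirun (a sketch; it assumes mpirun and the pangeo environment's python are both on PATH):

```shell
# Each rank should print a distinct number (0 and 1 here).
# If every line prints 0, mpi4py was built against a different MPI
# implementation than the mpirun you are using.
mpirun -np 2 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.Get_rank())"
```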
Now I am able to start dask:
In [1]: from dask.distributed import Client
...: client = Client(scheduler_file='scheduler.json')
...: client
...:
Out[1]: <Client: scheduler='tcp://10.135.36.37:8786' processes=7 cores=42>
I have no time to go further yet, but it's already a really good result! Thanks again for the work.
One question: is it possible to choose in which folder we want to write the scheduler.json file using dask-mpi? And perhaps it could be written in the current submission folder by default?
Yes, see the helpstring for dask-mpi:
mrocklin@carbon:~$ dask-mpi --help
Usage: dask-mpi [OPTIONS]
Options:
--scheduler-file TEXT Filename to JSON encoded scheduler
information.
--interface TEXT Network interface like 'eth0' or 'ib0'
--nthreads INTEGER Number of threads per worker.
--memory-limit TEXT Number of bytes before spilling data to disk.
This can be an integer (nbytes) float
(fraction of total memory) or 'auto'
--local-directory TEXT Directory to place worker files
--scheduler / --no-scheduler Whether or not to include a scheduler. Use
--no-scheduler to increase an existing dask
cluster
--help Show this message and exit.
You want the --scheduler-file keyword. It defaults to scheduler.json. Should it default to something else?
Hrm, can you think of any way to make mpi4py work more generically?
For your first question, it may again be that I did not use mpirun correctly; I am not an MPI expert (more used to Hadoop and Spark). I was in a specific folder when I issued the qsub command, so I was expecting the scheduler.json file to be written to that folder (which should be the case if I understand correctly what you are saying). But it was written in my $HOME dir. I need to check my PBS script or the way mpirun is working; I may have to do a change-dir command in the PBS script or give some option to mpirun.
For the second point, I was just repeating what one of my colleagues observed when installing python and mpi4py on our cluster. It seems that during the module installation, the openmpi or intel library (whichever is available at the time) is statically linked into the mpi4py installation, with no way to change it afterwards. It appears some path to the library is written once and for all in some file. So if this mechanism is confirmed, I believe it should be changed.
But a warning may be enough; I did not check the mpi4py page or source code to verify this, so take it carefully, even if I completely trust my colleague.
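On the working-directory point: PBS starts batch jobs in $HOME by default, so one common fix (a sketch; the mpirun line is copied from the script above) is to change to the submission directory at the top of the job script, which should leave scheduler.json wherever qsub was run:

```shell
# PBS exports the directory qsub was run from as PBS_O_WORKDIR;
# cd there before launching so scheduler.json lands in it.
cd $PBS_O_WORKDIR
mpirun --np 4 dask-mpi --nthreads 6 --memory-limit 24e9 --interface ib0
```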
@jhamman any thoughts on why this might suddenly start failing?
import netCDF4 as nc4
filename = '/glade/u/home/jhamman/workdir/GARD_inputs/newman_ensemble/conus_ens_001.nc'
nc4.Dataset(filename)
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-7-d6fa56ea26ea> in <module>()
1 filename = '/glade/u/home/jhamman/workdir/GARD_inputs/newman_ensemble/conus_ens_001.nc'
----> 2 nc4.Dataset(filename)
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__()
netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()
OSError: NetCDF: Unknown file format
@mrocklin I was actually just playing with this before lunch... it looks like conus_ens_001.nc and conus_ens_004.nc in that folder are empty files, and the NetCDF reader doesn't handle them gracefully.
Oddly it did handle it well yesterday. Perhaps these files have been removed or recently added?
Looks like they were recently re-written (yesterday afternoon).
No rush, was just curious
@mrocklin - they're back and improved. I'm adding the other 95 ensemble members too. Note that I added the lowest level of zlib compression (level 1) to these files. Let me know if that causes any problems.
@mrocklin and @darothen - The sample dataset has been revived and is now 100 ensemble members in size.
I think we can close this issue. I've successfully run xarray / dask.distributed / jupyter notebook on cheyenne and on two other PBS systems. I also roped a few students from the University of Washington into walking through the wiki and setting up the system on their local clusters - without my help, they were able to do it successfully!
I'm glad to hear this. I suspect that we'll need to iterate in the future. Please speak up if you or anyone around you notices any opportunities for improvement.
@rabernat @jhamman @mrocklin - thanks so much for this. We just used these notes to get set up on UC Berkeley's Savio SLURM cluster, connecting to compute nodes through their jupyterhub login nodes. Haven't figured out yet how to enable the dashboard, as they block SSH port forwarding, but we're working with the HPC's IT staff to find a solution. We can let you know how that goes and contribute notes/instructions on translating this to a SLURM environment if you're interested.
I'm very glad to hear it @delgadom . You might consider trying nbserverproxy
pip install git+https://github.com/jupyterhub/nbserverproxy
jupyter serverextension enable --py nbserverproxy --sys-prefix
Then you should be able to navigate to /proxy/8787/status or something similar.
See https://github.com/jupyterhub/nbserverproxy/ cc @yuvipanda
You may also want distributed master (the current development version).
Fantastic @delgadom!
At some point, we want to try to collect a list of all the local clusters where this has been deployed, along with any site-specific tweaks that are necessary. I'll follow up with you once we figure out how to organize that within the documentation.
It would be nice for such a list to point to other active documentation on how to deploy these systems within various clusters. I suspect that having such a list of active deployments would provide examples for other groups to start themselves.