
Welcome to sfapi_client

sfapi_client is a Python 3 client for NERSC's Superfacility API.


Install sfapi_client using pip:

$ pip install sfapi_client

Let's get started by checking the status of Perlmutter:

>>> from sfapi_client import Client
>>> from sfapi_client.compute import Machine
>>> with Client() as client:
...     status = client.compute(Machine.perlmutter)
...
>>> status
Compute(name='perlmutter', full_name='Perlmutter', description='System Degraded', system_type='compute', notes=['2023-04-26 18:16 -- 2023-04-28 09:30 PDT, System Degraded, Rolling reboots are complete, a final reboot is scheduled for 0930 PDT'], status=<StatusValue.degraded: 'degraded'>, updated_at=datetime.datetime(2023, 4, 26, 18, 16, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=61200))), client=<sfapi_client._sync.client.Client object at 0x102c871c0>)

Features

  • Both an async interface and a standard synchronous interface.
  • Fully type annotated.

Documentation

For the basics, head over to the QuickStart. We also have Jupyter Notebook examples.

More in-depth developer documentation can be found in the API reference.

Dependencies

The sfapi_client project relies on these libraries:

  • httpx - HTTP support.
  • authlib - OAuth 2.0 authentication.
  • pydantic - Data models.
  • tenacity - Retry.
  • datamodel-code-generator - Generating data models from the Open API specification.
  • unasync - Generating the synchronous interface from the async implementation.


sfapi_client's People

Contributors

cjh1, tylern4


sfapi_client's Issues

Tenacity retries on 403 forbidden

It might be better to exit if we get a 403 Forbidden instead of constantly retrying. I ran into the client just waiting and retrying when I had loaded the wrong key, one that was registered for a different IP address.
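A minimal sketch of the fix, assuming tenacity's `retry_if_exception` predicate mechanism (the `ApiError` class and `should_retry` helper below are illustrative stand-ins, not the client's actual internals):

```python
class ApiError(Exception):
    """Stand-in for an HTTP error carrying a status code (illustrative only)."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def should_retry(exc: BaseException) -> bool:
    # A 4xx client error (e.g. 403 from a key bound to a different IP) will
    # never succeed on retry, so fail fast; retry only transient server or
    # network failures.
    if isinstance(exc, ApiError):
        return exc.status_code >= 500
    return True
```

Passed to tenacity as `retry=retry_if_exception(should_retry)`, a predicate like this would make the client raise immediately on a 403 instead of looping.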

Importing sqlmodel after importing sfapi_client clobbers... something

Try the reproducer here:
https://github.com/swelborn/sfapi_client_bug_reproducer

Traceback (most recent call last):
  File "/Users/swelborn/Documents/gits/sfapi_client_bug_reproducer/notworking.py", line 61, in <module>
    asyncio.run(run())
  File "/Users/swelborn/miniconda3/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/Users/swelborn/miniconda3/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/Users/swelborn/Documents/gits/sfapi_client_bug_reproducer/notworking.py", line 57, in run
    job = await pm.submit_job(job_script)
  File "/Users/swelborn/Library/Caches/pypoetry/virtualenvs/app-9xoyDW7z-py3.10/lib/python3.10/site-packages/sfapi_client/_async/compute.py", line 117, in submit_job
    job = AsyncJobSqueue(jobid=jobid, compute=self)
  File "/Users/swelborn/Library/Caches/pypoetry/virtualenvs/app-9xoyDW7z-py3.10/lib/python3.10/site-packages/pydantic/main.py", line 171, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
  File "/Users/swelborn/Library/Caches/pypoetry/virtualenvs/app-9xoyDW7z-py3.10/lib/python3.10/site-packages/pydantic/_internal/_mock_val_ser.py", line 47, in __getattr__
    raise PydanticUserError(self._error_message, code=self._code)
pydantic.errors.PydanticUserError: `AsyncJobSqueue` is not fully defined; you should define `AsyncCompute`, then call `AsyncJobSqueue.model_rebuild()`.

For further information visit https://errors.pydantic.dev/2.6/u/class-not-fully-defined

You will need to set up a .env file in the repo.
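The error message itself points at the fix. Here is a self-contained sketch of the pattern (simplified stand-ins for the real classes) showing where `model_rebuild()` goes once all referenced models are defined:

```python
from pydantic import BaseModel


class AsyncJobSqueue(BaseModel):
    jobid: str
    compute: "AsyncCompute"  # forward reference, unresolved at class creation


class AsyncCompute(BaseModel):
    name: str


# The forward reference must be resolved before the model is used; calling
# model_rebuild() after AsyncCompute exists is exactly what the
# PydanticUserError message asks for.
AsyncJobSqueue.model_rebuild()

job = AsyncJobSqueue(jobid="123", compute=AsyncCompute(name="perlmutter"))
```

In the reported bug the import of sqlmodel presumably interferes with this resolution step inside sfapi_client; the sketch only illustrates the rebuild mechanism, not the root cause.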

Outage tests failing

Tests are currently failing since there is no planned outage for perlmutter.

FAILED tests/test_resources_async.py::test_planned_outages_by_resource - assert 0 > 0
FAILED tests/test_resources_async.py::test_planned_outages - AssertionError: assert 'perlmutter' in {'helpportal': [Outage(name='helpportal', start_at=datetime.datetime(2023, 5, 17, 9, 0, tzinfo=datetime.timezone(datet... it is upgraded. Users may not be able to log into Iris at this time.', status='Planned', swo='...

sfapi: No architecture specified error

I'm reaching out from the ALS computing group. Was told to reach out here for SFAPI issues in a previous NERSC ticket.

I am working on submitting jobs from Prefect, a workflow orchestration tool. I previously created a child class from the SFAPI client class, for our own computing needs: https://github.com/als-computing/splash_flows_globus/blob/main/orchestration/nersc.py

For my integration of Prefect with sfapi, I am currently using the test script provided in one of the training sessions, modified for my project. A function creates a client object from my version of the NerscClient shown in the file above and then submits the job:

def launch_nersc_jobs_tomography():
    logger = get_run_logger()

    client = create_nersc_client()
    user = client.user()

    logger.info("Client created")

    home_path = f"/global/homes/{user.name[0]}/{user.name}"
    scratch_path = f"/pscratch/sd/{user.name[0]}/{user.name}"

    client.perlmutter.run(f"mkdir -p {scratch_path}/prefect-recon-test")
    #job_script = get_job_script(scratch_path)
    N = 5
    job_script = f"""#!/bin/bash
    #SBATCH -q debug
    #SBATCH -A als
    #SBATCH -N 1
    #SBATCH -C cpu
    #SBATCH -t 00:10:00
    #SBATCH -J sfapi-demo
    #SBATCH --exclusive
    #SBATCH --output={scratch_path}/nerscClient-test/sfapi-demo-%j.out
    #SBATCH --error={scratch_path}/nerscClient-test/sfapi-demo-%j.error
    module load python
    # Prints N random numbers to form a normal distribution
    python -c "import numpy as np; numbers = np.random.normal(size={N}); [print(n) for n in numbers]"
    """ 
    job = client.perlmutter.submit_job(job_script)
    job.complete()
    logger.info(f"Job {job.id} completed")

The client object is created, the new directory is created, but I run into an error with the submit_job. This is the full error message:

Traceback (most recent call last):
  File "home/splash_flows_globus/orchestration/_tests/test_832_prefect_nersc_jobs.py", line 8, in <module>
    test_launch_nersc_jobs_tomography()
  File "home/splash_flows_globus/orchestration/_tests/test_832_prefect_nersc_jobs.py", line 4, in test_launch_nersc_jobs_tomography
    launch_nersc_jobs_tomography()
  File "home/splash_flows_globus/env/lib/python3.9/site-packages/prefect/flows.py", line 1231, in __call__
    return enter_flow_run_engine_from_flow_call(
  File "home/splash_flows_globus/env/lib/python3.9/site-packages/prefect/engine.py", line 293, in enter_flow_run_engine_from_flow_call
    retval = from_sync.wait_for_call_in_loop_thread(
  File "home/splash_flows_globus/env/lib/python3.9/site-packages/prefect/_internal/concurrency/api.py", line 218, in wait_for_call_in_loop_thread
    return call.result()
  File "home/splash_flows_globus/env/lib/python3.9/site-packages/prefect/_internal/concurrency/calls.py", line 318, in result
    return self.future.result(timeout=timeout)
  File "home/splash_flows_globus/env/lib/python3.9/site-packages/prefect/_internal/concurrency/calls.py", line 179, in result
    return self.__get_result()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 390, in __get_result
    raise self._exception
  File "home/splash_flows_globus/env/lib/python3.9/site-packages/prefect/_internal/concurrency/calls.py", line 389, in _run_async
    result = await coro
  File "home/splash_flows_globus/env/lib/python3.9/site-packages/prefect/client/utilities.py", line 100, in with_injected_client
    return await fn(*args, **kwargs)
  File "home/splash_flows_globus/env/lib/python3.9/site-packages/prefect/engine.py", line 396, in create_then_begin_flow_run
    return await state.result(fetch=True)
  File "home/splash_flows_globus/env/lib/python3.9/site-packages/prefect/states.py", line 91, in _get_state_result
    raise await get_state_exception(state)
  File "home/splash_flows_globus/env/lib/python3.9/site-packages/prefect/engine.py", line 877, in orchestrate_flow_run
    result = await flow_call.aresult()
  File "home/splash_flows_globus/env/lib/python3.9/site-packages/prefect/_internal/concurrency/calls.py", line 327, in aresult
    return await asyncio.wrap_future(self.future)
  File "home/splash_flows_globus/env/lib/python3.9/site-packages/prefect/_internal/concurrency/calls.py", line 352, in _run_sync
    result = self.fn(*self.args, **self.kwargs)
  File "home/splash_flows_globus/orchestration/flows/bl832/move.py", line 280, in launch_nersc_jobs_tomography
    job = client.perlmutter.submit_job(job_script)
  File "home/splash_flows_globus/env/lib/python3.9/site-packages/sfapi_client/_sync/compute.py", line 34, in wrapper
    return method(self, *args, **kwargs)
  File "home/splash_flows_globus/env/lib/python3.9/site-packages/sfapi_client/_sync/compute.py", line 111, in submit_job
    raise SfApiError(result["error"])
sfapi_client.exceptions.SfApiError: sbatch: error: No architecture specified, cannot estimate job costs.
sbatch: error: Batch job submission failed: Unspecified error

Is it a script issue? The script works in a notebook, so I haven't had issues with it before, but I've never seen this error. Could you direct me to what might have caused it?
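One possible cause (an assumption, not confirmed in this thread): sbatch only recognizes `#SBATCH` directives that start in column 1, and every line of the triple-quoted script above is indented, so the directives, including `-C cpu`, may be treated as ordinary comments, which would produce exactly "No architecture specified". A sketch of the fix using `textwrap.dedent` (the path is a placeholder):

```python
import textwrap

N = 5
scratch_path = "/pscratch/sd/u/username"  # hypothetical path for illustration

# dedent() strips the common leading whitespace so each "#SBATCH" line
# starts in column 1, where sbatch actually looks for directives.
job_script = textwrap.dedent(f"""\
    #!/bin/bash
    #SBATCH -q debug
    #SBATCH -A als
    #SBATCH -N 1
    #SBATCH -C cpu
    #SBATCH -t 00:10:00
    #SBATCH -J sfapi-demo
    #SBATCH --exclusive
    #SBATCH --output={scratch_path}/nerscClient-test/sfapi-demo-%j.out
    module load python
    python -c "import numpy as np; print(np.random.normal(size={N}))"
    """)

# Every line now begins at column 1.
assert all(not line.startswith(" ") for line in job_script.splitlines())
```

Alternatively, left-aligning the script literal inside the function body (at the cost of breaking the visual indentation) has the same effect.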

Optimize job monitoring for large numbers of jobs

From @tylern4:

For getting multiple jobs from one squeue/sacct call you can run squeue -j jobid1,jobid2,... or sacct -j jobid1,jobid2,..., so I think batched gets should be doable. It would probably be best to have it as an option on the REST API side though; currently, giving it multiple job ids just returns the first one in the list.
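As a sketch of how the batched call could be assembled (`batch_status_command` is a hypothetical helper, not part of the client):

```python
def batch_status_command(jobids, tool="squeue"):
    """Build a single squeue/sacct invocation for several jobs.

    Both squeue -j and sacct -j accept a comma-separated list of job ids,
    so querying N jobs costs one scheduler call instead of N.
    """
    return [tool, "-j", ",".join(str(j) for j in jobids)]
```

For example, `batch_status_command([101, 102, 103])` yields `["squeue", "-j", "101,102,103"]`, ready to pass to a subprocess or remote-command runner.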
