aind-data-asset-indexer's People

Contributors

github-actions[bot], jtyoung84, mekhlakapoor, yosefmaru

aind-data-asset-indexer's Issues

Dockerize and Publish docker image

User story

As an engineer, I want to publish a docker image so that I can run the code in the cloud.

Acceptance criteria

  • Given a branch is merged into main, then a GitHub Action will build and publish a Docker image.
  • Figure out where to publish the Docker image.

Sprint Ready Checklist

  • 1. Acceptance criteria defined
  • 2. Team understands acceptance criteria
  • 3. Team has defined solution / steps to satisfy acceptance criteria
  • 4. Acceptance criteria is verifiable / testable
  • 5. External / 3rd Party dependencies identified
  • 6. Ticket is prioritized and sized

Ensure the full aind-data-access-api is added on the next upgrade

User story

As a user, I want to ensure that the full aind-data-access-api is added on the next upgrade, so that I can access the latest changes.

Acceptance criteria

  • Change pyproject.toml to use the full aind-data-access-api

Sprint Ready Checklist

  • 1. Acceptance criteria defined
  • 2. Team understands acceptance criteria
  • 3. Team has defined solution / steps to satisfy acceptance criteria
  • 4. Acceptance criteria is verifiable / testable
  • 5. External / 3rd Party dependencies identified
  • 6. Ticket is prioritized and sized

Latest API no longer returns certain keys in responses

Describe the bug
In the past, "tags" used to always be in the response returned from Code Ocean.

To Reproduce
Steps to reproduce the behavior:

  1. Run response = co_client.search_all_data_assets() and then results = response.json()["results"]
  2. Notice that some of the latest results don't have a "tags" field.

Expected behavior
In the past, the "tags" field was always present. We'll have to update the code to handle this new behavior.
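
One way to tolerate the missing field is to read it with a default instead of indexing it directly. The following is a minimal sketch, not the project's actual fix: the helper name get_tags and the sample records are hypothetical, and only the "tags" key and the shape of response.json()["results"] come from the report above.

```python
from typing import Any, Dict, List


def get_tags(result: Dict[str, Any]) -> List[str]:
    """Return the asset's tags, or an empty list if the key is absent or null."""
    return result.get("tags") or []


# Hypothetical sample shaped like response.json()["results"]
results = [
    {"id": "a1", "name": "asset_with_tags", "tags": ["raw", "ecephys"]},
    {"id": "b2", "name": "asset_without_tags"},  # newer result, no "tags" key
]

for result in results:
    print(result["name"], get_tags(result))
```

Returning an empty list keeps downstream filtering code unchanged, since membership tests against an empty list simply evaluate to False.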

Use updated "derived" tags to filter records

User story

As a user, I want to see processed data in the docdb. The tags are being changed from "processed" to "derived" so we need to update that here.

Acceptance criteria

  • We should keep "processed" for legacy purposes.
  • We also need to add aind_data_schema.data_description.DataLevel.derived.
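
The two criteria above could be met by filtering on a set of accepted tags. This is a rough sketch under stated assumptions: plain strings stand in for the enum (it assumes DataLevel.derived has the string value "derived"), and the function name and sample records are hypothetical.

```python
from typing import Any, Dict, Iterable, List

# Assumption: plain strings stand in for the aind_data_schema enum values
LEGACY_PROCESSED_TAG = "processed"  # kept for legacy records
DERIVED_TAG = "derived"  # assumed value of DataLevel.derived
ACCEPTED_TAGS = frozenset({LEGACY_PROCESSED_TAG, DERIVED_TAG})


def filter_derived_records(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep records carrying either the legacy or the new tag."""
    return [
        record
        for record in records
        if ACCEPTED_TAGS.intersection(record.get("tags") or [])
    ]


# Hypothetical sample records
sample = [
    {"name": "old_asset", "tags": ["processed"]},
    {"name": "new_asset", "tags": ["derived", "smartspim"]},
    {"name": "raw_asset", "tags": ["raw"]},
    {"name": "untagged_asset"},
]
kept = filter_derived_records(sample)
print([record["name"] for record in kept])
```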

Sprint Ready Checklist

  • 1. Acceptance criteria defined
  • 2. Team understands acceptance criteria
  • 3. Team has defined solution / steps to satisfy acceptance criteria
  • 4. Acceptance criteria is verifiable / testable
  • 5. External / 3rd Party dependencies identified
  • 6. Ticket is prioritized and sized

Nonexistent assets appearing in docdb

Describe the bug
The SmartSPIM report shows the following records:
[image: screenshot of the records, not captured]

These were for runs that have since been deleted from S3, and as far as I can tell they also do not exist in Code Ocean.

To Reproduce
Steps to reproduce the behavior:

  1. Go to the SmartSPIM dashboard
  2. Filter to subject_id 695464.
  3. See all the zombie assets.

Expected behavior
Only real data should appear in docdb :).

Add a check in S3 crawler script to catch empty pandas dataframes

User story

As a software engineer, I want to add a check to the S3 crawler script, so I can catch empty dataframes before merging them.

Acceptance criteria

  • Check for empty dataframes before merging
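
A check of this kind might look like the sketch below. It assumes that "merging" means combining per-bucket dataframes with pd.concat, which the issue does not state, and the function name concat_non_empty is hypothetical.

```python
from typing import Iterable

import pandas as pd


def concat_non_empty(frames: Iterable[pd.DataFrame]) -> pd.DataFrame:
    """Concatenate only the dataframes that contain rows."""
    non_empty = [df for df in frames if not df.empty]
    if not non_empty:
        # Nothing to combine; return an empty frame instead of raising
        return pd.DataFrame()
    return pd.concat(non_empty, ignore_index=True)


# Hypothetical crawl results: one bucket returned rows, one returned nothing
bucket_a = pd.DataFrame({"name": ["asset_1", "asset_2"]})
bucket_b = pd.DataFrame()
combined = concat_non_empty([bucket_a, bucket_b])
print(len(combined))
```

Filtering with DataFrame.empty before combining also guards against pd.concat raising a ValueError when it is handed an empty list.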

Sprint Ready Checklist

  • 1. Acceptance criteria defined
  • 2. Team understands acceptance criteria
  • 3. Team has defined solution / steps to satisfy acceptance criteria
  • 4. Acceptance criteria is verifiable / testable
  • 5. External / 3rd Party dependencies identified
  • 6. Ticket is prioritized and sized

Upgrade dependencies

User story

As a user, I want to use the latest versions of the dependencies, so that I can access the latest changes.

Acceptance criteria

  • Update the Dockerfile
  • Change pyproject.toml to use the latest dependencies

Sprint Ready Checklist

  • 1. Acceptance criteria defined
  • 2. Team understands acceptance criteria
  • 3. Team has defined solution / steps to satisfy acceptance criteria
  • 4. Acceptance criteria is verifiable / testable
  • 5. External / 3rd Party dependencies identified
  • 6. Ticket is prioritized and sized

Notes

aind-data-access-api needs the full option.

Add script to crawl through S3

User story

As an engineer, I want a script to do sanity checks on DocDB.

Acceptance criteria

  • Given the script is run, then all data asset records in S3 are downloaded
  • Given the downloaded data asset records, check whether a metadata file exists for each and create a table
  • Given the table, save it in Redshift
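
The middle step above (checking for a metadata file and building a table) can be sketched without any AWS calls, given a list of object keys. Everything named here is an assumption for illustration: the issue does not name the metadata file, so METADATA_FILE is a placeholder, and the layout of one top-level prefix per data asset is assumed.

```python
from typing import Dict, Iterable, List

# Hypothetical file name; the issue does not say what the metadata file is called
METADATA_FILE = "metadata.json"


def build_metadata_table(keys: Iterable[str]) -> List[Dict[str, object]]:
    """Return one row per top-level prefix, noting whether the metadata file exists."""
    has_metadata: Dict[str, bool] = {}
    for key in keys:
        prefix, _, rest = key.partition("/")
        if not rest:
            continue  # skip objects that are not under an asset prefix
        found = rest == METADATA_FILE
        has_metadata[prefix] = has_metadata.get(prefix, False) or found
    return [
        {"prefix": prefix, "has_metadata": flag}
        for prefix, flag in sorted(has_metadata.items())
    ]


# Hypothetical object keys as they might come back from a bucket listing
keys = [
    "asset_a/metadata.json",
    "asset_a/data/file.bin",
    "asset_b/data/file.bin",
    "loose.txt",
]
table = build_metadata_table(keys)
for row in table:
    print(row)
```

The resulting list of rows could then be loaded into a dataframe or written to Redshift by whatever client the project uses.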

Sprint Ready Checklist

  • 1. Acceptance criteria defined
  • 2. Team understands acceptance criteria
  • 3. Team has defined solution / steps to satisfy acceptance criteria
  • 4. Acceptance criteria is verifiable / testable
  • 5. External / 3rd Party dependencies identified
  • 6. Ticket is prioritized and sized

Notes

Figure out how we want to log discrepancies.

Update DocDB

User story

As an engineer, I want DocDB to be updated to match S3.

Acceptance criteria

  • Given the table in Redshift with record info, compare it with the records in DocDB.
  • Given discrepancies, update DocDB and log.
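
The comparison step can be sketched as a set difference in both directions. This is an illustration only: it assumes records can be matched by a unique name, and the function name, dictionary keys, and sample names are hypothetical rather than taken from the project.

```python
import logging
from typing import Dict, Set

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def find_discrepancies(s3_names: Set[str], docdb_names: Set[str]) -> Dict[str, Set[str]]:
    """Compare record names found in S3 with record names found in DocDB."""
    discrepancies = {
        # Present in S3 but not yet indexed
        "missing_from_docdb": s3_names - docdb_names,
        # Indexed but no longer present in S3
        "stale_in_docdb": docdb_names - s3_names,
    }
    for kind, names in discrepancies.items():
        for name in sorted(names):
            logger.info("%s: %s", kind, name)
    return discrepancies


# Hypothetical record names
s3_names = {"asset_1", "asset_2", "asset_3"}
docdb_names = {"asset_2", "asset_3", "asset_4"}
result = find_discrepancies(s3_names, docdb_names)
```

Records in "missing_from_docdb" would be inserted and records in "stale_in_docdb" removed, with each change logged.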

Sprint Ready Checklist

  • 1. Acceptance criteria defined
  • 2. Team understands acceptance criteria
  • 3. Team has defined solution / steps to satisfy acceptance criteria
  • 4. Acceptance criteria is verifiable / testable
  • 5. External / 3rd Party dependencies identified
  • 6. Ticket is prioritized and sized

Add check for processed data in non-Code Ocean results bucket.

User story

As a user, I want to index processed data assets stored in AWS buckets other than the Code Ocean results bucket, so I can analyze data from there.

Acceptance criteria

  • When a run is executed, data assets that are tagged "processed" and stored in S3 buckets will be added to the document store.

Sprint Ready Checklist

  • 1. Acceptance criteria defined
  • 2. Team understands acceptance criteria
  • 3. Team has defined solution / steps to satisfy acceptance criteria
  • 4. Acceptance criteria is verifiable / testable
  • 5. External / 3rd Party dependencies identified
  • 6. Ticket is prioritized and sized

Add code to index data assets

User story

As a user, I want to update the document store with data asset metadata, so I can search them more easily.

Acceptance criteria

  • When the capsule is triggered, the document store will be populated.

Sprint Ready Checklist

  • 1. Acceptance criteria defined
  • 2. Team understands acceptance criteria
  • 3. Team has defined solution / steps to satisfy acceptance criteria
  • 4. Acceptance criteria is verifiable / testable
  • 5. External / 3rd Party dependencies identified
  • 6. Ticket is prioritized and sized

Notes

Maybe this will be moot with future releases of Code Ocean?

Use ec2 instance role to create boto3 client

User story

As a user, I want to optionally pass the AWS credentials of an EC2 instance's assumed role, so I can run the capsule via pipelines.

Acceptance criteria

  • If the default AWS credentials attached to a capsule are not found, then it will attempt to use the EC2 instance's credentials.

Sprint Ready Checklist

  • 1. Acceptance criteria defined
  • 2. Team understands acceptance criteria
  • 3. Team has defined solution / steps to satisfy acceptance criteria
  • 4. Acceptance criteria is verifiable / testable
  • 5. External / 3rd Party dependencies identified
  • 6. Ticket is prioritized and sized

Add testing and linting on pull request

User story

As a developer, I want to have checks on tests and code formats, so I can maintain the code base more easily.

Acceptance criteria

  • When a Pull Request is opened, tests and linters will run automatically.

Sprint Ready Checklist

  • 1. Acceptance criteria defined
  • 2. Team understands acceptance criteria
  • 3. Team has defined solution / steps to satisfy acceptance criteria
  • 4. Acceptance criteria is verifiable / testable
  • 5. External / 3rd Party dependencies identified
  • 6. Ticket is prioritized and sized
