containeranalysis's People

Contributors: mtarsel, ryescholin

containeranalysis's Issues

Improve error codes

There are sys.exit() calls scattered through the code; sys.exit() can take an argument that is printed as an error message. It would be useful to know what failed at each exit point, and to make the accompanying logging use consistent language.
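As a minimal sketch of the idea (the `fail` helper and its messages are hypothetical, not code from this repo), each exit point could go through one function so the log line and the exit message always match:

```python
import logging
import sys


def fail(message):
    """Log the failure and exit with the same descriptive message.

    sys.exit() with a string prints it to stderr and sets a non-zero
    exit status, so the caller's shell also sees what went wrong.
    """
    logging.error(message)
    sys.exit("ERROR: " + message)
```

A bare `sys.exit(1)` would then become, e.g., `fail("could not reach dockerhub")`.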

Add Performance Tests

Output useful information about how the program ran that is less verbose and more concise than container-output.log. metrics.log will include:

  • time taken to run
  • # of applications checked total
  • # of apps w/ unparsed images

As I think of other metrics to add, I'll comment them below
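A minimal sketch of what writing metrics.log could look like (the function and field names are assumptions, not existing code):

```python
import time


def write_metrics(path, started_at, total_apps, unparsed_apps):
    """Write a short, human-readable run summary to metrics.log."""
    elapsed = time.time() - started_at
    lines = [
        "time taken to run: %.1fs" % elapsed,
        "applications checked: %d" % total_apps,
        "apps with unparsed images: %d" % unparsed_apps,
    ]
    with open(path, "w") as fh:
        fh.write("\n".join(lines) + "\n")
```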

Move to python3

Using a venv, ensure this project executes properly using python3. Modules will likely need to be updated and a new requirements.txt will be created.

Remove unneeded dockerhub requests

High priority because, if implemented properly, it has the potential to reduce the runtime by >25%; it would also make #31 quite a bit easier because there would be less info to save between runs.

Currently there are requests made to dockerhub in 4 functions while repo_crawling: get_image_tag_count, get_image_tag_names, get_image_tag, and sometimes get_archs.

get_image_tag_count and get_image_tag_names each make the exact same request:

image_url = ('https://' + regis + '/v2/repositories/' + self.org + '/'+
			self.name +'/tags/?page=1&page_size=100')

get_alot_image_tag_names uses a very similar request which can be re-configured using the 'next' field in the return from the get_image_tag_count request.

get_image_tag and get_archs also both make the exact same request:

tag_url = ('https://' + regis + '/v2/repositories/' + self.org + '/'+
			self.name +'/tags/'+ image_tag_name + '/')

Just to test, I saved the data from get_image_tag_count and passed it into get_image_tag_names, removing the request from get_image_tag_names. That brought the runtime of python3 get-image-info.py user.yaml -k from 11:09 to 10:19, about a 7.5% reduction in runtime already. I am also fairly sure that the info used in get_image_tag and get_archs can be found in the first request, reducing runtime further.

Travis CI Unit tests

  • Various input to get_image_info.py, like --debug
  • results.csv file contains the correct number of columns
  • General function tests

Make results.csv name include date

Rather than just results.csv, there would be an archives folder with file names formatted as RESULTS-07-12-2019.csv. The old results from that day would be overwritten, but previous days would not.
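Building the date-stamped path could look like this (a sketch; the folder name is taken from the issue, and the date is assumed to be MM-DD-YYYY):

```python
import datetime
import os


def archive_path(archive_dir="archives"):
    """Build a date-stamped path like archives/RESULTS-07-12-2019.csv.

    Writing to the same path twice in one day overwrites that day's
    results, matching the behavior described above.
    """
    today = datetime.date.today().strftime("%m-%d-%Y")
    return os.path.join(archive_dir, "RESULTS-" + today + ".csv")
```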

Add Progress bar to project

[ENHANCEMENT]

Currently there is no way to see how long the application has been running or how close it is to finishing. A simple progress bar would fix that.
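A stdlib-only sketch of what that could look like (hypothetical helper names; a library like tqdm would also work if adding a dependency is acceptable):

```python
import sys


def render_bar(done, total, width=40):
    """Return a text progress bar like [#####-----] 5/10."""
    filled = int(width * done / total)
    return "[%s] %d/%d" % ("#" * filled + "-" * (width - filled), done, total)


def show_progress(done, total):
    """Redraw the bar in place on stderr after each application finishes."""
    sys.stderr.write("\r" + render_bar(done, total))
    if done == total:
        sys.stderr.write("\n")
    sys.stderr.flush()
```

The main loop would call `show_progress(i + 1, len(apps))` after each application is processed.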

Update /utils/tests.py

See 4cffb18

This is helpful for testing a single application rather than waiting for all of them to finish and produce output. The code in tests.py is outdated and needs to be changed to reuse existing functions instead of copying code.

This issue will be marked complete once it is possible to test a single application with proper output.

Make program flow stage-based

Currently there are a lot of nested for loops that could each be turned into an individual stage. For instance, to get, extract, and parse the values.yaml, the deepest (roughly) stack trace looks like this:

(get-image-info.py) main -> parse_index_yaml -> (indexparser.py) get_tarfile -> obtain_values_yaml -> get_app_info -> parse-image_repo

and the flow currently looks like this:

for main_image:
  download, open, extract tarfile
  for item in tarfile:
    if (values.yaml):
      find images, tags, repos in values.yaml
      for each potential_repo:
        if potential_repo is dict:
          for item in dict:
            if item is dict:
              for sub_item in item:
                if correct_format:
                  append(sub_item) to app_obj
      for repos in app_obj:
        clean the repos

I think that could be cleaned up considerably if you did something like:

# Stage1: download and extract
for each main_image:
  download, open, extract tarfile
# stage2: find values.yaml
for each main_image:
  find values.yaml
  find images, tags, repos in values.yaml
# stage3: get potential repos
for each potential_repo:
  if potential_repo is dict:
    for item in dict:
      if item is dict:
        for sub_item in item:
          if correct_format:
            append(sub_item) to app_obj
# stage4: clean the repos

This is a bigger undertaking and might take a while, but this way the code is more modular and easier to read, with fewer traces of "what calls what calls what?"

Add travis badge to README

[ENHANCEMENT]

Since everything in the code is ready for Travis CI integration (on a basic level), there should be a Travis badge in the README. That requires @mtarsel to link his GitHub account to Travis CI first; then we can get the markdown with a link to the Travis badge image.

related to #2

Share results from program output

Since #40 is going to stay "private", imo the best way to share the results would be to:

  • Execute get-image-info.py on a Jenkins server
  • Cross-validate (#40), get results, get diff (#39)
  • Share info via Slack API from Jenkins

When there is a diff or something is very wrong, this should be shared in our Slack channel.

So this issue depends on #39 and #40 being closed first.

Properly obtain Product name for Application

This task depends on #7

The Product name is obtained by grabbing the first line of the README for each Application. Sometimes there are comments in the README which are not visible once the markdown is rendered, so we don't actually get the Product name in results.csv.

Since there are only a few applications with this problem, #7 should be closed first; then we can use the single-application test cases to save ourselves some time.
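A sketch of how the comment-skipping could work (assuming the invisible comments are standard HTML `<!-- -->` blocks, which markdown renderers hide; the function name is hypothetical):

```python
import re


def product_name(readme_text):
    """Return the first rendered line of a README, skipping HTML comments."""
    # Drop <!-- ... --> blocks, which don't appear in rendered markdown.
    visible = re.sub(r"<!--.*?-->", "", readme_text, flags=re.DOTALL)
    for line in visible.splitlines():
        # Strip heading markers and whitespace; skip now-empty lines.
        line = line.strip().lstrip("#").strip()
        if line:
            return line
    return ""
```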

Remove non-pr travis tests

Since we are no longer using the github api token, there is no need to have a travis secret environment variable and thus there need not be a distinction between PR builds and non-PR builds.

Obtain info from GSA dashboard and cross-validate against our results

There exists a dashboard with architecture information. This task is focused on:

  • Downloading arch info from GSA dashboard
  • Extracting arch info
  • Validating GSA dashboard info with our info from dockerhub

This issue is a different type of cross-validation than #39 in that this task is about verifying our results against a different data set (GSA dashboard) and not verifying our own past results.

Utilize Github API to obtain Chart.yaml and values.yaml

Rather than downloading the Application's tarball from GitHub and extracting just the files we need, we should use the GitHub API and download the raw values.yaml and Chart.yaml into Applications/{app_name}/.

This should reduce run time by removing all the file system calls. Note that this will increase the number of requests to GitHub, so authentication will be mandatory to run this project in order to avoid the API rate limit.
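Fetching the raw files could look roughly like this (a sketch: the org/repo/branch layout is an assumption about where the charts live, and the `fetch` parameter exists only to make the sketch testable without network access):

```python
import os
import urllib.request

RAW_BASE = "https://raw.githubusercontent.com"


def download_chart_files(org, repo, app_name, branch="master", token=None,
                         fetch=None):
    """Download raw values.yaml and Chart.yaml into Applications/{app_name}/."""
    if fetch is None:
        def fetch(url, headers):
            req = urllib.request.Request(url, headers=headers)
            with urllib.request.urlopen(req) as resp:
                return resp.read()
    # An auth token raises the GitHub rate limit substantially.
    headers = {"Authorization": "token " + token} if token else {}
    dest = os.path.join("Applications", app_name)
    os.makedirs(dest, exist_ok=True)
    for fname in ("values.yaml", "Chart.yaml"):
        url = "/".join([RAW_BASE, org, repo, branch, app_name, fname])
        with open(os.path.join(dest, fname), "wb") as fh:
            fh.write(fetch(url, headers))
```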

Create way to save docker info locally

Since the majority of the runtime comes from crawling dockerhub, what if we were able to just save all of the information we pulled from dockerhub locally? A couple ideas to do this:

  • Save the information we pull as JSON files, since it comes in JSON format already
  • Save all of the information we have on apps as JSON files (so we don't need to pull anything: no values.yaml, no README.md, no Chart.yaml)

This should reduce the runtime even further than #13 did.
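The first idea could be sketched like this (the cache directory and helper names are assumptions; the payload is whatever JSON dockerhub returned for an image):

```python
import json
import os

CACHE_DIR = "docker-cache"


def cache_path(org, name):
    return os.path.join(CACHE_DIR, "%s__%s.json" % (org, name))


def save_docker_info(org, name, payload):
    """Persist one image's dockerhub response so later runs skip the request."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(cache_path(org, name), "w") as fh:
        json.dump(payload, fh)


def load_docker_info(org, name):
    """Return the cached JSON, or None if this image hasn't been saved yet."""
    try:
        with open(cache_path(org, name)) as fh:
            return json.load(fh)
    except FileNotFoundError:
        return None
```

The crawler would call `load_docker_info` first and only hit dockerhub on a cache miss.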

Add more travis tests

There can be many more Travis tests utilizing the new --test feature to ensure proper outputs, as well as creating individual app objects and using those to test individual functions.

Get diff of archives

Alert the user if there was a change from the last archived result to the current result, and what that change is.
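One way to compute that diff (a sketch, treating each results.csv row as an opaque line; the function name is hypothetical):

```python
def diff_results(old_rows, new_rows):
    """Report rows added or removed between two archived result sets."""
    old, new = set(old_rows), set(new_rows)
    return {"added": sorted(new - old), "removed": sorted(old - new)}
```

If either list in the returned dict is non-empty, the program would alert the user with its contents.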
