gsa-tts / all_sorns
Repo for SORN DASH
Home Page: https://all-sorns.app.cloud.gov
License: Other
Users do not recognize the agency names as we show them; they expect different names. We should use the more common version of each name.
Ex: Department of Defense, not Defense Department; Office of Personnel Management, not Personnel Management Office.
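One possible way to get the more common form is to prefer the raw_name field that the Federal Register API returns for each agency (it appears in the API sample further down this page) and only fix its capitalization. This is a minimal sketch, not the app's code; common_agency_name and SMALL_WORDS are hypothetical names.

```ruby
# Words kept lowercase unless they start the name (illustrative list).
SMALL_WORDS = %w[of the and for in on].freeze

# agency: a Hash shaped like a Federal Register agency record.
# Falls back to the API's "name" when no raw_name is present.
def common_agency_name(agency)
  raw = agency["raw_name"].to_s.strip
  return agency["name"] if raw.empty?

  raw.downcase.split(/\s+/).each_with_index.map do |word, index|
    index.positive? && SMALL_WORDS.include?(word) ? word : word.capitalize
  end.join(" ")
end
```

Acronyms inside a raw name (for example "U.S.") would be mangled by this naive capitalization, so a real version would need an exception list.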
Right now we are at 2000. That feels very low.
The Federal Register has way more. This screenshot shows search results that should be similar to our API calls. These are before we filter out matching, rulemaking, or implementation SORNs.
Based on the latest user research, I think we may want to include the above, but not fully parse them. Just keep the basic info from the API and show links to them if the user wants to research more.
The low numbers we have can also come from errors in our code; we don't have good error handling set up yet. I've also noticed some SORNs where the title doesn't mention 'matching' or the other unwanted types, but it'll be detailed in the action section.
Filter on the action as well as the title. Right now we only filter out on title.
An alternative navigation model that allows privacy officers to look through SORNs without a keyword search
Consider using different routes for the landing page, results, and browse mode. Right now, getting to the filters / section requires a search term in the URL.
Replace the Federal Register Ruby Gem with plain http requests to the API.
The Federal Register Ruby Gem isn't feature complete. The two main missing features are:
We are already using httparty to make other web requests, so use it to make these API calls as well. They have an example of how to turn it into its own class. Do that, or just have big long ugly URLs; that works too.
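As a rough sketch of the "own class" option, the wrapper below builds the documents search URL and fetches it. It uses Ruby's standard Net::HTTP so that it runs without extra gems; an httparty call could replace the Net::HTTP line. FederalRegisterClient and its method names are hypothetical, and the query parameters (conditions[term], order, per_page, page, fields[]) are taken from my understanding of the Federal Register API v1, so check them against the API documentation.

```ruby
require "net/http"
require "uri"
require "json"

# Hypothetical wrapper around the Federal Register documents search.
class FederalRegisterClient
  BASE_URL = "https://www.federalregister.gov/api/v1/documents.json"

  # Builds the search URI without making a request.
  def self.search_uri(term:, page: 1, per_page: 100, order: "newest", fields: [])
    pairs = [
      ["conditions[term]", term],
      ["order", order],
      ["per_page", per_page],
      ["page", page]
    ]
    fields.each { |field| pairs << ["fields[]", field] }

    uri = URI(BASE_URL)
    uri.query = URI.encode_www_form(pairs)
    uri
  end

  # Performs the request and returns the parsed JSON body.
  def self.search(**options)
    response = Net::HTTP.get_response(search_uri(**options))
    unless response.is_a?(Net::HTTPSuccess)
      raise "Federal Register API returned #{response.code}"
    end

    JSON.parse(response.body)
  end
end
```

Calling FederalRegisterClient.search(term: "Privacy Act of 1974; System of Records") would then return the parsed response for one page of results.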
Look for system numbers in the summary section if none is found in the system name, to reduce the number of unknowns.
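The fallback could look roughly like the sketch below: try the system name first, then the summary. The two patterns are illustrative guesses based on numbers such as "18-09-05" and "DHS/USSS-001"; they are not the app's actual regexes, and find_system_number is a hypothetical name.

```ruby
# Illustrative patterns only; real SORN numbering varies widely by agency.
SYSTEM_NUMBER_PATTERNS = Regexp.union(
  /\b\d{2}-\d{2}-\d{2}\b/,                 # e.g. 18-09-05
  /\b[A-Z]{2,}(?:\/[A-Z]{2,})*-\d{2,}\b/   # e.g. DHS/USSS-001
)

# Returns the first system number found in the name, then the summary,
# or nil when neither contains one.
def find_system_number(system_name, summary)
  [system_name, summary].each do |text|
    match = text.to_s[SYSTEM_NUMBER_PATTERNS]
    return match if match
  end
  nil
end
```

The numeric pattern would also match date-like strings, so results from the summary probably deserve a lower confidence than results from the system name.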
Should we change the way we present unavailable SORNs?
@igorkorenfeld I see this in Figma.
It looks like a combo-box meets a multi-select meets a checkbox list.
USWDS doesn't have anything like that.
I do see a discussion where the USWDS team didn't get to any answers on it. uswds/uswds#22
I can hack something together, but I can't say it will be elegant or meet accessibility requirements.
To get some ideas of what is possible, ignoring accessibility needs, see https://harvesthq.github.io/chosen/
Make the find_sorns_job into a production ready service.
As an engineer, I want a job that I can easily start and stop, that will grab all SORNs that we haven't already saved in our database. I want to be able to easily change the params (newest, oldest, etc) and be able to resume a run from where it last stopped.
I also want this job to keep teaching us, so it should report on the data it is finding or not finding. Old SORNs not having xml_urls for example.
Until now we've been using it to learn more about what data the Federal Register has available and to start looking at SORNs. I often change the params and then run it to grab just a few handfuls of SORNs, then stop it from running.
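A stop-and-resume job along these lines might be sketched as follows. Everything here is an assumption made for illustration: the SornFetchRun class, the JSON state file, and the use of the Federal Register document_number field to detect SORNs we already have. The page fetcher is injected so the loop can be exercised without network access.

```ruby
require "json"
require "set"

# Hypothetical resumable fetch loop. fetch_page is any callable that takes
# a page number and returns an array of document hashes.
class SornFetchRun
  def initialize(fetch_page:, known_ids:, state_path:)
    @fetch_page = fetch_page
    @known_ids = known_ids.to_set
    @state_path = state_path
  end

  # Page to start from: 1 on a fresh run, otherwise the saved checkpoint.
  def next_page
    return 1 unless File.exist?(@state_path)

    JSON.parse(File.read(@state_path)).fetch("next_page", 1)
  end

  # Fetches pages until one comes back empty or max_pages is reached.
  # Returns only documents that were not already known.
  def run(max_pages: nil)
    new_docs = []
    page = next_page
    pages_done = 0

    loop do
      break if max_pages && pages_done >= max_pages

      docs = @fetch_page.call(page)
      break if docs.empty?

      docs.each do |doc|
        next if @known_ids.include?(doc["document_number"])

        @known_ids << doc["document_number"]
        new_docs << doc
      end

      page += 1
      pages_done += 1
      # Checkpoint after every page so a stopped run can resume here.
      File.write(@state_path, JSON.generate({ "next_page" => page }))
    end

    new_docs
  end
end
```

Page-number checkpoints only stay stable when the sort order does not shift between runs (for example oldest first); with newest first, newly published documents would move the pages.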
https://www.federalregister.gov/documents/full_text/xml/2017/05/03/2017-08950.xml
uses another HD for its system name title!
<HD SOURCE="HD2">SYSTEM NAME AND NUMBER:</HD>
<HD SOURCE="HD1">Department of Education Federal Docket Management System (EDFDMS) (18-09-05).</HD>
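One way to tolerate this layout is to accept either a P or another HD element directly after the "SYSTEM NAME" heading. The sketch below uses REXML, which ships with Ruby, rather than whatever parser the app actually uses; system_name_from_xml is a hypothetical helper, and the assumption that the name normally sits in a following P element is mine.

```ruby
require "rexml/document"

# Returns the text of the element that follows the SYSTEM NAME heading,
# whether that element is a P or another HD. Returns nil when the heading
# or a usable sibling is missing, so callers can handle the gap.
def system_name_from_xml(xml)
  doc = REXML::Document.new(xml)
  heading = REXML::XPath.match(doc, "//HD").find do |hd|
    hd.text.to_s.match?(/SYSTEM NAME/i)
  end
  return nil unless heading

  following = heading.next_element
  return nil unless following && %w[P HD].include?(following.name)

  following.text.to_s.strip
end
```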
There is no way for keyboard users to skip repetitive content on the page when navigating/loading new pages. Often, a mechanism like a "skip link" is added to allow keyboard-only users to jump easily to the main content in the page.
Repeated tab stops on search results page:
Success Criterion 2.4.1 Bypass Blocks (Level A): A mechanism is available to bypass blocks of content that are repeated on multiple Web pages.
Implement a "skip to main content" link per accessibility guidance offered in the USWDS Header component.
We were explaining the business models to our partners and they wanted to see what we meant.
Build two versions of the service:
Send it to partners by Thursday, so they can review and talk about it in our call on Friday.
Form controls must have accessible names.
The second-level heading (h2) above the search form would make for a good label for the search input.
<h2>Search for SORNs by entering a keyword (will return exact matches)</h2>
<label for="general-search">Search for SORNs by entering a keyword (will return exact matches)</label>
If the team would prefer to keep the h2 as-is, then the search input could be assigned an aria-label attribute with a concise value to serve as the input's label, like:
<input class="usa-input" id="general-search" type="search" name="search" value="<%= params[:search] %>" aria-label="search">
There is a fieldset element wrapping the search input and button which is unnecessary. Fieldsets are typically used to represent "a set of form controls optionally grouped under a common name" (W3C HTML 5 Spec). If this is being used to attach CSS classes for presentational purposes, the div element may be a better choice here.
The language of each page must be set so that text is presented correctly for assistive technologies and conventional browsers/user agents.
/app/views/layouts/application.html.erb, line 2
Apply lang="en" to the html element in the application layout view, as well as anywhere else the html element may be rendered from.
Figure out how many SORNs we can get total from the Federal Register API
We search on the phrase "Privacy Act of 1974; System of Records".
We filter out SORNs that have these words in the title, because they don't seem relevant. 'matching', 'rulemaking', 'implementation'. We may reconsider 'Computer matching agreement' SORNs later.
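A small sketch of that filter, extended to also look at the action text (some SORNs only reveal their type there), might look like this. unwanted_sorn? and UNWANTED_TERMS are hypothetical names, and plain substring matching is an assumption rather than the app's current logic.

```ruby
# Terms from the filter described above.
UNWANTED_TERMS = %w[matching rulemaking implementation].freeze

# True when the title or the action mentions one of the unwanted types.
def unwanted_sorn?(title:, action: nil)
  text = "#{title} #{action}".downcase
  UNWANTED_TERMS.any? { |term| text.include?(term) }
end
```

Substring matching is deliberately loose, so it would also flag a records-system notice whose action merely mentions an implementation date; word-level or phrase-level matching may be needed.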
Look at all these routes we've still got available.
sorns_path      GET     /sorns(.:format)           sorns#index
                POST    /sorns(.:format)           sorns#create
new_sorn_path   GET     /sorns/new(.:format)       sorns#new
edit_sorn_path  GET     /sorns/:id/edit(.:format)  sorns#edit
sorn_path       GET     /sorns/:id(.:format)       sorns#show
                PATCH   /sorns/:id(.:format)       sorns#update
                PUT     /sorns/:id(.:format)       sorns#update
                DELETE  /sorns/:id(.:format)       sorns#destroy
We can get rid of these by dropping the resources :sorns line in the routes.rb file.
Do we want to allow people to do searches that will have no results? Like no fields selected? Should we show a red validation warning or something?
The download icon on the search results page is missing the alt attribute.
/app/views/sorns/search.html.erb, Line 40
Success Criterion 1.1.1 Non-text Content (Level A): All non-text content that is presented to the user has a text alternative that serves the equivalent purpose, except for the situations listed below.
...
If non-text content is pure decoration, is used only for visual formatting, or is not presented to users, then it is implemented in a way that it can be ignored by assistive technology.
Since this icon is decorative and part of the link that reads "Download results as a CSV file", the alt attribute, when added, can be blank (or null) so that assistive technology will ignore the image. The code for this should look something like this:
<%= image_tag("Download_Icon.svg", alt: "")%>
There are duplicated id attributes for elements on the search results page. This appears to be caused by agency names that are doubled up in the agencies listing of checkboxes used to filter results. Searching for "FedRAMP" will show this issue on the results page.
Success Criterion 4.1.1 Parsing (Level A): In content implemented using markup languages, elements have complete start and end tags, elements are nested according to their specifications, elements do not contain duplicate attributes, and any IDs are unique, except where the specifications allow these features.
Ensure there are no duplicate IDs on the page. This will likely resolve itself if the agencies list does not contain duplicate agency names, as the id for each appears to be derived from the agency name.
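A minimal sketch of de-duplicating the agency list before deriving ids is shown below. agency_checkbox_ids and the "agency-" prefix are hypothetical; the app may derive its ids differently.

```ruby
# Maps each distinct agency name to an id derived from the name.
# Duplicate names collapse to a single entry, so no id is emitted twice.
def agency_checkbox_ids(agency_names)
  agency_names.map(&:strip).uniq.to_h do |name|
    slug = name.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-|-\z/, "")
    [name, "agency-#{slug}"]
  end
end
```

Two different names that happen to produce the same slug would still collide, so a fuller version might append a numeric suffix or use the agency's database id instead.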
There is a div with a nav element (for pagination) as a direct child of the parent unordered list in the search results page. This could be confusing and/or problematic for screen reader users. Per Accessibility Insights, "<ul> and <ol> must only directly contain <li>, <script> or <template> elements."
Success Criterion 1.3.1 Info and Relationships (Level A): Information, structure, and relationships conveyed through presentation can be programmatically determined or are available in text.
Location of code:
/app/views/sorns/search.html.erb, Line 168
Move the following code to just after the closing </ul> tag:
<div class="grid-offset-6 grid-col-6 margin-bottom-3">
<%= paginate @sorns %>
</div>
Currently, our system names look like
["Department of Homeland Security (DHS)/United States Secret Service (USSS)-001 Criminal Investigation Information System of Records."]
When we don't have a system_name, it throws an error.
from GoodJob(default) in 1058.5ms: NoMethodError (undefined method `join' for nil:NilClass):
2020-11-17T09:49:59.25-0800 [APP/TASK/6d3e4936/0] OUT /home/vcap/app/app/models/sorn_xml_parser.rb:75:in `get_system_name'
2020-11-17T09:49:59.25-0800 [APP/TASK/6d3e4936/0] OUT /home/vcap/app/app/models/sorn_xml_parser.rb:15:in `parse_xml'
2020-11-17T09:49:59.25-0800 [APP/TASK/6d3e4936/0] OUT /home/vcap/app/app/models/sorn.rb:79:in `parse_xml'
2020-11-17T09:49:59.25-0800 [APP/TASK/6d3e4936/0] OUT /home/vcap/app/app/jobs/parse_sorn_xml_job.rb:11:in `perform'
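The trace points at a join call on nil. A nil-safe version of that step could be sketched as below; joined_system_name is a hypothetical helper, and the ", " separator is an assumption since the real get_system_name is not shown here.

```ruby
# Joins whatever system names were found. Returns nil, rather than raising,
# when none were found, so the caller can record the SORN as unknown.
def joined_system_name(system_names)
  names = Array(system_names).map { |name| name.to_s.strip }.reject(&:empty?)
  names.empty? ? nil : names.join(", ")
end
```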
PR in #54
Have our SORN data collection create single records of SORNs it finds. It needs to update the existing data as our parsing gets better and more refined.
Is there a unique identifier for the SORNs that we can use?
My prototype versions just create duplicate SORNs on each run. I would wipe the database often. It is time for us to start keeping the data.
This one has multiple agencies.
We can use the relationships from the FedReg API.
{
"raw_name": "DEPARTMENT OF DEFENSE",
"name": "Defense Department",
"id": 103,
"url": "https://www.federalregister.gov/agencies/defense-department",
"json_url": "https://www.federalregister.gov/api/v1/agencies/103",
"parent_id": null,
"slug": "defense-department"
},
{
"raw_name": "Department of the Air Force",
"name": "Air Force Department",
"id": 13,
"url": "https://www.federalregister.gov/agencies/air-force-department",
"json_url": "https://www.federalregister.gov/api/v1/agencies/13",
"parent_id": 103,
"slug": "air-force-department"
}
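Using the parent_id field from records like the two above, child agencies can be grouped under their parent. This is a sketch with a hypothetical group_by_parent helper; it assumes the list contains both the parent and the child records.

```ruby
# Groups agency names under their parent's name using parent_id.
# Agencies without a known parent become top-level keys.
def group_by_parent(agencies)
  by_id = agencies.to_h { |agency| [agency["id"], agency] }

  agencies.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |agency, groups|
    parent = by_id[agency["parent_id"]]
    if parent
      groups[parent["name"]] << agency["name"]
    else
      groups[agency["name"]] # touch the key so top-level agencies appear
    end
  end
end
```

For the sample above this yields the Defense Department as a key with the Air Force Department listed beneath it.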
I think that, because turbolinks is interfering with uswds.min.js, we need to revise how we pull in the USWDS JS.
Runs the CFR parser regex, then goes back to see if we have the SORN or not. Currently this happens on page load; turn it into a one-off command, or run it on ingest.
WIP in #43
The page jumps down when clicking on a filter that's below the scroll view.
To replicate:
AC
Add an explanation on the search interface that search by number only works for A-108 compliant SORNs, and link back to the about page for details.
Federal Register API has agency acronyms in this endpoint:
https://www.federalregister.gov/developers/documentation/api/v1#/Agencies/get_agencies
We should add them to our model - it will allow users to find agencies by acronym and resolve some of the confusion around unexpected agency name variants.
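Once acronyms are stored, lookup by acronym or by name could be sketched like this. The short_name key reflects my understanding of what the agencies endpoint calls the acronym field, which should be verified against the API; find_agencies is a hypothetical helper working on plain hashes rather than the app's model.

```ruby
# Finds agencies whose acronym equals the query or whose name contains it.
# Matching is case-insensitive; a blank query returns nothing.
def find_agencies(agencies, query)
  needle = query.to_s.strip.downcase
  return [] if needle.empty?

  agencies.select do |agency|
    agency["short_name"].to_s.downcase == needle ||
      agency["name"].to_s.downcase.include?(needle)
  end
end
```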
We need to get this data once per instance, so it should be added to the deployment scripts.
Separate the build_and_test and the deploy jobs in GitHub Actions.
The deploy job gets run all the time, but then bails if it's not a push to main. This is scary.
Instead, have two different job files in our workflow folder, and set the on action like:
build_and_test:
  on: [ push, pull_request ]
deploy:
  on:
    push:
      branches: [ main ]
https://docs.github.com/en/free-pro-team@latest/actions/reference/workflow-syntax-for-github-actions
Our app isn't updating nightly like it should be.
We have a command that runs in the middle of the night to look for new SORNs. Yet our worker isn't awake then. We need to define a worker process that is waiting to do work.
https://docs.cloudfoundry.org/devguide/deploy-apps/manifest-attributes.html#-processes
We had been starting GoodJob (our worker) by hand. It gets stopped whenever there is a deploy though (I assume).
On Heroku, this would double the monthly bill. I don't know how Cloud.gov charges.
There are a few SORN fields that we can get from both the content of the SORN or from the API.
There are 920 SORNs that don't have an XML link for us to parse. They all have the text link though. The Federal Register API has done some text parsing for us and made those fields available through the API.
I compared the Action we are parsing vs. the Action available from the API. They are all the same. There are an additional 920 actions available from the API. Let's use the api_action instead of our parsed actions.
Same as above with Dates.
The GPO publishes Privacy Act Issuances. They are packaged as one big XML file containing all agency SORNs to date, and in a structured format. This would be useful, but is the data reliable?
Questions to answer:
Are the initial publication dates accurate (in the <previouslyPublished> section)?
Do the SORNs change at all from bundle to bundle to reflect modifications?
Does the most current bundled version match the most current version in the federal register?
Do the bundled versions contain SORNs that have been rescinded?