dl-discovery's Introduction

Vecnet Metadata Catalog

This application provides the Vecnet Metadata Catalog. It handles the curation and indexing of the data generated by the Vecnet cyberinfrastructure.

Dependencies

  • Fedora Commons 3.6
  • Solr 4.3
  • Redis (version?)
  • PostgreSQL or another SQL database
  • nginx
  • chruby Ruby version manager

The SETUP file has detailed steps on installing the platform on a bare RHEL machine.

Deployment

First, your public SSH key needs to be installed on the server; ask Don to do this. To deploy to QA:

cap qa deploy

To deploy to Production:

cap production deploy

To deploy from a branch:

cap <environment> deploy -S branch=<branch name>

To deploy a new nginx config (this also reloads nginx):

cap <environment> vecnet:update_nginx_config

Other server admin tasks

To rebuild the Fedora object store:

sudo service tomcat6 stop
cd /opt/fedora/server/bin
sudo FEDORA_HOME=/opt/fedora CATALINA_HOME=/usr/share/tomcat6 ./fedora-rebuild.sh
# choose option 1 to rebuild the resource index
sudo FEDORA_HOME=/opt/fedora CATALINA_HOME=/usr/share/tomcat6 ./fedora-rebuild.sh
# choose option 2 to rebuild the SQL database
sudo service tomcat6 start

To resolrize everything (this takes a LONG time to complete):

chruby 2.0.0-p353
RAILS_ENV=qa bundle exec rake solrizer:fedora:solrize_objects

To load and build the MeSH trees, run the following. It takes a while (roughly 0.5 to 1 hour):

chruby 2.0.0-p353
RAILS_ENV=qa bundle exec rake vecnet:import:mesh_subjects vecnet:import:eval_mesh_trees

To resolrize with MeSH synonyms (this also takes a LONG time to complete):

chruby 2.0.0-p353
# This builds the synonyms.txt file if needed.
# you could skip this if synonyms did not change
RAILS_ENV=qa bundle exec rake vecnet:solrize_synonym:get_synonyms FILE=solr_conf/conf/synonyms.txt
# Copy this file to the Solr core
sudo cp solr_conf/conf/synonyms.txt /opt/solr-4.3.0/vecnet/conf/synonyms.txt
# Copy the schema and solrconfig
sudo cp solr_conf/conf/schema.xml /opt/solr-4.3.0/vecnet/conf/schema.xml
sudo cp solr_conf/conf/solrconfig.xml /opt/solr-4.3.0/vecnet/conf/solrconfig.xml
# Change the owner to tomcat
sudo chown tomcat:tomcat -R /opt/solr-4.3.0
# Restart Solr
sudo service tomcat6 restart
# Resolrize all objects
RAILS_ENV=qa bundle exec rake solrizer:fedora:solrize_objects

To ingest citations into QA/Production:

# Copy the EndNote file to /opt/endnote and make sure everyone can read it
sudo cp /from/path/to/endnote/file /opt/endnote
sudo chmod -R 755 /opt/endnote
# Copy the PDFs to /opt/citation_file/<createfolder_with_endnote_file_name> and make sure everyone can read them
sudo cp -r /from/path/to/endnote/pdf/* /opt/citation_file/
sudo chmod -R 755 /opt/citation_file/
# Execute the citation task as the app user
sudo su app
cd /home/app/vecnet/current
chruby 2.0.0-p353
RAILS_ENV=production bundle exec rake vecnet:import:endnote_conversion ENDNOTE_FILE=/opt/endnote/ ENDNOTE_PDF_PATH=/opt/citation_files:/opt/citation_files/

Initializing new production environment

  1. Do system setup as in the SETUP file
  2. Get capistrano deploy working to the new site
  3. On the production machine:
       • Set up Ruby: chruby 2.0.0-p353
       • Set up MeSH terms: RAILS_ENV=production bundle exec rake vecnet:import:mesh_subjects vecnet:import:eval_mesh_trees
       • Migrate user table: See below
       • Resolrize: RAILS_ENV=production bundle exec rake solrizer:fedora:solrize_objects
       • Migrate Fedora objects: RAILS_ENV=production bundle exec rake vecnet:migrate:batch_to_collection
  4. Done!

NCBI Terminology

Work in progress. After running rake db:migrate, the following task will download the NCBI taxonomy from the following location

ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz

and ingest the terms into the database.

rake vecnet:import:ncbi_taxonomy

There are about 1,091,096 terms (November 2013).

Gather repository contents for statistics

OUTFILE=~/repo-stats-20130916.csv RAILS_ENV=production bundle exec rake vecnet:dump_statistics

Pubtkt Authentication

The site uses the pubtkt authentication scheme, which uses a signed cookie for every request. For development, a dummy login to create a pubtkt is provided (class DevelopmentSessions). But, first, a public/private key pair needs to be generated and installed.

rake pubtkt:generate_keys
mv pubtkt.pem config/pubtkt-development.pem
mv pubtkt-private.pem config/pubtkt-private-development.pem

And that should be enough for development. There are also utility rake tasks for creating and verifying tickets:

  1. To create a ticket on the command line:

    $ P_KEY=pubtkt-private.pem P_UID=dbrower P_VALIDUNTIL=3456789012 P_TOKENS='dl_librarian,dl_write' rake pubtkt:create
    uid=dbrower;validuntil=3456789012;tokens=dl_librarian,dl_write;sig=MCwCFHiaErA+7lHoHxbSUIZaSnmTovIPAhRf4RxtrmArBMD8CBnZaUM/yWI+Cw==

     The valid-until value above corresponds to July 16, 2079 in Unix epoch time, so the ticket should not expire while you are using it.

  2. To validate tickets from the command line:

$ P_KEY=pubtkt-private.pem P_TICKET='uid=dbrower;validuntil=3456789012;tokens=dl_librarian,dl_write;sig=MCwCFF1/aaSbtrxN9PLrZE1XvLH5SIWQAhRXN8AHevzPMFbMuIIlOwuCLTZDPw==' rake pubtkt:verify
Ticket text: uid=dbrower;validuntil=3456789012;tokens=dl_librarian,dl_write
Ticket sig : MCwCFF1/aaSbtrxN9PLrZE1XvLH5SIWQAhRXN8AHevzPMFbMuIIlOwuCLTZDPw==
Sig Valid? : true
Expired?   : true
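
As an illustration of the ticket format shown above, here is a minimal Ruby sketch that splits a ticket into its fields and checks the expiry time. The helper names parse_pubtkt and pubtkt_expired? are hypothetical and are not the application's rake tasks; the sketch does not verify the signature. The time is printed in UTC, so the calendar date can differ by a day from a local-time reading.

```ruby
# Hypothetical helpers for illustration only; not part of the application.
TICKET = "uid=dbrower;validuntil=3456789012;tokens=dl_librarian,dl_write;" \
         "sig=MCwCFF1/aaSbtrxN9PLrZE1XvLH5SIWQAhRXN8AHevzPMFbMuIIlOwuCLTZDPw=="

# Split "key=value;key=value" pairs. Only the first "=" separates key from
# value, so the base64 padding at the end of the signature is preserved.
def parse_pubtkt(ticket)
  fields = ticket.split(";").map { |pair| pair.split("=", 2) }.to_h
  {
    uid: fields["uid"],
    valid_until: Time.at(Integer(fields["validuntil"])).utc,
    tokens: fields.fetch("tokens", "").split(","),
    sig: fields["sig"]
  }
end

# A ticket is treated as expired once the current time reaches validuntil.
def pubtkt_expired?(parsed, now = Time.now)
  now >= parsed[:valid_until]
end

parsed = parse_pubtkt(TICKET)
puts "uid:         #{parsed[:uid]}"
puts "valid until: #{parsed[:valid_until]}"
puts "tokens:      #{parsed[:tokens].inspect}"
puts "expired?     #{pubtkt_expired?(parsed)}"
```

A real check would also verify the sig field against the public key before trusting any of the other fields.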

dl-discovery's People

Contributors

danielbaird, dbrower, mejackreed, stevenvandervalk

dl-discovery's Issues

A HREF for URL in record view is incorrect for displayed string - need to fix "URL"

The A HREF value for URL in record view is not the same as displayed text string (for simulations at least) but the A HREF value in the "More details section" is good.

Fix the first instance, and determine whether the URL needs to be repeated as it is here.

<dt>URL</dt>
<dd><a href="/catalog/2658">https://ci-qa.vecnet.org/ts_emod/simulation/details/run/2658/</a></dd>

<!--TODO: Fish out the correct links from JSON object here || Check with Don -->
  <dt>More details at</dt>
  <dd itemprop="url"><a href="https://ci-qa.vecnet.org/ts_emod/simulation/details/run/2658/">https://ci-qa.vecnet.org/ts_emod/simulation/details/run/2658/</a></dd>

Search by map area errors with latest solr config

The following error occurs when submitting a query that contains the bbox format previously used by earthworks/geoblacklight.

RSolr::Error::Http - 400 Bad Request
Error: { 'responseHeader'=>{ 'status'=>400, 'QTime'=>1, 'params'=>{ 'mm'=>'6<-1 6<90%', 'qs'=>'1', 'q.alt'=>'*:*', 'facet.field'=>['dc_type_s', 'dc_creator_sm', 'vn_keyword_sm',
URI: http://127.0.0.1:8983/solr/select?wt=ruby&qt=search&facet.field=dc_type_s&facet.field=dc_creator_sm&facet.field=vn_keyword_sm&facet.field=dct_spatial_sm&facet.field=dc_publisher_s&facet.field=dc_format_s&fq=%3A%22Intersects%28112.8515625+-76.39331166244494+166.9921875+68.59248658252947%29%22&start=0&rows=10&q.alt=*%3A*&facet=true&f.dc_type_s.facet.limit=11&f.dc_creator_sm.facet.limit=7&f.vn_keyword_sm.facet.limit=11&f.dct_spatial_sm.facet.limit=11&f.dc_publisher_s.facet.limit=7&f.dc_format_s.facet.limit=4&sort=score+desc%2C+dc_title_sort+asc&bq=%3A%22IsWithin%28112.8515625+-76.39331166244494+166.9921875+68.59248658252947%29%22%5E10
Backtrace:
/Users/jc234325/.rbenv/versions/2.1.0/lib/ruby/gems/2.1.0/gems/rsolr-1.0.12/lib/rsolr/client.rb:284:in `adapt_response'
/Users/jc234325/.rbenv/versions/2.1.0/lib/ruby/gems/2.1.0/gems/rsolr-1.0.12/lib/rsolr/client.rb:190:in `execute'
/Users/jc234325/.rbenv/versions/2.1.0/lib/ruby/gems/2.1.0/gems/rsolr-1.0.12/lib/rsolr/client.rb:176:in `send_and_receive'
/Users/jc234325/.rbenv/versions/2.1.0/lib/ruby/gems/2.1.0/gems/blacklight-5.12.1/lib/blacklight/solr_repository.rb:44:in `block in send_and_receive'
/Users/jc234325/.rbenv/versions/2.1.0/lib/ruby/gems/2.1.0/gems/activesupport-4.1.8/lib/active_support/benchmarkable.rb:41:in `block in benchmark'
/Users/jc234325/.rbenv/versions/2.1.0/lib/ruby/gems/2.1.0/gems/activesupport-4.1.8/lib/active_support/core_ext/benchmark.rb:12:in `block in ms'
/Users/jc234325/.rbenv/versions/2.1.0/lib/ruby/2.1.0/benchmark.rb:294:in `realtime'
/Users/jc234325/.rbenv/versions/2.1.0/lib/ruby/gems/2.1.0/gems/activesupport-4.1.8/lib/active_support/core_ext/benchmark.rb:12:in `ms'
/Users/jc234325/.rbenv/versions/2.1.0/lib/ruby/gems/2.1.0/gems/activesupport-4.1.8/lib/active_support/benchmarkable.rb:41:in `benchmark'
/Users/jc234325/.rbenv/versions/2.1.0/lib/ruby/gems/2.1.0/gems/blacklight-5.12.1/lib/blacklight/solr_repository.rb:42:in `send_and_receive'
/Users/jc234325/.rbenv/versions/2.1.0/lib/ruby/gems/2.1.0/gems/blacklight-5.12.1/lib/blacklight/solr_repository.rb:28:in `search'

Rails.root: /Users/jc234325/Dropbox/eResearch/github/vecnet/dl-discovery

Request

Parameters:

{"q"=>"",
 "showmap"=>"",
 "bbox"=>"112.8515625 -76.39331166244494 166.9921875 68.59248658252947",
 "_"=>"1429425474443"}
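
One thing that may help with debugging: decoding the fq parameter from the URI above shows that nothing precedes the colon, i.e. the filter has no field name. Whether that alone explains the 400 response is an assumption, not something the trace confirms. The sketch below decodes the value and shows a hypothetical spatial_fq helper that refuses to build a filter without a field; the field name solr_bbox in the usage line is only an example.

```ruby
require "cgi"

# The fq value copied from the failing request URI.
RAW_FQ = "%3A%22Intersects%28112.8515625+-76.39331166244494+" \
         "166.9921875+68.59248658252947%29%22"

# CGI.unescape turns %XX escapes back into characters and "+" into spaces.
def decode_param(raw)
  CGI.unescape(raw)
end

# Hypothetical helper: build a spatial filter and fail early if the
# field name is missing, instead of sending ':"Intersects(...)"' to Solr.
def spatial_fq(field, bbox, predicate = "Intersects")
  raise ArgumentError, "spatial field name is blank" if field.to_s.strip.empty?
  "#{field}:\"#{predicate}(#{bbox})\""
end

fq = decode_param(RAW_FQ)
field, _separator, clause = fq.partition(":")
puts fq
puts "field name empty? #{field.empty?}"
puts "clause: #{clause}"
puts spatial_fq("solr_bbox", "112.8515625 -76.39331166244494 166.9921875 68.59248658252947")
```

The bq parameter in the same URI decodes the same way, with an empty field name before IsWithin.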

Add Advanced Search Option back to UI

Need to Add Advanced Search Option back to UI (this option searches full text of pdfs as well as the regular indexes).

The UI element could perhaps be just a checkbox. Is there an input element that places a labelled checkbox "inside" the search box at the far right?

Add access control enforcement

The access control fields are being put into solr. Need to do the following:

  • Filter solr search results to only display what user can see
  • Enforce the display of item records to only what the user can see
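
A minimal sketch of the first step, assuming the access control data is indexed as a multi-valued Solr field of group names. The field name read_access_group_ssim, the implicit "public" group, and the helper names are all assumptions for illustration; the issue does not name the actual fields.

```ruby
# Hypothetical helpers; field and group names are assumptions.

# Quote a value for use inside a Solr query, escaping quotes and backslashes.
def solr_quote(value)
  escaped = value.gsub(/["\\]/) { |char| "\\" + char }
  "\"#{escaped}\""
end

# Build an fq clause that keeps only documents readable by one of the
# user's groups. Everyone is assumed to belong to "public".
def access_filter_query(user_groups, field = "read_access_group_ssim")
  groups = (["public"] + user_groups).uniq
  "#{field}:(#{groups.map { |g| solr_quote(g) }.join(' OR ')})"
end

puts access_filter_query([])
puts access_filter_query(["dl_librarian", "dl_write"])
```

The same predicate would have to be checked again when a single record is displayed, since a filter on search results does not protect direct links to a record.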

bounding boxes crossing the antimeridian are not rendered correctly

Bounding boxes which cross the antimeridian (i.e. the international date line) do not show up correctly. Solr requires east and west values to be in the range -180 to 180, so these straddling boxes have an east coordinate "to the left" of their west coordinate. I don't know whether it is solr or leaflet which then decides this is an error and swaps the east and west values, giving a bounding box that covers exactly the longitudes which the original box did not cover.

I don't know what is causing this error. The record with uuid hd76s009b has such a bounding box, for example.
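
To make the failure mode concrete, here is a small sketch using made-up coordinates (not those of record hd76s009b). It shows what swapping east and west does to a box that crosses the antimeridian, and one possible workaround of splitting such a box into two ordinary boxes. Both helper names are hypothetical.

```ruby
# Hypothetical helpers for illustration. Boxes are [west, south, east, north]
# in degrees, with longitudes in the range -180..180.

# Width of a box in degrees of longitude, allowing for boxes that wrap.
def lon_width(west, east)
  west <= east ? east - west : (180.0 - west) + (east + 180.0)
end

# The suspected faulty "correction": swap west and east when west > east.
def swap_if_reversed(west, south, east, north)
  west <= east ? [west, south, east, north] : [east, south, west, north]
end

# One possible workaround: split a wrapping box at the antimeridian.
def split_at_antimeridian(west, south, east, north)
  return [[west, south, east, north]] if west <= east
  [[west, south, 180.0, north], [-180.0, south, east, north]]
end

box = [170.0, -20.0, -170.0, -10.0] # crosses the antimeridian, 20 degrees wide
swapped = swap_if_reversed(*box)

puts "original width: #{lon_width(box[0], box[2])}"
puts "swapped width:  #{lon_width(swapped[0], swapped[2])}"
puts "split boxes:    #{split_at_antimeridian(*box).inspect}"
```

The swapped box spans the 340 degrees that the original box did not cover, which matches the behaviour described above.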

Add extra tool links for logged in users

In the Digital Library, logged-in users see 'Upload a document' and library admins see other links. Need to implement these and place them in $location (perhaps a TOOLS navbar dropdown).

Retain all Sort by Options

Can we retain all the sort by options in the discovery UI?

They are showing as:
relevance, publisher, title

Can you restore the date sort options and the ascending/descending options we had previously? We want to avoid losing any functionality. Can we still have:

  • Publish date (asc or desc)
  • Title (asc or desc)
  • Date uploaded (asc or desc)
  • Date modified (asc or desc)

Boost some search terms

  1. Include species name and keyword in the default search fields
  2. Make sure title, subject, author, species, location name, and keyword are boosted a lot
  3. add full text with the lowest boost possible.
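
As a sketch of what the boost list above could look like as a Solr (e)dismax qf parameter, which takes whitespace-separated field^boost entries: every field name and weight below is a placeholder, since the issue does not give the schema names or the intended values.

```ruby
# Placeholder field names and boost values; the real schema may differ.
FIELD_BOOSTS = {
  "title"     => 100,
  "subject"   => 50,
  "author"    => 50,
  "species"   => 50,
  "location"  => 50,
  "keyword"   => 50,
  "full_text" => 1 # lowest boost, so full-text hits rank last
}.freeze

# Render a field => boost hash as a qf string such as "title^100 subject^50".
def qf_param(boosts)
  boosts.map { |field, boost| "#{field}^#{boost}" }.join(" ")
end

puts qf_param(FIELD_BOOSTS)
```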

Mixed content warning

On dl-dev.vecnet.org I get a mixed-content warning from my browser (both Firefox Developer Edition and Chrome). This means some content was transmitted over unencrypted http. I think that content is the map tiles.

Is it possible to fix this, and use a https connection to get them?

map wraps and drops

If an item has spatial metadata that crosses a hemisphere, weird stuff still happens.

Can we stop the map from dropping? Can we stop the map from tiling and repeating?

Novaluron is an example of a chemical search that brings up records that cause this behavior.

bounding boxes displaying correctly on front page

I think the bounding boxes are not displayed correctly on the front page because their coordinates are not being decoded out of the solr fields correctly. Moreover, each solr field concerning a bounding box puts the coordinates in a different order. Specifically,

"solr_bbox" is "#{west} #{south} #{east} #{north}"
"solr_geom" is "ENVELOPE(#{west}, #{east}, #{north}, #{south})"
"georss_box_s" is "#{south} #{west} #{north} #{east}"

Don't shoot me for this. It comes from solr.

Making this an issue so that it is trackable. I hope it helps with debugging.
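
The three orderings listed above can be captured in a pair of small functions, which may help when checking what the front page decodes. This is a sketch with hypothetical helper names; it assumes plain decimal coordinates (no exponent notation) and is not the application's own code.

```ruby
# Hypothetical helpers mirroring the three field layouts listed above.

# Encode one bounding box into each of the three Solr field formats.
def bbox_fields(west:, south:, east:, north:)
  {
    "solr_bbox"    => "#{west} #{south} #{east} #{north}",
    "solr_geom"    => "ENVELOPE(#{west}, #{east}, #{north}, #{south})",
    "georss_box_s" => "#{south} #{west} #{north} #{east}"
  }
end

# Decode a field value back into named coordinates, honouring the
# per-field coordinate order.
def parse_bbox_field(name, value)
  nums = value.scan(/-?\d+(?:\.\d+)?/).map(&:to_f)
  west, south, east, north =
    case name
    when "solr_bbox"    then nums.values_at(0, 1, 2, 3) # W S E N
    when "solr_geom"    then nums.values_at(0, 3, 1, 2) # stored as W E N S
    when "georss_box_s" then nums.values_at(1, 0, 3, 2) # stored as S W N E
    else raise ArgumentError, "unknown bbox field: #{name}"
    end
  { west: west, south: south, east: east, north: north }
end

box = { west: 112.8515625, south: -76.39331166244494,
        east: 166.9921875, north: 68.59248658252947 }

bbox_fields(**box).each do |name, value|
  puts "#{name}: #{value}"
  puts "  round-trips? #{parse_bbox_field(name, value) == box}"
end
```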

don't use stanford css

Somewhere the su-identity.css file is being included on a page load. Is it needed anymore? Could all references to it be removed?

Feedback form should go to mailing list

We decided this on the call yesterday. I think we meant the MDOC list for the time being, possibly changing to another mailing list after the production deploy. The open question is whether the application can post to the Google group.

Migration error when installing -- duplicate column name "email"

Looks like a bad migration?

== 20150131013853 AddUidToUser: migrating =====================================
-- change_table(:users)
rake aborted!
StandardError: An error has occurred, this and all later migrations canceled:

SQLite3::SQLException: duplicate column name: email: ALTER TABLE "users" ADD "email" varchar(255)
/Users/pvrdwb/jcu/dl-discovery/vendor/gems/sqlite3-1.3.10/lib/sqlite3/database.rb:91:in `initialize'
/

app crashes in presence of pubtkt

A partial is referring to paths which don't exist in the application.

ActionView::Template::Error (undefined local variable or method `dashboard_index_path' for #<#<Class:0x007f370e60f170>:0x007f370f7169a0>):
    47:               <li><%= link_to "Transmission Simulator", "#{ci_domain}/ts/" %></li>
    48:               <% if current_user -%>
    49:                   <li class="divider"></li>
    50:                   <li><%= link_to "My Uploads", dashboard_index_path %></li>
    51:                   <% if current_user.admin? -%>
    52:                       <li><%= link_to "Admin Dashboard", admin_dashboard_index_path %></li>
    53:                   <% end -%>
  app/views/_user_util_links.html.erb:50:in `_app_views__user_util_links_html_erb__2521860871841868077_69937227652860'
  app/views/shared/_header_navbar.html.erb:16:in `_app_views_shared__header_navbar_html_erb__2582333996295555637_69937227607640'
  app/views/layouts/vndl.html.erb:30:in `_app_views_layouts_vndl_html_erb__1638806196066636405_69937219161320'

search by map area doesnt? limit results

Should search by map area limit the results? It doesn't seem to. It seems only to display the hits on that part of the map, rather than filtering the search and updating the number of results.

For example, if I search on the term bendiocarb with only the Americas displayed and click search by map, I still get 127 results even though only 8 are from South America.

How should non-locational records be filtered (or included in the result set) and counted when searching by map?

Support the hierarchical facets

They should still be like what are currently in the DL. There are three such facets:

  • Subject facet (in the solr field dc_subject_h_facet)
  • Species facet (in dwc_scientificname_h_facet, I think)
  • Location facet (in dct_spatial_h_facet)

Each consists of a list of terms, using colons to separate the levels of the hierarchy. E.g. Kenya has the solr field:

[
"Africa",
"Africa:Republic of Kenya"
]
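
The field value above is every prefix of the term's path, joined with colons. A minimal sketch of generating it, with a hypothetical helper name:

```ruby
# Hypothetical helper: expand a path of terms into the cumulative,
# colon-separated values stored in a hierarchical facet field.
def hierarchy_facet_values(path, separator = ":")
  path.each_index.map { |i| path[0..i].join(separator) }
end

p hierarchy_facet_values(["Africa", "Republic of Kenya"])
# => ["Africa", "Africa:Republic of Kenya"]
```

Storing every prefix lets a facet at any level of the hierarchy match the record with a plain term lookup.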

Potential improvements to metadata and search interface from user

After speaking with Tanya Russell, here are a few items that might go on the wish list or make their way in as good features for Digital Library users and librarians.

  • Ability to filter or search within modals (e.g. with a huge list of authors and only pagination, you can't get to M)
  • Information about citations in the metadata (either input or output)
