programminghistorian / jekyll

Jekyll-based static site for The Programming Historian

Home Page: http://programminghistorian.org

HTML 98.14% Python 0.01% Shell 0.01% CSS 0.09% JavaScript 0.06% Ruby 0.02% Jupyter Notebook 1.67% R 0.01%
programming-historian text-analysis api data-management data-manipulation data-mining pedagogy linked-open-data mapping network-analysis

jekyll's Issues

css and link to pages broken

Please remove all instances of jekyll from any links to resources such as JavaScript or CSS files and from links to other parts of the website. Once this is done, I will make the appropriate changes on our DNS servers.
For example, links to CSS should look like "css/style.css" and links to other pages should look like "/about", instead of "/jekyll/css/style.css" and "/jekyll/about".
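A minimal sketch of how the prefix could be stripped in bulk, assuming the links are hard-coded in `href` and `src` attributes of the HTML files. The function name and the sample markup are hypothetical, and the result keeps the leading slash (`/css/style.css`), which may or may not be the form wanted for stylesheet links.

```python
import re

# Matches href="/jekyll/..." or src="/jekyll/..." (the prefix named in this issue).
PREFIX_PATTERN = re.compile(r'((?:href|src)=")/jekyll(/)')


def strip_site_prefix(html: str) -> str:
    """Remove the leading /jekyll path segment from href and src attributes."""
    return PREFIX_PATTERN.sub(r"\1\2", html)


# Hypothetical sample markup, for illustration only.
sample = '<link href="/jekyll/css/style.css"><a href="/jekyll/about">About</a>'
print(strip_site_prefix(sample))
```

Links that merely begin with the same letters (for example `/jekyllfoo`) are left alone, because the pattern requires a slash directly after the prefix.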

Fix tables

Find lessons that had tables and fix Markdown syntax.

Need to change 'group' to 'user' when using personal Zotero library in Zotero API lessons

In the first Zotero API lesson, the user is prompted to use either the sample Zotero library or their own personal library. If you are using your own Zotero library, the code needs to be changed from 'group' to 'user'; otherwise it will fail when it is run from a text editor. The second and third Zotero API lessons also need a reminder, so that users can run that code without problems.
[screenshot, 2014-09-17, not reproduced]
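The difference between the two library types can be illustrated with the URL that a request would target. This is only a sketch: it builds Zotero Web API item URLs directly rather than using the library from the lesson, and the function name and the ID `12345` are hypothetical placeholders.

```python
ZOTERO_API_BASE = "https://api.zotero.org"


def library_items_url(library_type: str, library_id: str) -> str:
    """Build the items URL for a personal ('user') or shared ('group') library."""
    if library_type not in ("user", "group"):
        raise ValueError("library_type must be 'user' or 'group'")
    # The API path uses the plural form: /users/<id>/ or /groups/<id>/.
    return f"{ZOTERO_API_BASE}/{library_type}s/{library_id}/items"


# Hypothetical ID; substitute your own user or group ID.
print(library_items_url("group", "12345"))
print(library_items_url("user", "12345"))
```

Using a personal library ID with the 'group' type points the request at a different library (or none at all), which is consistent with the failure described here.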

Flood wikipedia with links to PH

Everyone knows that if it's legitimate, it's on Wikipedia. We're going to start adding links to PH on Wikipedia where relevant.

flesh out publishing / editing documentation

Caleb agreed to take a look at this. We need details on how to add reviewers, where to store data, and a few paragraphs that can be sent to authors for how they can make copy-edits.

Also, shouldn't these be in a /documentation directory so they're easy to find?

Find any problems in bulk pandoc processing workflow

@fredgibbs @williamjturkel

This afternoon I worked on a way to take all of the lessons on the old Programming Historian site and convert them to Jekyll-friendly Markdown, using programmatic techniques learned from the Programming Historian!

I've committed my experiments on the master branch instead of on the gh-pages branch, so that we can figure out problems with this workflow before moving forward and pushing things to the live site.

Basically, my workflow went like this:

  1. I downloaded all the original HTML lessons from the Programming Historian site using wget. I have placed these files under version control and pushed them in f454f1c, so we can always roll back to those original files.
  2. I pre-processed the HTML using Beautiful Soup and prep_for_pandoc.py. This let me extract metadata such as title, author, date, and reviewers' names and put them in meta tags that Pandoc would recognize, and throw away parts of the HTML that we won't want in the Markdown version (like the navigation menus that will now be part of the Jekyll templates).
  3. I then ran a specially configured Pandoc command on all of the pre-processed HTML using process_with_pandoc.sh.
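The final step can be sketched roughly as below. This is not the content of process_with_pandoc.sh, which is not shown in this issue; the directory names are hypothetical and the options are just Pandoc's standard HTML-to-Markdown switches.

```python
import subprocess
from pathlib import Path
from typing import List


def pandoc_command(html_path: Path, out_dir: Path) -> List[str]:
    """Build the Pandoc invocation that converts one HTML file to Markdown."""
    out_path = out_dir / (html_path.stem + ".md")
    return [
        "pandoc",
        "--from", "html",
        "--to", "markdown",
        "--standalone",  # keep the metadata block in the output
        "--output", str(out_path),
        str(html_path),
    ]


def convert_all(html_dir: Path, out_dir: Path) -> None:
    """Run Pandoc on every pre-processed HTML file (requires pandoc on the PATH)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for html_path in sorted(html_dir.glob("*.html")):
        subprocess.run(pandoc_command(html_path, out_dir), check=True)


# Hypothetical paths, shown only to illustrate the generated command.
print(pandoc_command(Path("preprocessed/working-with-text-files.html"), Path("lessons")))
```

Keeping the command in one function makes it easy to tweak options and rerun the whole conversion after reverting the previous attempt.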

These are really easy scripts to run, so we can easily revert the conversion commit, make tweaks to the scripts, and try again.

Right now, the converted lessons are in the lessons folder on the master branch. (I think it may have overwritten the lessons you had already converted, but these are still preserved on the gh-pages branch.)

I think our best next step would be to do a semi-careful review to see if there are problems in the converted files that we may be able to fix with the bulk conversion process itself. Using Pandoc filters, we can adjust quite a bit at this stage, so I think we should use this issue to note anything that looks weird about the converted files. (EDIT: I've now done some pandoc filter work, and you can see the filter.)

Here are a few things I see that we'll need to attend to:

  • image tags will need to be adjusted for the new paths (I've addressed this in 6faf2b7)
  • some lessons had attributes in header tags; here's an example (fixed in f33f781)
  • code blocks will need to be modified to fit the jekyll syntax (this is a big issue that will take a Pandoc filter, I imagine, and some more pre-processing) (see comment below)
  • what will we do with comments? (see comment below)
  • internal links scattered throughout are broken (but this may be fixable with a Pandoc filter, too)
  • I followed Fred's practice of combining both technical and literary reviewers into one "reviewers" field in the YAML metadata block, but if we want to preserve a distinction between the two, that's possible; will need to do that at this stage and tweak the bulk processing script if so.
  • I have removed some things that looked to me like cruft (for example, the navigation pager at the bottom of each lesson, which linked to "previous" and "next" lessons; these don't even currently show up on the original site, and "previous" and "next" were being determined by post number within WordPress, rather than by the site's logical structure).

That's from a very cursory glance. What do you see? Should we bring in the other team members to take a look at this stage, or try to get things even more polished first?

Also not sure if we should open separate issues for each of the above items at this point, or just keep a running list here until we clarify what needs to be done.

Having a problem opening URLs in personal Zotero library - Zotero API lesson 3

In the third Zotero lesson, I have repeatedly run into problems running the following code from my text editor:
[screenshot of the code, 2014-09-17, not reproduced]
I am using my own personal Zotero library, and I made sure the first two entries in my library had URLs attached to them. Even so, the text editor flagged a problem with the following line:
[screenshot of the flagged line, 2014-09-17, not reproduced]
With the sample Zotero library, the first two entries have URLs that point to simple HTML pages, so opening them is not a problem. The code seems unable to handle more complicated websites.

Check individual Markdown lessons for errors

In the wake of our bulk conversion to Jekyll, we need to do a close editorial check of each of the Markdown lessons listed below, which are kept in the lessons directory and can be seen live at http://programminghistorian.github.io/jekyll/lessons.

The best way to spot check is probably to look at the live GitHub Pages version of a page alongside its older, ProgrammingHistorian.org version. Be aware that you may see some differences in spacing and color (for example, in the code blocks, which also lack line numbers in the new site); these can be adjusted later with CSS, so they don't require changes to the Markdown files at this stage.

Each lesson check needs to include, at a minimum:

  • Clicking on every link to make sure it works
  • Fixing any incorrect Markdown syntax, using Markdown Style Guide as a reference
  • Identifying any broken images and finding copies of the original images (please add a comment here listing the urls of the images that need to be added, unless you already know how to "push" the images directly to the images folder)
  • Turning any images that were clickable and had captions on the old site into "figures," using the instructions on the Markdown Style Guide
  • General proofreading for typos
  • If there is nothing showing up in the "About the authors" box at the bottom of your lesson, then you need to add a bio for each author in the _config.yml file, as explained in Adding New Lessons.
  • Restore any missing italics (using the *emphasis* Markdown syntax) or missing monospace font changes (using the "back ticks" inline code Markdown syntax); particularly in some of the early lessons, there were "span" tags for these things that were lost in the conversion. If you notice lots of missing italics / code styling in your lesson, you can also look on the spans branch for deprecated versions of the lessons that do have most of this inline span styling preserved.

Our wiki has instructions on Editing Lessons on GitHub.

If you see a systemic problem, or an issue in a lesson that you're not sure how to fix, see the instructions on Reporting Issues so that others can help out.

To "claim" a lesson for proofreading, please leave a comment on this issue listing the ones that you will look at, so we don't duplicate efforts. I'll try to update this list to reflect the lessons claimed in the comments.

Lesson List

When you've finished checking a lesson, the only thing left to "check" is the box!

  • applied-archival-downloading-with-wget.md (Caleb)
  • automated-downloading-with-wget.md (Caleb)
  • cleaning-data-with-openrefine.md (Adam) - missing datafile phm-collection.tsv
  • cleaning-ocrd-text-with-regular-expressions.md (Caleb)
  • code-reuse-and-modularity.md (Bill)
  • counting-frequencies-from-zotero-items.md (Bill; zip file fixed)
  • counting-frequencies.md (Bill; zip file fixed)
  • creating-an-omeka-exhibit.md
  • creating-and-viewing-html-files-with-python.md (Bill)
  • creating-new-items-in-zotero.md
  • data-mining-the-internet-archive.md (Caleb)
  • downloading-multiple-records-using-query-strings.md (Adam)
  • from-html-to-list-of-words-1.md (Allison; zip file fixed)
  • from-html-to-list-of-words-2.md (Allison; zip file fixed)
  • georeferencing-qgis.md (Caleb)
  • googlemaps-googleearth.md (Caleb)
  • lessons/index.md (fred)
  • installing-python-modules-pip.md (fred)
  • intro-to-beautiful-soup.md (fred)
  • intro-to-the-zotero-api.md (Bill)
  • introduction-and-installation.md (Adam; now in table of contents)
  • keywords-in-context-using-n-grams.md (Bill)
  • linux-installation.md (fred)
  • mac-installation.md (fred)
  • manipulating-strings-in-python.md (Bill; zip file fixed)
  • normalizing-data.md
  • output-data-as-html-file.md (Bill)
  • output-keywords-in-context-in-html-file.md (Bill)
  • preserving-your-research-data.md
  • qgis-layers.md (Caleb)
  • sustainable-authorship-in-plain-text-using-pandoc-and-markdown.md (Caleb)
  • topic-modeling-and-mallet.md (Caleb)
  • transliterating.md (Allison)
  • understanding-regular-expressions.md (Miriam)
  • up-and-running-with-omeka.md (Miriam)
  • vector-layers-qgis.md (Caleb)
  • viewing-html-files.md (Bill)
  • windows-installation.md (fred)
  • working-with-text-files.md (Bill)
  • working-with-web-pages.md (Bill)

Things lost in bulk conversion from old site

The bulk conversion methods that I used (described in #4) were able to carry over most of the necessary structure and metadata from the old website. But some things that were in the original HTML pages did not survive the conversion process.

I'll divide these things into two categories:

Things Removed from Markdown Versions that Could be Automatically Restored with a new Bulk Conversion

  • comments (see discussion on #4; these can also be added back later with Disqus)
  • navigation links to "next" and "previous" lessons
  • distinction between literary and technical reviewers
  • nested url structures; all lessons are now under the lessons directory, whereas on the old site some (like the Zotero lessons) were under a subdirectory

Things Lost from HTML Versions that Can't Be Automatically Restored in Bulk

  • tables (we'll have to make Markdown versions manually)
  • images that were also hyperlinks, as well as figure environments and figcaptions (as documented on the repo wiki, we will need to add these back, manually, using Liquid template variables)
  • any kind of styling that depended on span tags or style attributes within other tags (the filename and username span classes were converted to Markdown inline code, but any attributes of those spans were lost; for example, the regular expressions lesson had some red and green highlighting spans that would have to be manually restored)
  • images that were hosted externally; these will currently show up as broken images in our markdown lessons; we'll need to locate the images and save local versions into our image directory
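Externally hosted images could be located with a small script before anyone hunts for local copies. The sketch below assumes the lessons use standard Markdown image syntax; the function name and the sample text are hypothetical.

```python
import re
from typing import List

# Captures the URL of a Markdown image: ![alt](url) or ![alt](url "title").
IMAGE_PATTERN = re.compile(r'!\[[^\]]*\]\((\S+?)(?:\s+"[^"]*")?\)')


def external_images(markdown: str) -> List[str]:
    """Return the image URLs that point to another host rather than a local path."""
    return [
        url
        for url in IMAGE_PATTERN.findall(markdown)
        if url.startswith(("http://", "https://"))
    ]


# Hypothetical sample text, for illustration only.
sample = '![local](../images/a.png) and ![remote](http://example.com/b.png "A title")'
print(external_images(sample))
```

Running this over each file in the lessons directory would give a list of images to download and save into the image directory.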

Things that are Style-Related that Can Be Added with CSS/Javascript

  • line numbers in code blocks

Please report other things that appear to have gone away in the bulk conversion, so I can categorize them on the above lists. If things come up that can be carried over by tweaking our bulk conversion workflow, we should do that before making tweaks to individual markdown files that would be overwritten by the bulk conversion process.

Minor mistake in "Counting and mining research data with Unix" lesson?

Hi,

Where you show the ci flag you say:

grep -ci revolution 2014-01-31_JA_america.tsv 2014-02-02_JA_britain.tsv. This repeats the query, but prints a case insensitive count (including instances of both revolution and Revolution). Note how the count has increased nearly 30 fold for those journal article titles that contain the keyword ‘america’.

Shouldn't that be "keyword revolution"?
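To make the distinction concrete, here is a rough Python equivalent of what `grep -c` and `grep -ci` count, namely the number of matching lines. The titles are invented sample data, not the contents of the lesson's files.

```python
def count_matching_lines(lines, keyword, ignore_case=False):
    """Count lines containing keyword, like grep -c (or grep -ci when ignore_case is True)."""
    if ignore_case:
        keyword = keyword.lower()
        return sum(1 for line in lines if keyword in line.lower())
    return sum(1 for line in lines if keyword in line)


# Invented sample titles, for illustration only.
titles = ["The American Revolution", "A revolution in print", "Colonial trade"]
print(count_matching_lines(titles, "revolution"))                    # 1
print(count_matching_lines(titles, "revolution", ignore_case=True))  # 2
```

In both calls the keyword being searched for is "revolution"; "america" only identifies which file is being searched.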

pandoc citation syntax issue in "Sustainable Authorship ..."

In http://programminghistorian.org/lessons/sustainable-authorship-in-plain-text-using-pandoc-and-markdown you suggest using the following

Some sentence that needs citation.^[@fyfe_digital_2011 argues that too.]

This is not recommended, since it keeps you from switching easily between footnote and author-date styles. It is better to use the following (no circumflex, no final period inside the square brackets, and the final punctuation of the sentence after the square brackets; with footnote styles, pandoc automatically adjusts the position of the final punctuation):

Some sentence that needs citation [@fyfe_digital_2011 argues that too].

Categorize each lesson with tags

Per Skype call on August 15:

  • @fredgibbs is going to reorganize the lessons directory under non-alphabetic categories (like "Getting Started," "Tools," etc.)
  • @wcaleb is going to tag each lesson with categories; the categories will include, at a minimum, the topics under which the lessons are currently listed on the category page, so that the Lessons Directory can be automatically generated.
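One way the directory could be generated from tags is sketched below. The lesson-to-category assignments are hypothetical, and in practice the tags would be read from each lesson's metadata rather than from a hard-coded dictionary.

```python
from collections import defaultdict


def lessons_by_category(lesson_tags):
    """Invert a lesson -> categories mapping into category -> sorted lessons."""
    directory = defaultdict(list)
    for lesson, tags in lesson_tags.items():
        for tag in tags:
            directory[tag].append(lesson)
    return {tag: sorted(lessons) for tag, lessons in sorted(directory.items())}


# Hypothetical tag assignments, for illustration only.
tags = {
    "mac-installation.md": ["Getting Started"],
    "windows-installation.md": ["Getting Started"],
    "topic-modeling-and-mallet.md": ["Tools"],
}
for category, lessons in lessons_by_category(tags).items():
    print(category, lessons)
```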

Add button to select all text in a code block

There is currently a JavaScript script in our repo that, if enabled, will place a small button beside each code block. When the button is clicked, all the text in the code block will be selected so that users can easily copy it to their clipboard.

Currently, the button is floated to the right of the code block outside the main container, sort of like the sidebar comment icons on a site like Medium. The button is grayed out, as in the first example pictured, until the mouse hovers over the code block or clicks on the button, as in the second example pictured.

[image: select_button, not reproduced]

Before enabling this across the site, a couple of issues need to be decided:

  • should we use the current icon, or pick a different one (preferably from Font Awesome, for the sake of design consistency)?
  • should the button be placed elsewhere, i.e. above or below the code block? (placing it within the code block doesn't work well because text will run over it)
  • should we just hide the button entirely until someone hovers over the code block, at which point it is revealed?

Markdown reference links or inline links?

@fredgibbs, as mentioned in #2, I'm thinking of using Pandoc to convert all the existing lessons to Markdown as a first stab at transition to the new site.

Wondering if we might want to use Markdown "reference links" (which put all URLs at the bottom of the file) instead of inline links, as it would make the Markdown more readable on GitHub?
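For lessons converted by Pandoc, its `--reference-links` option produces this style directly. For files that already exist as Markdown, the transformation could look roughly like the sketch below, which numbers the references and ignores images and link titles; the function name is hypothetical.

```python
import re

# Inline links of the form [text](url); the lookbehind skips images (![alt](url)).
INLINE_LINK = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)\)")


def to_reference_links(markdown: str) -> str:
    """Rewrite inline links as numbered reference links listed at the end of the text."""
    urls = []  # unique URLs, in order of first appearance

    def replace(match):
        text, url = match.group(1), match.group(2)
        if url not in urls:
            urls.append(url)
        return f"[{text}][{urls.index(url) + 1}]"

    body = INLINE_LINK.sub(replace, markdown)
    if not urls:
        return body
    refs = "\n".join(f"[{i}]: {url}" for i, url in enumerate(urls, start=1))
    return body.rstrip("\n") + "\n\n" + refs + "\n"


print(to_reference_links("See [Pandoc](http://pandoc.org) and [again](http://pandoc.org)."))
```

Repeated URLs share one reference number, which keeps the list at the bottom of the file short.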

How should lessons directory be organized?

  • alphabetically?
  • by category?
  • by level of difficulty? (perhaps with an icon to indicate level?)
  • by "workflow path"? (that is, do this lesson, and then do this lesson that builds on it, etc.)

reviewer instructions

Do we have a standard set of instructions that we send to reviewers, covering what we want them to do, what we expect, and what constitutes a solid review? Is it publicly visible?

How should we highlight code blocks?

@fredgibbs I'm preparing to use Pandoc to do a massive bulk conversion of all existing lessons into Markdown. Before doing so, I've been doing some thinking about how code blocks should be handled in the new site.

In the lessons you've already converted, you used Jekyll's preferred highlighting feature. The problem with this method is that when you view the Markdown files on GitHub, the GitHub browser doesn't recognize those highlighting tags, and instead interprets lines that should be comments in a code block as level-1 headers.

I'm wondering if it might be better (and would make us less dependent on Jekyll in the long run) to instead use raw HTML tags like pre for code blocks, and then use CSS styling to handle the language-specific highlighting. That way the code blocks would be readable both on the site and on GitHub.
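If the raw HTML route were chosen, the already-converted lessons could be rewritten with something like the sketch below. It assumes the blocks use Jekyll's `{% highlight LANG %}` and `{% endhighlight %}` tags; putting the language name in a `class` attribute is a hypothetical convention that the CSS or a highlighter would need to match.

```python
import html
import re

# A Jekyll highlight block: {% highlight LANG %} ... {% endhighlight %}
HIGHLIGHT_BLOCK = re.compile(
    r"\{%\s*highlight\s+(\w+)\s*%\}\n(.*?)\{%\s*endhighlight\s*%\}",
    re.DOTALL,
)


def liquid_to_pre(markdown: str) -> str:
    """Replace Jekyll highlight blocks with raw <pre><code> HTML."""

    def replace(match):
        language, code = match.group(1), match.group(2)
        # Escape &, < and > so the code is displayed literally in HTML.
        escaped = html.escape(code, quote=False)
        return f'<pre><code class="{language}">{escaped}</code></pre>'

    return HIGHLIGHT_BLOCK.sub(replace, markdown)


source = "{% highlight python %}\n# a comment\nprint(1 < 2)\n{% endhighlight %}"
print(liquid_to_pre(source))
```

Because the `# a comment` line ends up inside a pre element, a Markdown renderer has no reason to treat it as a header.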

Thoughts on this issue? Here are a couple of (oldish) sites that outline some of the different possibilities:

Create contribute page

This will combine the current Submissions, Contact, and Get Involved pages. We will replace "submissions" in the nav bar with "contribute."

PH to scholarship @ western

Bill suggested we put the PH in the Scholarship @ Western portal (at his university), as that seems to have gotten material into WorldCat.

directories for lessons?

I wonder whether it would be easier, especially in the long term, to move lessons through the pipeline and maintain them if we changed the layout. Instead of putting all lessons as files in the lessons directory, all images in the image directory, and so on, we could put all assets associated with a single lesson (and the lesson .md itself) in a single directory with the same name as the lesson, under the main lessons directory. That way all assets (images, files, maybe screencasts or videos in the future) for each lesson stay with the lesson. This shouldn't change the main lesson URLs, though we'd need some scripting to change the image links in the existing lesson files. I think there are distinct advantages for editing and making new lessons with this approach. Are there procedural reasons not to do this (other than the work involved in making the switch)?

Change location of drafts

I'd like to propose that we handle draft lessons differently than we have been.

Right now, all drafts are being placed in a drafts folder, which is then excluded from the site build by a line in our config file.

My proposal is that we instead keep all lessons, draft or published, in the lessons folder. We will keep drafts from going "live" by putting a published: false line in the YAML metadata block until it's time to publish the lesson.

The advantages of this are as follows:

  • We can make sure all relative links and image URLs stay consistently the same from the draft phase all the way to publication, eliminating some possibilities for error on this.
  • An editor can move a draft from "draft" to published without having to do any git mv commands in the terminal; publication becomes a simple matter of removing the published: false line, and this can be done from within the web browser.

The one disadvantage I can foresee is that it will be more difficult to distinguish published from unpublished lessons, but perhaps we can keep a running list of these in the Wiki. You can also tell from the commit history or the dates when lessons were edited which ones are in production.
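The running list of drafts could also be generated rather than maintained by hand. This is a minimal sketch, assuming each lesson begins with a YAML block delimited by `---` lines and that drafts carry the `published: false` line proposed above; the function names are hypothetical, and the YAML is checked line by line rather than fully parsed.

```python
from pathlib import Path


def is_draft(markdown_text: str) -> bool:
    """Return True if the YAML metadata block contains 'published: false'."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False  # no metadata block at the top of the file
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the metadata block
        key, _, value = line.partition(":")
        if key.strip() == "published" and value.strip().lower() == "false":
            return True
    return False


def list_drafts(lessons_dir):
    """List the Markdown files in lessons_dir that are marked as unpublished."""
    return sorted(
        path.name
        for path in Path(lessons_dir).glob("*.md")
        if is_draft(path.read_text(encoding="utf-8"))
    )


print(is_draft("---\ntitle: Example\npublished: false\n---\nLesson text"))
```

Calling `list_drafts("lessons")` from the repository root would then report which lessons are still in production.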

Thoughts?
