programminghistorian / jekyll

Jekyll-based static site for The Programming Historian

Home Page: http://programminghistorian.org

HTML 98.14% Python 0.01% Shell 0.01% CSS 0.09% JavaScript 0.06% Ruby 0.02% Jupyter Notebook 1.67% R 0.01%

programming-historian text-analysis api data-management data-manipulation data-mining pedagogy linked-open-data mapping network-analysis

jekyll's Issues

broken links in mallet lesson

(See [Mac][], [Windows,][Mac][Linux][Mac] for installation instructions)

These aren't linked.

Please remove all instances of jekyll from any links to resources such as JavaScript or CSS files and from links to other parts of the website. Once this is done I will make the appropriate changes on our DNS servers.
For example, the links for CSS should look like "css/style.css" and links for other pages should look like "/about", instead of "/jekyll/css/style.css" and "/jekyll/about".
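
As a rough illustration of the change requested here, this Python sketch strips a leading `/jekyll` path segment from `href` and `src` attributes. The function name `strip_jekyll_prefix` and the sample markup are hypothetical, and the sketch produces root-relative paths such as `/css/style.css`. The real templates may build their links differently (for example through a base URL setting), so this is not necessarily how the fix was made.

```python
import re

# Matches href="/jekyll/..." or src="/jekyll/..." with single or double quotes.
# A slash must follow "/jekyll", so unrelated URLs are left untouched.
_PREFIX = re.compile(r'''\b(href|src)=(["'])/jekyll(?=/)''')


def strip_jekyll_prefix(html):
    """Return html with the leading /jekyll segment removed from links."""
    return _PREFIX.sub(r"\1=\2", html)


print(strip_jekyll_prefix('<link href="/jekyll/css/style.css">'))
print(strip_jekyll_prefix('<a href="/jekyll/about">About</a>'))
```

External links such as `http://jekyllrb.com/` are not changed, because the pattern only matches paths that begin with `/jekyll/`.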

send intro to bash to live site so author can proof.

sorry i don't know how to do this.

Update documentation to require large datasets to be put on Zenodo

"Programming Historian" in header links to page instead of index

Fix permalinks broken by conversion

Some of the lessons on the old site had nested URLs (especially the Zotero ones). If we are unable to fix these with redirects, so as to make sure inbound links still work, it is also possible to specify permalinks on a post-by-post basis in the lesson front matter.

add author name to lessons listing on github?

I can't find the lessons I'm working on, because we don't have the title or the authors listed. Just filenames. Is there a way we can change the display on https://github.com/programminghistorian/jekyll/tree/gh-pages/lessons to make it more editor-friendly?

Change name of "blog" to "news"

"Tech" spans in old lessons lost in conversion

See "Manipulating Strings in Python" for an example.

Also span class "reserved," which seems to be for inline code.

Fix tables

Find lessons that had tables and fix Markdown syntax.

Need to change 'group' to 'user' when using personal Zotero library in Zotero API lessons

In the first Zotero API lesson, the user is prompted to use either the sample Zotero library or their own personal library. If you are using your own Zotero library, then the code needs to be changed from 'group' to 'user', otherwise the code will not compute when it's run in a text editor. There also needs to be a reminder in the subsequent second and third Zotero API lessons so users are able to run the code with no problems.
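
The lesson's own code is not reproduced in this issue, so the following is only a sketch of the distinction. It assumes the path layout of the public Zotero Web API, where personal libraries live under `/users/<id>` and shared libraries under `/groups/<id>`. The function name `library_items_url` and the ID `123456` are hypothetical.

```python
def library_items_url(library_type, library_id):
    """Build the items URL for a personal ('user') or shared ('group') library."""
    if library_type not in ("user", "group"):
        raise ValueError("library_type must be 'user' or 'group'")
    # The API path segment is the plural form: /users/... or /groups/...
    return f"https://api.zotero.org/{library_type}s/{library_id}/items"


print(library_items_url("group", "123456"))  # shared sample library
print(library_items_url("user", "123456"))   # your own library
```

Passing the wrong library type for a given ID points the request at a different library, which is why the lessons need the reminder described above.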

Flood wikipedia with links to PH

Everyone knows if it's legitimate, it's on wikipedia. We're going to start adding some links to PH on wikipedia where relevant.

flesh out publishing / editing documentation

Caleb agreed to take a look at this. We need details on how to add reviewers, where to store data, and a few paragraphs that can be sent to authors for how they can make copy-edits.

Also, shouldn't these be in a /documentation directory so they're easy to find?

Find any problems in bulk pandoc processing workflow

@fredgibbs @williamjturkel

This afternoon I worked on a way to take all of the lessons on the old Programming Historian site and convert them to Jekyll-friendly markdown---using programmatic techniques learned from the Programming Historian!

I've committed my experiments on the master branch instead of on the gh-pages branch, so that we can figure out problems with this workflow before moving forward and pushing things to the live site.

Basically, my workflow went like this:

I downloaded all the original HTML lessons from the Programming Historian site using wget. I have placed these files under version control and pushed them in f454f1c, so we can always roll back to those original files.
I pre-processed the HTML using Beautiful Soup and prep_for_pandoc.py. This enabled me to get metadata like title, author, date, and reviewers' names and put them in meta tags that Pandoc would recognize, and to throw away parts of the HTML that we won't want in the Markdown version (like the navigation menus that will now be part of the Jekyll templates).
I then ran a specially configured Pandoc command on all of the pre-processed HTML using process_with_pandoc.sh.
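
The pre-processing step can be pictured with a small sketch. This is not prep_for_pandoc.py: it uses only the Python standard library rather than Beautiful Soup, the function name `wrap_with_meta` and the metadata keys are hypothetical, and whether a given Pandoc version reads these meta tags as document metadata is an assumption to verify.

```python
from html import escape


def wrap_with_meta(body_html, metadata):
    """Wrap a lesson body in a minimal HTML page that carries metadata.

    The title goes in <title>; every other key becomes a
    <meta name="..." content="..."> tag inside <head>.
    """
    meta = dict(metadata)  # copy so the caller's dict is not modified
    title = meta.pop("title", "")
    tags = [
        f'<meta name="{escape(name)}" content="{escape(str(value))}">'
        for name, value in meta.items()
    ]
    head = "\n".join([f"<title>{escape(title)}</title>", *tags])
    return (
        f"<html>\n<head>\n{head}\n</head>\n"
        f"<body>\n{body_html}\n</body>\n</html>"
    )


page = wrap_with_meta(
    "<p>Lesson text</p>",
    {"title": "Sample Lesson", "author": "Jane Doe", "date": "2014-01-01"},
)
print(page)
```

Only the lesson body is passed in, so navigation menus and other site chrome are left behind in the original file.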

These are really easy scripts to run, so we can easily revert the conversion commit, make tweaks to the scripts, and try again.

Right now, the converted lessons are in the lessons folder on the master branch. (I think it may have overwritten the lessons you had already converted, but these are still preserved on the gh-pages branch.)

I think our best next step would be to do a semi-careful review to see if there are problems in the converted files that we may be able to fix with the bulk conversion process itself. Using Pandoc filters, we can adjust quite a bit at this stage, so I think we should use this issue to note anything that looks weird about the converted files. (EDIT: I've now done some pandoc filter work, and you can see the filter.)

Here are a few things I see that we'll need to attend to:

~~image tags will need to be adjusted for the new paths~~ (I've addressed this in 6faf2b7)
~~some lessons had attributes in header tags; here's an example~~ (fixed in f33f781)
~~code blocks will need to be modified to fit the jekyll syntax (this is a big issue that will take a Pandoc filter, I imagine, and some more pre-processing)~~ (see comment below)
~~what will we do with comments?~~ (see comment below)
internal links scattered throughout are broken (but this may be fixable with a Pandoc filter, too)
I followed Fred's practice of combining both technical and literary reviewers into one "reviewers" field in the YAML metadata block, but if we want to preserve a distinction between the two, that's possible; will need to do that at this stage and tweak the bulk processing script if so.
I have removed some things that looked to me like cruft (for example, the navigation pager at the bottom of each lesson, which linked to "previous" and "next" lessons; these don't even currently show up on the original site, and "previous" and "next" were being determined by post number within WordPress, rather than by the site's logical structure).
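
For the broken internal links in the list above, a Pandoc JSON filter is one possible route. The sketch below is not the filter used in this repo. It assumes that a Link node in Pandoc's JSON output keeps its `[url, title]` pair as the last field of `"c"`, and the rewrite rule `drop_old_host` is a hypothetical example.

```python
import json


def walk_links(node, rewrite):
    """Recursively visit a Pandoc JSON AST, rewriting every Link target."""
    if isinstance(node, list):
        for child in node:
            walk_links(child, rewrite)
    elif isinstance(node, dict):
        if node.get("t") == "Link":
            target = node["c"][-1]  # [url, title] is the last field
            target[0] = rewrite(target[0])
        for child in node.values():
            walk_links(child, rewrite)


def drop_old_host(url, host="http://programminghistorian.org"):
    """Hypothetical rule: turn links to the old host into site-relative ones."""
    if url == host:
        return "/"
    if url.startswith(host + "/"):
        return url[len(host):]
    return url


def filter_json(text, rewrite):
    """Apply rewrite to every link in a Pandoc JSON document string."""
    ast = json.loads(text)
    walk_links(ast, rewrite)
    return json.dumps(ast)
```

To use it as a filter, a script would pass the contents of standard input to `filter_json` and write the result to standard output, with Pandoc producing and consuming the JSON on either side.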

That's from a very cursory glance. What do you see? Should we bring in the other team members to take a look at this stage, or try to get things even more polished first?

Also not sure if we should open separate issues for each of the above items at this point, or just keep a running list here until we clarify what needs to be done.

Add Disqus comments to lessons and posts

Need to wait to do this until after custom domain has been set up.

Need to add a line about an author

In the lesson Sustainable Authorship in Plain Text using Pandoc and Markdown, section "About the authors", Grant Wythoff is mentioned, but Dennis Tenen is not. This doesn't seem right: more info needed, I guess!

Best wishes,

Aurélien

Figure out how Jekyll RSS works

Add link to commit history in footer of each page

send research-data-with-unix.md to live site for author proof

thanks.

Figure out where to put zipped code samples from old site

Recreate old blog on new static site

Move posts out of lessons, and fix syntax.

Having a problem opening URLs in personal Zotero library - Zotero API lesson 3

In the third Zotero lesson I have repeatedly run into problems attempting to run the following set of code in my text editor:

I am using my own personal Zotero library, and I made sure the first two entries in my library had URLs attached to them. Even still the text editor identified a problem with the following line:

With the sample Zotero library, the first two entries have URLs that are simple HTML and thus it is not a problem to open them. It seems as though it cannot run the code with more complicated websites.

Check individual Markdown lessons for errors

In the wake of our bulk conversion to Jekyll, we need to do a close editorial check of each of the Markdown lessons listed below, which are kept in the lessons directory and can be seen live at http://programminghistorian.github.io/jekyll/lessons.

The best way to spot check is probably to look at the live GitHub Pages version of a page alongside its older, ProgrammingHistorian.org version. Be aware that you may see some differences in spacing and color (for example, in the code blocks, which also lack line numbers in the new site); these can be adjusted later with CSS, so they don't require changes to the Markdown files at this stage.

Each lesson check needs to include, at a minimum:

Clicking on every link to make sure it works
Fixing any incorrect Markdown syntax, using Markdown Style Guide as a reference
Identifying any broken images and finding copies of the original images (please add a comment here listing the urls of the images that need to be added, unless you already know how to "push" the images directly to the images folder)
Turning any images that were clickable and had captions on the old site into "figures," using the instructions on the Markdown Style Guide
General proofreading of any typos
If there is nothing showing up in the "About the authors" box at the bottom of your lesson, then you need to add a bio for each author in the _config.yml file, as explained in Adding New Lessons.
Restore any missing italics (using the *emphasis* Markdown syntax) or missing monospace font changes (using the "back ticks" inline code Markdown syntax); particularly in some of the early lessons, there were "span" tags for these things that were lost in the conversion. If you notice lots of missing italics / code styling in your lesson, you can also look on the spans branch for deprecated versions of the lessons that do have most of this inline span styling preserved.

Our wiki has instructions on Editing Lessons on GitHub.

If you see a systemic problem, or an issue in a lesson that you're not sure how to fix, see the instructions on Reporting Issues so that others can help out.

To "claim" a lesson for proofreading, please leave a comment on this issue listing the ones that you will look at, so we don't duplicate efforts. I'll try to update this list to reflect the lessons claimed in the comments.

Lesson List

When you've finished checking a lesson, the only thing left to "check" is the box!

Bring project team directory up to date

move Carrie and Jeremy to emeritus section
add Caleb
add bios, contact links, and avatars to author directory in config

Front page should be the lessons directory

We need to see whether we have lots of pages linking to lessons/. If not, we can just replace the index for the whole site with the lessons directory.

Things lost in bulk conversion from old site

The bulk conversion methods that I used (described in #4) were able to carry over most of the necessary structure and metadata from the old website. But some things that were in the original HTML pages did not survive the conversion process.

I'll divide these things into three categories:

Things Removed from Markdown Versions that Could be Automatically Restored with a new Bulk Conversion

comments (see discussion on #4; these can also be added back later with Disqus)
navigation links to "next" and "previous" lessons
distinction between literary and technical reviewers
nested url structures; all lessons are now under the lessons directory, whereas on the old site some (like the Zotero lessons) were under a subdirectory

Things Lost from HTML Versions that Can't Be Automatically Restored in Bulk

tables (we'll have to make Markdown versions manually)
images that were also hyperlinks, as well as figure environments and figcaptions (as documented on the repo wiki, we will need to add these back, manually, using Liquid template variables)
any kind of styling that depended on span tags or style attributes within other tags (the filename and username span classes were converted to Markdown inline code, but any attributes of those spans were lost; for example, the regular expressions lesson had some red and green highlighting spans that would have to be manually restored)
images that were hosted externally; these will currently show up as broken images in our markdown lessons; we'll need to locate the images and save local versions into our image directory

Things that are Style-Related that Can Be Added with CSS/Javascript

line numbers in code blocks

Please report other things that appear to have gone away in the bulk conversion, so I can categorize them on the above lists. If things come up that can be carried over by tweaking our bulk conversion workflow, we should do that before making tweaks to individual markdown files that would be overwritten by the bulk conversion process.

Minor mistake in "Counting and mining research data with Unix" lesson?

Hi,

Where you show the ci flag you say:

grep -ci revolution 2014-01-31_JA_america.tsv 2014-02-02_JA_britain.tsv. This repeats the query, but prints a case insensitive count (including instances of both revolution and Revolution). Note how the count has increased nearly 30 fold for those journal article titles that contain the keyword ‘america’.

Shouldn't that be "keyword revolution"?
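
For readers unsure what the `-ci` flags count, here is a small Python sketch of the same idea: the number of lines that contain the keyword, ignoring case. The sample titles are invented for illustration, and the sketch does plain substring matching rather than grep's regular expressions.

```python
def count_matching_lines(lines, keyword, ignore_case=True):
    """Count lines containing keyword, as grep -c does (grep -ci if ignore_case)."""
    if ignore_case:
        needle = keyword.lower()
        return sum(1 for line in lines if needle in line.lower())
    return sum(1 for line in lines if keyword in line)


titles = [
    "The American Revolution",
    "A revolution in print",
    "Trade in Britain",
]
print(count_matching_lines(titles, "revolution", ignore_case=False))  # 1
print(count_matching_lines(titles, "revolution"))                     # 2
```

The count rises when case is ignored because titles containing "Revolution" are now included, which is the effect the lesson describes for the keyword revolution.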

Move bulk conversion stuff out of repo

modified_html
original_html
Python and shell scripts

pandoc citation syntax issue in "Sustainable Authorship ..."

In http://programminghistorian.org/lessons/sustainable-authorship-in-plain-text-using-pandoc-and-markdown you suggest using the following

Some sentence that needs citation.^[@fyfe_digital_2011 argues that too.]

This is not recommended, since it keeps you from switching easily between footnote and author-date styles. It is better to use the following (no circumflex, no final period inside the square brackets, and the final punctuation of the sentence after the square brackets; with footnote styles, pandoc automatically adjusts the position of the final punctuation):

Some sentence that needs citation [@fyfe_digital_2011 argues that too].

Track changes link doesn't work on blog posts

The URL for a blog post doesn't match the repo URL, so it can't find the GitHub link.

Create link on each lesson to related Issues

Would involve making a custom label for each lesson, and then making a link to those.

Categorize each lesson with tags

Per Skype call on August 15:

@fredgibbs is going to reorganize the lessons directory under non-alphabetic categories (like "Getting Started," "Tools," etc.)
@wcaleb is going to tag each lesson with categories; the categories will include, at a minimum, the topics under which the lessons are currently listed on the category page, so that the Lessons Directory can be automatically generated.

Look into getting DOI for Programming Historian releases

https://guides.github.com/activities/citable-code/

Restore "about author" divs in footers

These were thrown out in bulk conversion. Perhaps set up includes so that author bios can be dropped in on multiple lessons?

Google Analytics code to be included on all lessons

Our id number is:

UA-2752866-8

Images to represent lessons on table of contents page

Allison did a really great mockup of what we could do to represent our lessons visually:

https://dl.dropboxusercontent.com/u/58260988/PHDesign.png

We discussed this and really like the idea, but we wanted to ensure the Author names were kept, as were the blurbs about each section. We also thought some good historic images / lithographs would make a better set of images. Allison is going to take this on for next meeting.

Add button to select all text in a code block

There is currently a JavaScript file in our repo that, if enabled, will place a small button beside each code block. When the button is clicked, all the text in the code block will be selected so that the user can easily copy it to their clipboard.

Currently, the button is floated to the right of the code block outside the main container, sort of like the sidebar comment icons on a site like Medium. The button is grayed out, as in the first example pictured, until the mouse hovers over the code block or clicks on the button, as in the second example pictured.

Before enabling this across the site, a couple of issues need to be decided:

should we use the current icon, or pick a different one (preferably from Font Awesome, for the sake of design consistency)?
should the button be placed elsewhere, i.e. above or below the code block? (placing it within the code block doesn't work well because text will run over it)
should we just hide the button entirely until someone hovers over the code block, at which point it is revealed?

Markdown reference links or inline links?

@fredgibbs, as mentioned in #2, I'm thinking of using Pandoc to convert all the existing lessons to Markdown as a first stab at transition to the new site.

Wondering if we might want to use Markdown "reference links" (which put all URLs at the bottom of the file) instead of inline links, as it would make the Markdown more readable on GitHub?

Update IA lesson to reflect internetarchive module changes

Students of Ed Baptist's pointed out that the internetarchive module has changed in ways that break the code in my lesson. I need to remember to go in and fix these to conform to the new API.

Add avatars from Project Team into images directory

How should lessons directory be organized?

alphabetically?
by category?
by level of difficulty? (perhaps with an icon to indicate level?)
by "workflow path"? (that is, do this lesson, and then do this lesson that builds on it, etc.)

reviewer instructions

do we have a standard set of instructions we send to reviewers in terms of what we want them to do, what we expect, what constitutes a solid review? is it publicly visible?

How should we highlight code blocks?

@fredgibbs I'm preparing to use Pandoc to do a massive bulk conversion of all existing lessons into Markdown. Before doing so, I've been doing some thinking about how code blocks should be handled in the new site.

In the lessons you've already converted, you used Jekyll's preferred highlighting feature. But the problem with this method is that when you view the markdown files in Github, the Github browser itself doesn't recognize those highlighting codes, and instead interprets things that should be commented lines in a code block as level-1 headers.

I'm wondering if it might be better (and would make us less dependent on Jekyll in the long run) to instead use raw HTML tags like pre for code blocks, and then use CSS styling to handle the language-specific highlighting. That way they would look readable either on the site or on Github.

Thoughts on this issue? Here are a couple of (oldish) sites that outline some of the different possibilities:

Create contribute page

This will combine the current Submissions, Contact, and Get Involved pages. We will replace "submissions" in the nav bar with "contribute."

Update submissions page to detail Markdown requirement

@fredgibbs said he would do this.

Write a script to produce site as e-book and PDF

PH to scholarship @ western

Bill suggested we put the PH in the Scholarship @ Western portal (at his university), as that seems to have gotten stuff into WorldCat.

directories for lessons?

i wonder if it wouldn't be easier, especially in the long term, for moving lessons through the pipeline and maintaining them if we put all assets associated with a single lesson (and the lesson .md itself) in a single directory with the same name as the lesson under the main lessons directory, instead of putting all lessons as files in the lessons directory, all images in the image directory, etc. that way all assets (images, files, maybe screencasts or videos in the future) for each lesson stay with the lesson. this shouldn't change the main lesson URLs, though we'd need some scripting to change the image links in the existing lesson files. i think there are distinct advantages for editing and making new lessons with this approach. are there procedural reasons not to do this (other than the work involved to make the switch)?

Link to Reg Ex lesson from Applied Archival Downloading

The section on downloading Bligh's diary could probably be revised with a link to the Regular Expressions lesson so that readers can capture pages that contain letters as well as numbers in the script.

Change location of drafts

I'd like to propose that we handle draft lessons differently than we have been.

Right now, all drafts are being placed in a drafts folder, which is then excluded from the site build by a line in our config file.

My proposal is that we instead keep all lessons, draft or published, in the lessons folder. We will keep drafts from going "live" by putting a published: false line in the YAML metadata block until it's time to publish the lesson.

The advantages of this are as follows:

We can make sure all relative links and image URLs stay consistently the same from the draft phase all the way to publication, eliminating some possibilities for error on this.
An editor can move a draft from "draft" to published without having to do any git mv commands in the terminal; publication becomes a simple matter of removing the published: false line, and this can be done from within the web browser.

The one disadvantage I can foresee is that it will be more difficult to distinguish published from unpublished lessons, but perhaps we can keep a running list of these in the Wiki. You can also tell from the commit history or the dates when lessons were edited which ones are in production.
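
One way to soften that disadvantage would be a small script that lists unpublished lessons. The sketch below is a suggestion, not something in the repo: it assumes each lesson is a `.md` file in a `lessons` folder whose front matter is delimited by `---` lines, and it reads the `published` key with simple string matching rather than a full YAML parser. The function names are hypothetical.

```python
from pathlib import Path


def is_draft(markdown_text):
    """Return True if the front matter contains `published: false`."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False  # no front matter at all
    for line in lines[1:]:
        if line.strip() == "---":  # end of front matter
            break
        key, _, value = line.partition(":")
        if key.strip() == "published" and value.strip().lower() == "false":
            return True
    return False


def list_drafts(lessons_dir="lessons"):
    """Return the sorted file names of unpublished lessons in lessons_dir."""
    return sorted(
        path.name
        for path in Path(lessons_dir).glob("*.md")
        if is_draft(path.read_text(encoding="utf-8"))
    )
```

Calling `list_drafts()` from the repository root would return the draft file names, which could then be pasted into the Wiki list mentioned above.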

Thoughts?