gityll-blog's Issues

Gityll Basics

Gityll is a customizable site generator that uses GitHub issues as a CMS and Express as a backend web server.

Setup Guide

To run Gityll I suggest using a VPS from DigitalOcean or AWS EC2.

  1. git clone https://github.com/aranscope/gityll
  2. cd gityll/templates/
  3. customize the contents template and post template to your heart's desire
  4. cd ..
  5. node gityll.js [port] [git repo url]

Running Gityll will currently throw dependency errors. Until it is properly packaged, you can manually run npm install [dependency] or npm install [dependency] -g to install each missing dependency locally or globally.

Tags

Posts

These tags can be added anywhere in the template.html file.

title - the title of the post
body - the html content of the post
author - the author's name (assignee of the issue)
author_url - the author's GitHub URL
author_icon_url - the author's GitHub profile icon
time - the time the post was last modified
tags - the tags for the post

Contents

These tags can be added anywhere in the contents.html file.

body - links to all of the posts, dependent on theme
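To make the idea of template tags concrete, here is a minimal sketch of how a generator could swap tags for post data. It is written in Python rather than Node, and the {{tag}} delimiter syntax, the render function and the sample values are all assumptions for illustration; Gityll's real syntax and implementation may differ.

```python
import re

# Hypothetical placeholder syntax: {{tag}}. Gityll's real delimiters may differ.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, values):
    """Replace each {{tag}} with its value; unknown tags are left untouched."""
    def substitute(match):
        tag = match.group(1)
        return str(values.get(tag, match.group(0)))
    return PLACEHOLDER.sub(substitute, template)


# Sample template and post data (made up for this example).
post_template = "<h1>{{title}}</h1><p>by {{author}}</p>{{body}}"
post = {"title": "Gityll Basics", "author": "aranscope", "body": "<p>Hello</p>"}
html = render(post_template, post)
```

Leaving unknown tags in place, rather than replacing them with an empty string, makes a mistyped tag easy to spot in the rendered page.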

NLP Crash Course

NLP is a hot topic right now, but it has a barrier to entry that only the number of .ac.uk extensions for any Google search can truly describe. Let's try to demystify NLP and introduce the very basics.

NLP stands for 'Natural Language Processing'. 'Natural language' means it is concerned with languages that have evolved naturally, and 'processing' means trying to uncover data that is not immediately visible.

Examples of this data include:

  • keywords
  • related concepts
  • sentiment

Some example use cases include:

  • grammar checking
  • spell correction
  • translation

Let's talk about a typical NLP pipeline (sequence of events).

tokenisation

First, we need to break apart our text. This is called tokenisation; a typical approach is to split the text into words.
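As a rough illustration, a word-level tokeniser can be sketched with a single regular expression. This is a toy, not what NLTK does internally, and the tokenise function name and the pattern are my own assumptions.

```python
import re


def tokenise(text):
    """Split text into word tokens and standalone punctuation marks."""
    # A word is a run of letters/digits with an optional apostrophe part
    # ("isn't"); anything else that is not whitespace becomes its own token.
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?|[^\sA-Za-z0-9]", text)


tokens = tokenise("Jim bought 300 shares of Acme Corp. in 2006.")
# ['Jim', 'bought', '300', 'shares', 'of', 'Acme', 'Corp', '.', 'in', '2006', '.']
```

Notice that 'Corp.' is split into two tokens. Abbreviations like this are one reason real tokenisers are more involved than a one-line regex.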

lemmatisation / stemming

Next is lemmatisation, where we convert our tokens into their base form (lemma), e.g. Running -> Run. Stemming is a cruder alternative that simply strips inflectional endings, so its output is not always a real word.
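The difference between the two can be sketched in a few lines: a stemmer applies suffix rules, while a lemmatiser also needs a dictionary for irregular forms. The suffix list, the tiny IRREGULAR table and both function names below are illustrative assumptions, far simpler than a real stemmer or lemmatiser.

```python
SUFFIXES = ("ing", "ed", "es", "s")

# Tiny hand-written lemma table (illustrative only).
IRREGULAR = {"ran": "run", "better": "good", "mice": "mouse"}


def stem(token):
    """Strip one common inflectional ending, keeping at least 3 letters."""
    word = token.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            # Undo consonant doubling: "runn" -> "run" (but keep "pass").
            if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiouls":
                word = word[:-1]
            return word
    return word


def lemmatise(token):
    """Look up irregular forms first, then fall back to the stemmer."""
    word = token.lower()
    return IRREGULAR.get(word, stem(word))
```

Here stem("Running") gives "run", but stem("ran") stays "ran" because no suffix rule applies; only lemmatise("ran"), with its dictionary lookup, gets back to "run".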

part of speech tagging

Now we assign a tag to each token in the sentence; examples include verb, noun and pronoun. It can also be useful to have more 'fine-grained' tags such as 'noun-plural'.
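A minimal sketch of the idea is a lookup tagger: each token is tagged from a small word list, with one extra rule for plural nouns. The LEXICON contents, the tag names and the pos_tag function are made up for this example; real taggers are statistical and use the surrounding context.

```python
# Hypothetical mini-lexicon mapping lower-case words to coarse tags.
LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "cat": "NOUN",
    "chased": "VERB", "sees": "VERB",
    "it": "PRON", "she": "PRON",
}


def pos_tag(tokens):
    """Return (token, tag) pairs using the lexicon and one plural rule."""
    tagged = []
    for token in tokens:
        word = token.lower()
        if word in LEXICON:
            tag = LEXICON[word]
        elif word.endswith("s") and LEXICON.get(word[:-1]) == "NOUN":
            tag = "NOUN-PLURAL"  # fine-grained tag for a known noun plus "s"
        else:
            tag = "UNK"  # unknown word
        tagged.append((token, tag))
    return tagged


tagged = pos_tag(["The", "dog", "chased", "cats"])
# [('The', 'DET'), ('dog', 'NOUN'), ('chased', 'VERB'), ('cats', 'NOUN-PLURAL')]
```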

chunking

Chunking (also called shallow parsing) groups adjacent tokens into short, non-overlapping phrases such as noun phrases and verb phrases, using their part of speech tags. For example, 'the quick brown fox' would be grouped into a single noun phrase.

named entity recognition

NER is the process of locating named entities (people, organisations, dates, quantities and so on) in a sentence and classifying them. It is best illustrated by an example:

<ENAMEX TYPE="PERSON">Jim</ENAMEX> bought <NUMEX TYPE="QUANTITY">300</NUMEX> shares of <ENAMEX TYPE="ORGANIZATION">Acme Corp.</ENAMEX> in <TIMEX TYPE="DATE">2006</TIMEX>.
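To show what a downstream program might do with output like this, here is a small sketch that reads the annotated sentence above and pulls out each entity with its type. It only parses existing markup, it does not perform recognition itself, and the function names are my own.

```python
import re

# Matches <ENAMEX|NUMEX|TIMEX TYPE="...">text</same tag>.
ENTITY = re.compile(r'<(ENAMEX|NUMEX|TIMEX) TYPE="([A-Z]+)">(.*?)</\1>')

annotated = ('<ENAMEX TYPE="PERSON">Jim</ENAMEX> bought '
             '<NUMEX TYPE="QUANTITY">300</NUMEX> shares of '
             '<ENAMEX TYPE="ORGANIZATION">Acme Corp.</ENAMEX> in '
             '<TIMEX TYPE="DATE">2006</TIMEX>.')


def extract_entities(text):
    """Return (entity text, entity type) pairs in order of appearance."""
    return [(m.group(3), m.group(2)) for m in ENTITY.finditer(text)]


def strip_markup(text):
    """Remove the annotation tags, leaving the plain sentence."""
    return ENTITY.sub(lambda m: m.group(3), text)


entities = extract_entities(annotated)
# [('Jim', 'PERSON'), ('300', 'QUANTITY'),
#  ('Acme Corp.', 'ORGANIZATION'), ('2006', 'DATE')]
```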

parsing

This is the process of grammatical analysis. We know that sentences are composed of tokens, and that these tokens have grammatical tags (from POS tagging). By recursively grouping these tags we can form a parse tree, which tells us the grammatical structure of the sentence.

[Image: parse tree]
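As a sketch of how grouping tags produces a tree, the snippet below parses a tagged sentence with a tiny hand-written grammar (S -> NP VP, NP -> DET NOUN | NOUN | PRON, VP -> VERB NP | VERB). The grammar, the tag names and the function names are assumptions chosen for illustration; real parsers handle far richer grammars and ambiguity.

```python
def parse_np(tagged, i):
    """NP -> DET NOUN | NOUN | PRON. Returns (subtree, next_index) or None."""
    if i + 1 < len(tagged) and tagged[i][1] == "DET" and tagged[i + 1][1] == "NOUN":
        return ("NP", tagged[i], tagged[i + 1]), i + 2
    if i < len(tagged) and tagged[i][1] in ("NOUN", "PRON"):
        return ("NP", tagged[i]), i + 1
    return None


def parse_vp(tagged, i):
    """VP -> VERB NP | VERB. Returns (subtree, next_index) or None."""
    if i < len(tagged) and tagged[i][1] == "VERB":
        obj = parse_np(tagged, i + 1)
        if obj is not None:
            np, j = obj
            return ("VP", tagged[i], np), j
        return ("VP", tagged[i]), i + 1
    return None


def parse_sentence(tagged):
    """S -> NP VP. Returns the tree, or None if the tags do not fit."""
    subject = parse_np(tagged, 0)
    if subject is None:
        return None
    np, i = subject
    predicate = parse_vp(tagged, i)
    if predicate is None:
        return None
    vp, j = predicate
    # Only accept the parse if every token was consumed.
    return ("S", np, vp) if j == len(tagged) else None


tagged = [("the", "DET"), ("dog", "NOUN"), ("chased", "VERB"),
          ("a", "DET"), ("cat", "NOUN")]
tree = parse_sentence(tagged)
```

The result is a nested tuple: an S node containing an NP ('the dog') and a VP, which in turn contains the verb and a second NP ('a cat').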

information extraction

This is the overarching process of finding structure in unstructured text. NER is one such sub-task; others include:

  • relationship extraction - finding relations between entities
  • keyword extraction - finding the most relevant tokens
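Keyword extraction in its simplest form is just counting: drop the common 'stop words' and rank what is left by frequency. The sketch below assumes a tiny hand-picked stop word list and a keywords function of my own naming; practical systems use weighting schemes such as TF-IDF instead of raw counts.

```python
import re
from collections import Counter

# Small illustrative stop word list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "that"}


def keywords(text, top_n=3):
    """Return the top_n most frequent non-stop-word tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


text = ("The parser reads the tokens and the parser groups the tokens. "
        "The parser builds a tree.")
top = keywords(text, top_n=2)
# ['parser', 'tokens']
```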

deep semantics

Semantic analysis is the process of extracting and encoding the meaning of text (or speech). Deep semantics has grown out of recent advances in deep learning, which can be used to derive a more meaningful and relevant semantic analysis.

That's it (I did say it would be basic). In my next post, I'll be writing a basic lemmatiser and frequency analyser using Python and NLTK (Natural Language Toolkit). Stay tuned!

notifi.click and hackathon nightmares

This last weekend I was lucky enough to travel to Barcelona for HackUPC.

Bede and I bought tickets the day before we flew out; we had it in our heads that an Amazon Dash hack would be simple and fun! Well, Prime Now, a short flight and several metro stops later, we're at UPC.

The event kicks off and we end up doing some domain roulette, as per usual, and find ourselves buying rightswiperecipe.club. The premise was to swipe through ingredients until you matched with a recipe. Simple, right? Well, replicating the Tinder UI was beyond my limited frontend skillset, so ~5 hours in we bailed and started work on a new hack...

The idea? What if we could make a no-cost notification system that could be used by anyone, for anything, anywhere in the world? notifi.click was our solution. It turns out you can (fairly simply) intercept the requests these Dash buttons send.

Bonus tip: don't fully finish the Dash setup, or you'll end up with enough Kleenex boxes to build a 1:1 scale cardboard Eiffel Tower.

Thus begins what I'll call 'The Dark Times'. We quickly realise that you need to be on the same subnet as the buttons in order to intercept their requests. Well, we're connected to eduroam. Of course, the university has a lot of access points and the odds of us connecting to the same one are pretty low. A few disgusting ideas later, the worst of which involved tethering a phone to a laptop and then connecting the buttons to the laptop, we stumble across a phone connected by ethernet. I dig through the cable monster that's developed in my bag and uncover a pocket router -- we're off!

The rest of the hack was relatively plain sailing. We ran into a few issues with intermittent requests and a lot more while setting up our AWS nanoservices architecture (see Bede's blog for more details). It's over: we managed to scrape the 'best use of AWS' prize, and you can expect to see some tech-specific writeups v. soon!

Check out our (questionable quality) source over at aranscope/notifi and thanks again to the excellent team at HackUPC.

Cisco University Challenge

The Cisco University Challenge is an amalgamation of a hackathon and a businessathon(?). Take Computer Science and Business students, place them in a room with UK business leaders and a worrying amount of caffeine, and see what happens.

At the start of the competition we were presented with a series of four briefs, and our team chose Barclays' brief of "Controlling access to a user's private data in the internet of things".

Borrowing from the Bitcoin architecture we formed 'PermissionCloud', a user-centric IoT permissions management system. The premise is based around the following key components:

Device Administrators

These are the manufacturers and developers of the IoT device in question, for example Nest and their 'smart' thermostat system.

Data Providers

This is a source of a user's personal data, for example: Android or iOS could provide a user's GPS location.

Permission Cloud

This is a decentralised server that stores tokens for specific endpoints provided by the device administrators and data providers.

BlockChain

This is a distributed backup of the state of the Permission Cloud at specific times. This means that if anyone were to somehow gain unauthorised access to the Permission Cloud and change any user's settings, this could be detected due to a discontinuity between the live permissions and those stored in the BlockChain.
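The detection idea can be sketched as a hash chain: each snapshot of the permissions is hashed together with the previous block's hash, so editing either the live state or an old snapshot shows up as a mismatch. This is a single-process toy rather than a distributed ledger, and the block layout, function names and sample permissions are assumptions, not the design we pitched.

```python
import hashlib
import json


def snapshot_hash(permissions, previous_hash):
    """Hash a permissions snapshot together with the previous block's hash."""
    payload = json.dumps(permissions, sort_keys=True) + previous_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, permissions):
    """Back up the current permissions as a new block on the chain."""
    previous_hash = chain[-1]["hash"] if chain else ""
    # Deep-copy via JSON so later edits to `permissions` don't alter the backup.
    snapshot = json.loads(json.dumps(permissions))
    chain.append({"permissions": snapshot, "previous_hash": previous_hash,
                  "hash": snapshot_hash(snapshot, previous_hash)})


def chain_is_valid(chain):
    """Check that every block's hash matches its contents and predecessor."""
    previous_hash = ""
    for block in chain:
        if block["previous_hash"] != previous_hash:
            return False
        if block["hash"] != snapshot_hash(block["permissions"], previous_hash):
            return False
        previous_hash = block["hash"]
    return True


def matches_latest(chain, live_permissions):
    """True if the live state equals the most recent backed-up state."""
    return bool(chain) and chain[-1]["permissions"] == live_permissions


# Hypothetical example data: one user, one device, one granted data type.
live = {"alice": {"nest-thermostat": ["gps"]}}
chain = []
append_block(chain, live)
live["alice"]["nest-thermostat"].append("contacts")  # unauthorised change
tampered = not matches_latest(chain, live)
```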

User

This is the person actually using the system, be it through a mobile application, web app or any other implementation. Users have the ability to create new rules, think IFTTT, and at any time can invalidate the tokens used by these rules, hence revoking access to their data.

We are essentially making all IoT devices slaves to their respective owners and device administrators. Far removed from the idea of a smart fridge botnet, devices using our system could not actually access any data directly. Instead, the device administrator requests permission from the user, and the user grants it by authenticating with both the device administrator and the data provider. The OAuth tokens from both are stored in a 'rule entry' in the Permission Cloud, and the device administrator can then request data directly from the data provider using their OAuth token.
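A rough sketch of the rule entry and revocation flow described above follows. The random strings only stand in for OAuth tokens (no real OAuth exchange happens here), and the class names, fields and method names are hypothetical rather than taken from our actual design.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class RuleEntry:
    """One granted permission: who may read which data on whose behalf."""
    user: str
    device_admin: str
    data_provider: str
    scope: str
    # Random stand-ins for the OAuth tokens issued by each party.
    admin_token: str = field(default_factory=lambda: secrets.token_hex(16))
    provider_token: str = field(default_factory=lambda: secrets.token_hex(16))
    active: bool = True


class PermissionCloud:
    def __init__(self):
        self._rules = {}  # provider_token -> RuleEntry

    def grant(self, user, device_admin, data_provider, scope):
        """User approves a request; a rule entry with fresh tokens is stored."""
        rule = RuleEntry(user, device_admin, data_provider, scope)
        self._rules[rule.provider_token] = rule
        return rule

    def revoke(self, user, provider_token):
        """Only the owning user can invalidate a rule's tokens."""
        rule = self._rules.get(provider_token)
        if rule is not None and rule.user == user:
            rule.active = False

    def is_allowed(self, provider_token, device_admin, scope):
        """Data provider checks a token before releasing any data."""
        rule = self._rules.get(provider_token)
        return (rule is not None and rule.active
                and rule.device_admin == device_admin and rule.scope == scope)


cloud = PermissionCloud()
rule = cloud.grant("alice", "nest", "android", "gps")
before = cloud.is_allowed(rule.provider_token, "nest", "gps")  # True
cloud.revoke("alice", rule.provider_token)
after = cloud.is_allowed(rule.provider_token, "nest", "gps")   # False
```

Note that the device itself never appears in this flow: only the device administrator holds a token, and the user can switch it off at any time.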

In summary, IoT devices need not apply (for a user's data) and are in the hands of the user rather than the other way around.

Travlr, Scope and Hack The North

It's currently 12:43pm and we've just finished up with our project and presentations at Hack The North. Needless to say, it was... -- fast forward a couple of weeks and some much-needed sleep later.

-- incredible! After my longest stretch of time without sleeping, I'm going to share some lessons I learned from 36 hours of ups and downs. The first and, in my view, most important is on scope. It starts with a little story:

It's 9pm Canadian time and we've just arrived after travelling for ~11 hours, with a stopover in Toronto. Needless to say, we're very out of it. Anyway, we grab some food (mmmm poutine <3) and head straight to the opening ceremony. We spot someone ahead of us gazing in our direction; he walks back and we start chatting. It turns out he had heard our British accents and was curious. Well, we ended up working with Chase (he has a name!). The hack started at midnight but we had to sleep, so we abandoned the first few hours.

The next day we wandered around the 'job fair' type area. We ended up chatting to pretty much everyone there and decided that the Google Cloud tech looked really cool and that we'd love to use it. We set to it and had our core architecture designed and working within an hour. That was the end of plain sailing -- as it happens, microservices are a tough sell at a hackathon: each one had its own teething problems and required lots of debugging.

In the end, we had an almost finished MVP for an app called 'Travlr', a platform for travellers to get suggestions for day trips, diversions, meet-ups and group discounts, all powered by Google Cloud Platform, Google Maps and Google Places, with machine learning used to learn the user's travelling habits. All of the microservices were built, but we tragically ran out of time to link them all into our frontend.

Everyone we presented Travlr to -- the HTN judges, Deloitte and a few others -- loved the idea. We had a solid presentation from a tech standpoint but fell down in the demos. We now realise that had the idea been trimmed to remove the machine learning components and scoped to be reasonably finished within ~16 hours, we could have finished our MVP.

A fun aside is that the day after we presented Travlr to some members of the Google Cloud team and some from the Firebase team, Google Trips was released. Huh, I guess we were onto something after all :)

To summarise, when you're under 24-36 hour time constraints, aim to finish to a reasonable standard in half the time. Then you'll have at least a few hours (assuming you overrun, as is typical) to debug, round off the sharp corners and work on your presentation.

Adventures in Web Dev

So I've spent the last 10 weeks working for Autodesk as a C++ developer. As someone who is questionably obsessed with Java, there was somewhat of a difficult transition period (baptism of fire).

I'll have a full write-up talking about my experiences at Autodesk soon, but for now, I'd like to talk about how I've spent my newfound freedom.

NodeJS is awesome! That's something I've discovered in the last week working on Gityll. In ~300 lines, at the time of writing, I have a site generator using GitHub as a CMS and Express as a backend. Gityll was originally a toy project, much like this Trello sequencer built by Manoj Nathwani, but I've found it to be surprisingly useful and have since migrated my website over.

GitHub is a website I visit more than most. Every time a cool project idea comes to mind I'll hop over, make a new repo and get some ideas down. Being able to manage my blog entirely from GitHub is much easier for me, not to mention that it makes it easy to create project pages such as the one included in Gityll itself.

ServiceNow Bootcamp

ServiceNow is an enterprise IT system for managing services, facilities and resources within companies. From the 6th to the 10th of June 2016 I participated in ServiceNow's University of Birmingham bootcamp week.

From Monday to Wednesday we were taught the very basics of the platform, starting with creating basic databases and tables and moving on to the creation of simple forms and applications to interact with these tables. This concluded in a session using Jellyscript, which is near identical to JavaScript but designed to be safer in terms of potentially dangerous modifications and changes.

G2G3 Challenge

Immediately following the taught sessions we participated in Polestar, a business simulation created by G2G3. In the simulation, we had a set of systems (servers, routers etc.) that we had to maintain, all to facilitate the running of 5 companies. At different stages our servers and networking equipment would encounter faults, and fixing them required consulting technical manuals and solving mathematical and logical problems. The simulation consisted of 3 rounds:

Round 1 - Submarine

In the first round we consistently experienced catastrophic failure of our network, which resulted in our technical team panicking and taking a long time to fix these problems. With the huge downtime for online sales, we lost $5M over the course of the round.

Round 2 - Sinking Ship

Here we started developing strategy between the sub-teams of the company. We would pre-solve problems in case they came up, and we introduced redundancies for our servers to deal with the failure cases. This, however, was not enough.

Round 3 - Wait, it's actually working?

This was where we were taught tangible business lessons that I'm sure will be useful in the future. Most evident was the idea of preventing problems from occurring as early in the hierarchy as possible. In this case, the service desk (front line) began keeping a log of all answered questions, eventually having a log that covered at least half of all inbound questions. This meant the technical team were free to work on difficult, new problems instead of having to waste time on problems that had previously been solved.

We also introduced a role in the technical team of 'service desk liaison'. Communication between the teams resulted in the mitigation of errors for both teams, meaning we did not have to introduce time-expensive changes at later stages.

The Hackathon

As has become evident over the last year, hackathons are the staple of innovation in the academic Computer Science community. This has started leaking into the corporate world and is fortunately being shared back with the community through collaborations such as the ServiceNow bootcamp at the University of Birmingham.

At the start of the hack we were allocated into teams of 5 and given the option of ~4 briefs, one of which gave creative freedom to the team. We, of course, chose the creative freedom option. Our initial idea in a few words was 'Distribute incident updates only to those concerned'. What this boils down to can be summed up with the following example.

"Picture a Lecturer in the school of Computer Science, we know they are working Mon-Fri, 9-5pm. A typical work week. They teach in lecture theatres in the school of Biosciences and they conduct research in the school of Psychology. Clearly they don't need to be updated about a power cut in the Arts building or an overly enthusiastic student stealing from the music building canteen. Perhaps they would be more interested in internet downtime in floors 1-2 of the Computer Science building."

Our solution to this problem involved multiple systems:

  • A NodeJS server, featuring a login and interest selection page. It sends personal data to ServiceNow and receives incident logs using a REST API. These incidents are then displayed using AngularJS as a single page web app.
  • The ServiceNow database: a relational database keeping records of users (staff and students), incidents etc. Essentially a profile of our users and logged incidents.
  • The Bayesian classifier script, also hosted on ServiceNow. This took in data regarding our user and returned the result of a query to the incidents table, i.e. all of the important incidents for a given user.
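To give a feel for the classifier component, here is a minimal naive Bayes sketch in Python. It is not the script we ran on ServiceNow: the feature names (a user's school plus an incident type), the 'relevant'/'ignore' labels and the training examples are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over string features, with add-one smoothing."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()

    def train(self, features, label):
        self.label_counts[label] += 1
        for feature in features:
            self.feature_counts[label][feature] += 1
            self.vocabulary.add(feature)

    def predict(self, features):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # log P(label) + sum of log P(feature | label)
            score = math.log(count / total)
            denominator = (sum(self.feature_counts[label].values())
                           + len(self.vocabulary))
            for feature in features:
                score += math.log(
                    (self.feature_counts[label][feature] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up history: (user's school, incident type) -> did the user care?
classifier = NaiveBayes()
classifier.train(["computer-science", "network"], "relevant")
classifier.train(["computer-science", "power"], "relevant")
classifier.train(["biosciences", "network"], "relevant")
classifier.train(["arts", "power"], "ignore")
classifier.train(["music", "theft"], "ignore")
classifier.train(["arts", "theft"], "ignore")

network_outage = classifier.predict(["computer-science", "network"])
canteen_theft = classifier.predict(["arts", "theft"])
```

With this toy history, the Computer Science network outage comes out as 'relevant' and the Arts theft as 'ignore', mirroring the lecturer example above.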

Conclusion

The ServiceNow bootcamp taught me an important lesson about where my interests lie: I am personally more interested in working on small, intimate projects. The projects I have talked about in the past, working in teams of 4 and having full control over the project, have been much more satisfying for me.

ServiceNow affords you a great deal of safety and, with practice, speed. It is difficult to do any major damage to a project and quick to get something up and running, but it is also difficult to feel a sense of full ownership over a project, as so much is abstracted away from the developer.

Hackathon Hack Pack

I've been travelling to quite a few events lately. In an attempt to make my own and everyone else's lives a little easier, I've compiled a list of hack essentials. I've of course used Maslow's hierarchy -- couldn't just be normal now, could I?

Wants

Bedding

If you can bring a suitcase, take an air mattress, pillow and blanket. Getting a good night's sleep will make your final presentation (and general sanity) much stronger.

  • Air mattress
  • Pillow
  • Blanket
  • Collection of hardwa

Needs

  • Laptop
  • Headphones

Musts
