This is documentation in regards to how we are using Prowl with AWS. If you're an employee and having trouble getting connected, read through this -- this should address any issue you have, and if you're just a lookie loo having some sort of problem with AWS -- this may help in some indirect way.
QuickSight is from the AWS suite that is simply a tool to visually build ad-hoc queries. As of now, there is no API for it so that we can automatically generate dashboards and there is no built in way to run statistical testing. For this, we can use Plotly, or something similar.
In the private Prowl repo's there are some VMWare Fusion license keys "add-ons" for Vagrant, worth about $70 a piece, still available if anyone involved in Prowl needs a key. We will be using precompiled Gentoo boxes for most testing via Vagrant. https://gentoo.org/
You can access this box, that's maintained by myself (Montana) via
vagrant init montana/gentoo64
vagrant up
vagrant ssh
If there is a port collision, I've already released a patch for this box, so port collisions should be fixed, but in the case that there is run
vagrant halt
Then reenter those above commands. If the port collison still exists, try running
sed -i '' '/$1/d' /Library/Preferences/VMware\ Fusion/networking
This of course is assuming you are using the VMWare Fusion provider. There's a bash script I've created that does this automatically if you want to save some time.
Some of the Prowl team have talked about using OpenRefine, about cleansing our data we receive in CSV form. Which as of now we have a whole lot. We had initially looked at CrowdFlower for this, but decided to see if we couldn't use something a bit more cost efficient, and open source. We would be using OpenRefine (2.5) which then was called Google Refine, the reason for this, is stability, along with an addon called "Conciliator".
We have to make sure we have Maven installed, once installed, we can start cleaning data -- and sending this data to AWS. The flowchart below, describes this process. This flowchart was created by Garrett Loh, Prowl's co-founder, and business development.
Each Raw data dump should be in a folder named by the client, for example EPR
or HAUS COFFEE
.
We need the File name by upload date/time MMDDYYYY_HHMM (pref that format, but another will do)
The data cleaned with Google Refine in an AWS S3 bucket called something like prowl-clean
.
With this data we can connect S3 to AWS Glue. So now we know how the Data structure is setup like, the PoC is as follows in this chart.
I would use Jenkins in any other case, but to stay within the suite, automation here calls for AWS Lambda. Given the notion our employees have the AWS CLI setup, you can run
aws lambda list-functions --profile adminuser
This will try and reach Lambda, if it fails, it will print out an error message. Prowl already has a custom s3-get-object
and a django-storages
via Zappa.
Okay, now that you're connected to Lambda, and you have hopefully grabbed our custom django-storages
in the private repo, make sure Zappa has ALL of it's dependencies installed before you start using it, of course pip will fetch whatever is in the requirements.txt file, so let's start the Zappa shell, get the virtual enviornment going (you can use Docker alternatively), and install the requirements
zappashell zappashell> python -m venv ve zappashell> source ve/bin/activate (ve) zappa> pip install -r requirements.txt
As mentioned above you can use Docker, to do so
docker built -t prowl
Well, looks like you're all set, you're connected to Lambda, and have Zappa going! Congratulations for setting up an aspect of automation, but sometimes, it isn't this seamless, we've ran into this problem -- specifically myself so in the next section I will be talking about the problem once you try and deploy using Django/Zappa.
This is a mockup made by our co-founder Jake Augunas, essentially once you login from our web portal, you will have these actionalable insights, and most of this will be using what's being described in this document, AWS.
Okay, so we've setup our Django/Lambda enviornment, it looks a little something like this
{ "dev": { "django_settings": "montana.settings", "s3_bucket": "prowl-test-code" } } Does this look okay? (default 'y') [y/n]: y Done! Now you can deploy your Zappa application by executing: $ zappa deploy dev After that, you can update your application code with: $ zappa update dev To learn more, check out our project page on GitHub here: https://github.com/Miserlou/Zappa and stop by our Slack channel here: http://bit.do/zappa Enjoy!, ~ Team Zappa!
As you see we are pointing Lambda to the proper S3 bucket, and everything seems dandy, so let's deploy the dev version
zappa deploy dev
Unfortunately we encounter an error
zappa deploy dev Calling deploy for environment dev.. Warning! AWS Lambda may not be available in this AWS Region! Warning! AWS API Gateway may not be available in this AWS Region! Oh no! An error occurred! :( ============== Traceback (most recent call last): [boring callback removed] NoRegionError: You must specify a region. ============== Need help? Found a bug? Let us know! :D File bug reports on GitHub here: https://github.com/Miserlou/Zappa And join our Slack channel here: https://slack.zappa.io Love!, ~ Team Zappa!
In this case, we have an umbrella of options to try and fix this, I'll go over what has worked the best for myself personally. So what you'll need to do is specify a default region using environment variables, the drawback of using Lambda in my opinion is you must do this for every console, so a little frustrating.
export AWS_DEFAULT_REGION=us-west-1
You need to add the default region in your ~/.awd/credentials
file
[default] aws_access_key_id = prowl_access_key aws_secret_access_key = prowl_access_key region=us-west-1
Remember to becareful with JSON, and make sure your commas are in the right place. Now let's try and deploy again
zappa deploy dev
This should deploy, but if it doesn't it might be Django's security features blocking it from actually deploying so try to open the settings.py
file and change ALLOWED_HOSTS
to
ALLOWED_HOSTS = [ '127.0.0.1', 'x6kb437rh.execute-api.us-west-1.amazonaws.com', ]
Then redeploy.
There's a couple of Django modules you should have already on your Mac, or PC. Assuming you're running Linux, you need to run
pip install django-storages boto
Going back to the settings.py
file, you want to add the following lines
INSTALLED_APPS = ( ..., 'storages', )
By default newly created RDS Security Groups have no inbound access. So you need to make sure your RDS Security group has open TCP connections from your subnets associated with your AWS Lambdas
Type | Protocol | Port Range |
---|---|---|
All TCP | TCP | 5432 |
In Zappa there are some custom commands you can run which make it SUPER easy to navigate around a somewhat complex set of circumstances, I've made a table below to explain
DNS Provider | CA | Notes |
---|---|---|
Route53 | AWS Certificate Manager | All AWS combo makes this ridiculous easy |
Route53 ย | Let's Encrypt | Another good option that Zappa has smoothed the way |
Other DNS | Other | You've got some work to do |
You can edit the Zappa settings file, which is to configure all these gateways properly
"certificate_arn": "arn:aws:acm:us-west-1:738356466015:certificate/1d066282-ce94-4ad7-a802-2ff87d32b104", "domain": "www.getprowl.com",
Then
"dev": { "django_settings": "montana.settings", "s3_bucket": "haus-coffee", "aws_region": "us-west-1",
Now obviously the bucket can be anything, in this case it's Haus Coffee based out of San Francisco. Now certify that Route53 passes all the tests that Zappa can perform, and hopefully you'll get it certified in your virtual environment
(ve) $ zappa certify dev Calling certify for environment dev.. Are you sure you want to certify? [y/n] y Certifying domain www.getprowl.com.. Created a new domain name with supplied certificate. Please note that it can take up to 40 minutes for this domain to be created and propagated through AWS, but it requires no further work on your part. Certificate updated!
Your code can pull Prowl information/data dumps from the execution environment by using the built-in AWS Lambda environment variables, but there are also custom ways of doing this in Zappa. For example, you could add SOME_LAMBDA_KEY
in the AWS console and retrieve it in your code, via
import os some_lamda_key = os.environ.get('SOME_LAMBDA_KEY') # or get system values aws_lambda_function_name = os.environ.get('AWS_LAMBDA_FUNCTION_NAME')
Hopefully after reading this, you have a general understanding of how Prowl is applying AWS to achieve our goal. Some of the things I covered above could have been done on a CDN like CloudFront or StackPath, but for the sake of time I won't be covering that here, maybe in a different documentation piece, just to cover all grounds for our developers depending on the path they want to take when using a VM/venv. This documentation is made generally for Prowl employees in a private repo (not the one in public, the one in private is filled with documentation) but this particular piece could potentially be applied to other situations, and for that reason I made this open to help others.
Written by Montana Mendy (c) 2017 https://www.getprowl.com