Hi Team, I hope you're doing well!
Python/Jupyter Environment
MySQL (RDBMS)
AWS
Dependencies (Import libraries)
import time
from tqdm.notebook import tqdm, trange
import pandas as pd
import numpy as np
from zipfile import ZipFile
from io import BytesIO
import urllib.request as urllib2
import boto3
from boto3.s3.transfer import TransferConfig
from boto3.s3.transfer import S3Transfer
import io
import pyarrow as pa
import pyarrow.parquet as pq
1. Schema: Design an RDBMS table schema to store the CSV data
Database Name: Stock_Data
MySQL Query: CREATE DATABASE IF NOT EXISTS Stock_Data;
Select Database Query: USE Stock_Data;
Create Table Query:
CREATE TABLE mytable(
Date DATE NOT NULL PRIMARY KEY,
Open NUMERIC(10,6) NOT NULL,
High NUMERIC(10,6) NOT NULL,
Low NUMERIC(10,6) NOT NULL,
Close NUMERIC(10,6) NOT NULL,
Adj_Close NUMERIC(10,6) NOT NULL,
Volume INTEGER NOT NULL
);
Display Metadata Query: DESCRIBE mytable;
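Before loading, each CSV record has to be converted into values that fit the column types above. The following is a minimal sketch of that step, not the notebook's actual code. It assumes the CSV header is Date, Open, High, Low, Close, Adj Close, Volume with ISO-formatted dates; the sample rows are made up, and the helper names to_row and parse_rows are hypothetical.

```python
import csv
import io
from datetime import date
from decimal import Decimal

# Made-up sample rows; the header layout is an assumption about the CSV file.
SAMPLE_CSV = """Date,Open,High,Low,Close,Adj Close,Volume
2021-01-04,10.5,11.25,10.0,11.0,10.1,1000
2021-01-05,11.0,12.0,10.75,11.5,10.6,2000
"""

# CSV price columns, in the same order as the table's NUMERIC(10,6) columns.
PRICE_COLUMNS = ["Open", "High", "Low", "Close", "Adj Close"]
SCALE = Decimal("0.000001")          # six digits after the decimal point
MAX_PRICE = Decimal("9999.999999")   # largest magnitude NUMERIC(10,6) can hold


def to_row(record):
    """Convert one CSV record (dict of strings) to a tuple in mytable's column order."""
    prices = []
    for name in PRICE_COLUMNS:
        value = Decimal(record[name]).quantize(SCALE)
        if abs(value) > MAX_PRICE:
            raise ValueError(f"{name}={value} does not fit NUMERIC(10,6)")
        prices.append(value)
    return (date.fromisoformat(record["Date"]), *prices, int(record["Volume"]))


def parse_rows(text):
    """Parse CSV text into a list of row tuples ready for insertion."""
    return [to_row(record) for record in csv.DictReader(io.StringIO(text))]


rows = parse_rows(SAMPLE_CSV)
for row in rows:
    print(row)
```

The resulting tuples could then be passed to a DB-API cursor's executemany() with a parameterized INSERT INTO mytable statement; the placeholder style depends on the MySQL driver in use.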
2. Calculation: Use this data to calculate the following using SQL
Weekly average of High, Low and Volume:
Query: SELECT YEAR(Date), WEEK(Date), AVG(High), AVG(Low), AVG(Volume) FROM mytable GROUP BY YEAR(Date), WEEK(Date);
Monthly average of High, Low and Volume:
Query: SELECT YEAR(Date), MONTH(Date), AVG(High), AVG(Low), AVG(Volume) FROM mytable GROUP BY YEAR(Date), MONTH(Date);
Yearly average of High, Low and Volume:
Query: SELECT YEAR(Date), AVG(High), AVG(Low), AVG(Volume) FROM mytable GROUP BY YEAR(Date);
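One way to sanity-check the weekly, monthly and yearly averages outside MySQL is to recompute them in plain Python. The sketch below is illustrative only: the sample rows are made up, and average_by is a hypothetical helper. Weeks and months are keyed together with the year so that, for example, week 1 of 2021 and week 1 of 2022 stay separate. The week number uses strftime("%U") (Sunday as first day, days before the first Sunday in week 0), which as far as I know corresponds to MySQL's WEEK() in its default mode 0.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Made-up sample rows: (Date, High, Low, Volume)
SAMPLE = [
    (date(2021, 1, 1), 12.0, 10.0, 100),  # Friday, before the first Sunday: week 0
    (date(2021, 1, 4), 14.0, 12.0, 300),  # week 1 of 2021
    (date(2021, 1, 5), 16.0, 14.0, 500),  # week 1 of 2021
    (date(2021, 2, 1), 20.0, 18.0, 700),
    (date(2022, 1, 3), 30.0, 28.0, 900),  # week 1 of 2022
]


def average_by(rows, key):
    """Group rows by key(date) and return {group: (avg_high, avg_low, avg_volume)}."""
    groups = defaultdict(list)
    for day, high, low, volume in rows:
        groups[key(day)].append((high, low, volume))
    return {
        group: tuple(mean(column) for column in zip(*values))
        for group, values in sorted(groups.items())
    }


weekly = average_by(SAMPLE, lambda d: (d.year, int(d.strftime("%U"))))
monthly = average_by(SAMPLE, lambda d: (d.year, d.month))
yearly = average_by(SAMPLE, lambda d: d.year)

for label, result in (("weekly", weekly), ("monthly", monthly), ("yearly", yearly)):
    print(label, result)
```

Comparing these dictionaries with the SQL output on a small slice of the real data is a quick way to confirm that the grouping keys behave as intended.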
Jupyter Source File: Stock Data ETL Process Part-1.ipynb (contains the first two questions)
3. System Design
Jupyter Source File: Stock Data ETL Process Part-2.ipynb (contains the third question)