microsoft / mcw-modernizing-data-analytics-with-sql-server-2019

MCW Modernizing Data Analytics with SQL Server 2019

License: Other

TSQL 10.50% Shell 4.69% Python 6.28% Jupyter Notebook 66.85% Batchfile 11.68%

mcw-modernizing-data-analytics-with-sql-server-2019's Introduction

This workshop is archived and is no longer being maintained. Content is read-only.

For additional Data and AI content, please go to https://microsoft.github.io/sqlworkshops/.

Modernizing Data Analytics with SQL Server 2019

Businesses require near real-time insights from ever-larger sets of data. Large-scale data ingestion requires scale-out storage and processing in ways that allow fast response times. In addition to simply querying this data, organizations want full analysis and even predictive capabilities over their data.

Wide World Importers (WWI) is a traditional brick and mortar business with a long track record of success, generating profits through strong retail store sales of their unique offering of affordable products from around the world. Over the past few years, they have adopted an omni-channel strategy, giving consumers several different ways to purchase their products. These new platforms were added without integrating them into the OLTP system data or Business Intelligence infrastructures. As a result, "silos" of data stores have developed.

Now, WWI is trying to cope with difficulties in combining these disparate data sources in varying formats into a single location where they can analyze the data in near real-time, joining related information where needed. They also want to be able to leverage AI to help their business grow and cut down maintenance costs. They would like to have all of these capabilities rolled into a single system, while minimizing code changes across their domain.

June 2020

Target audience

  • Database Administrator
  • Data Engineer
  • Data Scientist
  • Database Developer
  • Solution Architect

Abstracts

Workshop

In this workshop, you will gain a better understanding of how new features of SQL Server 2019 enable more Big Data and analytics capabilities through the use of Big Data Clusters, data virtualization and orchestration, query processing enhancements, and better scalability through distributed storage and compute.

At the end of this workshop, you will be better able to configure and manage SQL Server 2019 Big Data Clusters so you can combine, query, and transform disparate data sources for AI and advanced analytics scenarios.

Whiteboard design session

In this whiteboard design session, you will work with a group to design a solution for modernizing your large-scale data processing and machine learning capabilities through the use of SQL Server Big Data Clusters. You will evaluate the customer scenario and requirements to decide on the best architecture to meet their needs, while unifying data from disparate sources into a platform that helps the customer gain business insights and apply advanced analytics at scale.

At the end of this whiteboard design session, you will be better able to design a modernization plan for performing Big Data analytics centered around SQL Server 2019 capabilities.

Hands-on lab

In this hands-on lab, you will implement the steps to install and configure a SQL Server 2019 cluster on Linux-based containers in Azure. Using this cluster, you will use data virtualization to unify data from various sources, analyze the data, create and deploy a machine learning model, and finally detect and fix PII and GDPR compliance issues.

At the end of this hands-on lab, you will be better able to build solutions for conducting advanced data analytics at scale with SQL Server 2019 Big Data Clusters.

Azure services and related products

  • Azure CLI
  • Azure Data Studio
  • Azure Kubernetes Service (AKS)
  • PowerShell
  • SQL Server Management Studio
  • SQL Server 2019 Big Data Clusters (BDC)

mcw-modernizing-data-analytics-with-sql-server-2019's People

Contributors

codingbandit, dawnmariedesjardins, feaselkl, givenscj, ikeellis, joelhulen, microsoftopensource, msftgits, timahenning

mcw-modernizing-data-analytics-with-sql-server-2019's Issues

Error while Restoring sales database to Big Data cluster

In Task 4, Step 3 of Before the HOL:

The output of kubectl get svc -n mssql-cluster from the Azure CLI reveals the following TCP ports:

  • KNOX_PORT = 32722
  • SQL_MASTER_PORT = 32626

Therefore, the change is made here & here, and the respective IPs are used.

Running .\bootstrap-sample-db mssql-cluster x.y.w.z password c:/temp a.b.c.d --install-extra-samples in the command prompt downloads the .bak file to the correct directory.

However, probably because of #44 in bootstrap-sample-db.cmd, it results in:

Sqlcmd: Error: Microsoft ODBC Driver 17 for SQL Server : TCP Provider: The wait operation timed out.

Please help me solve this error 😞
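
A timeout from the TCP provider usually means the client never reached the endpoint, so one way to narrow this down is to test each IP and port pair directly before rerunning the script. The following is a minimal sketch, not part of the lab: the helper name can_connect is hypothetical, and the hosts and ports you pass in are whatever kubectl get svc reported for your cluster.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example call (placeholder address, replace with the SQL master external IP):
# can_connect("x.y.w.z", 32626)
```

If the check fails for the SQL master port but succeeds for another port on the same IP, the port value in the script is the first thing to recheck; if both fail, a firewall or network security rule is a more likely cause.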

Resource Deployment Failure - Code: Conflict

I tried using the specified template (https://github.com/microsoft/MCW-Modernizing-Data-Analytics-with-SQL-Server-2019/blob/master/Hands-on%20lab/Resources/template.json), but the deployment keeps failing:

{"code":"DeploymentFailed","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.","details":[{"code":"Conflict","message":"{\r\n "status": "Failed",\r\n "error": {\r\n "code": "ResourceDeploymentFailure",\r\n "message": "The resource operation completed with terminal provisioning state 'Failed'.",\r\n "details": [\r\n {\r\n "code": "NameAlreadyExists",\r\n "message": "The name 'init-sql-mod-data.database.windows.net' already exists. Choose a different name."\r\n }\r\n ]\r\n }\r\n}"}]}

I tried to figure this out myself, but nothing I've tried has worked so far. Some help would go a long way.

Thanks
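
The useful part of a deployment error like the one above is buried two levels down, inside a JSON document that is itself stored as a string in the "message" field. The sketch below shows one way to dig out the innermost code and message. It assumes the payload has the same shape as the error quoted in this issue; the function name innermost_errors is hypothetical.

```python
import json


def innermost_errors(payload: dict) -> list[tuple[str, str]]:
    """Collect (code, message) pairs from the deepest level of a nested error."""
    found = []

    def walk(node: dict) -> None:
        message = node.get("message", "")
        # The outer error may embed a whole JSON document as a string.
        if isinstance(message, str) and message.lstrip().startswith("{"):
            try:
                nested = json.loads(message)
            except json.JSONDecodeError:
                nested = None
            if isinstance(nested, dict):
                walk(nested.get("error", nested))
                return
        details = node.get("details") or []
        if details:
            for child in details:
                walk(child)
        else:
            # No deeper level: this is the error that actually explains the failure.
            found.append((node.get("code", ""), message))

    walk(payload)
    return found
```

Applied to the error above, this would surface the NameAlreadyExists entry, which indicates that the server name init-sql-mod-data.database.windows.net is already taken and a different name is needed.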

Issues when running the scripts in Item 3 of Task 4 - Before the HOL

Hi Folks,

I am following the instructions at this link (https://github.com/microsoft/MCW-Modernizing-Data-Analytics-with-SQL-Server-2019/blob/master/Hands-on%20lab/Before%20the%20HOL%20-%20Modernizing%20Data%20Analytics%20with%20SQL%20Server%202019.md) to prepare a SQL Server 2019 workshop for a customer. I completed the Task 1 to Task 3 procedures successfully.

When I run item 3 of Task 4: Install sample databases and upload files, I get the error below and can't move forward. It shows "pods master-0 not found", but I can't find any statement related to master-0 in the attached CMD file.

Has anyone experienced a similar error before? Or could anyone suggest how to fix this issue?

Many thanks.

Issues in Lab.

Please find the issues described below.
Before the hands-on lab:
Task 4, item 5: in the script bootstrap-sample-db.cmd, line 42 should read "set SQLCMDUSER=admin" rather than sa, and lines 100 and 106 need to change from -Usa to -Uadmin, although -U %SQLCMDUSER% would probably be better.
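
The point about -U %SQLCMDUSER% is that the login should be defined once and reused rather than hardcoded on each sqlcmd line. As an illustration only, here is the same idea expressed in Python: a hypothetical helper, sqlcmd_args, that builds the argument list and takes the user from the SQLCMDUSER environment variable. The default of "admin" follows the suggestion in this issue and is an assumption about the cluster's login.

```python
import os


def sqlcmd_args(server: str, port: int, query: str, env=None, default_user: str = "admin") -> list[str]:
    """Build a sqlcmd argument list, reading the login from SQLCMDUSER."""
    env = os.environ if env is None else env
    # Single source of truth for the login, instead of a literal -Usa.
    user = env.get("SQLCMDUSER", default_user)
    # -S takes "host,port"; -Q runs the query and exits.
    return ["sqlcmd", "-S", f"{server},{port}", "-U", user, "-Q", query]
```

The resulting list can be passed to subprocess.run. The password is deliberately left out of the arguments; sqlcmd can read it from the SQLCMDPASSWORD environment variable.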

Hands-on lab:
Exercise 1: Using data virtualization > Task 1: Create external table from Azure SQL Database - the first two objects in the script already exist:

Started executing query at Line 1
Msg 15578, Level 16, State 1, Line 1
There is already a master key in the database. Please drop it before performing this statement.
Started executing query at Line 3
Msg 15530, Level 16, State 1, Line 7
The credential with name "SQLCred" already exists.

Exercise 1: Using data virtualization > Task 1: Create external table from Azure SQL Database - the SQL script creates a table called SQLReviews, but that is the only place the name is used, which breaks the unified query in Task 3.

Exercise 6 > Task 2 > item 7:
When restarting the cluster, the second command, supervisorctl restart mssql, does not work. It should be supervisorctl restart mssql-server.

Exercise 1 Task 2 error - can't move forward to Step 3

Hi,

When I get to Step 2 of Exercise 1 / Task 2, I get an error message that says "method not found", and I can't move forward to Step 3, even when I click the Next button in Step 2.

Detailed screenshots are attached.

Could you suggest how to fix this error?

Thanks.
Task 2-Create external table from CSV files-error-1
Task 2-Create external table from CSV files-error-2

Issues w/VM setup after Python3 version updated to 3.8?

  • When attempting to follow the 'Modernizing Data Analytics with SQL Server 2019 before the hands-on lab setup guide' on a freshly installed Windows 10 (1809) VM, the instructions appear to work up to the 7th command listed in Step 10:

pip3 install -r https://aka.ms/azdata --user

  • When running that command, I receive the error 'connection error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed'. I believe a fix that bypasses the certificate warning is to use this command:

pip3 install --trusted-host pypi.org --trusted-host files.pythonhosted.org -r https://aka.ms/azdata --user

  • However, after doing so I encounter a series of other errors: 'error Microsoft Visual C++ 14.0 is required', and later 'c1083 cannot open include file sqlfront.h' when attempting to build pymssql. The former might be eliminated by manually installing the build tools; I cannot identify how to fix the latter.

NOTE: I am not a Python expert, but after some searching and troubleshooting I wonder whether the recent (~10/16/19) release of Python 3.8 may have played a part in breaking the previously functional setup instructions. (I have not yet had time to check whether following the same setup document with an earlier version of Python installed helps.)

Issues in Lab

Please find the issues described below.
Hands-on lab:

  1. Exercise 1: Using data virtualization > Task 1: Create external table from Azure SQL Database.
    I am not able to run the query in Azure Data Studio. I also tried SSMS but faced the same issue.

Below is the error I get when running the query:

Msg 110045, Level 16, State 1, Line 52
110045;User authorization failed: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Login failed for user 'admin'. Additional error <2>: ErrorMsg: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Login failed for user 'admin'., SqlState: 28000, NativeError: 18456

  2. Exercise 1: Using data virtualization > Task 2: Create external table from CSV files
    I am not getting the Create External Table From CSV Files option. Can you please let me know which version of Azure Data Studio you are using for the workshop?

Could not open a connection to SQL Server

Hello,

I was trying to use the External Table Wizard in Azure Data Studio to connect to the external Azure SQL Database

I used the following code

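-- Guard restored so that the closing END has a matching BEGIN.
IF NOT EXISTS (SELECT * FROM sys.external_data_sources WHERE name = 'SQLServerInstance')
BEGIN
 CREATE EXTERNAL DATA SOURCE SQLServerInstance WITH
 (
   LOCATION = 'sqlserver://<YOUR_AZURE_SERVER_NAME>.database.windows.net',
   -- PUSHDOWN = ON | OFF,
   CREDENTIAL = SQLCred,
   CONNECTION_OPTIONS='Database=wwi_commerce'
 );
END
GO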

IF (OBJECT_ID('dbo.Reviews') IS NULL)
BEGIN
 CREATE EXTERNAL TABLE dbo.Reviews
 (
   [product_id] [bigint] NOT NULL,
   [customer_id] [bigint] NOT NULL,
   [review] [nvarchar](1000) NOT NULL,
   [date_added] [datetime] NOT NULL
 )
 WITH
 (
   LOCATION='wwi_commerce.dbo.Reviews',
   DATA_SOURCE=SQLServerInstance
 );
END
GO

but got the following error

Msg 105082, Level 16, State 1, Line 52 105082;Generic ODBC error: [Microsoft][ODBC Driver 17 for SQL Server]Named Pipes Provider: Could not open a connection to SQL Server [2]. Additional error <2>: ErrorMsg: [Microsoft][ODBC Driver 17 for SQL Server]Login timeout expired, SqlState: HYT00, NativeError: 0 Additional error <3>: ErrorMsg: [Microsoft][ODBC Driver 17 for SQL Server]Invalid connection string attribute, SqlState: 01S00, NativeError: 0 Additional error <4>: ErrorMsg: [Microsoft][ODBC Driver 17 for SQL Server]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to, SqlState: 08001, NativeError: 2.

I suspect I got the SQL server name wrong, but I am not sure, because I tried the name I used when setting up the server and it still gave me this error.

Is there a way of explicitly getting the MS_AZURE_SERVER_NAME or am I missing something else?

Thanks
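
One thing worth ruling out with this error is a LOCATION value that still contains the placeholder, angle brackets, or a doubled domain suffix. The sketch below is a hypothetical helper, external_location, that normalizes a server name into the LOCATION string used in the script above and rejects values that cannot be a server name. The naming rule it checks (lowercase letters, digits, and hyphens, up to 63 characters, no leading or trailing hyphen) is my assumption about Azure SQL logical server names, not something stated in the lab.

```python
import re

AZURE_SQL_SUFFIX = ".database.windows.net"
SCHEME = "sqlserver://"
# Assumed naming rule: lowercase letters, digits, hyphens; no hyphen at either end.
NAME_RE = re.compile(r"^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$")


def external_location(server_name: str) -> str:
    """Return the LOCATION value for an Azure SQL server, or raise ValueError."""
    name = server_name.strip().lower()
    # Accept a bare name, a fully qualified name, or a full LOCATION string.
    if name.startswith(SCHEME):
        name = name[len(SCHEME):]
    if name.endswith(AZURE_SQL_SUFFIX):
        name = name[: -len(AZURE_SQL_SUFFIX)]
    if not NAME_RE.match(name):
        raise ValueError(f"not a usable server name: {server_name!r}")
    return f"{SCHEME}{name}{AZURE_SQL_SUFFIX}"
```

The server name itself is the first part of the fully qualified server name shown for the database in the Azure portal, without the .database.windows.net suffix.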

Issues in Lab.

Please find the issues described below.
Before the HOL:

  1. The azdata installation steps are missing from Before the hands-on lab.

Hands-on lab:
Exercise 1: Using data virtualization > Task 1 > Step 3:

When running the SQL query, I get the error below:

105082;Generic ODBC error: [Microsoft][ODBC Driver 17 for SQL Server]Named Pipes Provider: Could not open a connection to SQL Server [2]. Additional error <2>: ErrorMsg: [Microsoft][ODBC Driver 17 for SQL Server]Login timeout expired, SqlState: HYT00, NativeError: 0 Additional error <3>: ErrorMsg: [Microsoft][ODBC Driver 17 for SQL Server]Invalid connection string attribute, SqlState: 01S00, NativeError: 0 Additional error <4>: ErrorMsg: [Microsoft][ODBC Driver 17 for SQL Server]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to, SqlState: 08001, NativeError: 2 .

I checked everything required, including the instance name and SQL credentials, and it all looks correct.

Thanks

Issue in Installing SQL Server 2019 Big Data clusters

1. In Before the HOL, Task 3, Step 5, the link given to download the Python script that creates an AKS service and then deploys a SQL Server 2019 big data cluster to AKS is not available; it redirects to a 404 error page.
2. However, deploy-sql-big-data-aks.py is available in the resource folder.
3. I tried deploying with that Python script, but it takes an unexpectedly long time and the cluster deployment invariably fails.

Can you please check and fix this ASAP?

Thanks,
Amal Gireesh

December 2019 - content update

This workshop is scheduled for a content update. Please review the current workshop, any open issues, and provide update suggestions for review. Thanks!
