alessandrocorradini / university-of-california-san-diego-big-data-specialization Goto Github PK


Repository for the Big Data Specialization from University of California San Diego on Coursera

Python 2.47% Shell 0.90% Jupyter Notebook 96.38% TSQL 0.26%

coursera coursera-bigdata coursera-big-data mooc moocs


university-of-california-san-diego-big-data-specialization's Issues

Quiz 10 - Regression, Cluster Analysis, & Association Analysis

In classification, you're predicting a category, and in regression, you're predicting a number
Determining whether power usage will rise or fall
Determine the regression line that best fits the samples.
In simple linear regression, the input has only one variable. In multiple linear regression, the input has more than one variable.
To segment data so that differences between samples in the same cluster are minimized and differences between samples of different clusters are maximized.
All of these choices are valid uses of the resulting clusters.
The mean of all the samples in the cluster
Assign each sample to the closest centroid, then calculate the new centroid.
To find rules to capture associations between items or events
A transaction or set of items that occur together
Captures the frequency of that item set
Prune rules by eliminating rules with low confidence
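The two k-means answers above (a centroid is the mean of the samples in its cluster; each iteration assigns samples to the closest centroid and then recomputes the centroids) can be sketched in a few lines. This is a minimal illustration rather than course material: the function name kmeans_step, the sample points, and the starting centroids are all made up for the example, and Euclidean distance is assumed.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def kmeans_step(samples, centroids):
    """Run one k-means iteration.

    Step 1: assign each sample to its closest centroid.
    Step 2: recompute each centroid as the mean of its assigned samples.
    """
    clusters = [[] for _ in centroids]
    for sample in samples:
        nearest = min(range(len(centroids)),
                      key=lambda i: dist(sample, centroids[i]))
        clusters[nearest].append(sample)

    new_centroids = []
    for old, members in zip(centroids, clusters):
        if members:
            # Mean of every coordinate across the cluster's members.
            new_centroids.append(
                tuple(sum(coord) / len(members) for coord in zip(*members)))
        else:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old)
    return new_centroids, clusters


# Hypothetical 2-D samples and starting centroids.
samples = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids, clusters = kmeans_step(samples, [(0.0, 0.0), (10.0, 10.0)])
```

A full run would repeat kmeans_step until the centroids stop moving or an iteration limit is reached.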

Quiz 6 - Using GraphX

select all
metro area
green dot with no connections
Social networks have communities or pockets of people who interact densely

Add Solution Quiz 6 - Using GraphX

Quiz 6 - Using GraphX

1. In the code snippet below from the Hands On exercise on importing data, '100L + row...' adds 100 to the value of every country ID. Which of the following statements are true regarding this decision? (Note: you may select more than one)

val countries: RDD[(VertexId, PlaceNode)] =
  sc.textFile("./EOADATA/country.csv").
    filter(! _.startsWith("#")).
    map {line =>
      val row = line split ','
      (100L + row(0).toInt, Country(row(1)))
    }

Another option would have been to add 100 to the metropolis keys as they were imported, and leave the country keys as they were originally numbered.
This step was needed to create unique keys between the country and the metropolis datasets.
Another option would be to add 500 to the country keys.
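The idea behind the offset can be shown outside Spark. The sketch below uses plain Python dictionaries and invented rows (the IDs and names are hypothetical, not taken from the course CSV files) to show why two datasets that both number their rows from 1 collide when merged into one vertex set, and how a constant offset avoids that. It assumes the offset is larger than the highest metropolis ID.

```python
# Hypothetical rows in (id, name) form; both datasets start numbering at 1.
metropolis_rows = [(1, "Tokyo"), (2, "Seoul")]
country_rows = [(1, "Japan"), (2, "South Korea")]

COUNTRY_OFFSET = 100  # plays the role of 100L in the Scala snippet


def build_vertices(metros, countries, offset):
    """Merge both datasets into one vertex map keyed by a unique ID."""
    vertices = {}
    for metro_id, name in metros:
        vertices[metro_id] = ("Metropolis", name)
    for country_id, name in countries:
        key = country_id + offset
        if key in vertices:
            # Without a suitable offset the country would overwrite a metropolis.
            raise ValueError(f"vertex ID {key} is already in use")
        vertices[key] = ("Country", name)
    return vertices


vertices = build_vertices(metropolis_rows, country_rows, COUNTRY_OFFSET)

# With no offset, the key ranges overlap and the merge is rejected.
try:
    build_vertices(metropolis_rows, country_rows, 0)
    collision_detected = False
except ValueError:
    collision_detected = True
```

Any offset that keeps the two key ranges disjoint would serve the same purpose, and the offset could equally be applied to the metropolis keys instead.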

2. In the metro example, what is an in-degree in relation to a country? Hint: this was covered in the Building a Degree Histogram Hands On exercise.

A street in a city.
Another city.
A continent.
A metro area or metropolis.

3. In the Hands On exercise on network connectedness and clustering, Antarctica was easy to identify. Why?

It had many edges
It had a vertex ID of 205.
It is the green dot that has no connections, or it is the least connected cluster.

4. In the Facebook graph example, the visualization looked like broccoli. Why?

In a directed graph, the stalks are large.
Social networks have communities or pockets of people who interact densely.
The high centrality of some people nodes in Facebook gives the graph its broccoli shape.

Add solution to Quiz 5 - Assessment Question on 'Practicing Graph Analytics

Quiz 5 - Assessment Questions on 'Practicing Graph Analytics in Neo4j With Cypher'

1. What is the number of nodes returned?

50,000
9656
9756
8673

2. What’s the number of edges?

50,000
49,834
46,621
None of the above

3. The number of loops in the graph is:

1035
1395
1221
1243

4. The query match (n)-[r]->(m) where m <> n return distinct n, m, count(r) gives us

the count of all non loop edges between every adjacent node pair.
the count of all edges between every adjacent node pair.
the count of all edges.
None of the above

5. The query match (n)-[r]->(m) where m <> n return distinct n, m, count(r) as myCount order by myCount desc limit 1 produces what?

a random edge
the node with the maximum number of looping edges
two neighboring nodes, each with a high outdegree
the pair of nodes with the maximum number of multi-edges between them
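The grouping that the queries in questions 4 and 5 perform can be mimicked on a small edge list. The sketch below is only an approximation in Python, not the Neo4j query itself: the edge list is invented, and each tuple stands for one directed relationship from the first node to the second.

```python
from collections import Counter

# Hypothetical directed edges (source, target); repeats are multi-edges
# and ("C", "C") is a loop.
edges = [
    ("A", "B"), ("A", "B"), ("A", "B"),
    ("B", "C"),
    ("C", "C"),
    ("B", "A"),
]

# Like "where m <> n": drop loops, then count relationships per ordered pair.
pair_counts = Counter((n, m) for n, m in edges if n != m)

# Like "order by myCount desc limit 1": the pair with the most edges.
top_pair, top_count = pair_counts.most_common(1)[0]
```

Because the pattern is directed, ("A", "B") and ("B", "A") are counted as separate pairs here.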

6. The query match p=(n {Name:'BRCA1'})-[:AssociationType*..2]->(m) return p produces what?

The neighbors of the node whose name is ‘BRCA1’
The 2-neighborhood of the node whose name is ‘BRCA1’
The neighbors’ neighbors of the node whose name is ‘BRCA1’
The neighbors whose distance is greater than 1 and less than 2 of the node whose name is ‘BRCA1’

7. How many non-directed shortest paths are there between the node named ‘BRCA1’ and the node named ‘NBR1’?

8
9
10
None of the above
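For intuition on what counting non-directed shortest paths involves, here is a small breadth-first search sketch in Python. It is not the Cypher query used in the course; the function name and the toy graph are made up, edge direction is ignored, and parallel edges are collapsed, so on a graph with multi-edges the count may differ from what a graph database reports.

```python
from collections import defaultdict, deque


def count_shortest_paths(edges, source, target):
    """Count shortest paths between two nodes, ignoring edge direction."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a != b:  # loops never lie on a shortest path
            adjacency[a].add(b)
            adjacency[b].add(a)

    distance = {source: 0}
    ways = {source: 1}  # number of shortest paths reaching each node
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                # First visit: shortest distance is fixed now.
                distance[neighbor] = distance[node] + 1
                ways[neighbor] = ways[node]
                queue.append(neighbor)
            elif distance[neighbor] == distance[node] + 1:
                # Another equally short route into the same node.
                ways[neighbor] += ways[node]
    return ways.get(target, 0)


# Hypothetical graph: two 2-hop routes and one 3-hop route from S to T.
toy_edges = [("S", "A"), ("A", "T"), ("S", "B"), ("B", "T"),
             ("S", "C"), ("C", "D"), ("D", "T")]
path_count = count_shortest_paths(toy_edges, "S", "T")
```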

8. The top 2 nodes with the highest outdegree are:

GRB2 and TP53
EP300 and BRCA1
MEPCE and EGFR
SNCA and BRCA1

9. Applying the example queries provided to you, create the degree histogram for the network. How many nodes in the graph have a degree of 3?

1351
821
675
512
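A degree histogram answers "how many nodes have degree k" for every k. The sketch below shows the idea in Python on an invented edge list; it is not the example query from the course. It assumes degree means in-degree plus out-degree, so a loop contributes 2 to its node.

```python
from collections import Counter


def degree_histogram(edges):
    """Map each degree value to the number of nodes having that degree."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1  # a loop (source == target) therefore adds 2
    return Counter(degree.values())


# Hypothetical edges: A has degree 3, B and C have degree 2, D has degree 1.
histogram = degree_histogram([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")])
nodes_with_degree_3 = histogram[3]
```

Nodes with no edges at all never appear in the edge list, so they would need to be added separately as degree 0.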

Quiz 5 - Assessment Questions on 'Practicing Graph Analytics in Neo4j With Cypher'

Solutions for the above quiz, as they are missing. I hope the author will add them to the repository.

1. 9656
2. 46,621
3. 1221
4. the count of all non loop edges between every adjacent node pair.
5. the pair of nodes with the maximum number of multi-edges between them
6.
7. 9
8. SNCA and BRCA1
9. 821

Fix the solution on Course 2 - Quiz 2

Course 2: Big Data Modeling and Management Systems
Quiz 2 - Data Model Quiz

What does the term “atomic” mean in the context of relational databases?

Fixed schema of a particular database.
A tuple that cannot be reduced.
A column or row of data. Depends on the context.
One unit of information that cannot be decomposed.

For the following questions 7, 8, and 9, suppose a registration website creates data with the following fields for each person registered (note: if the user does not input a value, NULL is stored instead): Name, Date, Address, and Account Number.
Suppose we collect data month by month. Each month, we would have a batch of data containing the fields listed above. At the end of the year, we want to summarize our registrant activities for the entire year, so we would remove redundancies in our data by removing any records with duplicate account numbers from month to month. What type of operation do we use in this scenario?

Join
Not an Operation
Subsetting
Union
