alessandrocorradini / university-of-california-san-diego-big-data-specialization

Repository for the Big Data Specialization from University of California San Diego on Coursera

Python 2.47% Shell 0.90% Jupyter Notebook 96.38% TSQL 0.26%
coursera coursera-bigdata coursera-big-data mooc moocs

university-of-california-san-diego-big-data-specialization's People

Contributors

alessandrocorradini, nmnjn, skvrahul

university-of-california-san-diego-big-data-specialization's Issues

Quiz 10 - Regression, Cluster Analysis, & Association Analysis

  1. In classification, you're predicting a category, and in regression, you're predicting a number
  2. Determining whether power usage will rise or fall
  3. Determine the regression line that best fits the samples.
  4. In simple linear regression, the input has only one variable. In multiple linear regression, the input has more than one variable.
  5. To segment data so that differences between samples in the same cluster are minimized and differences between samples of different clusters are maximized.
  6. All of these choices are valid uses of the resulting clusters.
  7. The mean of all the samples in the cluster
  8. Assign each sample to the closest centroid, then calculate the new centroid.
  9. To find rules to capture associations between items or events
  10. A transaction or set of items that occur together
  11. Captures the frequency of that item set
  12. Prune rules by eliminating rules with low confidence
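Answers 7 and 8 above describe the k-means loop: assign each sample to its closest centroid, then recompute each centroid as the mean of its samples. A minimal sketch of one such iteration in plain Python follows. The sample points, the starting centroids, and the function names are made up for illustration and do not come from the course material.

```python
# Minimal k-means sketch in plain Python (illustrative only; the sample
# points and starting centroids below are made up).

def closest(point, centroids):
    """Return the index of the centroid nearest to point (squared Euclidean distance)."""
    distances = [sum((p - c) ** 2 for p, c in zip(point, centroid))
                 for centroid in centroids]
    return distances.index(min(distances))


def kmeans_step(points, centroids):
    """One iteration: assign each sample to its closest centroid, then
    recompute each centroid as the mean of the samples assigned to it."""
    clusters = [[] for _ in centroids]
    for point in points:
        clusters[closest(point, centroids)].append(point)

    new_centroids = []
    for old, members in zip(centroids, clusters):
        if members:
            # Average each coordinate over the members of the cluster.
            new_centroids.append(
                tuple(sum(dim) / len(members) for dim in zip(*members)))
        else:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old)
    return new_centroids, clusters


points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
new_centroids, clusters = kmeans_step(points, centroids)
```

In practice the step would be repeated until the centroids stop moving; the sketch shows a single pass only.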

Quiz 6 - Using GraphX

  1. select all
  2. metro area
  3. green dot with no connections
  4. Social networks have communities or pockets of people who interact densely

Add Solution Quiz 6 - Using GraphX

Quiz 6 - Using GraphX

1. In the code snippet below, taken from the Hands On exercise on importing data, '100L + row...' adds 100 to the value of every country ID. Which of the following statements are true regarding this decision? (Note: you may select more than one)

val countries: RDD[(VertexId, PlaceNode)] =
  sc.textFile("./EOADATA/country.csv").
    filter(! _.startsWith("#")).
    map {line =>
      val row = line split ','
      (100L + row(0).toInt, Country(row(1)))
    }
  • Another option would have been to add 100 to the metropolis keys as they were imported, and leave the country keys as they were originally numbered.
  • This step was needed to create unique keys between the country and the metropolis datasets.
  • Another option would be to add 500 to the country keys.
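The key-collision problem behind this question can be sketched in Python. The rows below are hypothetical stand-ins for the country and metropolis files, and the names `COUNTRY_OFFSET` and `vertices` are illustrative only. The point is that two datasets that both number their rows from 1 cannot share one vertex table unless one set of IDs is shifted.

```python
# Hypothetical rows standing in for the metropolis and country files;
# the real files in the hands-on exercise have different contents.
metro_rows = [(1, "Tokyo"), (2, "Seoul")]
country_rows = [(1, "Japan"), (2, "South Korea")]

# Assumption: any offset works as long as the shifted country IDs
# cannot collide with a metropolis ID (100 and 500 both qualify here).
COUNTRY_OFFSET = 100

vertices = {}
for metro_id, name in metro_rows:
    vertices[metro_id] = ("Metropolis", name)
for country_id, name in country_rows:
    # Without the offset, country 1 would overwrite metropolis 1.
    vertices[COUNTRY_OFFSET + country_id] = ("Country", name)
```

Shifting the metropolis IDs instead and leaving the country IDs alone would give unique keys in the same way.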

2. In the metro example, what is an in-degree in relation to a country? Hint: this was covered in the Building a Degree Histogram Hands On exercise.

  • A street in a city.
  • Another city.
  • A continent.
  • A metro area or metropolis.
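As a rough illustration of in-degree, assume each edge points from a metropolis to its country (the edge list below is invented, not taken from the exercise data). The in-degree of a country is then the number of edges that arrive at it.

```python
from collections import Counter

# Hypothetical directed edges (source, destination): metropolis -> country.
edges = [
    ("Tokyo", "Japan"),
    ("Osaka", "Japan"),
    ("Seoul", "South Korea"),
]

# In-degree: count how many edges end at each destination vertex.
in_degree = Counter(dst for _, dst in edges)
```

Here `in_degree["Japan"]` counts the metropolises attached to Japan.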

3. In the Hands On exercise on network connectedness and clustering, Antarctica was easy to identify. Why?

  • It had many edges
  • It had a vertex ID of 205.
  • It is the green dot that has no connections, or it is the least connected cluster.

4. In the Facebook graph example, the visualization looked like broccoli. Why?

  • In a directed graph, the stalks are large.
  • Social networks have communities or pockets of people who interact densely.
  • The high centrality of some people nodes in facebook gives the graph its broccoli shape.

Add solution to Quiz 5 - Assessment Question on 'Practicing Graph Analytics

Quiz 5 - Assessment Questions on 'Practicing Graph Analytics in Neo4j With Cypher'

1. What is the number of nodes returned?

  • 50,000
  • 9656
  • 9756
  • 8673

2. What’s the number of edges?

  • 50,000
  • 49,834
  • 46,621
  • None of the above

3. The number of loops in the graph is:

  • 1035
  • 1395
  • 1221
  • 1243

4. The query match (n)-[r]->(m) where m <> n return distinct n, m, count(r) gives us

  • the count of all non loop edges between every adjacent node pair.
  • the count of all edges between every adjacent node pair.
  • the count of all edges.
  • None of the above

5. The query match (n)-[r]->(m) where m <> n return distinct n, m, count(r) as myCount order by myCount desc limit 1 produces what?

  • a random edge
  • the node with the maximum number of looping edges
  • two neighboring nodes, each with a high outdegree
  • the pair of nodes with the maximum number of multi-edges between them
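The two Cypher queries in questions 4 and 5 can be mimicked on a small in-memory edge list to see what they compute. This is a sketch on invented data, not output from the course graph. It assumes that the pattern `(n)-[r]->(m)` is directed, so the pairs (A, B) and (B, A) are counted separately.

```python
from collections import Counter

# Hypothetical directed edges; ("C", "C") entries are loops.
edges = [
    ("A", "B"), ("A", "B"), ("A", "B"),
    ("B", "C"),
    ("C", "C"), ("C", "C"),
    ("B", "A"),
]

# Analogue of: match (n)-[r]->(m) where m <> n return distinct n, m, count(r)
# Loops are skipped, and edges are counted per ordered node pair.
pair_counts = Counter((n, m) for n, m in edges if n != m)

# Analogue of adding: order by myCount desc limit 1
top_pair, top_count = pair_counts.most_common(1)[0]
```

On this data the loops on C are ignored and the pair with the most parallel edges is reported.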

6. The query match p=(n {Name:'BRCA1'})-[:AssociationType*..2]->(m) return p produces what?

  • The neighbors of the node whose name is ‘BRCA1’
  • The 2-neighborhood of the node whose name is ‘BRCA1’
  • The neighbors’ neighbors of the node whose name is ‘BRCA1’
  • The neighbors whose distance is greater than 1 and less than 2 of the node whose name is ‘BRCA1’

7. How many non-directed shortest paths are there between the node named ‘BRCA1’ and the node named ‘NBR1’?

  • 8
  • 9
  • 10
  • None of the above

8. The top 2 nodes with the highest outdegree are:

  • GRB2 and TP53
  • EP300 and BRCA1
  • MEPCE and EGFR
  • SNCA and BRCA1

9. Applying the example queries provided to you, create the degree histogram for the network. How many nodes in the graph have a degree of 3?

  • 1351
  • 821
  • 675
  • 512
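A degree histogram counts how many nodes have each degree. The sketch below builds one in plain Python from an invented edge list. It assumes degree means in-degree plus out-degree, and it is not the query provided in the course.

```python
from collections import Counter

# Hypothetical directed edges (source, destination).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]

# Degree of a node = number of edge endpoints that touch it.
degree = Counter()
for src, dst in edges:
    degree[src] += 1
    degree[dst] += 1

# Histogram: degree value -> number of nodes with that degree.
histogram = Counter(degree.values())
```

Reading `histogram[3]` then answers "how many nodes have a degree of 3" for this toy graph.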

Fix the solution on Course 2 - Quiz 2

Course 2: Big Data Modelling and Management System
Quiz 2 - Data Model Quiz

  1. What does the term “atomic” mean in the context of relational databases?
  • Fixed schema of a particular database.
  • A tuple that cannot be reduced.
  • A column or row of data. Depends on the context.
  • One unit of information that cannot be decomposed.
  1. For the following questions 7, 8, and 9, suppose a registration website creates data with the following fields for each person registered (note: if the user does not input a value, NULL is stored instead): Name, Date, Address, and Account Number.
    Suppose we collect data month by month. Each month, we would have a batch of data containing the fields listed above. At the end of the year, we want to summarize our registrant activities for the entire year, so we would remove redundancies in our data by removing any records with duplicate account numbers from month to month. What type of operation do we use in this scenario?
  • Join
  • Not an Operation
  • Subsetting
  • Union
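The scenario in the question, combining monthly batches while keeping one record per account number, can be sketched as follows. The records and the `combine` helper are hypothetical. Treating the account number as the key that identifies a duplicate is an assumption taken from the wording of the question.

```python
# Hypothetical monthly batches; None stands in for a NULL field.
january = [{"account": 1, "name": "Ana"}, {"account": 2, "name": "Bo"}]
february = [{"account": 2, "name": "Bo"}, {"account": 3, "name": None}]


def combine(*batches):
    """Merge batches, keeping the first record seen for each account number."""
    seen = {}
    for batch in batches:
        for record in batch:
            # setdefault only stores the record if the account is new.
            seen.setdefault(record["account"], record)
    return list(seen.values())


year = combine(january, february)
```

Account 2 appears in both months but only once in the combined result.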
