- List is like java.util.List in that it is an indexed collection that usually holds homogeneous data
- List is unlike java.util.List in that it is immutable
- List generally allows duplicates
Varying Ways to create a List
val a = List(1,2,3,4,5) //What is this call?
val b = 1 :: 2 :: 3 :: 4 :: 5 :: Nil
val c = Nil:List[String]
a: List[Int] = List(1, 2, 3, 4, 5)
b: List[Int] = List(1, 2, 3, 4, 5)
c: List[String] = List()
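The three declarations above can be desugared by hand, which also answers the "What is this call?" question. The following is a minimal sketch using only the standard library; the names viaApply, viaCons, and emptyStrings are made up for illustration.

```scala
// List(1, 2, 3) is shorthand for calling apply on the List companion object
val viaApply = List.apply(1, 2, 3)

// Operators ending in ':' bind to the right, so 1 :: 2 :: 3 :: Nil
// can be written out as Nil.::(3).::(2).::(1)
val viaCons = Nil.::(3).::(2).::(1)

// Nil is the empty List; a type annotation pins down the element type
val emptyStrings: List[String] = Nil

println(viaApply == viaCons)   // true
println(emptyStrings.isEmpty)  // true
```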
println(a.head)
1
println(a.tail)
List(2, 3, 4, 5)
println(a.init)
println(a.last)
println(a(4)) //5 <--Wait what is this?
println(a.max)
println(a.min)
println(a.isEmpty)
println(a.nonEmpty)
println(a.updated(3, 100)) //Underused
println(a.mkString(",")) //available on all collections!
println(a.mkString("{", " ## ", "}"))
- Lists are an immutable collection, duplicates allowed
- Nil is an empty List
- Lists are created with the List object and its apply factory
- Lists have all the functional properties that other collections have
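To make the immutability point concrete, here is a small sketch, with illustrative names only, showing that indexing goes through apply and that "modifying" operations return a new List while the original stays untouched.

```scala
val original = List(1, 2, 3, 4, 5)

// original(4) is sugar for original.apply(4): a zero-based index lookup
val fifth = original.apply(4)

// updated and :: return new Lists; the original is never modified
val changed = original.updated(3, 100)
val prepended = 0 :: original

println(fifth)      // 5
println(changed)    // List(1, 2, 3, 100, 5)
println(prepended)  // List(0, 1, 2, 3, 4, 5)
println(original)   // List(1, 2, 3, 4, 5)
```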
Obtaining 0…4, exclusively
val exclusive = Range(0, 4) //0,1,2,3
Obtaining 0…4, inclusively
val inclusive = Range.inclusive(0, 4) //0,1,2,3,4
Source: https://www.scala-lang.org/api/current/scala/collection/immutable/Range$.html
Obtaining 0…4, exclusively, using until
val exclusive = 0 until 4 //0,1,2,3
exclusive: scala.collection.immutable.Range = Range(0, 1, 2, 3)
Obtaining 0…4, inclusively, using to
val inclusive = 0 to 4 //0,1,2,3,4
inclusive: scala.collection.immutable.Range.Inclusive = Range(0, 1, 2, 3, 4)
Source: https://www.scala-lang.org/api/current/scala/collection/immutable/Range$.html
Obtaining 0…20 with a positive step of 2, exclusively, using Standard API
Range(0, 20, 2).toVector //Vector(0, 2, 4, 6, 8, 10, 12, 14, 16, 18)
The above uses the apply method
Obtaining 0…20 with a positive step of 2, exclusively, using Implicit Trickery
(0 until 20 by 2).toVector //Vector(0, 2, 4, 6, 8, 10, 12, 14, 16, 18)
Obtaining 0…20 with a positive step of 2, inclusively, using Standard API
Range.inclusive(0, 20, 2).toVector //Vector(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
res7: Vector[Int] = Vector(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
The above uses the inclusive method of the Range companion object rather than apply
Obtaining 0…20 with a positive step of 2, inclusively, using Implicit Trickery
(0 to 20 by 2).toVector //Vector(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
res8: Vector[Int] = Vector(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
Range exclusively with Negative Steps
Obtaining 20…0 with a negative step of -2, exclusively, using Standard API
Range(20, 0, -2).toVector //Vector(20, 18, 16, 14, 12, 10, 8, 6, 4, 2)
res9: Vector[Int] = Vector(20, 18, 16, 14, 12, 10, 8, 6, 4, 2)
The above uses the apply method
Obtaining 20…0 with a negative step of -2, exclusively, using Implicit Trickery
(20 until 0 by -2).toVector //Vector(20, 18, 16, 14, 12, 10, 8, 6, 4, 2)
res10: Vector[Int] = Vector(20, 18, 16, 14, 12, 10, 8, 6, 4, 2)
Range inclusively with Negative Steps
Obtaining 20…0 with a negative step of -2, inclusively, using Standard API
Range.inclusive(20, 0, -2).toVector //Vector(20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0)
res6: Vector[Int] = Vector(20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0)
The above uses the inclusive method of the Range companion object rather than apply
Obtaining 20…0 with a negative step of -2, inclusively, using Implicit Trickery
(20 to 0 by -2).toVector //Vector(20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0)
res5: Vector[Int] = Vector(20, 18, 16, 14, 12, 10, 8, 6, 4, 2, 0)
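The Standard API and the "implicit trickery" forms produce equal Range values, which can be checked directly. This is a minimal sketch; the names sameExclusive, sameInclusive, and big are illustrative assumptions.

```scala
// until, to and by are made available on Int through an implicit conversion,
// and they build the same Range values as the Range companion object
val sameExclusive = Range(0, 20, 2) == (0 until 20 by 2)
val sameInclusive = Range.inclusive(20, 0, -2) == (20 to 0 by -2)

// A Range keeps only its start, end and step; elements are computed on demand,
// so a large Range is cheap to create
val big = 0 until 1000000 by 2

println(sameExclusive)  // true
println(sameInclusive)  // true
println(big.size)       // 500000
println(big.last)       // 999998
```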
- Just like in Java and most programming languages, a Set is a collection that doesn't have duplicate elements
- Generally has more mathematical methods than List
- Doesn't maintain order
Some of the characteristics of Set:
- head and tail are not reliable, since a Set has no guaranteed order
- apply has a different behavior
val set = Set(1,2,3,4)
val set2 = Set.apply(1,2,3,4,5)
set: scala.collection.immutable.Set[Int] = Set(1, 2, 3, 4)
set2: scala.collection.immutable.Set[Int] = Set(5, 1, 2, 3, 4)
Calculating the difference of two Sets
The following returns the elements of the first Set that are not in the second
Set(1,2,3,4) diff Set(1,2,3,4,5,6,7)
Whereas, the opposite…
Set(1,2,3,4,5,6,7) diff Set(1,2,3,4)
res4: scala.collection.immutable.Set[Int] = Set(5, 6, 7)
A union provides the combination of the two Sets
Set(1,2,3,4) union Set(5,10)
intersect is the counterpart of diff: it keeps only the elements common to both Sets
Set(1,2,3,4) intersect Set(19,2,3,10)
res2: scala.collection.immutable.Set[Int] = Set(2, 3)
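The three operations above also have symbolic aliases in the standard library. The sketch below compares the two spellings; the names small and large are illustrative. It checks equality rather than printing the Sets, because the printed order of a larger Set is not guaranteed.

```scala
val small = Set(1, 2, 3, 4)
val large = Set(1, 2, 3, 4, 5, 6, 7)

// diff is not symmetric: it keeps the elements of the left Set
// that are missing from the right Set
val nothingLeft = small.diff(large)
val extras = large.diff(small)

// Symbolic aliases: &~ for diff, | for union, & for intersect
val sameDiff = (large &~ small) == extras
val sameUnion = (small | Set(5, 10)) == small.union(Set(5, 10))
val sameIntersect = (small & Set(19, 2, 3, 10)) == Set(2, 3)

println(nothingLeft.isEmpty)    // true
println(extras == Set(5, 6, 7)) // true
println(sameDiff && sameUnion && sameIntersect) // true
```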
apply returns the same result as contains: whether the element is in the Set or not
val set = Set(1,2,3,4)
set.apply(4) //true
set.apply(10) //false
set.contains(4) //true
set: scala.collection.immutable.Set[Int] = Set(1, 2, 3, 4)
res3: Boolean = true
- Sets are collections with no duplicate elements
- More mathematically powerful than their List counterpart
- Sets have a hash order that is undetermined (once they hold 5 or more elements)
- apply will return true or false
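Because apply answers "is this element present?", a Set can stand in wherever a predicate function is expected. A minimal sketch with illustrative names:

```scala
val allowed = Set(1, 2, 3, 4)

// A Set is also a function from element to Boolean,
// so it can be passed directly to filter
val kept = List(0, 1, 2, 9, 4).filter(allowed)

// Converting a List to a Set drops the duplicates
val distinctCount = List(1, 1, 2, 2, 3).toSet.size

println(kept)           // List(1, 2, 4)
println(distinctCount)  // 3
```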
- Called associative arrays, dictionaries, or tables in other languages
- Table of keys and values
- Items are looked up by key
The following calls all create the same Map
val m = Map.apply((1, "One"), (2, "Two"), (3, "Three"))
m: scala.collection.immutable.Map[Int,String] = Map(1 -> One, 2 -> Two, 3 -> Three)
val m = Map((1, "One"), (2, "Two"), (3, "Three"))
m: scala.collection.immutable.Map[Int,String] = Map(1 -> One, 2 -> Two, 3 -> Three)
Reminder, on tuples, the -> creates a Tuple2
val t:(Int, String) = 1 -> "One"
t: (Int, String) = (1,One)
val m = Map(1 -> "One", 2 -> "Two", 3 -> "Three")
m: scala.collection.immutable.Map[Int,String] = Map(1 -> One, 2 -> Two, 3 -> Three)
Given a Map we created previously.
val m = Map(1 -> "One", 2 -> "Two", 3 -> "Three")
m: scala.collection.immutable.Map[Int,String] = Map(1 -> One, 2 -> Two, 3 -> Three)
To retrieve by key:
m.get(1)
res11: Option[String] = Some(One)
When the key is not present, the result is None
m.get(4)
res12: Option[String] = None
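Since get returns an Option, the caller decides what to do in each case. The following sketch shows two common ways to consume the result; the helper describe is hypothetical and not part of the Map API.

```scala
val m = Map(1 -> "One", 2 -> "Two", 3 -> "Three")

// Pattern match on the Option that get returns (describe is an illustrative helper)
def describe(key: Int): String = m.get(key) match {
  case Some(word) => s"$key is spelled $word"
  case None       => s"$key is not in the Map"
}

println(describe(1))  // 1 is spelled One
println(describe(4))  // 4 is not in the Map

// Option behaves like a collection of zero or one element, so map works on it too
println(m.get(2).map(_.toUpperCase))  // Some(TWO)
println(m.get(4).map(_.toUpperCase))  // None
```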
Given a Map we created previously.
val m = Map(1 -> "One", 2 -> "Two", 3 -> "Three")
m: scala.collection.immutable.Map[Int,String] = Map(1 -> One, 2 -> Two, 3 -> Three)
Calling apply will retrieve the value directly, without wrapping it in an Option
m.apply(1)
res13: String = One
Be careful when calling apply on a Map that doesn't contain the key: it throws a NoSuchElementException
m.apply(4)
java.util.NoSuchElementException: key not found: 4
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:59)
at scala.collection.MapLike$class.apply(MapLike.scala:141)
at scala.collection.AbstractMap.apply(Map.scala:59)
... 36 elided
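When a missing key should not blow up, the standard library offers a few alternatives to a bare apply. This is a sketch of three of them; the fallback text "Unknown" and the name lenient are arbitrary choices for illustration.

```scala
val m = Map(1 -> "One", 2 -> "Two", 3 -> "Three")

// getOrElse supplies a fallback for a single lookup
val fallback = m.getOrElse(4, "Unknown")

// withDefaultValue returns a new Map whose apply does not throw for missing keys
val lenient = m.withDefaultValue("Unknown")

println(fallback)       // Unknown
println(lenient(4))     // Unknown
println(lenient(1))     // One

// contains lets you check before calling apply
println(m.contains(4))  // false
```

Which one fits depends on whether the default belongs to one call site (getOrElse) or to the Map as a whole (withDefaultValue).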
Given a Map we created previously.
val m = Map(1 -> "One", 2 -> "Two", 3 -> "Three")
m: scala.collection.immutable.Map[Int,String] = Map(1 -> One, 2 -> Two, 3 -> Three)
To retrieve an Iterable of keys
m.keys
res15: Iterable[Int] = Set(1, 2, 3)
To retrieve them as a Set
m.keySet
res16: scala.collection.immutable.Set[Int] = Set(1, 2, 3)
- Maps are a table-like collection that stores keys and values
- Internally, Maps are a collection of tuples, and can be operated on as such
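Treating a Map as a collection of tuples looks like this in practice. A minimal sketch; swapped, evenKeys, and fromPairs are illustrative names.

```scala
val m = Map(1 -> "One", 2 -> "Two", 3 -> "Three")

// Each element is a (key, value) tuple, so pattern matching on pairs works
val swapped = m.map { case (number, word) => word -> number }
val evenKeys = m.filter { case (number, _) => number % 2 == 0 }

// A List of tuples converts straight into a Map
val fromPairs = List((1, "One"), (2, "Two"), (3, "Three")).toMap

println(swapped("Two"))  // 2
println(evenKeys)        // Map(2 -> Two)
println(fromPairs == m)  // true
```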
- Vector is another kind of sequence, like List
- Offers differing characteristics, particularly in storage, which uses tries
- It is generally faster than List for many operations, such as indexed access and appending
- Has an API very similar to List
Vector(303.00, -230.2, -12, 19.0, 22.01, -132.00)
res0: scala.collection.immutable.Vector[Double] = Vector(303.0, -230.2, -12.0, 19.0, 22.01, -132.0)
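The calls below mirror the List examples from earlier to show how similar the API is. It is only a sketch, reusing the values from the cell above with the Int literal written as -12.0.

```scala
val v = Vector(303.00, -230.2, -12.0, 19.0, 22.01, -132.00)

// Indexed access, update and append all return quickly on a Vector,
// and, as with List, every operation yields a new collection
println(v(2))                    // -12.0
println(v.head)                  // 303.0
println(v.updated(0, 0.0).head)  // 0.0
println((v :+ 1.5).last)         // 1.5
println(v.size)                  // 6
```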
- Stream is a collection that evaluates its elements only when needed
- Creating your own typically requires the cons operation and perhaps Stream.empty[A]
- Using a Stream, particularly an infinite one, requires take, which takes a certain number of elements
- In Scala 2.13, it has been deprecated in favor of LazyList, which has nearly the same functionality but different evaluation semantics
def continuousEvens(): Stream[BigInt] = {
def ce(n:BigInt):Stream[BigInt] = Stream.cons(n, ce(n + 2))
ce(2)
}
continuousEvens().take(5).mkString(",")
continuousEvens: ()Stream[BigInt]
res5: String = 2,4,6,8,10
Cons can be expressed using the #:: operator; please note the colon on the right-hand side, which makes the operator bind to the Stream on its right
def continuousEvens(): Stream[BigInt] = {
def ce(n:BigInt):Stream[BigInt] = n #:: ce(n + 2)
ce(2)
}
continuousEvens().take(5).mkString(",")
continuousEvens: ()Stream[BigInt]
res6: String = 2,4,6,8,10
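For simple "start and step" sequences, the recursion can be left to the library. The sketch below uses Stream.iterate to build the same even numbers; it assumes a Scala version where Stream is still the lazy sequence of choice (on 2.13, Stream compiles with a deprecation warning and LazyList.iterate has the same shape).

```scala
// Stream.iterate builds the infinite sequence 2, 4, 6, ... without explicit recursion
val evens: Stream[BigInt] = Stream.iterate(BigInt(2))(_ + 2)

// Only as many elements as the consumer needs are computed
val firstFive = evens.take(5).mkString(",")
val belowTen = evens.takeWhile(_ < 10).toList

println(firstFive)  // 2,4,6,8,10
println(belowTen)   // List(2, 4, 6, 8)
```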
- To determine which collection performs best for your needs, here is a chart
- The chart is taken directly from https://docs.scala-lang.org/overviews/collections/performance-characteristics.html
- The notation is as follows
- eC - Effectively constant time
- aC - Amortized constant time; a single operation may take longer, but averaged over many operations the cost is constant
- C - Constant time
- Log - Logarithmic time, proportional to the logarithm of the collection size
- L - Linear time, proportional to the size of the collection
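As an illustration of how the notation guides a choice, the sketch below pairs a few operations with the costs the linked chart lists for List and Vector. It does not measure anything; the comments simply restate the chart's entries.

```scala
val list = List(1, 2, 3, 4, 5)
val vector = Vector(1, 2, 3, 4, 5)

// List: prepend and head are C, indexed access is L
val listPrepended = 0 :: list
val listThird = list(2)

// Vector: append, prepend and indexed access are all eC
val vectorAppended = vector :+ 6
val vectorThird = vector(2)

println(listPrepended)             // List(0, 1, 2, 3, 4, 5)
println(vectorAppended)            // Vector(1, 2, 3, 4, 5, 6)
println(listThird == vectorThird)  // true
```

A rough rule of thumb that follows: prefer List when you mostly work at the front, and Vector when you need indexed access or appends.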
- In Scala, method names and parameters were curated with care
- Lesson: Once you learn most, if not all, methods of List, you will also know:
  - Set
  - Map
  - Stream
  - String
  - Future
  - Option
  - Queue
  - Range
  - Vector
- In some capacity, that will also include mutable collections
- Knowing this, the learning curve drops significantly
- This also aids in how we learn and understand Spark
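The claim is easy to see with a single method. The sketch below calls map on several of the types listed above; the function double and the value names are illustrative.

```scala
val double: Int => Int = _ * 2

// The same map call works across very different containers
val doubledList = List(1, 2, 3).map(double)
val doubledVector = Vector(1, 2, 3).map(double)
val doubledSet = Set(1, 2, 3).map(double)
val doubledRange = (1 to 3).map(double)
val doubledOption = Option(3).map(double)
val upperString = "abc".map(_.toUpper)
val doubledKeys = Map(1 -> "One").map { case (k, v) => double(k) -> v }

println(doubledList)    // List(2, 4, 6)
println(doubledOption)  // Some(6)
println(upperString)    // ABC
println(doubledKeys)    // Map(2 -> One)
```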