Essential Scala: Collections

Sequences

创建一个序列:

val sequence = Seq(1, 2, 3)
// sequence: Seq[Int] = List(1, 2, 3)

这段代码实现展示了Scala的关键特性:接口和实现的分离.上面的值是Seq[Int]类型,但是实现却是List.

Basic operations

访问元素:我们可以使用序列的apply方法访问元素,该方法接收一个整形的索引作为参数:

sequence.apply(0) 
// res: Int = 1
sequence(0) // 语法糖 
// res: Int = 1

如果索引超出序列长度则会引发异常:

sequence(3)
// java.lang.IndexOutOfBoundsException: 3 // at ...

或者访问序列的头元素或尾部元素列表:

sequence.head 
// res: Int = 1
sequence.tail
// res: Seq[Int] = List(2, 3)
sequence.tail.head 
// res: Int = 2

如果访问一个空序列的头元素同样会引发异常:

Seq().head
// java.util.NoSuchElementException: head of empty list
// at scala.collection.immutable.Nil$.head(List.scala:337) 
// ...
Seq().tail
// java.lang.UnsupportedOperationException: tail of empty list 
// at scala.collection.immutable.Nil$.tail(List.scala:339) 
// ...

如果想要安全的访问头元素而不引发异常,可以使用headOption:

sequence.headOption
// res: Option[Int] = Some(1)
Seq().headOption
// res: Option[Nothing] = None

获取序列长度:

sequence.length

搜索元素:contains方法来检查一个序列是否包含一个元素(使用==进行比较):

sequence.contains(2) // res: Boolean = true

find方法会返回第一个找到的元素,或者返回None:

sequence.find(_ == 3)
// res: Option[Int] = Some(3)
sequence.find(_ > 4)
// res: Option[Int] = None

filter方法返回一个满足谓语函数的元素列表:

sequence.filter(_ > 1)
// res: Seq[Int] = List(2, 3)

元素排序:

sequence.sortWith(_ < _)
// res: Seq[Int] = List(3, 2, 1)

添加元素:

sequence.:+(4)
// res: Seq[Int] = List(1, 2, 3, 4)

sequence :+ 4
// res: Seq[Int] = List(1, 2, 3, 4)

sequence.+:(0)
// res: Seq[Int] = List(0, 1, 2, 3)

0 +: sequence
// res: Seq[Int] = List(0, 1, 2, 3)

sequence ++ Seq(4, 5, 6)
// res: Seq[Int] = List(1, 2, 3, 4, 5, 6)

Seq的默认实现是Lists,通过单例对象Nil创建一个空List:

Nil
// res: scala.collection.immutable.Nil.type = List()

通过 :: 方法构建列表:

val list = 1 :: 2 :: 3 :: Nil
// list: List[Int] = List(1, 2, 3)
4 :: 5 :: list
// res: List[Int] = List(4, 5, 1, 2, 3)

或者通过List的apply方法构建列表:

List(1, 2, 3)
// res: List[Int] = List(1, 2, 3)

或者通过 ::: 方法合并两个列表:

List(1, 2, 3) ::: List(4, 5, 6)
// res: List[Int] = List(1, 2, 3, 4, 5, 6)

Working with Sequences

Map

val sequence = Seq(1, 2, 3)
// sequence: Seq[Int] = List(1, 2, 3)
sequence.map(elt => elt * 2)
// res: Seq[Int] = List(2, 4, 6)

sequence.map(_ * 2)
// res: Seq[Int] = List(2, 4, 6)

"dog".permutations
// res: Iterator[String] = non-empty iterator

"dog".permutations.toList
// res: List[String] = List(dog, dgo, odg, ogd, gdo, god)

 Seq("a", "wet", "dog").map(_.permutations.toList)
// res: Seq[List[String]] = List(List(a), List(wet, wte, ewt, etw, twe, tew), List(dog, dgo, odg, ogd, gdo, god))

FlatMap

Seq("a", "wet", "dog").flatMap(_.permutations.toList)
// res: Seq[String] = List(a, wet, wte, ewt, etw, twe, tew, dog, dgo, odg, ogd, gdo, god

Seq(1, 2, 3).flatMap(num => Seq(num, num * 10)) 
// res: List[Int] = List(1, 10, 2, 20, 3, 30)

import scala.collection.immutable.Vector
Vector(1, 2, 3).flatMap(num => Seq(num, num * 10))
// res: scala.collection.immutable.Vector[Int] = Vector(1, 10, 2, 20, 3, 30)

Folds

Foreach

For Comprehensions

Seq(1, 2, 3).map(_ * 2)
// res: Seq[Int] = List(2, 4, 6)

for {
    x <- Seq(1, 2, 3)
} yield x * 2
// res: Seq[Int] = List(2, 4, 6)

嵌套的for表达式:

val data = Seq(Seq(1), Seq(2, 3), Seq(4, 5, 6))
// data: Seq[Seq[Int]] = List(List(1), List(2, 3), List(4, 5, 6))
data.flatMap(_.map(_ * 2))
// res: Seq[Int] = List(2, 4, 6, 8, 10, 12)

for {
    subseq <- data 
    element <- subseq
} yield element * 2
// res: Seq[Int] = List(2, 4, 6, 8, 10, 12)

更多嵌套:

for {
    x <- a
    y <- b
    z <- c 
} yield e

a.flatMap(x => b.flatMap(y => c.map(z => e)))

Options

Option, Some, and None: Option是一个sealed trait,包含Some和None两个子类型.

sealed trait Option[+A] {
    def getOrElse(default: A): A
    def isEmpty: Boolean
    def isDefined: Boolean = !isEmpty
    // other methods...
}
final case class Some[A](x: A) extends Option[A] {
    def getOrElse(default: A) = x
    def isEmpty: Boolean = false
    // other methods...
}
final case object None extends Option[Nothing] {
    def getOrElse(default: A) = default
    def isEmpty: Boolean = true
    // other methods...
}

从Option中获取值:

readInt("abc").getOrElse(0) 
// res: Int = 0

或者使用模式匹配:

readInt("123") match {
  case Some(number) => number + 1 
  case None => 0
}
// res: Int = 124

Option可以看做是一个仅包含一个或零个元素的序列,支持大部分序列的操作:

sealed trait Option[+A] {
    def getOrElse(default: A): A
    def isEmpty: Boolean
    def isDefined: Boolean = !isEmpty
    def filter(func: A => Boolean): Option[A] 
    def find(func: A => Boolean): Option[A]
    def map[B](func: A => B): Option[B]
    def flatMap(func: A => Option[B]): Option[B] 
    def foreach(func: A => Unit): Unit
    def foldLeft[B](initial: B)(func: (B, A) => B): B
    def foldRight[B](initial: B)(func: (A, B) => B): B 
}

Options as Flow Control

因为Option支持map和flatMap操作,因此同样支持for表达式:

val optionA = readInt("123") 
val optionB = readInt("234")
for {
  a <- optionA 
  b <- optionB
} yield a + b

Monads

Monads中包含的是什么?

Monad的概念很难去解释,因为它太过于泛型.我们可以通过比较一些简单的monad类型来通过简单的认知进行理解.

一般来说,一个monad是一个泛型类型,通过一些抽象技术,允许我们以序列的方式进行计算.我们使用for表达式操作序列,只需要关系程序逻辑即可,隐藏在monad的map和flatMap方法中的代码为了们做了一切需要的准备,比如说:

Option是一个Monad,它允许我们按照序列的方式计算一个可能存在的值,而不用担心这个值是不是真的存在.
Seq是一个Monad,它允许我们按照序列的方式去计算然后返回多种可能的值,而不用担心有非常多可能的组合.
Future是另一个常用的Monad,它允许我们按照序列的方式去异步的进行计算,而不用担心他们是不是真的异步.

为了证明这一原则的普遍性,我们可以举一些例子.第一个例子是计算两个可能存在的值的和:

for {
    a <- getFirstNumber     // getFirstNumber returns Option[Int] 
    b <- getSecondNumber    // getSecondNumber returns Option[Int]
} yield a + b

// 最终的结果是一个Option[Int],如果a和b都存在时,才会对其执行加法操作

第二个例子是计算两个序列中的所有元素的组合对:

for {
    a <- getFirstNumbers // getFirstNumbers returns Seq[Int] 
    b <- getSecondNumbers // getSecondNumbers returns Seq[Int]
} yield a + b

// 最终的结果是一个Seq[Int],其中包含所有两个序列中元素的可能组合的和

第三个例子是异步的计算两个只能以异步方式获取的值的和:

for {
    a <- getFirstNumber     // getFirstNumber returns Future[Int] 
    b <- getSecondNumber    // getSecondNumber returns Future[Int]
} yield a + b
// 最终的结果是一个Future[Int],当两个值都能访问到时,进行加法操作

这里重要的一点是,如果我们忽略最后结果的类型,这三个例子其实是一样的.Monad让我们忘记可能存在的问题,比如可能为空的值,多个值,或者异步可见的值,而只需要关注我们关心的问题-求和.

For Comprehensions Redux

Filtering

一个经常遇到的问题是,只处理选择的那些元素:

for(x <- Seq(-2, -1, 0, 1, 2) if x > 0) yield x 
// res: Seq[Int] = List(1, 2)

for表达式中使用if时不再使用括号将条件扩起.

Parallel Iteration

另一个常用的方法是同时处理多个并行的集合,比如我们想要将两个集合中对应位置的元素进行相加操作:

for {
    x <- Seq(1, 2, 3) 
    y <- Seq(4, 5, 6)
} yield x + y

// res: Seq[Int] = List(5, 6, 7, 6, 7, 8, 7, 8, 9)

可以看到结果中有很多重复的值,因为这些会导致首先遍历第一个集合然后与第二个集合中的第一个值进行逐个相加,再次遍历与第二个值逐个相加,导致了结果的重复.

解决方法是使用zip方法,将两个集合合并成一个包含多个二元组的集合.

Seq(1, 2, 3).zip(Seq(4, 5, 6))
// res: Seq[(Int, Int)] = List((1,4), (2,5), (3,6))

然后计算出我们需要的值:

for(x <- Seq(1, 2, 3).zip(Seq(4, 5, 6))) yield { val (a, b) = x; a + b } 
// res: Seq[Int] = List(5, 7, 9)

或者,直接可以将集合的元素与其索引进行组合:

for(x <- Seq(1, 2, 3).zipWithIndex) yield x
// res: Seq[(Int, Int)] = List((1,0), (2,1), (3,2))

Pattern Matching

生成器的左侧并不进行命名限制,我们可以进行模式匹配然后只对匹配的值进行计算,这提供了另一种过滤结果的方法:

for(x <- Seq(1, 2, 3).zip(Seq(4, 5, 6))) yield { val (a, b) = x; a + b } 
// res: Seq[Int] = List(5, 7, 9)

for((a, b) <- Seq(1, 2, 3).zip(Seq(4, 5, 6))) yield a + b 
// res: Seq[Int] = List(5, 7, 9)

Intermediate Results

如果在序列生成器中经常需要用到中间结果,可以插入一个表达式:

for {
    x <- Seq(1, 2, 3) 
    square = x * x
    y <- Seq(4, 5, 6)
} yield square * y

// res: Seq[Int] = List(4, 5, 6, 16, 20, 24, 36, 45, 54)

Maps and Sets

Maps

不可变Map:

val example = Map("a" -> 1, "b" -> 2, "c" -> 3)
// res: scala.collection.immutable.Map[java.lang.String,Int] = Map(a -> 1, b -> 2, c -> 3)

"a" -> 1
// res: (java.lang.String, Int) = (a,1)

example("a") // The same as example.apply("a") 
// res: Int = 1
example.get("a")
// res: Option[Int] = Some(1)

example("d")
java.util.NoSuchElementException: key not found: d
example.get("d")
// res: Option[Int] = None

example.getOrElse("d", -1) 
// res: Int = -1

example.contains("a") 
// res: Boolean = true

example.size
// res: Int = 3

example.+("c" -> 10, "d" -> 11, "e" -> 12)
// res: scala.collection.immutable.Map[java.lang.String,Int] = Map(e -> 12, a -> 1, b -> 2, c -> 10, d -> 11)

example.-("b", "c")
// res: scala.collection.immutable.Map[java.lang.String,Int] = Map(a -> 1)

example + ("d" -> 4) - "c"
// res: scala.collection.immutable.Map[java.lang.String,Int] = Map(a -> 1, b -> 2, d -> 4)

可变Map:

val example2 = scala.collection.mutable.Map("x" -> 10, "y" -> 11, "z" -> 12) 
// example2: scala.collection.mutable.Map[java.lang.String,Int] = Map(x -> 10, z -> 12, y -> 11)

example2 += ("x" -> 20)
// res: example2.type = Map(x -> 20, z -> 12, y -> 11)

example2 -= ("y", "z")
// res: example2.type = Map(x -> 20)

example2("w") = 30
example2
// res: scala.collection.mutable.Map[java.lang.String,Int] = Map(x -> 20, w -> 30)

其他操作:

Map("a" -> 1) + ("b" -> 2) + ("c" -> 3) + ("d" -> 4) + ("e" -> 5)
// res: scala.collection.immutable.Map[java.lang.String,Int] = Map(e -> 5, a -> 1, b -> 2, c -> 3, d -> 4)

scala.collection.immutable.ListMap("a" -> 1) + ("b" -> 2) + ("c" -> 3) + ("d" -> 4) + ("e" -> 5)
// res: scala.collection.immutable.ListMap[java.lang.String,Int] = Map(a -> 1, b -> 2, c -> 3, d -> 4, e -> 5)

example.map(pair => pair._1 -> pair._2 * 2)
// res: scala.collection.immutable.Map[java.lang.String,Int] = Map(a -> 2, b -> 4, c -> 6)

example.map(pair => pair._1 + " = " + pair._2)
// res: scala.collection.immutable.Iterable[java.lang.String] = List(a = 1, b = 2, c = 3)

example.flatMap {
    case (str, num) =>
           (1 to 3).map(x => (str + x) -> (num * x))
}
// res: scala.collection.immutable.Map[String,Int] =
Map(c3 -> 9, b2 -> 4, b3 -> 6, c2 -> 6, b1 -> 2, c1 -> 3, a3 -> 3, a1 -> 1, a2 -> 2)

for{
    (str, num) <- example
    x         <- 1 to 3
} yield (str + x) -> (num * x)
// res: scala.collection.immutable.Map[String,Int] =
Map(c3 -> 9, b2 -> 4, b3 -> 6, c2 -> 6, b1 -> 2, c1 -> 3, a3 -> 3, a1 -> 1, a2 -> 2)

for{
    (str, num) <- example
    x         <- 1 to 3
} yield (x + str) + "=" + (x * num)
// res: scala.collection.immutable.Iterable[java.lang.String] =
List(1a=1, 2a=2, 3a=3, 1b=2, 2b=4, 3b=6, 1c=3, 2c=6, 3c=9)

Zhange's notes