Simple Scala: For Comprehensions in Depth

Recap: The Elements of for Comprehensions

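A for comprehension contains one or more generator expressions, plus optional guard expressions for filtering and optional value definitions. It either yields a new collection or runs a side-effecting block, such as a call to println. The following example removes blank lines from a text file: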

object RemoveBlanks {
    /**
     * Remove blank lines from the specified input file.
     */  
    def apply(path: String, compressWhiteSpace: Boolean = false): Seq[String] =
        for {
            line <- scala.io.Source.fromFile(path).getLines.toSeq               // 1
            if line.matches("""^\s*$""") == false                               // 2
            line2 = if (compressWhiteSpace) line replaceAll ("\\s+", " ")       // 3
                    else line
        } yield line2                                                           // 4

    /**
     * Remove blank lines from the specified input files and echo the remaining
     * lines to standard output, one after the other.
     * @param args list of file paths. Prefix each with an optional "-" to
     *             compress remaining whitespace in the file.
     */
    def main(args: Array[String]) = for {
        path2 <- args                                                           // 5
        (compress, path) = if (path2 startsWith "-") (true, path2.substring(1))
                           else (false, path2)                                  // 6
        line <- apply(path, compress)
    } println(line)                                                             // 7
}
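  1. Use scala.io.Source to open the file and get its lines. getLines returns a scala.collection.Iterator, which must be converted to a sequence: the initial generator determines the result type of the comprehension, and the method must return a Seq[String], not an Iterator.
  2. Filter out blank lines with a regular expression.
  3. If whitespace compression is requested, collapse each run of whitespace into a single space; otherwise keep line unchanged.
  4. Use yield to return each line, so the for comprehension constructs a Seq[String].
  5. The main method also uses a for comprehension to process the argument list, treating each argument as a file path.
  6. If a path starts with "-", whitespace is compressed as well; otherwise only blank lines are removed.
  7. Print each processed line.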
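
To try the comprehension without touching the file system, the same generator, guard, and value definition can be applied to an in-memory sequence. This is only a sketch: RemoveBlanksInMemory and removeBlanks are hypothetical names introduced here, not part of the original example.

```scala
object RemoveBlanksInMemory {
  // Hypothetical helper: same logic as RemoveBlanks.apply, but over lines
  // that are already in memory instead of lines read from a file.
  def removeBlanks(lines: Seq[String], compressWhiteSpace: Boolean = false): Seq[String] =
    for {
      line <- lines                          // generator
      if !line.matches("""^\s*$""")          // guard: skip blank lines
      line2 = if (compressWhiteSpace) line.replaceAll("\\s+", " ") else line // value definition
    } yield line2
}
```

For instance, removeBlanks(Seq("a  b", "", "c"), compressWhiteSpace = true) keeps two lines and collapses the double space in the first one.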

for Comprehensions: Under the Hood

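The difference between filter and withFilter is that withFilter does not construct its own output collection. For better efficiency, it defers the filtering and combines it with the operation that follows (map, flatMap, or foreach), which saves building one intermediate collection.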
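
The deferred behavior can be observed by counting how often the predicate runs. The following is a minimal sketch; WithFilterDemo and its members are names made up for this illustration.

```scala
object WithFilterDemo {
  // Counts predicate evaluations to make the evaluation order visible.
  var calls = 0
  def isEven(x: Int): Boolean = { calls += 1; x % 2 == 0 }

  val xs = List(1, 2, 3, 4)

  val strict: List[Int] = xs.filter(isEven)    // builds List(2, 4) immediately
  val callsAfterFilter: Int = calls            // 4: the predicate ran for every element

  val pending = xs.withFilter(isEven)          // no collection built, predicate not run yet
  val callsAfterWithFilter: Int = calls        // still 4

  val doubled: List[Int] = pending.map(_ * 2)  // filtering happens while mapping: List(4, 8)
  val callsAfterMap: Int = calls               // 8
}
```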

Example:

val states = List("Alabama", "Alaska", "Virginia", "Wyoming")
for {
    s <- states
} println(s)
// Results: 
// Alabama 
// Alaska
// Virginia
// Wyoming

states foreach println
// Results the same as before.

Or, using yield:

val states = List("Alabama", "Alaska", "Virginia", "Wyoming")
for {
    s <- states
} yield s.toUpperCase
// Results: List(ALABAMA, ALASKA, VIRGINIA, WYOMING)
states map (_.toUpperCase)
// Results: List(ALABAMA, ALASKA, VIRGINIA, WYOMING)

With yield, a new container is built, and the type of the first generator determines the type of the result.
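
A short sketch of that rule, using a few standard collection types (FirstGeneratorDemo is a name chosen only for this illustration):

```scala
object FirstGeneratorDemo {
  // The explicit type annotations compile only because the result type
  // follows the type of the (first) generator.
  val fromList: List[Int]     = for (x <- List(1, 2, 3)) yield x * 2    // List(2, 4, 6)
  val fromVector: Vector[Int] = for (x <- Vector(1, 2, 3)) yield x * 2  // Vector(2, 4, 6)
  val fromOption: Option[Int] = for (x <- Option(21)) yield x * 2       // Some(42)
  val fromSet: Set[Int]       = for (x <- Set(1, 2, 3)) yield x % 2     // a Set holding 0 and 1
}
```

The Set case shows that the container's own semantics apply to the result too: duplicate values collapse.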

With more than one generator:

for {
    s <- states
    c <- s
} yield s"$c-${c.toUpper}"
// Results: List("A-A", "l-L", "a-A", "b-B", ...)

states flatMap (_.toSeq map (c => s"$c-${c.toUpper}")) 
// Results: List("A-A", "l-L", "a-A", "b-B", ...)

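The second generator iterates over each character of s, and the yield expression pairs each character with its uppercase form.

When there are several generators, every generator except the last is translated to a flatMap call and the last one to a map call. Once again a new List is produced; a different starting collection would give a correspondingly different result type. Now add a guard: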

val states = List("Alabama", "Alaska", "Virginia", "Wyoming")

for {
    s <- states
    c <- s
    if c.isLower
} yield s"$c-${c.toUpper}"
// Results: List("l-L", "a-A", "b-B", ...)

states flatMap (_.toSeq withFilter (_.isLower) map (c => s"$c-${c.toUpper}")) 
// Results: List("l-L", "a-A", "b-B", ...)

Note that withFilter comes just before the final map. A slightly more complex case adds a value definition:

val states = List("Alabama", "Alaska", "Virginia", "Wyoming")

for {
    s <- states
    c <- s
    if c.isLower
    c2 = s"$c-${c.toUpper}"
} yield c2
// Results: List("l-L", "a-A", "b-B", ...)

states flatMap (_.toSeq withFilter (_.isLower) map { c =>
    val c2 = s"$c-${c.toUpper}"
    c2
})
// Results: List("l-L", "a-A", "b-B", ...)

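Translation Rules of for Comprehensions

In a generator expression pat <- expr, pat is actually a pattern, as in (x, y) <- List((1,2),(3,4)). Likewise, in a value definition pat2 = expr, pat2 is also interpreted as a pattern.

The first translation step converts pat <- expr into the following: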

// pat <- expr
pat <- expr.withFilter { case pat => true; case _ => false }

A for comprehension with one generator and a yield expression:

// for ( pat <- expr1 ) yield expr2
expr1 map { case pat => expr2 }

A for loop, with no yield but a side-effecting expression:

// for ( pat <- expr1 ) expr2
expr1 foreach { case pat => expr2 }
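
As a concrete check of these two single-generator rules, the sketch below writes each form both ways and compares the results. SingleGeneratorDemo and its members are illustrative names. The pattern-filtering step is left out of the hand-written versions because a tuple pattern always matches here.

```scala
import scala.collection.mutable.ListBuffer

object SingleGeneratorDemo {
  val pairs = List((1, 2), (3, 4))

  // for ( pat <- expr1 ) yield expr2
  val sugared: List[Int] = for ((x, y) <- pairs) yield x + y

  // Hand-written equivalent: expr1 map { case pat => expr2 }
  val desugared: List[Int] = pairs.map { case (x, y) => x + y }

  // for ( pat <- expr1 ) expr2, with the side effect recorded in a buffer
  val loopLog = ListBuffer.empty[Int]
  for ((x, y) <- pairs) loopLog += x * y

  // Hand-written equivalent: expr1 foreach { case pat => expr2 }
  val foreachLog = ListBuffer.empty[Int]
  pairs.foreach { case (x, y) => foreachLog += x * y }
}
```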

A for comprehension with more than one generator:

// for ( pat1 <- expr1; pat2 <- expr2; ... ) yield exprN
expr1 flatMap { case pat1 => for (pat2 <- expr2 ...) yield exprN }

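Note that the remaining generators become a nested for comprehension. The translation rules are then applied again to that nested comprehension, turning it into method calls.

A for loop with more than one generator:

// for ( pat1 <- expr1; pat2 <- expr2; ... ) exprN
expr1 foreach { case pat1 => for (pat2 <- expr2 ...) exprN }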
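
The two-step nature of the multi-generator rules can be traced by hand. The sketch below is illustrative only (MultiGeneratorDemo, nums, and tags are made-up names): it peels off the first generator, then translates the nested comprehension.

```scala
import scala.collection.mutable.ListBuffer

object MultiGeneratorDemo {
  val nums = List(1, 2)
  val tags = List("a", "b")

  // for ( pat1 <- expr1; pat2 <- expr2 ) yield exprN
  val sugared: List[String] = for { n <- nums; t <- tags } yield s"$n$t"

  // Step 1: the first generator becomes flatMap; the rest is still a comprehension.
  val step1: List[String] = nums.flatMap { n => for (t <- tags) yield s"$n$t" }

  // Step 2: the nested single-generator comprehension becomes map.
  val step2: List[String] = nums.flatMap { n => tags.map(t => s"$n$t") }

  // Loop form: every level, including the innermost, becomes foreach.
  val loopLog = ListBuffer.empty[String]
  for { n <- nums; t <- tags } loopLog += s"$n$t"

  val foreachLog = ListBuffer.empty[String]
  nums.foreach { n => tags.foreach { t => foreachLog += s"$n$t" } }
}
```

All five values hold the same four strings in the same order: 1a, 1b, 2a, 2b.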

A generator followed by a guard is translated like this:

// pat1 <- expr1 if guard
pat1 <- expr1 withFilter ((arg1, arg2, ...) => guard)
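
Several guards on the same generator simply chain withFilter calls. A small sketch, with GuardDemo as an illustrative name:

```scala
object GuardDemo {
  // Two guards attached to a single generator.
  val sugared: Seq[Int] =
    for {
      x <- 1 to 10
      if x % 2 == 0
      if x > 4
    } yield x

  // Hand-written equivalent: one withFilter per guard, then the final map.
  val desugared: Seq[Int] =
    (1 to 10).withFilter(x => x % 2 == 0).withFilter(x => x > 4).map(x => x)
}
```

Both values contain 6, 8, and 10.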

A generator followed by a value definition has a more involved translation:

// pat1 <- expr1; pat2 = expr2
(pat1, pat2) <- for {           // 1
    x1 @ pat1 <- expr1          // 2
} yield {
    val x2 @ pat2 = expr2       // 3
    (x1, x2)                    // 4
}
  1. The result is a generator over a pair of patterns.
  2. x1 @ pat1 binds x1 to the whole value matched by pat1.
  3. Bind x2 to the whole value matched by pat2.
  4. Return a tuple.

An example of x @ pat = expr:

scala> val z @ (x, y) = (1 -> 2)
z: (Int, Int) = (1,2)
x: Int = 1
y: Int = 2

Let's look at a conceptual example of this rule:

val map = Map("one" -> 1, "two" -> 2)

val list1 = for {
    (key, value) <- map // How is this line and the next translated? 
    i10 = value+10
} yield (i10)
// Result: list1: scala.collection.immutable.Iterable[Int] = List(11, 12)

// Translation:
val list2 = for { 
    (i, i10) <- for {
        x1 @ (key, value) <- map 
    } yield {
        val x2 @ i10 = value + 10
        (x1, x2) 
    }
} yield (i10)
// Result: list2: scala.collection.immutable.Iterable[Int] = List(11, 12)

Options and Other Container Types

Option as a Container

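Option is a binary container: it either holds a value or is empty. It implements the four basic methods that for comprehensions need (foreach, map, flatMap, and withFilter).

sealed abstract class Option[+A] { self =>                                      // 1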
    ...
    def isEmpty: Boolean // Implemented by Some and None. 
    final def foreach[U](f: A => U): Unit =
        if (!isEmpty) f(this.get)

    final def map[B](f: A => B): Option[B] =
        if (isEmpty) None else Some(f(this.get))

    final def flatMap[B](f: A => Option[B]): Option[B] =
        if (isEmpty) None else f(this.get)

    final def filter(p: A => Boolean): Option[A] =
        if (isEmpty || p(this.get)) this else None

    final def withFilter(p: A => Boolean): WithFilter = new WithFilter(p)
      /** We need a whole WithFilter class to honor the "doesn't create a new
       *  collection" contract even though it seems unlikely to matter much in a
       *  collection with max size 1.
       */
    class WithFilter(p: A => Boolean) {
        def map[B](f: A => B): Option[B] = self filter p map f                  // 2
        def flatMap[B](f: A => Option[B]): Option[B] = self filter p flatMap f 
        def foreach[U](f: A => U): Unit = self filter p foreach f
        def withFilter(q: A => Boolean): WithFilter =
            new WithFilter(x => p(x) && q(x)) 
    }
}
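  1. Give the current Option instance an alias, self, so it can be referenced inside the inner class WithFilter.
  2. Use self to call the methods defined on the enclosing Option instance; inside WithFilter, this would refer to the WithFilter instance instead.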
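
Because Option provides these methods, it can be used directly as a generator. The sketch below assumes a hypothetical parse helper (not part of the Option API) and shows a comprehension, its hand-written flatMap/map form, and a guard that goes through withFilter.

```scala
import scala.util.Try

object OptionForDemo {
  // Hypothetical helper: Some(n) for a valid integer string, None otherwise.
  def parse(s: String): Option[Int] = Try(s.trim.toInt).toOption

  // Two Option generators: the result is None as soon as either one is None.
  def add(a: String, b: String): Option[Int] =
    for {
      x <- parse(a)
      y <- parse(b)
    } yield x + y

  // Hand-written equivalent using flatMap and map.
  def addDesugared(a: String, b: String): Option[Int] =
    parse(a).flatMap(x => parse(b).map(y => x + y))

  // A guard on an Option is translated to Option.withFilter.
  def positive(s: String): Option[Int] =
    for { x <- parse(s) if x > 0 } yield x
}
```

With these definitions, add("1", "2") gives Some(3), while add("1", "x") gives None without any explicit emptiness check.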

In the following example we have a sequence of three elements, each of type Option[Int]:

val results: Seq[Option[Int]] = Vector(Some(10), None, Some(20))
val results2 = for { 
    Some(i) <- results
} yield(2*i)
// Returns: Seq[Int] = Vector(20, 40)

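Some(i) <- results pattern matches on each element of results. This removes the None values and extracts the integer inside each Some, which is then used in the yield expression.

Let's apply the translation rules from before to this code. First, pat <- expr is converted to use withFilter: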

// Translation step #1
val results2b = for {
    Some(i) <- results withFilter {
        case Some(i) => true
        case None => false 
    }
} yield(2*i)
// Returns: results2b: Seq[Int] = Vector(20, 40)

Then the outer for { x <- y } yield (z) is converted to a map call:

// Translation step #2
val results2c = results withFilter { 
    case Some(i) => true
    case None => false
} map {
    case Some(i) => (2 * i)
}
// Returns: results2c: Seq[Int] = Vector(20, 40)

The map expression produces a compiler warning:

<console>:9: warning: match may not be exhaustive. 
It would fail on the following input: None
    } map { 
          ^

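A map without a matching case None => ... clause is normally dangerous: if a None value shows up, a MatchError is thrown. Here, however, withFilter has already removed every None, so the error cannot occur.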
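
To see the failure mode that the warning describes, the sketch below runs the partial match without the preceding withFilter and captures the exception in a Try. MatchErrorDemo and its members are illustrative names. The collect and flatten variants are alternative ways to skip None values, shown for comparison; they are not what the compiler generates.

```scala
import scala.util.Try

object MatchErrorDemo {
  val results: Seq[Option[Int]] = Vector(Some(10), None, Some(20))

  // Without withFilter, the partial match meets None and throws MatchError.
  // @unchecked only silences the exhaustiveness warning; the match stays unsafe.
  val unfiltered: Try[Seq[Int]] = Try {
    results.map { o => (o: @unchecked) match { case Some(i) => 2 * i } }
  }

  // Alternatives that never hand a None to the partial match:
  val viaCollect: Seq[Int] = results.collect { case Some(i) => 2 * i }
  val viaFlatten: Seq[Int] = results.flatten.map(2 * _)
}
```

Here unfiltered is a Failure wrapping a MatchError, while both alternatives produce Vector(20, 40).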