Simple Scala: Pattern Matching

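Introduction

At first glance, a pattern matching expression looks like the case statement of a C-like language. In a typical C-like language, however, case statements are limited to matching values of ordinal types and to triggering trivial expressions.
In Scala, patterns can include basic types, wildcards, sequences, regular expressions, and more, and can even inspect an object's state in depth.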

Simple matching

Matching a simple Boolean value:

val bools = Seq(true, false)
for (bool <- bools) { 
    bool match {
        case true => println("Got heads")
        case false => println("Got tails") 
    }
}

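This looks a lot like a C-style case statement. If you comment out the second case and run the program again, the compiler issues a warning: it knows that a Boolean has exactly two possible values, true and false, so it reports that the match is not exhaustive. If we then try to match a value that has no corresponding case clause, a MatchError is thrown at run time.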
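
The run-time failure described above can be sketched as follows. MatchErrorSketch, describe, and safeDescribe are hypothetical names introduced only for this illustration, and the @unchecked annotation is used merely to silence the exhaustiveness warning for the deliberately incomplete match.

```scala
import scala.util.Try

object MatchErrorSketch {
  // Hypothetical helper: the `false` case is deliberately missing.
  // `@unchecked` only suppresses the compiler's exhaustiveness warning.
  def describe(bool: Boolean): String = (bool: @unchecked) match {
    case true => "Got heads"
  }

  // Wraps the call so the MatchError surfaces as a Failure value
  // instead of terminating the program.
  def safeDescribe(bool: Boolean): Try[String] = Try(describe(bool))
}
```

Calling MatchErrorSketch.safeDescribe(false) should yield a Failure wrapping a scala.MatchError, while safeDescribe(true) succeeds.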

Values, variables, and types

The following example matches specific values and specific types, and ends with a catch-all clause:

for {
    x <- Seq(1, 2, 2.7, "one", "two", 'four)                        // iterate over the values in the sequence
} {
    val str = x match {                                             // x has type Any here
        case 1          => "int 1"                                  // matches the integer 1
        case i: Int     => "other int: "+i                          // matches any other Int
        case d: Double  => "a double: "+d                           // matches any Double
        case "one"      => "string one"                             // matches the string "one"
        case s: String  => "other string: "+s                       // matches any other String
        case unexpected => "unexpected value: " + unexpected        // matches everything not matched above
    }
    println(str)                                                    // str holds the result of the whole match expression
}

Here is a refined version:

for {
    x <- Seq(1, 2, 2.7, "one", "two", 'four)
}{ val str=x match{
    case 1          => "int 1"
    case _: Int     => "other int: "+x
    case _: Double  => "a double: "+x
    case "one"      => "string one"
    case _: String  => "other string: "+x
    case _          => "unexpected value: " + x
    }
    println(str)
}

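In this example some variable names are replaced with the placeholder "_", because here we do not care about the variable name and only need the type of the value.

Suppose we need to match against a given value, for example: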

def checkY(y: Int) = { 
    for {
        x <- Seq(99, 100, 101) 
    }{
        val str=x match{
            case y => "found y!" 
            case i: Int => "int: "+i
        }
        println(str)
    }
}
checkY(100)

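Here we define a method that takes an integer parameter and matches each element of a fixed sequence against it. If an element equals the argument passed in, it should print "found y!". Calling checkY, however, does not produce the expected result.
Inside the match expression, the variable "y" really means: match any value and assign it to a new variable named y. This y is not a reference to the method parameter y, so in effect we have only written a catch-all pattern that happens to be named y, which cannot achieve the intended goal.
To fix this, distinguish the pattern variable from the outer constant: writing `y` says that we want to match against the value of y rather than introduce a catch-all pattern (the "`" character is the key below ESC):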

def checkY(y: Int) = { 
    for {
        x <- Seq(99, 100, 101) 
    }{
        val str=x match{
            case `y` => "found y!" 
            case i: Int => "int: "+i
        }
        println(str)
    }
}
checkY(100)
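
The difference between a fresh pattern variable and a back-quoted reference can be checked side by side. The sketch below is only an illustration; BacktickSketch, shadowed, and stable are hypothetical names that do not appear in the original example.

```scala
object BacktickSketch {
  // Plain `y` in the pattern is a NEW variable that shadows the parameter,
  // so this clause matches every x.
  def shadowed(x: Int, y: Int): String = x match {
    case y => "found y!"
  }

  // Back-quoted `y` refers to the existing parameter and compares x against it.
  def stable(x: Int, y: Int): String = x match {
    case `y` => "found y!"
    case i   => "int: " + i
  }
}
```

With these definitions, BacktickSketch.shadowed(99, 100) reports "found y!" even though 99 differs from 100, whereas BacktickSketch.stable(99, 100) falls through to the second clause.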

Sometimes several kinds of values should be handled in the same way. A case clause supports an "or" syntax using "|":

for {
    x <- Seq(1, 2, 2.7, "one", "two", 'four)
}{ val str=x match{
    case _: Int | _: Double => "a number: "+x       // handles Int and Double in one clause
    case "one"              => "string one"
    case _: String          => "other string: "+x
    case _                  => "unexpected value: " + x
  }
  println(str)
}

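Matching sequences

Seq is the parent type of the concrete collection types that are traversed in a fixed order, such as List and Vector. The following example shows how to match different kinds of sequences: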

val nonEmptySeq = Seq(1, 2, 3, 4, 5)                            // non-empty Seq
val emptySeq = Seq.empty[Int]                                   // empty Seq
val nonEmptyList = List(1, 2, 3, 4, 5)                          // non-empty List
val emptyList = Nil                                             // empty List
val nonEmptyVector = Vector(1, 2, 3, 4, 5)                      // non-empty Vector
val emptyVector = Vector.empty[Int]                             // empty Vector
val nonEmptyMap = Map("one" -> 1, "two" -> 2, "three" -> 3)     // non-empty Map
val emptyMap = Map.empty[String,Int]                            // empty Map
def seqToString[T](seq: Seq[T]): String = seq match {
    case head +: tail => s"$head +: " + seqToString(tail)       // matches any non-empty sequence and extracts its head and tail
    case Nil => "Nil"                                           // matches the empty sequence
}
for (seq <- Seq( 
        nonEmptySeq, emptySeq, nonEmptyList, emptyList,
        nonEmptyVector, emptyVector, nonEmptyMap.toSeq, emptyMap.toSeq)) {
    println(seqToString(seq))
}

Matching tuples

Pattern matching on tuples is also straightforward:

val langs = Seq(
    ("Scala", "Martin","Odersky"), 
    ("Clojure","Rich", "Hickey"), 
    ("Lisp", "John", "McCarthy"))

for (tuple <- langs) {
    tuple match {
        case ("Scala", _, _) => println("Found Scala")              // destructures the tuple; matches when the first element is "Scala"
        case (lang, first, last) =>                                 // matches every other three-element tuple and extracts its elements
            println(s"Found other language: $lang ($first, $last)")
    }
}

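Guards in case clauses

Each case clause can carry a guard, an if condition that must also hold for the clause to be selected (several conditions can be combined with && or ||):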

for (i <- Seq(1,2,3,4)) { 
    i match {
        case _ if i%2 == 0 => println(s"even: $i")  
        case _ => println(s"odd: $i") 
    }
}
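
A guard can also refer to a variable bound by the pattern and combine several conditions. The following is a minimal sketch; GuardSketch, classify, and the threshold of 2 are hypothetical choices made only for illustration.

```scala
object GuardSketch {
  // Clauses are tried top to bottom; the first clause whose pattern
  // matches AND whose guard evaluates to true is selected.
  def classify(i: Int): String = i match {
    case n if n % 2 == 0 && n > 2 => s"large even: $n"
    case n if n % 2 == 0          => s"even: $n"
    case n                        => s"odd: $n"
  }
}
```

Because clause order matters, the more specific guard has to come first; swapping the first two clauses would make the "large even" clause unreachable in practice.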

Matching case classes

Let's look at some examples of "deep matching":

case class Address(street:String, city:String, country:String)
case class Person(name:String, age:Int, address:Address)

val alice = Person("Alice", 25, Address("1 Scala Lane", "Chicago", "USA")) 
val bob = Person("Bob", 29, Address("2 Java Ave.", "Miami", "USA")) 
val charlie = Person("Charlie", 32, Address("3 Python Ct.", "Boston", "USA"))

for (person <- Seq(alice, bob, charlie)) { 
    person match {
        case Person("Alice", 25, Address(_, "Chicago", _)) => println("Hi Alice!")  // destructures and matches the nested type
        case Person("Bob", 29, Address("2 Java Ave.", "Miami", "USA")) => println("Hi Bob!")
        case Person(name, age, _) =>
            println(s"Who are you, $age year-old person named $name?")
    }
}

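The unapply method

Besides the types in Scala's own standard library, case classes that you define yourself, including ones nested several levels deep, can be matched and destructured in the same way.
Every case class has a companion object with an apply method used for construction, so we can infer that there must also be a counterpart, called unapply, used for deconstruction or extraction. Indeed, such an extractor method is what gets used when a match expression like this one is encountered: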

person match {
    case Person("Alice", 25, Address(_, "Chicago", _)) => ...
    ...
}

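Scala looks for Person.unapply(..) and Address.unapply(..) and calls them. Each of these unapply methods returns an Option[TupleN[…]], where N is the number of values extracted from the object (3 for Person) and the type parameters are the types of those values (here String, Int, and Address). The compiler therefore generates a companion object for Person that looks roughly like this: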

object Person {
    def apply(name: String, age: Int, address: Address) =
        new Person(name, age, address)
    def unapply(p: Person): Option[Tuple3[String,Int,Address]] =
        Some((p.name, p.age, p.address)) ...
}
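
The same mechanism is available to ordinary objects: any object with a suitable unapply method can be used as a pattern. The sketch below is a hand-written extractor under that assumption; ExtractorSketch, Email, and describe are hypothetical names, and the splitting on "@" is deliberately simplistic rather than a real address parser.

```scala
object ExtractorSketch {
  // Hypothetical extractor: not a case class, just an object with unapply.
  object Email {
    // Returns Some((user, domain)) when the string has exactly two
    // parts around "@", otherwise None (meaning "pattern does not match").
    def unapply(s: String): Option[(String, String)] = s.split("@") match {
      case Array(user, domain) => Some((user, domain))
      case _                   => None
    }
  }

  def describe(s: String): String = s match {
    case Email(user, domain) => s"user=$user, domain=$domain" // calls Email.unapply(s)
    case other               => s"not an email: $other"
  }
}
```

When the match reaches case Email(user, domain), the compiler invokes Email.unapply on the scrutinee; a Some result binds the tuple elements to user and domain, and a None result moves on to the next clause.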

Matching variable argument lists

object Op extends Enumeration {         // define an enumeration
    type Op = Value
    val EQ =Value("=") 
    val NE =Value("!=") 
    val LTGT = Value("<>") 
    val LT =Value("<") 
    val LE =Value("<=") 
    val GT =Value(">") 
    val GE =Value(">=")
}
import Op._

// Represent a SQL "WHERE x op value" clause, where +op+ is a
// comparison operator: =, !=, <>, <, <=, >, or >=.
case class WhereOp[T](columnName: String, op: Op, value: T)

// Represent a SQL "WHERE x IN (a, b, c, ...)" clause.
case class WhereIn[T](columnName: String, val1: T, vals: T*)

val wheres = Seq(
    WhereIn("state", "IL", "CA", "VA"),
    WhereOp("state", EQ, "IL"),
    WhereOp("name", EQ, "Buck Trends"),
    WhereOp("age", GT, 29)
)

for (where <- wheres) { 
    where match {
        case WhereIn(col, val1, vals @ _*) =>   // vals @ _* binds the remaining variable arguments as a sequence
            val valStr = (val1 +: vals).mkString(", ")
            println (s"WHERE $col IN ($valStr)")
        case WhereOp(col, op, value) => println (s"WHERE $col $op $value")
        case _ => println (s"ERROR: Unknown expression: $where") 
    }
}

Matching regular expressions

val BookExtractorRE = """Book: title=([^,]+),\s+author=(.+)""".r
val MagazineExtractorRE = """Magazine: title=([^,]+),\s+issue=(.+)""".r

val catalog = Seq(
    "Book: title=Programming Scala Second Edition, author=Dean Wampler", 
    "Magazine: title=The New Yorker, issue=January 2014",
    "Unknown: text=Who put this here??"
)
for (item <- catalog) { 
    item match {
        case BookExtractorRE(title, author) =>  println(s"""Book "$title", written by $author""")
        case MagazineExtractorRE(title, issue) => println(s"""Magazine "$title", issue $issue""")
        case entry => println(s"Unrecognized entry: $entry")
    }
}

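Variable binding in case class patterns

Sometimes we want to match a case class and, at the same time, bind the matched case class instance itself to a variable: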

case class Address(street: String, city: String, country: String) 
case class Person(name: String, age: Int, address: Address)

val alice = Person("Alice", 25, Address("1 Scala Lane", "Chicago", "USA"))
val bob = Person("Bob", 29, Address("2 Java Ave.", "Miami", "USA")) 
val charlie = Person("Charlie", 32, Address("3 Python Ct.", "Boston", "USA"))

for (person <- Seq(alice, bob, charlie)) { 
    person match {
        case p @ Person("Alice", 25, address) => println(s"Hi Alice! $p") 
        case p @ Person("Bob", 29, a @ Address(street, city, country)) =>
            println(s"Hi ${p.name}! age ${p.age}, in ${a.city}") 
        case p @ Person(name, age, _) =>
            println(s"Who are you, $age year-old person named $name? $p")
      }
}

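The @ symbol in a case clause binds the entire object matched by the pattern after the symbol to the variable name before it, so that it can be referenced later.

Sealed hierarchies and exhaustive matches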

sealed abstract class HttpMethod() { 
    def body: String 
    def bodyLength = body.length
}

case class Connect(body: String) extends HttpMethod
case class Delete(body: String) extends HttpMethod
case class Get(body: String) extends HttpMethod
case class Head(body: String) extends HttpMethod
case class Options(body: String) extends HttpMethod 
case class Post(body:String) extends HttpMethod 
case class Put(body: String) extends HttpMethod 
case class Trace(body: String) extends HttpMethod

def handle (method: HttpMethod) = method match {
    case Connect (body) => s"connect: (length: ${method.bodyLength}) $body"
    case Delete (body) => s"delete: (length: ${method.bodyLength}) $body"
    case Get (body) => s"get: (length: ${method.bodyLength}) $body"
    case Head (body) => s"head: (length: ${method.bodyLength}) $body"
    case Options (body) => s"options: (length: ${method.bodyLength}) $body"
    case Post (body) => s"post: (length: ${method.bodyLength}) $body"
    case Put (body) => s"put: (length: ${method.bodyLength}) $body"
    case Trace (body) => s"trace: (length: ${method.bodyLength}) $body"
}
}

val methods = Seq( 
    Connect("connect body..."),
    Delete ("delete body..."), 
    Get ("get body..."), 
    Head ("head body..."), 
    Options("options body..."), 
    Post ("post body..."), 
    Put ("put body..."), 
    Trace ("trace body...")
)

methods foreach (method => println(handle(method)))
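
Because HttpMethod is declared sealed, all of its direct subclasses must be defined in the same source file, which lets the compiler check that a match covers every one of them; that is why handle needs no default clause. A trimmed-down sketch of the idea follows, where SealedSketch, Method, and the two-member hierarchy are hypothetical stand-ins for the full example.

```scala
object SealedSketch {
  // Hypothetical, reduced hierarchy. `sealed` restricts direct subclasses
  // to this source file, so the compiler knows the complete set.
  sealed trait Method
  case class Get(body: String) extends Method
  case class Post(body: String) extends Method

  // Both subclasses are covered, so no default clause is needed.
  // Deleting one of the two cases would make the compiler warn that
  // the match may not be exhaustive.
  def handle(method: Method): String = method match {
    case Get(body)  => s"get: $body"
    case Post(body) => s"post: $body"
  }
}
```

Adding a catch-all case _ clause would silence that warning, but it would also hide newly added subclasses from the check, so it is usually left out for sealed hierarchies.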

More on type matching

In the following example we try to distinguish List[Double] from List[String]:

for {
    x <- Seq(List(5.5,5.6,5.7), List("a","b"))
} yield (x match{
    case seqd:Seq[Double] => ("seq double", seqd)
    case seqs:Seq[String] => ("seq string", seqs)
    case _                => ("unknown!", x)
})

<console>:12: warning: non-variable type argument Double in type pattern 
Seq[Double] (the underlying of Seq[Double]) is unchecked since it is eliminated by erasure
    case seqd: Seq[Double] => ("seq double", seqd) 
             ^
<console>:13: warning: non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
    case seqs: Seq[String] => ("seq string", seqs)
             ^
<console>:13: warning: unreachable code
    case seqs: Seq[String] => ("seq string", seqs)
             ^
res0: List[(String, List[Any])] = 
    List((seq double,List(5.5, 5.6, 5.7)),(seq double,List(a, b)))

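What do these warnings mean? The compiler is telling us that, although it can check that the given object is a List, it cannot determine at run time whether it is a List[Double] or a List[String], because the type arguments are erased. It also warns that the second clause, case seqs: Seq[String], is unreachable, because the first clause matches every List. The final output confirms this: both results are labeled "seq double".

Unfortunately, a workable approach is to match on the collection first and then use a nested match on its first element to check the element type: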
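
The effect of erasure can be observed in isolation. In the sketch below, ErasureSketch and claimsDoubles are hypothetical names, and the @unchecked annotation on the type argument is used only to silence the warning shown above; it does not make the element type checkable.

```scala
object ErasureSketch {
  // At run time only "is this a List?" is tested. The element type
  // Double has been erased, so it plays no part in the match.
  def claimsDoubles(x: Any): Boolean = x match {
    case _: List[Double @unchecked] => true
    case _                          => false
  }
}
```

As a consequence, ErasureSketch.claimsDoubles(List("a", "b")) returns true even though the list holds strings, which mirrors the misleading "seq double" result above.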

def doSeqMatch[T](seq: Seq[T]): String = seq match { 
    case Nil => "Nothing"
    case head +: _ => head match {
        case _ : Double => "Double"
        case _ : String => "String"
        case _ => "Unmatched seq element"
    } 
}
for {
    x <- Seq(List(5.5,5.6,5.7), List("a", "b"), Nil)
} yield { x match {
    case seq: Seq[_] => (s"seq ${doSeqMatch(seq)}", seq)
    case _ => ("unknown!", x) }
}

This approach yields the correct result.