Scala Tutorials Part #2 - Type Inference & types in Scala


Type inference

This is part 2 of the Scala tutorial series. Check here for the full series.

If you recall from the first part, we briefly dealt with type inference through a few examples. Now we are going to cover it in a little more depth. Below is what this article will cover.

This article has been translated to Chinese by ChanZong Huang; you can check it out here.

Index

What exactly is type inference from a programmer's perspective

Type inference is not unique to Scala. Many languages such as OCaml, Haskell, Rust, Swift and C# (starting with version 3.0) already have it.

Let’s start with Wikipedia’s definition.

Type inference refers to the automatic deduction of the data type of an expression in a programming language

Well, that was obvious, as we saw in the previous post that the compiler automatically deduces types.

The primary purpose is to help the programmer avoid verbose typing while still maintaining the compile-time type safety of a statically typed language.

Type inference offers the best of both worlds, i.e. static and dynamic typing.

Or at least it tries to. The reality largely depends on where and how we use it.
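To make this concrete, here is a minimal sketch of what inference looks like in everyday Scala. The object and value names are made up for illustration; the point is only that the declarations carry no annotations and are still statically typed.

```scala
object InferenceBasics {
  // No annotations: the compiler deduces each type from the right-hand side.
  val count = 42               // inferred as Int
  val ratio = 3.14             // inferred as Double
  val greeting = "hello"       // inferred as String
  val numbers = List(1, 2, 3)  // inferred as List[Int]

  // The inferred types are real static types. These two lines compile
  // only because the inferred types match the annotations.
  val checkedCount: Int = count
  val checkedNumbers: List[Int] = numbers

  // Uncommenting the next line makes the compiler reject the program
  // with a type mismatch, even though `count` was never annotated:
  // val wrong: String = count
}
```

The code reads almost like a dynamically typed language, yet a wrong assignment is still caught before the program runs.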

Overview of a type system

A type system is the language component responsible for type checking.
Scala is a statically typed language, so there is always a defined set of types. Anything that does not belong to that set is not regarded as a valid type, and an appropriate error is reported at compile time.

Why have any type system at all?

Because computers cannot match human stupidity, and certain things are better handled by the compiler than by relying on people to get them right. It can also prevent a ton of bugs that pop up due to improper types. A type system exists to give type safety, and the level of strictness is what differentiates languages and runtimes from one another.

This brings us to another question, the kind of type systems that exist and how we can classify different languages.

Language classification according to their type system

Quoting from this Wikipedia page,

That was a dizzying number of systems. I also highly encourage you to take this course to understand more. Videos are available for download/offline viewing.

This might come up on Coursera as well, so make sure you add it to your watchlist. There is also another course which is really good.

Scala can be classified as a statically typed language with type inference. There is a strong relation between functional programming and type inference which we will keep re-visiting from time to time.

Hindley-Milner (HM) type inference

We can talk about type inference for days, but perhaps the most famous algorithm of them all is the HM type inference algorithm.

The HM algorithm analyses a program to deduce the types of the expressions in it. If you have taken the courses above, then you will have a pretty solid idea of what that means.

Below is an example of how a typical type system with type inference would work. It builds a parse tree consisting of all the elements, analyses each element to work out what type it could be, and prepares the final typed tree.

Scala type system


The above example is pseudo code and the syntax is not of much importance. It returns true if the sum is less than 10 and false otherwise. We can build up from this example to other, more complicated workflows.

Many algorithms work in almost the same manner. If there are any type errors, such as multiplying two strings, the type checker reports an error.
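The tree-walking idea can be sketched in a few lines of Scala. This is a toy checker for a made-up expression language with only Int and Boolean, not the Hindley-Milner algorithm itself (there are no type variables and no unification), and every name in it is hypothetical.

```scala
object ToyInference {
  // The only two types the toy language knows about.
  sealed trait Type
  case object IntType extends Type
  case object BoolType extends Type

  // Expression tree, as a parser might produce it.
  sealed trait Expr
  final case class IntLit(value: Int) extends Expr
  final case class BoolLit(value: Boolean) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr
  final case class LessThan(left: Expr, right: Expr) extends Expr

  // Walk the tree bottom-up. Right(type) on success, Left(message) on a type error.
  def infer(expr: Expr): Either[String, Type] = expr match {
    case IntLit(_)      => Right(IntType)
    case BoolLit(_)     => Right(BoolType)
    case Add(l, r)      => bothInts(l, r, "+", IntType)
    case LessThan(l, r) => bothInts(l, r, "<", BoolType)
  }

  // Both operands must be Int; the operator decides the result type.
  private def bothInts(l: Expr, r: Expr, op: String, result: Type): Either[String, Type] =
    (infer(l), infer(r)) match {
      case (Right(IntType), Right(IntType)) => Right(result)
      case (Left(err), _)                   => Left(err)
      case (_, Left(err))                   => Left(err)
      case _                                => Left(s"type error: $op expects two Int operands")
    }
}
```

For instance, `ToyInference.infer(LessThan(Add(IntLit(3), IntLit(4)), IntLit(10)))` yields `Right(BoolType)`, while `Add(IntLit(1), BoolLit(true))` yields a `Left` carrying the error message. The types of the leaves are known, and everything above them is deduced from the operators.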

Some entry-level Haskell programming will really help in understanding this better. Learn You a Haskell is a good website to start with.

The Hindley-Milner algorithm is also referred to as global type inference. It reads the source code as a whole and deduces the types. Scala’s type system works a little differently.

Local vs Global type inference and sub-typing - Why Scala chose local type inference

Scala follows a combination of sub-typing and local type inference. I tend to compare it with Haskell, since it is one of the most famous functional programming (F.P) languages out there.

Let’s understand with an example.

The code below gives a compile-time error in IntelliJ.

def factorial(a: Int) = {
    if (a <= 1) 1 else a * factorial(a - 1)
}

Error message

Scala type error

Syntax details can be ignored for now (we will deal with them in a lot more detail while learning methods). The program computes the factorial of the number passed in. If we look at the error, the compiler is not able to infer/deduce the result type of the recursive function. Similar code can be used in Haskell without any errors.

let factorial 0 = 1; factorial n = n * factorial (n - 1)

When the above code is entered in the Haskell GHCi shell (kind of like the Scala REPL), it compiles with no errors.

Haskell Global type inference

This is a real world example of Global vs Local type inference. In Scala, we have to annotate the types wherever local type inference does not help (also see below on when to use type inference).

The correct version of the above code would be as follows. Notice that the result type Int is explicitly annotated.

def factorial(a: Int): Int = {
    if (a <= 1) 1 else a * factorial(a - 1)
}
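As a rough sketch of where local type inference helps and where it needs a hint, consider the following. The object name and the helper `square` are made up for illustration.

```scala
object LocalInference {
  // Non-recursive method: the result type Int is inferred from the body.
  def square(a: Int) = a * a

  // Recursive method: the result type has to be written out, because the
  // body refers to the very method whose type is being inferred.
  def factorial(a: Int): Int =
    if (a <= 1) 1 else a * factorial(a - 1)

  // The lambda parameter x is inferred as Int, because map is called on a List[Int].
  val doubled = List(1, 2, 3).map(x => x * 2)

  // A standalone lambda has no surrounding context to infer from,
  // so its parameter needs an annotation.
  val triple = (x: Int) => x * 3
}
```

In each case the compiler only looks at nearby information (the method body, or the expected type at the call site) rather than at the program as a whole.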
  

For a multi-paradigm language, it is really hard to do global/Hindley-Milner style type inference, since it restricts OOP features such as inheritance and method overloading. We are not going to go into detail about why languages such as Haskell cannot do such things (there are lots of resources on the net if you are curious about systems programming/compiler hacking), but the point is that Scala has made a different trade-off.

Experimental systems that combine the two do exist, and research to improve them is ongoing.

A brief overview of Scala's type system and subtyping

A type system is made of predefined components/types and this forms the foundation of how they are inferred.

Scala type system

The picture says it all. You can dig into the source code by the usual IntelliJ route of ctrl+click, and it all points to the Any class. Please note that types are not regular classes, although they seem to be. We will deal with this in detail in a future article.

Sub-typing is something that is not supported by the Hindley-Milner algorithm, but is essential in a multi-paradigm world. This is also another reason why Scala does not use the HM algorithm.

Let’s look at the below example to understand sub-typing.

Scala sub-typing

We are constructing a heterogeneous list, where sub-typing converts the lower type into a higher type wherever necessary. A simple example would be converting an Int to a Double, which is the second example (strictly speaking this is numeric widening, since Int is not a subclass of Double). If the elements cannot be fit into a common type, the list goes to the top level, i.e. the Any type. All of this conversion can be traced through the type system hierarchy above.
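A small sketch of the same idea in code, with made-up names. The comments describe what a Scala 2 compiler reports; treat the exact inferred type of the first list as an assumption if you are on a different compiler version.

```scala
object SubtypingDemo {
  // Int, String and Double have no closer common parent in the hierarchy,
  // so Scala 2 types the elements as Any, giving List[Any].
  val mixed = List(1, "two", 3.0)

  // An Int literal next to a Double is widened, giving List[Double].
  val numbers = List(1, 2.5)

  // This line compiles only if `numbers` really was inferred as List[Double].
  val numbersCheck: List[Double] = numbers
}
```

The first element of `numbers` is stored as the Double 1.0, while the elements of `mixed` keep their own runtime types and are only viewed through the common parent type.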

This makes object-oriented programming much easier to handle. For more information, you can visit the Scala docs for type systems.

Value types and Reference types

The left sub-tree in the above tree contains all the value types, i.e. everything that comes under AnyVal, and the right contains the types that come under AnyRef, which are all reference types. They are similar to their Java counterparts and compile to the same thing as far as the JVM is concerned (more on that in later tutorials).

Value types are similar to primitive types in Java. They are created as follows.

val x: Int = 3

Reference types, on the other hand, are created with the new keyword.

import scala.collection.mutable.ArrayBuffer

val arr = new ArrayBuffer[Int]()

Of course, there are some exceptions to this. String is a special one, since a string literal creates the instance directly. Collection classes such as Array and List have their own apply method, so they can be created without the new keyword (apply is explained in part 15). Technically, they are still objects on the JVM as opposed to primitive values; the apply method simply takes care of the construction for us.
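The different ways of creating values described above can be put side by side. This is only an illustrative sketch and the object and value names are made up.

```scala
import scala.collection.mutable.ArrayBuffer

object CreationDemo {
  val x: Int = 3                       // value type, no `new`
  val buffer = new ArrayBuffer[Int]()  // reference type created with `new`
  buffer += x                          // append the Int to the buffer

  val viaSugar = List(1, 2, 3)         // shorthand for the line below
  val viaApply = List.apply(1, 2, 3)   // explicit call to the apply method

  val name = "Scala"                   // String literal, no `new` needed
}
```

`viaSugar` and `viaApply` are equal lists; the first form is simply what the compiler rewrites into the second.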

When to use type inference?

There is a fine line that divides dynamic typing (no types) and static typing with type inference. As they say, “all code should look like well written prose”, so it is important to know when to use type inference and when not to.

When to use them?

When it saves programmer time and also where type information does not really matter. Situations could be inside of a function or a loop where the information about types is obvious.

When not to use them?

When type information is important, i.e. it should not leave the programmer who reads the code guessing about types.

It is hard to give a code example, since it really depends on the application under consideration. The only fool-proof way to deal with this is to conduct code reviews with peer programmers and see if they can understand the code.
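With that caveat in mind, one common convention can still be sketched: annotate what other people call, and let the compiler infer what is local and obvious. The `Order` type and `totalAmount` method are hypothetical, invented only for this illustration.

```scala
object ReadabilityDemo {
  // Hypothetical domain type, made up for this illustration.
  final case class Order(id: Int, amount: Double)

  // Public method: explicit parameter and result types document the contract,
  // so a reader does not need to open the body to learn what comes back.
  def totalAmount(orders: List[Order]): Double = {
    // Local value: the type is obvious from the right-hand side,
    // so an annotation would only add noise.
    val amounts = orders.map(_.amount)
    amounts.sum
  }
}
```

Whether this split is right for a given code base is exactly the kind of question a code review should settle.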

After all, writing code is O(K) and reading code is O(N), where K is a constant with not much variation, since only a single person writes the code, and N is the size of the team trying to read it. The reading cost multiplies with team size.

With guessing comes mistakes, with mistakes come bad code, and with bad code comes frustration, with frustration comes the axe murderer

It is a matter of code readability rather than anything else. With freedom comes responsibility

Congratulations !! If you have understood/reached this far, then you should be proud of yourself. Rather than saying this is a pretty difficult topic, I would say it is a very non-intuitive one to get your head around.

Stay tuned !!
