Scala Tutorials Part #2 - Type Inference & types in Scala
Originally Posted On : 18 Dec 2015 Last Updated : 18 Sep 2017
This is part 2 of the Scala tutorial series. Check here for the full series.
If you recall from the first part, we briefly dealt with type inference through a few examples. Now we are going to look at it in a little more depth. Below is what this article will cover.
- What exactly is type inference from a programmer’s perspective
- Overview of a type system
- Language classification according to their type system
- Hindley-Milner(HM) type inference
- Local vs Global type inference - Why Scala chose local type inference
- A brief overview of Scala’s type system and sub-typing
- Value types and Reference types
- When to use type inference?
Type inference is not unique to Scala; many languages such as OCaml, Haskell, Rust, Swift and C# (starting with version 3.0) already have it.
Let’s start with Wikipedia’s definition.
Type inference refers to the automatic deduction of the data type of an expression in a programming language
Well, that was obvious, since we saw in the previous post that the compiler automatically deduces types.
The primary purpose is to help the programmer avoid verbose type annotations while still maintaining the compile-time type safety of a statically typed language.
Type inference is the best of both worlds, i.e. static and dynamic typing.
Or at least it tries to be. The reality is largely dependent upon where and how we use it.
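As a small illustration of that trade-off, here is a minimal sketch (the object and value names are made up for this example). Nothing on the left-hand side is annotated, yet every value has a fixed static type that the compiler checks.

```scala
object InferenceBasics {
  // No annotations: the compiler infers each type from the right-hand side.
  val count = 42                          // inferred as Int
  val ratio = 1.5                         // inferred as Double
  val name = "Scala"                      // inferred as String
  val doubled = List(1, 2, 3).map(_ * 2)  // inferred as List[Int]

  // Ascribing the expected types confirms the inference at compile time:
  // a mismatch here would be a compile error, not a runtime failure.
  val checkedCount: Int = count
  val checkedDoubled: List[Int] = doubled

  // val broken: Int = name   // would not compile: String is not an Int
}
```

The commented-out last line is the "static" half of the bargain: the annotation was optional, the check was not.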
A type system is the language component that is responsible for type checking.
Scala is a statically typed language, so there is always a defined set of types. Anything that does not belong to that set is not regarded as a valid type, and an appropriate error is reported at compile time.
Why have any type system at all?
Because computers cannot match human stupidity, and certain things are better handled by the compiler rather than relying on people to get them right. It can also prevent a ton of bugs that pop up due to improper types. A type system exists to give type safety, and the level of strictness is what differentiates languages and runtimes from one another.
This brings us to another question, the kind of type systems that exist and how we can classify different languages.
Quoting from this Wikipedia page,
- Dynamic type checking
- Static type checking
- Inferred vs Manifest
- Nominal vs Structural
- Dependent typing
- Gradual typing
- Latent typing
- Sub-structural typing
- Uniqueness typing
- Strong and weak typing
That was a dizzying number of systems. I also highly encourage you to take this course to understand more. Videos are available for download/offline viewing.
Scala can be classified as a statically typed language with type inference. There is a strong relation between functional programming and type inference which we will keep re-visiting from time to time.
We can talk about type inference for days, but perhaps the most famous algorithm of them all is the HM type inference algorithm.
The HM algorithm analyses the program to deduce the type of each expression in it. If you have taken the courses above, then you would have a pretty solid idea of what that means.
Below is an example of how a typical type system with type inference would work. It builds a parse tree consisting of all the elements, analyses what type each element could be, and prepares the final typed tree.
The above example is pseudo code and the syntax is not of much importance. It returns true if the sum is less than 10 and false otherwise. We can build up from this example to more complicated workflows.
Many algorithms work in almost the same manner. If there are any type errors, such as multiplying two strings, a type error is reported.
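To make the tree-walking idea concrete, here is a toy sketch in Scala. It is an illustration only and not Hindley-Milner itself (there are no type variables and no unification); the names Expr, typeOf and so on are invented for this example. It walks an expression tree bottom-up, assigns a type to every node, and reports an error for something like multiplying two strings.

```scala
object ToyTypeChecker {
  // The types our toy language knows about.
  sealed trait Type
  case object IntType extends Type
  case object BoolType extends Type
  case object StringType extends Type

  // The parse tree: literals at the leaves, operators at the nodes.
  sealed trait Expr
  final case class IntLit(value: Int) extends Expr
  final case class StrLit(value: String) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr
  final case class Mul(left: Expr, right: Expr) extends Expr
  final case class LessThan(left: Expr, right: Expr) extends Expr

  // Deduce the type of an expression, or describe the type error.
  def typeOf(expr: Expr): Either[String, Type] = expr match {
    case IntLit(_)      => Right(IntType)
    case StrLit(_)      => Right(StringType)
    case Add(l, r)      => intOperands("+", l, r, IntType)
    case Mul(l, r)      => intOperands("*", l, r, IntType)
    case LessThan(l, r) => intOperands("<", l, r, BoolType)
  }

  // Both operands must be Int; the operator decides the result type.
  private def intOperands(op: String, l: Expr, r: Expr, result: Type): Either[String, Type] =
    (typeOf(l), typeOf(r)) match {
      case (Right(IntType), Right(IntType)) => Right(result)
      case (Left(err), _)                   => Left(err)
      case (_, Left(err))                   => Left(err)
      case (Right(lt), Right(rt)) =>
        Left(s"type error: '$op' expects Int operands, got $lt and $rt")
    }

  // (3 + 4) < 10 is well typed and has type Bool.
  val sumCheck = typeOf(LessThan(Add(IntLit(3), IntLit(4)), IntLit(10)))

  // "a" * "b" is rejected before anything is ever evaluated.
  val badProduct = typeOf(Mul(StrLit("a"), StrLit("b")))
}
```

Note that nothing is evaluated here: the checker only reasons about types, which is why the error can be reported before the program runs.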
Some entry-level Haskell programming will really help to understand this better. Learn You a Haskell is a good website to start with.
Hindley-Milner style inference is often described as global type inference: it considers the source code as a whole when deducing the types. Scala’s type system works a little differently.
Scala follows a combination of sub-typing and local type inference. I tend to compare it with Haskell, since it is one of the most famous functional programming (F.P) languages out there.
Let’s understand with an example.
If we consider the below code, it gives a compile-time error in IntelliJ.
Syntax details can be ignored for now (we will deal with them in a lot more detail while learning methods). The program computes the factorial of the number passed in. If we look at the error, the compiler is not able to infer the result type of the recursive function. Similar code can be used in Haskell without any errors.
The above code, when loaded into the Haskell GHCi shell (kind of like the Scala REPL), compiles with no errors.
This is a real world example of Global vs Local type inference. In Scala, we have to annotate the types wherever local type inference does not help (also see below on when to use type inference).
The correct version of the above code would be as follows. Notice that the type Int is explicitly mentioned.
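A minimal sketch of such an annotated version could look like this (the method name factorial and its shape are assumptions for illustration, not necessarily the exact snippet referred to above):

```scala
object Factorial {
  // Without the ": Int" result type this does not compile, because the
  // compiler will not infer the result type of a recursive method:
  //   def factorial(n: Int) = if (n <= 1) 1 else n * factorial(n - 1)

  // With the result type spelled out, local type inference has all it needs.
  def factorial(n: Int): Int =
    if (n <= 1) 1 else n * factorial(n - 1)
}
```

Only the result type needs help here; the types of the sub-expressions inside the body are still inferred.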
For a language that is multi-paradigm, it is really hard to do global/Hindley-Milner style type inference, since it gets in the way of OOP features such as inheritance and method overloading. We are not going to go into detail on why languages such as Haskell cannot do such things (there are lots of resources on the net if you are curious about systems programming/compiler hacking), but the point is that Scala has made a different trade-off.
Experimental systems that combine the two do exist, and research is continuously being done in this area to improve them.
A type system is made of predefined components/types and this forms the foundation of how they are inferred.
The picture says it all. You can try to dig into the source code by the usual IntelliJ route of ctrl+click and it all points to the
Please note that types are not regular classes, although they seem to be. We will deal with it in a future article in detail.
Sub-typing is something that is not supported by the Hindley-Milner algorithm, but is essential in a multi-paradigm world. This is also another reason why Scala does not use the HM algorithm.
Let’s look at the below example to understand sub-typing.
We are constructing a heterogeneous list, where sub-typing converts the lower type into a higher type wherever necessary.
A simple example would be converting an Int to a Double, which is the first example. If the values cannot fit into a closer common type, it goes to the top level i.e the
All of this conversion can be translated to the type system hierarchy above.
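The same idea as a small sketch, with explicit type ascriptions so that the compiler confirms each claim (Animal, Dog and Cat are hypothetical classes made up for this example):

```scala
object CommonSupertypes {
  // The Int literal 1 is widened, so the whole list is a List[Double].
  val numbers: List[Double] = List(1, 2.5)

  // Int, String and Double share no closer common supertype than Any.
  val mixed: List[Any] = List(1, "two", 3.0)

  // The same rule applies to user-defined hierarchies.
  sealed trait Animal
  final case class Dog(name: String) extends Animal
  final case class Cat(name: String) extends Animal

  // The element type is inferred as a common supertype of Dog and Cat,
  // so the list can be used wherever a List[Animal] is expected.
  val pets = List(Dog("Rex"), Cat("Tom"))
  val animals: List[Animal] = pets
}
```

If any of the ascriptions were wrong, for example declaring mixed as a List[Int], the code would fail to compile.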
This makes object-oriented programming much easier to handle. For more information you can visit the Scala docs for type systems.
The left sub-tree in the above tree contains all the value types, i.e. everything that comes under AnyVal, while the types that come under AnyRef are all reference types. They are similar to their Java counterparts and compile to the same thing as far as the JVM is concerned (more on that in later tutorials).
Value types are similar to native types in Java. They are created as follows.
While reference types need to have the
Of course there are some exceptions to this.
String is a special one. Collection classes such as List can be created without the new keyword (apply is explained in part 15). Technically, reference types are objects in the JVM as opposed to native types, which is why they normally require the new keyword for their creation.
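Here is a short sketch of the difference (Person is a hypothetical class used only for illustration). Note that Scala 3 also allows new to be omitted for ordinary classes, so the explicit new below is the spelling that works in both Scala 2 and Scala 3.

```scala
object ValueAndReferenceTypes {
  // Value types (everything under AnyVal) are written as plain literals.
  val age: Int = 30
  val price: Double = 9.99
  val active: Boolean = true

  // A regular class (under AnyRef) is instantiated with new.
  class Person(val name: String)
  val person = new Person("Ada")

  // String is a reference type, but its literals need no new.
  val greeting: String = "hello"

  // List(1, 2, 3) is shorthand for calling apply on the List companion object.
  val viaShorthand = List(1, 2, 3)
  val viaApply = List.apply(1, 2, 3)

  // The hierarchy can be confirmed at compile time with type ascriptions.
  val asAnyVal: AnyVal = age
  val asAnyRef: AnyRef = person
}
```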
There is a fine line that divides dynamic typing (no type annotations) and static typing with type inference. As they say, “all code should look like well written prose”, so it is important to know when to rely on inference and when not to.
When to use them?
When it saves programmer time and the type information does not really matter to the reader. Typical situations are inside a function or a loop, where the types are obvious from the context.
When not to use them?
Simple: when type information is important, i.e. it should not leave the programmer who reads the code guessing about types.
It is hard to give a code example since it really depends on the application under consideration. The only fool-proof way to deal with this is to conduct code reviews with peer programmers and see if they can understand the code.
After all, writing code is O(K) and reading code is O(N), where K is a constant with not much variation, since only a single person writes it, and N is the size of the team trying to read the code. The cost multiplies with team size.
With guessing come mistakes, with mistakes comes bad code, with bad code comes frustration, and with frustration comes the axe murderer.
It is a matter of code readability rather than anything else. With freedom comes responsibility.
Congratulations !! If you have understood/reached this far, then you should be proud of yourself. Rather than saying this is a pretty difficult topic, I would say it is a very non-intuitive one to get your head around.
Stay tuned !! This is just the beginning.