Scala Tutorials Part #2 - Type Inference & types in Scala
Originally Posted On : 18 Dec 2015 Last Updated : 04 Jan 2017
This is part 2 of the Scala tutorial series. Check here for the full series.
If you recall from the first part, we briefly dealt with type inference with a few examples. Now we are going to deal with it in a little more depth. Below is what this article will cover.
Note: This is a pretty dry topic, mostly theory based, but also important. I have tried to cover the essentials in the most minimal way possible. Plenty of material is linked for leisure study.
- What exactly is type inference from a programmer’s perspective
- An Overview of a type system
- Language classification according to their type system
- Hindley-Milner(HM) type inference
- Local vs Global type inference - Why Scala chose local type inference
- A brief overview of Scala’s type system and sub-typing
- When to use type inference?
This is not something unique to Scala. Many languages such as OCaml, Haskell, Rust, Swift and C# (starting with version 3.0) already have it.
Let’s start with Wikipedia’s definition.
Type inference refers to the automatic deduction of the data type of an expression in a programming language
Well, that was obvious as we saw from the previous post that it automatically deduces types.
The primary purpose is to help the programmer avoid verbose type annotations while still maintaining the compile-time type safety of a statically typed language.
Put succinctly, type inference is the best of both worlds, i.e. static and dynamic typing.
Or at least it tries to be. The reality largely depends on where and how we use it.
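A minimal sketch of what this looks like in practice. The object and value names are made up for illustration; the inferred types are noted in the comments.

```scala
object InferenceBasics {
  // No annotations: the compiler infers each type from the right-hand side.
  val count = 42                              // inferred as Int
  val ratio = 1.5                             // inferred as Double
  val name = "scala"                          // inferred as String
  val squares = List(1, 2, 3).map(n => n * n) // inferred as List[Int]

  // The same kind of declaration with the type written out.
  // Both forms are checked at compile time in exactly the same way.
  val explicitCount: Int = 42

  def main(args: Array[String]): Unit = {
    println(squares)
    // val broken: Int = name // would not compile: a String is not an Int
  }
}
```

The code reads almost like a dynamically typed language, yet the commented-out line is still rejected by the compiler.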
A type system is a language component that is responsible for type checking.
Scala is a statically typed language, so there is always a defined set of types, and anything that does not belong to that set is not regarded as a valid type and is reported as an error at compile time.
Why have any type system at all?
Because computers cannot match human stupidity, and certain things are better handled by the compiler rather than relying on people to get them right. It can also prevent a ton of bugs that pop up due to improper types. A type system exists to give type safety, and the level of strictness is what differentiates languages and runtimes.
This brings us to another question: what kinds of type systems exist, and how can we classify languages based on them?
Quoting from this Wikipedia page,
- Dynamic type checking
- Static type checking
- Inferred vs Manifest
- Nominal vs Structural
- Dependent typing
- Gradual typing
- Latent typing
- Sub-structural typing
- Uniqueness typing
- Strong and weak typing
That was a dizzying number of systems. I also highly encourage you to take this course to understand more. Videos are available for download/offline viewing.
This might come up on Coursera as well, so make sure you add it to your watchlist.
There is also another course which is really good.
Scala, as mentioned above, can be classified as a statically typed language with type inference. There is a strong relation between functional programming and type inference which we will keep revisiting from time to time.
We can talk about type inference for days, but perhaps the most famous algorithm of them all is the HM type inference algorithm.
The HM algorithm analyses the program and deduces the type of each expression in it. If you have taken the courses above, then you would have a pretty solid idea of what that means.
Below is an example of how a typical type system with type inference would work. It builds a parse tree of all the elements, analyses what type each element could have and arrives at a final conclusion.
The above example is pseudo code, so the syntax is not important. It returns true if the sum is less than 10 and false otherwise. We can build up from this example to more complicated workflows.
Many algorithms work in almost the same manner. If there is a type error, such as multiplying two strings, it is reported at compile time.
Some entry-level Haskell programming will really help to understand this better. Learn You a Haskell is a good website to start with.
The Hindley-Milner algorithm is a form of global type inference. It reads the whole of the code and deduces the types. Scala’s type system works a little differently, as explained below.
Scala follows a combination of sub-typing and local type inference. I tend to compare it with Haskell, since it is one of the most famous functional programming (FP) languages out there.
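The same reasoning can be sketched in Scala for a sum check like the one described. The names `SumCheck` and `isSmallSum` are made up for illustration, and the comments show roughly how the result type falls out of the expression tree; this is a simplified picture, not the compiler's actual algorithm.

```scala
object SumCheck {
  // Parameter types must be annotated in Scala, but the result type is left out.
  // Working bottom-up over the expression tree:
  //   a + b        : Int      (Int + Int)
  //   10           : Int      (integer literal)
  //   (a + b) < 10 : Boolean  (Int < Int)
  // so the method as a whole is inferred to return Boolean.
  def isSmallSum(a: Int, b: Int) = a + b < 10
}
```

Replacing `10` with a string such as `"ten"` breaks the chain of deductions, and the compiler rejects the method instead of guessing.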
Let’s understand with an example.
If we consider the code below, it gives a compile-time error in IntelliJ.
Syntax details can be ignored for now (we will deal with them in a lot more detail while learning methods). The program computes the factorial of the number passed in. If we look at the error, the compiler is not able to infer the result type of the recursive function. Similar code can be used in Haskell without any errors.
The above code, when loaded into the Haskell GHCi shell (kind of like the Scala REPL), compiles with no errors.
This is a real world example of global vs local type inference. In Scala, we have to annotate the types wherever local type inference does not help (also see below on when to use type inference).
The correct version of the above code would be as follows. Notice the type Int is explicitly mentioned which is not present in the code above.
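A minimal sketch of such a factorial method, with illustrative names. The version without a result type is kept as a comment, since it does not compile.

```scala
object Factorial {
  // Rejected by the compiler: a recursive method needs an explicit result type,
  // because local type inference cannot use the method's own body to type the
  // recursive call.
  // def factorial(n: Int) = if (n <= 1) 1 else n * factorial(n - 1)

  // Accepted: the result type Int is written out.
  def factorial(n: Int): Int =
    if (n <= 1) 1 else n * factorial(n - 1)
}
```

Only the result type needs the annotation here; the types of the intermediate expressions inside the body are still inferred.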
For a language that is multi-paradigm, it is really hard to do global/Hindley-Milner style type inference since it does not mix well with OOP features such as inheritance and method overloading. We are not going to go into detail on why languages such as Haskell cannot do such things (there are lots of resources on the net if you are curious about systems programming/compiler hacking), but the point is that Scala has made a different trade-off.
Systems do exist which combine these together, but when there is a type error the compiler has to give meaningful error messages so that the programmer can fix it. In reality, this is very hard to do. Continuous research is being done in this area to improve them.
As mentioned above, a type system is made up of a pre-defined set of types, and this forms the foundation of how Scala infers them.
The picture says it all. You can try to dig into the source code by the usual IntelliJ route of Ctrl+click and it all points to the Any class. Please note that types are not regular classes, although they seem to be. We will deal with this in detail in a future article.
Sub-typing is not supported by the classic Hindley-Milner algorithm, but it is essential in a multi-paradigm world. This is another reason why Scala does not use the HM algorithm.
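The hierarchy can also be poked at from code. A small sketch, with made-up names, that relies only on the standard `Any`, `AnyVal`, `AnyRef` and `Nothing` types:

```scala
object Hierarchy {
  val number: AnyVal = 42    // Int is a value type, so it conforms to AnyVal
  val text: AnyRef = "hello" // String is a reference type, so it conforms to AnyRef

  // Both AnyVal and AnyRef conform to Any, the top of the hierarchy.
  val anything: List[Any] = List(number, text)

  // Nothing sits at the bottom, so a List[Nothing] fits wherever a List[Int] is expected.
  val empty: List[Int] = List.empty[Nothing]
}
```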
Let’s look at the example below to understand sub-typing.
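We are constructing a heterogeneous list. The compiler looks for the most specific type that fits every element, moving a lower type up to a higher type where possible. A simple case is an Int being turned into a Double, which is the first example (strictly speaking this is numeric widening rather than sub-typing, since Int is not a subclass of Double). If no closer common type fits, it goes to the top level, i.e. the Any type. All of this conversion can be traced in the type system hierarchy above.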
This makes object-oriented programming much easier to handle.
For more information you can visit the Scala docs.
There is a fine line between dynamic typing (no declared types) and static typing with type inference. As they say, “all code should look like well written prose”, so it is important to know when to use type inference and when not to.
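A minimal sketch of such heterogeneous lists, with illustrative names. The inferred types in the comments are what Scala 2 reports; other versions may show a more precise type for the second list.

```scala
object Mixed {
  // Int and Double: the Int literal is widened, so this is a List[Double].
  val numbers = List(1, 2.5)

  // Int and String have no closer common type, so Scala 2 infers List[Any].
  val mixed = List(1, "two")
}
```

Once an element is typed as `Any`, the compiler no longer knows it is an `Int` or a `String`, so getting the specific type back requires pattern matching.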
When to use them?
When it saves programmer time and the type information does not really matter to the reader. Typical situations are inside a function or a loop, where the types are obvious.
When not to use them?
Simple: when type information is important, i.e. it should not leave the programmer who reads the code guessing about types.
It is hard to give a definitive code example since it really depends on the application under consideration. The only fool-proof way to deal with this is to conduct code reviews with peer programmers and see if they can understand the code.
After all, writing code is O(K) and reading code is O(N), where K is a constant with not much variation, since only a single person writes it, and N is the size of the team trying to read the code. The reading cost multiplies with team size.
With guessing come mistakes, with mistakes comes bad code, with bad code comes frustration, and with frustration comes the axe murderer.
It is a matter of code readability rather than anything else. With freedom comes responsibility.
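One common convention, offered here as an illustration rather than a rule: annotate what other people call, and let the compiler infer the obvious locals. The method below is hypothetical.

```scala
object Readability {
  // Public method: the result type is spelled out so callers do not have to guess.
  def averageLength(words: List[String]): Double = {
    // Local values: the types are obvious from the right-hand side.
    val total = words.map(_.length).sum
    val count = words.size
    if (count == 0) 0.0 else total.toDouble / count
  }
}
```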
Congratulations!! If you have understood everything and reached this far, then you should be proud of yourself. Rather than saying this is a pretty difficult topic, I would say it is a very non-intuitive one to get your head around.
Stay tuned!! This is just the beginning.