Not a data science blog

Failed communication is the main cause of frustration in interdisciplinary collaboration.

The improbability of communication

I am a computational linguist. I am not a data scientist. How is that? It’s very similar to me saying that I am a Swabian. I come from the Stuttgart area in the south of Germany, more precisely from a village close to the edge of a plateau called “Schwäbische Alb”. People there are different. They talk differently. They speak Swabian. That doesn’t mean that I do not speak Standard German (though many Swabians can’t). I moved away right after I finished school and I had to learn not to use dialectal speech outside of this area. The most important part, however, is that I can speak and understand both Swabian and Standard German. Many times during my time as a student in Bavaria, studying with people from all over Germany, my mobile phone would ring. It was my mom calling. A short exchange of a couple of sentences and I hung up. My friends stared at me. They were completely bewildered by the fact that within a heartbeat, I could switch my language system to a version of their very own language that they couldn’t understand. This ability to switch between languages would prove useful in the future.

A computational linguist is a Swabian. In the best scenario, a Swabian who has travelled and lived in other parts of the world. A computational linguist speaks decent Data Science and has been to Linguistics for a semester abroad. The more countries the computational linguist has travelled to, the easier it is for her to see how people come from different cultures, speak different languages and have different values. Since linguistics is inherently about language and since language carries semantics, a computational linguist will learn about medicine, law, literature or whatever else she works with. Backend engineers, project managers, data scientists, domain experts: they are all from different countries and speak different languages or dialects. I like to think of a computational linguist as a translator, as someone who can not only speak multiple languages but also knows about the pitfalls of different cultures.

Now, when I picked up the phone it was abundantly clear to my friends that I was no longer speaking their language. I had left my sensible Standard German self behind. This is the point where we have to leave this metaphor and zoom in on a problem that is hidden. Domain experts and backend engineers speak the same language - seemingly. They can talk about the weather, lunch or the latest episode of Breaking Bad. They can also talk about the details of an aortocoronary bypass surgery or the failed deployment on the staging environment. However, in the latter case, there might be question marks. Understanding the words and really understanding what is meant are two different things.

Often, it is obvious at which point communication fails. There are open questions, background information is exchanged, terminology explained and finally a shared understanding will emerge. How, though, do we know that the understanding we reach is indeed the same if we communicate with the terms just now explained to us? We want to believe that we explained ourselves sufficiently well. But communication is in fact improbable.

Communication and the interpretation of signs is a long, demanding process. There are no shortcuts. Especially in collaborations between technical and non-technical experts this can take a toll. How many times have I sat through hours-long meetings where people talked at cross purposes without even noticing? Terminology, as well defined as it seems, rarely works beyond the borders of a domain, and it might very well be that someone is not even aware that a word they are using has a special meaning within another domain. Let me give you an example.

During my time working in a field called Digital Humanities, I sat through a lengthy discussion with scholars from literary studies as well as computer scientists from the visualization department. The goal of the meeting was to provide an interface for the annotation of literary texts. The computer scientists insisted there had to be a way to compare and contrast the annotations of different annotators in order to assess to what extent annotations were intersubjective and generalizable. The literary scholars, however, were shocked by the thought of having their notes and thoughts compared to someone else’s with the goal of reaching a common truth. This clash makes a whole lot of sense given that in computer science and computational linguistics annotation is the process of enriching an artefact with specific and agreed-upon labels which will later be used to train a classifier of some sort or to uncover patterns. In order to do so, annotations need to be consistent and objective. In literary studies or the humanities in general, an annotation is more of a note or subjective thought that one writes down while working through an artefact. It is by nature closely related to the reading and understanding of a text and in no way aims at being an objective truth. What went wrong? The two groups brought clearly different expectations of what had to happen into the project. They were using the same term for it, unaware that the meaning of this term differs in a crucial way between their respective fields.
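To make the computer scientists’ notion of comparing annotators a bit more tangible, here is a minimal sketch of how agreement between two annotators is often quantified. It uses Cohen’s kappa, which is one common measure and purely an assumption here: nothing above says which measure, if any, the project used. The function name and the example labels are hypothetical as well.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa).

    Both arguments are equally long sequences of categorical labels,
    one label per annotated item.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")

    n = len(labels_a)

    # Observed agreement: share of items that received the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement if both annotators labelled independently,
    # each according to their own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one and the same label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four text passages.
annotator_a = ["metaphor", "irony", "metaphor", "none"]
annotator_b = ["metaphor", "irony", "none", "none"]
kappa = cohens_kappa(annotator_a, annotator_b)
```

A measure like this only makes sense under the computer science reading of annotation, where labels come from a fixed, agreed-upon set. It has nothing meaningful to say about free-form notes, which is exactly where the two groups talked past each other.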

Another issue is the linguistic concept of framing. It describes the assumption that a word can only be understood by someone who has access to essential background knowledge about it. Now, a software developer is aware that they probably don’t know as much about appendicitis as a physician does. But are they also aware that for historical reasons the units for lab values in East and West Germany differ and that an app that automatically calculates them for doctors needs to take this into account? Doctors might discuss lab results of 60 - 100 mg/dl or 3.3 - 5.5 mmol/l and are comfortable knowing that both refer to the normal range for blood sugar and not to two completely different things.
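What “taking this into account” could look like in software is sketched below: every blood glucose value is normalised to one unit before it is compared or displayed. The conversion factor follows from the molar mass of glucose (about 180.16 g/mol, so 1 mmol/l corresponds to about 18.016 mg/dl). The function and constant names are hypothetical, and a real medical app would need far more care than this.

```python
# Molar mass of glucose is about 180.16 g/mol, so
# 1 mmol/l corresponds to about 18.016 mg/dl.
GLUCOSE_MG_DL_PER_MMOL_L = 18.016


def glucose_to_mmol_per_l(value, unit):
    """Normalise a blood glucose reading to mmol/l.

    `unit` must be "mg/dl" or "mmol/l"; anything else is rejected
    rather than silently guessed.
    """
    if unit == "mmol/l":
        return float(value)
    if unit == "mg/dl":
        return value / GLUCOSE_MG_DL_PER_MMOL_L
    raise ValueError(f"unknown unit for blood glucose: {unit!r}")


# The range quoted in mg/dl, expressed in mmol/l.
lower = glucose_to_mmol_per_l(60, "mg/dl")
upper = glucose_to_mmol_per_l(100, "mg/dl")
```

The important design choice is that the unit travels with the number. A bare 5.5 is harmless in one unit and alarming in the other, and only the domain expert knows that both conventions exist.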

Communication issues are not a one-way street. Often the willingness to listen alone goes a long way. As a computational linguist I see myself as a listener. We do get communication wrong; we make the same mistakes. Anyone can be a listener. The only difference is that I have learned not to despair and to see it as a part of the process. You get to know each other, and you learn to communicate as long as you are aware that there will be issues. A computational linguist can be a facilitator - especially if she has travelled far.