## Further Experiments on Approaches for Preserving Translations During Training

In my master's thesis, the final experiment for multilingual knowledge base completion (No. 7) used a modified data set based on inter-language training triplets (e1, rel, e2) (download here). These inter-language triplets are additional triplets in which entity 1 and entity 2 come from different languages. For example, the training triplet [football player, type of, athlete] is also available as [Fußballer, type of, athlete] or [calciatore, type of, Sportler]. They were introduced because of the low accuracy obtained in experiment No. 6. The issues, which could be traced back via T-SNE, led me to assume that the model lost any connection between the different languages during training. A connection was only given at the beginning by the multilingual word embedding space, where similar words have similar vectors; for example, dog, Hund and cane have nearly the same representation. In this post, I want to present the results of the ideas for further experiments from my conclusion. There, I suggested using a new relation "has_translation" or the existing "similar_to" relation to give the model a chance to keep the knowledge in different languages connected during training. Below you will find a short evaluation of these two ideas, called experiments no. 8 and 9.

### Experiment 8: Translations are given by "similar_to" triplets

In this experiment, each translation is provided via two additional "similar_to" triplets, one in each direction. The training data consists of 496,385 triplets and looks like the following:

__family_viperidae_1 _similar_to __Familie_Viperidae_1
__Familie_Viperidae_1 _similar_to __family_viperidae_1
__family_viperidae_1 _similar_to __famiglia_viperidae_1
__famiglia_viperidae_1 _similar_to __family_viperidae_1
__famiglia_viperidae_1 _similar_to __Familie_Viperidae_1
__Familie_Viperidae_1 _similar_to __famiglia_viperidae_1

...
__family_viperidae_1 _member_meronym __genus_vipera_1
__Familie_Viperidae_1 _member_meronym __Gattung_Vipera_1
__famiglia_viperidae_1 _member_meronym __genere_vipera_1

The first six training triplets are meant to train the translation, and the next three are the triplets which contain the knowledge in each language. (FYI: "viperidae" = "toxic snakes" :-) )

Assuming that the knowledge is not available in Italian (that is, the model was not trained with the triplet __famiglia_viperidae_1 _member_meronym __genere_vipera_1), the model should still be able to transfer the knowledge from the other languages.
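The translation triplets shown above can be generated mechanically from a group of entities that are translations of each other. The following is a minimal sketch, not the code used for the experiments; the function name `translation_triplets` and the in-memory list format are assumptions made for illustration.

```python
from itertools import permutations

def translation_triplets(translations, relation="_similar_to"):
    """Return one (e1, relation, e2) triplet for every ordered pair of translations."""
    # permutations(..., 2) yields every ordered pair, so both directions are covered.
    return [(e1, relation, e2) for e1, e2 in permutations(translations, 2)]

# One group of translations (English, German, Italian), taken from the example above.
group = ["__family_viperidae_1", "__Familie_Viperidae_1", "__famiglia_viperidae_1"]

for e1, rel, e2 in translation_triplets(group):
    print(e1, rel, e2)
```

For three languages this yields six triplets per group. Passing a different relation name, such as `"_has_translation"`, produces the same pairs under that relation.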

### Experiment 9: Translations are given by "has_translation" triplets

Another approach is to introduce a new relation called "has_translation" to store the translations. However, a new relation means an additional neural network to train and leads to longer training times. The training data file can be downloaded here, and the example in this case looks as follows:

__family_viperidae_1 _has_translation __Familie_Viperidae_1
__Familie_Viperidae_1 _has_translation __family_viperidae_1
__family_viperidae_1 _has_translation __famiglia_viperidae_1
__famiglia_viperidae_1 _has_translation __family_viperidae_1
__famiglia_viperidae_1 _has_translation __Familie_Viperidae_1
__Familie_Viperidae_1 _has_translation __famiglia_viperidae_1

...
__family_viperidae_1 _member_meronym __genus_vipera_1
__Familie_Viperidae_1 _member_meronym __Gattung_Vipera_1
__famiglia_viperidae_1 _member_meronym __genere_vipera_1

The difference from the example of experiment no. 8 is that has_translation is used as the relation instead of similar_to. This allows us to store the translations exclusively in their own relation, and it allows us to train a neural tensor network on a translation task. (Some translation examples and an evaluation of this "translation machine" will be presented in the next post.)

### Accuracy Evaluation

The accuracy results for both experiments do not reach the high score of the previous experiment no. 7, which achieved 87.22%. Experiment no. 8 achieved 86.07% and experiment no. 9 achieved 86.15% (stopped after 125 iterations because there was no further improvement).

Further evaluation coming soon.

## Summary

In summary, the presented approach for multilingual knowledge base completion, in combination with training data that connects entities in different languages via inter-language triplets, increased the accuracy from 85.82% to 87.86% for link prediction on the WordNet WN11 test data set. The achieved accuracy is higher than in the original paper [55] and is currently higher than the best accuracy reported in the literature on this data set [25]. In other words, joint multilingual training can improve knowledge base completion compared to purely monolingual training. From a high-level perspective, the proposed approach can be seen as providing further training data by translating existing data into other languages, and it can also be applied to other tasks where labeled training data is scarce.
In retrospect, the implementation of the neural tensor network required a lot more time than originally planned and was the bottleneck for starting the multilingual experiments. I observed that coding a neural network and working with words, matrices and vectors requires skills different from those needed in standard software development. Furthermore, a highly accurate working style is necessary because it is easier for humans to recognize a wrong word as a string than as the numbers of a vector representation.
Machine learning is often presented as being as simple as starting a program, changing some parameters, and receiving (great) results. However, during this thesis I experienced that the preparation required before training a model, such as creating and loading data sets, word embeddings and so on, takes a lot of time before the actual training can begin. Long training times for the models and restricted resources, such as access to a server for training, required a lot of thinking before actually starting the training and limited the number of experiments conducted.
Finally, I enjoyed working on my thesis and hope that the developed approach helps to complete knowledge bases and is also applicable in further research.

### Future Work

Future work could start by finding better ways to connect entities of different languages during training. The approach developed in this thesis, a modification of the training data with inter-language triplets, can be seen as a baseline approach. For example, future work could create a connection via a new relation called translation of, or by using the available relation similar to.
The approach of this thesis can be strictly separated into three parts: first, the monolingual creation of word embeddings; second, mapping them into a common space; and last, training a model. In contrast, further research could combine different parts or replace them with more promising approaches.
Another obvious possibility for future work is to use additional languages or to apply this approach to other data sets.
Finally, future research could try to transfer this approach to other NLP areas beyond knowledge base completion. This could be useful when labeled training data is scarce; however, the need to translate the training data could be a limitation for further applications.

## Sunday, 20 September 2015

### Multilingual Knowledge Base Completion Experiments (No. 5-7)

In this post, the previously presented approach for multilingual knowledge base completion is implemented and evaluated in several experiments. The multilingual experiments are conducted on the same link prediction task as the monolingual experiments and are trained jointly on the same model. That makes the results comparable and lets us see whether the approach improves the accuracy in link prediction.

## Results of Multilingual Knowledge Base Completion Experiments

For joint training on data sets in different languages, it seems reasonable that entities with the same meaning in different languages should have similar representations. Therefore, as described in this post, a multilingual word embedding space was created. For learning the translation matrix, the first 5,000 words of the entity vocabulary (in order of their appearance in the original word2vec trained file) were picked. This decision ensures that no words with a random word embedding occur in training, which would negatively influence the learned translation matrix. It was not possible to use the original BabelNet entity translations because our translation matrix is trained on word level, whereas BabelNet translates on entity level. For instance, football player consists of two words in English, but BabelNet translates it as the single word Fußballer in German. In this case, each of the English words is translated into German via Google Translate, which results in the two translation pairs {football; Fußball} and {player; Spieler}.

The following table shows the precision of the translation matrices used to create the multilingual word embedding space: P@1 scores of 36% and 40%, and P@5 scores of 53% and 56%.

 Precision of mapping German and Italian into the English word embedding space for multilingual knowledge base completion experiments.

This measure is based on the following 200 words (5001-5201). The gold keys were created with Google Translate, and not all of them were found in our entity vocabulary, which reduced the usable testing pairs to nearly half.

Moreover, a new data set referred to as WNUKDEIT was constructed by copying all the training, development and test data from WN11, WN11DE and WN11IT together into one common training, development and test file.
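The P@k measure used above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the evaluation code of the thesis: it assumes that candidates are ranked by cosine similarity after mapping, and the function name `precision_at_k` and all variable names are hypothetical.

```python
import numpy as np

def precision_at_k(source_vecs, W, target_matrix, gold_indices, k):
    """Fraction of source words whose gold translation is among the k nearest targets."""
    mapped = source_vecs @ W  # map source embeddings into the target space
    # Normalise rows so that the dot product equals the cosine similarity.
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    targets = target_matrix / np.linalg.norm(target_matrix, axis=1, keepdims=True)
    sims = mapped @ targets.T
    # Indices of the k most similar target words for every source word.
    top_k = np.argsort(-sims, axis=1)[:, :k]
    hits = [gold in row.tolist() for gold, row in zip(gold_indices, top_k)]
    return float(np.mean(hits))

# Toy data: three source words, three target words, identity mapping.
toy = np.eye(3)
print(precision_at_k(toy, np.eye(3), toy, gold_indices=[0, 1, 2], k=1))
```

P@1 counts only the nearest neighbour as a hit, while P@5 accepts the gold translation anywhere among the five nearest target words, which is why P@5 is never lower than P@1.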

 Multilingual data sets

As shown in the table above, this data set is the sum of the monolingual data sets and contains 105,446 entities with a vocabulary of 76,023 words; the training data consists of 296,223 triplets. The configuration of the NTNKBC was the same as in the previous experiments.

### Results and Evaluation of Experiment 5 Reveals a Problem

As shown in the table below, the result of experiment number 5, a test accuracy of 83.29%, was disappointing because the score is lower than when training on each data set separately, where the lowest result was 83.37%. Based on this result, the assumption that multilingual data improves the accuracy by providing more training data does not seem to hold, and it is necessary to look for the causes.

An evaluation of how the model learned the entity representations reveals the issue.
 Showing the entity space after training of experiment 5 on the left and experiment 6 on the right. The visualized data is taken from the test of relation similar to. The language of an entity is marked by its color where blue stands for English, red for German and green for Italian. These visualizations are created with T-SNE.
As shown in the left visualization above, after training the entities of the different languages are separated into three parts according to their language and are not mapped together with their translations. This indicates that the model learns that football player is not similar to the German entity Fußballspieler or the Italian entity calciatore. We can conclude that the model views each entity as a different one with no relation to the others, although they have nearly the same representations at the beginning of the training.

The reason for this behavior is that the model transforms the entity representations during training according to their relations to other entities. The model found no relation between football player and Fußballspieler in the training data because the training triplets are still monolingual. For instance, the training triplet [football player, type of, athlete] is translated as [Fußballspieler, type of, Sportler].

In summary, the bad performance of the trained model seems to be due to the lack of inter-language connections between the entities of different languages during training. It is not enough to map the entities into one space only at the beginning, because the model modifies them during training according to the backpropagated errors. A solution could be to reduce the optimized parameters from $$\Omega = \left\{W,V,U,b,E\right\}$$ to $$\Omega = \left\{W,V,U,b\right\}$$, where $$E$$ stands for the entities. However, according to Chan et al. [12], the accuracy on WN11 then drops to only 75.8%. Consequently, to achieve the goal of a higher accuracy, it is necessary to find a solution that also handles similar entities from different languages during the training.

### Improvement Through Inter-Language Training Triplets

In order to fix the previously exposed issue of the lost connection between entities of different languages during training, a further multilingual data set was constructed. This data set, referred to as WN11UKDEIT Inter-Lang, contains additional training triplets where the first and second entity can be in a different language in order to ensure the connection between different languages.

Therefore, each training triplet is added in every possible language combination, which results in 846,647 training triplets. This means that the first entity can be in a different language than the second. For example, the training triplet [football player, type of, athlete] is also available as [Fußballer, type of, athlete] or [calciatore, type of, Sportler]. Entities in different languages then have the same relationships to entities of other languages, and therefore the model will see them as similar. These additional triplets, referred to as inter-language triplets in the following, should help the model to retain the connection between entities of different languages during the training.
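The expansion of one triplet into every language combination can be sketched as below. This is a minimal illustration, not the script used to build the data set; the dictionary format, the function name `inter_language_triplets` and the Italian entry "atleta" are assumptions made for the example.

```python
from itertools import product

def inter_language_triplets(triplet, translations):
    """Expand one (e1, rel, e2) triplet into all language combinations."""
    e1, rel, e2 = triplet
    # Entities without a known translation fall back to themselves.
    heads = translations.get(e1, [e1])
    tails = translations.get(e2, [e2])
    return [(h, rel, t) for h, t in product(heads, tails)]

# Hypothetical translation table: each entity maps to its variants in all languages.
translations = {
    "football player": ["football player", "Fußballer", "calciatore"],
    "athlete": ["athlete", "Sportler", "atleta"],
}

for t in inter_language_triplets(("football player", "type of", "athlete"), translations):
    print(t)
```

With three languages, a fully translated triplet expands to nine triplets. Entities without a translation produce fewer combinations, which would explain why the total is not exactly nine times the original number of triplets.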

### Results and Evaluation of Experiment 6

The training results show that mixing entities from different languages into the training triplets results in a higher accuracy of 87.22% at the 75th iteration. The only difference in the NTN configuration was a decrease in the number of minimizer iterations from 5 to 3 in order to slow down the fast convergence that occurred in this experiment.

 Accuracy for multilingual knowledge base completion in percent.

Loading these trained weights for testing the accuracy on each monolingual test set results in an accuracy of 87.86% on the original WordNet WN11 data set. That outperforms the previously achieved monolingual result of 85.82% for English and is also higher than the 86.2% reported in [55] for the model and the highest currently achieved accuracy on WN11 of 86.4% by Ji et al. [25, p. 692]. However, comparing the results with other literature is difficult because the entities there are mostly initialized randomly and not with word embeddings [25]. Furthermore, the results for the other languages increase to 86.80% for WN11DE and 86.55% for WN11IT.

 Accuracy for each relation for the multilingual experiment 6. The black bar is the accuracy for all languages together. The other bars denote the accuracy for each language separately.

Moreover, for verification, a further experiment (number 7) was run on this data set with random word vectors to show that the improved training data of the WN11UKDEIT Inter-Lang data set is not solely responsible for the higher accuracy. A random word vector initialization on this data set only reached its highest score of 80.55% at the 90th of 150 iterations, and training was stopped because no further significant improvement in accuracy was achieved.

Furthermore, the same entity embedding space in the T-SNE visualization above on the right side looks quite different this time. In contrast to the previous experiment number 5, entities of the different languages are mixed. An example is shown below, which zooms in on the genus region of the entity space from the test data for the relation similar to. There we can see that entities containing the word genus are mapped close together and are not spread over different areas of the space, as occurred in experiment 5.

 T-SNE visualization of the multilingual entity space.

## Results of Monolingual Knowledge Base Completion

In this post monolingual experiments are conducted to reproduce the results as reported in [55] with self pre-trained word embeddings and model implementation. The obtained results also serve as reference values for the multilingual experiments.

### Data sets

The WordNet data set WN11 from Socher et al. [55] was used as a base, and new versions for German and Italian were created. For the German and Italian versions of this data set, all entities with a WordNet translation entry in BabelNet (BabelSenseSource.WNTR) were translated. Both data sets are referred to as WN11DE and WN11IT in the following.

 Overview of data sets. The original WordNet data set is in English and was published by [55] and is in the literature referred to as WN11.
Missing translations result in smaller data sets for German and Italian. Both contain around 5,300 (14%) fewer entities. The numbers of training, development and test examples are around 81%, 84% and 83% of the original size.

### Results and Evaluation of Experiment 2-4

After training, an evaluation of the learned model has shown that it can predict the correctness of triplets with an accuracy between 83% and nearly 86%, and according to Socher et al. [55, p. 7] that can be seen as a high accuracy. The learning and cost decrease during the training are reported in the appendix. In the following table an overview about the results on the different languages is given.

 Monolingual knowledge base completion accuracy for link prediction on WN11 data set in different languages. The final two columns denote the relation of the highest and lowest accuracy

As seen in the illustration below, all models achieve their best accuracy on the relation subordinate instance of. The worst performance is measured for relation similar to for German and Italian and domain topic for English. A reason for this could be that both relations contain the lowest number of training examples.

 A comparison between the accuracy of different relations and experiments.
#### Example of Reasoning

This example demonstrates how the NTNKBC model can infer that a football player is a type of athlete and an instance of participant, as depicted in the following graphic.

 Showing a selected part of a knowledge graph network. The black arcs stand for learned relations from the training data. The green arcs are correct predictions from the test data. As shown by the correct predictions, it is interesting to note that the model learns to reason over entities.
An explanation could be that the first word football is similar to hockey or tennis, which are part of entities that are a type of athlete. Additionally, all these entities contain the second word player. Furthermore, the entity football is represented by its single word embedding and has similarities to the other sports where players are athletes. For the model, entities with the same relations to other entities are seen as similar. In our case, tennis, hockey and football share the same characteristics, which result in similar relations. Consequently, the entity football player should have the same latent features and therefore appear to the model as similar to entities that are known to have the relation type of athlete.
In the German WordNet translation, football player is translated as the single word Fußballer. A translation for participant was not available in BabelNet. Nevertheless, the model predicts the remaining relationships of football player correctly, too. Moreover, the entity lacrosse player was translated as the single-word entity LacrosseSpieler and is also correctly predicted not to be an instance of asterid dicot genus (herb genus). The same was observed for the Italian experiment.

#### Example of Learned Entity Representations

In this subsection, I want to show an example of how the model modifies the entity representations during training. In chapter 3.3.3, we discussed how the model maps entities, or more precisely their constituent words, based on the relations in which they occur together in the training data.
To analyze this behavior, let us examine the three entities male, body, and male body. It would be interesting to see whether the entity male body is near male or in the body region.

 T-SNE visualization of the test data from relation part of which shows the position of the entity male and male body in different places.
As the visualization above shows, male body is in the "body region" and far away from male. An intuitive explanation of this behavior is that the word body shares more relations with male body than the word male does. Furthermore, female body is mapped to nearly the same position, which indicates similar entity representations because both have similar relations to other entities in the same regions of the word embedding space. Taken together, body is the more expressive word in this entity and shares more relations with the other body entities than male or female.

#### Example of "Knowledge Embeddings" / Triplets Mapped into a Space

 Showing a T-SNE visualization of all test triplets (after training) of the relation type of with UkWaC word embeddings after the non-linear transformation. Each triplet i is represented by its activation as a vector $$a_{i} \in \Re^{s}$$, where s is the number of slices. The picture on the left side shows the decision border where the model divides the space into the two classes. The yellow dots are predicted as correct triplets and the blue ones as incorrect. The middle picture depicts the solution, where incorrect predictions are marked as red points and correct ones as green points. The picture on the right side marks with black points all triplets that contain the word player. For instance, the rightmost black point marks the triplet [football player, type of, athlete].

## Approach for Multilingual Knowledge Base Completion

This chapter outlines the chosen approach for multilingual knowledge base completion. The starting point is the NTNKBC model [55] because of its good results and its novel approach of using word embeddings to represent entities by their constituent word vectors. Hence, it is natural to build the multilingual part upon the word embeddings by mapping different languages into one space, henceforth referred to as the multilingual word embedding space. In this space, word embeddings of different languages are aligned so that similar words have nearly the same positions. Consequently, entities of different languages with the same meaning should have nearly the same entity representation and be recognized as equal by the network.
 This illustration begins by mapping word embeddings, represented as squares, from different languages into one common space to train the neural tensor network model on multilingual knowledge base data. In this example, the model learns that a dog is an animal, and based on this knowledge the model should also be able to infer that a cat is an animal because the word embedding of cat is near dog and consequently they share similar latent features.
The figure above illustrates the proposed approach of this thesis for multilingual knowledge base completion.

The procedure of this approach can be separated in three main parts:
1. Learning the word embeddings of various languages with word2vec.
1. Mapping the word embeddings into one common vector space via a translation matrix
1. Training the NTNKBC model with knowledge base data in various languages to predict new relationships between entities

### 1. Monolingual Word Embeddings Learned by word2vec

The first part of this approach is the creation of word embeddings because they are the input features for the entity representations and are required for combining the different languages.
In order to receive word embeddings with good linguistic regularities, it is important to train them on a large corpus. The major factors influencing the quality of the word embeddings are the amount and quality of training data, the number of dimensions, and the training algorithm [32]. As far as possible, all of these factors were taken into consideration.

For learning the word embeddings, three different corpora from the Web-As-Corpus Kool Yinitiative (WaCky) [3] are used as training data. These corpora were chosen because they are easy to access, freely available for research purposes, and contain a large vocabulary.

### 2. Multilingual Word Embedding Space

In order to predict the correctness of triplets from multiple languages, entities of different languages with the same meaning should have the same representation. Therefore, we need some kind of translation between the data from different languages. One option would be to use dictionaries and phrase tables to translate the data into one language. However, missing words are hard to handle, and assigning features to previously unseen words is also difficult.
As a result, the previously presented method of Mikolov et al. [37] for mapping different word spaces together seems to be a suitable concept for our problem. The reasons for using this method are, firstly, its simplicity and effectiveness and, secondly, its new way of exploiting similarities between word embeddings, in contrast to previous mapping approaches such as [23] and [26].
Our objective is to learn a transformation matrix that lets us infer unknown translations from some known translations and map the two word embedding spaces together.
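Learning such a transformation matrix from known translation pairs can be sketched as a least-squares problem. The sketch below is an assumption-laden illustration: Mikolov et al. optimize this kind of objective with stochastic gradient descent, whereas a closed-form least-squares solve is used here for brevity, and the function name `learn_translation_matrix` and the toy data are hypothetical.

```python
import numpy as np

def learn_translation_matrix(X, Z):
    """Find W that minimises ||X W - Z||^2 over all known translation pairs.

    X: (n, d_source) source-language embeddings of the known pairs.
    Z: (n, d_target) target-language embeddings of their translations.
    """
    W, *_ = np.linalg.lstsq(X, Z, rcond=None)
    return W

# Toy data: 50 "translation pairs" generated from a known linear map.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
W_true = rng.normal(size=(4, 4))
Z = X @ W_true

W = learn_translation_matrix(X, Z)
print(np.allclose(W, W_true))

# An unseen source word is translated by mapping it and searching for
# the nearest neighbours of the result in the target space.
unseen = rng.normal(size=4)
mapped = unseen @ W
```

Real embedding spaces are only approximately related by a linear map, so the learned matrix will not reproduce the known pairs exactly, which is why the precision of the mapping has to be measured separately.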

#### Construction of a Multilingual Word Embedding Space

To generate a multilingual word embedding space, more than two languages are involved. Therefore, we only change the source space and keep the same target space for both mappings. In the context of this thesis, this means that we first mapped the German word embeddings into the English space and then separately, in the same way, the Italian word embeddings into the English space.

 The picture above illustrates the construction of a multilingual word embedding space.

In the end, we built the multilingual space by concatenating these three spaces and their vocabulary lists into one. This multilingual word embedding space is used as input for the NTNKBC in the multilingual experiments in this post.

### 3. Neural Tensor Network for Knowledge Base Completion

For predicting new triplets from multilingual data, a neural tensor network for monolingual knowledge base completion (NTNKBC) as proposed in [55] 2.5.1 was implemented. This model was chosen because it outperforms previous models and is suitable for reasoning over relationships between two entities in large data sets [55, p. 4].
In the previous section, I described how the authors derived the model by generalizing the advantages of several previous models [64, p. 3]. Since the NTNKBC has both linear and bilinear relation operations, it is referred to as the most expressive model in the literature [64, p. 3].
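The combination of bilinear and linear operations can be made concrete with the scoring function of the neural tensor network as published by Socher et al. [55]. The following is a minimal NumPy sketch for a single relation, not the Java implementation of this thesis; the function name `ntn_score` and the parameter shapes chosen here are assumptions for illustration.

```python
import numpy as np

def ntn_score(e1, e2, W, V, b, u):
    """Score one triplet: u^T tanh(e1^T W[1:k] e2 + V [e1; e2] + b).

    e1, e2: (d,) entity vectors
    W: (k, d, d) tensor with k slices (bilinear part)
    V: (k, 2d) weights of the linear part
    b: (k,) bias, u: (k,) output weights
    """
    # One bilinear form e1^T W[i] e2 per slice i.
    bilinear = np.einsum("i,kij,j->k", e1, W, e2)
    # Standard linear layer on the concatenated entity vectors.
    linear = V @ np.concatenate([e1, e2])
    return float(u @ np.tanh(bilinear + linear + b))

# Toy parameters: d = 2 features, k = 1 slice.
e1 = np.array([0.9, 0.2])
e2 = np.array([0.2, 0.8])
W = np.array([[[0.1, 0.9], [0.1, 0.1]]])
V = np.zeros((1, 4))
b = np.zeros(1)
u = np.ones(1)
print(ntn_score(e1, e2, W, V, b, u))
```

With the linear part and the bias set to zero and a single slice, the score reduces to a bilinear model followed by a tanh, which shows how the tensor network generalizes the simpler models.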

#### Implementation

As a part of this thesis, an NTN for knowledge base completion was implemented in Java and is available on GitHub. More flexibility and better handling of the multilingual extension were among the main reasons for this decision.

#### Training Objective and Derivatives

#### Initialization, Representation and Joint Training of Entities

## Why is Multilingual Knowledge Completion Important?

A lot of information is expressed in human language, for example, in web pages, documents or books, and the amount of information is still growing. In this form, the content is difficult for machines to understand because they cannot easily derive a semantic meaning from this type of input. To overcome this issue and to provide knowledge in a form that is easily understandable for machines, the current approach is to store extracted information as relational tuples (in the form of [entity 1, relation, entity 2]) in databases. Such databases that encode structured information about entities and their relations are called knowledge bases.
Although knowledge bases store billions of relations between entities, they are far from being complete [49, p. 1]. Most knowledge bases are restricted to a specific domain and still have incomplete data. For example, the largest open source knowledge base has the place of birth for only 29% of people according to [63, p. 1], and the nationalities for only 25% of people as measured in October 2013 [15, p. 1]. This incompleteness reflects how difficult it is to extract correct tuples on a large scale and results in a bottleneck for the accuracy of other applications that rely on the data of knowledge bases. Providing complete and correct data is crucial for many other NLP applications such as information retrieval, question answering, or providing structured data to users [55, p. 1], and could improve the performance of these knowledge-dependent applications.
In this thesis, a neural tensor network is applied for multilingual knowledge base completion in order to jointly learn knowledge that is stored in different languages with the goal of achieving a higher prediction accuracy through the combination.

### What is the Problem?

The main problem statement lies in finding and implementing an approach that combines knowledge that is stored in different languages in order to improve the accuracy for a knowledge base completion model, in the following, often just referred to as model.
The resulting issue is how to combine entities and their relations from different languages in a way that the model can understand it. In contrast to the monolingual case, the model additionally has to recognize that Eishockeyspieler in German or giocatore di hockey su ghiaccio in Italian are similar to hockey player. However, the model itself does not know that these entities are similar because it does not understand human language and cannot translate between the different languages. Therefore, an appropriate method to adjust the input features of the entities, in a way that these are similar for the model, must be implemented. This leads to more training examples and hopefully makes it possible for the model to improve its accuracy.

### Related Work

In contrast to conventional machine learning, which typically works with two-dimensional data matrices where each column represents an object and rows contain its features, and whose objective is to learn a mapping function from input to output [49, p. 1], this thesis is more closely related to the field of Statistical Relational Learning (SRL) [49]. SRL deals with objects that additionally include relationships to other objects. For instance, such objects can be entities of knowledge bases with their various relationships to other entities [49, p. 1].
The most closely related work is the monolingual knowledge base completion model of Socher et al. [55], which uses pre-trained word embeddings and is related to the deep learning literature [55, p. 2]. For mapping separate pre-trained word embedding spaces of different languages together, the mapping function that Mikolov et al. [37] proposed for the bilingual case was applied. The idea of combining different languages to enhance knowledge base completion is shown in [30], where data from Wikipedia in different languages is used to generate a coherent knowledge base [31], or in [27] for DBpedia. Applications in which joint multilingual approaches improve performance compared to mono- and bilingual ones are shown, for instance, for semantic relatedness by Navigli et al. [45] or for named entity recognition by Al-Rfou et al. [2]. However, no work that uses multilingual training data to enhance the accuracy of a knowledge base completion model was found in a literature search.

# Latent Feature Models for Knowledge Base Completion

A common characteristic of latent feature models is that they explain triples via latent features of their entities. For instance, we could explain that a baseball player is a type of athlete because all players of sports are athletes. Therefore, we are using latent features of entities (baseball is a sport and all players are athletes) to explain the observable fact that a baseball player is a type of athlete. These features are called latent because they are not directly observable in the data and are derived automatically from the data [49, p. 6].

Let us suppose that our entity $$e_{1}$$ is baseball player and $$e_{2}$$ is athlete, each represented by a vector $$e \in \Re^{d}$$ with $$d=2$$ (latent) features:

$e_{1}=\begin{bmatrix}0.9\\0.2\end{bmatrix} , \quad e_{2}=\begin{bmatrix}0.2\\0.8\end{bmatrix} .$

In this case $$e_{1}$$ refers to the latent feature "person doing a kind of sport" and $$e_{2}$$ refers to "what people who do sport are called". However, in contrast to this simple example, which is derived from Nickel et al. [49, p. 6], they also mention that in reality the latent features inferred by such models are hard to interpret.

As described in the example above, the intuition behind such models is that the relationship of two entities can be derived from interactions of their latent features. For completeness, I want to mention that there are many different ways to model latent feature interactions and to derive the existence of a relationship from them. In the following, however, only the models relevant for this thesis, which are applied for multilingual knowledge base completion, are considered: the bilinear and the MLP model.

## Bilinear Model

First, I want to start with the following bilinear model for expressing latent features [49, Eq. 7]:

$f(e_{1}, r, e_{2}) = e_{1}^{T} W_{r} e_{2} = \sum_{a=1}^{d} \sum_{b=1}^{d} w_{ab} \, e_{1a} \, e_{2b} ,$

where $$d$$ is the size of the entity vectors and, for instance, $$w_{ab}$$ denotes the entry of row $$a$$ and column $$b$$ of $$W_{r}$$ that is multiplied with the value at position $$a$$ of entity vector $$e_{1}$$ and position $$b$$ of entity vector $$e_{2}$$. In the end, all values are summed up, which results in a scalar. We can see $$W_{r} \in \Re^{d \times d}$$ as a weight matrix that models how much the (latent) features $$a$$ and $$b$$ interact in the $$r$$'th relation. If this model is applied to our example, we can model that somebody who is doing some kind of sport is an athlete via the following matrix [49, p. 6]:

$W_{type\_of} = \begin{bmatrix}0.1 & 0.9\\0.1 & 0.1\end{bmatrix} .$

According to [49, p. 6], negative entries show a negative correlation between the two features and positive values show a positive correlation. Consequently, entries around zero have no implications. Such a model scores the correctness of a triplet by the weighted sum of all pairwise interactions between the latent features of the entities $$e_{1}$$ and $$e_{2}$$ [49, p. 6]. In the case of the example above, we get the following score:

$e_{baseball\_player}^{T} W_{type\_of} e_{athlete} = \begin{bmatrix}0.9 & 0.2\end{bmatrix} \begin{bmatrix}0.1 & 0.9\\0.1 & 0.1\end{bmatrix} \begin{bmatrix}0.2\\0.8\end{bmatrix} = 0.686 .$

In contrast, the nonsense triplet stating that a baseball player is a type of car should consequently result in a lower score:

$e_{baseball\_player}^{T} W_{type\_of} e_{car} = \begin{bmatrix}0.9 & 0.2\end{bmatrix} \begin{bmatrix}0.1 & 0.9\\0.1 & 0.1\end{bmatrix} \begin{bmatrix}0.2\\-0.8\end{bmatrix} = -0.642 .$

In conclusion, the latent feature model described in this section is called a bilinear model and consists of the parameters $$\Omega = \{W,E\}$$, where the representations of the entities are learned jointly over all relations.
In other words, we use the same entity representations as input over different relationships in order to distribute information via the latent representations of entities. The weight parameter $$W_{r}$$ is different for each relation $$r$$ because it should learn how the latent features interact for a particular relationship.
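The two scores of the worked example can be checked numerically. The snippet below is a small sketch that only reproduces the arithmetic of the example; the function name `bilinear_score` is an assumption, and the matrix is read from the example with first column (0.1, 0.1) and second column (0.9, 0.1).

```python
import numpy as np

def bilinear_score(e1, W_r, e2):
    """Bilinear score e1^T W_r e2: weighted sum of all pairwise feature interactions."""
    return float(e1 @ W_r @ e2)

e_baseball_player = np.array([0.9, 0.2])
e_athlete = np.array([0.2, 0.8])
e_car = np.array([0.2, -0.8])

# Relation-specific weight matrix for "type of".
W_type_of = np.array([[0.1, 0.9],
                      [0.1, 0.1]])

print(round(bilinear_score(e_baseball_player, W_type_of, e_athlete), 3))  # 0.686
print(round(bilinear_score(e_baseball_player, W_type_of, e_car), 3))      # -0.642
```

The large entry 0.9 couples the first feature of the left entity with the second feature of the right entity, so flipping the sign of that second feature, as for car, turns the score negative.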