Welcome!

This application is designed to help you understand transformation models as defined by Hothorn, Möst & Bühlmann, 2018.

The purpose of a transformation model is to predict the distribution of a response \(Y\).
Increasingly complex transformation models for a continuous response are presented, each on its own page:

  • The unconditional case, where the distribution of \(Y\) is modeled without involving any predictor.
  • The linear case, where predictors \(X\) are introduced in the model as a linear shift term with a fixed effect.
  • The stratified linear case, where each stratum gets a different transformation but the regression coefficients of the shift term stay the same.
  • The conditional case, where the distribution of \(Y\) is modeled fully interacting with \(X\). The predictors no longer have a fixed effect, but rather an effect that can vary depending on the response.

Additionally, transformation models for other types of response variables are presented:

  • Categorical response variable (unconditional case).
  • Count response variable.

Features

I recommend opening the application in full-screen.

On each model's page, a transformation model is fitted to a pre-loaded dataset. You can display the dataset and information about it at the bottom of the page.

In the left-side menu, the model status displays the current parameters. Below it, you can change some parameters of the model, which is then refitted automatically. This might take a few seconds.

In the centre of the page, a summary of the currently fitted model and several plots are shown. The plots are updated at the same time as the model.

On the last page, you can build a transformation model adapted to your specific needs. Describe your data, and obtain R code ready to be copied into your environment. This feature is limited at the moment.


Acknowledgements

This application was created as a Master's thesis project in Applied Information and Data Science at Lucerne University of Applied Sciences and Arts.

I would like to express my sincere gratitude to the people who contributed to its development:


Copyrights and Reproducibility

This work is licensed under CC BY-NC-SA 4.0

Main packages versions:

  • mlt (T. Hothorn, 2025): 1.7-1
  • tram (T. Hothorn, L. Barbanti, S. Siegfried, L. Kook, 2025): 1.2-5
  • cotram (S. Siegfried, L. Barbanti, T. Hothorn, 2025): 0.5-3
  • shiny (W. Chang, J. Cheng, JJ. Allaire, C. Sievert, B. Schloerke, G. Aden-Buie, Y. Xie, J. Allen, J. McPherson, A. Dipert, B. Borges, 2025): 1.11.1
  • ggplot2 (H. Wickham, W. Chang, L. Henry, T. Pedersen, K. Takahashi, C. Wilke, K. Woo, H. Yutani, D. Dunnington, T. van den Brand, 2025): 4.0.0

GitHub directory: https://github.com/jugwen/interactive-transformation-models

Contact: Gwen Junod - gwen.junod@gmail.com

App last modified: 27.02.2026

Unconditional Transformation Model

In the unconditional case, the distribution is defined by a transformation function \(h(y)\) and a distribution function \(F_Z\), so we can write \(\mathbb{P}(Y \leq y) = F_Z(h(y))\).

The transformation function \(h(y)\) is parameterised using a basis function \(a\), so we can write \(h(y) = a(y)^\top \theta\) where \(\theta\) denotes parameters to be estimated. So, $$ \mathbb{P}(Y \leq y) = F_Z(h(y)) = F_Z(a(y)^\top \theta) $$ To specify a transformation model, we must define the basis function \(a\), choose the link function \(F_Z\), and estimate the parameter vector \(\theta\).

  • \(a\) is a vector of Bernstein polynomials of order \(M\) that must be defined on an interval corresponding to the range of \(Y\). To do so, a numeric variable representing \(Y\) is created. The author of the mlt package recommends choosing \(M\) between 5 and 10.
  • \(F_Z\) is a link function that defines the distribution to transform \(Y\) to. It can be chosen freely and influences the interpretation of the regression coefficients.
  • \(\theta\) is the vector of parameters estimated by the model depending on \(a\) and \(F_Z\).

The resulting transformation model can be fitted to the data.
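
The three ingredients above can be sketched numerically. The following is a minimal illustration in plain Python, not the R code used by the application: the interval, the order and the values of \(\theta\) are hypothetical and chosen only so that \(h\) is increasing, and \(F_Z\) is taken to be the standard normal distribution function.

```python
import math

def bernstein_basis(y, order, lower, upper):
    """Evaluate the order + 1 Bernstein polynomials at y, after rescaling y to [0, 1]."""
    t = (y - lower) / (upper - lower)
    return [math.comb(order, m) * t**m * (1 - t) ** (order - m)
            for m in range(order + 1)]

def transformation(y, theta, lower, upper):
    """h(y) = a(y)^T theta, with a the Bernstein basis of order len(theta) - 1."""
    basis = bernstein_basis(y, len(theta) - 1, lower, upper)
    return sum(a * th for a, th in zip(basis, theta))

def normal_cdf(z):
    """Standard normal distribution function, used here as F_Z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical values, not estimates from the application.
theta = [-3.0, -1.5, -1.0, 0.5, 1.0, 3.0]  # increasing, so h is monotone (order M = 5)
lower, upper = 40.0, 100.0                 # assumed interval for the response

def cdf(y):
    """P(Y <= y) = F_Z(a(y)^T theta)."""
    return normal_cdf(transformation(y, theta, lower, upper))

print(cdf(60.0), cdf(80.0))
```

Because the Bernstein polynomials are non-negative and sum to one, an increasing \(\theta\) is enough to make \(h\), and therefore the modeled distribution function, increasing.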


Interactive Model

A case of unconditional transformation model, more precisely a continuous model for a continuous response, is fitted to the Old Faithful dataset with the mlt package.

In this model, waiting is defined as the response variable.

Bernstein Basis
If you change the Bernstein basis to a lower order, the first mode is less pronounced or even absent in the PDF plot. You could set the order to 1 and increase it one step at a time with the arrow. The first mode slowly forms as the model captures more complexity. If you keep increasing the order, the modeled density stays stable until around order 15.

Distribution
The distribution parameter defines \(F_Z\). It is the distribution we want to transform \(Y\) to. For unconditional transformation models, that choice is not really important since there are no coefficients to interpret. However, the Bernstein basis order must be large enough so that the model captures enough complexity to estimate a correct shape.

Numeric Variable
A transformation model is parametrised without seeing the data. Instead, a numeric variable representing the response variable is defined for the Bernstein basis. This is the only reference to the data before the fitting.
The support argument represents the range of the observed response, from the smallest to the largest response value in the dataset. The bounds argument specifies the range of all possible values for the response variable. In theory, a duration such as the waiting time can only be positive (there is no negative duration) and has no upper limit (no duration is too long to be possible).

Fitted Model Summary


                  

Baseline Transformation Function

Baseline transformation \(h(y)\) estimated by the model. It is the transformation applied to the response variable to make it behave like the chosen distribution \(F_Z\).

Probability Density Function

Estimated density of the response variable: the higher the curve at a value, the more likely the response is to fall near that value.

Cumulative Distribution Function

Area under the PDF curve up to \(x\), that is, the probability that the response is at most \(x\).
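
The link between the two plots and the transformation function can be written out as follows. This is the usual change-of-variables identity, assuming \(h\) is differentiable and increasing; \(f_Z\) (the density of \(F_Z\)) and \(f_Y\) are symbols introduced here for illustration.

$$ F_Y(y) = \mathbb{P}(Y \leq y) = F_Z(h(y)) = \int_{-\infty}^{y} f_Y(u) \, du, \qquad f_Y(y) = f_Z(h(y)) \, h'(y) $$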

Dataset

Old Faithful Geyser Data — contains 272 observations on 2 variables of the Old Faithful geyser in Yellowstone National Park.

eruptions — duration of an eruption in mins
waiting — time until the next eruption in mins


Summary


              

Linear Transformation Model

In the linear case, predictors are introduced in the model as linear shift terms. In transformation models, predictors do not influence the slope, as they would in standard regression, but they shift the entire distribution along the response scale. Note that 'linear' refers to the way the predictors enter the model, not to the transformation function. The transformation itself can be (and usually is) non-linear.

We introduce \(X\) in the formula by writing \(\mathbb{P}(Y \leq y | X = x) = F_Z(h(y | x)) = F_Z(h_Y(y)-\tilde{x}^\top \beta)\), with \(\beta\) being estimated coefficients. Put simply, we want to find the distribution of \(Y\), given \(X = x \). As for the unconditional case, the transformation model is defined by \(a\), \(F_Z\) and \(\theta\).


Interactive Model

A case of linear transformation model is fitted to the BostonHousing2 dataset with the mlt package.

In this model, cmedv is defined as the response variable depending on the other variables. medv is not included as predictor as it is collinear with cmedv.
Note that cmedv is right-censored, meaning that the maximum value 50 doesn't mean '= 50', but '>= 50'. This is not taken into account in the model on this page as it has more of a demonstrative purpose, but in a real-life scenario, censoring of the data should be addressed to obtain a model representing the data more accurately.

Bernstein Basis
As stated above, the transformation function \(h(y)\) is usually non-linear. You can force it to be linear by setting the Bernstein basis order to 1. If you also set \(F_Z\) to Normal, the resulting transformation model is equivalent to a normal linear regression model. That can be confirmed by looking at the quantile plots at the bottom of the page.
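
The equivalence can be sketched in a few lines. With order 1, the Bernstein basis on an interval \([l, u]\) has the two elements \((u - y)/(u - l)\) and \((y - l)/(u - l)\), so the transformation is a straight line. The symbols \(\alpha\), \(\gamma\), \(l\) and \(u\) are introduced here for illustration, and the sign convention is the one of the formula at the top of this page.

$$ h_Y(y) = \theta_0 \frac{u - y}{u - l} + \theta_1 \frac{y - l}{u - l} = \alpha + \gamma y, \qquad \gamma = \frac{\theta_1 - \theta_0}{u - l} > 0 $$

$$ \mathbb{P}(Y \leq y | X = x) = \Phi(\alpha + \gamma y - \tilde{x}^\top \beta) = \Phi\left( \frac{y - (\tilde{x}^\top \beta - \alpha)/\gamma}{1/\gamma} \right) $$

This is the distribution function of a normal variable with mean \((\tilde{x}^\top \beta - \alpha)/\gamma\), which is linear in the predictors, and constant standard deviation \(1/\gamma\).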

Distribution
The choice of the link function is now important because it defines the scale on which to interpret regression coefficients. Still, we can choose any \(F_Z\) that interests us.

Numeric Variable
The numeric variable is not interactive here. You can still see the values of support and bounds in the model status.

Predictors
You can choose the predictor variables to include in the model. Each predictor has its own effect on the distribution, but all effects are summed into a single linear predictor term.

Fitted Model Summary


                    

When \(F_Z\) is Normal, the coefficients can be interpreted as effects on the transformation, in standard deviation units. A one-unit increase in a predictor shifts \(h(Y)\) by the value of its coefficient, in standard deviations on the normal scale. So with the default parameters of this page, we can see that nox is the predictor that has the largest effect on the transformation. When nox increases by 1, the transformation function shifts by 4.77 standard deviations on the normal scale. We can also see that zn, age, tax and b have very small effects on the estimated transformation.

Another way to look at the coefficients is through probability. The predictor rm is the number of rooms. For each additional room in a house, the transformed median price shifts by -0.48 standard deviations on the normal scale. The signs reported on this page are consistent with coefficients that enter the transformation additively (the default in mlt), so this reading is assumed here. If we compare the probabilities that a 2-room house and a 4-room house have a median value of 20'000 or less, we write:
\(\mathbb{P}(cmedv \leq 20 | rm = 2) = \Phi(h(20) + (-0.48 \times 2)) = \Phi(h(20) - 0.96)\)
and
\(\mathbb{P}(cmedv \leq 20 | rm = 4) = \Phi(h(20) + (-0.48 \times 4)) = \Phi(h(20) - 1.92)\)
The 4-room house lowers the argument of \(\Phi\) more (-1.92 < -0.96), and since \(\Phi\) is increasing, it is less likely that 4-room houses have a median value of 20'000 or less compared to 2-room houses.

When \(F_Z\) is Logistic, the coefficients can be interpreted as log odds ratios. For a one-unit increase in the predictor, the exponential of its coefficient multiplies the odds of \(Y \leq y\). lstat is the percentage of lower status people in the population. The exponential of 0.28 is about 1.32. This means that for an increase of 1 percentage point in lstat, the odds of the median price being under any given threshold are multiplied by 1.32. This makes sense, as the poorer an area, the lower the house prices.
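
The odds interpretation can be checked numerically. The sketch below is plain Python, not the application's R code. It assumes that the coefficient enters the transformation additively on the logit scale, and the values of \(h(y)\) at the three thresholds are hypothetical. Only the coefficient 0.28 comes from the text.

```python
import math

def logistic_cdf(z):
    """Standard logistic distribution function, used here as F_Z."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    return p / (1.0 - p)

beta_lstat = 0.28                     # coefficient quoted in the text
odds_ratio = math.exp(beta_lstat)

# Hypothetical values of h(y) at three thresholds y; not taken from the application.
odds_ratios = []
prob_ratios = []
for h_y in (-2.0, 0.0, 1.5):
    p_before = logistic_cdf(h_y)                # P(Y <= y) at some value of lstat
    p_after = logistic_cdf(h_y + beta_lstat)    # same threshold, lstat one unit higher
    odds_ratios.append(odds(p_after) / odds(p_before))
    prob_ratios.append(p_after / p_before)

print(round(odds_ratio, 2))   # 1.32
print(odds_ratios)            # the same ratio at every threshold
print(prob_ratios)            # the ratio of probabilities changes with the threshold
```

The ratio of the odds is the same at every threshold, while the ratio of the probabilities is not, which is why the coefficient is read on the odds scale.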

Explanations for the other link functions have yet to be implemented.

Baseline Transformation Function

Baseline transformation \(h(y)\) estimated by the model. It is the transformation applied to the response variable to make it behave like the chosen distribution \(F_Z\).

Probability Density Function

A transformation model estimates the whole distribution for each observation. The PDF and CDF plots are created by predicting for the 506 observations of the dataset at 200 equally spaced values in the range of cmedv (0,50).

With the default parameters of this page, each line's position on the \(x\) axis is influenced by the 13 predictor values of that specific observation. If you deselect all predictors but a discrete one, there will only be as many lines (distributions) as there are levels of that predictor. For example, the rad predictor can only take one of nine values. If you include it alone in the model, every observation with the same rad value will be predicted the same distribution, resulting in only nine distribution functions. This illustrates how the distribution depends on \(X\).
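
The effect described above can be reproduced with a small sketch in plain Python (not the application's R code). The baseline transformation, the coefficient and the eight observations are hypothetical. Only the grid of 200 equally spaced values between 0 and 50 follows the text.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def baseline(y):
    # Hypothetical increasing transformation, standing in for the fitted h(y).
    return -3.0 + 0.15 * y

beta_rad = 0.05                      # hypothetical coefficient
rad = [1, 4, 4, 24, 5, 1, 24, 4]     # hypothetical predictor values of 8 observations

# 200 equally spaced values between 0 and 50.
grid = [50.0 * i / 199 for i in range(200)]

# One predicted distribution function per observation.
curves = [tuple(normal_cdf(baseline(y) - beta_rad * x) for y in grid) for x in rad]

n_distinct = len(set(curves))
print(n_distinct)   # 4, one curve per distinct value of rad
```

Eight observations produce only four distinct curves here, because the predicted distribution depends on the observation only through its predictor value.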

Cumulative Distribution Function

Area under the PDF curve up to \(x\), that is, the probability that the response is at most \(x\).

Quantile Distribution - Fitted Model

Each observation is plotted according to its observed cmedv (y axis) and linear predictor (x axis). The confidence bands indicate the probability of having a home value at or below the observed value, given that observation's linear predictor.

If you set the Bernstein basis order to 1 and \(F_Z\) to Normal, the transformation model is equivalent to the normal linear model on the right.
A transformation model with a higher order captures the larger values of cmedv better.

Quantile Distribution - Normal Linear Model

A normal linear model is fitted with the same predictors as the transformation model. The interpretation of this plot is equivalent to that on the left.

This model fitted with lm() underestimates larger values of cmedv as they are all in the darkest confidence band or above.

Dataset

BostonHousing2 — contains 506 observations on 19 variables of the 1970 United States census in the Boston area.

town — name of town
tract — census tract
lon — longitude of census tract
lat — latitude of census tract
medv — median value of owner-occupied homes in USD 1000's
cmedv — corrected median value of owner-occupied homes in USD 1000's
crim — per capita crime rate by town
zn — proportion of residential land zoned for lots over 25,000 sq.ft
indus — proportion of non-retail business acres per town
chas — Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
nox — nitric oxides concentration (parts per 10 million)
rm — average number of rooms per dwelling
age — proportion of owner-occupied units built prior to 1940
dis — weighted distances to five Boston employment centres
rad — index of accessibility to radial highways
tax — full-value property-tax rate per USD 10,000
ptratio — pupil-teacher ratio by town
b — \(1000(B-0.63)^2\) where \(B\) is the proportion of blacks by town
lstat — percentage of lower status of the population


Summary


              

Stratified Linear Transformation Model

In a stratified transformation model, one of the predictors is defined as strata, and a transformation function is estimated for each level of the strata. The regression coefficients of the shift term stay constant across all levels.

We introduce the strata \(S\) in the formula by writing \(\mathbb{P}(Y \leq y | X = x, S = s) = F_Z(h(y | x, s)) = F_Z(h_Y(y | s)-\tilde{x}^\top \beta)\).
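
A minimal sketch of this structure in plain Python (not the application's R code): each stratum has its own baseline transformation, while one set of coefficients is shared. The two baselines, the coefficients and the predictor values are all hypothetical.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical baseline transformations h_Y(y | s), one per stratum.
baselines = {
    0: lambda y: -3.0 + 0.14 * y,
    1: lambda y: -3.5 + 0.12 * y,
}

# Hypothetical regression coefficients, shared by all strata.
beta = {"rm": 0.5, "lstat": -0.1}

def shift(x):
    """Linear predictor x^T beta; it does not depend on the stratum."""
    return sum(beta[name] * value for name, value in x.items())

def conditional_cdf(y, x, s):
    """P(Y <= y | X = x, S = s) = F_Z(h_Y(y | s) - x^T beta)."""
    return normal_cdf(baselines[s](y) - shift(x))

x = {"rm": 6.0, "lstat": 10.0}
p0 = conditional_cdf(20.0, x, s=0)
p1 = conditional_cdf(20.0, x, s=1)
print(p0, p1)
```

The same predictor values lead to different probabilities in the two strata only because the baseline transformations differ.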


Interactive Model

A case of stratified linear transformation model is fitted to the BostonHousing2 dataset with the tram package.

In this model, cmedv is defined as the response variable depending on the other variables. medv is not included as predictor as it is collinear with cmedv. The chosen strata variable chas is a dummy variable that indicates proximity to the Charles River.
Note that cmedv is right-censored, meaning that the maximum value 50 doesn't mean '= 50', but '>= 50'. This is not taken into account in the model on this page as it has more of a demonstrative purpose, but in a real-life scenario, censoring of the data should be addressed to obtain a model representing the data more accurately.

Bernstein Basis
The Bernstein basis properties are the same as the ones described in the Unconditional and Linear pages.

Distribution
That part of the model is not interactive on this page. It is set to Normal.

Numeric Variable
With the tram package, fitting is more straightforward and there is no need to create a numeric variable.

Predictors
You can choose the predictor variables to include in the model. Each predictor has its own effect on the distribution, but all effects are summed into a single linear predictor term.

Fitted Model Summary


                    

The coefficients can be interpreted as in the linear case.

Baseline Transformation Function

A transformation function is estimated independently for each level of the strata.

Probability Density Function

The PDF and CDF plots are created by predicting for the 506 observations of the dataset at 200 equally spaced values in the range of cmedv (0,50).

Cumulative Distribution Function

Area under the PDF curve up to \(x\), that is, the probability that the response is at most \(x\).

Quantile Distribution for each stratum

Each observation is plotted according to its observed cmedv (y axis) and linear predictor (x axis). The confidence bands indicate the probability of having a home value at or below the observed value, given that observation's linear predictor.

Dataset

BostonHousing2 — contains 506 observations on 19 variables of the 1970 United States census in the Boston area.

town — name of town
tract — census tract
lon — longitude of census tract
lat — latitude of census tract
medv — median value of owner-occupied homes in USD 1000's
cmedv — corrected median value of owner-occupied homes in USD 1000's
crim — per capita crime rate by town
zn — proportion of residential land zoned for lots over 25,000 sq.ft
indus — proportion of non-retail business acres per town
chas — Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
nox — nitric oxides concentration (parts per 10 million)
rm — average number of rooms per dwelling
age — proportion of owner-occupied units built prior to 1940
dis — weighted distances to five Boston employment centres
rad — index of accessibility to radial highways
tax — full-value property-tax rate per USD 10,000
ptratio — pupil-teacher ratio by town
b — \(1000(B-0.63)^2\) where \(B\) is the proportion of blacks by town
lstat — percentage of lower status of the population


Summary


              

Conditional Transformation Model

A (fully) conditional transformation model is more flexible in estimating the transformation: the effect of a predictor can vary depending on the response. So far, one regression coefficient was estimated per predictor. Now, each element of the Bernstein basis (number of elements = \(M\) + 1) gets a coefficient for each of the predictors.
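
One way to write such a model is given below, as a sketch only. It assumes that the Bernstein basis interacts with an intercept and with each of the \(p\) predictors, which is consistent with the coefficient count given further down this page. The symbols \(a_m\), \(\theta_{m,j}\) and \(p\) are introduced here for illustration.

$$ h(y | x) = \sum_{m=0}^{M} a_m(y) \left( \theta_{m,0} + \sum_{j=1}^{p} x_j \theta_{m,j} \right) $$

Each predictor \(x_j\) then has \(M + 1\) coefficients, so its effect can differ along the response scale instead of being a single shift.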


Interactive Model

A case of conditional transformation model can be fitted to the BostonHousing2 dataset with the tram package.

In this model, cmedv is defined as the response variable depending on the other variables. medv is not included as predictor as it is collinear with cmedv.
Note that cmedv is right-censored, meaning that the maximum value 50 doesn't mean '= 50', but '>= 50'. This is not taken into account in the model on this page as it has more of a demonstrative purpose, but in a real-life scenario, censoring of the data should be addressed to obtain a model representing the data more accurately.

Bernstein Basis
The Bernstein basis properties are the same as the ones described in the Unconditional and Linear pages.

Distribution
That part of the model is not interactive on this page. It is set to Normal.

Predictors
You can choose the predictor variables to include in the model. Each selected predictor gets one coefficient per element of the Bernstein basis, so its effect can vary along the response.

Plots
This feature is not developed yet.

Fitted Model Summary


                    

For the default model on this page, there are 7 Bernstein elements, 13 predictors selected, and 1 intercept, which gives 7 x (13 + 1) = 98 coefficients. This explains the longer fitting time of this model.
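
The count can be checked with a short helper, written in plain Python for illustration. The function name is hypothetical, and the order 6 is inferred from the 7 Bernstein elements (number of elements = \(M\) + 1).

```python
def n_coefficients(order, n_predictors):
    """Number of Bernstein elements (order + 1) times predictors plus one intercept."""
    return (order + 1) * (n_predictors + 1)

n_default = n_coefficients(order=6, n_predictors=13)
print(n_default)   # 98
```

Adding one predictor adds a full set of \(M\) + 1 coefficients, so the number of parameters grows quickly with both the order and the number of predictors.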

Dataset

BostonHousing2 — contains 506 observations on 19 variables of the 1970 United States census in the Boston area.

town — name of town
tract — census tract
lon — longitude of census tract
lat — latitude of census tract
medv — median value of owner-occupied homes in USD 1000's
cmedv — corrected median value of owner-occupied homes in USD 1000's
crim — per capita crime rate by town
zn — proportion of residential land zoned for lots over 25,000 sq.ft
indus — proportion of non-retail business acres per town
chas — Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
nox — nitric oxides concentration (parts per 10 million)
rm — average number of rooms per dwelling
age — proportion of owner-occupied units built prior to 1940
dis — weighted distances to five Boston employment centres
rad — index of accessibility to radial highways
tax — full-value property-tax rate per USD 10,000
ptratio — pupil-teacher ratio by town
b — \(1000(B-0.63)^2\) where \(B\) is the proportion of blacks by town
lstat — percentage of lower status of the population


Summary


              

Categorical Transformation Model

Categorical transformation models deal with ordered discrete response variables. Here, only the unconditional case is presented.

Explanations about how such a model is implemented should be added.


Model

A case of unconditional categorical transformation model is fitted with the tram package.

In this model, rating is defined as the response variable.

Link Function
That part of the model is not interactive on this page. It is set to Logistic, and since it is an unconditional model, that choice is of little importance.

Fitted Model Summary


                  

Density and Distribution Functions

Plot A is the PDF, showing the probability for a response to belong to each category. For example, there is a 36% chance that a rating is 3.

Plot B is the CDF, so the probability for a response to belong to the observed category or any category below. For example, there is a 90% chance that a rating is 4 or below. The height of a step corresponds to the chance of belonging to that category, as depicted in the PDF. For example, the PDF shows that there is a 31% chance that a rating is 2. In the CDF, the corresponding step from 0.07 to 0.38 is 0.31.
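
The relation between the steps of the CDF and the bars of the PDF can be reproduced in a few lines of plain Python. The cumulative values 0.07, 0.38 and 0.90 are the ones quoted above, and 0.74 follows from 0.38 + 0.36. The probabilities of ratings 4 and 5 are derived from these numbers, not read from the application.

```python
# Cumulative probabilities P(rating <= k) for k = 1, ..., 5.
cdf = [0.07, 0.38, 0.74, 0.90, 1.00]

# Each category probability is the height of the corresponding step of the CDF.
pmf = [round(upper - lower, 2) for lower, upper in zip([0.0] + cdf[:-1], cdf)]

print(pmf)   # [0.07, 0.31, 0.36, 0.16, 0.1]
```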

Plots C and D show the PDF and CDF of the latent variable \(Z\), which is an unobserved continuous variable used in the computation of the model. The transformation function maps the discrete response variable to \(Z\). More precise information should be added here.
These two plots can be read in parallel to A and B, because they depict the same relationship but from a different point of view. For example, the area under the curve in C corresponds to the probability in A, so we know that the area between \(h_1\) and \(h_2\) is 0.31 (value of category 2 in A).
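
The correspondence between the plots can be illustrated numerically. The sketch below is plain Python and assumes that \(Z\) follows a standard logistic distribution (the link used on this page) and that each cut point is the logistic quantile of the cumulative probability in plot B. The application itself may compute these values differently.

```python
import math

def logit(p):
    """Quantile function of the standard logistic distribution."""
    return math.log(p / (1.0 - p))

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

# Cumulative probabilities of ratings 1 and 2, as quoted for plot B.
h_1 = logit(0.07)
h_2 = logit(0.38)

# Area under the logistic density between the two cut points.
area = logistic_cdf(h_2) - logistic_cdf(h_1)
print(h_1, h_2, round(area, 2))
```

The area between the two cut points equals the probability of category 2, which is the statement made above about plots A and C.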

Transformation Function Mapping the Discrete Response Variable to the Latent Variable \(Z\)

This is another representation of the relationship between \(Y\) and \(Z\). Plot C is the PDF again, equivalent to plot A above.
The density of \(Y\) (plot C) is mapped to the density of \(Z\) (plot A, equivalent to plot C above) through the transformation step function \(h\) (plot B).

Dataset

Bitterness of wine — dataframe containing 72 observations on 6 variables of a tasting experiment on the bitterness of wine.

response — scorings of wine bitterness on a 0-100 continuous scale
rating — ordered factor with 5 levels; a grouped version of response
temp — temperature during production as a factor with two levels
contact — contact between juice and skins during production as a factor with two levels
bottle — factor with eight levels
judge — factor with nine levels


Summary


              

Count Transformation Model

Count transformation models are specifically designed for count response variables.

They are expressed by \(F_{Y|X=x}(y | x) = \mathbb{P}(Y \leq y | x) = F(h(\lfloor y \rfloor) - x^\top \beta)\), with \(F\) being the link function and \(\lfloor y \rfloor\) being \(y\) rounded down to the nearest integer.
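
The role of \(\lfloor y \rfloor\) can be seen in a small sketch written in plain Python (not the cotram code). The baseline transformation is hypothetical, and the inverse of the cloglog link is used for \(F\) as one possible choice.

```python
import math

def cloglog_cdf(z):
    """Minimum extreme value distribution function, the inverse of the cloglog link."""
    return 1.0 - math.exp(-math.exp(z))

def baseline(k):
    # Hypothetical increasing transformation of the count.
    return -2.0 + 0.05 * k

def count_cdf(y, shift=0.0):
    """P(Y <= y | x) = F(h(floor(y)) - x^T beta), where shift stands for x^T beta."""
    if y < 0:
        return 0.0
    return cloglog_cdf(baseline(math.floor(y)) - shift)

def count_pmf(k, shift=0.0):
    """P(Y = k | x), the height of the step of the distribution function at k."""
    return count_cdf(k, shift) - count_cdf(k - 1, shift)

print(count_cdf(3.0), count_cdf(3.7), count_pmf(3))
```

Because of the floor, the distribution function is a step function that only changes at integer values, as expected for a count.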


Interactive Model

A case of count transformation model is fitted with the cotram package.

In this model, DVC is defined as the response variable depending on all the other variables.

Bernstein Basis
The Bernstein basis is interactive in this model.

Link Function
The choice of the link function is important because it defines the scale on which to interpret regression coefficients. Still, we can choose any \(F\) that interests us.

Log-first
When it is set to TRUE, the model transforms the response with \(\log(y+1)\) before the Bernstein basis is applied. That changes the interpretation scale of the coefficients from the response scale to the log scale, meaning that the coefficients have a multiplicative effect.

Fitted Model Summary


                    

When the link function is cloglog, the exponential of a coefficient is interpreted as a discrete hazard ratio. We can interpret the sign of the coefficients: if it is positive, there is a higher risk of having a collision compared to the baseline. For example, the baseline for weekday is Monday. On all other weekdays, there is a higher risk of having a collision. On the weekend, however, there is a smaller risk of having a collision.

Explanations for the other link functions have yet to be implemented.

Hazard Ratio for the Year 2011

Evolution of the collision risk across a year, estimated for each day, with fixed effects.

The changes in the hazard ratio are relative to the baseline of January 1st, so a higher ratio means a higher risk of collision. For example, we see a peak in May, where the collision risk is about 12.5 times that of January 1st.

Baseline Transformation Function

Baseline transformation \(h(y)\) estimated by the model. It is the transformation applied to the response variable to make it behave like the chosen distribution \(F\).

Probability Density Function by Year

The PDF and CDF plots represent the isolated year effect on the collision count.

Cumulative Distribution Function by Year

Area under the PDF curve up to \(x\), that is, the probability that the response is at most \(x\).

Dataset

Deer-Vehicle Collisions preprocessed according to the cotram package vignette code (DVC-data and DVC-setup chunks) — time series containing 3'652 observations on 25 variables of collisions between roe deer and vehicles between 2002 and 2011 in Bavaria, Germany.

day — date
DVC — number of deer-vehicle collisions that day
weekday — day of the week
year — year
time — days since beginning (01-01-2002)
tvar1 - tvar20 — sine-cosine transformed times (allow modelling of periodic (yearly) effects)

tvar variables rounded to the fifth decimal in this view.


Summary


              

Build a Transformation Model

On this page, you can generate a made-to-measure transformation model. Define the parameters of the model in the left menu, generate the model's code, and copy it into your R environment. At the moment, this feature offers only limited options.

Your R Code

Copy the code below and replace the placeholder names:


                  

Placeholders

  • my_data : Your data frame name