We present an R-function to generate missing values in complete datasets. Such an amputation procedure is useful to accurately evaluate the effect of missing data on analysis outcomes.

The R function `ampute` is available in the multiple imputation package **mice**. Van Buuren’s book (2018) gives an extensive overview of missing data methodology and the multiple imputation algorithm MICE. In this tutorial, we focus on amputation, which is *the generation of missing values in complete data* and, as such, the opposite of imputation.

This tutorial covers

- The function’s underlying multivariate amputation procedure
- The function’s arguments
- Some additional features
- Special solutions for special cases

For a theoretical justification and a demonstration of the method, we refer to Schouten, Lugtig and Vink (2018) (use this paper as your reference). The paper discusses how missing data methods are evaluated in four steps:

- A multivariate, complete dataset is simulated and considered the population of interest
- The complete dataset is made incomplete: amputation
- The incomplete dataset is processed using the missing data method of interest: imputation
- Both the complete dataset and the imputed dataset are analyzed with the analysis technique of interest. A comparison of the outcomes gives an indication of the performance of the missing data method
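The four steps above can be sketched in R as follows. This is a minimal illustration, not the simulation design used in the paper: the sample size, correlation structure, variable names (`y`, `x1`, `x2`), seed, and the linear regression used as the analysis model are all assumptions chosen for the example. It relies on `mvrnorm()` from **MASS** and on `ampute()`, `mice()`, `with()` and `pool()` from **mice**.

```r
library(mice)  # ampute(), mice(), pool()
library(MASS)  # mvrnorm()

set.seed(2016)  # arbitrary seed, for reproducibility only

# Step 1: simulate a complete multivariate dataset (illustrative parameters)
sigma <- matrix(c(1.0, 0.5, 0.5,
                  0.5, 1.0, 0.5,
                  0.5, 0.5, 1.0), nrow = 3)
complete_data <- as.data.frame(mvrnorm(n = 1000, mu = c(0, 0, 0), Sigma = sigma))
names(complete_data) <- c("y", "x1", "x2")

# Step 2: amputation; the incomplete data are stored in the $amp element
amputed <- ampute(complete_data, prop = 0.5, mech = "MAR")
incomplete_data <- amputed$amp

# Step 3: imputation with the missing data method of interest (here: mice defaults)
imp <- mice(incomplete_data, m = 5, printFlag = FALSE, seed = 2016)

# Step 4: run the same analysis on the complete and the imputed data
fit_complete <- lm(y ~ x1 + x2, data = complete_data)
fit_imputed  <- pool(with(imp, lm(y ~ x1 + x2)))

# Compare the estimates from both analyses
coef(fit_complete)
summary(fit_imputed)
```

In a real evaluation, steps 2 to 4 would typically be repeated many times so that bias and coverage can be estimated rather than judged from a single draw.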

Obviously, the second step in this procedure (amputation) is very important, since the amputation procedure determines the severity of the missing data problem. Before `ampute` existed, a proper amputation procedure was not available. Therefore, most simulation studies were performed with data that were missing completely at random (MCAR). However, in real-world problems the MCAR assumption is often unlikely to hold, and missing data methods need to handle missing at random (MAR) and missing not at random (MNAR) mechanisms as well. Hence, we needed an amputation procedure that could create severe MAR and MNAR missingness: `ampute`!

An example of how `ampute` can be used to evaluate missing data methods can be found in Schouten and Vink (2021). With `ampute` it is straightforward to generate missing values in multivariate datasets, with any desired proportion, varying underlying mechanisms, different missingness patterns and varying data distributions.
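As a rough illustration of these options, the sketch below amputes a small simulated dataset with a chosen proportion, mechanism and set of missingness patterns. The data, the variable names (`a`, `b`, `c`), the pattern matrix and the seed are assumptions made for this example only. In the `patterns` matrix, each row is one pattern and each column one variable; a `0` marks a variable that will be made missing and a `1` a variable that stays observed.

```r
library(mice)

set.seed(123)  # arbitrary seed, for reproducibility only

# Hypothetical complete dataset with three continuous variables
complete_data <- data.frame(a = rnorm(500), b = rnorm(500), c = rnorm(500))

# Two missingness patterns: 0 = made missing, 1 = remains observed
my_patterns <- matrix(c(0, 1, 1,    # pattern 1: only `a` is missing
                        1, 0, 0),   # pattern 2: `b` and `c` are missing
                      nrow = 2, byrow = TRUE)

# Ampute roughly 30% of the cases under an MNAR mechanism
result <- ampute(complete_data, prop = 0.3,
                 patterns = my_patterns, mech = "MNAR")

# Inspect the resulting missingness patterns and the share of incomplete cases
md.pattern(result$amp, plot = FALSE)
mean(!complete.cases(result$amp))
```

Because amputation is a random process, the realized share of incomplete cases will be close to, but not necessarily exactly, the requested proportion.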

We will now discuss the multivariate amputation procedure that underlies `ampute`. Then, we will discuss the function’s arguments and some additional features. In the end, we propose solutions for special cases such as mixed missingness mechanisms and amputation in datasets with a large number of variables.

The multivariate amputation procedure is built on an initial idea proposed by (1999) and adapted to be more generic and easy to use in Schouten, Lugtig and Vink (2018). Figure 1 shows a schematic overview of the resulting amputation procedure. On the left, the method requires a complete dataset of \(n\) participants and \(m\) variables. On the right, multiple subsets with either incomplete or complete data are merged, resulting in an incomplete version of the original dataset.