discotime.datasets.utils module

class discotime.datasets.utils.DataConfig(*, batch_size: int = 32, n_time_bins: int = 20, discretization_scheme: str = 'number', discretization_grid: list[float] | None = None, max_time: float | None = None)[source]

Bases: object

Configuration class for data modules.

batch_size: int = 32

The batch size defines the number of samples that will be propagated through the network at each training step.

discretization_grid: list[float] | None = None
discretization_scheme: str = 'number'
max_time: float | None = None
n_time_bins: int = 20

Specifies the size of the discretization grid. A value of around 20 to 30 usually works well.
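To make the grid size concrete, here is a minimal NumPy sketch of an equally spaced grid. It does not import discotime, and `equidistant_grid` is a hypothetical helper rather than part of the library. The equal spacing, and the reading of `n_time_bins` as a count of intervals (giving `n_time_bins + 1` cut points), are assumptions made for illustration; the built-in 'number' scheme may place its cut points differently.

```python
import numpy as np


def equidistant_grid(max_time: float, n_time_bins: int = 20) -> np.ndarray:
    """Return n_time_bins + 1 equally spaced cut points on [0, max_time].

    Hypothetical helper for illustration; not part of discotime.
    """
    return np.linspace(0.0, max_time, n_time_bins + 1)


# 20 intervals on [0, 10] are delimited by 21 cut points.
grid = equidistant_grid(max_time=10.0, n_time_bins=20)
widths = np.diff(grid)  # every interval has the same width here
```

A finer grid resolves event times more precisely but leaves fewer events per interval, which is one way to read the 20 to 30 guidance above.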

class discotime.datasets.utils.LabelDiscretizer(scheme: str | None = None, n_bins: int | None = None, *, cut_points: Iterable[int | int64 | float | float64] | None = None, max_time: int | int64 | float | float64 | None = None)[source]

Bases: object

Discretize continuous time/event pairs.

The class can either learn a discretization grid from the training data using one of the built-in discretization schemes, or the user can supply an iterable with cut points.

Implementation heavily inspired by pycox.preprocessing.label_transforms [1].

[1]: Kvamme, Håvard, Ørnulf Borgan, and Ida Scheel. “Time-to-event prediction with neural networks and Cox regression.” arXiv preprint arXiv:1907.00825 (2019).
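The following NumPy sketch shows what discretizing time/event pairs against user-supplied cut points might look like. It is not the LabelDiscretizer implementation: `discretize` is a hypothetical function, the intervals are assumed to be left-open and right-closed, and observations beyond the last cut point are assumed to be censored at that point. The library may follow different conventions.

```python
import numpy as np


def discretize(time, event, cuts):
    """Map continuous (time, event) pairs to (interval index, event).

    Hypothetical sketch. Assumes intervals (cuts[k], cuts[k + 1]] and
    censors any observation that falls beyond the last cut point.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    cuts = np.asarray(cuts, dtype=float)
    max_time = cuts[-1]

    # Observations past the grid are treated as censored at max_time.
    event = np.where(time > max_time, 0, event)
    time = np.minimum(time, max_time)

    # side="left" puts a time equal to a cut point in the interval it closes.
    idx = np.searchsorted(cuts, time, side="left") - 1
    # A time exactly at cuts[0] would give -1; fold it into the first interval.
    idx = np.clip(idx, 0, len(cuts) - 2)
    return idx, event


cuts = [0.0, 1.0, 2.0, 3.0]
idx, status = discretize(
    time=[0.5, 1.0, 2.5, 7.0],
    event=[1, 0, 2, 1],
    cuts=cuts,
)
# The last observation (t = 7.0) lies beyond the grid, so it ends up in
# the final interval with its event indicator set to 0 (censored).
```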

property cuts: ndarray[Any, dtype[float64]]
fit(time: Iterable[int | int64 | float | float64], event: Iterable[int | int64]) None[source]
fit_transform(time: Iterable[int | int64 | float | float64], event: Iterable[int | int64]) tuple[numpy.ndarray[Any, numpy.dtype[numpy.integer]], numpy.ndarray[Any, numpy.dtype[numpy.integer]]][source]
property max_time: int | int64 | float | float64
transform(time: Iterable[int | int64 | float | float64], event: Iterable[int | int64]) tuple[numpy.ndarray[Any, numpy.dtype[numpy.integer]], numpy.ndarray[Any, numpy.dtype[numpy.integer]]][source]
class discotime.datasets.utils.LitSurvDataModule[source]

Bases: LightningDataModule

property batch_size: int
property config: DataConfig
property cuts: ndarray[Any, dtype[float64]]
property lab_transformer: LabelTransformer
abstract property n_features: int
abstract property n_risks: int
property n_time_bins: int
abstract setup(stage: str | None = None) None[source]
abstract property time_range: tuple[int | numpy.int64 | float | numpy.float64, int | numpy.int64 | float | numpy.float64]
class discotime.datasets.utils.SurvDataset(features: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], event_time: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], event_status: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], discretizer: LabelTransformer)[source]

Bases: Dataset

Assemble a survival dataset for discrete-time survival analysis.

A discrete-time survival dataset \(\mathfrak{D}\) is a set of \(n\) tuples \((t_{i}, \delta_{i}, \mathbf{x}_{i})\) where \(t_{i} = \min \{T_{i}, C_{i}\}\) is the observed follow-up time (the earlier of the event time \(T_{i}\) and the censoring time \(C_{i}\)), \(\delta_{i} \in \{0, \ldots, m\}\) is the event indicator (with \(\delta_{i} = 0\) defined as censoring), and \(\mathbf{x}_{i} \in \mathbb{R}^d\) is a \(d\)-dimensional vector of time-independent predictors or covariates.

Parameters:
  • features – time-independent features.

  • event_time – follow-up time (continuous).

  • event_status – event indicator (0=censored, 1/2/…=competing risks).

  • discretizer – an object following the LabelTransformer protocol that converts continuous time/event tuples to their discretized versions. Typically this is a LabelDiscretizer, unless a custom discretization object is used.
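As an illustration of the three array arguments, the sketch below simulates a small competing-risks dataset in the \((t_{i}, \delta_{i}, \mathbf{x}_{i})\) form described above, with the observed time taken as the minimum of an event time and a censoring time. It uses NumPy only. The sample sizes, the distributions, and the latent-time construction of the competing risks are arbitrary assumptions for the example, not something the library prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 5, 3, 2  # samples, covariate dimension, number of competing risks

# x_i: time-independent covariates.
features = rng.normal(size=(n, d))

# One latent time per risk; the earliest one determines T_i and its cause.
latent = rng.exponential(scale=5.0, size=(n, m))
T = latent.min(axis=1)
cause = latent.argmin(axis=1) + 1  # causes are labelled 1..m

# C_i: censoring times.
C = rng.uniform(0.0, 10.0, size=n)

# t_i = min(T_i, C_i); delta_i = 0 if censored, otherwise the cause.
event_time = np.minimum(T, C)
event_status = np.where(T <= C, cause, 0)
```

Arrays shaped like `features`, `event_time`, and `event_status` correspond to the first three parameters listed above; the discretizer then handles the conversion of `event_time` to interval indices.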

discotime.datasets.utils.default_fts_transformer()[source]