discotime.datasets.utils module

class discotime.datasets.utils.DataConfig(*, batch_size: int = 32, n_time_bins: int = 20, discretization_scheme: str = 'number', discretization_grid: list[float] | None = None, max_time: float | None = None)[source]

Bases: object

Configuration class for data modules.

batch_size: int = 32: The batch size defines the number of samples that will be propagated through the network at each training step.

discretization_grid: list[float] | None = None

discretization_scheme: str = 'number'

max_time: float | None = None

n_time_bins: int = 20: Specifies the size of the discretization grid, i.e. the number of time bins. A value of around 20-30 usually works well.
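As a rough illustration of how these settings fit together, the sketch below uses a hypothetical stand-in dataclass, SketchDataConfig, that mirrors the documented fields and defaults. It is not the library class, and the steps_per_epoch helper is an assumption added only to show what batch_size controls.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class SketchDataConfig:
    """Hypothetical stand-in mirroring the documented DataConfig fields."""

    batch_size: int = 32
    n_time_bins: int = 20
    discretization_scheme: str = "number"
    discretization_grid: list[float] | None = None
    max_time: float | None = None


def steps_per_epoch(n_samples: int, config: SketchDataConfig) -> int:
    """Training steps needed to see every sample once (last batch may be smaller)."""
    return math.ceil(n_samples / config.batch_size)


# Override only the fields that differ from the defaults, always by keyword
# (the documented signature is keyword-only).
config = SketchDataConfig(batch_size=64, n_time_bins=30)
n_steps = steps_per_epoch(1000, config)  # ceil(1000 / 64) = 16
```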

Bases: object

Discretize continuous time/event pairs.

The class can either learn a discretization grid from the training data using one of the built-in discretization schemes, or the user can supply an iterable with cut points.

Implementation heavily inspired by pycox.preprocessing.label_transforms [1].

[1]: Kvamme, Håvard, Ørnulf Borgan, and Ida Scheel. “Time-to-event prediction with neural networks and Cox regression.” arXiv preprint arXiv:1907.00825 (2019).

property cuts: ndarray[Any, dtype[float64]]

fit(time: Iterable[int | int64 | float | float64], event: Iterable[int | int64]) → None[source]

fit_transform(time: Iterable[int | int64 | float | float64], event: Iterable[int | int64]) → tuple[numpy.ndarray[Any, numpy.dtype[numpy.integer]], numpy.ndarray[Any, numpy.dtype[numpy.integer]]][source]

property max_time: int | int64 | float | float64

transform(time: Iterable[int | int64 | float | float64], event: Iterable[int | int64]) → tuple[numpy.ndarray[Any, numpy.dtype[numpy.integer]], numpy.ndarray[Any, numpy.dtype[numpy.integer]]][source]
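The fit / transform / fit_transform pattern listed above can be sketched in plain Python. The class below, SketchDiscretizer, is hypothetical and is not the library implementation. It assumes an equidistant grid from 0 to the largest training time, and it assumes that times beyond the grid are clipped to the last bin and marked as censored; the library's built-in schemes may behave differently.

```python
from bisect import bisect_left


class SketchDiscretizer:
    """Hypothetical fit/transform discretizer; not the library implementation."""

    def __init__(self, n_time_bins):
        self.n_time_bins = n_time_bins
        self.cuts = None

    def fit(self, time, event):
        # Assumption: an equidistant grid from 0 to the largest observed time.
        # `event` is unused here but kept to match the documented signature.
        step = max(time) / self.n_time_bins
        self.cuts = [step * k for k in range(1, self.n_time_bins + 1)]

    def transform(self, time, event):
        if self.cuts is None:
            raise RuntimeError("call fit() before transform()")
        last = len(self.cuts) - 1
        idx, status = [], []
        for t, e in zip(time, event):
            k = bisect_left(self.cuts, t)  # first bin whose upper edge is >= t
            if k > last:
                # Assumption: times beyond the grid are clipped and censored.
                k, e = last, 0
            idx.append(k)
            status.append(e)
        return idx, status

    def fit_transform(self, time, event):
        self.fit(time, event)
        return self.transform(time, event)


disc = SketchDiscretizer(n_time_bins=4)
# Learn the grid on training data: cuts become [2.0, 4.0, 6.0, 8.0].
train_idx, train_status = disc.fit_transform([1.0, 2.5, 4.0, 8.0], [1, 0, 2, 1])
# Reuse the learned grid on new data; 10.0 lies beyond the grid.
new_idx, new_status = disc.transform([3.0, 10.0], [1, 2])
```

Fitting on the training split only and reusing the learned cut points elsewhere keeps the bin indices comparable across splits.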

class discotime.datasets.utils.LitSurvDataModule[source]

Bases: LightningDataModule

property batch_size: int

property config: DataConfig

property cuts: ndarray[Any, dtype[float64]]

property lab_transformer: LabelTransformer

abstract property n_features: int

abstract property n_risks: int

property n_time_bins: int

abstract setup(stage: str | None = None) → None[source]

abstract property time_range: tuple[int | numpy.int64 | float | numpy.float64, int | numpy.int64 | float | numpy.float64]

Bases: Dataset

Assemble a survival dataset for discrete-time survival analysis.

A discrete-time survival dataset \(\mathfrak{D}\) is a set of \(n\) tuples \((t_{i}, \delta_{i}, \mathbf{x}_{i})\), where \(t_i = \min \{T_i, C_i\}\) is the observed time (the earlier of the event time \(T_i\) and the censoring time \(C_i\)), \(\delta_{i} \in \{0, \dots, m\}\) is the event indicator (with \(\delta_i = 0\) denoting censoring), and \(\mathbf{x}_{i} \in \mathbb{R}^d\) is a \(d\)-dimensional vector of time-independent predictors or covariates.
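To make the notation concrete, the following minimal sketch derives the observed pair (t_i, delta_i) from a latent event time T_i, a censoring time C_i, and the cause that would occur at T_i. The function name observe and the tie rule (an event at exactly the censoring time counts as observed) are assumptions made for illustration only.

```python
def observe(event_time, censoring_time, cause):
    """Return (t, delta) for one subject.

    `cause` is the event type in 1..m that would occur at `event_time`.
    Assumption: ties between event and censoring time count as events.
    """
    if event_time <= censoring_time:
        return event_time, cause
    return censoring_time, 0  # delta = 0 marks censoring


# (T_i, C_i, cause) for three hypothetical subjects.
latent = [(5.0, 7.0, 1), (9.0, 4.0, 2), (3.0, 3.5, 2)]
observed = [observe(T, C, cause) for T, C, cause in latent]
```

The second subject is censored at time 4.0 because censoring precedes the event, so its indicator is 0 regardless of the cause.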

Parameters:

features – time-independent features.
event_time – follow-up time (continuous).
event_status – event indicator (0=censored, 1/2/…=competing risks).
discretizer – an object following the LabelTransformer protocol that converts continuous time/event tuples to their discretized versions. Typically this is a LabelDiscretizer, unless a custom discretization object is used.
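How the four parameters combine can be sketched without any deep-learning dependency. Both classes below are hypothetical stand-ins, not the library's classes: FixedGridDiscretizer only assumes that a discretizer exposes transform(time, event), and SketchSurvDataset assumes a map-style dataset that returns the features together with the discretized label pair. Clipping times beyond the last cut point is a simplification made here.

```python
from bisect import bisect_left


class FixedGridDiscretizer:
    """Hypothetical discretizer with user-supplied cut points."""

    def __init__(self, cuts):
        self.cuts = list(cuts)

    def transform(self, time, event):
        last = len(self.cuts) - 1
        # Simplification: times beyond the last cut are clipped to the last bin.
        idx = [min(bisect_left(self.cuts, t), last) for t in time]
        return idx, list(event)


class SketchSurvDataset:
    """Hypothetical map-style dataset yielding (features, (time_bin, status))."""

    def __init__(self, features, event_time, event_status, discretizer):
        if not (len(features) == len(event_time) == len(event_status)):
            raise ValueError("all inputs must have the same length")
        self.features = features
        # Discretize once up front so indexing stays cheap.
        self.time_bin, self.status = discretizer.transform(event_time, event_status)

    def __len__(self):
        return len(self.features)

    def __getitem__(self, i):
        return self.features[i], (self.time_bin[i], self.status[i])


dataset = SketchSurvDataset(
    features=[[0.1, 1.0], [0.7, 0.0], [0.3, 1.0]],
    event_time=[1.5, 6.0, 3.0],
    event_status=[1, 0, 2],
    discretizer=FixedGridDiscretizer([2.0, 4.0, 6.0]),
)
x, (time_bin, status) = dataset[1]
```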

discotime.datasets.utils.default_fts_transformer()[source]