evofr.data package

Submodules

evofr.data.case_counts module

class CaseCounts(raw_cases, date_to_index=None)

Bases: DataSpec

Parameters:
  • raw_cases (DataFrame)

  • date_to_index (dict | None)

make_data_dict(data=None)

Get arguments to be passed to numpyro models as a dictionary.

Parameters:

data (dict | None) – Optional dictionary to add arguments to.

Returns:

Dictionary containing arguments.

Return type:

dict

evofr.data.case_frequencies module

class CaseFrequencyData(raw_cases, raw_seq, date_to_index=None, var_names=None, pivot=None)

Bases: DataSpec

Parameters:
  • raw_cases (DataFrame)

  • raw_seq (DataFrame)

  • date_to_index (dict | None)

  • var_names (List | None)

  • pivot (str | None)

make_data_dict(data=None)

Get arguments to be passed to numpyro models as a dictionary.

Parameters:

data (dict | None) – Optional dictionary to add arguments to.

Returns:

Dictionary containing arguments.

Return type:

dict

class HierarchicalCFData(raw_cases, raw_seq, group, date_to_index=None)

Bases: object

Parameters:
  • raw_cases (DataFrame)

  • raw_seq (DataFrame)

  • group (str)

  • date_to_index (dict | None)

make_data_dict(data=None)

Parameters:

data (dict | None)

evofr.data.data_helpers module

counts_to_matrix(raw_seqs, var_names, date_to_index=None)

Process ‘raw_seqs’ data into an nd.array, including unobserved dates.

Parameters:
  • raw_seqs (DataFrame) – a dataframe containing sequence counts with columns ‘sequences’ and ‘date’.

  • var_names (List[str]) – list of variants whose observations are counted.

  • date_to_index (dict | None) – optional dictionary for mapping calendar dates to nd.array indices.

Returns:

C – nd.array containing number of sequences of each variant on each date.

Return type:

nd.array
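The idea of filling a date-by-variant matrix that also has rows for unobserved dates can be sketched without any dependencies. This is a minimal illustration, not evofr's implementation: it uses plain lists of dicts in place of a DataFrame and nested lists in place of an nd.array. The ‘variant’ key and the zero fill for unobserved dates are assumptions made for the example, and counts_to_matrix_sketch is a hypothetical name.

```python
from datetime import date, timedelta


def counts_to_matrix_sketch(rows, var_names, date_to_index=None):
    """rows: list of dicts with keys 'date', 'variant' (assumed) and 'sequences'."""
    if date_to_index is None:
        # Build a contiguous daily index from the first to the last observed date.
        first = min(r["date"] for r in rows)
        last = max(r["date"] for r in rows)
        n_days = (last - first).days + 1
        date_to_index = {first + timedelta(days=i): i for i in range(n_days)}

    var_to_index = {name: j for j, name in enumerate(var_names)}
    n_rows = max(date_to_index.values()) + 1

    # Unobserved dates keep a row of zeros (an assumption of this sketch).
    C = [[0] * len(var_names) for _ in range(n_rows)]
    for r in rows:
        t = date_to_index.get(r["date"])
        v = var_to_index.get(r["variant"])
        if t is not None and v is not None:
            C[t][v] += r["sequences"]
    return C


rows = [
    {"date": date(2022, 1, 1), "variant": "A", "sequences": 3},
    {"date": date(2022, 1, 1), "variant": "B", "sequences": 1},
    {"date": date(2022, 1, 3), "variant": "A", "sequences": 2},
]
C = counts_to_matrix_sketch(rows, ["A", "B"])
```

Here 2022-01-02 has no observations, so the middle row of C stays at zero for both variants.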

expand_dates(dates, T_forecast)

Extend the existing list of dates with a forecast interval of length ‘T_forecast’.

Parameters:

T_forecast (int)

forecast_dates(dates, T_forecast)

Generate the forecast dates for a forecast interval of length ‘T_forecast’.

Parameters:

T_forecast (int)
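The relationship between expand_dates and forecast_dates can be illustrated with a short sketch. It assumes daily spacing and that the forecast starts the day after the last observed date; neither is stated above, and the function names ending in _sketch are hypothetical stand-ins rather than the library's code.

```python
from datetime import date, timedelta


def forecast_dates_sketch(dates, T_forecast):
    # Assumption: forecasts are daily and begin one day after the last date.
    last = dates[-1]
    return [last + timedelta(days=k) for k in range(1, T_forecast + 1)]


def expand_dates_sketch(dates, T_forecast):
    # Return a new list: the observed dates followed by the forecast dates.
    return list(dates) + forecast_dates_sketch(dates, T_forecast)


observed = [date(2022, 1, 30), date(2022, 1, 31)]
future = forecast_dates_sketch(observed, 2)
extended = expand_dates_sketch(observed, 2)
```

In this sketch, future holds only the two new dates, while extended holds all four.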

format_var_names(raw_names, pivot=None)

Places the pivot category as the last element, if present.

Parameters:
  • raw_names (List[str])

  • pivot (str | None)
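A minimal sketch of the reordering described above, assuming the remaining names keep their original order and that a missing pivot leaves the list unchanged. format_var_names_sketch is a hypothetical name used only for illustration.

```python
def format_var_names_sketch(raw_names, pivot=None):
    # Work on a copy so the caller's list is not modified.
    names = list(raw_names)
    if pivot is not None and pivot in names:
        names.remove(pivot)
        names.append(pivot)
    return names


reordered = format_var_names_sketch(["Delta", "Omicron", "other"], pivot="Delta")
unchanged = format_var_names_sketch(["Delta", "Omicron"], pivot="Alpha")
```

Placing the pivot last gives it a fixed, predictable position in the variant axis, which is convenient when it serves as the reference strain.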

prep_cases(raw_cases, date_to_index=None)

Process raw_cases data to nd.array including unobserved dates.

Parameters:
  • raw_cases (DataFrame) – a dataframe containing case counts with columns ‘cases’ and ‘date’.

  • date_to_index (dict | None) – optional dictionary for mapping calendar dates to nd.array indices.

Returns:

C – nd.array containing number of cases on each date.

Return type:

nd.array

prep_dates(raw_dates)

Return vector of dates and a mapping of dates to indices.

Parameters:

raw_dates (Series) – pandas series containing dates of interest

Returns:

  • dates – list containing dates

  • date_to_index – dictionary taking in dates and returning integer indices
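The two return values can be sketched as follows. This illustration uses a plain list of datetime.date objects in place of a pandas Series and assumes the date vector is a contiguous daily range from the earliest to the latest date, which is consistent with the other helpers covering unobserved dates but is not stated explicitly here. prep_dates_sketch is a hypothetical name.

```python
from datetime import date, timedelta


def prep_dates_sketch(raw_dates):
    # Assumption: every day between the first and last date gets an index,
    # whether or not it appears in raw_dates.
    first, last = min(raw_dates), max(raw_dates)
    n_days = (last - first).days + 1
    dates = [first + timedelta(days=i) for i in range(n_days)]
    date_to_index = {d: i for i, d in enumerate(dates)}
    return dates, date_to_index


raw = [date(2022, 1, 3), date(2022, 1, 1), date(2022, 1, 3)]
dates, date_to_index = prep_dates_sketch(raw)
```

A mapping built this way can then be passed as the date_to_index argument of the other helpers, so that several data sets share one time axis.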

prep_sequence_counts(raw_seqs, date_to_index=None, var_names=None, pivot=None)

Process ‘raw_seqs’ data into an nd.array, including unobserved dates.

Parameters:
  • raw_seqs (DataFrame) – a dataframe containing sequence counts with columns ‘sequences’ and ‘date’.

  • date_to_index (dict | None) – optional dictionary for mapping calendar dates to nd.array indices.

  • var_names (List | None) – optional list of variants whose observations are counted.

  • pivot (str | None) – optional name of variant to place last. This will usually be used as a reference or pivot strain.

Returns:

  • var_names – list of variants counted

  • C – nd.array containing number of sequences of each variant on each date.

evofr.data.data_spec module

class DataSpec

Bases: ABC

abstract make_data_dict(data=None)

Get arguments to be passed to numpyro models as a dictionary.

Parameters:

data (dict | None) – Optional dictionary to add arguments to.

Returns:

Dictionary containing arguments.

Return type:

dict
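The abstract make_data_dict contract can be illustrated with a small stand-alone sketch. The classes below are hypothetical and do not come from evofr; the keys ‘counts’ and ‘N’ are invented for the example, and updating the supplied dictionary in place is an assumption about how "add arguments to" behaves.

```python
from abc import ABC, abstractmethod


class DataSpecSketch(ABC):
    """Hypothetical stand-in for an abstract data specification."""

    @abstractmethod
    def make_data_dict(self, data=None):
        """Return a dictionary of model arguments."""


class CountsSketch(DataSpecSketch):
    def __init__(self, counts):
        self.counts = counts

    def make_data_dict(self, data=None):
        # Start a fresh dictionary unless the caller supplied one to extend.
        if data is None:
            data = {}
        data["counts"] = self.counts
        data["N"] = len(self.counts)
        return data


combined = CountsSketch([4, 0, 7]).make_data_dict({"prior_scale": 1.0})
```

Accepting an existing dictionary lets several data specifications contribute arguments to one model call.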

registry = {'CaseCounts': <class 'evofr.data.case_counts.CaseCounts'>, 'CaseFrequencyData': <class 'evofr.data.case_frequencies.CaseFrequencyData'>, 'DelaySequenceCounts': <class 'evofr.models.mlr_nowcast.DelaySequenceCounts'>, 'DistanceMigrationData': <class 'evofr.models.migration_from_distances.DistanceMigrationData'>, 'HierCases': <class 'evofr.data.hier_cases.HierCases'>, 'HierFrequencies': <class 'evofr.data.hier_frequencies.HierFrequencies'>, 'InnovationSequenceCounts': <class 'evofr.models.mlr_innovation.InnovationSequenceCounts'>, 'VariantFrequencies': <class 'evofr.data.variant_frequencies.VariantFrequencies'>}
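How this mapping is populated is not described here. One common Python pattern for keeping a name-to-class registry of subclasses is the __init_subclass__ hook, sketched below with hypothetical classes. This is an assumption about a possible mechanism, not a description of evofr's code.

```python
class RegisteredBase:
    """Hypothetical base class that records each subclass by name."""

    registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs once per subclass definition, including subclasses in other modules.
        RegisteredBase.registry[cls.__name__] = cls


class ToyCases(RegisteredBase):
    pass


class ToyFrequencies(RegisteredBase):
    pass


looked_up = RegisteredBase.registry["ToyCases"]
```

With a pattern like this, classes defined outside the base class's own package still appear in the mapping once their module is imported, which would match the evofr.models entries listed above.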

evofr.data.hier_cases module

class HierCases(raw_cases, group, date_to_index=None)

Bases: DataSpec

Parameters:
  • raw_cases (DataFrame)

  • group (str)

  • date_to_index (dict | None)

make_data_dict(data=None)

Get arguments to be passed to numpyro models as a dictionary.

Parameters:

data (dict | None) – Optional dictionary to add arguments to.

Returns:

Dictionary containing arguments.

Return type:

dict

evofr.data.hier_frequencies module

class HierFrequencies(raw_seq, group, date_to_index=None, pivot=None, aggregation_frequency=None)

Bases: DataSpec

Parameters:
  • raw_seq (DataFrame)

  • group (str)

  • date_to_index (dict | None)

  • pivot (str | None)

  • aggregation_frequency (str | None)

make_data_dict(data=None)

Get arguments to be passed to numpyro models as a dictionary.

Parameters:

data (dict | None) – Optional dictionary to add arguments to.

Returns:

Dictionary containing arguments.

Return type:

dict

evofr.data.variant_frequencies module

class VariantFrequencies(raw_seq, date_to_index=None, var_names=None, pivot=None, aggregation_frequency=None)

Bases: DataSpec

Parameters:
  • raw_seq (DataFrame)

  • date_to_index (dict | None)

  • var_names (List | None)

  • pivot (str | None)

  • aggregation_frequency (str | None)

make_data_dict(data=None)

Get arguments to be passed to numpyro models as a dictionary.

Parameters:

data (dict | None) – Optional dictionary to add arguments to.

Returns:

Dictionary containing arguments.

Return type:

dict

variant_counts_to_dataframe(var_counts, var_names=['Variant', 'other'], start_date=Timestamp('2022-01-01 00:00:00'))

Convert matrix of variant counts to pandas dataframe for input to ef.VariantFrequencies.

Parameters:
  • var_counts – nd.array of counts var_counts[t,v] of variant v on day t.

  • var_names (List[str]) – List of variant names to assign each column.

  • start_date – Pandas datetime to use as first date.

Returns:

seq_counts – pandas dataframe of sequence counts for input to ef.VariantFrequencies.
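The matrix-to-long-format conversion can be sketched with the standard library alone. The function below returns a list of row dictionaries rather than a pandas dataframe, assumes one row of var_counts per consecutive day, and uses a ‘variant’ column name that is an assumption for this example. variant_counts_to_rows is a hypothetical name.

```python
from datetime import date, timedelta


def variant_counts_to_rows(var_counts, var_names=("Variant", "other"),
                           start_date=date(2022, 1, 1)):
    rows = []
    for t, day_counts in enumerate(var_counts):
        for v, n in enumerate(day_counts):
            rows.append({
                "date": start_date + timedelta(days=t),  # day t after the start
                "variant": var_names[v],                 # assumed column name
                "sequences": n,
            })
    return rows


long_rows = variant_counts_to_rows([[5, 1], [2, 0]])
```

Each (day, variant) cell of the matrix becomes one record, so a matrix with 2 days and 2 variants yields 4 rows.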

Module contents