evofr.data package
Submodules
evofr.data.case_counts module
- class CaseCounts(raw_cases, date_to_index=None)
Bases:
DataSpec
- Parameters:
raw_cases (DataFrame)
date_to_index (dict | None)
- make_data_dict(data=None)
Get arguments to be passed to numpyro models as a dictionary.
- Parameters:
data (dict | None) – Optional dictionary to add arguments to.
- Returns:
Dictionary containing arguments.
- Return type:
dict
evofr.data.case_frequencies module
- class CaseFrequencyData(raw_cases, raw_seq, date_to_index=None, var_names=None, pivot=None)
Bases:
DataSpec
- Parameters:
raw_cases (DataFrame)
raw_seq (DataFrame)
date_to_index (dict | None)
var_names (List | None)
pivot (str | None)
- make_data_dict(data=None)
Get arguments to be passed to numpyro models as a dictionary.
- Parameters:
data (dict | None) – Optional dictionary to add arguments to.
- Returns:
Dictionary containing arguments.
- Return type:
dict
evofr.data.data_helpers module
- counts_to_matrix(raw_seqs, var_names, date_to_index=None)
Process ‘raw_seqs’ data to nd.array including unobserved dates.
- Parameters:
raw_seqs (DataFrame) – a dataframe containing sequence counts with columns ‘sequences’ and ‘date’.
var_names (List[str]) – list of variants for which to count observations.
date_to_index (dict | None) – optional dictionary for mapping calendar dates to nd.array indices.
- Returns:
C – nd.array containing number of sequences of each variant on each date.
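The idea of keeping unobserved dates in the output can be illustrated with a small standalone sketch. It is not evofr's implementation: the function name counts_to_matrix_sketch, the list-of-dicts input, and the ‘variant’ key are assumptions made for illustration, and plain Python lists stand in for the nd.array.

```python
from datetime import date


def counts_to_matrix_sketch(records, var_names, date_to_index):
    """Hypothetical stand-in: tally sequence counts into a T x V table."""
    var_to_index = {name: j for j, name in enumerate(var_names)}
    # Start from zeros so dates with no observations remain as all-zero rows.
    C = [[0] * len(var_names) for _ in range(len(date_to_index))]
    for rec in records:
        t = date_to_index.get(rec["date"])
        v = var_to_index.get(rec["variant"])  # 'variant' key is an assumption
        if t is None or v is None:
            continue  # skip dates or variants outside the requested sets
        C[t][v] += rec["sequences"]
    return C


records = [
    {"date": date(2022, 1, 1), "variant": "A", "sequences": 3},
    {"date": date(2022, 1, 1), "variant": "B", "sequences": 1},
    {"date": date(2022, 1, 3), "variant": "A", "sequences": 2},
]
date_to_index = {
    date(2022, 1, 1): 0,
    date(2022, 1, 2): 1,  # no sequences observed on this date
    date(2022, 1, 3): 2,
}
C = counts_to_matrix_sketch(records, ["A", "B"], date_to_index)
```

In this sketch the row for 2022-01-02 stays all zeros, which is the behaviour the phrase "including unobserved dates" suggests.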
- expand_dates(dates, T_forecast)
Extend the existing list of dates with a forecast interval of length ‘T_forecast’.
- Parameters:
T_forecast (int)
- forecast_dates(dates, T_forecast)
Generate dates of forecast given forecast interval of length ‘T_forecast’.
- Parameters:
T_forecast (int)
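The relationship between expand_dates and forecast_dates might be pictured as follows. This is a sketch under stated assumptions, not the library's code: it assumes daily spacing, assumes the forecast starts the day after the last observed date, and uses hypothetical function names with a _sketch suffix.

```python
from datetime import date, timedelta


def forecast_dates_sketch(dates, T_forecast):
    """Hypothetical: the T_forecast days following the last date (daily spacing assumed)."""
    last = dates[-1]
    return [last + timedelta(days=k) for k in range(1, T_forecast + 1)]


def expand_dates_sketch(dates, T_forecast):
    """Hypothetical: the original dates followed by the forecast dates."""
    return list(dates) + forecast_dates_sketch(dates, T_forecast)


observed = [date(2022, 1, 30), date(2022, 1, 31)]
future = forecast_dates_sketch(observed, 2)
expanded = expand_dates_sketch(observed, 2)
```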
- format_var_names(raw_names, pivot=None)
Places the pivot category last if it is present.
- Parameters:
raw_names (List[str])
pivot (str | None)
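A minimal sketch of the reordering described above, assuming the remaining names keep their original order and that an absent pivot leaves the list unchanged. The name format_var_names_sketch is hypothetical and this is not evofr's implementation.

```python
def format_var_names_sketch(raw_names, pivot=None):
    """Hypothetical: move `pivot` to the end of the list if it is present."""
    names = list(raw_names)  # copy so the caller's list is not modified
    if pivot is not None and pivot in names:
        names.remove(pivot)
        names.append(pivot)
    return names


ordered = format_var_names_sketch(["A", "other", "B"], pivot="other")
```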
- prep_cases(raw_cases, date_to_index=None)
Process raw_cases data to nd.array including unobserved dates.
- Parameters:
raw_cases (DataFrame) – a dataframe containing case counts with columns ‘cases’ and ‘date’.
date_to_index (dict | None) – optional dictionary for mapping calendar dates to nd.array indices.
- Returns:
C – nd.array containing number of cases on each date.
- prep_dates(raw_dates)
Return vector of dates and a mapping of dates to indices.
- Parameters:
raw_dates (Series) – pandas series containing dates of interest
- Returns:
dates – list containing dates
date_to_index – dictionary taking in dates and returning integer indices
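The two return values can be pictured with a short sketch. It assumes that every calendar day between the earliest and latest input date receives an index, which is one way to keep unobserved dates addressable; whether evofr fills gaps exactly this way is not stated here. The name prep_dates_sketch is hypothetical and a plain list stands in for the pandas Series.

```python
from datetime import date, timedelta


def prep_dates_sketch(raw_dates):
    """Hypothetical: return a daily date list and a date -> index mapping."""
    start, end = min(raw_dates), max(raw_dates)
    n_days = (end - start).days + 1
    # Assumption: every day in [start, end] gets an index, observed or not.
    dates = [start + timedelta(days=k) for k in range(n_days)]
    date_to_index = {d: i for i, d in enumerate(dates)}
    return dates, date_to_index


raw = [date(2022, 1, 3), date(2022, 1, 1), date(2022, 1, 3)]
dates, date_to_index = prep_dates_sketch(raw)
```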
- prep_sequence_counts(raw_seqs, date_to_index=None, var_names=None, pivot=None)
Process ‘raw_seqs’ data to nd.array including unobserved dates.
- Parameters:
raw_seqs (DataFrame) – a dataframe containing sequence counts with columns ‘sequences’ and ‘date’.
date_to_index (dict | None) – optional dictionary for mapping calendar dates to nd.array indices.
var_names (List | None) – optional list of variants for which to count observations.
pivot (str | None) – optional name of variant to place last. This will usually be used as a reference or pivot strain.
- Returns:
var_names – list of variants counted
C – nd.array containing number of sequences of each variant on each date.
evofr.data.data_spec module
- class DataSpec
Bases:
ABC
- abstract make_data_dict(data=None)
Get arguments to be passed to numpyro models as a dictionary.
- Parameters:
data (dict | None) – Optional dictionary to add arguments to.
- Returns:
Dictionary containing arguments.
- Return type:
dict
- registry = {'CaseCounts': <class 'evofr.data.case_counts.CaseCounts'>, 'CaseFrequencyData': <class 'evofr.data.case_frequencies.CaseFrequencyData'>, 'DelaySequenceCounts': <class 'evofr.models.mlr_nowcast.DelaySequenceCounts'>, 'DistanceMigrationData': <class 'evofr.models.migration_from_distances.DistanceMigrationData'>, 'HierCases': <class 'evofr.data.hier_cases.HierCases'>, 'HierFrequencies': <class 'evofr.data.hier_frequencies.HierFrequencies'>, 'InnovationSequenceCounts': <class 'evofr.models.mlr_innovation.InnovationSequenceCounts'>, 'VariantFrequencies': <class 'evofr.data.variant_frequencies.VariantFrequencies'>}
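The abstract-method contract and the name-to-class registry can be illustrated with a standalone sketch. The classes DataSpecSketch and ToyCounts are hypothetical, and populating the registry through __init_subclass__ is an assumption about one way such a registry could be built, not a description of how evofr does it.

```python
from abc import ABC, abstractmethod


class DataSpecSketch(ABC):
    """Hypothetical base class mirroring the documented interface."""

    registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Assumption: subclasses are recorded by class name when defined.
        DataSpecSketch.registry[cls.__name__] = cls

    @abstractmethod
    def make_data_dict(self, data=None):
        """Return model arguments as a dictionary, extending `data` if given."""


class ToyCounts(DataSpecSketch):
    """Hypothetical concrete spec holding a list of case counts."""

    def __init__(self, cases):
        self.cases = list(cases)

    def make_data_dict(self, data=None):
        if data is None:
            data = {}
        data["cases"] = self.cases
        data["N"] = len(self.cases)
        return data


args = ToyCounts([1, 2]).make_data_dict({"x": 1})
```

Passing an existing dictionary lets several specs contribute arguments to the same model call, which matches the "optional dictionary to add arguments to" wording above.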
evofr.data.hier_cases module
- class HierCases(raw_cases, group, date_to_index=None)
Bases:
DataSpec
- Parameters:
raw_cases (DataFrame)
group (str)
date_to_index (dict | None)
- make_data_dict(data=None)
Get arguments to be passed to numpyro models as a dictionary.
- Parameters:
data (dict | None) – Optional dictionary to add arguments to.
- Returns:
Dictionary containing arguments.
- Return type:
dict
evofr.data.hier_frequencies module
- class HierFrequencies(raw_seq, group, date_to_index=None, pivot=None, aggregation_frequency=None)
Bases:
DataSpec
- Parameters:
raw_seq (DataFrame)
group (str)
date_to_index (dict | None)
pivot (str | None)
aggregation_frequency (str | None)
- make_data_dict(data=None)
Get arguments to be passed to numpyro models as a dictionary.
- Parameters:
data (dict | None) – Optional dictionary to add arguments to.
- Returns:
Dictionary containing arguments.
- Return type:
dict
evofr.data.variant_frequencies module
- class VariantFrequencies(raw_seq, date_to_index=None, var_names=None, pivot=None, aggregation_frequency=None)
Bases:
DataSpec
- Parameters:
raw_seq (DataFrame)
date_to_index (dict | None)
var_names (List | None)
pivot (str | None)
aggregation_frequency (str | None)
- make_data_dict(data=None)
Get arguments to be passed to numpyro models as a dictionary.
- Parameters:
data (dict | None) – Optional dictionary to add arguments to.
- Returns:
Dictionary containing arguments.
- Return type:
dict
- variant_counts_to_dataframe(var_counts, var_names=['Variant', 'other'], start_date=Timestamp('2022-01-01 00:00:00'))
Convert matrix of variant counts to pandas dataframe for input to ef.VariantFrequencies.
- Parameters:
var_counts – nd.array of counts var_counts[t,v] of variant v on day t.
var_names (List[str]) – List of variant names to assign each column.
start_date – Pandas datetime to use as first date.
- Returns:
seq_counts – pandas dataframe of variant counts for input to ef.VariantFrequencies.
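The matrix-to-long-format conversion can be sketched without pandas. This is an illustration under assumptions rather than the library's implementation: the function name variant_counts_to_records is hypothetical, the output is a list of dicts instead of a dataframe, the keys ‘date’, ‘variant’, and ‘sequences’ are assumed, and consecutive rows are taken to be consecutive days.

```python
from datetime import date, timedelta


def variant_counts_to_records(var_counts, var_names, start_date):
    """Hypothetical: one record per (day, variant) from a T x V count table."""
    records = []
    for t, row in enumerate(var_counts):
        day = start_date + timedelta(days=t)  # row t is day t after start_date
        for name, count in zip(var_names, row):
            records.append({"date": day, "variant": name, "sequences": count})
    return records


var_counts = [[3, 1], [0, 4]]
records = variant_counts_to_records(var_counts, ["Variant", "other"], date(2022, 1, 1))
```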