| Title: | Taxonomically Informed Metabolite Annotation |
|---|---|
| Description: | TIMA provides a reproducible workflow for taxonomically informed metabolite annotation from feature tables, MS/MS spectra, and optional external resources such as SIRIUS and GNPS outputs. It combines mass, spectral, taxonomic, and structural evidence into a transparent scoring framework that can be inspected step-by-step. The package targets metabolomics practitioners who need configurable, scriptable, and documented annotation pipelines for research and production settings. |
| Authors: | Adriano Rutz [aut, cre] (ORCID: <https://orcid.org/0000-0003-0443-9902>), Pierre-Marie Allard [ctb] (ORCID: <https://orcid.org/0000-0003-3389-2191>) |
| Maintainer: | Adriano Rutz <[email protected]> |
| License: | AGPL (>= 3) |
| Version: | 2.13.0.9000 |
| Built: | 2026-06-04 17:00:23 UTC |
| Source: | https://github.com/taxonomicallyinformedannotation/tima |
Format (outside-multimer, canonical):
[<n>M<-losses><-carrier-losses><+clusters><+carriers>]<|z|><sign>
Format (inside-multimer, when loss_inside_multimer or
cluster_inside_multimer is TRUE and n_mer >= 2):
[<n>(M<inside-losses><inside-clusters>)<outside-losses><outside-clusters><carriers>]<|z|><sign>
adduct_to_string(
  n_mer,
  carriers,
  clusters,
  losses,
  z,
  loss_inside_multimer = FALSE,
  cluster_inside_multimer = FALSE
)
n_mer |
Integer multimer count. |
carriers |
Named integer vector of carrier counts/signs (e.g. H, Na). |
clusters |
Named integer vector of neutral cluster additions. |
losses |
Named integer vector of neutral losses. |
z |
Integer signed charge. |
loss_inside_multimer |
Logical; place losses inside the multimer parentheses, e.g. [2(M-H2O)+H]+. |
cluster_inside_multimer |
Logical; place clusters inside the multimer parentheses, e.g. [2(M+NaCl)+H]+. |
The "inside" variant captures the chemistry where each monomer carries
the cluster/loss BEFORE the multimer assembles, e.g. [2(M-H2O)+H]+
(two M-H2O monomers dimerize, then protonate) or [2(M+NaCl)+H]+ (each
M binds NaCl first, then dimerizes). These have different implied
neutral masses than their outside-multimer counterparts.
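To make the difference concrete, the arithmetic for a water loss on a protonated dimer can be worked through by hand. This is only an illustrative sketch: the m/z value is made up, and the monoisotopic masses are standard reference values typed in here rather than read from the package.

```r
# Standard monoisotopic masses in Da (typed in for illustration).
mass_h <- 1.00782503
mass_h2o <- 18.01056468
mass_electron <- 0.00054858

mz <- 209.0808 # hypothetical observed m/z

# Outside: [2M-H2O+H]+ means mz = 2 * M - H2O + H - e
m_outside <- (mz - mass_h + mass_electron + mass_h2o) / 2

# Inside: [2(M-H2O)+H]+ means mz = 2 * (M - H2O) + H - e
m_inside <- (mz - mass_h + mass_electron) / 2 + mass_h2o

# The two readings differ by half a water mass.
m_inside - m_outside
```

For the same observed peak, the inside reading implies a neutral mass that is larger by half the mass of the loss, which is why the two notations cannot be used interchangeably.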
n omitted when n_mer == 1
|z| omitted when |z| == 1
Canonical adduct string.
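The outside-multimer format can be sketched in a few lines of base R. The helper below is hypothetical (`format_adduct_sketch` is not part of the package) and assumes that positive carrier counts are additions and negative counts are carrier losses; it ignores the inside-multimer variants.

```r
# Hypothetical helper, not the package implementation.
format_adduct_sketch <- function(n_mer, carriers, clusters, losses, z) {
  # Render one signed term such as "+2H", "-H2O", "+Na".
  term <- function(name, count, sign) {
    paste0(sign, if (abs(count) > 1) abs(count) else "", name)
  }
  # Concatenate all terms of a named integer vector with a common sign.
  terms <- function(x, sign) {
    if (length(x) == 0) {
      return("")
    }
    paste0(
      mapply(term, names(x), x, MoreArgs = list(sign = sign)),
      collapse = ""
    )
  }
  body <- paste0(
    if (n_mer > 1) n_mer else "", # n omitted when n_mer == 1
    "M",
    terms(losses, "-"),
    terms(carriers[carriers < 0], "-"), # carrier losses
    terms(clusters, "+"),
    terms(carriers[carriers > 0], "+") # carrier additions
  )
  charge <- paste0(
    if (abs(z) > 1) abs(z) else "", # |z| omitted when |z| == 1
    if (z < 0) "-" else "+"
  )
  paste0("[", body, "]", charge)
}

format_adduct_sketch(2L, c(H = 1L), integer(0), c(H2O = 1L), 1L)
format_adduct_sketch(1L, c(H = 2L), integer(0), integer(0), 2L)
format_adduct_sketch(1L, c(H = -1L), integer(0), integer(0), -1L)
```

The three calls render a protonated dimer with a water loss, a doubly protonated monomer, and a deprotonated monomer.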
Mass-based MS1 annotation. The pipeline is a sequence of clearly-bounded steps; each step is documented inline. In short:
1. **Pairs in RT windows.** For every feature, find all other features in the same RT tolerance window (per sample) and compute the m/z delta. The pair is always oriented `(lower_mz, higher_mz)` so that `delta = mz_higher - mz_lower >= 0`.
2. **Adduct edges.** Match each pair's `delta` against the table of precomputed pairwise differences between known mode-specific adducts. A match labels the edge `adduct_low _ adduct_high` and tentatively assigns the corresponding adduct to each endpoint.
3. **Cluster edges.** Match `delta` against cluster masses (e.g. ACN, MeOH, Na). A cluster adds mass to the *higher* m/z peak, so the cluster suffix `+<cluster>` is attached to the **dest** node's adduct hypotheses.
4. **Neutral-loss edges.** Match `delta` against neutral-loss masses (e.g. H2O, CO2). For an NL pair, the **higher** m/z peak is the precursor and the **lower** m/z peak is the product. The loss suffix `-<loss>` is attached to the precursor's adduct hypotheses (so the same neutral M is inferred from both peaks).
5. **Node hypotheses.** Gather, per feature, **all** plausible adduct labels: (a) what we inferred from adduct/cluster/loss edges, (b) any adduct supplied upstream by the preprocessing tool, and (c) the universal baseline `[M+H]+` / `[M-H]-`. Hypotheses are never dropped at this stage.
6. **Library match.** For every `(feature, candidate_adduct)` pair, compute the implied neutral mass M and look it up in the library within the ppm tolerance.
7. **Network-consensus pruning.** If a feature ends up with several library hits, drop only the candidates whose adduct has *zero* support in the adduct edge graph **and** whose drop still leaves a supported alternative. Ties are kept and drops are logged.
8. **Keep unmatched adducts.** Adduct hypotheses are exported even when no library structure matches, so downstream tools still see the adduct annotation.
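Steps 1 and 2 can be sketched in base R as follows. This is a minimal illustration, not the package code: the feature table, the single-entry `adduct_deltas` table, and both helper functions are hypothetical, and the per-sample grouping is left out for brevity.

```r
# Toy feature table (hypothetical values).
features <- data.frame(
  feature_id = c("f1", "f2", "f3"),
  mz = c(123.0441, 145.0260, 300.1000),
  rt = c(1.00, 1.01, 5.00),
  stringsAsFactors = FALSE
)

# Hypothetical table of pairwise adduct differences in Da:
# m(Na) - m(H) = 22.98976928 - 1.00782503.
adduct_deltas <- c("[M+H]+ _ [M+Na]+" = 21.981945)

# Step 1: all feature pairs within the RT window, oriented (low, high).
pair_features_sketch <- function(features, tolerance_rt) {
  idx <- t(combn(seq_len(nrow(features)), 2))
  a <- features[idx[, 1], ]
  b <- features[idx[, 2], ]
  keep <- abs(a$rt - b$rt) <= tolerance_rt
  low_first <- a$mz <= b$mz
  data.frame(
    low = ifelse(low_first, a$feature_id, b$feature_id)[keep],
    high = ifelse(low_first, b$feature_id, a$feature_id)[keep],
    delta = abs(a$mz - b$mz)[keep],
    stringsAsFactors = FALSE
  )
}

# Step 2: label pairs whose delta matches a known adduct difference.
label_adduct_edges_sketch <- function(pairs, adduct_deltas, tolerance_dalton) {
  pairs$label <- vapply(
    pairs$delta,
    function(d) {
      m <- which(abs(adduct_deltas - d) <= tolerance_dalton)
      if (length(m) == 0) NA_character_ else names(adduct_deltas)[m[1]]
    },
    character(1)
  )
  pairs[!is.na(pairs$label), , drop = FALSE]
}

pairs <- pair_features_sketch(features, tolerance_rt = 0.05)
edges <- label_adduct_edges_sketch(pairs, adduct_deltas, tolerance_dalton = 0.005)
edges
```

Only `f1` and `f2` co-elute, and their m/z difference matches the sodium/proton spacing, so a single labeled edge remains. Cluster and neutral-loss edges would follow the same pattern with different delta tables.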
annotate_masses(
  features = get_params(step = "annotate_masses")$files$features$prepared,
  output_annotations = get_params(step = "annotate_masses")$files$annotations$prepared$structural$ms1,
  output_edges = get_params(step = "annotate_masses")$files$networks$spectral$edges$raw$ms1,
  name_source = get_params(step = "annotate_masses")$names$source,
  name_target = get_params(step = "annotate_masses")$names$target,
  library = get_params(step = "annotate_masses")$files$libraries$sop$merged$keys,
  str_stereo = get_params(step = "annotate_masses")$files$libraries$sop$merged$structures$stereo,
  str_met = get_params(step = "annotate_masses")$files$libraries$sop$merged$structures$metadata,
  str_tax_cla = get_params(step = "annotate_masses")$files$libraries$sop$merged$structures$taxonomies$cla,
  str_tax_npc = get_params(step = "annotate_masses")$files$libraries$sop$merged$structures$taxonomies$npc,
  adducts_list = get_params(step = "annotate_masses")$ms$adducts,
  clusters_list = get_params(step = "annotate_masses")$ms$clusters,
  neutral_losses_list = get_params(step = "annotate_masses")$ms$neutral_losses,
  ms_mode = get_params(step = "annotate_masses")$ms$polarity,
  tolerance_ppm = get_params(step = "annotate_masses")$ms$tolerances$mass$ppm$ms1,
  tolerance_dalton = get_params(step = "annotate_masses")$ms$tolerances$mass$dalton$ms1,
  tolerance_rt = get_params(step = "annotate_masses")$ms$tolerances$rt$adducts,
  adduct_consistency = get_params(step = "annotate_masses")$ms$adducts$consistency$type,
  adduct_min_support = get_params(step = "annotate_masses")$ms$adducts$consistency$min_support,
  adduct_consistency_min_degree = get_params(step = "annotate_masses")$ms$adducts$consistency$min_degree
)
features |
Table containing your previous annotation to complement |
output_annotations |
Output for mass based structural annotations |
output_edges |
Output for mass based edges |
name_source |
Name of the source features column |
name_target |
Name of the target features column |
library |
Library containing the keys |
str_stereo |
File containing structures stereo |
str_met |
File containing structures metadata |
str_tax_cla |
File containing Classyfire taxonomy |
str_tax_npc |
File containing NPClassifier taxonomy |
adducts_list |
List of adducts to be used |
clusters_list |
List of clusters to be used |
neutral_losses_list |
List of neutral losses to be used |
ms_mode |
Ionization mode. Must be 'pos' or 'neg' |
tolerance_ppm |
Tolerance to perform annotation. Should be <= 20 ppm |
tolerance_dalton |
Absolute mass tolerance in Daltons for annotation |
tolerance_rt |
Tolerance to group adducts. Should be <= 0.05 minutes |
adduct_consistency |
Consistency mode for adduct edge filtering: one of
|
adduct_min_support |
Minimum number of independent supporting neighbors for an adduct assignment in consistency-filtered regions |
adduct_consistency_min_degree |
In |
Named character of paths to the annotations and edges files.
Other annotation:
annotate_spectra(),
filter_annotations(),
weight_annotations(),
write_mztab()
## Not run:
annotate_masses()
## End(Not run)
Annotates MS/MS query spectra against one or more spectral libraries, computing similarity scores and returning best candidate annotations above a similarity threshold.
annotate_spectra(
  input = get_params(step = "annotate_spectra")$files$spectral$raw,
  libraries = get_params(step = "annotate_spectra")$files$libraries$spectral,
  polarity = get_params(step = "annotate_spectra")$ms$polarity,
  output = get_params(step = "annotate_spectra")$files$annotations$raw$spectral$spectral,
  method = get_params(step = "annotate_spectra")$similarities$methods$annotations,
  threshold = get_params(step = "annotate_spectra")$similarities$thresholds$annotations,
  ppm = get_params(step = "annotate_spectra")$ms$tolerances$mass$ppm$ms2,
  dalton = get_params(step = "annotate_spectra")$ms$tolerances$mass$dalton$ms2,
  cutoff = get_params(step = "annotate_spectra")$ms$thresholds$ms2$intensity,
  min_fragments = get_params(step = "annotate_spectra")$ms$thresholds$ms2$min_fragments,
  approx = get_params(step = "annotate_spectra")$annotations$ms2approx,
  ms1_annotations = NULL,
  qutoff = deprecated()
)
input |
character Vector or list of query spectral file paths (.mgf). |
libraries |
character Vector or list of library spectral file paths (.mgf / Spectra-supported). Must contain at least one path. |
polarity |
character MS polarity; one of "pos" or "neg". |
output |
character Output file path (the function writes a tabular file here). |
method |
character Similarity method; one of
|
threshold |
numeric Minimal similarity score to retain candidates (0-1). |
ppm |
numeric Relative mass tolerance (ppm) for MS/MS matching. |
dalton |
numeric Absolute mass tolerance (Daltons) for MS/MS matching. |
cutoff |
numeric Intensity cutoff under which MS2 fragments are removed. Non-negative numeric or NULL for dynamic thresholding. |
min_fragments |
integer Minimum number of fragment peaks a spectrum must have after cleaning to be retained (default: 2). |
approx |
logical If TRUE perform matching ignoring precursor masses (broader, slower); if FALSE restrict library to precursor-tolerant spectra first. |
ms1_annotations |
Optional path or data frame containing
|
qutoff |
Deprecated argument (the usage sets qutoff = deprecated()). |
This is an orchestration wrapper that performs:
1. Input validation & normalization (query + libraries, numeric params).
2. Query spectra import & light preprocessing (intensity cutoff).
3. Library spectra import, cleaning of empty peak lists, optional polarity filtering, optional precursor-based library size reduction (when approx = FALSE).
4. Similarity computation via calculate_entropy_and_similarity().
5. Candidate metadata extraction (formula, name, etc.).
6. Result shaping: derive error (mz), select canonical output columns, threshold filtering, keep best per (feature_id, library, connectivity layer).
7. Export of parameters & results to the configured output path.
If no annotations are produced (empty inputs or below threshold), a
standardized empty template (see fake_annotations_columns()) is exported
to ensure downstream code receives expected columns.
Character scalar: the output file path (invisible). Side effect:
writes the annotations table to output.
The function performs strict validation and logs informative messages.
File existence is checked early; similarity computation is wrapped in a
tryCatch to surface errors without leaving partially allocated objects.
Library precursor reduction (when approx = FALSE) limits similarity
computation to precursor-tolerant spectra, reducing complexity for large
libraries.
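The precursor-based reduction can be pictured with a small base R sketch. The helper `reduce_library_sketch` is hypothetical, and combining the two tolerances by taking the larger of the absolute and relative window is an assumption made for illustration; the package may combine them differently.

```r
# Hypothetical helper: indices of library spectra whose precursor m/z lies
# within tolerance of at least one query precursor.
reduce_library_sketch <- function(query_precursors,
                                  library_precursors,
                                  ppm,
                                  dalton) {
  # Assumption: the effective window is the larger of the two tolerances.
  tol <- pmax(dalton, library_precursors * ppm * 1e-6)
  keep <- vapply(
    seq_along(library_precursors),
    function(i) any(abs(query_precursors - library_precursors[i]) <= tol[i]),
    logical(1)
  )
  which(keep)
}

query_precursors <- c(123.4567, 300.1000)
library_precursors <- c(123.4570, 200.0000, 300.2000)
kept <- reduce_library_sketch(
  query_precursors,
  library_precursors,
  ppm = 10,
  dalton = 0.01
)
kept
```

Only the first library entry survives here, so similarity would be computed against one spectrum instead of three. With large libraries this pre-filter is where most of the savings come from.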
Other annotation:
annotate_masses(),
filter_annotations(),
weight_annotations(),
write_mztab()
## Not run:
copy_backbone()
go_to_cache()
get_file(
  url = get_default_paths()$urls$examples$spectra_mini,
  export = get_params(step = "annotate_spectra")$files$spectral$raw
)
get_file(
  url = get_default_paths()$urls$examples$spectral_lib_mini$with_rt,
  export = get_default_paths()$data$source$libraries$spectra$exp$with_rt
)
annotate_spectra(
  libraries = get_default_paths()$data$source$libraries$spectra$exp$with_rt
)
unlink("data", recursive = TRUE)
## End(Not run)
This function calculates the neutral mass (M) from an observed m/z value and adduct notation. It accounts for charge, multimers, isotopes, and adduct modifications.
The calculation follows the formula:
M = (|z| * (m/z - iso_shift) - modifications + z * e_mass) / n_mer
where:
- |z| = absolute number of charges
- z = signed charge (`|z| * polarity`)
- m/z = observed mass-to-charge ratio
- iso_shift = `n_iso * ISOTOPE_MASS_SHIFT_DALTONS`
- modifications = total neutral mass change from adduct modifications
- e_mass = electron mass
- n_mer = multimer count
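The formula can be evaluated directly once the adduct has been reduced to its numeric parts. The sketch below is not the package implementation: `neutral_mass_sketch` is a hypothetical helper that takes the already-parsed charge, multimer count, and modification mass, the electron mass is rounded, and the isotope spacing is assumed to be the 13C - 12C difference.

```r
# Constants typed in for illustration (Da).
ELECTRON_MASS_SKETCH <- 0.00054858 # rounded electron mass
ISOTOPE_SHIFT_SKETCH <- 1.0033548 # assumed 13C - 12C spacing
mass_h <- 1.00782503 # monoisotopic hydrogen atom

# Direct transcription of the documented formula.
neutral_mass_sketch <- function(mz, z, n_mer, modifications, n_iso = 0) {
  iso_shift <- n_iso * ISOTOPE_SHIFT_SKETCH
  (abs(z) * (mz - iso_shift) - modifications + z * ELECTRON_MASS_SKETCH) / n_mer
}

# [M+H]+
m_protonated <- neutral_mass_sketch(123.4567, z = 1, n_mer = 1, modifications = mass_h)
# [2M+H]+
m_dimer <- neutral_mass_sketch(245.9053, z = 1, n_mer = 2, modifications = mass_h)
# [M+2H]2+
m_doubly <- neutral_mass_sketch(62.2311, z = 2, n_mer = 1, modifications = 2 * mass_h)

c(m_protonated, m_dimer, m_doubly)
```

All three inputs are taken from the package examples and land close to 122.45 Da, which is a quick sanity check that the sign conventions for `modifications` and `z * e_mass` are being read correctly.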
calculate_mass_of_m(mz, adduct_string, electron_mass = ELECTRON_MASS_DALTONS)
mz |
numeric Observed m/z value in Daltons. Must be positive. |
adduct_string |
character Adduct notation string (e.g., "[M+H]+", "[M+Na]+", "[2M+H]+"). |
electron_mass |
numeric Electron mass in Daltons (default: ELECTRON_MASS_DALTONS from constants.R - CODATA 2018 value) |
Numeric neutral mass (M) in Daltons.
Returns 0 if:
- Adduct parsing fails
- Invalid input parameters
- Division by zero would occur (n_mer = 0 or n_charges = 0)
Returns NA if the calculated mass is negative (physically impossible).
Other mass-spectrometry:
calculate_mz_from_mass(),
calculate_similarity(),
harmonize_adducts(),
import_spectra(),
parse_adduct()
# Simple protonated molecule
calculate_mass_of_m(mz = 123.4567, adduct_string = "[M+H]+")
# Expected: ~122.45 Da
# Sodium adduct
calculate_mass_of_m(mz = 145.4421, adduct_string = "[M+Na]+")
# Expected: ~122.45 Da
# Complex adduct with water loss
calculate_mass_of_m(mz = 105.4467, adduct_string = "[M-H2O+H]+")
# Expected: ~122.45 Da
# Dimer
calculate_mass_of_m(mz = 245.9053, adduct_string = "[2M+H]+")
# Expected: ~122.45 Da
# Doubly charged
calculate_mass_of_m(mz = 62.2311, adduct_string = "[M+2H]2+")
# Expected: ~122.45 Da
This is the inverse of calculate_mass_of_m. Given a neutral mass and adduct, it calculates the expected m/z value.
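Solving the formula documented for calculate_mass_of_m() for m/z gives the inverse relation. The sketch below shows that rearrangement and a round-trip check; both helpers are hypothetical, work on already-parsed adduct parts, and use a rounded electron mass and an assumed 13C - 12C isotope spacing.

```r
# Constants typed in for illustration (Da).
e_mass <- 0.00054858 # rounded electron mass
iso_mass <- 1.0033548 # assumed isotope spacing
mass_h <- 1.00782503 # monoisotopic hydrogen atom

# Forward direction: neutral mass from m/z (documented formula).
mass_from_mz_sketch <- function(mz, z, n_mer, modifications, n_iso = 0) {
  (abs(z) * (mz - n_iso * iso_mass) - modifications + z * e_mass) / n_mer
}

# Inverse: the same equation solved for m/z.
mz_from_mass_sketch <- function(neutral_mass, z, n_mer, modifications, n_iso = 0) {
  (n_mer * neutral_mass + modifications - z * e_mass) / abs(z) + n_iso * iso_mass
}

# [M+H]+ for a neutral mass of 122.45 Da
mz_h <- mz_from_mass_sketch(122.45, z = 1, n_mer = 1, modifications = mass_h)

# Round trip through a doubly charged dimer, [2M+2H]2+
mz_2 <- mz_from_mass_sketch(122.45, z = 2, n_mer = 2, modifications = 2 * mass_h)
back <- mass_from_mz_sketch(mz_2, z = 2, n_mer = 2, modifications = 2 * mass_h)
all.equal(back, 122.45)
```

The round trip recovers the starting mass up to floating-point error, which is the property the package example also checks with `all.equal()`.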
calculate_mz_from_mass(
  neutral_mass,
  adduct_string,
  electron_mass = ELECTRON_MASS_DALTONS
)
neutral_mass |
Numeric neutral mass (M) in Daltons |
adduct_string |
Character string representing the adduct |
electron_mass |
Numeric electron mass in Daltons |
Numeric m/z value in Daltons
Other mass-spectrometry:
calculate_mass_of_m(),
calculate_similarity(),
harmonize_adducts(),
import_spectra(),
parse_adduct()
# Calculate m/z for a protonated molecule
calculate_mz_from_mass(neutral_mass = 122.45, adduct_string = "[M+H]+")
# Expected: ~123.4573
# Verify round-trip calculation
mass <- 122.45
adduct <- "[M+H]+"
mz <- calculate_mz_from_mass(mass, adduct)
mass_back <- calculate_mass_of_m(mz, adduct)
all.equal(mass, mass_back) # Should be TRUE
Calculates similarity scores between query and target spectra using the entropy, cosine, or GNPS method.
**Important:** For correct results with the GNPS and cosine methods, input spectra should be sanitized (unique, well-separated m/z values; no NaN; sorted by m/z). import_spectra() does this automatically with `sanitize = TRUE`.
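For spectra that do not come from import_spectra(), a rough idea of what sanitizing involves is sketched below. The helper `sanitize_spectrum_sketch` is hypothetical and only covers NaN removal, sorting, and exact-duplicate m/z values (keeping the most intense peak); it does not enforce a minimum spacing between peaks and is not a copy of the package's own routine.

```r
# Hypothetical helper operating on a matrix with "mz" and "intensity" columns.
sanitize_spectrum_sketch <- function(sp) {
  # Drop rows with NA or NaN in either column.
  keep <- !is.na(sp[, "mz"]) & !is.na(sp[, "intensity"])
  sp <- sp[keep, , drop = FALSE]
  # Sort by m/z; within equal m/z, most intense peak first.
  sp <- sp[order(sp[, "mz"], -sp[, "intensity"]), , drop = FALSE]
  # Keep one peak per m/z value (the most intense one).
  sp[!duplicated(sp[, "mz"]), , drop = FALSE]
}

raw <- cbind(
  mz = c(63, 10, 63, NaN),
  intensity = c(999, 14, 5, 1)
)
clean <- sanitize_spectrum_sketch(raw)
clean
```

The cleaned matrix has two rows, sorted by m/z, and can be passed as `query_spectrum` or `target_spectrum`.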
calculate_similarity(
  method,
  query_spectrum,
  target_spectrum,
  query_precursor,
  target_precursor,
  dalton,
  ppm,
  return_matched_peaks = FALSE,
  ...
)
method |
character Similarity method: "entropy", "gnps", or "cosine" |
query_spectrum |
matrix Numeric matrix with columns for mz and intensity |
target_spectrum |
matrix Numeric matrix with columns for mz and intensity |
query_precursor |
numeric Precursor m/z value for query |
target_precursor |
numeric Precursor m/z value for target |
dalton |
numeric Dalton tolerance for peak matching |
ppm |
numeric PPM tolerance for peak matching |
return_matched_peaks |
logical Return matched peaks count? Not compatible with 'entropy' method. Default: FALSE |
... |
Additional arguments passed to MsCoreUtils::join (cosine only) |
Numeric similarity score (0-1), or list with score and matches if return_matched_peaks = TRUE. Returns 0.0 if calculation fails.
Other mass-spectrometry:
calculate_mass_of_m(),
calculate_mz_from_mass(),
harmonize_adducts(),
import_spectra(),
parse_adduct()
sp_1 <- cbind(
  mz = c(10, 36, 63, 91, 93),
  intensity = c(14, 15, 999, 650, 1)
)
precursor_1 <- 123.4567
precursor_2 <- precursor_1 + 14
sp_2 <- cbind(
  mz = c(10, 12, 50, 63, 105),
  intensity = c(35, 5, 16, 999, 450)
)
calculate_similarity(
  method = "entropy",
  query_spectrum = sp_1,
  target_spectrum = sp_2,
  query_precursor = precursor_1,
  target_precursor = precursor_2,
  dalton = 0.005,
  ppm = 10.0
)
calculate_similarity(
  method = "gnps",
  query_spectrum = sp_1,
  target_spectrum = sp_2,
  query_precursor = precursor_1,
  target_precursor = precursor_2,
  dalton = 0.005,
  ppm = 10.0,
  return_matched_peaks = TRUE
)
Updates TIMA workflow parameters for quick setup with a simplified interface. This function modifies the prepare_params YAML configuration file by copying provided input files to the appropriate directories and updating parameter values. Implements SOLID principles with clear separation of concerns.
change_params_small(
  fil_pat = NULL,
  fil_fea_raw = NULL,
  fil_met_raw = NULL,
  fil_sir_raw = NULL,
  fil_spe_raw = NULL,
  fil_ann_mzm = NULL,
  fil_mzt_raw = NULL,
  ms_pol = NULL,
  org_tax = NULL,
  hig_evi = NULL,
  summarize = NULL,
  cache_dir = NULL
)
fil_pat |
Character. Job identifier/pattern for output files (optional) |
fil_fea_raw |
Character. Path to features file (e.g., from mzmine/SIRIUS) |
fil_met_raw |
Character. Path to metadata file (optional if single taxon) |
fil_sir_raw |
Character. Path to SIRIUS annotations directory/zip |
fil_spe_raw |
Character. Path to spectra file (MGF format with MS1/MS2) |
fil_ann_mzm |
Character. Path to mzmine annotations file |
fil_mzt_raw |
Character. Path to an mzTab-M file to import/merge |
ms_pol |
Character. MS polarity: "pos" or "neg" |
org_tax |
Character. Scientific name for single-taxon experiments |
hig_evi |
Logical. Filter for high evidence candidates only |
summarize |
Logical. Summarize all candidates per feature to single row |
cache_dir |
Character. Cache directory path (for testing; uses go_to_cache() if NULL) |
This function:
Validates all input files exist before copying
Copies files to standardized cache locations
Updates the prepare_params YAML configuration
Handles NA values properly for YAML null representation
Invisible NULL. Modifies prepare_params YAML as side effect.
Other workflow:
create_components(),
create_edges(),
create_edges_spectra(),
go_to_cache(),
install(),
install_tima(),
run_app(),
run_tima(),
tima_full(),
validate_inputs()
## Not run:
# Setup complete workflow parameters
copy_backbone()
change_params_small(
  fil_pat = "gentiana_experiment",
  fil_fea_raw = "data/raw/features.csv",
  fil_met_raw = "data/raw/metadata.tsv",
  fil_sir_raw = "data/raw/sirius_output.zip",
  fil_spe_raw = "data/raw/spectra.mgf",
  fil_ann_mzm = "data/raw/mzmine_annotations.csv",
  fil_mzt_raw = "data/raw/annotations.mztab",
  ms_pol = "pos",
  org_tax = "Gentiana lutea",
  hig_evi = TRUE,
  summarize = FALSE
)
## End(Not run)
Cleans and filters chemically weighted annotation results through a multi-tier pipeline. Applies MS1 score thresholds, percentile filtering, ranking, and optional high-evidence filtering. Returns three-tier output: full (comprehensive), filtered (top candidates), and mini (one row per feature).
clean_chemo(
  annot_table_wei_chemo,
  components_table,
  features_table,
  structure_organism_pairs_table,
  candidates_final,
  best_percentile,
  minimal_ms1_bio,
  minimal_ms1_chemo,
  minimal_ms1_condition,
  compounds_names,
  high_evidence,
  remove_ties,
  summarize,
  score_chemical_cla_kingdom = 0.2,
  score_chemical_cla_superclass = 0.4,
  score_chemical_cla_class = 0.6,
  score_chemical_cla_parent = 0.8,
  score_chemical_npc_pathway = 0.25,
  score_chemical_npc_superclass = 0.5,
  score_chemical_npc_class = 0.75,
  max_per_score = 7L,
  xrefs_table = NULL
)
annot_table_wei_chemo |
Data frame with chemically weighted annotations. Required columns: feature_id, candidate_structure_inchikey_connectivity_layer, score_weighted_chemo, score_biological, score_chemical, candidate_score_pseudo_initial |
components_table |
Data frame with molecular network component assignments. Required columns: feature_id, component_id |
features_table |
Data frame with feature metadata (RT, m/z, etc.). Required columns: feature_id |
structure_organism_pairs_table |
Data frame linking structures to organisms. Required columns: structure_inchikey_connectivity_layer |
candidates_final |
Integer, number of top candidates to retain per feature (>= 1) |
best_percentile |
Numeric (0-1), percentile threshold for score filtering. Candidates with scores >= percentile * max_score are kept. Default: 0.9 (90th percentile) |
minimal_ms1_bio |
Numeric (0-1), minimum biological score for MS1-only annotations |
minimal_ms1_chemo |
Numeric (0-1), minimum chemical score for MS1-only annotations |
minimal_ms1_condition |
Character, logical operator for MS1 filtering: "OR" or "AND". "OR" = keep if bio >= threshold OR chem >= threshold. "AND" = keep if bio >= threshold AND chem >= threshold |
compounds_names |
Logical, include compound names in output (may increase size) |
high_evidence |
Logical, apply strict high-evidence filters |
remove_ties |
Logical, remove tied scores (keep only highest-ranked) |
summarize |
Logical, collapse results to one row per feature |
score_chemical_cla_kingdom |
Numeric (0-1), score for ClassyFire kingdom level |
score_chemical_cla_superclass |
Numeric (0-1), score for ClassyFire superclass level |
score_chemical_cla_class |
Numeric (0-1), score for ClassyFire class level |
score_chemical_cla_parent |
Numeric (0-1), score for ClassyFire direct parent level |
score_chemical_npc_pathway |
Numeric (0-1), score for NPClassifier pathway level |
score_chemical_npc_superclass |
Numeric (0-1), score for NPClassifier superclass level |
score_chemical_npc_class |
Numeric (0-1), score for NPClassifier class level |
max_per_score |
Integer, max candidates to keep per feature per score. If more exist, they are randomly sampled and a note is added. Default 7. |
xrefs_table |
Optional data frame with columns inchikey/prefix/id from get_compounds_xrefs(), used to add candidate_structure_id_* columns before summarization. |
Named list with three data frames:
full: All annotations (optionally high-evidence filtered)
filtered: Top candidates meeting percentile + rank thresholds
mini: One row per feature with best compound/taxonomy
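The percentile, rank, and MS1 conditions described in the arguments can be sketched in base R. Both helpers below are hypothetical and simplified: the maximum score is assumed to be taken per feature, ties are ranked with the "min" method, and none of the high-evidence or summarization logic is included.

```r
# Hypothetical helper: keep candidates scoring at least
# best_percentile * (best score of the feature), then the top-ranked ones.
filter_top_candidates_sketch <- function(df, best_percentile, candidates_final) {
  max_score <- ave(df$score_weighted_chemo, df$feature_id, FUN = max)
  df <- df[df$score_weighted_chemo >= best_percentile * max_score, , drop = FALSE]
  rank_in_feature <- ave(
    -df$score_weighted_chemo,
    df$feature_id,
    FUN = function(x) rank(x, ties.method = "min")
  )
  df[rank_in_feature <= candidates_final, , drop = FALSE]
}

# Hypothetical helper for the MS1-only condition ("OR" or "AND").
passes_ms1_sketch <- function(bio, chem, min_bio, min_chem, condition) {
  if (condition == "OR") {
    bio >= min_bio | chem >= min_chem
  } else {
    bio >= min_bio & chem >= min_chem
  }
}

annotations <- data.frame(
  feature_id = c("f1", "f1", "f1", "f2"),
  score_weighted_chemo = c(0.95, 0.90, 0.50, 0.40),
  stringsAsFactors = FALSE
)

top_one <- filter_top_candidates_sketch(annotations, 0.9, candidates_final = 1)
top_ten <- filter_top_candidates_sketch(annotations, 0.9, candidates_final = 10)
ms1_or <- passes_ms1_sketch(0.6, 0.1, 0.5, 0.5, "OR")
ms1_and <- passes_ms1_sketch(0.6, 0.1, 0.5, 0.5, "AND")
```

With `best_percentile = 0.9`, the 0.50 candidate of `f1` falls below 0.9 * 0.95 and is dropped regardless of `candidates_final`; the rank cut-off then decides how many of the remaining candidates are kept.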
See also:
weight_chemo,
filter_high_evidence_only,
summarize_results
## Not run:
results <- clean_chemo(
  annot_table_wei_chemo = annotations,
  features_table = features,
  components_table = components,
  structure_organism_pairs_table = sop_table,
  candidates_final = 10,
  best_percentile = 0.9,
  minimal_ms1_bio = 0.5,
  minimal_ms1_chemo = 0.5,
  minimal_ms1_condition = "OR",
  compounds_names = TRUE,
  high_evidence = FALSE,
  remove_ties = FALSE,
  summarize = FALSE
)
## End(Not run)
This function copies the package backbone (default directory structure, configuration files, and parameters) to a cache directory. This sets up the working environment for TIMA workflows.
copy_backbone(cache_dir = fs::path_home(".tima"), package = "tima")
cache_dir |
Character string path to the cache directory (default: "~/.tima" in user's home directory) |
package |
Character string name of the package (default: "tima") |
NULL (invisibly). Creates cache directory structure as side effect.
## Not run:
# Copy to default cache location
copy_backbone()
# Copy to custom location
copy_backbone(cache_dir = "~/my_tima_cache")
## End(Not run)
This function creates network components (connected subgraphs) from edge lists using igraph. Each component represents a set of features that are connected through spectral similarity or other relationships.
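What a component is can be shown without any dependency. The sketch below deliberately swaps igraph for a small union-find written in base R so that it runs anywhere; `components_sketch` is a hypothetical helper, and only the `feature_source` / `feature_target` column names follow the documented input.

```r
# Hypothetical helper: connected components from an edge list (union-find).
components_sketch <- function(edges) {
  nodes <- sort(unique(c(edges$feature_source, edges$feature_target)))
  parent <- setNames(nodes, nodes) # every node starts as its own root

  # Follow parent links until a node that is its own parent is reached.
  find_root <- function(x) {
    while (parent[[x]] != x) {
      x <- parent[[x]]
    }
    x
  }

  # Merge the two components joined by each edge.
  for (i in seq_len(nrow(edges))) {
    a <- find_root(edges$feature_source[i])
    b <- find_root(edges$feature_target[i])
    if (a != b) {
      parent[[b]] <- a
    }
  }

  roots <- vapply(nodes, find_root, character(1))
  data.frame(
    feature_id = nodes,
    component_id = match(roots, unique(roots)),
    stringsAsFactors = FALSE
  )
}

edges <- data.frame(
  feature_source = c("a", "b", "d"),
  feature_target = c("b", "c", "e"),
  stringsAsFactors = FALSE
)
comps <- components_sketch(edges)
comps
```

Features `a`, `b`, and `c` end up in one component and `d`, `e` in another. For real networks igraph is the better choice, since it is far faster on large edge lists.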
create_components(
  input = get_params(step = "create_components")$files$networks$spectral$edges$prepared,
  output = get_params(step = "create_components")$files$networks$spectral$components$raw
)
input |
Character vector of file path(s) containing edge data. Files should have feature_source and feature_target columns. |
output |
Character string path for the output components file |
Character string path to the created components file
Other workflow:
change_params_small(),
create_edges(),
create_edges_spectra(),
go_to_cache(),
install(),
install_tima(),
run_app(),
run_tima(),
tima_full(),
validate_inputs()
## Not run:
copy_backbone()
go_to_cache()
github <- "https://raw.githubusercontent.com/"
repo <- "taxonomicallyinformedannotation/tima-example-files/main/"
data_interim <- "data/interim/"
dir <- paste0(github, repo, data_interim)
get_file(
  url = paste0(dir, "features/example_edges.tsv"),
  export = get_params(step = "create_components")$files$networks$spectral$edges$prepared
)
create_components()
unlink("data", recursive = TRUE)
## End(Not run)
Calculates pairwise spectral similarity between all spectra to create a network edge list.
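The all-against-all loop behind such an edge list can be sketched as follows. Everything here is illustrative: `toy_cosine` is a deliberately simple nearest-peak cosine (not the entropy, GNPS, or cosine implementation used by the package, and without one-to-one peak assignment), and `pairwise_edges_sketch` is a hypothetical helper that only mimics the documented output columns.

```r
# Toy similarity: match each query peak to the nearest target peak within
# `dalton`, then compute a cosine over the matched intensities.
toy_cosine <- function(x, y, dalton) {
  d <- abs(outer(x[, "mz"], y[, "mz"], "-"))
  j <- apply(d, 1, which.min)
  ok <- d[cbind(seq_along(j), j)] <= dalton
  dot <- sum(x[ok, "intensity"] * y[j[ok], "intensity"])
  dot / sqrt(sum(x[, "intensity"]^2) * sum(y[, "intensity"]^2))
}

# Compare every spectrum with every later one; keep pairs above threshold.
pairwise_edges_sketch <- function(spectra, dalton, threshold) {
  n <- length(spectra)
  rows <- list()
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      score <- toy_cosine(spectra[[i]], spectra[[j]], dalton)
      if (score >= threshold) {
        rows[[length(rows) + 1]] <- data.frame(
          feature_id = i,
          target_id = j,
          score = score
        )
      }
    }
  }
  if (length(rows) == 0) {
    # Mirror the documented behaviour: a single row of NA values.
    return(data.frame(
      feature_id = NA_integer_,
      target_id = NA_integer_,
      score = NA_real_
    ))
  }
  do.call(rbind, rows)
}

spectra <- list(
  cbind(mz = c(50, 100), intensity = c(1, 1)),
  cbind(mz = c(50, 100), intensity = c(1, 1)),
  cbind(mz = c(70, 130), intensity = c(1, 1))
)
edges <- pairwise_edges_sketch(spectra, dalton = 0.02, threshold = 0.7)
edges
```

The number of comparisons grows quadratically with the number of spectra, which is why thresholds on score and matched peaks matter for keeping the resulting network manageable.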
create_edges(
  frags,
  nspecs,
  precs,
  method,
  ms2_tolerance,
  ppm_tolerance,
  threshold,
  matched_peaks
)
frags |
List of aligned fragment spectra matrices |
nspecs |
Integer number of spectra |
precs |
Numeric vector of precursor m/z values |
method |
Similarity method ("entropy", "gnps", or "cosine") |
ms2_tolerance |
MS2 tolerance in Daltons |
ppm_tolerance |
PPM tolerance |
threshold |
Minimum similarity score threshold |
matched_peaks |
Minimum number of matched peaks required |
Data frame with columns: feature_id, target_id, score, matched_peaks. Returns empty data frame with NA values if no edges pass thresholds.
Other workflow:
change_params_small(),
create_components(),
create_edges_spectra(),
go_to_cache(),
install(),
install_tima(),
run_app(),
run_tima(),
tima_full(),
validate_inputs()
## Not run:
edges <- create_edges(
  frags = fragment_list,
  nspecs = length(fragment_list),
  precs = precursor_mz,
  method = "gnps",
  ms2_tolerance = 0.02,
  ppm_tolerance = 10,
  threshold = 0.7,
  matched_peaks = 6
)
## End(Not run)
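The idea of scoring every pair of spectra and keeping only pairs that pass both a similarity threshold and a matched-peak threshold can be sketched in base R. This is a simplified stand-in, not the package's code: it uses a plain cosine score with greedy peak matching rather than the "entropy" or "gnps" methods, it ignores precursor m/z and ppm tolerance, and the helper names `score_pair` and `build_edges` are hypothetical. It assumes at least two spectra, each a two-column matrix of m/z and intensity.

```r
# Hypothetical helper: cosine-like score between two spectra (m/z, intensity),
# greedily matching each peak of `x` to the closest unused peak of `y`.
score_pair <- function(x, y, tol = 0.02) {
  matched <- 0L
  dot <- 0
  used <- rep(FALSE, nrow(y))
  for (i in seq_len(nrow(x))) {
    delta <- abs(y[, 1] - x[i, 1])
    delta[used] <- Inf
    j <- which.min(delta)
    if (length(j) == 1L && delta[j] <= tol) {
      used[j] <- TRUE
      matched <- matched + 1L
      dot <- dot + x[i, 2] * y[j, 2]
    }
  }
  norm <- sqrt(sum(x[, 2]^2)) * sqrt(sum(y[, 2]^2))
  list(score = dot / norm, matched_peaks = matched)
}

# Hypothetical helper: score all pairs, then keep those passing both thresholds.
build_edges <- function(spectra, threshold = 0.7, min_matched = 2L, tol = 0.02) {
  pairs <- utils::combn(seq_along(spectra), 2L)
  rows <- lapply(seq_len(ncol(pairs)), function(k) {
    res <- score_pair(spectra[[pairs[1, k]]], spectra[[pairs[2, k]]], tol)
    data.frame(
      feature_id = pairs[1, k],
      target_id = pairs[2, k],
      score = res$score,
      matched_peaks = res$matched_peaks
    )
  })
  edges <- do.call(rbind, rows)
  keep <- edges$score >= threshold & edges$matched_peaks >= min_matched
  edges[keep, , drop = FALSE]
}

s1 <- cbind(c(100, 150, 200), c(10, 50, 100))
s2 <- cbind(c(100.005, 150.005, 200.005), c(10, 50, 100))
s3 <- cbind(c(300, 400), c(1, 1))
edges <- build_edges(list(s1, s2, s3))
print(edges)
```

Only the pair (1, 2) survives here: its three peaks all match within 0.02 Da, whereas the third spectrum shares no peaks with the others.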
This function creates molecular network edges based on MS2 fragmentation spectra similarity. Compares all spectra against each other using spectral similarity metrics to identify related features.
create_edges_spectra(
  input = get_params(step = "create_edges_spectra")$files$spectral$raw,
  output = get_params(step = "create_edges_spectra")$files$networks$spectral$edges$raw$spectral,
  name_source = get_params(step = "create_edges_spectra")$names$source,
  name_target = get_params(step = "create_edges_spectra")$names$target,
  method = get_params(step = "create_edges_spectra")$similarities$methods$edges,
  threshold = get_params(step = "create_edges_spectra")$similarities$thresholds$edges,
  matched_peaks = get_params(step = "create_edges_spectra")$similarities$thresholds$matched_peaks,
  ppm = get_params(step = "create_edges_spectra")$ms$tolerances$mass$ppm$ms2,
  dalton = get_params(step = "create_edges_spectra")$ms$tolerances$mass$dalton$ms2,
  cutoff = get_params(step = "create_edges_spectra")$ms$thresholds$ms2$intensity,
  min_fragments = get_params(step = "create_edges_spectra")$ms$thresholds$ms2$min_fragments,
  qutoff = deprecated()
)
input |
character Path or list of paths to query MGF file(s) containing spectra |
output |
character Path for output edges file |
name_source |
character Name of source feature column |
name_target |
character Name of target feature column |
method |
character Similarity method to use |
threshold |
numeric Minimum similarity threshold (0-1) to report edge |
matched_peaks |
integer Minimum number of matched peaks required |
ppm |
numeric Relative mass tolerance in ppm |
dalton |
numeric Absolute mass tolerance in Daltons |
cutoff |
numeric Intensity cutoff below which MS2 fragments are removed. Non-negative numeric or NULL for dynamic thresholding. |
min_fragments |
integer Minimum number of fragment peaks a spectrum must have after cleaning to be retained |
qutoff |
Deprecated argument. |
Character string path to the created spectral edges file
Other workflow:
change_params_small(),
create_components(),
create_edges(),
go_to_cache(),
install(),
install_tima(),
run_app(),
run_tima(),
tima_full(),
validate_inputs()
## Not run:
copy_backbone()
go_to_cache()
get_file(
  url = get_default_paths()$urls$examples$spectra_mini,
  export = get_params(step = "create_edges_spectra")$files$spectral$raw
)
create_edges_spectra()
unlink("data", recursive = TRUE)
## End(Not run)
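The roles of `cutoff` and `min_fragments` can be illustrated with a small base R sketch: fragments below the intensity cutoff are removed, and spectra left with too few fragments are dropped. This is an assumption-laden illustration, not the package's cleaning code; the helper name `clean_spectra` is hypothetical, spectra are assumed to be two-column matrices (m/z, intensity), and the dynamic thresholding used when `cutoff` is NULL is not reproduced.

```r
# Hypothetical helper: apply a fixed intensity cutoff, then keep only spectra
# that still have at least `min_fragments` peaks.
clean_spectra <- function(spectra, cutoff = 0, min_fragments = 2L) {
  cleaned <- lapply(spectra, function(peaks) {
    peaks[peaks[, 2] >= cutoff, , drop = FALSE]
  })
  keep <- vapply(cleaned, nrow, integer(1)) >= min_fragments
  cleaned[keep]
}

spectra <- list(
  a = cbind(c(50, 60, 70), c(5, 100, 200)),
  b = cbind(c(80, 90), c(1, 300))
)
out <- clean_spectra(spectra, cutoff = 10, min_fragments = 2L)
print(names(out))
```

Spectrum `a` keeps two fragments and is retained; spectrum `b` keeps only one and is dropped.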
This function filters initial annotations by removing MS1-only annotations that also have quality spectral matches (gated on similarity and matched peaks), and joins retention time library data when available. RT deltas are computed but no hard cutoff is applied; the downstream scoring system uses a sigmoid penalty to handle RT deviations gracefully.
filter_annotations(
  annotations = get_params(step = "filter_annotations")$files$annotations$prepared$structural,
  features = get_params(step = "filter_annotations")$files$features$prepared,
  rts = get_params(step = "filter_annotations")$files$libraries$temporal$prepared,
  output = get_params(step = "filter_annotations")$files$annotations$filtered,
  tolerance_rt = get_params(step = "filter_annotations")$ms$tolerances$rt$library
)
annotations |
Character vector or list of paths to prepared annotation files |
features |
Character string path to prepared features file.
Must contain a |
rts |
Character string path to prepared retention time library (optional) |
output |
Character string path for filtered annotations output |
tolerance_rt |
Numeric RT tolerance in minutes (used for deduplication of multiple RT library matches; no hard cutoff is applied) |
Character string path to the filtered annotations file
Other annotation:
annotate_masses(),
annotate_spectra(),
weight_annotations(),
write_mztab()
## Not run:
copy_backbone()
go_to_cache()
github <- "https://raw.githubusercontent.com/"
repo <- "taxonomicallyinformedannotation/tima-example-files/main/"
dir <- paste0(github, repo)
# Strip the ".gz" suffix so the paths match the uncompressed example files
ann <- get_params(step = "filter_annotations")$files$annotations$prepared$structural[[2L]] |>
  gsub(pattern = ".gz", replacement = "", fixed = TRUE)
features <- get_params(step = "filter_annotations")$files$features$prepared |>
  gsub(pattern = ".gz", replacement = "", fixed = TRUE)
rts <- get_params(step = "filter_annotations")$files$libraries$temporal$prepared |>
  gsub(pattern = ".gz", replacement = "", fixed = TRUE)
get_file(url = paste0(dir, ann), export = ann)
get_file(url = paste0(dir, features), export = features)
get_file(url = paste0(dir, rts), export = rts)
filter_annotations(
  annotations = ann,
  features = features,
  rts = rts
)
unlink("data", recursive = TRUE)
## End(Not run)
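The filtering rule described for filter_annotations(), dropping MS1-only annotations for features that also have a quality spectral match, can be sketched as follows. The column names (`feature_id`, `similarity`, `matched_peaks`), the thresholds, and the helper name `drop_redundant_ms1` are all assumptions for this illustration; an MS1-only annotation is represented here by a missing similarity. The retention time join is not shown.

```r
# Hypothetical helper: remove MS1-only rows for features that have at least one
# spectral match passing both the similarity and matched-peaks gates.
drop_redundant_ms1 <- function(annotations,
                               min_similarity = 0.7,
                               min_matched_peaks = 6L) {
  is_ms1_only <- is.na(annotations$similarity)
  is_quality <- !is_ms1_only &
    annotations$similarity >= min_similarity &
    annotations$matched_peaks >= min_matched_peaks
  features_with_quality <- unique(annotations$feature_id[which(is_quality)])
  drop <- is_ms1_only & annotations$feature_id %in% features_with_quality
  annotations[!drop, , drop = FALSE]
}

ann <- data.frame(
  feature_id = c(1, 1, 2, 2, 3),
  similarity = c(NA, 0.9, NA, 0.5, NA),
  matched_peaks = c(NA, 8, NA, 8, NA)
)
out <- drop_redundant_ms1(ann)
print(out)
```

Feature 1 loses its MS1-only row because it has a spectral match above both gates. Feature 2 keeps its MS1-only row because its spectral match scores below the similarity gate, and feature 3 has no spectral match at all.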
Fetches mappings from the Bioregistry for a set of Wikidata property IDs,
queries QLever for compound identifiers, and returns a tidy long data.frame
with one row per InChIKey × database combination, including Wikidata QIDs.
Results are cached to disk; the query is only re-run when the cached file
is older than max_age_hours (default 24 h) or does not exist.
get_compounds_xrefs(
  props = c("P231", "P592", "P661", "P662", "P665", "P683", "P715", "P2057", "P2063", "P2877", "P8691"),
  bioregistry_url = paste0(
    "https://raw.githubusercontent.com/",
    "biopragmatics/bioregistry/refs/heads/main/",
    "src/bioregistry/data/bioregistry.json"
  ),
  qlever_url = "https://qlever.cs.uni-freiburg.de/api/wikidata",
  max_age_hours = 24,
  output = get_default_paths()$data$interim$xrefs$compounds
)
props |
Character vector of Wikidata property IDs (without |
bioregistry_url |
URL to the bulk bioregistry JSON. Defaults to the canonical GitHub raw URL. |
qlever_url |
QLever SPARQL endpoint URL. |
max_age_hours |
Numeric maximum age (in hours) of the cached file before it is refreshed. Default 24. |
output |
Character file path for the cached result. When used inside
a targets pipeline with |
Character path to the exported file (invisibly), for
targets format = "file" compatibility.
Other data-retrieval:
get_example_files(),
get_file(),
get_gnps_tables(),
get_last_version_from_zenodo(),
get_organism_taxonomy_ott()
## Not run:
props <- c("P231", "P592", "P683", "P715")
result_path <- get_compounds_xrefs(props)
utils::head(tidytable::fread(result_path))
## End(Not run)
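The caching rule (re-run only when the cached file is missing or older than `max_age_hours`) amounts to a modification-time check. A minimal base R sketch, with the hypothetical helper name `cache_is_stale`, might look like this; it is not taken from the package source.

```r
# Hypothetical helper: TRUE when the cached file is missing or older than
# `max_age_hours`.
cache_is_stale <- function(path, max_age_hours = 24) {
  if (!file.exists(path)) {
    return(TRUE)
  }
  age_hours <- as.numeric(
    difftime(Sys.time(), file.mtime(path), units = "hours")
  )
  age_hours > max_age_hours
}

tmp <- tempfile(fileext = ".tsv")
missing_is_stale <- cache_is_stale(tmp)
writeLines("inchikey\tdatabase", tmp)
fresh_is_stale <- cache_is_stale(tmp)
# Backdate the file by 48 hours to simulate an old cache.
Sys.setFileTime(tmp, Sys.time() - 48 * 3600)
old_is_stale <- cache_is_stale(tmp)
```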
This function downloads example data files for testing and demonstration purposes. Supports downloading features, metadata, SIRIUS annotations, mass spectra, and spectral libraries with retention times.
get_example_files(
  example = c("features", "metadata", "sirius", "spectra"),
  in_cache = TRUE
)
example |
Character vector specifying which example files to download. Valid options: "features", "metadata", "sirius", "spectra", "spectral_lib_with_rt" |
in_cache |
Logical whether to store files in the cache directory (default: TRUE) |
NULL (invisibly). Downloads files as a side effect.
Other data-retrieval:
get_compounds_xrefs(),
get_file(),
get_gnps_tables(),
get_last_version_from_zenodo(),
get_organism_taxonomy_ott()
## Not run:
# Download features and metadata examples
get_example_files(example = c("features", "metadata"))

# Download all example files to cache
get_example_files(
  example = c("features", "metadata", "sirius", "spectra"),
  in_cache = TRUE
)
## End(Not run)
This function downloads example SIRIUS annotation files for testing and demonstration purposes. Downloads both SIRIUS v5 and v6 format files.
get_example_sirius(
  url = get_default_paths()$urls$examples$sirius,
  export = get_default_paths()$data$interim$annotations$example_sirius
)
url |
list URLs for SIRIUS examples (must have $v5 and $v6 elements) |
export |
list Export paths for SIRIUS examples (must have $v5 and $v6 elements) |
NULL (invisibly). Downloads files as a side effect.
## Not run:
get_example_sirius()
## End(Not run)
Downloads a file from a URL with robust error handling, retry logic, and validation. Automatically creates necessary directories and validates downloaded content. Skips download if file already exists.
get_file(url, export, limit = 3600L)
url |
character URL of the file to download |
export |
character File path where the file should be saved |
limit |
integer Timeout limit in seconds (default: 3600 = 1 hour) |
Path to the downloaded file (invisibly)
Other data-retrieval:
get_compounds_xrefs(),
get_example_files(),
get_gnps_tables(),
get_last_version_from_zenodo(),
get_organism_taxonomy_ott()
## Not run:
get_file(
  url = "https://example.com/data.tsv",
  export = "data/source/data.tsv"
)
## End(Not run)
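The behaviour described for get_file() (skip when the file exists, create directories, retry on failure, validate the result) follows a common pattern that can be sketched in base R. The helper name `fetch_once`, the `downloader` and `attempts` arguments, and the "non-empty file" validation are assumptions for this sketch and do not reflect the package's actual retry policy. A fake downloader stands in for the network so the example runs offline.

```r
# Hypothetical helper: download with skip-if-exists, directory creation,
# a bounded number of attempts, and a basic non-empty-file check.
fetch_once <- function(url, export,
                       downloader = utils::download.file,
                       attempts = 3L) {
  if (file.exists(export)) {
    return(invisible(export)) # nothing to do
  }
  dir.create(dirname(export), recursive = TRUE, showWarnings = FALSE)
  for (i in seq_len(attempts)) {
    ok <- tryCatch(
      {
        downloader(url, export)
        file.exists(export) && file.size(export) > 0
      },
      error = function(e) FALSE
    )
    if (ok) {
      return(invisible(export))
    }
    unlink(export) # remove partial output before retrying
  }
  stop("Download failed after ", attempts, " attempts: ", url)
}

# Fake downloader that fails twice, then succeeds.
calls <- 0L
flaky <- function(url, destfile) {
  calls <<- calls + 1L
  if (calls < 3L) stop("temporary failure")
  writeLines("ok", destfile)
}

target <- file.path(tempfile(), "nested", "data.tsv")
out <- fetch_once("https://example.com/data.tsv", target, downloader = flaky)
calls_after_first <- calls
# A second call is a no-op because the file already exists.
fetch_once("https://example.com/data.tsv", target, downloader = flaky)
```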
This function downloads and retrieves GNPS (Global Natural Products Social Molecular Networking) result tables from a completed job. It fetches features, metadata, spectra, and annotation files from GNPS servers. When a job ID is not provided or GNPS resources are missing, small fake files are written so downstream steps do not fail during testing.
get_gnps_tables(
  gnps_job_id,
  gnps_job_example = get_default_paths()$gnps$example,
  filename = "",
  workflow = "fbmn",
  path_features,
  path_metadata,
  path_spectra,
  path_source = get_default_paths()$data$source$path,
  path_interim_a = get_default_paths()$data$interim$annotations$path,
  path_interim_f = get_default_paths()$data$interim$features$path
)
gnps_job_id |
Character string GNPS job ID (32 characters). Can be NULL or empty string to skip download. |
gnps_job_example |
Character string example GNPS job ID for testing |
filename |
Character string name of the file to download (used for fake outputs) |
workflow |
Character string indicating workflow type: "fbmn" (feature-based) or "classical" molecular networking |
path_features |
Character string path for features output (file path) |
path_metadata |
Character string path for metadata output (file path or list) |
path_spectra |
Character string path for spectra output (file path) |
path_source |
Character string path to store source files |
path_interim_a |
Character string path to store interim annotations |
path_interim_f |
Character string path to store interim features |
A named character vector with paths to the written/available files.
Other data-retrieval:
get_compounds_xrefs(),
get_example_files(),
get_file(),
get_last_version_from_zenodo(),
get_organism_taxonomy_ott()
## Not run:
# Download GNPS FBMN results
# (placeholder job ID; real GNPS job IDs are 32 characters long)
paths <- get_gnps_tables(
  gnps_job_id = "1234567890abcdef",
  workflow = "fbmn",
  path_features = "data/interim/features/features.tsv",
  path_metadata = "data/source/metadata.tsv",
  path_spectra = "data/interim/annotations/spectra.mgf"
)

# Access downloaded files
features <- read.delim(paths[["features"]])
## End(Not run)
Retrieves the latest version of a file from a Zenodo repository record. This function checks the file size and only downloads if the local file is missing or differs from the remote version. Implements robust error handling and retry logic.
get_last_version_from_zenodo(doi, pattern, path, timeout_s = 90)
doi |
Character. Zenodo DOI (e.g., "10.5281/zenodo.5794106") |
pattern |
Character. Pattern to identify the specific file to download |
path |
Character. Local path where the file should be saved |
timeout_s |
Numeric. Metadata request timeout in seconds (default: 90) |
Credit goes partially to https://inbo.github.io/inborutils/
This function:
Validates DOI format and input parameters
Fetches the latest version metadata from Zenodo API
Finds files matching the specified pattern
Compares local and remote file sizes to avoid unnecessary downloads
Downloads only if needed, with retry logic
Creates necessary directories automatically
Character path to the downloaded (or existing) file
Other data-retrieval:
get_compounds_xrefs(),
get_example_files(),
get_file(),
get_gnps_tables(),
get_organism_taxonomy_ott()
## Not run:
# Download LOTUS database from Zenodo
get_last_version_from_zenodo(
  doi = "10.5281/zenodo.5794106",
  pattern = "lotus.csv.gz",
  path = "data/source/libraries/sop/lotus.csv.gz"
)

# The function will skip download if file exists with correct size
get_last_version_from_zenodo(
  doi = "10.5281/zenodo.5794106",
  pattern = "lotus.csv.gz",
  path = "data/source/libraries/sop/lotus.csv.gz"
)
## End(Not run)
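The size comparison that lets get_last_version_from_zenodo() avoid unnecessary downloads can be pictured as a one-line check. In this sketch the helper name `needs_download` is hypothetical, and `remote_size` is assumed to be the byte count reported in the remote metadata; how the package actually reads that value is not shown here.

```r
# Hypothetical helper: download only when the local file is missing or its
# size differs from the size reported remotely (in bytes).
needs_download <- function(path, remote_size) {
  !file.exists(path) || file.size(path) != remote_size
}

tmp <- tempfile()
before <- needs_download(tmp, remote_size = 3)
writeBin(as.raw(1:3), tmp) # write exactly 3 bytes
same_size <- needs_download(tmp, remote_size = 3)
different_size <- needs_download(tmp, remote_size = 4)
```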
This function retrieves taxonomic information from the Open Tree of Life (OTT) taxonomy service. It cleans organism names, queries the OTT API, and returns structured taxonomic data including OTT IDs and hierarchical classifications.
get_organism_taxonomy_ott(
  df,
  url = "https://api.opentreeoflife.org/v3/taxonomy/about",
  retry = TRUE
)
df |
data.frame Organism names in a column named "organism" |
url |
character URL of the OTT API endpoint (default: production API; can be changed for testing) |
retry |
logical Whether to retry failed queries using only the genus name when the full species name fails (default: TRUE) |
Data frame with taxonomic information including OTT IDs, ranks, and taxonomic hierarchy. Returns empty template if API is unavailable.
Other data-retrieval:
get_compounds_xrefs(),
get_example_files(),
get_file(),
get_gnps_tables(),
get_last_version_from_zenodo()
## Not run:
# Single organism
df <- data.frame(organism = "Homo sapiens")
taxonomy <- get_organism_taxonomy_ott(df)

# Multiple organisms
df <- data.frame(organism = c("Homo sapiens", "Arabidopsis thaliana"))
taxonomy <- get_organism_taxonomy_ott(df)
## End(Not run)
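The `retry` behaviour, falling back to the genus when the full name fails, can be illustrated without touching the OTT API. In this sketch `genus_fallback`, `lookup_with_retry`, and the fake `lookup` function are hypothetical; the real function's name cleaning and API handling are likely more involved.

```r
# Hypothetical helper: normalise whitespace and keep only the first word,
# which for a binomial name is the genus.
genus_fallback <- function(organism) {
  cleaned <- trimws(gsub("\\s+", " ", organism))
  sub(" .*$", "", cleaned)
}

# Hypothetical helper: try the full name first, then the genus if allowed.
lookup_with_retry <- function(organism, lookup, retry = TRUE) {
  result <- lookup(organism)
  if (is.null(result) && retry) {
    result <- lookup(genus_fallback(organism))
  }
  result
}

# Fake lookup standing in for the API: it only recognises the genus "Homo".
lookup <- function(name) {
  if (identical(name, "Homo")) "match:Homo" else NULL
}

genera <- genus_fallback(c("Homo sapiens", "  Arabidopsis   thaliana ", "Gentiana"))
with_retry <- lookup_with_retry("Homo sapiens", lookup)
without_retry <- lookup_with_retry("Homo sapiens", lookup, retry = FALSE)
```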
Creates and navigates to a cache directory in the user's home directory. Useful for storing temporary files, intermediate results, and downloaded data in a consistent location across sessions.
go_to_cache(dir = ".tima")
dir |
character Name of the cache directory (default: ".tima"). Created in the user's home directory. Must be non-empty. |
The function:
Constructs full path in user's home directory
Creates directory if it doesn't exist
Changes working directory to cache location
Logs all operations
Cache directory persists across R sessions until explicitly deleted.
Path to cache directory (invisibly). Changes working directory as side effect.
Other workflow:
change_params_small(),
create_components(),
create_edges(),
create_edges_spectra(),
install(),
install_tima(),
run_app(),
run_tima(),
tima_full(),
validate_inputs()
## Not run:
# Default cache (~/.tima)
go_to_cache()

# Custom cache
go_to_cache(dir = ".my_cache")

# Store path
cache_path <- go_to_cache()
## End(Not run)
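The steps listed for go_to_cache() (build the path, create the directory, change the working directory) can be sketched in a few lines of base R. The helper name `enter_cache` and its `home` argument are assumptions added so the example can run against a temporary directory instead of the real home directory; logging is omitted.

```r
# Hypothetical helper: create a cache directory under `home` and move into it.
enter_cache <- function(dir = ".tima", home = path.expand("~")) {
  stopifnot(is.character(dir), length(dir) == 1L, nzchar(dir))
  cache <- file.path(home, dir)
  dir.create(cache, recursive = TRUE, showWarnings = FALSE)
  setwd(cache)
  invisible(cache)
}

old_wd <- getwd()
fake_home <- tempfile("home_")
dir.create(fake_home)
cache_path <- enter_cache(".demo_cache", home = fake_home)
in_cache <- normalizePath(getwd()) == normalizePath(cache_path)
setwd(old_wd) # restore the original working directory
```

Because the working directory is global state, callers that only need the cache temporarily should restore the previous directory, as the example does.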
Standardizes adduct notations in a dataframe by replacing various forms with canonical representations. Uses a translation table for efficient batch replacement.
harmonize_adducts(df, adducts_colname = "adduct", adducts_translations)
df |
Data frame or tibble containing adduct column |
adducts_colname |
Character string name of the adduct column (default: "adduct") |
adducts_translations |
Named character vector mapping original adduct notations (names) to standardized forms (values). If missing, returns dataframe unchanged. |
Common adduct variations like "M+H", "[M+H]", and "(M+H)+" are standardized to a consistent format (e.g., "[M+H]+"). This ensures compatibility across different MS tools and databases.
Data frame with harmonized adduct column
Other mass-spectrometry:
calculate_mass_of_m(),
calculate_mz_from_mass(),
calculate_similarity(),
import_spectra(),
parse_adduct()
## Not run:
df <- data.frame(adduct = c("M+H", "[M+Na]+", "(M-H)-"))
translations <- c("M+H" = "[M+H]+", "(M-H)-" = "[M-H]-")
harmonize_adducts(df, adducts_translations = translations)
## End(Not run)
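A translation table of this kind is simply a named character vector used as a lookup. The sketch below shows the idea with exact whole-string matching; whether harmonize_adducts() matches whole strings or substrings is not stated above, so treat that choice, and the helper name `translate_adducts`, as assumptions.

```r
# Hypothetical helper: replace values found in the names of
# `adducts_translations` with the corresponding canonical form.
translate_adducts <- function(df, adducts_colname = "adduct", adducts_translations) {
  if (missing(adducts_translations)) {
    return(df) # nothing to translate
  }
  values <- as.character(df[[adducts_colname]])
  hit <- values %in% names(adducts_translations)
  values[hit] <- unname(adducts_translations[values[hit]])
  df[[adducts_colname]] <- values
  df
}

df <- data.frame(adduct = c("M+H", "[M+Na]+", "(M-H)-"), stringsAsFactors = FALSE)
translations <- c("M+H" = "[M+H]+", "(M-H)-" = "[M-H]-")
harmonized <- translate_adducts(df, adducts_translations = translations)
unchanged <- translate_adducts(df)
```

Values without an entry in the table, such as "[M+Na]+" here, pass through untouched.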
This function imports mass spectra from various file formats (.mgf, .msp, .rds), harmonizes metadata field names, filters by MS level and polarity, optionally combines replicate spectra, and sanitizes peak data.
import_spectra(
  file,
  cutoff = NULL,
  dalton = 0.01,
  min_fragments = 1L,
  polarity = NA,
  ppm = 10,
  sanitize = TRUE,
  combine = TRUE
)
file |
Character string path to the spectrum file (.mgf, .msp, or .rds) |
cutoff |
Numeric absolute minimal intensity threshold (default: NULL) |
dalton |
Numeric Dalton tolerance for peak matching (default: 0.01) |
min_fragments |
Integer minimum number of fragment peaks required to keep a spectrum after sanitization (default: 1) |
polarity |
Character string for polarity filtering: "pos", "neg", or NA to keep all (default: NA) |
ppm |
Numeric PPM tolerance for peak matching (default: 10) |
sanitize |
Logical flag indicating whether to sanitize spectra (default: TRUE) |
combine |
Logical flag indicating whether to combine replicate spectra (default: TRUE) |
Spectra object containing the imported and processed spectra
Other mass-spectrometry:
calculate_mass_of_m(),
calculate_mz_from_mass(),
calculate_similarity(),
harmonize_adducts(),
parse_adduct()
## Not run:
get_file(
  url = get_default_paths()$urls$examples$spectra_mini,
  export = get_default_paths()$data$source$spectra
)
import_spectra(file = get_default_paths()$data$source$spectra)
import_spectra(
  file = get_default_paths()$data$source$spectra,
  sanitize = FALSE
)
## End(Not run)
DEPRECATED: Use install_tima() instead. install() will be removed in a future version, because the generic name install risks masking functions from other packages.
install(
  package = "tima",
  repos = c(
    "https://taxonomicallyinformedannotation.r-universe.dev",
    "https://bioconductor.org/packages/release/bioc",
    "https://cloud.r-project.org"
  ),
  dependencies = TRUE
)
package |
character Name of the package (default: "tima") |
repos |
character Vector of repository URLs |
dependencies |
logical Whether to install dependencies (default: TRUE) |
NULL (invisibly).
Other workflow:
change_params_small(),
create_components(),
create_edges(),
create_edges_spectra(),
go_to_cache(),
install_tima(),
run_app(),
run_tima(),
tima_full(),
validate_inputs()
## Not run:
# DEPRECATED: use install_tima() instead
install_tima()
## End(Not run)
Installs or updates the TIMA package from r-universe and sets up a Python virtual environment with dependencies.
install_tima(
  package = "tima",
  repos = c(
    "https://taxonomicallyinformedannotation.r-universe.dev",
    "https://bioconductor.org/packages/release/bioc",
    "https://cloud.r-project.org"
  ),
  dependencies = TRUE
)
package |
character Name of the package (default: "tima") |
repos |
character Vector of repository URLs |
dependencies |
logical Whether to install dependencies (default: TRUE) |
NULL (invisibly). Installs packages and sets up Python environment as side effects.
Other workflow:
change_params_small(),
create_components(),
create_edges(),
create_edges_spectra(),
go_to_cache(),
install(),
run_app(),
run_tima(),
tima_full(),
validate_inputs()
## Not run:
install_tima()
## End(Not run)
This function parses mass spectrometry adduct notation strings into their components: multimer count, isotope shift, modifications, charge state, and charge sign. It handles complex adducts with multiple additions/losses.
parse_adduct(adduct_string, regex = ADDUCT_REGEX_PATTERN)
adduct_string |
character Adduct in standard notation (e.g., "[M+H]+", "[2M+Na]+", "[M-H2O+H]+") |
regex |
character Regular expression pattern used for parsing (default: ADDUCT_REGEX_PATTERN from constants) |
Named numeric vector containing:
n_mer |
Integer number of monomers (e.g., 2 for dimer, 1 for monomer) |
n_iso |
Integer isotope shift (e.g., 1 for M+1 isotopologue, 0 for monoisotopic) |
los_add_clu |
Numeric total mass change in Daltons from all modifications |
n_charges |
Integer absolute number of charges (always positive) |
charge |
Integer charge polarity (+1 for positive mode, -1 for negative mode) |
Returns all zeros if parsing fails.
Other mass-spectrometry:
calculate_mass_of_m(),
calculate_mz_from_mass(),
calculate_similarity(),
harmonize_adducts(),
import_spectra()
# Simple adducts
parse_adduct("[M+H]+") # Protonated molecule
parse_adduct("[M-H]-") # Deprotonated molecule
parse_adduct("[M+Na]+") # Sodium adduct

# Complex adducts
parse_adduct("[2M+Na]+") # Dimer with sodium
parse_adduct("[M-H2O+H]+") # Protonated with water loss

## Not run:
# Advanced cases
parse_adduct("[M1+H]+") # M+1 isotopologue
parse_adduct("[2M1-C6H12O6 (hexose)+NaCl+H]2+") # Complex modification
## End(Not run)
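To make the notation concrete, the sketch below pulls the multimer count, isotope shift, charge count, and charge sign out of an adduct string with a single regular expression. It is a deliberately reduced illustration: the pattern is not ADDUCT_REGEX_PATTERN, the helper name `parse_simple_adduct` is hypothetical, and the mass change from modifications (`los_add_clu`) is not computed.

```r
# Hypothetical helper: parse "[<n>M<iso><modifications>]<z><sign>".
# Returns all zeros when the string does not match, mirroring the
# documented failure behaviour.
parse_simple_adduct <- function(adduct_string) {
  pattern <- "^\\[(\\d*)M(\\d*)([^]]*)\\](\\d*)([+-])$"
  parts <- regmatches(adduct_string, regexec(pattern, adduct_string))[[1]]
  if (length(parts) == 0L) {
    return(c(n_mer = 0, n_iso = 0, n_charges = 0, charge = 0))
  }
  # Empty capture groups mean "use the default".
  as_count <- function(x, default) if (nzchar(x)) as.numeric(x) else default
  c(
    n_mer = as_count(parts[2], 1),
    n_iso = as_count(parts[3], 0),
    n_charges = as_count(parts[5], 1),
    charge = if (parts[6] == "+") 1 else -1
  )
}

dimer <- parse_simple_adduct("[2M+Na]+")
deprotonated <- parse_simple_adduct("[M-H]-")
isotopologue <- parse_simple_adduct("[M1+H]+")
doubly_charged <- parse_simple_adduct("[M+2H]2+")
invalid <- parse_simple_adduct("not an adduct")
```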
This function prepares GNPS spectral library matching results by standardizing column names, integrating structure metadata, and formatting for downstream TIMA annotation workflows.
prepare_annotations_gnps(
  input = get_params(step = "prepare_annotations_gnps")$files$annotations$raw$spectral$gnps,
  output = get_params(step = "prepare_annotations_gnps")$files$annotations$prepared$structural$gnps,
  str_stereo = get_params(step = "prepare_annotations_gnps")$files$libraries$sop$merged$structures$stereo,
  str_met = get_params(step = "prepare_annotations_gnps")$files$libraries$sop$merged$structures$metadata,
  str_tax_cla = get_params(step = "prepare_annotations_gnps")$files$libraries$sop$merged$structures$taxonomies$cla,
  str_tax_npc = get_params(step = "prepare_annotations_gnps")$files$libraries$sop$merged$structures$taxonomies$npc
)
input |
character Path or vector of paths to GNPS annotation files |
output |
character Path for prepared GNPS annotations output |
str_stereo |
character Path to structures stereochemistry file |
str_met |
character Path to structures metadata file |
str_tax_cla |
character Path to ClassyFire taxonomy file |
str_tax_npc |
character Path to NPClassifier taxonomy file |
Character string path to prepared GNPS annotations
Other preparation:
prepare_annotations_mzmine(),
prepare_annotations_mztab(),
prepare_annotations_sirius(),
prepare_annotations_spectra(),
prepare_features_components(),
prepare_features_edges(),
prepare_features_tables(),
prepare_libraries_rt(),
prepare_libraries_sop_bigg(),
prepare_libraries_sop_closed(),
prepare_libraries_sop_ecmdb(),
prepare_libraries_sop_hmdb(),
prepare_libraries_sop_lotus(),
prepare_libraries_sop_merged(),
prepare_libraries_sop_pubchemlite(),
prepare_libraries_spectra(),
prepare_params(),
prepare_taxa(),
read_mztab()
## Not run:
copy_backbone()
go_to_cache()
prepare_annotations_gnps()
unlink("data", recursive = TRUE)
## End(Not run)
This function prepares mzmine spectral library matching results by standardizing column names, integrating structure metadata, and formatting for downstream TIMA annotation workflows.
prepare_annotations_mzmine(
  input = get_params(step = "prepare_annotations_mzmine")$files$annotations$raw$spectral$mzmine,
  output = get_params(step = "prepare_annotations_mzmine")$files$annotations$prepared$structural$mzmine,
  str_stereo = get_params(step = "prepare_annotations_mzmine")$files$libraries$sop$merged$structures$stereo,
  str_met = get_params(step = "prepare_annotations_mzmine")$files$libraries$sop$merged$structures$metadata,
  str_tax_cla = get_params(step = "prepare_annotations_mzmine")$files$libraries$sop$merged$structures$taxonomies$cla,
  str_tax_npc = get_params(step = "prepare_annotations_mzmine")$files$libraries$sop$merged$structures$taxonomies$npc
)
input |
character Character string or vector of paths to mzmine annotation files |
output |
character Character string path for prepared mzmine annotations output |
str_stereo |
character Character string path to structures stereochemistry file |
str_met |
character Character string path to structures metadata file |
str_tax_cla |
character Character string path to ClassyFire taxonomy file |
str_tax_npc |
character Character string path to NPClassifier taxonomy file |
Character string path to prepared mzmine annotations
Other preparation:
prepare_annotations_gnps(),
prepare_annotations_mztab(),
prepare_annotations_sirius(),
prepare_annotations_spectra(),
prepare_features_components(),
prepare_features_edges(),
prepare_features_tables(),
prepare_libraries_rt(),
prepare_libraries_sop_bigg(),
prepare_libraries_sop_closed(),
prepare_libraries_sop_ecmdb(),
prepare_libraries_sop_hmdb(),
prepare_libraries_sop_lotus(),
prepare_libraries_sop_merged(),
prepare_libraries_sop_pubchemlite(),
prepare_libraries_spectra(),
prepare_params(),
prepare_taxa(),
read_mztab()
## Not run:
copy_backbone()
go_to_cache()
prepare_annotations_mzmine()
unlink("data", recursive = TRUE)
## End(Not run)
Extracts structural annotations from mzTab-M tables and standardizes them for TIMA weighting and filtering steps.
prepare_annotations_mztab(
  input = get_params(step = "prepare_annotations_mztab")$files$mztab$raw,
  output = get_params(step = "prepare_annotations_mztab")$files$annotations$prepared$structural$mztab,
  str_stereo = get_params(step = "prepare_annotations_mztab")$files$libraries$sop$merged$structures$stereo,
  str_met = get_params(step = "prepare_annotations_mztab")$files$libraries$sop$merged$structures$metadata,
  str_tax_cla = get_params(step = "prepare_annotations_mztab")$files$libraries$sop$merged$structures$taxonomies$cla,
  str_tax_npc = get_params(step = "prepare_annotations_mztab")$files$libraries$sop$merged$structures$taxonomies$npc,
  strict = FALSE
)
input |
|
output |
|
str_stereo |
|
str_met |
|
str_tax_cla |
|
str_tax_npc |
|
strict |
|
Annotation source priority:
SME (evidence) rows — highest specificity; mapped back to SML IDs
via the SME_ID_REFS column when an SML section is present.
SML (small molecule summary) rows — used when no SME section exists.
SMF (feature) rows — fallback for feature-only files.
When no input is provided (or the file does not exist) an empty annotation
table is written and the function returns silently.
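The source priority above can be sketched in a few lines of base R. This is only an illustration of the selection order, not the package's implementation: `pick_annotation_source()` and the `sections` list (a named list of data frames, one per mzTab-M section) are hypothetical names introduced here.

```r
# Hypothetical helper: return the name of the first non-empty section,
# following the SME > SML > SMF priority described above.
# `sections` is assumed to be a named list of data frames (entries may be missing).
pick_annotation_source <- function(sections) {
  has_rows <- function(x) is.data.frame(x) && nrow(x) > 0L
  for (name in c("SME", "SML", "SMF")) {
    if (has_rows(sections[[name]])) {
      return(name)
    }
  }
  # No usable section: the caller would write an empty annotation table.
  NA_character_
}

# A file with summary and feature rows but no evidence rows.
source_used <- pick_annotation_source(list(
  SML = data.frame(SML_ID = 1:2),
  SMF = data.frame(SMF_ID = 1:3)
))
```

A missing or zero-row section is skipped, so a feature-only file falls through to `"SMF"` and an empty input yields `NA`.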
Character path to the prepared annotation file (invisibly when the empty-annotation fallback is used).
Other preparation:
prepare_annotations_gnps(),
prepare_annotations_mzmine(),
prepare_annotations_sirius(),
prepare_annotations_spectra(),
prepare_features_components(),
prepare_features_edges(),
prepare_features_tables(),
prepare_libraries_rt(),
prepare_libraries_sop_bigg(),
prepare_libraries_sop_closed(),
prepare_libraries_sop_ecmdb(),
prepare_libraries_sop_hmdb(),
prepare_libraries_sop_lotus(),
prepare_libraries_sop_merged(),
prepare_libraries_sop_pubchemlite(),
prepare_libraries_spectra(),
prepare_params(),
prepare_taxa(),
read_mztab()
Prepares SIRIUS annotation results (structure predictions, CANOPUS chemical classifications, and formula predictions) by harmonizing formats across SIRIUS versions (v5/v6), standardizing column names, and integrating with structure metadata.
prepare_annotations_sirius(
  input_directory = get_params(step = "prepare_annotations_sirius")$files$annotations$raw$sirius,
  output_ann = get_params(step = "prepare_annotations_sirius")$files$annotations$prepared$structural$sirius,
  output_can = get_params(step = "prepare_annotations_sirius")$files$annotations$prepared$canopus,
  output_for = get_params(step = "prepare_annotations_sirius")$files$annotations$prepared$formula,
  sirius_version = get_params(step = "prepare_annotations_sirius")$tools$sirius$version,
  str_stereo = get_params(step = "prepare_annotations_sirius")$files$libraries$sop$merged$structures$stereo,
  str_met = get_params(step = "prepare_annotations_sirius")$files$libraries$sop$merged$structures$metadata,
  str_tax_cla = get_params(step = "prepare_annotations_sirius")$files$libraries$sop$merged$structures$taxonomies$cla,
  str_tax_npc = get_params(step = "prepare_annotations_sirius")$files$libraries$sop$merged$structures$taxonomies$npc,
  max_analog_abs_mz_error = get_params(step = "prepare_annotations_sirius")$tools$sirius$max_analog_abs_mz_error
)
input_directory |
character Character path to directory or zip file containing SIRIUS results. |
output_ann |
character Character path for prepared structure annotation output. |
output_can |
character Character path for prepared CANOPUS output. |
output_for |
character Character path for prepared formula output. |
sirius_version |
character Character SIRIUS version ("5" or "6"). |
str_stereo |
character Character path to structure stereochemistry file. |
str_met |
character Character path to structure metadata file. |
str_tax_cla |
character Character path to ClassyFire taxonomy file. |
str_tax_npc |
character Character path to NPClassifier taxonomy file. |
max_analog_abs_mz_error |
numeric Maximum allowed absolute m/z deviation (Da) for keeping SIRIUS spectral analog hits. |
This function:
Validates inputs (version, paths, file existence).
Loads SIRIUS output files (CANOPUS, formulas, structures, denovo, spectral matches).
Harmonizes column names across SIRIUS v5 and v6.
Joins with structure metadata (stereochemistry, names, taxonomy).
Splits results into three output files: annotations, CANOPUS, formulas.
Exports parameters and results.
If the input directory does not exist, returns an empty template with expected columns to ensure downstream compatibility.
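As a rough illustration of what max_analog_abs_mz_error controls, the sketch below keeps only analog hits whose absolute m/z deviation is within the threshold. The column names (`query_mz`, `reference_mz`), the helper `filter_analog_hits()`, and the toy values are assumptions made for this example; real SIRIUS exports and the package internals may differ.

```r
# Toy table of spectral analog hits (hypothetical columns and values).
hits <- data.frame(
  feature_id = c("f1", "f2", "f3"),
  query_mz = c(301.1410, 455.2900, 189.0550),
  reference_mz = c(301.1412, 455.3500, 189.0549),
  stringsAsFactors = FALSE
)

# Keep rows whose absolute m/z deviation (Da) does not exceed the threshold.
filter_analog_hits <- function(hits, max_analog_abs_mz_error) {
  error <- abs(hits$query_mz - hits$reference_mz)
  hits[error <= max_analog_abs_mz_error, , drop = FALSE]
}

kept <- filter_analog_hits(hits, max_analog_abs_mz_error = 0.02)
```

Here `f2` deviates by 0.06 Da and is dropped, while `f1` and `f3` are retained.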
Character path to the prepared SIRIUS annotations file (invisible).
Other preparation:
prepare_annotations_gnps(),
prepare_annotations_mzmine(),
prepare_annotations_mztab(),
prepare_annotations_spectra(),
prepare_features_components(),
prepare_features_edges(),
prepare_features_tables(),
prepare_libraries_rt(),
prepare_libraries_sop_bigg(),
prepare_libraries_sop_closed(),
prepare_libraries_sop_ecmdb(),
prepare_libraries_sop_hmdb(),
prepare_libraries_sop_lotus(),
prepare_libraries_sop_merged(),
prepare_libraries_sop_pubchemlite(),
prepare_libraries_spectra(),
prepare_params(),
prepare_taxa(),
read_mztab()
## Not run:
copy_backbone()
go_to_cache()
prepare_annotations_sirius()
unlink("data", recursive = TRUE)
## End(Not run)
This function prepares MS2 spectral library matching results by standardizing column names, integrating structure metadata, and formatting for downstream TIMA annotation workflows. Handles various spectral matching result formats.
prepare_annotations_spectra(
  input = get_params(step = "prepare_annotations_spectra")$files$annotations$raw$spectral$spectral,
  output = get_params(step = "prepare_annotations_spectra")$files$annotations$prepared$structural$spectral,
  str_stereo = get_params(step = "prepare_annotations_spectra")$files$libraries$sop$merged$structures$stereo,
  str_met = get_params(step = "prepare_annotations_spectra")$files$libraries$sop$merged$structures$metadata,
  str_tax_cla = get_params(step = "prepare_annotations_spectra")$files$libraries$sop$merged$structures$taxonomies$cla,
  str_tax_npc = get_params(step = "prepare_annotations_spectra")$files$libraries$sop$merged$structures$taxonomies$npc
)
input |
character Character string path to spectral matching results file |
output |
character Character string path for prepared spectral annotations output |
str_stereo |
character Character string path to structures stereochemistry file |
str_met |
character Character string path to structures metadata file |
str_tax_cla |
character Character string path to ClassyFire taxonomy file |
str_tax_npc |
character Character string path to NPClassifier taxonomy file |
Character string path to prepared spectral annotations
Other preparation:
prepare_annotations_gnps(),
prepare_annotations_mzmine(),
prepare_annotations_mztab(),
prepare_annotations_sirius(),
prepare_features_components(),
prepare_features_edges(),
prepare_features_tables(),
prepare_libraries_rt(),
prepare_libraries_sop_bigg(),
prepare_libraries_sop_closed(),
prepare_libraries_sop_ecmdb(),
prepare_libraries_sop_hmdb(),
prepare_libraries_sop_lotus(),
prepare_libraries_sop_merged(),
prepare_libraries_sop_pubchemlite(),
prepare_libraries_spectra(),
prepare_params(),
prepare_taxa(),
read_mztab()
## Not run:
copy_backbone()
go_to_cache()
github <- "https://raw.githubusercontent.com/"
repo <- "taxonomicallyinformedannotation/tima-example-files/main/"
data_interim <- "data/interim/"
dir <- paste0(github, repo)
input <- get_params(step = "prepare_annotations_spectra")$files$annotations$raw$spectral$spectral |>
  gsub(pattern = ".tsv.gz", replacement = "_pos.tsv", fixed = TRUE)
get_file(url = paste0(dir, input), export = input)
dir <- paste0(dir, data_interim)
prepare_annotations_spectra(
  input = input,
  str_stereo = paste0(dir, "libraries/sop/merged/structures/stereo.tsv"),
  str_met = paste0(dir, "libraries/sop/merged/structures/metadata.tsv"),
  str_tax_cla = paste0(dir, "libraries/sop/merged/structures/taxonomies/classyfire.tsv"),
  str_tax_npc = paste0(dir, "libraries/sop/merged/structures/taxonomies/npc.tsv")
)
unlink("data", recursive = TRUE)
## End(Not run)
This function prepares molecular network component (cluster) assignments by loading, standardizing, and formatting component IDs for each feature. Components represent groups of related features in the molecular network.
prepare_features_components(
  input = get_params(step = "prepare_features_components")$files$networks$spectral$components$raw,
  output = get_params(step = "prepare_features_components")$files$networks$spectral$components$prepared
)
input |
character Character vector of paths to input component files. Can be a single file or multiple files that will be combined. |
output |
character Character string path where prepared components should be saved |
Character string path to the prepared features' components file
Other preparation:
prepare_annotations_gnps(),
prepare_annotations_mzmine(),
prepare_annotations_mztab(),
prepare_annotations_sirius(),
prepare_annotations_spectra(),
prepare_features_edges(),
prepare_features_tables(),
prepare_libraries_rt(),
prepare_libraries_sop_bigg(),
prepare_libraries_sop_closed(),
prepare_libraries_sop_ecmdb(),
prepare_libraries_sop_hmdb(),
prepare_libraries_sop_lotus(),
prepare_libraries_sop_merged(),
prepare_libraries_sop_pubchemlite(),
prepare_libraries_spectra(),
prepare_params(),
prepare_taxa(),
read_mztab()
## Not run:
copy_backbone()
go_to_cache()
github <- "https://raw.githubusercontent.com/"
repo <- "taxonomicallyinformedannotation/tima-example-files/main/"
dir <- paste0(github, repo)
input <- get_params(step = "prepare_features_components")$files$networks$spectral$components$raw
get_file(url = paste0(dir, input), export = input)
prepare_features_components(input = input)
unlink("data", recursive = TRUE)
## End(Not run)
This function prepares molecular network edges by combining MS1-based and spectral similarity edges, adding entropy information, and standardizing column names. Edges represent relationships between features in the molecular network.
prepare_features_edges(
  input = get_params(step = "prepare_features_edges")$files$networks$spectral$edges$raw,
  output = get_params(step = "prepare_features_edges")$files$networks$spectral$edges$prepared,
  name_source = get_params(step = "prepare_features_edges")$names$source,
  name_target = get_params(step = "prepare_features_edges")$names$target
)
input |
list Named list containing paths to edge files. Must have "ms1" and "spectral" elements pointing to respective edge files. |
output |
character Character string path where prepared edges should be saved |
name_source |
character Character string name of the source feature column in input files |
name_target |
character Character string name of the target feature column in input files |
Character string path to the prepared edges file
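The combination of MS1-based and spectral edges can be pictured with the following base R sketch. It is not the package's code: the standardized column names (`feature_source`, `feature_target`), the `score` column, the helper `standardize_edges()`, and the toy tables are all assumptions chosen for illustration.

```r
# Toy edge tables; "ID1"/"ID2" play the role of name_source/name_target.
ms1 <- data.frame(ID1 = c(1, 2), ID2 = c(2, 3))
spectral <- data.frame(ID1 = 1, ID2 = 3, score = 0.82)

# Rename the user-specified source/target columns to fixed names.
standardize_edges <- function(edges, name_source, name_target) {
  names(edges)[names(edges) == name_source] <- "feature_source"
  names(edges)[names(edges) == name_target] <- "feature_target"
  edges
}

ms1 <- standardize_edges(ms1, "ID1", "ID2")
spectral <- standardize_edges(spectral, "ID1", "ID2")

# MS1 edges carry no similarity score in this toy example.
ms1$score <- NA_real_

# Stack both edge sets with identical column order and drop exact duplicates.
edges <- unique(rbind(ms1, spectral[names(ms1)]))
```

The result has one row per edge, regardless of which source it came from, with consistent column names for downstream steps.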
Other preparation:
prepare_annotations_gnps(),
prepare_annotations_mzmine(),
prepare_annotations_mztab(),
prepare_annotations_sirius(),
prepare_annotations_spectra(),
prepare_features_components(),
prepare_features_tables(),
prepare_libraries_rt(),
prepare_libraries_sop_bigg(),
prepare_libraries_sop_closed(),
prepare_libraries_sop_ecmdb(),
prepare_libraries_sop_hmdb(),
prepare_libraries_sop_lotus(),
prepare_libraries_sop_merged(),
prepare_libraries_sop_pubchemlite(),
prepare_libraries_spectra(),
prepare_params(),
prepare_taxa(),
read_mztab()
## Not run:
copy_backbone()
go_to_cache()
github <- "https://raw.githubusercontent.com/"
repo <- "taxonomicallyinformedannotation/tima-example-files/main/"
dir <- paste0(github, repo)
input_1 <- get_params(step = "prepare_features_edges")$files$networks$spectral$edges$raw$ms1
input_2 <- get_params(step = "prepare_features_edges")$files$networks$spectral$edges$raw$spectral
get_file(url = paste0(dir, input_1), export = input_1)
get_file(url = paste0(dir, input_2), export = input_2)
prepare_features_edges(
  input = list("ms1" = input_1, "spectral" = input_2)
)
unlink("data", recursive = TRUE)
## End(Not run)
Prepares LC-MS feature tables by standardizing column names, filtering to top-intensity samples per feature, and formatting for downstream analysis. Supports multiple formats (mzmine, SLAW, SIRIUS).
prepare_features_tables(
  features = get_params(step = "prepare_features_tables")$files$features$raw,
  output = get_params(step = "prepare_features_tables")$files$features$prepared,
  candidates = get_params(step = "prepare_features_tables")$annotations$candidates$samples,
  name_adduct = get_params(step = "prepare_features_tables")$names$adduct,
  name_features = get_params(step = "prepare_features_tables")$names$features,
  name_rt = get_params(step = "prepare_features_tables")$names$rt$features,
  name_mz = get_params(step = "prepare_features_tables")$names$precursor
)
features |
character Path to raw features file (CSV/TSV). |
output |
character Path where prepared features should be saved. |
candidates |
integer Number of top-intensity samples to retain per feature (default: from params; recommended <=5 to balance data size and coverage). |
name_adduct |
character Name of the adduct column in input. |
name_features |
character Name of the feature ID column in input. |
name_rt |
character Name of the retention time column in input. |
name_mz |
character Name of the m/z column in input. |
character(1) Path to the prepared feature table (invisibly).
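To make the effect of candidates concrete, here is a minimal base R sketch of keeping the top-intensity samples per feature. The wide table layout, the helper `top_samples()`, and the choice to ignore zero intensities are assumptions for this example and may not match the package's behaviour.

```r
# Hypothetical wide feature table: one row per feature, one column per sample.
features <- data.frame(
  feature_id = c("f1", "f2"),
  sample_a = c(100, 0),
  sample_b = c(50, 300),
  sample_c = c(900, 20),
  stringsAsFactors = FALSE
)

# For each feature, keep the `candidates` samples with the highest intensity.
top_samples <- function(features, id_col, candidates) {
  sample_cols <- setdiff(names(features), id_col)
  rows <- lapply(seq_len(nrow(features)), function(i) {
    intensities <- unlist(features[i, sample_cols])
    # Assumption: samples where the feature was not detected (0) are ignored.
    detected <- intensities[intensities > 0]
    keep <- head(sort(detected, decreasing = TRUE), candidates)
    data.frame(
      feature_id = rep(features[[id_col]][i], length(keep)),
      sample = names(keep),
      intensity = unname(keep),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

top <- top_samples(features, id_col = "feature_id", candidates = 2)
```

With `candidates = 2`, feature `f1` keeps `sample_c` and `sample_a`, and `f2` keeps `sample_b` and `sample_c`. A small value keeps the long-format table compact while still recording where each feature is most abundant.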
Other preparation:
prepare_annotations_gnps(),
prepare_annotations_mzmine(),
prepare_annotations_mztab(),
prepare_annotations_sirius(),
prepare_annotations_spectra(),
prepare_features_components(),
prepare_features_edges(),
prepare_libraries_rt(),
prepare_libraries_sop_bigg(),
prepare_libraries_sop_closed(),
prepare_libraries_sop_ecmdb(),
prepare_libraries_sop_hmdb(),
prepare_libraries_sop_lotus(),
prepare_libraries_sop_merged(),
prepare_libraries_sop_pubchemlite(),
prepare_libraries_spectra(),
prepare_params(),
prepare_taxa(),
read_mztab()
## Not run:
copy_backbone()
go_to_cache()
get_file(
  url = get_default_paths()$urls$examples$features,
  export = get_params(step = "prepare_features_tables")$files$features$raw
)
prepare_features_tables()
unlink("data", recursive = TRUE)
## End(Not run)
This function prepares retention time libraries by combining experimental and in silico predicted retention times from multiple sources (MGF files, CSV files). It standardizes retention time units, validates structures, and creates both RT libraries and pseudo structure-organism pairs for RT-based annotation.
prepare_libraries_rt(
  mgf_exp = get_params(step = "prepare_libraries_rt")$files$libraries$temporal$exp$mgf,
  mgf_is = get_params(step = "prepare_libraries_rt")$files$libraries$temporal$is$mgf,
  temp_exp = get_params(step = "prepare_libraries_rt")$files$libraries$temporal$exp$csv,
  temp_is = get_params(step = "prepare_libraries_rt")$files$libraries$temporal$is$csv,
  output_rt = get_params(step = "prepare_libraries_rt")$files$libraries$temporal$prepared,
  output_sop = get_params(step = "prepare_libraries_rt")$files$libraries$sop$prepared$rt,
  col_ik = get_params(step = "prepare_libraries_rt")$names$mgf$inchikey,
  col_na = get_params(step = "prepare_libraries_rt")$names$mgf$name,
  col_rt = get_params(step = "prepare_libraries_rt")$names$mgf$retention_time,
  col_sm = get_params(step = "prepare_libraries_rt")$names$mgf$smiles,
  name_inchikey = get_params(step = "prepare_libraries_rt")$names$inchikey,
  name_name = get_params(step = "prepare_libraries_rt")$names$compound_name,
  name_rt = get_params(step = "prepare_libraries_rt")$names$rt$library,
  name_smiles = get_params(step = "prepare_libraries_rt")$names$smiles,
  unit_rt = get_params(step = "prepare_libraries_rt")$units$rt
)
mgf_exp |
character Character vector of paths to MGF files with experimental RT |
mgf_is |
character Character vector of paths to MGF files with in silico predicted RT |
temp_exp |
character Character vector of paths to CSV files with experimental RT |
temp_is |
character Character vector of paths to CSV files with in silico predicted RT |
output_rt |
character Character string path for prepared RT library output |
output_sop |
character Character string path for pseudo SOP output |
col_ik |
character Character string name of InChIKey column in MGF |
col_na |
character Character string name of compound name column in MGF |
col_rt |
character Character string name of retention time column in MGF |
col_sm |
character Character string name of SMILES column in MGF |
name_inchikey |
character Character string name of InChIKey column in CSV |
name_name |
character Character string name of compound name column in CSV |
name_rt |
character Character string name of retention time column in CSV |
name_smiles |
character Character string name of SMILES column in CSV |
unit_rt |
character Character string RT unit: "seconds" or "minutes" |
Character string path to the prepared retention time library
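The unit standardization driven by unit_rt can be sketched as below. The helper `to_minutes()` is hypothetical, and the choice of minutes as the common unit is an assumption made for this example; the documentation above only states that units are standardized.

```r
# Hypothetical helper: express retention times in minutes,
# given the unit they were recorded in.
to_minutes <- function(rt, unit_rt = c("seconds", "minutes")) {
  unit_rt <- match.arg(unit_rt)
  if (unit_rt == "seconds") rt / 60 else rt
}

rt_min <- to_minutes(c(30, 90, 600), unit_rt = "seconds")
```

Using `match.arg()` rejects any unit other than "seconds" or "minutes", which mirrors the two values listed for unit_rt.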
Other preparation:
prepare_annotations_gnps(),
prepare_annotations_mzmine(),
prepare_annotations_mztab(),
prepare_annotations_sirius(),
prepare_annotations_spectra(),
prepare_features_components(),
prepare_features_edges(),
prepare_features_tables(),
prepare_libraries_sop_bigg(),
prepare_libraries_sop_closed(),
prepare_libraries_sop_ecmdb(),
prepare_libraries_sop_hmdb(),
prepare_libraries_sop_lotus(),
prepare_libraries_sop_merged(),
prepare_libraries_sop_pubchemlite(),
prepare_libraries_spectra(),
prepare_params(),
prepare_taxa(),
read_mztab()
## Not run:
copy_backbone()
go_to_cache()
prepare_libraries_rt()
unlink("data", recursive = TRUE)
## End(Not run)
This function prepares BiGG (Biochemical, Genetic and Genomic) structure-organism pairs by querying BiGG models and PubChem for metabolite information, extracting chemical structures, and formatting for TIMA annotation workflows.
**Biota organism**: This function creates a special "Biota" organism for metabolites present in all models (shared core metabolism). These structures represent universal biochemical pathways found across all life forms and are always assigned maximum biological score during annotation, regardless of sample taxonomy. The Biota organism has organism_taxonomy_01domain = "Biota" and ottid = 0.
prepare_libraries_sop_bigg(
  bigg_doi = "10.1093/nar/gkv1049",
  bigg_models = list(
    `Escherichia coli` = c(model_id = "iML1515", doi = "10.1038/nbt.3956"),
    `Saccharomyces cerevisiae` = c(model_id = "iMM904", doi = "10.1186/1752-0509-3-37"),
    `Homo sapiens` = c(model_id = "Recon3D", doi = "10.1038/nbt.4072")
  ),
  bigg_url = "http://bigg.ucsd.edu/static/models/",
  output = get_params(step = "prepare_libraries_sop_bigg")$files$libraries$sop$prepared$bigg
)
bigg_doi |
character Character string DOI for BiGG database reference |
bigg_models |
list Named list of BiGG models with organism names as keys and named character vectors containing "model_id" and "doi" as values |
bigg_url |
character Character string base URL for BiGG models API |
output |
character Character string path for prepared BiGG library output |
Character string path to prepared BiGG structure-organism pairs
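The "Biota" rule described above (metabolites present in all models) amounts to a set intersection. The sketch below shows one way to express it in base R; the metabolite identifiers are toy values, and replacing the organism label with "Biota" (rather than adding an extra pair) is an assumption made for this example.

```r
# Toy metabolite lists per organism (illustrative identifiers only).
models <- list(
  `Escherichia coli` = c("atp", "glc__D", "lps"),
  `Saccharomyces cerevisiae` = c("atp", "glc__D", "ergst"),
  `Homo sapiens` = c("atp", "glc__D", "chsterol")
)

# Metabolites shared by every model form the core set.
shared <- Reduce(intersect, models)

# Build structure-organism pairs, one row per metabolite and organism.
pairs <- do.call(rbind, lapply(names(models), function(org) {
  data.frame(structure = models[[org]], organism = org, stringsAsFactors = FALSE)
}))

# Assumption: shared metabolites are attributed to "Biota" instead of each organism.
pairs$organism[pairs$structure %in% shared] <- "Biota"
pairs <- unique(pairs)
```

In this toy case `atp` and `glc__D` end up as Biota pairs, while the three organism-specific metabolites keep their original organism.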
Other preparation:
prepare_annotations_gnps(),
prepare_annotations_mzmine(),
prepare_annotations_mztab(),
prepare_annotations_sirius(),
prepare_annotations_spectra(),
prepare_features_components(),
prepare_features_edges(),
prepare_features_tables(),
prepare_libraries_rt(),
prepare_libraries_sop_closed(),
prepare_libraries_sop_ecmdb(),
prepare_libraries_sop_hmdb(),
prepare_libraries_sop_lotus(),
prepare_libraries_sop_merged(),
prepare_libraries_sop_pubchemlite(),
prepare_libraries_spectra(),
prepare_params(),
prepare_taxa(),
read_mztab()
## Not run:
copy_backbone()
go_to_cache()
prepare_libraries_sop_bigg()
unlink("data", recursive = TRUE)
## End(Not run)
This function prepares closed (private/restricted) structure-organism pair libraries by formatting columns, rounding values, and standardizing structure. Falls back to an empty template if the closed resource is not accessible.
prepare_libraries_sop_closed(
  input = get_params(step = "prepare_libraries_sop_closed")$files$libraries$sop$raw$closed,
  output = get_params(step = "prepare_libraries_sop_closed")$files$libraries$sop$prepared$closed
)
input |
character Character string path to input closed library file |
output |
character Character string path where prepared library should be saved |
Character string path to the prepared structure-organism pairs library
Other preparation:
prepare_annotations_gnps(),
prepare_annotations_mzmine(),
prepare_annotations_mztab(),
prepare_annotations_sirius(),
prepare_annotations_spectra(),
prepare_features_components(),
prepare_features_edges(),
prepare_features_tables(),
prepare_libraries_rt(),
prepare_libraries_sop_bigg(),
prepare_libraries_sop_ecmdb(),
prepare_libraries_sop_hmdb(),
prepare_libraries_sop_lotus(),
prepare_libraries_sop_merged(),
prepare_libraries_sop_pubchemlite(),
prepare_libraries_spectra(),
prepare_params(),
prepare_taxa(),
read_mztab()
## Not run:
copy_backbone()
go_to_cache()
prepare_libraries_sop_closed()
unlink("data", recursive = TRUE)
## End(Not run)
This function prepares ECMDB (E. coli Metabolome Database) structure-organism pairs by parsing JSON data, extracting metabolite information, and formatting for TIMA workflows. Handles E. coli metabolite data with structures.
prepare_libraries_sop_ecmdb(
  input = get_params(step = "prepare_libraries_sop_ecmdb")$files$libraries$sop$raw$ecmdb,
  output = get_params(step = "prepare_libraries_sop_ecmdb")$files$libraries$sop$prepared$ecmdb
)
input |
character Character string path to ECMDB JSON zip file |
output |
character Character string path for prepared ECMDB library output |
Character string path to prepared ECMDB structure-organism pairs
Other preparation:
prepare_annotations_gnps(),
prepare_annotations_mzmine(),
prepare_annotations_mztab(),
prepare_annotations_sirius(),
prepare_annotations_spectra(),
prepare_features_components(),
prepare_features_edges(),
prepare_features_tables(),
prepare_libraries_rt(),
prepare_libraries_sop_bigg(),
prepare_libraries_sop_closed(),
prepare_libraries_sop_hmdb(),
prepare_libraries_sop_lotus(),
prepare_libraries_sop_merged(),
prepare_libraries_sop_pubchemlite(),
prepare_libraries_spectra(),
prepare_params(),
prepare_taxa(),
read_mztab()
## Not run:
copy_backbone()
go_to_cache()
prepare_libraries_sop_ecmdb()
unlink("data", recursive = TRUE)
## End(Not run)
This function prepares HMDB (Human Metabolome Database) structure-organism pairs by parsing SDF files, extracting metadata, and formatting for TIMA annotation workflows.
prepare_libraries_sop_hmdb(
  input = get_params(step = "prepare_libraries_sop_hmdb")$files$libraries$sop$raw$hmdb,
  output = get_params(step = "prepare_libraries_sop_hmdb")$files$libraries$sop$prepared$hmdb
)
input |
character Character string path to HMDB SDF zip file |
output |
character Character string path for prepared HMDB library output |
Character string path to prepared HMDB structure-organism pairs
Other preparation:
prepare_annotations_gnps(),
prepare_annotations_mzmine(),
prepare_annotations_mztab(),
prepare_annotations_sirius(),
prepare_annotations_spectra(),
prepare_features_components(),
prepare_features_edges(),
prepare_features_tables(),
prepare_libraries_rt(),
prepare_libraries_sop_bigg(),
prepare_libraries_sop_closed(),
prepare_libraries_sop_ecmdb(),
prepare_libraries_sop_lotus(),
prepare_libraries_sop_merged(),
prepare_libraries_sop_pubchemlite(),
prepare_libraries_spectra(),
prepare_params(),
prepare_taxa(),
read_mztab()
## Not run:
copy_backbone()
go_to_cache()
prepare_libraries_sop_hmdb()
unlink("data", recursive = TRUE)
## End(Not run)
This function prepares the LOTUS structure-organism pairs library. It standardizes columns, extracts 2D InChIKeys, rounds numeric values, and removes duplicates.
prepare_libraries_sop_lotus(
  input = get_params(step = "prepare_libraries_sop_lotus")$files$libraries$sop$raw$lotus,
  output = get_params(step = "prepare_libraries_sop_lotus")$files$libraries$sop$prepared$lotus
)
input |
character Character string path to the raw LOTUS data file |
output |
character Character string path for the prepared output file |
Character string path to the prepared structure-organism pairs library file
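The 2D InChIKey extraction, rounding, and deduplication mentioned for LOTUS can be sketched in base R. This is a minimal illustration rather than the package's implementation: the column names, the organism label, and the choice of five decimals are assumptions. It relies only on the fact that the first 14 characters of a standard InChIKey encode the connectivity (2D) layer.

```r
# Toy structure-organism pairs; all column names are hypothetical.
sop <- data.frame(
  structure_inchikey = c(
    "WQZGKKKJIJFFOK-GASJEMHNSA-N", # glucose, one stereo variant
    "WQZGKKKJIJFFOK-VFUOTHLCSA-N", # glucose, another stereo variant
    "WQZGKKKJIJFFOK-GASJEMHNSA-N"  # exact duplicate of the first row
  ),
  organism_name = c("Organism A", "Organism A", "Organism A"),
  structure_exact_mass = c(180.0633881, 180.0633881, 180.0633881),
  stringsAsFactors = FALSE
)

# The first 14 characters of an InChIKey form the connectivity (2D) block.
sop$structure_inchikey_2D <- substr(sop$structure_inchikey, 1, 14)

# Round numeric values so that tiny floating-point differences do not
# prevent duplicates from collapsing (5 decimals is an assumption).
sop$structure_exact_mass <- round(sop$structure_exact_mass, digits = 5)

# Keep one row per 2D structure and organism.
sop_2d <- unique(
  sop[, c("structure_inchikey_2D", "organism_name", "structure_exact_mass")]
)
print(sop_2d)
```

All three rows share one 2D key and one organism, so they collapse into a single structure-organism pair.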
Other preparation:
prepare_annotations_gnps(),
prepare_annotations_mzmine(),
prepare_annotations_mztab(),
prepare_annotations_sirius(),
prepare_annotations_spectra(),
prepare_features_components(),
prepare_features_edges(),
prepare_features_tables(),
prepare_libraries_rt(),
prepare_libraries_sop_bigg(),
prepare_libraries_sop_closed(),
prepare_libraries_sop_ecmdb(),
prepare_libraries_sop_hmdb(),
prepare_libraries_sop_merged(),
prepare_libraries_sop_pubchemlite(),
prepare_libraries_spectra(),
prepare_params(),
prepare_taxa(),
read_mztab()
## Not run:
copy_backbone()
go_to_cache()
prepare_libraries_sop_lotus()
unlink("data", recursive = TRUE)
## End(Not run)
This function merges all structure-organism pair libraries (LOTUS, HMDB, ECMDB, etc.) into a single comprehensive library. It can optionally filter by taxonomic level to create biologically focused subsets, and it also splits structures into separate metadata tables.
prepare_libraries_sop_merged(
  files = get_params(step = "prepare_libraries_sop_merged")$files$libraries$sop$prepared,
  filter = get_params(step = "prepare_libraries_sop_merged")$organisms$filter$mode,
  level = get_params(step = "prepare_libraries_sop_merged")$organisms$filter$level,
  value = get_params(step = "prepare_libraries_sop_merged")$organisms$filter$value,
  cache = get_params(step = "prepare_libraries_sop_merged")$files$libraries$sop$merged$structures$processed,
  npc_cache = get_params(step = "prepare_libraries_sop_merged")$files$libraries$sop$merged$structures$taxonomies$n,
  cla_cache = get_params(step = "prepare_libraries_sop_merged")$files$libraries$sop$merged$structures$taxonomies$c,
  output_key = get_params(step = "prepare_libraries_sop_merged")$files$libraries$sop$merged$keys,
  output_org_tax_ott = get_params(step = "prepare_libraries_sop_merged")$files$libraries$sop$merged$organisms$taxonomies$ott,
  output_str_can = get_params(step = "prepare_libraries_sop_merged")$files$libraries$sop$merged$structures$canonical,
  output_str_stereo = get_params(step = "prepare_libraries_sop_merged")$files$libraries$sop$merged$structures$stereo,
  output_str_met = get_params(step = "prepare_libraries_sop_merged")$files$libraries$sop$merged$structures$metadata,
  output_str_tax_cla = get_params(step = "prepare_libraries_sop_merged")$files$libraries$sop$merged$structures$taxonomies$cla,
  output_str_tax_npc = get_params(step = "prepare_libraries_sop_merged")$files$libraries$sop$merged$structures$taxonomies$npc
)
files |
character Character vector or list of paths to prepared library files |
filter |
logical Logical whether to filter the merged library by taxonomy |
level |
character Character string taxonomic rank for filtering (kingdom, phylum, family, genus, etc.) |
value |
character Character string taxon name(s) to keep (can use | for multiple, e.g., 'Gentianaceae|Apocynaceae') |
cache |
character Character string path to cache directory for processed SMILES |
npc_cache |
character Optional path to an additional NPClassifier
taxonomy cache file (TSV/TSV.gz). Structures present in the merged library
but missing NPClassifier taxonomy will be looked up in this cache. Expected
columns: |
cla_cache |
character Optional path to an additional ClassyFire
taxonomy cache file (TSV/TSV.gz). Structures present in the merged library
but missing ClassyFire taxonomy will be looked up in this cache. Expected
columns: |
output_key |
character Character string path for output keys file |
output_org_tax_ott |
character Character string path for organisms taxonomy (OTT) file |
output_str_can |
character Character string path for structures canonical SMILES file |
output_str_stereo |
character Character string path for structures stereochemistry file |
output_str_met |
character Character string path for structures metadata file |
output_str_tax_cla |
character Character string path for ClassyFire taxonomy file |
output_str_tax_npc |
character Character string path for NPClassifier taxonomy file |
Creates a merged library by combining all available SOP sources, optionally filtering by taxonomic criteria (e.g., only Gentianaceae). The output is split into structures metadata, names, taxonomy, and organisms.
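The taxonomic filter with a `|`-separated value can be pictured as a regular-expression match on one taxonomy column. The sketch below is illustrative only: the data frame, its column names, and the helper `filter_by_taxon()` are hypothetical and are not part of TIMA.

```r
# Toy merged library; column names are hypothetical.
merged <- data.frame(
  structure_id = c("S1", "S2", "S3", "S4"),
  organism_family = c("Gentianaceae", "Apocynaceae", "Asteraceae", "Gentianaceae"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose taxon at the chosen level matches
# the value, where "|" separates alternative taxon names.
filter_by_taxon <- function(df, level_col, value) {
  keep <- grepl(pattern = value, x = df[[level_col]])
  df[keep, , drop = FALSE]
}

subset_lib <- filter_by_taxon(
  df = merged,
  level_col = "organism_family",
  value = "Gentianaceae|Apocynaceae"
)
print(subset_lib)
```

Because the value is treated as a regular expression in this sketch, a partial name would also match; anchoring the pattern (for example `^Gentianaceae$`) would make the match exact.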
Character string path to the prepared merged SOP library
Other preparation:
prepare_annotations_gnps(),
prepare_annotations_mzmine(),
prepare_annotations_mztab(),
prepare_annotations_sirius(),
prepare_annotations_spectra(),
prepare_features_components(),
prepare_features_edges(),
prepare_features_tables(),
prepare_libraries_rt(),
prepare_libraries_sop_bigg(),
prepare_libraries_sop_closed(),
prepare_libraries_sop_ecmdb(),
prepare_libraries_sop_hmdb(),
prepare_libraries_sop_lotus(),
prepare_libraries_sop_pubchemlite(),
prepare_libraries_spectra(),
prepare_params(),
prepare_taxa(),
read_mztab()
## Not run:
copy_backbone()
go_to_cache()
github <- "https://raw.githubusercontent.com/"
repo <- "taxonomicallyinformedannotation/tima-example-files/main/"
dir <- paste0(github, repo)
files <- get_params(step = "prepare_libraries_sop_merged")$files$libraries$sop$prepared$lotus |>
  gsub(pattern = ".gz", replacement = "", fixed = TRUE)
get_file(url = paste0(dir, files), export = files)
prepare_libraries_sop_merged(files = files)
unlink("data", recursive = TRUE)
## End(Not run)
This function prepares the PubChem Lite CCSbase export for exposomics as a xenobiotic structure-organism pairs library.
prepare_libraries_sop_pubchemlite(
  input = get_params(step = "prepare_libraries_sop_pubchemlite")$files$libraries$sop$raw$pubchemlite,
  output = get_params(step = "prepare_libraries_sop_pubchemlite")$files$libraries$sop$prepared$pubchemlite
)
input |
character Character string path to PubChem Lite CSV file |
output |
character Character string path for prepared SOP output |
Character string path to prepared SOP file
Other preparation:
prepare_annotations_gnps(),
prepare_annotations_mzmine(),
prepare_annotations_mztab(),
prepare_annotations_sirius(),
prepare_annotations_spectra(),
prepare_features_components(),
prepare_features_edges(),
prepare_features_tables(),
prepare_libraries_rt(),
prepare_libraries_sop_bigg(),
prepare_libraries_sop_closed(),
prepare_libraries_sop_ecmdb(),
prepare_libraries_sop_hmdb(),
prepare_libraries_sop_lotus(),
prepare_libraries_sop_merged(),
prepare_libraries_spectra(),
prepare_params(),
prepare_taxa(),
read_mztab()
## Not run:
copy_backbone()
go_to_cache()
prepare_libraries_sop_pubchemlite()
unlink("data", recursive = TRUE)
## End(Not run)
Prepares spectral libraries for matching by importing, harmonizing, and splitting spectra by polarity. Exports results as Spectra RDS files (pos/neg) and a structure-organism pair (SOP) table.
prepare_libraries_spectra(
  input = get_params(step = "prepare_libraries_spectra")$files$libraries$spectral$raw,
  min_fragments = get_params(step = "prepare_libraries_spectra")$ms$thresholds$ms2$min_fragments,
  nam_lib = get_params(step = "prepare_libraries_spectra")$names$libraries,
  col_ad = get_params(step = "prepare_libraries_spectra")$names$mgf$adduct,
  col_ce = get_params(step = "prepare_libraries_spectra")$names$mgf$collision_energy,
  col_ci = get_params(step = "prepare_libraries_spectra")$names$mgf$compound_id,
  col_in = get_params(step = "prepare_libraries_spectra")$names$mgf$inchi,
  col_io = get_params(step = "prepare_libraries_spectra")$names$mgf$inchi_no_stereo,
  col_ik = get_params(step = "prepare_libraries_spectra")$names$mgf$inchikey,
  col_il = get_params(step = "prepare_libraries_spectra")$names$mgf$inchikey_connectivity_layer,
  col_na = get_params(step = "prepare_libraries_spectra")$names$mgf$name,
  col_po = get_params(step = "prepare_libraries_spectra")$names$mgf$polarity,
  col_sm = get_params(step = "prepare_libraries_spectra")$names$mgf$smiles,
  col_sn = get_params(step = "prepare_libraries_spectra")$names$mgf$smiles_no_stereo,
  col_si = get_params(step = "prepare_libraries_spectra")$names$mgf$spectrum_id,
  col_sp = get_params(step = "prepare_libraries_spectra")$names$mgf$splash,
  col_sy = get_params(step = "prepare_libraries_spectra")$names$mgf$synonyms
)
input |
character Character vector of file paths containing spectral data. |
min_fragments |
integer Minimum number of fragment peaks a spectrum must have after cleaning to be retained (default: 2). |
nam_lib |
character Character library name for metadata. |
col_ad |
character Name of the adduct column in MGF. |
col_ce |
character Name of the collision energy column in MGF. |
col_ci |
character Name of the compound ID column in MGF. |
col_in |
character Name of the InChI column in MGF. |
col_io |
character Name of the InChI without stereo column in MGF. |
col_ik |
character Name of the InChIKey column in MGF. |
col_il |
character Name of the InChIKey connectivity layer column in MGF. |
col_na |
character Name of the name column in MGF. |
col_po |
character Name of the polarity column in MGF. |
col_sm |
character Name of the SMILES column in MGF. |
col_sn |
character Name of the SMILES without stereo column in MGF. |
col_si |
character Name of the spectrum ID column in MGF. |
col_sp |
character Name of the SPLASH column in MGF. |
col_sy |
character Name of the synonyms column in MGF. |
This function:
Checks if output files already exist (idempotent).
Imports spectral data from input files.
Extracts and harmonizes spectra for positive and negative modes.
Fixes precursor m/z and InChIKey connectivity layer issues.
Exports polarity-specific Spectra objects and SOP table.
Returns empty templates if input files are missing.
Character vector with paths to prepared library files (invisible).
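The fragment-count threshold and the polarity split listed above can be sketched with plain R lists. This is a simplified stand-in, assuming each spectrum is a list with an `id`, a `polarity` label, and a two-column peak matrix; the actual function works on Spectra objects and its internals are not shown here.

```r
# Toy spectra: the list layout and all values are hypothetical.
spectra <- list(
  list(
    id = "a", polarity = "pos",
    peaks = cbind(mz = c(85.03, 127.04, 145.05), intensity = c(10, 40, 100))
  ),
  list(
    id = "b", polarity = "neg",
    peaks = cbind(mz = c(59.01, 89.02), intensity = c(30, 100))
  ),
  list(
    id = "c", polarity = "pos",
    peaks = cbind(mz = 163.06, intensity = 100) # a single peak only
  )
)

min_fragments <- 2L

# Drop spectra that have fewer fragment peaks than the threshold.
n_peaks <- vapply(spectra, function(s) nrow(s$peaks), integer(1))
kept <- spectra[n_peaks >= min_fragments]

# Split the remaining spectra by polarity.
polarity <- vapply(kept, function(s) s$polarity, character(1))
by_polarity <- split(kept, polarity)

# Number of retained spectra per polarity.
print(lengths(by_polarity))
```

Spectrum `c` has a single peak and is dropped, leaving one spectrum per polarity.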
Other preparation:
prepare_annotations_gnps(),
prepare_annotations_mzmine(),
prepare_annotations_mztab(),
prepare_annotations_sirius(),
prepare_annotations_spectra(),
prepare_features_components(),
prepare_features_edges(),
prepare_features_tables(),
prepare_libraries_rt(),
prepare_libraries_sop_bigg(),
prepare_libraries_sop_closed(),
prepare_libraries_sop_ecmdb(),
prepare_libraries_sop_hmdb(),
prepare_libraries_sop_lotus(),
prepare_libraries_sop_merged(),
prepare_libraries_sop_pubchemlite(),
prepare_params(),
prepare_taxa(),
read_mztab()
## Not run:
copy_backbone()
go_to_cache()
prepare_libraries_spectra()
unlink("data", recursive = TRUE)
## End(Not run)
Prepares and validates main parameters for the TIMA workflow. Loads YAML configuration files, extracts all parameters, and sets up the parameter structure for downstream analysis steps.
prepare_params(
  params_small = get_params(step = "prepare_params"),
  params_advanced = get_params(step = "prepare_params_advanced"),
  step = NA
)
params_small |
list List of basic parameters for the workflow |
params_advanced |
list List of advanced parameters for the workflow |
step |
character Workflow step identifier (default: NA) |
Character vector of paths to YAML files containing prepared parameters
Other preparation:
prepare_annotations_gnps(),
prepare_annotations_mzmine(),
prepare_annotations_mztab(),
prepare_annotations_sirius(),
prepare_annotations_spectra(),
prepare_features_components(),
prepare_features_edges(),
prepare_features_tables(),
prepare_libraries_rt(),
prepare_libraries_sop_bigg(),
prepare_libraries_sop_closed(),
prepare_libraries_sop_ecmdb(),
prepare_libraries_sop_hmdb(),
prepare_libraries_sop_lotus(),
prepare_libraries_sop_merged(),
prepare_libraries_sop_pubchemlite(),
prepare_libraries_spectra(),
prepare_taxa(),
read_mztab()
## Not run:
# Prepare parameters for TIMA workflow
param_files <- prepare_params(
  params_small = get_params(step = "prepare_params"),
  params_advanced = get_params(step = "prepare_params_advanced")
)
# Parameters are exported to timestamped files
# and can be loaded later for reproducibility
## End(Not run)
This function prepares taxonomic information for features by matching organism names to the Open Tree of Life taxonomy. It can attribute all features to a single organism or distribute them across multiple organisms based on relative intensities in samples.
prepare_taxa(
  input = get_params(step = "prepare_taxa")$files$features$prepared,
  extension = get_params(step = "prepare_taxa")$names$extension,
  name_filename = get_params(step = "prepare_taxa")$names$filename,
  colname = get_params(step = "prepare_taxa")$names$taxon,
  metadata = get_params(step = "prepare_taxa")$files$metadata$raw,
  org_tax_ott = get_params(step = "prepare_taxa")$files$libraries$sop$merged$organisms$taxonomies$ott,
  output = get_params(step = "prepare_taxa")$files$metadata$prepared,
  taxon = get_params(step = "prepare_taxa")$organisms$taxon
)
input |
character Character string path to features file with intensities |
extension |
logical Logical whether column names contain file extensions |
name_filename |
character Character string name of filename column in metadata |
colname |
character Character string name of column with biological source info |
metadata |
character Character string path to metadata file with organism info |
org_tax_ott |
character Character string path to Open Tree of Life taxonomy file |
output |
character Character string path for output file |
taxon |
character Character string organism name to enforce for all features (e.g., "Homo sapiens"). If provided, overrides metadata-based assignment. |
Depending on whether features are aligned between samples from various organisms, this function either:
- Attributes all features to a single organism (if taxon is specified), or
- Attributes features to multiple organisms based on their relative intensities across samples (using metadata)
Character string path to the prepared taxa file
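The intensity-based attribution can be sketched as follows. The rule used here, keeping the organisms of every sample whose intensity reaches at least half of the feature's maximum, is an assumption chosen for illustration and may differ from what `prepare_taxa()` actually does. The helper `assign_organisms()`, the `min_fraction` argument, and the toy tables are likewise hypothetical.

```r
# Toy feature table: rows are features, columns are samples (hypothetical).
intensities <- matrix(
  c(100,  5,
     40, 60,
      0, 80),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("F1", "F2", "F3"), c("sample_A.mzML", "sample_B.mzML"))
)

# Toy metadata linking each file to its biological source (hypothetical).
metadata <- data.frame(
  filename = c("sample_A.mzML", "sample_B.mzML"),
  organism = c("Gentiana lutea", "Homo sapiens"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: attribute each feature to the organisms of the
# samples where its relative intensity is at least `min_fraction`.
assign_organisms <- function(x, metadata, min_fraction = 0.5) {
  # Divide every row by its own maximum (recycling runs down the columns).
  relative <- x / apply(x, 1, max)
  lapply(
    stats::setNames(rownames(x), rownames(x)),
    function(f) {
      samples <- colnames(x)[relative[f, ] >= min_fraction]
      unique(metadata$organism[match(samples, metadata$filename)])
    }
  )
}

taxa <- assign_organisms(intensities, metadata)
str(taxa)
```

In this toy table, F1 and F3 are each attributed to one organism, while F2 is attributed to both because its intensity is comparable in the two samples. A feature with zero intensity everywhere would need separate handling, which the sketch omits.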
Other preparation:
prepare_annotations_gnps(),
prepare_annotations_mzmine(),
prepare_annotations_mztab(),
prepare_annotations_sirius(),
prepare_annotations_spectra(),
prepare_features_components(),
prepare_features_edges(),
prepare_features_tables(),
prepare_libraries_rt(),
prepare_libraries_sop_bigg(),
prepare_libraries_sop_closed(),
prepare_libraries_sop_ecmdb(),
prepare_libraries_sop_hmdb(),
prepare_libraries_sop_lotus(),
prepare_libraries_sop_merged(),
prepare_libraries_sop_pubchemlite(),
prepare_libraries_spectra(),
prepare_params(),
read_mztab()
## Not run:
copy_backbone()
go_to_cache()
github <- "https://raw.githubusercontent.com/"
repo <- "taxonomicallyinformedannotation/tima-example-files/main/"
dir <- paste0(github, repo)
org_tax_ott <- paste0(
  "data/interim/libraries/",
  "sop/merged/organisms/taxonomies/ott.tsv"
)
get_file(url = paste0(dir, org_tax_ott), export = org_tax_ott)
get_file(
  url = paste0(dir, "data/interim/features/example_features.tsv"),
  export = get_params(step = "prepare_taxa")$files$features$prepared
)
prepare_taxa(
  taxon = "Homo sapiens",
  org_tax_ott = org_tax_ott
)
unlink("data", recursive = TRUE)
## End(Not run)
Processes SMILES using RDKit (via Python) to standardize structures, generate InChIKeys, calculate molecular properties, and extract 2D representations. Results are cached to avoid reprocessing.
process_smiles(df, smiles_colname = "structure_smiles_initial", cache = NULL)
df |
data.frame Data frame containing SMILES strings |
smiles_colname |
character Column name containing SMILES (default: "structure_smiles_initial") |
cache |
character Path to cached processed SMILES file, or NULL to skip caching |
Data frame with processed SMILES including InChIKey, molecular formula (with isotopes shown), exact mass (with isotope contributions), 2D SMILES, xLogP, and connectivity layer
## Not run:
# Natural compound
df <- data.frame(
  structure_smiles_initial = "OC[C@H]1OC(O)[C@H](O)[C@H](O)[C@H]1O"
)
result <- process_smiles(df)
# Formula: C6H12O6, Mass: 180.063 Da
# Isotope-labeled compound (4× 13C)
df_labeled <- data.frame(
  structure_smiles_initial = "OC[13C@H]1OC(O)[13C@H](O)[13C@H](O)[13C@H]1O"
)
result_labeled <- process_smiles(df_labeled)
# Formula: C2[13C]4H12O6 (isotopes shown separately)
# Mass: 184.077 Da (difference of ~4.013 Da from natural)
# SMILES preserves [13C] notation
# InChIKey differs from natural glucose
## End(Not run)
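The mass shift quoted in the process_smiles() example can be checked by hand. The short calculation below does not call TIMA or RDKit; it only uses the atomic masses of 12C (exactly 12 u) and 13C (about 13.00335 u), together with the rounded monoisotopic mass of C6H12O6.

```r
# Atomic masses in unified atomic mass units.
mass_12c <- 12          # exact by definition
mass_13c <- 13.00335484 # 13C, rounded to 8 decimals

# Four carbons are replaced by 13C in the labeled SMILES.
n_labels <- 4
shift <- n_labels * (mass_13c - mass_12c)

# Monoisotopic mass of unlabeled C6H12O6, rounded to 4 decimals.
natural_mass <- 180.0634
labeled_mass <- natural_mass + shift

print(round(shift, 3))        # mass difference caused by the labels
print(round(labeled_mass, 3)) # expected mass of the labeled compound
```

The result agrees with the values given in the example comments: a shift of about 4.013 Da and a labeled mass of about 184.077 Da.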
Parses mzTab-M plain-text files (v2.0.0–M) and exports TIMA-ready feature, optional spectra (MGF), and optional metadata tables.
read_mztab(
  input,
  output_features,
  output_spectra = NULL,
  output_metadata = NULL,
  name_features = "feature_id",
  name_rt = "rt",
  name_mz = "mz",
  name_adduct = "adduct",
  strict = FALSE
)
input |
|
output_features |
|
output_spectra |
|
output_metadata |
|
name_features |
|
name_rt |
|
name_mz |
|
name_adduct |
|
strict |
|
Two spectrum export modes are supported:
- When the mzTab-M file contains masster-style COM\tMGH / COM\tMGF lines, real MS2 spectra are extracted. Each entry carries a FEATURE_ID= field so that get_spectra_ids() can map edges back to feature IDs.
- When no embedded spectra are found, a proxy MGF is generated with one dummy entry per feature (using the precursor m/z as the sole peak). Each entry carries both TITLE= and FEATURE_ID= set to the feature identifier.
Named list with paths: $features (always set), $spectra and
$metadata (NULL when the corresponding output argument is NULL or the
export step is skipped).
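The proxy MGF mode can be pictured with a few lines of base R. This is a sketch under stated assumptions, not the code used by read_mztab(): the helper `make_proxy_mgf()`, the toy feature table, the use of `PEPMASS=` for the precursor, and the dummy intensity of 100 are all hypothetical choices. Only the `TITLE=` and `FEATURE_ID=` fields and the single precursor peak come from the description above.

```r
# Toy feature table; values are hypothetical.
features <- data.frame(
  feature_id = c("FT001", "FT002"),
  mz = c(181.0707, 343.1235),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one dummy MGF entry per feature, with the
# precursor m/z as the sole peak.
make_proxy_mgf <- function(features) {
  entries <- lapply(seq_len(nrow(features)), function(i) {
    c(
      "BEGIN IONS",
      paste0("TITLE=", features$feature_id[i]),
      paste0("FEATURE_ID=", features$feature_id[i]),
      paste0("PEPMASS=", features$mz[i]),
      paste(features$mz[i], 100), # sole peak: precursor m/z, dummy intensity
      "END IONS",
      ""
    )
  })
  unlist(entries)
}

mgf_lines <- make_proxy_mgf(features)

# Write to a temporary file so the sketch leaves no trace in the project.
mgf_path <- tempfile(fileext = ".mgf")
writeLines(mgf_lines, mgf_path)
cat(mgf_lines, sep = "\n")
```

Carrying the feature identifier in both fields means downstream steps can recover it whichever field they read.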
Other preparation:
prepare_annotations_gnps(),
prepare_annotations_mzmine(),
prepare_annotations_mztab(),
prepare_annotations_sirius(),
prepare_annotations_spectra(),
prepare_features_components(),
prepare_features_edges(),
prepare_features_tables(),
prepare_libraries_rt(),
prepare_libraries_sop_bigg(),
prepare_libraries_sop_closed(),
prepare_libraries_sop_ecmdb(),
prepare_libraries_sop_hmdb(),
prepare_libraries_sop_lotus(),
prepare_libraries_sop_merged(),
prepare_libraries_sop_pubchemlite(),
prepare_libraries_spectra(),
prepare_params(),
prepare_taxa()
Launches the TIMA Shiny web application for interactive metabolite annotation. Automatically detects Docker containers and adjusts network settings accordingly.
run_app(host = "127.0.0.1", port = 3838, browser = TRUE, reinstall = TRUE)
host |
character Host/IP address to listen on. Default: "127.0.0.1" (localhost). Use "0.0.0.0" to allow external connections. |
port |
integer Port number to listen on. Default: 3838. Valid range: 1-65535. |
browser |
logical Whether to automatically launch a web browser when starting the app. Default: TRUE. Automatically set to FALSE in Docker. |
reinstall |
logical Whether to automatically reinstall TIMA. Default: TRUE. |
NULL (invisibly). Launches the Shiny app as a side effect.
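The documented port range (1-65535) suggests a simple pre-flight check before launching the app. The helper below, `validate_port()`, is hypothetical and is not exported by TIMA; it only illustrates how such a range check could look.

```r
# Hypothetical helper: accept a single whole number between 1 and 65535.
validate_port <- function(port) {
  ok <- is.numeric(port) &&
    length(port) == 1L &&
    !is.na(port) &&
    port >= 1 &&
    port <= 65535 &&
    port %% 1 == 0
  if (!ok) {
    stop("`port` must be a single integer between 1 and 65535.", call. = FALSE)
  }
  as.integer(port)
}

validate_port(3838)
validate_port(8080)
```

A value such as 70000 or 80.5 makes the helper stop with an error instead of reaching the server start-up.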
Other workflow:
change_params_small(),
create_components(),
create_edges(),
create_edges_spectra(),
go_to_cache(),
install(),
install_tima(),
run_tima(),
tima_full(),
validate_inputs()
## Not run:
# Launch app on localhost
run_app()
# Launch on custom port
run_app(port = 8080)
# Allow external connections (useful in Docker)
run_app(host = "0.0.0.0", port = 3838)
## End(Not run)
Executes the full Taxonomically Informed Metabolite Annotation (TIMA) workflow from start to finish. This includes data preparation, library loading, annotation, weighting, and output generation. The function runs the targets pipeline and archives logs with timestamps for reproducibility.
run_tima(
  target_pattern = "^(ann_wei|exp_mzt)$",
  log_file = "tima.log",
  clean_old_logs = TRUE,
  log_level = "info"
)
target_pattern |
character Regex pattern for target selection. Default: "^(ann_wei|exp_mzt)$" (annotation preparation + mzTab export) |
log_file |
character Path to log file. Default: "tima.log" |
clean_old_logs |
logical Remove old log file before starting. Default: TRUE |
log_level |
character or numeric Logging verbosity level. Can be one of: "trace", "debug", "info", "warn", "error", "fatal" or numeric values: TRACE=600, DEBUG=500, INFO=400, WARN=300, ERROR=200, FATAL=100. Default: "info" (400). Use "debug" for detailed troubleshooting. |
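The relationship between the named and numeric log levels can be sketched as a lookup plus a threshold comparison. The helpers `resolve_level()` and `is_emitted()` are hypothetical. The sketch assumes that a message is emitted when its numeric level is less than or equal to the configured threshold, which is consistent with "warn" showing warnings and errors only.

```r
# Numeric values as listed for the log_level argument.
log_levels <- c(
  fatal = 100, error = 200, warn = 300,
  info = 400, debug = 500, trace = 600
)

# Hypothetical helper: accept either a level name or a numeric value.
resolve_level <- function(level) {
  if (is.numeric(level)) {
    return(level)
  }
  log_levels[[tolower(level)]]
}

# Hypothetical helper: decide whether a message passes the threshold.
is_emitted <- function(message_level, threshold) {
  resolve_level(message_level) <= resolve_level(threshold)
}

is_emitted("warn", threshold = "info")  # TRUE
is_emitted("debug", threshold = "info") # FALSE
is_emitted("error", threshold = 300)    # TRUE
```

Raising the threshold to "debug" (500) therefore adds debug messages on top of everything shown at "info".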
The workflow performs the following steps:
Initializes logging and timing
Navigates to cache directory
Executes the targets pipeline (annotation preparation + mzTab export)
Archives timestamped logs to data/processed/
Invisible NULL. Executes workflow as side effect and creates timestamped log files in data/processed/
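How a regex such as the default target_pattern selects targets can be previewed with grep() on a vector of names. Apart from ann_wei and exp_mzt, which appear in the default pattern, the target names below are made up for illustration, and `select_targets()` is a hypothetical helper.

```r
# Illustrative target names; only ann_wei and exp_mzt come from the default.
targets <- c("ann_wei", "ann_pre", "exp_mzt", "prepare_taxa", "ann_wei_old")

# Hypothetical helper: return the names matched by a pattern.
select_targets <- function(pattern, names) {
  grep(pattern = pattern, x = names, value = TRUE)
}

# Anchored alternation: exact names only.
select_targets("^(ann_wei|exp_mzt)$", targets)

# Prefix match: every name starting with "ann_".
select_targets("^ann_", targets)
```

The `^` and `$` anchors in the default pattern matter: without them, a name like ann_wei_old would also be selected.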
Other workflow:
change_params_small(),
create_components(),
create_edges(),
create_edges_spectra(),
go_to_cache(),
install(),
install_tima(),
run_app(),
tima_full(),
validate_inputs()
## Not run:
# Run full workflow with defaults (INFO level)
run_tima()
# Run with debug logging for troubleshooting
run_tima(log_level = "debug")
# Run with minimal logging (warnings and errors only)
run_tima(log_level = "warn")
# Run with custom target pattern
run_tima(target_pattern = "^prepare_")
# Preserve existing logs
run_tima(clean_old_logs = FALSE)
# Combine multiple options
run_tima(
  target_pattern = "^ann_",
  log_level = "debug",
  clean_old_logs = FALSE
)
## End(Not run)
DEPRECATED: This function has been renamed to run_tima. Please use run_tima() instead. tima_full() will be removed in a future version.
tima_full(
  target_pattern = "^(ann_wei|exp_mzt)$",
  log_file = "tima.log",
  clean_old_logs = TRUE,
  log_level = "info"
)
target_pattern |
Character. Regex pattern for target selection. Default: "^(ann_wei|exp_mzt)$" |
log_file |
Character. Path to log file. Default: "tima.log" |
clean_old_logs |
Logical. Remove old log file before starting. Default: TRUE |
log_level |
Character or numeric. Logging verbosity level. Default: "info" |
This function is deprecated as of TIMA version 2.12.0 (November 2025).
It now simply calls run_tima with all arguments passed through,
but issues a deprecation warning.
Migration Guide:
Old: tima_full(target_pattern = "^(ann_wei|exp_mzt)$")
New: run_tima(target_pattern = "^(ann_wei|exp_mzt)$")
All parameters and behavior are identical between the two functions.
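The pass-through pattern described above, where a deprecated function warns and then forwards its arguments, can be sketched with base R's `.Deprecated()`. The function names `new_runner()` and `old_runner()` are placeholders, and this is not the actual body of tima_full().

```r
# Placeholder for the current function.
new_runner <- function(target_pattern = "^(ann_wei|exp_mzt)$") {
  invisible(target_pattern)
}

# Placeholder for the deprecated alias: warn, then forward all arguments.
old_runner <- function(...) {
  .Deprecated(new = "new_runner")
  new_runner(...)
}

# Calling the old name still works, but signals a deprecation warning.
warned <- FALSE
res <- withCallingHandlers(
  old_runner(target_pattern = "^ann_"),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)
print(warned)
print(res)
```

Forwarding through `...` keeps the two signatures in sync automatically, so a call written for the old name runs unchanged under the new one.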
Invisible NULL (same as run_tima)
run_tima for the current function
Other workflow:
change_params_small(),
create_components(),
create_edges(),
create_edges_spectra(),
go_to_cache(),
install(),
install_tima(),
run_app(),
run_tima(),
validate_inputs()
## Not run:
# DEPRECATED - Use run_tima() instead
# tima_full()
# RECOMMENDED
run_tima()
## End(Not run)
This function transforms SIRIUS CSI (Compound Structure Identification) scores using a sigmoid function. The transformation maps raw scores to a 0-1 range for better interpretability.
transform_score_sirius_csi(csi_score = NULL, K = 100, scale = 20)
csi_score |
numeric Numeric SIRIUS CSI score (expected mostly <= 0; can be negative, NA, NULL, or absent) |
K |
numeric Numeric shift parameter to adjust the sigmoid center (default: 100, midpoint at score = -100) |
scale |
numeric Numeric scale parameter controlling sigmoid steepness (default: 20) |
This is an experimental transformation not officially approved by SIRIUS developers. The sigmoid function is: 1 / (1 + exp(-(score + K) / scale))
SIRIUS CSI:FingerID scores are expected to be log-likelihood-like values
on (-Inf, 0], where values closer to 0 are better. A practical rule of
thumb is:
score > -10: excellent/awesome
score > -100: acceptable/okay
score <= -100: weak/low confidence
The defaults K = 100 and scale = 20 place the sigmoid midpoint at
score = -100 and strongly reward scores near 0:
score = -10 -> ~0.989
score = -100 -> 0.500
score = -200 -> ~0.007
Previous defaults (K = 50, scale = 10) placed the midpoint at -50 and
compressed the useful range to [-70, -30], mapping most realistic SIRIUS
hits to near-zero scores.
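The formula and the worked values above can be checked with a standalone sketch. `sigmoid_csi()` is a hypothetical re-implementation of the documented formula for illustration; it is not the package function and skips the NULL/absent handling that `transform_score_sirius_csi()` describes.

```r
# Hypothetical re-implementation of the documented sigmoid:
# 1 / (1 + exp(-(score + K) / scale))
sigmoid_csi <- function(csi_score, K = 100, scale = 20) {
  1 / (1 + exp(-(csi_score + K) / scale))
}

scores <- c(-200, -100, -10, NA)

# Current defaults: midpoint at score = -100; NA propagates as NA.
round(sigmoid_csi(scores), 3)

# Previous defaults: midpoint at score = -50, so a score of -100 is
# pushed close to zero instead of landing on 0.5.
round(sigmoid_csi(scores, K = 50, scale = 10), 3)
```

Comparing the two calls shows why the defaults changed: with K = 50 and scale = 10, a score that the rule of thumb treats as borderline acceptable is mapped to almost nothing.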
Numeric transformed score in the range (0, 1), or NA if input is NA/NULL/absent
## Not run:
# Transform a single score
transform_score_sirius_csi(csi_score = -100)
# Transform with custom parameters
transform_score_sirius_csi(csi_score = -100, K = 100, scale = 20)
# Transformation
scores <- c(-300, -100, -10, -1, 0)
transform_score_sirius_csi(csi_score = scores)
# Handle NA values
scores_with_na <- c(-100, NA, -10, -300)
transform_score_sirius_csi(csi_score = scores_with_na)
# Handle missing/absent score
transform_score_sirius_csi()
## End(Not run)
Standalone command to validate all input data before starting the TIMA pipeline. This helps catch issues early and avoid wasting time on library downloads and processing.
validate_inputs( features = NULL, spectra = NULL, metadata = NULL, sirius = NULL, filename_col = "filename", organism_col = "organism", feature_col = "feature_id" )
features |
Character path to features CSV/TSV file |
spectra |
Character path to MGF spectra file |
metadata |
Character path to metadata file |
sirius |
Character path to SIRIUS output directory or ZIP file |
filename_col |
Character name of filename column (default: "filename") |
organism_col |
Character name of organism column (default: "organism") |
feature_col |
Character name of feature ID column (default: "feature_id") |
Invisible TRUE if all checks pass, stops with error otherwise
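The fail-early contract (invisible TRUE on success, an error otherwise) can be illustrated with a small standalone check. `check_columns()` is a hypothetical helper written for this sketch; it is not part of TIMA and does not reproduce the checks that `validate_inputs()` actually performs.

```r
# Hypothetical helper: stop early if a delimited file is missing or lacks
# a required column; return invisible TRUE otherwise.
check_columns <- function(path, required, sep = ",") {
  if (!file.exists(path)) {
    stop("File not found: ", path, call. = FALSE)
  }
  # Read a single row; only the header is needed.
  header <- names(utils::read.table(
    path, header = TRUE, sep = sep, nrows = 1,
    check.names = FALSE, comment.char = ""
  ))
  absent <- setdiff(required, header)
  if (length(absent) > 0) {
    stop("Missing column(s): ", paste(absent, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Toy features table written to a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("feature_id,mz,rt", "1,301.1,2.5"), tmp)

ok <- check_columns(tmp, required = "feature_id")
bad <- tryCatch(check_columns(tmp, required = "organism"), error = function(e) e)
conditionMessage(bad)
```

Stopping with an error rather than returning FALSE means a script that calls the check before the pipeline halts on its own, without an explicit `if`.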
Other workflow:
change_params_small(),
create_components(),
create_edges(),
create_edges_spectra(),
go_to_cache(),
install(),
install_tima(),
run_app(),
run_tima(),
tima_full()
## Not run:
# Validate all inputs before starting pipeline
validate_inputs(
  features = "data/features.csv",
  spectra = "data/spectra.mgf",
  sirius = "data/sirius_output"
)
# Validate with metadata consistency check
validate_inputs(
  features = "data/features.csv",
  metadata = "data/metadata.tsv"
)
## End(Not run)
This function weights candidate annotations by combining their spectral, biological (taxonomic), and chemical consistency scores.
weight_annotations(
  library = get_params(step = "weight_annotations")$files$libraries$sop$merged$keys,
  org_tax_ott = get_params(step = "weight_annotations")$files$libraries$sop$merged$organisms$taxonomies$ott,
  str_stereo = get_params(step = "weight_annotations")$files$libraries$sop$merged$structures$stereo,
  annotations = get_params(step = "weight_annotations")$files$annotations$filtered,
  canopus = get_params(step = "weight_annotations")$files$annotations$prepared$canopus,
  formula = get_params(step = "weight_annotations")$files$annotations$prepared$formula,
  components = get_params(step = "weight_annotations")$files$networks$spectral$components$prepared,
  edges = get_params(step = "weight_annotations")$files$networks$spectral$edges$prepared,
  taxa = get_params(step = "weight_annotations")$files$metadata$prepared,
  output = get_params(step = "weight_annotations")$files$annotations$processed,
  candidates_neighbors = get_params(step = "weight_annotations")$annotations$candidates$neighbors,
  candidates_final = get_params(step = "weight_annotations")$annotations$candidates$final,
  best_percentile = get_params(step = "weight_annotations")$annotations$candidates$best_percentile,
  weight_spectral = get_params(step = "weight_annotations")$weights$global$spectral,
  weight_chemical = get_params(step = "weight_annotations")$weights$global$chemical,
  weight_biological = get_params(step = "weight_annotations")$weights$global$biological,
  score_biological_domain = get_params(step = "weight_annotations")$weights$biological$domain,
  score_biological_kingdom = get_params(step = "weight_annotations")$weights$biological$kingdom,
  score_biological_phylum = get_params(step = "weight_annotations")$weights$biological$phylum,
  score_biological_class = get_params(step = "weight_annotations")$weights$biological$class,
  score_biological_order = get_params(step = "weight_annotations")$weights$biological$order,
  score_biological_infraorder = get_params(step = "weight_annotations")$weights$biological$infraorder,
  score_biological_family = get_params(step = "weight_annotations")$weights$biological$family,
  score_biological_subfamily = get_params(step = "weight_annotations")$weights$biological$subfamily,
  score_biological_tribe = get_params(step = "weight_annotations")$weights$biological$tribe,
  score_biological_subtribe = get_params(step = "weight_annotations")$weights$biological$subtribe,
  score_biological_genus = get_params(step = "weight_annotations")$weights$biological$genus,
  score_biological_subgenus = get_params(step = "weight_annotations")$weights$biological$subgenus,
  score_biological_species = get_params(step = "weight_annotations")$weights$biological$species,
  score_biological_subspecies = get_params(step = "weight_annotations")$weights$biological$subspecies,
  score_biological_variety = get_params(step = "weight_annotations")$weights$biological$variety,
  score_biological_biota = get_params(step = "weight_annotations")$weights$biological$biota,
  score_chemical_cla_kingdom = get_params(step = "weight_annotations")$weights$chemical$cla$kingdom,
  score_chemical_cla_superclass = get_params(step = "weight_annotations")$weights$chemical$cla$superclass,
  score_chemical_cla_class = get_params(step = "weight_annotations")$weights$chemical$cla$class,
  score_chemical_cla_parent = get_params(step = "weight_annotations")$weights$chemical$cla$parent,
  score_chemical_npc_pathway = get_params(step = "weight_annotations")$weights$chemical$npc$pathway,
  score_chemical_npc_superclass = get_params(step = "weight_annotations")$weights$chemical$npc$superclass,
  score_chemical_npc_class = get_params(step = "weight_annotations")$weights$chemical$npc$class,
  minimal_consistency = get_params(step = "weight_annotations")$annotations$thresholds$consistency,
  minimal_ms1_bio = get_params(step = "weight_annotations")$annotations$thresholds$ms1$biological,
  minimal_ms1_chemo = get_params(step = "weight_annotations")$annotations$thresholds$ms1$chemical,
  minimal_ms1_condition = get_params(step = "weight_annotations")$annotations$thresholds$ms1$condition,
  ms1_only = get_params(step = "weight_annotations")$annotations$ms1only,
  compounds_names = get_params(step = "weight_annotations")$options$compounds_names,
  high_evidence = get_params(step = "weight_annotations")$options$high_evidence,
  remove_ties = get_params(step = "weight_annotations")$options$remove_ties,
  summarize = get_params(step = "weight_annotations")$options$summarize,
  pattern = get_params(step = "weight_annotations")$files$pattern,
  force = get_params(step = "weight_annotations")$options$force,
  xrefs_file = NULL
)
library |
Library containing the keys |
org_tax_ott |
File containing organisms taxonomy (OTT) |
str_stereo |
File containing structures stereo |
annotations |
Prepared annotations file |
canopus |
Prepared canopus file |
formula |
Prepared formula file |
components |
Prepared components file |
edges |
Prepared edges file |
taxa |
Prepared taxed features file |
output |
Output file |
candidates_neighbors |
Number of neighbors candidates to keep |
candidates_final |
Number of final candidates to keep |
best_percentile |
Numeric percentile threshold (0-1) for selecting top candidates within each feature (default: 0.9). Used for consistent filtering between mini and filtered outputs. |
weight_spectral |
Weight for the spectral score |
weight_chemical |
Weight for the chemical consistency score |
weight_biological |
Weight for the biological score |
score_biological_domain |
Score for a |
score_biological_kingdom |
Score for a |
score_biological_phylum |
Score for a |
score_biological_class |
Score for a |
score_biological_order |
Score for a |
score_biological_infraorder |
Score for a |
score_biological_family |
Score for a |
score_biological_subfamily |
Score for a |
score_biological_tribe |
Score for a |
score_biological_subtribe |
Score for a |
score_biological_genus |
Score for a |
score_biological_subgenus |
Score for a |
score_biological_species |
Score for a |
score_biological_subspecies |
Score for a |
score_biological_variety |
Score for a |
score_biological_biota |
Score for a |
score_chemical_cla_kingdom |
Score for a |
score_chemical_cla_superclass |
Score for a |
score_chemical_cla_class |
Score for a |
score_chemical_cla_parent |
Score for a |
score_chemical_npc_pathway |
Score for a |
score_chemical_npc_superclass |
Score for a |
score_chemical_npc_class |
Score for a |
minimal_consistency |
Minimal consistency score for a class. FLOAT |
minimal_ms1_bio |
Minimal biological score to keep MS1 based annotation |
minimal_ms1_chemo |
Minimal chemical score to keep MS1 based annotation |
minimal_ms1_condition |
Condition to be used. Must be "OR" or "AND". |
ms1_only |
Keep only MS1 annotations. BOOLEAN |
compounds_names |
Report compounds names. Can be very large. BOOLEAN |
high_evidence |
Report high evidence candidates only. BOOLEAN |
remove_ties |
Remove ties. BOOLEAN |
summarize |
Summarize results (1 row per feature). BOOLEAN |
pattern |
Pattern to identify your job. STRING |
force |
Force parameters. Use it at your own risk |
xrefs_file |
Optional character path to xrefs file from
|
The path to the weighted annotations
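The interplay of `minimal_ms1_bio`, `minimal_ms1_chemo`, and `minimal_ms1_condition` described in the arguments above can be sketched as a simple logical filter. `keep_ms1()` is a hypothetical helper for illustration; the use of `>=` and the exact point at which TIMA applies the thresholds are assumptions, not taken from the package source.

```r
# Hypothetical filter for MS1-only annotations: combine the biological and
# chemical thresholds with either "OR" or "AND".
keep_ms1 <- function(score_bio, score_chem,
                     minimal_ms1_bio, minimal_ms1_chemo,
                     minimal_ms1_condition = c("OR", "AND")) {
  minimal_ms1_condition <- match.arg(minimal_ms1_condition)
  pass_bio <- score_bio >= minimal_ms1_bio     # assumption: inclusive threshold
  pass_chem <- score_chem >= minimal_ms1_chemo # assumption: inclusive threshold
  if (minimal_ms1_condition == "OR") {
    pass_bio | pass_chem
  } else {
    pass_bio & pass_chem
  }
}

# Toy scores for three MS1-only candidates.
bio <- c(0.8, 0.1, 0.6)
chem <- c(0.1, 0.1, 0.7)

keep_ms1(bio, chem, 0.5, 0.5, "OR")
keep_ms1(bio, chem, 0.5, 0.5, "AND")
```

With "OR", a candidate survives on either kind of evidence; with "AND", it needs both, which is the stricter setting.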
annotate_masses weight_bio weight_chemo
Other annotation:
annotate_masses(),
annotate_spectra(),
filter_annotations(),
write_mztab()
## Not run:
copy_backbone()
go_to_cache()
github <- "https://raw.githubusercontent.com/"
repo <- "taxonomicallyinformedannotation/tima-example-files/main/"
dir <- paste0(github, repo)
library <- get_params(step = "weight_annotations")$files$libraries$sop$merged$keys |>
  gsub(pattern = ".gz", replacement = "", fixed = TRUE)
org_tax_ott <- paste0("data/interim/libraries/", "sop/merged/organisms/taxonomies/ott.tsv")
str_stereo <- paste0("data/interim/libraries/", "sop/merged/structures/stereo.tsv")
annotations <- paste0("data/interim/annotations/", "example_annotationsFiltered.tsv")
canopus <- paste0("data/interim/annotations/", "example_canopusPrepared.tsv")
formula <- paste0("data/interim/annotations/", "example_formulaPrepared.tsv")
components <- paste0("data/interim/features/", "example_componentsPrepared.tsv")
edges <- paste0("data/interim/features/", "example_edges.tsv")
taxa <- paste0("data/interim/taxa/", "example_taxed.tsv")
get_file(url = paste0(dir, library), export = library)
get_file(url = paste0(dir, org_tax_ott), export = org_tax_ott)
get_file(url = paste0(dir, str_stereo), export = str_stereo)
get_file(url = paste0(dir, annotations), export = annotations)
get_file(url = paste0(dir, canopus), export = canopus)
get_file(url = paste0(dir, formula), export = formula)
get_file(url = paste0(dir, components), export = components)
get_file(url = paste0(dir, edges), export = edges)
get_file(url = paste0(dir, taxa), export = taxa)
weight_annotations(
  library = library,
  org_tax_ott = org_tax_ott,
  str_stereo = str_stereo,
  annotations = annotations,
  canopus = canopus,
  formula = formula,
  components = components,
  edges = edges,
  taxa = taxa
)
unlink("data", recursive = TRUE)
## End(Not run)
Exports TIMA weighted-annotation results to mzTab-M 2.1.0 plain-text format. The output is a compliant mzTab-M file containing:
MTD – metadata (software, database, instrument, evidence measures, ms_run, sample, assay, study_variable).
SMF – one row per chromatographic feature (feature_id, m/z, RT).
SME – one row per identification evidence (candidate annotation).
SML – one row per unique compound, linking all associated SMF and SME rows.
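The four sections listed above can be told apart by the line prefix in the tab-separated file. The sketch below counts lines per prefix in a toy fragment; the fragment is heavily abbreviated, is not a valid mzTab-M file, and is not output produced by `write_mztab()`. The header prefixes SMH, SFH, and SEH and the comment prefix COM come from the mzTab-M format itself.

```r
# Toy, non-compliant fragment for illustration only.
toy <- c(
  "COM\tToy fragment for illustration only",
  "MTD\tmzTab-version\t2.1.0-M",
  "SMH\tSML_ID\tSMF_ID_REFS",
  "SML\t1\t1|2",
  "SFH\tSMF_ID\tSME_ID_REFS",
  "SMF\t1\t1",
  "SMF\t2\t2",
  "SEH\tSME_ID\tevidence_input_id",
  "SME\t1\t1",
  "SME\t2\t2"
)

# The section prefix is everything before the first tab.
prefix <- sub("\t.*$", "", toy)
counts <- table(prefix)

# Data rows per section (header rows SMH/SFH/SEH are counted separately).
counts[c("SML", "SMF", "SME")]
```

In this toy fragment one SML row links two SMF rows, mirroring the "one row per unique compound, linking all associated SMF and SME rows" layout.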
write_mztab(
  input = get_params(step = "write_mztab")$files$annotations$processed,
  output = get_params(step = "write_mztab")$files$output$mztab,
  ms_run_location = "null",
  ms_run_format = "null",
  ms_run_id_format = "null",
  polarity = NULL,
  instrument = NULL,
  sample_name = NULL,
  publication = NULL,
  title = "TIMA annotation results",
  description = paste0("Annotation results produced by Taxonomically Informed ", "Metabolomics Annotation (TIMA)."),
  software_version = as.character(utils::packageVersion("tima")),
  contact = NULL,
  xrefs_file = NULL,
  edges_file = NULL,
  base_mztab = NULL
)
input |
|
output |
|
ms_run_location |
|
ms_run_format |
|
ms_run_id_format |
|
polarity |
|
instrument |
|
sample_name |
|
publication |
|
title |
|
description |
|
software_version |
|
contact |
|
xrefs_file |
|
edges_file |
|
base_mztab |
|
The function intentionally writes in Summary mode because TIMA
is an annotation/prioritization tool and does not guarantee complete
quantification matrices. Fields that have no TIMA equivalent (e.g.
full InChI, spectra_ref) are written as null.
TIMA columns are mapped to canonical mzTab fields where a direct
equivalent exists; only truly unmapped columns fall back to
opt_global_* to keep downstream consumers happy:
SME section – candidate-level columns with no canonical field
(e.g. SIRIUS subscores, similarity forward/reverse, m/z error)
become opt_global_*. Feature-level columns are not repeated
here; they belong to the SMF section.
SMF section – all feature_* columns beyond feature_mz and
feature_rt (spectrum entropy, spectrum peaks, predicted taxonomy
class/NPC scores …) are exported as opt_global_* in the SMF row,
making them available without polluting SME.
Reliability levels follow the Metabolomics Standards Initiative (MSI) scale:
1 – confirmed (score >= 0.7 with spectral library evidence)
2 – probable (score >= 0.5; or spectral match, any score)
3 – putative (score >= 0.2)
4 – unambiguous compound class only (everything else)
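The reliability rules above can be written as a small vectorized function. `assign_reliability()` is a hypothetical helper for this sketch, and reading "spectral match" as a single logical flag per candidate is an assumption; the package may derive the level differently.

```r
# Hypothetical mapping from a final score and a spectral-evidence flag to the
# four reliability levels listed above. Rules are evaluated from level 1 down.
assign_reliability <- function(score, has_spectral) {
  ifelse(
    has_spectral & score >= 0.7, 1L,
    ifelse(
      has_spectral | score >= 0.5, 2L,
      ifelse(score >= 0.2, 3L, 4L)
    )
  )
}

# Toy candidates: (score, spectral evidence)
score <- c(0.8, 0.8, 0.1, 0.3, 0.1)
has_spectral <- c(TRUE, FALSE, TRUE, FALSE, FALSE)

assign_reliability(score, has_spectral)
```

Note that under these rules a high score without spectral evidence reaches level 2 at most, while any spectral match guarantees at least level 2.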
Four TIMA-specific evidence measures are exported as
id_confidence_measure[1..4] in the MTD section and as additional
columns in the SME section. All use the TIMA user-controlled CV
namespace (no PSI-MS accession exists for these composite scores):
[1] – score_final (combined TIMA score; TIMA:001)
[2] – score_biological (taxonomic score; TIMA:002)
[3] – score_chemical (chemical consistency; TIMA:003)
[4] – candidate_score_similarity (spectral similarity;
TIMA:004; omitted when no spectral evidence is present)
ms_level uses PSI-MS accessions MS:1000579 (MS1) and
MS:1000580 (MS2).
scan_polarity uses MS:1000130 (positive) and MS:1000129
(negative) when polarity is supplied.
retention_time_in_seconds is declared in
colunit-small_molecule_feature with UO accession UO:0000010.
theoretical_neutral_mass is declared with UO:0000221 (dalton).
spectra_ref is formatted as ms_run[1]:{spectrum_native_id}.
instrument[1] is populated from the instrument parameter using a
PSI-MS CV Param when provided.
quantification_method is set to
[MS, MS:1001834, LC-MS label-free quantitation analysis, ] for
untargeted metabolomics (per PSI-MS ontology).
assay[1]-quantification_reagent is set to
[MS, MS:1002038, unlabeled sample, ] (no labelling used by TIMA).
sample[1] defaults to a metabolite mixture Param; can be overridden
via the sample_name parameter.
publication emits a formatted citation when publication is supplied.
The software entry includes the TIMA repository URL as a
software[1]-setting for machine-readable provenance.
Character path to the written mzTab-M file (invisibly).
Other annotation:
annotate_masses(),
annotate_spectra(),
filter_annotations(),
weight_annotations()
## Not run:
write_mztab(
  input = "annotations.tsv",
  output = "annotations.mztab"
)
## End(Not run)