|Original author(s)||J. Griss, R. Wang|
|Operating system||Windows, Linux, Mac OSX|
The spectra-cluster-cli is a stand-alone, command-line version of the spectra-cluster algorithm for clustering MS/MS spectra. The spectra-cluster algorithm is used to create the PRIDE Cluster resource. The spectra-cluster-cli tool allows users to cluster their own MS/MS data on a normal computer.
- Clustering of MS/MS spectra (MGF format)
- Parallel processing of data
Note: A graphical user interface is also available at https://github.com/spectra-cluster/spectra-cluster-gui
Additionally, you can find a detailed example of how to prepare your data for clustering at http://spectra-cluster.github.io.
To run the spectra-cluster-cli, Java needs to be installed on the computer.
- Download the latest release from https://github.com/spectra-cluster/spectra-cluster-cli/releases
- Unpack the downloaded zip file. No further installation is required.
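Since Java is the only prerequisite, it can be worth confirming that a Java runtime is reachable from the command line before starting. The following is a minimal sketch in Python; the helper name java_available is hypothetical and not part of spectra-cluster-cli. It only checks that a java executable is on the PATH and that java -version exits successfully.

```python
import shutil
import subprocess


def java_available(executable: str = "java") -> bool:
    """Return True if `executable` is on the PATH and `-version` exits with code 0.

    Hypothetical helper for illustration; it is not part of spectra-cluster-cli.
    """
    path = shutil.which(executable)
    if path is None:
        return False
    try:
        # `java -version` writes its banner to stderr; only the exit code is used here.
        result = subprocess.run(
            [path, "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("Java found" if java_available() else "Java not found; install it first")
```

The same check can be done by hand by typing java -version in a command prompt.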
Preparing your data
Optionally, identification data can be incorporated into the MGF files using the SEQ= tag. A tool to do this automatically is under active development and will be released shortly. The identification data does not influence the clustering process. If identification data is present in the input MGF files, it is automatically written to the resulting .clustering files.
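Until such a tool is available, the annotation can be scripted by hand. The sketch below is one possible approach in Python, not the project's own tool: it adds a SEQ= line after the TITLE= line of every spectrum for which a peptide sequence is known. The function names, the title-to-sequence dictionary, and the placement of the SEQ= line directly after TITLE= are assumptions made for illustration; check them against the data preparation example at http://spectra-cluster.github.io before relying on them.

```python
from typing import Dict, Iterable, List


def _annotate_spectrum(spectrum: List[str], identifications: Dict[str, str]) -> List[str]:
    """Add a SEQ= line to one BEGIN IONS ... END IONS block (hypothetical helper)."""
    if any(line.startswith("SEQ=") for line in spectrum):
        return spectrum  # already annotated: leave the block untouched
    result: List[str] = []
    for line in spectrum:
        result.append(line)
        if line.startswith("TITLE="):
            title = line[len("TITLE="):].strip()
            sequence = identifications.get(title)
            if sequence:
                # Assumption: the SEQ= line is placed directly after TITLE=.
                result.append("SEQ=" + sequence)
    return result


def add_seq_tags(mgf_lines: Iterable[str], identifications: Dict[str, str]) -> List[str]:
    """Return the MGF lines with SEQ= tags added for all identified spectra.

    `identifications` maps a spectrum title to a peptide sequence
    (assumed input format; adapt it to how your search results are stored).
    """
    annotated: List[str] = []
    spectrum = None  # lines of the spectrum block currently being read
    for raw in mgf_lines:
        line = raw.rstrip("\r\n")
        if line.strip() == "BEGIN IONS":
            spectrum = [line]
        elif spectrum is not None:
            spectrum.append(line)
            if line.strip() == "END IONS":
                annotated.extend(_annotate_spectrum(spectrum, identifications))
                spectrum = None
        else:
            annotated.append(line)  # lines outside a spectrum block are kept as-is
    if spectrum is not None:
        annotated.extend(spectrum)  # unterminated block: keep unchanged
    return annotated


if __name__ == "__main__":
    example = [
        "BEGIN IONS",
        "TITLE=spectrum_1",
        "PEPMASS=500.25",
        "CHARGE=2+",
        "100.1 2000.0",
        "END IONS",
    ]
    print("\n".join(add_seq_tags(example, {"spectrum_1": "PEPTIDEK"})))
```

Spectra without an entry in the dictionary are written out unchanged, which matches the statement above that identification data is optional.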
Performing the clustering
C:\>cd C:\Downloads\spectra-cluster-cli
C:\Downloads\spectra-cluster-cli>java -jar spectra-cluster-cli-1.0-SNAPSHOT.jar -rounds 4 -threshold_end 0.99 -threshold_start 0.9999 -output_path C:\my_data\clustering_results.clustering C:\my_data\peaklists\*.mgf
- Open a command prompt. On Windows, open the Run dialog, type "cmd", and press Enter (on Windows 7, simply press the Windows key and type "cmd").
- Navigate to the directory where you downloaded the spectra-cluster-cli tool.
- Launch the spectra-cluster-cli through the command java -jar spectra-cluster-cli-1.0-SNAPSHOT.jar (see above). Note: the spectra-cluster-cli-1.0-SNAPSHOT.jar part has to be adapted based on the downloaded version.
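For repeated runs, the same command can be assembled from a script. The sketch below, in Python, builds the argument list using only the options shown in the example command. The function name build_cluster_command and the example paths are hypothetical. It checks two conditions taken from the tool's option descriptions: the output file must not exist yet, and the starting threshold is the highest one. The MGF files are expanded with glob because no shell is involved to resolve the *.mgf wildcard.

```python
import glob
import os
from typing import List


def build_cluster_command(
    jar_path: str,
    mgf_files: List[str],
    output_path: str,
    rounds: int = 4,
    threshold_start: float = 0.9999,
    threshold_end: float = 0.99,
) -> List[str]:
    """Build the java command line for one clustering run (hypothetical helper)."""
    if not mgf_files:
        raise ValueError("no MGF files given")
    if os.path.exists(output_path):
        # The tool requires that the output file does not exist yet.
        raise FileExistsError("output file already exists: " + output_path)
    if threshold_end > threshold_start:
        # threshold_start is the highest threshold, threshold_end the lowest.
        raise ValueError("threshold_end must not exceed threshold_start")
    return [
        "java", "-jar", jar_path,
        "-rounds", str(rounds),
        "-threshold_end", str(threshold_end),
        "-threshold_start", str(threshold_start),
        "-output_path", output_path,
        *mgf_files,
    ]


if __name__ == "__main__":
    # Example paths are placeholders; adapt them to your own data.
    files = sorted(glob.glob(os.path.join("my_data", "peaklists", "*.mgf")))
    if files:
        command = build_cluster_command(
            "spectra-cluster-cli-1.0-SNAPSHOT.jar",
            files,
            os.path.join("my_data", "clustering_results.clustering"),
        )
        # Inspect the command first; to launch it, pass the list to
        # subprocess.run(command, check=True).
        print(" ".join(command))
    else:
        print("No MGF files found")
```

Passing the arguments as a list rather than one string avoids quoting problems with paths that contain spaces.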
The following command-line options are available:
- -binary_directory <arg>: path to the directory to (temporarily) store the binary files. By default, a temporary directory is created.
- -cluster_binary_file <arg>: if this option is set, only the passed binary file will be clustered and the result written to the file specified in '-output_path' in the binary format.
- -convert_cgf: if this option is set, the passed CGF file is converted into a .clustering file.
- -fast_mode: if this option is set, the 'fast mode' is enabled. In this mode, the radical peak filtering used for the comparison function is already applied during spectrum conversion. Thereby, the clustering and consensus spectrum quality is slightly decreased but speed increases 2-3 fold.
- -fragment_tolerance: fragment ion tolerance in m/z to use for fragment peak matching.
- -help: print this message.
- -keep_binary_files: if this option is set, the binary files are not deleted after clustering.
- -major_peak_jobs <arg>: number of threads to use for major peak clustering.
- -merge_binary_results: if this option is set, the passed binary results files are merged into a single .cgf file and written to '-output_path'.
- -output_path <arg>: path to the output file. The output file must not exist.
- -precursor_tolerance <arg>: precursor tolerance (clustering window size) in m/z used during matching.
- -reuse_binary_files: if this option is set, the binary files found in the binary file directory will be used for clustering.
- -rounds <arg>: number of clustering rounds to use.
- -threshold_end <arg>: (lowest) final clustering threshold.
- -threshold_start <arg>: (highest) starting threshold.
- -x_learn_cdf <output filename>: (Experimental option) Learn the used cumulative distribution function directly from the processed data. This is only recommended for high-resolution data. The result will be written to the defined file.
- -x_load_cdf <CDF filename>: (Experimental option) Loads the cumulative distribution function to use from the specified file. These files can be created using the x_learn_cdf parameter.
- -x_min_comparisons <arg>: (Experimental option) Sets the minimum number of comparisons used to calculate the probability that incorrect spectra are clustered.
- -x_n_prefiltered_peaks <number peaks>: (Experimental option) Set the number of highest peaks that are kept per spectrum during loading.
Analysing clustering results
We are currently developing analysis software for clustering results in the .clustering format.