Customer Interview Questions
What questions should we ask in the e-mail?
- Any recommended libraries?
- You open the application. What should you see first? Which options should you have?
- How polished should the UI look? Is there a color scheme (or a main color) you would prefer?
- What file formats should our application support?
- How are these files structured? Where is the spliced/unspliced data? Are these separate files? Or should the user be able to choose between different input alternatives?
- How exactly do you load your data? Using a file system interface? Drag and drop? Copy and paste the raw data? Or should everything be possible?
- If the loaded data is invalid, what should happen?
- Could you list the minimal set of required visualizations for this data?
- Could you list the minimal set of required statistics for this data?
- "Easy assessment of different velocity estimation (model types, model parameters)": could you elaborate?
- I guess it should be possible to save the output data. If that's the case, which options should a user have? Saving the visualizations individually? Saving them as images or in other file formats? And how should the statistics be saved? Please elaborate.
- What external documentation is required (e.g. a guide on how to use the software)?
- What should we call the software?
- What is your opinion on prioritizing the requirements? What should we focus on first? Which are must-haves and which are nice-to-haves?
- Are there other things we should know?
Answer:
I would like to have a program where I have a menu for loading data, can specify which analysis I want to run and in the end get the results nicely presented. Right now I have my data as already aligned BAM files but in the future I may also get unaligned FASTQ files (from time to time someone already runs velocyto for me and then I just get loom or h5ad files). I’m not a programmer, so I don’t know any specifics but my colleagues always talk about anndata, velocyto (apparently kallisto or dropEst are alternatives) and scvelo.
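The input formats named above enter the analysis at different stages, so the loading menu needs to tell them apart. A minimal sketch of how this could look, assuming the file extension is enough to decide; the function name `classify_input` and the stage labels are our own placeholders, not part of any existing library:

```python
from pathlib import Path

# Hypothetical mapping from file suffix to the stage at which the file enters
# the pipeline. The formats come from the customer's answer; the stage labels
# are our own working names.
INPUT_KINDS = {
    ".fastq": "unaligned reads",
    ".fq": "unaligned reads",
    ".bam": "aligned reads",
    ".loom": "spliced/unspliced counts",
    ".h5ad": "spliced/unspliced counts",
}


def classify_input(path: str) -> str:
    """Return the input kind for a file name, ignoring a trailing .gz."""
    suffixes = [s.lower() for s in Path(path).suffixes]
    # Compressed files such as sample.fastq.gz are classified by the inner suffix.
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]
    if not suffixes or suffixes[-1] not in INPUT_KINDS:
        raise ValueError(f"Unsupported input file: {path}")
    return INPUT_KINDS[suffixes[-1]]
```

Raising a `ValueError` for unknown files gives the UI a single place to hook in the "clear instructions on what to do if something goes wrong".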
The analyses I want to run as soon as possible are: spliced/unspliced proportions, the number of genes which are usable for velocity analysis, how well the model assumptions hold, a statistical evaluation of the computed velocities (mean, median, covariance, box plots), a nice-looking visualisation of the velocity field (see attached), the identification of clusters and their velocity, and an analysis of pseudotime.
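To make the requested statistics concrete, here is a small sketch using only the Python standard library. It assumes the counts and velocities are already available as plain lists of numbers; in the real application they would presumably come from the loaded loom or h5ad data. All function names are illustrative.

```python
import statistics


def unspliced_fraction(spliced, unspliced):
    """Fraction of all counts that are unspliced."""
    total = sum(spliced) + sum(unspliced)
    if total == 0:
        raise ValueError("No counts to compute a proportion from")
    return sum(unspliced) / total


def summarize(values):
    """Mean and median of a list of velocity values."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
    }


def sample_covariance(x, y):
    """Sample covariance (divides by n - 1) of two equally long lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("Need two lists of equal length with at least 2 values")
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)
```

For example, `unspliced_fraction([3, 1], [1, 3])` gives `0.5`, since half of the eight counts are unspliced.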
I want something with a very easy-to-understand interface, which all my colleagues can use as well. It should be intuitive and straightforward to use (e.g. standard convenience functions like drag & drop and clear instructions on what to do if something goes wrong) and should not require a lot of knowledge. I always want to be able to continue with my analysis after some time. Results should be easy to include in a paper without losing quality.
Regarding the minimal set of required visualizations for this data, I would like to have outputs of the velocity gene list and the criteria for their choice. For the velocity gene list it would be good to output the phase plots, e.g. as PDF, along with the underlying data in tabular form.
For the fits of the transcription model it would be good to output the model parameters in tabular form, as well as versions of the phase plots indicating the fitted model (i.e. the linear fit of the velocyto model or the estimated trajectory for the scvelo model). Also, please output the goodness-of-fit measures for these models, i.e. regression loss, confidence intervals and/or likelihood.
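As an illustration of "linear fit" and "regression loss" for a single gene, the sketch below fits a line through the origin to unspliced versus spliced counts by ordinary least squares. This is a deliberately simplified stand-in, not the velocyto implementation (which, as far as we understand, restricts the fit to cells in the extreme quantiles); the function name and return values are our own.

```python
def fit_steady_state(spliced, unspliced):
    """Least-squares fit of unspliced = gamma * spliced (line through the origin).

    Returns the slope gamma, the per-cell residuals u - gamma * s, and the
    regression loss (sum of squared residuals).
    """
    if len(spliced) != len(unspliced):
        raise ValueError("spliced and unspliced must have the same length")
    sum_ss = sum(s * s for s in spliced)
    if sum_ss == 0:
        raise ValueError("Cannot fit a slope when all spliced counts are zero")
    gamma = sum(u * s for s, u in zip(spliced, unspliced)) / sum_ss
    residuals = [u - gamma * s for s, u in zip(spliced, unspliced)]
    loss = sum(r * r for r in residuals)
    return gamma, residuals, loss
```

The slope is the model parameter to list in the table, the fitted line is what would be drawn into the phase plot, and the loss is one candidate goodness-of-fit measure.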
For the cell velocities it would be good to have projections (e.g. UMAP) of the single-cell transcriptomic data overlaid with single-cell velocities, alternatively also average velocities for groups of neighboring cells to achieve a less cluttered plot, as well as stream plots (see the example below).
As for the easy assessment of different velocity estimations, it would be helpful to show the above visualizations side by side for different models and/or parametrizations to allow for a detailed comparison.
It’s very important for the management that all our software sticks to the corporate design which you find attached.
Most important for me is that I can run an analysis on this data myself as soon as possible: https://www.ebi.ac.uk/ena/browser/view/PRJEB43201 Starting from loading the data and getting some statistical analysis in the end.
For now it is difficult to prioritize the above issues, and I would try to address them all with equal importance.
As for the software name, we can run with a name of your choice for now and decide on a final name in the end!