Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Threading model used for ReadIterator? #27

Open
jgans opened this issue Jan 14, 2020 · 9 comments
Open

Threading model used for ReadIterator? #27

jgans opened this issue Jan 14, 2020 · 9 comments

Comments

@jgans
Copy link

jgans commented Jan 14, 2020

I am currently using the ngs::ReadIterator to extract sequence data from SRA files. I notice that when I am reading sequence data from an SRA file, there is an extra thread that appears to be part of the SRA toolkit (i.e. under linux, top reports 200% CPU activity for my application). Where is this extra thread coming from, and it there a way to control the number of threads (e.g. environment variable, API call, etc.)?

Thanks!

@kwrodarmer
Copy link
Contributor

If you are using NGS, there is no SRA Toolkit involved. NGS is an API built upon VDB (sorry for all the acronyms), and VDB does create additional threads when accessing data. We have no configuration for enabling or disabling this.

Out of curiosity, what would be your use case for disabling it?

@jgans
Copy link
Author

jgans commented Jan 14, 2020

I am actually trying to enable it under MPI (which I am using to parallelize by analysis across multiple servers). When I run my program with mpirun with a single rank (i.e. "mpirun -np 1 ..."), the NGS API does not spawn an extra thread (i.e. CPU usage is 100%). If I run my program without mpirun, then the extra thread is spawned. I am already using a hybrid programming model that combines MPI and OpenMP without any issue (I have disabled processor affinity for this test). Finally, I have lots of data to read, and lots of cores per node, so I was wondering if I could further improve performance by getting NGS to utilize even more threads ...

@kwrodarmer
Copy link
Contributor

I see - I should have realized you wanted more threads rather than fewer.

It is possible to open multiple read iterators on different runs using different threads and run them in parallel. Alternatively, use more processes. In any event, we would be interested in hearing more about your experiments in this area!

@jgans
Copy link
Author

jgans commented Jan 14, 2020

I have tried to accelerate the reading of SRA files by using OpenMP to read non-overlapping slices of an SRA file in different threads. If I run with N OpenMP threads, I observe CPU activity in top that indicates N+1 cores are being used (I'm only using a subset of the available CPU cores on this machine). In addition, one of the OpenMP threads finishes much faster than the rest, which suggests that the NGS API is providing an extra thread to just one of the OpenMP threads (which leads to a significant load-imbalance). Can you point me to the code that controls how and when the "helper" thread is spawned when iterating through a ReadCollection?

@kwrodarmer
Copy link
Contributor

It may not be quite so simple (it never is). Are you opening several iterators? Or several ReadCollections? Internally, a VCursor is used which opens a background thread to help with blob decompression. Additionally (and quite dependent upon the software version), there is a thread that assists with background reading which is probably the most important piece.

The SRA has moved from a POSIX file system to a cloud-like object-store, and this has caused some performance issues. VDB (the underlying database system) is in the process of getting upgrades to try to reduce the impact there. Retrieving in parallel might not be the answer.

One of the things you can do to improve performance is to retrieve data series independently. For example, do not try to get both bases and qualities as a single row (at least not with current release software) because this can cause random access within the SRA object - and this is the problem we have with an object-store, because it's really, really bad about random access. fasterq-dump is faster exactly because it retrieves bases and qualities and names separately and then joins them into row-wise fastq on output.

@jgans
Copy link
Author

jgans commented Jan 14, 2020

I have tried (a) one global read collection and then spawn threads that each open one read iterator and (b) spawn threads that each open one read collection and one read iterator. The result is the same in both cases (i.e. N+1 threads are spawned). In my use case, I only need the sequence data (and am currently not reading the qualities or read names). Here is an example of how my code is reading sequences:

`#pragma omp parallel num_threads
{
const size_t num_thead = omp_get_num_threads();
const size_t tid = omp_get_thread_num();

ngs::ReadCollection run(  ncbi::NGS::openReadCollection("input_file.sra") );
			
const size_t num_read = run.getReadCount ();
const size_t chunk = max( size_t(1), num_read/num_thead );
			
size_t start = 1 + chunk*tid;
size_t stop = start + chunk;
			
if( tid == (num_thead - 1) ){
	stop = num_read;
}

    ngs::ReadIterator run_iter( run.getReadRange ( start, stop, ngs::Read::all ) );
			
while( run_iter.nextRead() ){
				
	if( run_iter.nextFragment() ){
					
		const string seq = run_iter.getFragmentBases().toString();
                    /*process sequence here*/
             }
     }

}
`

@jgans
Copy link
Author

jgans commented Jan 16, 2020

For the sake of correctness, the following code fixes the off-by-one error in the start/stop calculation and the incomplete reading of both reads in a pair:

#pragma omp parallel num_threads(NUM_SRA_THREADS)
{
	const size_t num_thread = omp_get_num_threads();
	const size_t tid = omp_get_thread_num();

	// Read from a local file downloaded using prefetch
	ngs::ReadCollection run(  ncbi::NGS::openReadCollection("filename.sra") );
	const size_t num_read = run.getReadCount(ngs::Read::all);
	const size_t chunk = max( size_t(1), num_read/num_thread );

	// Each thread is assigned an non-overlapping slice of the SRA
	// file to read
	const size_t start = 1 + chunk*tid;
	size_t stop = start + chunk - 1;
				
	if( tid == (num_thread - 1) ){
		stop = num_read;
	}
				
	ngs::ReadIterator run_iter( run.getReadRange ( start, stop, ngs::Read::all ) );
				
	while( run_iter.nextRead() ){
					
		while( run_iter.nextFragment() ){
			const string seq = run_iter.getFragmentBases().toString();
			/* Do stuff with the sequence */
		}
	}
}

@jgans
Copy link
Author

jgans commented Jan 16, 2020

Hopefully one last bug fix to the code I posted above -- I did not realize that getReadRange takes the starting read as the first argument and and the number of reads as the second argument (as opposed to the index of the last read to iterate over).

#pragma omp parallel num_threads(NUM_SRA_THREADS)
{
	const size_t num_thread = omp_get_num_threads();
	const size_t tid = omp_get_thread_num();

	// Read from a local file downloaded using prefetch
	ngs::ReadCollection run(  ncbi::NGS::openReadCollection("filename.sra") );
	const size_t num_read = run.getReadCount(ngs::Read::all);
	size_t chunk = num_read/num_thread ;

	// Each thread is assigned an non-overlapping slice of the SRA
	// file to read
	const size_t start = 1 + chunk*tid;
				
	if( tid == (num_thread - 1) ){
		chunk += num_read%num_thread;
	}
				
	ngs::ReadIterator run_iter( run.getReadRange ( start, chunk, ngs::Read::all ) );
				
	while( run_iter.nextRead() ){
					
		while( run_iter.nextFragment() ){
			const string seq = run_iter.getFragmentBases().toString();
			/* Do stuff with the sequence */
		}
	}
}

@jgans
Copy link
Author

jgans commented Jan 22, 2020

While the code I posted above works well for most SRA files, it crashes when trying to read SRA runs that contain aligned reads (i.e. have one or more associated files that start with ReSeq accessions and a file that has the ".vdbcache" file extension). The crash appears to happen after the last OpenMP thread has finished reading (in the destructor for the iterator or the ReadCollection).

Any insights where to look in the API code would be appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants