Add a STYLE_GUIDE for DALI, adjust Kernel example (NVIDIA#1167)
* Add a STYLE_GUIDE for DALI, adjust Kernel example
* Change the doxygen format to the chosen one

Signed-off-by: Krzysztof Lecki <[email protected]>
klecki authored Aug 19, 2019
1 parent a9989e0 commit e3679b4
Showing 37 changed files with 1,618 additions and 1,207 deletions.
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -1,6 +1,6 @@
# Contribution rules

- While we have not yet documented a strict code style convention, please follow the existing conventions in the relevant file, submodule, module, and project when you add new code or when you extend/fix existing functionality. To ensure some degree of style consistency, use `make lint`. It applies [Google Style](https://google.github.io/styleguide/cppguide.html) rules to the following:
- DALI Coding Style Guide can be found [here](STYLE_GUIDE.md). We follow [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html) with a few exceptions and additional guidelines regarding DALI-specific cases. See the DALI Coding Style Guide for details. When no rules can be found, follow the conventions already present in the code. If there is no precedent in our codebase, we are open to discussion. Ensure that your contribution passes the `make lint` check. It applies rules to the following:
- class, function/method, and variable/field naming
- comment style
- indentation
55 changes: 55 additions & 0 deletions STYLE_GUIDE.md
@@ -0,0 +1,55 @@
# DALI Coding Style Guide

This document describes the DALI Coding Style Guide. Rules specified here take precedence
over the [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html), which should
be followed in all remaining cases.

The code should always pass the current `make lint` check.

## Changes compared to Google C++ Style Guide

Google C++ Style Guide is the default *style* guide. In places where it limits the use of common
C++ idioms or language features, following it is discouraged.

### C++ Version

DALI uses the C++14 standard, as this is the most recent version supported by CUDA.

### Line length

We use a line length limit of 100 characters.

### Reference arguments

Parameters can be passed as non-const lvalue references. The [Google rule](https://google.github.io/styleguide/cppguide.html#Reference_Arguments)
prohibits this, discarding the semantically valid guarantee that a reference cannot be null,
and introduces awkward code like `foo(&bar)` or `(*buf)[i]`.
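The trade-off can be illustrated with a small sketch; `FillPtr` and `FillRef` below are hypothetical helpers, not DALI APIs:

```cpp
#include <cstddef>
#include <vector>

// Output parameter passed by pointer (plain Google style): the call site
// must write FillPtr(&buf, n), and the callee must handle a null pointer.
void FillPtr(std::vector<int> *buf, size_t n) {
  if (buf == nullptr) return;  // null is representable, so it must be handled
  buf->assign(n, 0);
}

// Output parameter passed by non-const lvalue reference (allowed in DALI):
// a reference cannot be null, so no check is needed and access reads naturally.
void FillRef(std::vector<int> &buf, size_t n) {
  buf.assign(n, 0);
}
```

Both produce the same result; the reference form simply encodes the no-null restriction in the type.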

## DALI specific rules

### DALI Kernels argument order

DALI Kernels follow the argument order Outputs, Inputs, Arguments, where Outputs and Inputs are
expected to be views of Tensors (TensorLists) and Arguments are other inputs.

The same order should be maintained for Kernel template arguments.
See [the example](dali/kernels/kernel.h) kernel implementation for details.

The order of the arguments follows `memcpy` semantics (destination comes first).
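A minimal host-side sketch of this convention; `OutView`/`InView` are made-up stand-ins for DALI's tensor views, and `ScaleKernelRun` is a hypothetical kernel entry point, not a real DALI kernel:

```cpp
#include <cstddef>
#include <vector>

// Toy stand-ins for DALI's tensor views (the real kernels take
// TensorView/TensorListView types from dali/kernels).
struct OutView { std::vector<float> *data; };
struct InView  { const std::vector<float> *data; };

// Hypothetical kernel entry point illustrating the required order:
// outputs first, then inputs, then plain arguments - like memcpy(dst, src, n).
void ScaleKernelRun(OutView out, InView in, float scale) {
  out.data->resize(in.data->size());
  for (size_t i = 0; i < in.data->size(); i++)
    (*out.data)[i] = (*in.data)[i] * scale;
}
```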

### Documentation

DALI uses Doxygen for C++ code documentation with Javadoc-styled comments:

```cpp
/**
* ... text ...
*/
```


## Unspecified cases

When the style is left unspecified, please follow the style most prevalent in the current codebase.
If there is no precedent in the codebase, we are open to discussion, but we hold the final
word to avoid endless debate on the matter.
20 changes: 11 additions & 9 deletions dali/kernels/common/block_setup.h
@@ -52,15 +52,17 @@ std::enable_if_t<(skip >= 0), TensorShape<n-1>> skip_dim(const TensorShape<n> &s
return shape_cat(shape.template first<skip>(), shape.template last<n-skip-1>());
}

/// @brief A utility for calculating block layout for GPU kernels
/// @tparam _ndim number dimensions to take into account while calculating the layout
/// @tparam _channel_dim dimension in which channels are stored; channel dimension does not
/// participate in layout calculation \n
/// In cases where channel dimension can or should participate in layout
/// calaculation, do not specify channel dimenion and treat it as an
/// additional spatial dimension (e.g. for linear operations in CHW layout)\n
/// -1 indicates there are only spatial dimensions, all of which
/// participate in layout calculation.
/**
* @brief A utility for calculating block layout for GPU kernels
 * @tparam _ndim number of dimensions to take into account while calculating the layout
* @tparam _channel_dim dimension in which channels are stored; channel dimension does not
* participate in layout calculation \n
* In cases where channel dimension can or should participate in layout
 * calculation, do not specify the channel dimension and treat it as an
* additional spatial dimension (e.g. for linear operations in CHW layout)\n
* -1 indicates there are only spatial dimensions, all of which
* participate in layout calculation.
*/
template <int _ndim, int _channel_dim>
class BlockSetup {
public:
32 changes: 22 additions & 10 deletions dali/kernels/common/scatter_gather.h
@@ -35,7 +35,9 @@ struct CopyRange {
size_t Coalesce(span<CopyRange> ranges);
} // namespace detail

/// Implements a device-to-device batch copy of multiple sources to multiple destinations
/**
* Implements a device-to-device batch copy of multiple sources to multiple destinations
*/
class DLL_PUBLIC ScatterGatherGPU {
public:
static constexpr size_t kDefaultBlockSize = 64<<10;
@@ -62,7 +64,9 @@ class DLL_PUBLIC ScatterGatherGPU {
blocks_.clear();
}

/// @brief Adds one copy to the batch
/**
* @brief Adds one copy to the batch
*/
void AddCopy(void *dst, const void *src, size_t size) {
if (size > 0) {
ranges_.push_back({
@@ -73,28 +77,36 @@ }
}
}

/// @brief Executes the copies
/// @param stream - the cudaStream on which the copies are scheduled
/// @param reset - if true, calls Reset after processing is over
/// @param useMemcpyOnly - if true, all copies are executed using cudaMemcpy;
/// otherwise a batched kernel is used if there are more than 2 ranges
/**
* @brief Executes the copies
* @param stream - the cudaStream on which the copies are scheduled
* @param reset - if true, calls Reset after processing is over
* @param useMemcpyOnly - if true, all copies are executed using cudaMemcpy;
* otherwise a batched kernel is used if there are more than 2 ranges
*/
DLL_PUBLIC void Run(cudaStream_t stream, bool reset = true, bool useMemcpyOnly = false);

using CopyRange = detail::CopyRange;

private:
std::vector<CopyRange> ranges_;

/// @brief Sorts and merges contiguous ranges
/**
* @brief Sorts and merges contiguous ranges
*/
void Coalesce() {
size_t n = detail::Coalesce(make_span(ranges_.data(), ranges_.size()));
ranges_.resize(n);
}

/// @brief Divides ranges so they don't exceed `max_block_size_`
/**
* @brief Divides ranges so they don't exceed `max_block_size_`
*/
void MakeBlocks();

/// @brief Reserves GPU memory for the description of the blocks.
/**
* @brief Reserves GPU memory for the description of the blocks.
*/
void ReserveGPUBlocks();

size_t max_size_per_block_ = kDefaultBlockSize;
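The `Coalesce` step documented above (sort, then merge contiguous ranges) can be sketched on the host. The `Range` struct and `CoalesceRanges` function below are simplified, hypothetical analogues of `detail::CopyRange`/`detail::Coalesce`, using integer offsets instead of raw pointers:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Simplified model of a copy range: offsets into flat buffers.
struct Range {
  size_t dst, src, size;
};

// Sketch of the coalescing idea: sort by source offset, then merge every
// range that is contiguous in both source and destination.
// Returns the number of ranges after merging.
size_t CoalesceRanges(std::vector<Range> &ranges) {
  if (ranges.empty()) return 0;
  std::sort(ranges.begin(), ranges.end(),
            [](const Range &a, const Range &b) { return a.src < b.src; });
  size_t n = 1;
  for (size_t i = 1; i < ranges.size(); i++) {
    Range &prev = ranges[n - 1];
    const Range &cur = ranges[i];
    if (cur.src == prev.src + prev.size && cur.dst == prev.dst + prev.size) {
      prev.size += cur.size;  // contiguous in both buffers - merge
    } else {
      ranges[n++] = cur;      // keep as a separate range
    }
  }
  ranges.resize(n);
  return n;
}
```

Fewer, larger ranges mean fewer kernel copy descriptors, which is the point of coalescing before `MakeBlocks` splits oversized ranges back up.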
36 changes: 25 additions & 11 deletions dali/kernels/context.h
@@ -45,21 +45,29 @@ template <typename... Collections>
std::tuple<std::remove_cv_t<element_t<Collections>>*...>
ToContiguousGPUMem(Scratchpad &scratchpad, cudaStream_t stream, const Collections &... c);

/// @brief Interface for kernels to obtain auxiliary working memory
/**
* @brief Interface for kernels to obtain auxiliary working memory
*/
class Scratchpad {
public:
/// @brief Allocates `bytes` bytes of memory in `alloc_type`, with specified `alignment`
/**
* @brief Allocates `bytes` bytes of memory in `alloc_type`, with specified `alignment`
*/
virtual void *Alloc(AllocType alloc_type, size_t bytes, size_t alignment) = 0;

/// @brief Allocates storage for a Tensor of elements `T` and given `shape`
/// in the memory of type `alloc_type`.
/**
* @brief Allocates storage for a Tensor of elements `T` and given `shape`
* in the memory of type `alloc_type`.
*/
template <AllocType alloc_type, typename T, size_t dim>
TensorView<AllocBackend<alloc_type>, T, dim> AllocTensor(TensorShape<dim> shape) {
return { Allocate<T>(alloc_type, volume(shape)), std::move(shape) };
}

/// @brief Allocates storage for a TensorList of elements `T` and given `shape`
/// in the memory of type `alloc_type`.
/**
* @brief Allocates storage for a TensorList of elements `T` and given `shape`
* in the memory of type `alloc_type`.
*/
template <AllocType alloc_type, typename T, size_t dim>
TensorListView<AllocBackend<alloc_type>, T, dim>
AllocTensorList(const std::vector<TensorShape<dim>> &shape) {
@@ -68,8 +76,10 @@ class Scratchpad {
return tlv;
}

/// @brief Allocates storage for a TensorList of elements `T` and given `shape`
/// in the memory of type `alloc_type`.
/**
* @brief Allocates storage for a TensorList of elements `T` and given `shape`
* in the memory of type `alloc_type`.
*/
template <AllocType alloc_type, typename T, size_t dim>
TensorListView<AllocBackend<alloc_type>, T, dim>
AllocTensorList(TensorListShape<dim> shape) {
@@ -78,8 +88,10 @@
return tlv;
}

/// @brief Allocates memory suitable for storing `count` items of type `T` in the
/// memory of type `alloc_type`.
/**
* @brief Allocates memory suitable for storing `count` items of type `T` in the
* memory of type `alloc_type`.
*/
template <typename T>
T *Allocate(AllocType alloc_type, size_t count, size_t alignment = alignof(T)) {
return reinterpret_cast<T*>(Alloc(alloc_type, count*sizeof(T), alignment));
@@ -138,7 +150,9 @@ struct KernelContext {
CPUContext cpu;
GPUContext gpu;

/// @brief Caller-provided allocator for temporary data.
/**
* @brief Caller-provided allocator for temporary data.
*/
Scratchpad *scratchpad;
};

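The `Scratchpad::Alloc` contract documented above (allocate `bytes` with a specified `alignment`) can be sketched as a host-only bump allocator. `BumpScratchpad` is a hypothetical illustration, not DALI's implementation; in particular it ignores the `AllocType` dispatch between host, pinned, and GPU memory:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal host-only bump allocator illustrating the Scratchpad::Alloc
// contract: each allocation is rounded up to the requested alignment.
class BumpScratchpad {
 public:
  explicit BumpScratchpad(size_t capacity) : storage_(capacity) {}

  void *Alloc(size_t bytes, size_t alignment) {
    uintptr_t base = reinterpret_cast<uintptr_t>(storage_.data());
    // round the current position up to the next multiple of `alignment`
    uintptr_t ptr = (base + used_ + alignment - 1) / alignment * alignment;
    if (ptr + bytes > base + storage_.size()) return nullptr;  // out of space
    used_ = ptr + bytes - base;
    return reinterpret_cast<void *>(ptr);
  }

  // Typed helper mirroring the Scratchpad::Allocate<T> shape.
  template <typename T>
  T *Allocate(size_t count, size_t alignment = alignof(T)) {
    return reinterpret_cast<T *>(Alloc(count * sizeof(T), alignment));
  }

 private:
  std::vector<char> storage_;
  size_t used_ = 0;
};
```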
24 changes: 14 additions & 10 deletions dali/kernels/imgproc/resample/bilinear_impl.cuh
@@ -27,11 +27,13 @@ namespace kernels {

namespace {

/// @brief Implements horizontal resampling for a custom ROI
/// @param x0 - start column, in output coordinates
/// @param x1 - end column (exclusive), in output coordinates
/// @param y0 - start row
/// @param y1 - end row (exclusive)
/**
* @brief Implements horizontal resampling for a custom ROI
* @param x0 - start column, in output coordinates
* @param x1 - end column (exclusive), in output coordinates
* @param y0 - start row
* @param y1 - end row (exclusive)
*/
template <int static_channels = -1, typename Dst, typename Src>
__device__ void LinearHorz_Channels(
int x0, int x1, int y0, int y1,
@@ -94,11 +96,13 @@ __device__ void LinearHorz(
)); // NOLINT
}

/// @brief Implements vertical resampling for a custom ROI
/// @param x0 - start column, in output coordinates
/// @param x1 - end column (exclusive), in output coordinates
/// @param y0 - start row
/// @param y1 - end row (exclusive)
/**
* @brief Implements vertical resampling for a custom ROI
* @param x0 - start column, in output coordinates
* @param x1 - end column (exclusive), in output coordinates
* @param y0 - start row
* @param y1 - end row (exclusive)
*/
template <typename Dst, typename Src>
__device__ void LinearVert(
int x0, int x1, int y0, int y1,
8 changes: 6 additions & 2 deletions dali/kernels/imgproc/resample/params.h
Expand Up @@ -60,12 +60,16 @@ struct FilterDesc {
float radius = 0;
};

/// @brief Resampling parameters for 1 dimension
/**
* @brief Resampling parameters for 1 dimension
*/
struct ResamplingParams {
FilterDesc min_filter, mag_filter;
int output_size = KeepOriginalSize;

/// @brief 1D region of interest
/**
* @brief 1D region of interest
*/
struct ROI {
ROI() = default;
ROI(float start, float end) : use_roi(true), start(start), end(end) {}
62 changes: 33 additions & 29 deletions dali/kernels/imgproc/resample/resampling_impl.cuh
@@ -31,25 +31,27 @@ namespace resample_shared {
extern __shared__ float coeffs[];
};

/// @brief Implements horizontal resampling for a custom ROI
/// @param x0 - start column, in output coordinates
/// @param x1 - end column (exclusive), in output coordinates
/// @param y0 - start row
/// @param y1 - end row (exclusive)
/// @tparam static_channels - number of channels, if known at compile time
///
/// The function fills the output in block-sized vertical spans.
/// Block horizontal size is warp-aligned.
/// Filter coefficients are pre-calculated for each vertical span to avoid
/// recalculating them for each row, and stored in a shared memory block.
///
/// The function follows different code paths for static and dynamic number of channels.
/// For the dynamic, the innermost loop goes over filter taps, which eliminates the need
/// for thread-local memory to store intermediate sums. This allows processing arbitrary
/// number of channels.
/// For static number of channels, the run-time parameter `channels` is ignored and
/// there's also a local temporary storage for a tap sum for each channel. This is faster,
/// but requires extra registers for the intermediate sums.
/**
* @brief Implements horizontal resampling for a custom ROI
* @param x0 - start column, in output coordinates
* @param x1 - end column (exclusive), in output coordinates
* @param y0 - start row
* @param y1 - end row (exclusive)
* @tparam static_channels - number of channels, if known at compile time
*
* The function fills the output in block-sized vertical spans.
* Block horizontal size is warp-aligned.
* Filter coefficients are pre-calculated for each vertical span to avoid
* recalculating them for each row, and stored in a shared memory block.
*
* The function follows different code paths for static and dynamic number of channels.
 * In the dynamic case, the innermost loop goes over filter taps, which eliminates the need
* for thread-local memory to store intermediate sums. This allows processing arbitrary
* number of channels.
* For static number of channels, the run-time parameter `channels` is ignored and
* there's also a local temporary storage for a tap sum for each channel. This is faster,
* but requires extra registers for the intermediate sums.
*/
template <int static_channels = -1, typename Dst, typename Src>
__device__ void ResampleHorz_Channels(
int x0, int x1, int y0, int y1,
@@ -140,16 +142,18 @@ __device__ void ResampleHorz_Channels(
}
}

/// @brief Implements vertical resampling for a custom ROI
/// @param x0 - start column, in output coordinates
/// @param x1 - end column (exclusive), in output coordinates
/// @param y0 - start row
/// @param y1 - end row (exclusive)
/// @tparam static_channels - number of channels, if known at compile time
///
/// The function fills the output in block-sized horizontal spans.
/// Filter coefficients are pre-calculated for each horizontal span to avoid
/// recalculating them for each column, and stored in a shared memory block.
/**
* @brief Implements vertical resampling for a custom ROI
* @param x0 - start column, in output coordinates
* @param x1 - end column (exclusive), in output coordinates
* @param y0 - start row
* @param y1 - end row (exclusive)
* @tparam static_channels - number of channels, if known at compile time
*
* The function fills the output in block-sized horizontal spans.
* Filter coefficients are pre-calculated for each horizontal span to avoid
* recalculating them for each column, and stored in a shared memory block.
*/
template <int static_channels = -1, typename Dst, typename Src>
__device__ void ResampleVert_Channels(
int x0, int x1, int y0, int y1,
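The coefficient-precalculation idea described in the comments above can be sketched on the host. `ResampleHorzLinear` below is a hypothetical single-channel, 2-tap (linear) analogue of the CUDA kernel: the per-column filter taps are computed once and reused for every row, where the GPU version stores them in shared memory instead:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Host sketch: horizontal resampling with per-column taps precomputed once.
// Single channel, row-major images, 2-tap linear filter.
void ResampleHorzLinear(std::vector<float> &out, int out_w,
                        const std::vector<float> &in, int in_w, int h) {
  struct Taps { int x0; float w0, w1; };
  std::vector<Taps> taps(out_w);
  float scale = static_cast<float>(in_w) / out_w;
  for (int x = 0; x < out_w; x++) {   // coefficients depend only on the column
    float sx = (x + 0.5f) * scale - 0.5f;
    int x0 = static_cast<int>(std::floor(sx));
    float f = sx - x0;
    taps[x] = { x0, 1.0f - f, f };
  }
  out.assign(static_cast<size_t>(out_w) * h, 0.0f);
  for (int y = 0; y < h; y++) {       // reuse the taps for every row
    for (int x = 0; x < out_w; x++) {
      Taps t = taps[x];
      int xa = std::max(t.x0, 0);
      int xb = std::min(t.x0 + 1, in_w - 1);
      out[y * out_w + x] = t.w0 * in[y * in_w + xa] + t.w1 * in[y * in_w + xb];
    }
  }
}
```

With identical input and output widths the taps reduce to weight 1 on the source pixel, so the image passes through unchanged.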
20 changes: 11 additions & 9 deletions dali/kernels/imgproc/resample/resampling_impl_cpu.h
@@ -173,15 +173,17 @@ inline void ResampleAxis(Surface2D<Out> out, Surface2D<In> in,
assert(!"Invalid axis index");
}

/// @brief Resamples `in` using Nearest Neighbor interpolation and stores result in `out`
/// @param out - output surface
/// @param in - input surface
/// @param src_x0 - starting X coordinate of input
/// @param src_y0 - starting Y coordinate of input
/// @param scale_x - step of X input coordinate taken for each output pixel
/// @param scale_y - step of Y input coordinate taken for each output row
/// @remarks The function clamps input coordinates to fit in range defined by `in` dimensions.
/// Scales can be negative to achieve flipping.
/**
 * @brief Resamples `in` using Nearest Neighbor interpolation and stores the result in `out`
* @param out - output surface
* @param in - input surface
* @param src_x0 - starting X coordinate of input
* @param src_y0 - starting Y coordinate of input
* @param scale_x - step of X input coordinate taken for each output pixel
* @param scale_y - step of Y input coordinate taken for each output row
* @remarks The function clamps input coordinates to fit in range defined by `in` dimensions.
* Scales can be negative to achieve flipping.
*/
template <typename Out, typename In>
void ResampleNN(Surface2D<Out> out, Surface2D<const In> in,
float src_x0, float src_y0, float scale_x, float scale_y) {
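The documented `ResampleNN` contract (a source origin plus per-axis scale, coordinates clamped to the input extent, negative scales flipping the image) can be sketched on the host. `ResampleNNSketch` is a hypothetical single-channel stand-in for the real `Surface2D`-based function:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Host sketch: each output pixel (x, y) samples the input at
// (src_x0 + scale_x * (x + 0.5), src_y0 + scale_y * (y + 0.5)),
// clamped to the input extent. A negative scale flips the image.
void ResampleNNSketch(std::vector<float> &out, int out_w, int out_h,
                      const std::vector<float> &in, int in_w, int in_h,
                      float src_x0, float src_y0,
                      float scale_x, float scale_y) {
  out.resize(static_cast<size_t>(out_w) * out_h);
  for (int y = 0; y < out_h; y++) {
    int sy = static_cast<int>(std::floor(src_y0 + scale_y * (y + 0.5f)));
    sy = std::min(std::max(sy, 0), in_h - 1);  // clamp to input extent
    for (int x = 0; x < out_w; x++) {
      int sx = static_cast<int>(std::floor(src_x0 + scale_x * (x + 0.5f)));
      sx = std::min(std::max(sx, 0), in_w - 1);
      out[y * out_w + x] = in[sy * in_w + sx];
    }
  }
}
```

For example, `src_x0 = in_w` with `scale_x = -1` produces a horizontal flip, as the remarks above suggest.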