Commit 5158739

first round of corrections from the reporters

JPenuchot committed Oct 21, 2024
1 parent d86c9f1 commit 5158739
Showing 16 changed files with 157 additions and 128 deletions.
8 changes: 4 additions & 4 deletions 1-current-metaprogramming/1-metaprog-and-hpc-overview.tex
Original file line number Diff line number Diff line change
@@ -217,9 +217,9 @@ \section{

In the next section, we will take a deeper look at \cpp as it has
powerful metaprogramming capabilities, while providing bleeding edge support for
all kinds of computer architectures through a decades-old ecosystem. Moreover,
its development is still very active, with many significant metaprogramming
proposals being adopted throughout the latest
releases\cite{10.1145/3564719.3568692}.
all kinds of computer architectures through a decades-old
ecosystem. Moreover, its development is still very active, with many significant
metaprogramming proposals being adopted throughout the latest releases
\cite{10.1145/3564719.3568692}.

\end{document}
13 changes: 7 additions & 6 deletions 1-current-metaprogramming/2-cpp-constructs.tex
@@ -251,10 +251,10 @@ \subsection{
\end{lstlisting}

\subsection{
Compile time logic
Compile time computation
}

Compile time logic can be achieved in many \cpp constructs.
Compile time computation can be achieved through many \cpp constructs.

\begin{itemize}

@@ -456,7 +456,7 @@ \subsection{
enable the creation of high level, portable \glspl{dsel} that resolve into
high performance code thanks to a combination of metaprogramming techniques.
In the next section, we will see a collection of libraries that go beyond the
idea of using templates for math code generation, and implement or enable the
idea of using templates for math code generation, and enable or facilitate the
implementation of arbitrary compile-time programs.

\section{
@@ -519,7 +519,8 @@ \subsection{
All these libraries either enable \gls{tmp}, or use \gls{tmp} to achieve
a specific goal. However with the introduction of \gls{constexpr} programming,
a new range of compile-time libraries aims to provide new capabilities
for this new metaprogramming paradigm.
for this new metaprogramming paradigm, including the \cpp standard library
itself.

\subsection{
Value based metaprogramming
@@ -563,10 +564,10 @@ \subsection{
\lstinline{std::deque} is not usable in \glspl{consteval} whereas
its \textbf{C'est} equivalent named \lstinline{cest::deque} is.
It was instrumental for this thesis as the research work I present here started
a long time before \cpp23 was adopted and standard libraries as well as
before \cpp23 was adopted, and before standard libraries as well as
compilers started implementing it.

Similar to previous examples, listing \ref{lst:cest-example} shows a
Similar to the previous examples, listing \ref{lst:cest-example} shows a
compile-time program in which we find the index of the first element of value 6.
Note that in this example, we are using properly typed values and functions
instead of templates to represent values, predicates, and functions.
19 changes: 10 additions & 9 deletions 1-current-metaprogramming/3-gemv.tex
@@ -30,8 +30,8 @@ \section{
The efforts to optimize the performance of \gls{blas} routines
fall into two main directions. The first direction is about
writing very specific assembly code. This is the case for
almost all the vendor libraries including Intel MKL\cite{hpcs1},
AMD ACML\cite{hpcs2} etc. To provide the users with efficient \gls{blas}
almost all the vendor libraries including Intel MKL \cite{hpcs1},
AMD ACML \cite{hpcs2} etc. To provide the users with efficient \gls{blas}
routines, the vendors usually implement their own routines
for their own hardware using assembly code with specific
optimizations, which is a low level solution that gives the
@@ -51,12 +51,12 @@ \section{
abstraction level and the efficiency of the generated codes.
This is for example the case of the approach followed by
the Formal Linear Algebra Methods Environment (FLAME)
with the Libflame library\cite{hpcs3}. Thus, it offers a framework to
develop dense linear solvers using algorithmic skeletons\cite{hpcs4}
with the Libflame library \cite{hpcs3}. Thus, it offers a framework to
develop dense linear solvers using algorithmic skeletons \cite{hpcs4}
and an API which is more user-friendly than LAPACK, giving
satisfactory performance results. A more generic approach is
the one followed in recent years by \cpp libraries built around
expression templates\cite{hpcs5} or other generative programming\cite{hpcs6}
expression templates \cite{hpcs5} or other generative programming \cite{hpcs6}
principles. In this section, we will focus on such an approach.
To show the interest of this approach, we consider as
example the matrix-vector multiplication kernel (gemv) which
@@ -81,7 +81,7 @@ \section{
}

As we saw earlier, metaprogramming is used in \cpp \cite{hpcs9}, D \cite{hpcs10},
OCaml\cite{hpcs11} or Haskell\cite{hpcs12}. A subset of basic notions emerges:
OCaml \cite{hpcs11} or Haskell \cite{hpcs12}. A subset of basic notions emerges:

\begin{itemize}
\item
@@ -155,8 +155,9 @@ \section{
or template arguments in a comma-separated code
fragment. Its main use was to provide the syntactic
support required to write code with variadic template
arguments. However, Niebler and Parent showed that
this can be used to generate far more complex code
arguments. However, Niebler and Parent
% TODO: ref
showed that this can be used to generate far more complex code
when paired with other language constructs. Both
code replication and a crude form of code unrolling
were possible. However, it required the use of some
@@ -621,7 +622,7 @@ \subsection{
Again, the performance of our implementation is close
to that of OpenBLAS, and is even noticeably better for matrices of
small sizes ranging from 4 to 16 elements. For example, for a
matrix of size 8 elements,the automatically generated code has
matrix of size 8 elements, the automatically generated code has
a performance that is 3 times better than the OpenBLAS \gls{gemv}
kernel (15.78 Gflop/s vs 5.06 Gflop/s). Two phenomena
appear, however. The first one is that the increased number
4 changes: 2 additions & 2 deletions 2-compilation-time-analysis/2-ctbench-design.tex
@@ -6,7 +6,7 @@ \section{
ctbench features
}

ctbench implements a new methodology for the analysis of compilation times:
ctbench implements a new method for the analysis of compilation times:
it allows users to define sizable \cpp benchmarks to analyze the scaling
performance of \cpp metaprogramming techniques, and compare techniques
against each other.
@@ -172,7 +172,7 @@ \subsection{
Note that JSON data is not stored directly.
This is intentional since a profiling file for a single benchmark repetition
can reach volumes of up to several hundred megabytes, therefore data loading
is delayed to prevent RAM overcomsumption.
is delayed to prevent RAM overconsumption.

\item

@@ -8,7 +8,7 @@ \section{

% TODO: dig this for references https://youtu.be/q6X9bKpAmnI

As mentioned in \ref{lbl:cpp-meta-constructs}, the \gls{constexpr}
As mentioned in section \ref{lbl:cpp-meta-constructs}, the \gls{constexpr}
keyword allows variables and functions to be used in \gls{consteval},
making a whole subset of the \cpp language itself usable for compile-time logic.

@@ -73,8 +73,9 @@ \subsection{
// Function template foo takes a polymorphic NTTP
template<auto bar> constexpr int foo() { return 1; }

// generate's return value cannot be stored in a constexpr variable
// or used as a NTTP, but it can be used to produce other literal
// generate's return value cannot be stored
// in a constexpr variable or used as a NTTP,
// but it can be used to produce other literals

// constexpr auto a = generate(); // ERROR
constexpr auto b = generate().size(); // OK
@@ -133,7 +134,7 @@ \subsection{
The addition of \gls{constexpr} memory allocation goes hand in hand
with the ability to use virtual functions in \glspl{consteval}.
This feature allows calls to virtual functions in constant expressions
\cite{virtual-constexpr}. This allows heritage-based polymorphism in
\cite{virtual-constexpr}. This allows inheritance-based polymorphism in
\gls{constexpr} programming when used with \gls{constexpr} allocation of
polymorphic types.

@@ -72,15 +72,15 @@ \section{
This section will cover the implementation of \gls{constexpr} \glspl{ast},
and techniques to work around the limitations that prevent their direct use as
\glspl{nttp} either through functional wrapping techniques, or through
their convertion of into values that satisfy \gls{nttp} requirements.
their conversion into values that satisfy \gls{nttp} requirements.

\subsection{
Code generation from pointer tree data structures
}

\label{lbl:ptr-tree-codegen}

In this subsection, we introduce three techniques that will allow us to use
In this section, we introduce three techniques that will allow us to use
a pointer tree generated from a \gls{constexpr} function as a template parameter
for code generation.

@@ -106,8 +106,8 @@ \subsection{
a \gls{nttp},
\end{itemize}

The compilation performance measurements in \ref{lbl:compile-time-eval} will
rely on the same data passing techniques, but with more complex examples such
The compilation performance measurements in section \ref{lbl:compile-time-eval}
will rely on the same data passing techniques, but with more complex examples such
as embedded compilation of Brainfuck programs, and of \LaTeX math formulae
into high performance math computation kernels.

@@ -173,7 +173,7 @@ \subsubsection{

The downside of using this value passing technique is that the
number of calls of the generator function is proportional to the
number of nodes. Experiments in \ref{lbl:compile-time-eval} highlight
number of nodes. Experiments in section \ref{lbl:compile-time-eval} highlight
the scaling issues induced by this code generation method.
And while it is very quick to implement, there are still difficulties
related to \gls{constexpr} memory constraints and compiler or library support.
@@ -310,7 +310,8 @@ \subsubsection{
type-based paradigms, even when \gls{constexpr} allocated memory is involved.

It is worth mentioning that both this technique and the previous one induce
very high compilation times, as we will see in \ref{lbl:bf-parsing-and-codegen}.
very high compilation times, as we will see in section
\ref{lbl:bf-parsing-and-codegen}.

\subsubsection{
FLAT - AST serialization
@@ -477,11 +478,11 @@ \subsection{
\end{figure}

Parsing algorithms may output serialized data. In this case, the serialization
step described in \ref{lbl:flat-technique} is not needed, and the result
step described in section \ref{lbl:flat-technique} is not needed, and the result
can be converted into a static array.
This makes the code generation process rather straightforward as no complicated
transformation is needed, while still scaling decently as we will see in
\ref{lbl:compile-time-eval} where we will be using a
section \ref{lbl:compile-time-eval} where we will be using a
Shunting Yard parser \cite{shunting-yard} to parse math formulae into
\gls{rpn}, their postfix notation.

14 changes: 7 additions & 7 deletions 3-new-approaches-to-metaprogramming/3-brainfuck.tex
@@ -10,7 +10,7 @@ \section{
pointer trees generated by \gls{constexpr} functions, we will use them in the
context of compile time parsing and code generation for the Brainfuck language.
Therefore, we use the data structures and code generation techniques introduced in
subsection \ref{lbl:ptr-tree-codegen}.
section \ref{lbl:ptr-tree-codegen}.

We chose Brainfuck as a first language for several reasons: the language
generates approximately one \gls{ast} node per character, which makes the size
@@ -104,7 +104,7 @@ \subsection{
figuring out how to transform its result, which contains dynamic memory,
into \cpp code.

As you may remember from subsection \ref{lbl:constexpr-programming},
As you may remember from section \ref{lbl:constexpr-programming},
there is no direct way to use values holding pointers to dynamic memory
directly as \glspl{nttp}.
Therefore it must be conveyed by other means or transformed into \glspl{litval}
@@ -121,7 +121,7 @@ \subsection{

The first backend implemented in the poacher project was the \gls{et} backend,
where the AST is transformed into a type-based \gls{ir}
as described in \ref{lbl:pbg-et-technique}. It was later simplified to
as described in section \ref{lbl:pbg-et-technique}. It was later simplified to
remove the \gls{ir} transformation step, which gave the \gls{pbg} backend.

\begin{lstlisting}[
@@ -143,7 +143,7 @@ \subsection{
\end{lstlisting}

The implementations of these two backends do not differ significantly from
the ones described in \ref{lbl:pbg-technique} and \ref{lbl:pbg-et-technique}:
the ones described in sections \ref{lbl:pbg-technique} and \ref{lbl:pbg-et-technique}:
the generators that evaluate each node are passed as template parameters,
only to work around the fact that pointers to \gls{constexpr} allocated memory
cannot be used in a \gls{nttp}.
@@ -153,7 +153,7 @@ \subsection{
pack of arbitrary types that may be \lstinline{et_token_t} elements for single
tokens, or \lstinline{et_while_t} elements for nested while loops.

From there, the code generation occurs in the same way as it did in
From there, the code generation occurs in the same way as it did in section
\ref{lbl:pbg-et-technique}: the \gls{et} is traversed recursively using
overloaded functions to generate the \cpp code that corresponds to every
while block, and down to every instruction in the \gls{et}.
@@ -167,11 +167,11 @@ \subsection{
The last remaining backend to implement is the one that transforms
the \gls{ast} into a serialized, \gls{nttp} compatible \gls{ir}.
The case of Brainfuck introduces a notable difference compared to the simple
use case seen in \ref{lbl:flat-technique}: while \gls{ast} nodes in Brainfuck
use case seen in section \ref{lbl:flat-technique}: \gls{ast} nodes in Brainfuck
can have an arbitrary number of child nodes, as opposed to the add nodes in
the simple use case I presented earlier. This introduces a few technical
differences with regard to serialization and code generation, which I will
cover in detail in this subsection.
cover in detail in this section.

\begin{lstlisting}[
language=c++,
6 changes: 3 additions & 3 deletions 3-new-approaches-to-metaprogramming/4-math-parsing.tex
@@ -13,7 +13,7 @@ \section{
a parsing algorithm that transforms infix formulas into their \gls{rpn}
representation.

In \ref{lbl:codegen-from-rpn}, we already demonstrated that generating code
In section \ref{lbl:codegen-from-rpn}, we already demonstrated that generating code
from \gls{rpn} formulas is a rather easy task, therefore this section
will only cover the Shunting Yard algorithm, and the use of \gls{rpn}
code generation applied to high performance computing.
@@ -23,7 +23,7 @@ \subsection{
}

As parsing algorithms and \gls{constexpr} dynamic data representations were
already covered in \ref{lbl:bf-parsing-and-codegen}, the implementation of
already covered in section \ref{lbl:bf-parsing-and-codegen}, the implementation of
the Shunting Yard algorithm will not be covered in detail here.
A thoroughly commented \gls{constexpr} implementation is available in appendix
\ref{app:shunting-yard-impl}. It features the algorithm itself, as well as
@@ -38,7 +38,7 @@ \subsection{
The worst case time and memory complexity of the algorithm is $O(N)$.

Once again, code generation from postfix notation formulas was already covered
in \ref{lbl:codegen-from-rpn}, so we will skip straight to the use of Blaze
in section \ref{lbl:codegen-from-rpn}, so we will skip straight to the use of Blaze
to generate high performance code from \gls{constexpr} formulas.

\subsection{
Expand Down
2 changes: 1 addition & 1 deletion bibliography/biblio.bib
@@ -140,7 +140,7 @@ @article{10.1145/243439.243447

@article{10.1023/A:1010095604496,
author = {Futamura, Yoshihiko},
title = {Partial Evaluation of Computation Process—AnApproach to a
title = {Partial Evaluation of Computation Process — An Approach to a
Compiler-Compiler},
year = {1999},
issue_date = {December 1999},
4 changes: 2 additions & 2 deletions bibliography/hpcs2018.bib
@@ -133,7 +133,7 @@ @book{hpcs9
}

@misc{hpcs10,
title = {Templates revisited - d programming language},
title = {Templates revisited - D programming language},
author = {Walter Bright},
url = {https://dlang.org/articles/templates-revisited.html},
}
@@ -407,7 +407,7 @@ @misc{hpcs21
}

@book{hpcs22,
title = {Using the gnu compiler collection: a gnu manual for gcc version 4.3.
title = {Using the GNU compiler collection: a GNU manual for gcc version 4.3.
3},
author = {Stallman, Richard M},
year = {2009},
27 changes: 19 additions & 8 deletions format.cpp
@@ -1,9 +1,20 @@
constexpr std::vector<std::vector<int>> get_vector() {
return {{1, 2, 3}, {4, 5, 6}};
std::tuple<ast_block_t, token_vec_t::const_iterator>
parse_block(token_vec_t::const_iterator parse_begin,
token_vec_t::const_iterator parse_end) {
std::vector<ast_node_ptr_t> block_content;
for (; parse_begin != parse_end; parse_begin++) {
if (*parse_begin == while_end_v) {
return {std::move(block_content), parse_begin};
} else if (*parse_begin == while_begin_v) {
auto [while_block_content, while_block_end] =
parse_block(parse_begin + 1, parse_end);
block_content.push_back(
std::make_unique<ast_while_t>(std::move(while_block_content)));
parse_begin = while_block_end;
} else if (*parse_begin != nop_v) {
block_content.push_back(
ast_node_ptr_t(std::make_unique<ast_token_t>(*parse_begin)));
}
}
return {ast_block_t(std::move(block_content)), parse_end};
}

// Not good:
// constexpr std::vector<int> subvec_0 = get_vector()[0];

// Good:
constexpr auto get_subvec_0 = []() { return get_vector()[0]; };