Fix splelling
sigvef committed Apr 3, 2014
1 parent f014f40 commit a51d87e
Showing 1 changed file with 11 additions and 11 deletions.
22 changes: 11 additions & 11 deletions report.tex
@@ -24,7 +24,7 @@
\maketitle

\begin{abstract}
-This report examines the relative performances of different Delta Correlation prfetching schemes.
+This report examines the relative performances of different Delta Correlation prefetching schemes.
Of the prefetchers tested, DCPT is shown experimentally to be the overall most performant prefetching scheme, but not by a land slide.

\end{abstract}
@@ -44,7 +44,7 @@ \subsection{Delta Correlation Prefetching}
Stride Prefetching is based on identifying repeating memory address deltas.
The most recent delta along with some information describing its stability is stored and used to prefetch more cache lines.
The stability describes how many times the delta has repeated, and is used to determine how many cache lines to prefetch.
-Programs such as those that use large arrays benefit greatly from Stride Prefetching, as they usually access memory in constant reapeating intervalls.
+Programs such as those that use large arrays benefit greatly from Stride Prefetching, as they usually access memory in constant repeating intervals.

By storing a history of deltas instead of only the most recent, Data Correlation Prefetching is able to capture much more complex patterns than pure strides.
Figure \ref{fig:delta_stream} shows a repeating pattern that can be captured by Delta Correlation but not with Stride Prefetching.
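
The delta history idea in this hunk can be illustrated with a minimal C++ sketch. It is not taken from `report.tex` or from the M5 prefetcher interface: the function name `deltaCorrelate`, the two-delta key, and the flat address vector are assumptions made for illustration only. It searches the delta history backwards for the most recent earlier occurrence of the last two deltas and replays the deltas that followed that occurrence.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of report.tex or M5.
// Input: miss addresses for one stream, oldest first.
// Output: prefetch candidates obtained by delta correlation
// on the two most recent deltas.
std::vector<std::int64_t> deltaCorrelate(const std::vector<std::int64_t>& addrs) {
    // Build the delta history from consecutive addresses.
    std::vector<std::int64_t> deltas;
    for (std::size_t i = 1; i < addrs.size(); ++i)
        deltas.push_back(addrs[i] - addrs[i - 1]);

    std::vector<std::int64_t> candidates;
    const std::size_t n = deltas.size();
    if (n < 3) return candidates;  // need the key pair plus one earlier delta

    // The key is the pair of most recent deltas.
    const std::int64_t k1 = deltas[n - 2];
    const std::int64_t k2 = deltas[n - 1];

    // Walk backwards over earlier pairs, starting at index n - 3.
    for (std::size_t i = n - 2; i-- > 0;) {
        if (deltas[i] == k1 && deltas[i + 1] == k2) {
            // Replay every delta that followed the matched pair.
            std::int64_t next = addrs.back();
            for (std::size_t j = i + 2; j < n; ++j) {
                next += deltas[j];
                candidates.push_back(next);
            }
            break;
        }
    }
    return candidates;
}
```

For the address stream 10, 11, 20, 21, 30 the deltas are 1, 9, 1, 9, so the key (1, 9) matches at the start of the history and the sketch proposes 31 and 40. A pure stride stream such as 0, 4, 8, 12 is handled as a special case of the same search and yields 16.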
@@ -102,7 +102,7 @@ \subsection{CZone Delta Correlation}
Figure \ref{fig:CDC} shows an example of a CZone Delta Correlation Prefetcher that implements GHB and Delta Correlation.
Tag C from the index table points to address C09 from CZone C is the head, and is linked to C08, which again is linked to C06, etc.
The first two address deltas (1 and 2) from the linked list is added to the Correlation Key Register (2 and 1 in the example).
-The list is traversed, and the Correlation Comparison Register is continiously uptdated with the two latest deltas.
+The list is traversed, and the Correlation Comparison Register is continuously updated with the two latest deltas.
When the correlation of the deltas of addresses C2-C4-C5 and C6-C8-C9 occurs, a delta buffer is filled with the deltas from the traversed list so far, and a prefetch based on the latest addresses and the deltas is issued.
Addresses C11, C12, C13, C15 and C16 will be calculated and prefetched.

@@ -114,12 +114,12 @@ \subsection{Adaptive CZone Delta Correlation}
The prefetcher increases or decreases the number of blocks to be prefetched by 1, after evaluating the hit rate.
The hit rate is compared with the hit rate of the previous setting, and the prefetcher increases or decreases the number of blocks based on increased or decreased performance.

-\subsection{Program Counter Delta Corelation}
+\subsection{Program Counter Delta Correlation}
\todo{write something}
A load instruction in the PC may be used several times to load data from different addresses on the memory.
Addresses of a load instruction that misses is read from the PC and placed in the Index table, and it points to the latest missed data address connected to that specific instruction address.

-\subsection{Adaptive Program Counter Delta Corelation}
+\subsection{Adaptive Program Counter Delta Correlation}
\todo{write something}
This is a more advanced version of the Program Counter Delta Prefetcher, that calculates how many blocks to load by comparing hit rate performance on the current load number with the previous load number.
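
The adaptive variants in this hunk adjust the prefetch degree by one block after comparing the current hit rate with the previous one. The sketch below is one plausible reading of that description, not the authors' implementation: the struct name `DegreeController`, the rule of reversing direction when the hit rate drops, and the bounds of 1 and 8 blocks are all assumptions.

```cpp
#include <algorithm>

// Hypothetical hill-climbing controller for the prefetch degree.
// Names, the reversal rule, and the bounds are illustrative assumptions.
struct DegreeController {
    int degree = 1;            // blocks to prefetch per trigger
    int direction = 1;         // +1 or -1: direction of the last adjustment
    double prevHitRate = 0.0;  // hit rate measured under the previous setting
    int minDegree = 1;         // assumed lower bound
    int maxDegree = 8;         // assumed upper bound

    // Call once per evaluation interval with the measured hit rate.
    void update(double hitRate) {
        // If the last adjustment made things worse, reverse direction.
        if (hitRate < prevHitRate) direction = -direction;
        // Step the degree by exactly one block, staying within bounds.
        degree = std::max(minDegree, std::min(maxDegree, degree + direction));
        prevHitRate = hitRate;
    }
};
```

Feeding hit rates 0.5, 0.6, 0.4 moves the degree from 1 to 2, then to 3, then back to 2, because the third interval performed worse than the second.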

@@ -128,7 +128,7 @@ \subsection{Delta Correlation Prediction Tables}
\todo{Describe how the final prefetcher works. I suggest adding a figure. Maybe briefly mention other attempts while if we have space?}


-As can be seen, the a row in the table contains fields for the PC, last address, last prefetch, deltas 1 to n, and delta pointer.
+As can be seen, a row in the table contains fields for the PC, last address, last prefetch, deltas 1 to $ n $, and delta pointer.
PC stores the address to the load instruction, and works as index in the table.
The Last Address stores the missed address when there is a miss in the cache.
The delta fields stores the address difference, or the deltas, for each time this instruction is called.
@@ -149,7 +149,7 @@ \section{Methodology}

\subsection{Simulation Framework}

-A modified version of M5, the open-source TCP/IP network simulator\cite{M5paper}, has been used to simulate a hierarchical memory environment for evaulation of the different prefetcher implementations.
+A modified version of M5, the open-source TCP/IP network simulator\cite{M5paper}, has been used to simulate a hierarchical memory environment for evaluation of the different prefetcher implementations.
The modified M5 simulator is supplied as course material.

The memory model simulated by the framework is based on the Alpha 21264 microprocessor, which is a superscalar processor capable of out-of-order execution through speculative execution and instruction reordering.
@@ -166,7 +166,7 @@ \subsection{Simulation Framework}
The memory bus runs at 400MHz, is 64 bits wide, and has a latency of 30ns.~\cite{m5userguide}
\end{quote}

-To simulate a prefetcher with the modified M5 framework, the bahavior must be implemented as an M5 prefetcher module in C++ and plugged into the system.
+To simulate a prefetcher with the modified M5 framework, the behavior must be implemented as an M5 prefetcher module in C++ and plugged into the system.
To emulate realistic conditions for a hardware prefetcher implementation, a hard memory usage limit of 8KB was imposed on the software prefetchers.
That means that a prefetcher may only allocate up to 8KB of memory to hold any eventual data structures.
No further restrictions were imposed.
@@ -215,9 +215,9 @@ \section{Discussion}

\todo{ Not exactly sure, just say it works better? Compare with other prefetcher IF we chose to describe them. }
All of the explored prefetching techniques resulted in improved performance in the tests, compared to no prefetching.
-There are still differences, and none of the prefetching techniques is supperior to the others in every test.
+There are still differences, and none of the prefetching techniques is superior to the others in every test.
Overall the DCPT is scoring best on the test with APCDC on a close second.
-Which of them would be best suited for a real life appliction, largely depend on the charicaristics of the application.
+Which of them would be best suited for a real life appliction, largely depend on the characteristics of the application.

\input{discussion_graph.tex}

@@ -235,7 +235,7 @@ \section{Conclusion}

In 2009 he served as a civilian guardian of the Norwegian People, and he has since then worked as, amongst other things, a Software Consultant, a Technical Innovator in the mobile banking sector, a Professional Translator, and is currently working as one of the technical Co-Founders of feat.fm.

-Mr. Farstad was, together with other teammembers of the demo crew Ninjadev, the winner of the Web Demo Compo at Solskogen 2012.
+Mr. Farstad was, together with other team members of the demo crew Ninjadev, the winner of the Web Demo Compo at Solskogen 2012.
\end{IEEEbiography}

\begin{IEEEbiography}[{\includegraphics[width=1in,height=1.25in,clip,keepaspectratio]{Figures/holmgren.jpg}}]{Rune Holmgren}