This is a heavily experimental tool designed as an alternative to git diff
, a program used to compare two text files a see the differences between them.
The tool generates an output that details the differences between the files, which can be viewed either through the included frontend or as text by running it in the CLI.
In contrast with git diff
and other existing tools, this project offers:
A novel algorithm: Described brifely bellow, it uses AST nodes as the main data structure
- Reporting
moves
: Code that was moved from one place to another get's highlighted in a different color, no more deletion on one side and addition on the other - Ignoring trivia changes: No longer running a code prettifier on a messy file will produce an unreadable diff
- Programmable: You can get back the diff result as JSON to implement your own way to display changes
- Language agnostic: This project should work with any programming language, more on this bellow
For more cases check out the tests under /tests/conformance
I will eventually get the time to publish an NPM package, for now the simplest way to run the project locally is to clone it, run pnpm install
, then you can choose how to run the code
# Running ts file directly with tsx
tsx ./examples/basic.ts
# OR:
# Start the watch build in one terminal
npm run build
# Run with node
node dist/examples/basic.js
# Terminal 1
npm run start
# Terminal 2
npm run ui
Default location is http://localhost:4000/
zigzag
passes) and bugs (crashes due to debug failures or mismatched parenthesis) but the main idea is there.
The main difference between this tool and git diff
is that it doesn't use Myers' algorithm. Instead, it employs a novel approach summarized as follows:
- Get the AST from the two source files
- Create an array of
Node
, performing a depth-first walk over the AST - Create two
Iterator
instances, used to iterate and search nodes in the array - Start the main loop, attempting to match sequences of nodes (
findBestMatch
fn) - Iterating over the matches and attempting to align them, if not possible a
move
is created (processMoves
fn)
After the diff is done we are left with an array of Change
, with that we can:
- Apply the changes to the source files to log them in the terminal (
OutputType
"text"
and"textPretty"
) - Return a more complex data structure to be consumed and displayed by the server (
OutputType
"serializedOutput"
)
I will create a more detailed explanation of the algorithm later on, for now, the source code is your best resource, I tried to include as many detailed comments as I could.
npm run profile
npm run view pprof-time-XX.pb.gz
time git diff ./a.ts ./b.ts > /dev/null
time git diff --patience ./a.ts ./b.ts > /dev/null
# diffstastic
time difft ./a.ts ./b.ts > /dev/null
# betterDiff
time node dist/FILE_NAME.js > /dev/null
First of all, the language agnostic
part is theoretical, I still haven't got the time to implement a second language, altohuh using tools like tree-sitter (link) should help us tremendously with this task.
The core logic doesn't ever mention any specific configuration or code path for the given programming language, so in theory you should plug in a parser and it should work.
Similar to diffstastic this project requires a compiler per programming language (unless one compiler can process more than one programing language, like swc
or tsc
), this means that for every new programing language a new parser needs to be configured.
Proper aligment
is the biggest missing part, this makes reviewing diffs a bit confusing. This is the main feature I'm working on right now.
Ignoring trivia is not something you can do unforgivably, in Javascript, for example, there is something known as Automatic semicolon insertion (ASI), which basically means that if you don't use a semicolon the engine will insert one on your behalf, this makes the following two code snippets behave differently
function getNum() {
return 1
}
// Returns 1
function getNum() {
return
1
}
// Returns undefined
This is because a semicolon is automatically inserted after the return keyword
function getNum() {
return; // !
1
}
In Python spacing is even more important, in determining the scope of the code.
Right now the project straight up ignores trivial, but there is an issue up to track this feature here
This is one of the reasons why this project is far from production ready :)
You should also check out the diffstastic project too!