Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Analyzing NPM projects fails the DependencyGraphBuilder with "The following references do not actually refer to packages" #9699

Open
sschuberth opened this issue Jan 7, 2025 · 14 comments
Labels
analyzer About the analyzer tool

Comments

@sschuberth
Copy link
Member

sschuberth commented Jan 7, 2025

Running NPM analysis on https://github.com/doubleopen-project/dos fails with

Exception in thread "main" java.lang.IllegalArgumentException: The following references do not actually refer to packages: [Identifier(type=NPM, namespace=, name=database, version=), Identifier(type=NPM, namespace=, name=s3-helpers, version=), Identifier(type=NPM, namespace=, name=spdx-validation, version=), Identifier(type=NPM, namespace=, name=validation-helpers, version=)].
	at org.ossreviewtoolkit.model.utils.DependencyGraphBuilder.checkReferences(DependencyGraphBuilder.kt:204)
	at org.ossreviewtoolkit.model.utils.DependencyGraphBuilder.build(DependencyGraphBuilder.kt:177)
	at org.ossreviewtoolkit.model.utils.DependencyGraphBuilder.build$default(DependencyGraphBuilder.kt:176)
	at org.ossreviewtoolkit.plugins.packagemanagers.node.npm.Npm.createPackageManagerResult(Npm.kt:146)
	at org.ossreviewtoolkit.analyzer.PackageManager.resolveDependencies(PackageManager.kt:326)
	at org.ossreviewtoolkit.analyzer.PackageManagerRunner$run$3.invokeSuspend(Analyzer.kt:321)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:100)
	at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:113)
	at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:89)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:586)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:820)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:717)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:704)

This used to work before and was probably broken by the NPM packager manager rewrite.

@sschuberth sschuberth added analyzer About the analyzer tool bug labels Jan 7, 2025
@sschuberth sschuberth self-assigned this Jan 7, 2025
@mnonnenmacher mnonnenmacher changed the title Analyzer DOS fails the DependencyGraphBuilder with "The following references do not actually refer to packages" Analyzing DOS fails the DependencyGraphBuilder with "The following references do not actually refer to packages" Jan 7, 2025
@sschuberth
Copy link
Member Author

sschuberth commented Jan 7, 2025

For the record, it seems this regression did not surface before #9616, although it was introduced before that.

@klaxa
Copy link

klaxa commented Jan 21, 2025

Hello,

I can confirm this issue for our project and did a superficial bisect over the tags. For our project at least the analysis produces results up until 39.0.0, but starting with 40.0.0 it aborts with the same error type (different packages) as the opening post. Sadly I am not yet familiar enough with the code base to assist much further at this point.

Best regards

@sschuberth
Copy link
Member Author

the analysis produces results up until 39.0.0, but starting with 40.0.0 it aborts

Thank for this analysis!

Sadly I am not yet familiar enough with the code base to assist much further at this point.

I started looking into the issue, but got distracted and wasn't able to continue so far.

@sschuberth sschuberth changed the title Analyzing DOS fails the DependencyGraphBuilder with "The following references do not actually refer to packages" Analyzing NPM projects fails the DependencyGraphBuilder with "The following references do not actually refer to packages" Jan 22, 2025
@Etsija
Copy link
Contributor

Etsija commented Feb 10, 2025

I managed to run full git bisect for this issue: started from v39.0.0 (GOOD) and from the other end, pretty widely from v45.0.0 (BAD). According to the bisect, this commit seems to break the Analyzer:

03560a58b357023745b1717b29330437e837b476 is the first bad commit

commit 03560a58b357023745b1717b29330437e837b476
Author: Frank Viernau <[email protected]>
Date:   Thu Nov 7 14:00:17 2024 +0100

    refactor(node)!: Make `Npm` separate from `Yarn`
    
    Analog to 0eb1eea, remove the inheritance between the two managers and
    re-write large parts of `Npm` to extract all needed information solely
    based on the output of the `npm` CLI command, instead of relying on the
    file hierarchy under the `node_modules` directory.
    
    This reduces complexity and makes the implementation(s) easy to
    understand, maintain and change in isolation.
    
    Note: The handling of the `installIssues` in `Yarn` has been used only
    for `Npm`, which is why that code is moved from `Yarn` to `Npm`.
    
    Signed-off-by: Frank Viernau <[email protected]>

What I did and what the criterias were for GOOD and BAD:

  1. For testing, I used our DOS monorepo, which was anchored to this revision: doubleopen-project/dos@62a5d31
  2. For each bisect step, I ran
> ./gradlew installDist
> ort --info analyze -i <input_dir> -o <output_dir>
  1. Criterias:
  • GOOD: Analyzer runs successfully and produces analyzer-report.yml which is roughly 500kB and includes dependencies.
  • BAD: either

(a) Closer to v39.0.0, Analyzer writes only a 7kB results file, which is completely missing the dependencies, and logs this issue:

      'NPM::package.json:':
      - timestamp: "2025-02-10T08:25:03.062115412Z"
        source: "NPM"
        message: "NPM failed to resolve dependencies for path 'package.json': IOException:\
          \ Running 'npm list --depth Infinity --json --long' in '/input' failed with\
          \ exit code 1:\nnpm error code ELSPROBLEMS\nnpm error invalid: @typescript-eslint/[email protected]\
          \ /input/node_modules/@typescript-eslint/eslint-plugin\nnpm error A complete\
          \ log of this run can be found in: /home/ort/.npm/_logs/2025-02-10T08_25_00_475Z-debug-0.log\n"
        severity: "ERROR"

(b) Closer to v45.0.0, Analyzer completely fails to resolve the projects of our monorepo and fails with this error log:

12:14:54.147 [DefaultDispatcher-worker-1] INFO  org.ossreviewtoolkit.utils.common.ProcessCapture - Running 'npm info --json [email protected]' in '/home/jyrki/dos/node_modules/scanner-worker'...
12:14:55.255 [DefaultDispatcher-worker-1] WARN  org.ossreviewtoolkit.plugins.packagemanagers.node.npm.Npm - Error getting details for [email protected] in directory /home/jyrki/dos/node_modules/scanner-worker: Running 'npm info --json [email protected]' in '/home/jyrki/dos/node_modules/scanner-worker' failed with exit code 1:
{
  "error": {
    "code": "E404",
    "summary": "Not Found - GET https://registry.npmjs.org/scanner-worker - Not found",
    "detail": "\n '[email protected]' is not in this registry.\n\nNote that you can also install from a\ntarball, folder, http url, or git url."
  }
}

12:14:55.259 [DefaultDispatcher-worker-1] INFO  org.ossreviewtoolkit.utils.common.ProcessCapture - Running 'npm info --json @types/[email protected]' in '/home/jyrki/dos/node_modules/@types/jest'...
12:14:56.031 [DefaultDispatcher-worker-1] INFO  org.ossreviewtoolkit.utils.common.ProcessCapture - Running 'npm info --json [email protected]' in '/home/jyrki/dos/node_modules/spdx-exceptions'...
12:14:56.658 [DefaultDispatcher-worker-1] INFO  org.ossreviewtoolkit.utils.common.ProcessCapture - Running 'npm info --json [email protected]' in '/home/jyrki/dos/node_modules/spdx-license-ids'...
12:14:57.261 [DefaultDispatcher-worker-1] INFO  org.ossreviewtoolkit.utils.common.ProcessCapture - Running 'npm info --json @types/[email protected]' in '/home/jyrki/dos/node_modules/@types/is-glob'...
12:14:59.532 [DefaultDispatcher-worker-1] INFO  org.ossreviewtoolkit.utils.common.DirectoryStash - Moving back directory from '/home/jyrki/dos/.stash11363407600901117333/node_modules' to '/home/jyrki/dos/node_modules'.
12:14:59.532 [DefaultDispatcher-worker-1] INFO  org.ossreviewtoolkit.analyzer.PackageManager - NPM resolved dependencies for path 'package.json' in 10m 38.513115124s.
Exception in thread "main" java.lang.IllegalArgumentException: The following references do not actually refer to packages: [Identifier(type=NPM, namespace=, name=database, version=), Identifier(type=NPM, namespace=, name=s3-helpers, version=), Identifier(type=NPM, namespace=, name=spdx-validation, version=), Identifier(type=NPM, namespace=, name=validation-helpers, version=)].
	at org.ossreviewtoolkit.model.utils.DependencyGraphBuilder.checkReferences(DependencyGraphBuilder.kt:204)
	at org.ossreviewtoolkit.model.utils.DependencyGraphBuilder.build(DependencyGraphBuilder.kt:177)
	at org.ossreviewtoolkit.model.utils.DependencyGraphBuilder.build$default(DependencyGraphBuilder.kt:176)
	at org.ossreviewtoolkit.plugins.packagemanagers.node.npm.Npm.createPackageManagerResult(Npm.kt:146)
	at org.ossreviewtoolkit.analyzer.PackageManager.resolveDependencies(PackageManager.kt:326)
	at org.ossreviewtoolkit.analyzer.PackageManagerRunner$run$3.invokeSuspend(Analyzer.kt:321)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:100)
	at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:113)
	at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:89)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:586)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:820)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:717)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:704)

Now this two-fold failure manifestation might have set the bisecting off, but at least it's a start I believe. For info, @fviernau.

@lamppu
Copy link
Contributor

lamppu commented Feb 10, 2025

Yeah, it seems there's a couple issues here, the analysis crashes completely from 81f58ea on for the DOS monorepo, and as @Etsija found, the NPM packages aren't found after 03560a5.

Now that I found this thread #9551 though, I'm testing to see if recreating the package-lock.json for DOS will resolve these issues.

@lamppu
Copy link
Contributor

lamppu commented Feb 10, 2025

Interestingly, after recreating the package-lock.json for DOS, the analyzer now crashes already with 03560a5 but 81f58ea doesn't fix it, it still crashes.

@sschuberth sschuberth removed the bug label Feb 10, 2025
@Etsija
Copy link
Contributor

Etsija commented Feb 11, 2025

So I did some more digging myself, reading the relevant issues and PRs, and here's how to reproduce:

  1. Repro repo is our DOS monorepo, which can be cloned at https://github.com/doubleopen-project/dos
  2. Run npm list --depth Infinity --json --long, then get the dependencies, followed by the error field:
        }
      }
    }
  },
  "error": {
    "code": "ELSPROBLEMS",
    "summary": "invalid: @typescript-eslint/[email protected] /home/jyrki/dos/node_modules/@typescript-eslint/eslint-plugin",
    "detail": ""
  }
}

This error field I believe is what #81f58ea catches.

  1. But, the command also reports problems: [invalid:] at two places concerning the same problematic dependency it found. Moreover, it also has some invalid: fields in the JSON. I will attach the complete results of npm list command for reference, and here, just attach two screenshots with the corresponding lines in the results json:

Image

Image

deps.zip

Without knowing the details of parsing the npm list in ORT implementation, my question is, are we properly handling these problems: and invalid: fields from the results of npm list command, or could it be that they are not catched, but their presence causes a complete failure of parsing, leading to the Analyzer failure later on in the pipeline?

Judging from the data class ModuleInfo 's structure, it does indeed not catch the problems: or invalid: fields - but I don't enough of the implementation to judge, whether the code assumes all information from the npm list command is present in that data class, or whether it should just silently bypass fields that do not belong to the data class, and parse properly, nevertheless.

@Etsija
Copy link
Contributor

Etsija commented Feb 12, 2025

@fviernau with the recent findings, any ideas to why the Analyzer fails? Any chance the parsing of npm list result JSON is failing, or failing to catch information it needs for the problematic packages? I have at least a 100% reproducible problem, so could debug myself as well.

@fviernau
Copy link
Member

fviernau commented Feb 12, 2025

@fviernau with the recent findings, any ideas to why the Analyzer fails?

I have not experimented with this yet, but google it for a while. Some search results indicate
that this kind of issue really indicates something which is actually fixable outside of ORT.
For example, [1].
Have you researched a bit whether this error can be fixed with pnpm / definition files / lock files?
For an overview of the issues, [2] could be interresting.

So after reading [2], looking again at your output it seems that (I would have liked to copy an paste stuff, but you accidentally attached an image), eslint-plugin 8.x is installed while the package who's idis not visible in the image requires either versions 5.x, 6.x or 7.x. Mind investigating why is that? (IIUC correctly, pnpm supports multiple versions of the same package, but maybe it put it into the wrong dir?)

[1] https://stackoverflow.com/questions/71152952/what-does-npm-err-code-elsproblems-mean
[2] https://sebhastian.com/npm-err-code-elsproblems/

@fviernau
Copy link
Member

fviernau commented Feb 12, 2025

In your lockfile the install locations seem to be:

     "node_modules/@typescript-eslint/eslint-plugin": {
            "version": "8.24.0",
 ...           
        
                   "node_modules/@vercel/style-guide/node_modules/@typescript-eslint/eslint-plugin": {
            "version": "7.18.0",

For any X dependency of a dependency Y, if X is not installed in any subdir of Y, then the installed package is search for in ancestor dir.

So for package eslint-plugin-jest there is nothing installed in subdir it takes the installed thing from ancestor dir which is wrong.

"node_modules/eslint-plugin-jest": {
            "version": "27.9.0",
            "resolved": "https://registry.npmjs.org/eslint-plugin-jest/-/eslint-plugin-jest-27.9.0.tgz",
            "integrity": "sha512-QIT7FH7fNmd9n4se7FFKHbsLKGQiw885Ds6Y/sxKgCZ6natwCsXdgPOADnYVxN2QrRweF0FZWbJ6S7Rsn7llug==",
            "dev": true,
            "dependencies": {
                "@typescript-eslint/utils": "^5.10.0"
            },
            "engines": {
                "node": "^14.15.0 || ^16.10.0 || >=18.0.0"
            },
            "peerDependencies": {
                "@typescript-eslint/eslint-plugin": "^5.0.0 || ^6.0.0 || ^7.0.0",
                "eslint": "^7.0.0 || ^8.0.0",
                "jest": "*"
            },
            "peerDependenciesMeta": {
                "@typescript-eslint/eslint-plugin": {
                    "optional": true
                },

I now wonder whether this is caused by plugin being an optional peer dependency? (Would we have to instruct pnpm to install such dependencies, or alternatively, would it be reasonable to ignore any errors for optional peer dependencies (and skip the entries entirely) if we insist on not installing them ?

eslint is a dev dependency and a peer dependency, see:

Sp, probably it was installed with --save-peer .

@fviernau
Copy link
Member

fviernau commented Feb 12, 2025

(Somehow I was accidentally mislead into looking into pnpm).
So, back to npm. Reading the docs for npm ci which is used in your case to install the deps,
it says that the flags used during lockfile creation must align with the ones provided to npm ci.
Which flags have you used for creating the lockfile?

       NOTE: If you create your package-lock.json file by running npm install with flags that can affect the shape of your dependency tree, such  as  --legacy-peer-deps  or  --in‐
       stall-links,  you  must  provide  the same flags to npm ci or you are likely to encounter errors. An easy way to do this is to run, for example, npm config set legacy-peer-
       deps=true --location=project and commit the .npmrc file to your repo.

See also

val options = listOfNotNull(
"--ignore-scripts",
"--no-audit",
"--legacy-peer-deps".takeIf { legacyPeerDeps }
)
.

edit: Hm, well legacy peer deps is not used by default

@fviernau
Copy link
Member

@sschuberth I just found that the project uses workspaces [1].
For npm IIRC there has never been support for workspaces implemented, nor there were tests for it.
What I believe is missing here is to add support for workspaces.
I have a bullet point for this also in my refactoring ticket.

The module info for the failing package is:

---
name: "database"
version: null
path: "/home/frank/sources/dos/dos/node_modules/database"
id: null
dependency_constraints:
  '@prisma/client': "6.3.1"
  adm-zip: "0.5.16"
  s3-helpers: "*"
dependencies: {}
optional: false
dev: false
resolved: null

In this case the NpmDependencyHandler.createPackage() does return null, because resolved==null indicates that
this is a project, not a resolved dependency. Returning null is corret in this case, because this is not a Package.

IMO Npm needs to be adjusted, to return a list of projects in resolveDependencies(). Currently, it only returns a single project for the workspace root.

[1] https://github.com/doubleopen-project/dos/blob/629af7d1858abe39fd9259a33130b49328c5e8b1/package.json#L6-L9

@sschuberth
Copy link
Member Author

For npm IIRC there has never been support for workspaces implemented, nor there were tests for it.

That's likely true, though analysis at least seemed to work before, not saying that the results were correct.

IMO Npm needs to be adjusted, to return a list of projects in resolveDependencies(). Currently, it only returns a single project for the workspace root.

Sounds reasonable. When I briefly started to debug the issue before my vacation, I toyed with changing the condition at

override fun dependenciesFor(dependency: ModuleInfo): List<ModuleInfo> =
dependency.dependencies.values.filter { it.isInstalled }

to

dependency.dependencies.values.filter { it.isProject || it.isInstalled }

which got me a bit further, but then I had to stop looking into the issue.

@sschuberth sschuberth removed their assignment Feb 12, 2025
@fviernau
Copy link
Member

fviernau commented Feb 12, 2025

which got me a bit further, but then I had to stop looking into the issue.

I hope that the code which needs to be added for supporting workspaces will be similar (relatively simple) to the one in Pnpm.kt.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
analyzer About the analyzer tool
Projects
None yet
Development

No branches or pull requests

5 participants