Released and unchanged datasets appear as new drafts when upgrading from 4.8.6 to 4.9.4 #5142

Closed
lmaylein opened this issue Oct 5, 2018 · 12 comments

Comments

@lmaylein
Contributor

lmaylein commented Oct 5, 2018

When upgrading from 4.8.6 to 4.9.4, the number of datasets in our instance increases.
There are additional draft versions for already published (and unchanged) datasets.

@lmaylein lmaylein changed the title Upgrade 4.8.6 -> 4.9.4 Release and unchanged datasets appear as new drafts when upgrading from 4.8.6 to 4.9.4 Oct 5, 2018
@lmaylein lmaylein changed the title Release and unchanged datasets appear as new drafts when upgrading from 4.8.6 to 4.9.4 Released and unchanged datasets appear as new drafts when upgrading from 4.8.6 to 4.9.4 Oct 5, 2018
@pdurbin
Member

pdurbin commented Oct 5, 2018

@lmaylein that doesn't sound good. Are you basing this on what you're seeing in the database or Solr?

@lmaylein
Contributor Author

lmaylein commented Oct 5, 2018

First: it was only a test migration :-)
Here are some screenshots:

4.8.6:

[Screenshot: dataset_count_4 8 6]

4.9.4:

[Screenshot: dataset_count_4 9 4]

@lmaylein
Contributor Author

lmaylein commented Oct 5, 2018

Some of the additional draft versions.
The description "dummy" was entered when the datasets were created (several weeks ago).

[Screenshot: draft_datasets]

@pdurbin
Member

pdurbin commented Oct 5, 2018

It was only a testmigration

Phew. I'm already recovering nicely from my heart attack.

Could it be that the draft datasets have been around for a long time but never indexed? Perhaps you ran an "index all" as part of your migration process? A long time ago I wrote some code to "diff" what's in Solr vs the database, but it's buggy and we haven't documented it. You can find references to it in #4205.

@lmaylein
Contributor Author

lmaylein commented Oct 5, 2018

I've now reindexed our 4.8.6 (production) Dataverse. The number of datasets is the same as before. The nine additional draft versions don't appear in 4.8.6.

If I copy the dataset links generated by 4.9.4 to our official production domain
(e.g. https://heidata.uni-heidelberg.de/dataset.xhtml?persistentId=doi:10.11588/data/VGWRB7&version=DRAFT) I get:

Info – The "DRAFT" version was not found. This is version "1.0".

I'll check your diff script.

@lmaylein
Contributor Author

lmaylein commented Oct 5, 2018

Ok. After fixing the problem described in #5141 ...

The URL
https://<new_ip_for_dataverse_4.9.4>/dataset.xhtml?persistentId=doi:10.11588/data/9UXUB0&version=DRAFT

shows the same result when clicking on the "ghost draft":

Info – The "DRAFT" version was not found. This is version "1.0".

@lmaylein
Contributor Author

lmaylein commented Oct 8, 2018

Additional information:

The affected datasets are the last nine that were released (in 4.8.6).

@lmaylein
Contributor Author

lmaylein commented Oct 8, 2018

Is this helpful in some way?

curl -s "http://localhost:8983/solr/collection1/select?rows=1000000&wt=json&indent=true&q=identifier%3A*IDSI88"

4.8.6:

{
  "responseHeader":{
    "status":0,
    "QTime":78,
    "params":{
      "q":"identifier:*IDSI88",
      "indent":"true",
      "rows":"1000000",
      "wt":"json"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"dataset_1795",
        "entityId":1795,
        "dataverseVersionIndexedBy_s":"4.8.6",
        "identifier":"doi:10.11588/data/IDSI88",
        "dsPersistentId":"doi:10.11588/data/IDSI88",
        "persistentUrl":"https://doi.org/10.11588/data/IDSI88",
        "dvObjectType":"datasets",
        "dateSort":"2018-10-05T07:56:03.094Z",
        "dateFriendly":"Oct 5, 2018",
        "publicationStatus":["Published"],
        "publicationDate":"2018",
        "dsPublicationDate":"2018",
...

4.9.4:

{
  "responseHeader":{
    "status":0,
    "QTime":3,
    "params":{
      "q":"identifier:*IDSI88",
      "indent":"true",
      "rows":"1000000",
      "wt":"json"}},
  "response":{"numFound":2,"start":0,"docs":[
      {
        "id":"dataset_1795_draft",
        "entityId":1795,
        "dataverseVersionIndexedBy_s":"4.9.3",
        "identifier":"doi:10.11588/data/IDSI88",
        "dsPersistentId":"doi:10.11588/data/IDSI88",
        "persistentUrl":"https://doi.org/10.11588/data/IDSI88",
        "dvObjectType":"datasets",
        "publicationStatus":["Unpublished",
          "Draft"],
        "dateSort":"2018-07-20T15:18:30.511Z",
        "dateFriendly":"Jul 20, 2018",
        "isHarvested":false,
        "metadataSource":"heiDATA",
        "datasetVersionId":321,
...
        "_version_":1613196186655653888},
      {
        "id":"dataset_1795",
        "entityId":1795,
        "dataverseVersionIndexedBy_s":"4.9.4",
        "identifier":"doi:10.11588/data/IDSI88",
        "dsPersistentId":"doi:10.11588/data/IDSI88",
        "persistentUrl":"https://doi.org/10.11588/data/IDSI88",
        "dvObjectType":"datasets",
        "dateSort":"2018-10-05T07:56:03.094Z",
        "dateFriendly":"Oct 5, 2018",
        "publicationStatus":["Published"],
        "publicationDate":"2018",
        "dsPublicationDate":"2018",
        "isHarvested":false,
        "metadataSource":"heiDATA",
        "datasetVersionId":321,
...
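
The pattern visible in the two responses above (a `dataset_<id>_draft` document that shares its `datasetVersionId` with the published `dataset_<id>` document) can be checked across the whole index. The following is only a minimal sketch, not part of Dataverse: the function names are hypothetical, the query `dvObjectType:datasets` is an assumption based on the field shown in the output, and the Solr URL is the one from the curl command.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict

# URL taken from the curl command in this thread; adjust for your installation.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def fetch_dataset_docs(query="dvObjectType:datasets", rows=1000000):
    """Fetch dataset documents from Solr (the query string is an assumption)."""
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    with urllib.request.urlopen(f"{SOLR_SELECT}?{params}") as resp:
        return json.load(resp)["response"]["docs"]


def find_ghost_drafts(docs):
    """Return entityIds whose draft document points at the same
    datasetVersionId as the published document of the same dataset."""
    by_entity = defaultdict(list)
    for doc in docs:
        by_entity[doc["entityId"]].append(doc)

    ghosts = set()
    for entity_id, group in by_entity.items():
        drafts = [d for d in group if d["id"].endswith("_draft")]
        published = [
            d for d in group if "Published" in d.get("publicationStatus", [])
        ]
        for draft in drafts:
            version_id = draft.get("datasetVersionId")
            if version_id is None:
                continue  # nothing to compare against
            if any(p.get("datasetVersionId") == version_id for p in published):
                ghosts.add(entity_id)
    return sorted(ghosts)
```

Calling `find_ghost_drafts(fetch_dataset_docs())` on the 4.9.4 test instance should then list the affected datasets, if the assumption about the pattern holds.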

@lmaylein
Contributor Author

Can I further support the debugging of this problem?
Do you need any select results from the database?

In the Solr result cited above, both documents (draft and released) have the same datasetVersionId. Maybe a problem with a join while indexing?

What speaks against this, however, is the old metadata (description "Dummy") displayed in the search results for the drafts, which certainly reflects a state before the first release.

@pdurbin
Member

pdurbin commented Oct 12, 2018

@lmaylein hi, all the detail here is much appreciated but can you please email [email protected] to open a support ticket? If you've done this already, you can just mention the ticket number here. Thanks!

@lmaylein
Contributor Author

lmaylein commented Oct 16, 2018

Mea culpa!

If I deploy every version between 4.8.6 and 4.9.4 sequentially, it looks good.
Running only the database upgrade scripts of these versions is not enough.
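
For planning such a sequential upgrade, the ordered list of intermediate versions can be derived from a list of releases. This is just a rough sketch: `upgrade_path` is a hypothetical helper, and the release list in the usage note is illustrative, so it should be checked against the actual release notes.

```python
def parse_version(version):
    """Turn a dotted version string such as '4.9.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def upgrade_path(current, target, releases):
    """Return every release after `current` up to and including `target`,
    in the order in which it would be deployed."""
    low, high = parse_version(current), parse_version(target)
    return sorted(
        (r for r in releases if low < parse_version(r) <= high),
        key=parse_version,
    )
```

For example, `upgrade_path("4.8.6", "4.9.4", ["4.8.5", "4.8.6", "4.9", "4.9.1", "4.9.2", "4.9.3", "4.9.4"])` yields the versions to deploy one after another, each with its own deployment and database upgrade script.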

@pdurbin
Member

pdurbin commented Oct 16, 2018

@lmaylein no problem. While upgrades are on your mind, you might want to leave a comment on #4980 which is related.
