ResearchSpace setup documentation #3

Open
natuk opened this issue Jun 21, 2019 · 0 comments

natuk commented Jun 21, 2019

I am copying here the OXLOD notes on setting up ResearchSpace on CentOS, as discussed on the call of 19 June. They may help to assess the effort needed to get a demo instance running for the Linked.Art showcase.

ResearchSpace

Installing ResearchSpace on CentOS7

Install prerequisites

Install Docker:

    yum install docker

Start Docker:

    systemctl start docker

Test Docker:

    docker run docker.io/hello-world

Install Git:

    yum install git

Install Java:

    yum install java-1.8.0-openjdk
    yum install java-1.8.0-openjdk-devel
    export JAVA_HOME=/usr/lib/jvm/java-xxxxxxxxxxxxxxx
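
The JAVA_HOME value above is elided. As an alternative to typing the version-specific directory, it can be derived from the javac binary that java-1.8.0-openjdk-devel installs. A minimal bash sketch, assuming javac resolves (through readlink -f) to a path of the form <JAVA_HOME>/bin/javac; the helper name java_home_from_javac is hypothetical:

```shell
# Hypothetical helper: derive JAVA_HOME from the fully resolved path of javac.
# Assumes the layout <JAVA_HOME>/bin/javac used by the OpenJDK -devel RPMs.
java_home_from_javac() {
  local javac_path="$1"
  # Strip "/javac", then "/bin", leaving the JDK root directory.
  dirname "$(dirname "$javac_path")"
}

# Usage on the server, once java-1.8.0-openjdk-devel is installed:
#   export JAVA_HOME="$(java_home_from_javac "$(readlink -f "$(command -v javac)")")"
#   echo "$JAVA_HOME"
```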

Install sbt (the Scala build tool):

    curl https://bintray.com/sbt/rpm/rpm > bintray-sbt-rpm.repo
    mv bintray-sbt-rpm.repo /etc/yum.repos.d/
    yum install sbt

Install Node.js from the NodeSource repository:

    yum install epel-release
    curl --silent --location https://rpm.nodesource.com/setup_6.x | sudo bash -
    yum install nodejs
    yum install gcc-c++ make

Install Yarn from the Yarn RPM repository:

    curl --silent --location https://dl.yarnpkg.com/rpm/yarn.repo | sudo tee /etc/yum.repos.d/yarn.repo
    yum install yarn

Clone ResearchSpace

Clone and compile ResearchSpace:

    git clone https://github.com/researchspace/researchspace.git
    cd researchspace
    ./build.sh compile

Set up the Blazegraph instance that comes with ResearchSpace

Install unzip:

    yum install unzip

Unzip example files:

    unzip example-data/blazegraph.jnl.zip -d example-data/
    unzip example-data/assets.zip -d metaphactory/data/

Start Blazegraph in Docker:

    docker run --name rs-blazegraph-default -d --restart=always -p 10080:8080 --env QUERY_TIMEOUT="30000" -v $(pwd)/example-data/:/blazegraph-data/:rw researchspace/blazegraph

Configure ResearchSpace

Configure ResearchSpace with some parameters. To use the Blazegraph Docker container, set the SPARQL endpoint:

    echo "sparqlEndpoint=http://localhost:10080/blazegraph/sparql" >> runtime/config-dev/environment.prop

Note: I am actually testing this with the existing Blazegraph installation on Antheia, since the port mapping may conflict.

Specify the apps directory:

    echo "appsDirectory=$(pwd)/researchspace/apps" >> runtime/config-dev/environment.prop
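
Appending with >> adds another line every time the command is re-run, so later changes (such as switching to the Antheia endpoint) mean cleaning up duplicates by hand. A minimal bash sketch of an idempotent alternative, assuming environment.prop contains plain key=value lines; set_prop is a hypothetical helper, not part of ResearchSpace:

```shell
# Hypothetical helper: set key=value in a .prop file, replacing any existing
# line for that key instead of appending a duplicate.
set_prop() {
  local file="$1" key="$2" value="$3" tmp
  touch "$file"
  tmp="$(mktemp)"
  # Keep every line that does not start with "key=" ...
  awk -v k="$key" 'index($0, k "=") != 1' "$file" > "$tmp"
  # ... then append the new setting at the end.
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Overwrite in place (rather than mv) so the file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage, from the researchspace checkout as in the echo commands above:
#   set_prop runtime/config-dev/environment.prop sparqlEndpoint "http://localhost:10080/blazegraph/sparql"
#   set_prop runtime/config-dev/environment.prop appsDirectory "$(pwd)/researchspace/apps"
```

Running either command twice leaves a single line for the key.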

Build ResearchSpace

Next, start the build. It requires a lot of RAM: it failed on a laptop running a VM but works on Antheia. Start the sbt environment in interactive mode:

    ./build.sh

At the sbt prompt:

    ~jetty:start

The build fails, reporting that the directory metaphactory/data/templates does not exist. Create it from the repository root:

    mkdir -p metaphactory/data/templates

The build now succeeds and ResearchSpace is accessible on port 10214. Routing it through a reverse proxy exposes a problem: many of the required assets are served from localhost:3000 addresses, which seem to belong to the Node.js server. These URLs are insecure (plain HTTP) and point to the wrong host, so Firefox blocks them. Port 3000 therefore also needs a reverse proxy, and these URLs need to be replaced.

Edited these files:

    nano project/webpack/webpack.dll.dev.js 
    nano project/PlatformBuildPlugin.scala 
    nano metaphactory/webapp/assets/bundles-manifest.json 
    nano metaphactory/webapp/assets/dll-manifest.json 
    nano project/webpack/server.js 
    nano project/webpack/webpack.dev.config.js
    nano project/webpack/assets/no_auth/basic-styles.css

and replaced http://localhost:3000 with https://antheia-node.oerc.ox.ac.uk in each of them. Then set up a reverse proxy from antheia-node.oerc.ox.ac.uk to 127.0.0.1:3000, where the Node.js server listens.

It works.
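
The find-and-replace over the files listed above can also be scripted. A minimal bash sketch using GNU sed (as shipped with CentOS 7), assuming neither origin contains |, & or backslash characters (the old origin is treated as a regular expression); replace_origin is a hypothetical helper, and sed keeps a .bak copy of each edited file:

```shell
# Hypothetical helper: replace one origin with another in every given file.
# Usage: replace_origin OLD NEW FILE...
replace_origin() {
  local old="$1" new="$2" f
  shift 2
  for f in "$@"; do
    # "|" as the sed delimiter avoids escaping the slashes in the URLs;
    # GNU sed's -i.bak edits in place and keeps a backup copy of each file.
    sed -i.bak "s|${old}|${new}|g" "$f"
  done
}

# Usage, from the researchspace checkout:
#   replace_origin http://localhost:3000 https://antheia-node.oerc.ox.ac.uk \
#     project/webpack/webpack.dll.dev.js project/PlatformBuildPlugin.scala \
#     metaphactory/webapp/assets/bundles-manifest.json \
#     metaphactory/webapp/assets/dll-manifest.json project/webpack/server.js \
#     project/webpack/webpack.dev.config.js project/webpack/assets/no_auth/basic-styles.css
```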

ResearchSpace does not connect to the docker instance of Blazegraph.

Connect to Antheia's Blazegraph instance

Try to connect to Antheia's Blazegraph instance instead:

Edit runtime/config-dev/environment.prop and, instead of the Docker Blazegraph endpoint (http://localhost:10080/blazegraph/sparql), specify the Antheia endpoint (http://localhost:9999/blazegraph/namespace/exemplar/sparql).

It works.

Set up ResearchSpace as a service

One option is to edit ./build.sh and append ~jetty:start to the end of the sbt command line in it. This lets ResearchSpace start from the build script without issuing any command in the interactive sbt environment, so it could then become the command for starting the service. However, more research into sbt is needed, because building on every start means that the templates do not persist between builds.

After contacting Mike Kelly, this is what I did:

    ./build.sh -DnoYarn=true -DbuildEnv=prod -DzipConfig=./researchspace/dist/zip/researchspace.ini -DplatformVersion=1.0.0 clean platformZip

This will create a zip file of the snapshot alongside an expanded directory. These are currently in:

    /root/researchspace/target/researchspace-2.1-SNAPSHOT/

Running start.sh in that directory loads up the platform as a Jetty service. I then edited the config/environment.prop file:

    sparqlEndpoint=http://localhost:9999/blazegraph/namespace/exemplar/sparql
    appsDirectory=/root/researchspace/researchspace/apps

which added the ResearchSpace customisations and the correct endpoint.

The templates in /root/researchspace/target/researchspace-2.1-SNAPSHOT/data/templates are not persistent (see "Working with templates" below on how to make them persistent).

Further advice from Mike Kelly: the build command above builds only the clean metaphacts platform. Customisations such as the ResearchSpace templates are added in the /root/researchspace/target/researchspace-2.1-SNAPSHOT/apps directory, and specifying the appsDirectory parameter overrides this default location. It is better to copy the files into the default location than to set appsDirectory in the environment.prop file. So:

    cp -r researchspace/apps/* target/researchspace-2.1-SNAPSHOT/apps/

and change the environment.prop file to choose a different default endpoint:

    sparqlEndpoint=http://localhost:9999/blazegraph/namespace/exemplar/sparql

Running start.sh starts Metaphacts with all the customisations of ResearchSpace.

Then, make a new user:

    useradd researchspace -m

Copy the newly compiled version into the researchspace user's home directory:

    cp -r /root/researchspace/target/researchspace-2.1-SNAPSHOT/* /home/researchspace/

Create a service config file to start ResearchSpace automatically:

    nano /etc/systemd/system/researchspace.service

and add:

    [Unit]
    Description=ResearchSpace Service
    After=network.target

    [Service]
    Type=simple
    User=researchspace
    WorkingDirectory=/home/researchspace
    ExecStart=/bin/sh /home/researchspace/start.sh
    Restart=on-abort

    [Install]
    WantedBy=multi-user.target

Useful commands:

    systemctl start researchspace
    systemctl stop researchspace

It works!
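
Two steps the notes do not show: after creating or editing a unit file, systemctl daemon-reload makes systemd re-read it, and the service only starts at boot once it is enabled with systemctl enable. Also, files copied by root with cp -r stay owned by root, so the researchspace user may not be able to write to its data directory. A minimal bash sketch that writes the unit above to a given path, parameterised by user and home directory; write_unit is a hypothetical helper, and the root-only commands are left as comments:

```shell
# Hypothetical helper: write the ResearchSpace unit shown above to a given path.
# Usage: write_unit DEST [USER] [HOME]
write_unit() {
  local dest="$1" user="${2:-researchspace}" home="${3:-/home/researchspace}"
  cat > "$dest" <<EOF
[Unit]
Description=ResearchSpace Service
After=network.target

[Service]
Type=simple
User=${user}
WorkingDirectory=${home}
ExecStart=/bin/sh ${home}/start.sh
Restart=on-abort

[Install]
WantedBy=multi-user.target
EOF
}

# Usage as root (assumes the layout used in these notes):
#   chown -R researchspace:researchspace /home/researchspace
#   write_unit /etc/systemd/system/researchspace.service
#   systemctl daemon-reload
#   systemctl enable researchspace
#   systemctl start researchspace
```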

Working with templates

Templates are stored in the metaphacts directory data/templates; these are the ones that admin users modify in the UI. By default they are overwritten by the ResearchSpace templates in the apps/researchspace/data/templates directory every time ResearchSpace starts. As a result, any modifications made at runtime are lost on restart, because the ResearchSpace templates are enforced again.

To make these changes persist across restarts, edit this file:

    nano apps/researchspace/plugin.properties

Change the lines:

    plugin.namespaceMergeStrategy=overwrite
    plugin.templateMergeStrategy=overwrite

to:

    plugin.namespaceMergeStrategy=copy
    plugin.templateMergeStrategy=copy

The file then looks like:

    plugin.id=researchspace
    plugin.provider=Metaphacts
    plugin.version=1.2.0
    plugin.namespaceMergeStrategy=copy
    plugin.templateMergeStrategy=copy
    plugin.configMergeStrategy=overwrite

Customisations in the data/templates folder now persist.
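
The same edit can be applied non-interactively, for example after copying a fresh build into place. A minimal sketch with GNU sed, assuming each setting sits on its own line exactly as shown above; make_templates_persistent is a hypothetical helper, and plugin.configMergeStrategy is deliberately left as overwrite:

```shell
# Hypothetical helper: switch the namespace and template merge strategies in a
# plugin.properties file from "overwrite" to "copy", leaving other keys untouched.
make_templates_persistent() {
  local props="$1"
  sed -i \
    -e 's/^plugin\.namespaceMergeStrategy=overwrite$/plugin.namespaceMergeStrategy=copy/' \
    -e 's/^plugin\.templateMergeStrategy=overwrite$/plugin.templateMergeStrategy=copy/' \
    "$props"
}

# Usage, from the platform directory (e.g. /home/researchspace):
#   make_templates_persistent apps/researchspace/plugin.properties
```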

Caching templates

ResearchSpace caches templates, and the cache often does not update automatically (although it should). To reset the cache, visit here.

Work with multiple endpoints

Default endpoint

The file config/environment.prop defines the default endpoint with the setting:

    sparqlEndpoint=http://localhost:9999/blazegraph/namespace/<namespace>/sparql

The same can be specified with a new default.ttl file in the config/repositories folder, as explained next.

Attention: During the OXLOD tests, this setting was changed from

    sparqlEndpoint=http://localhost:9999/blazegraph/namespace/exemplar/sparql

to

    sparqlEndpoint=http://localhost:9999/blazegraph/namespace/kb/sparql

With the options explained in "Define context when querying" (below) the queries still worked, but much of the default functionality of ResearchSpace (such as showing images, RDFS labels and maps) broke. Switching the setting back fixed it. TODO: Investigate further how this setting affects the default functionality.

Additional endpoints

Additional endpoints can be defined as .ttl files in the directory config/repositories, each containing a blank node ([]) that represents the repository. The other nodes linked from it only describe the endpoint; I think their labels consist of node1 plus a random string of 10 lowercase alphanumeric characters, which can be produced with a random string generator.
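
Labels following the observed pattern (for example node1cbcrvcpsx2) can also be generated locally. Note that in Turtle a blank-node label only needs to be unique within its file, so the node1 prefix is most likely a convention of whatever tool wrote these files rather than a requirement. A minimal bash sketch; random_node_label is a hypothetical helper, and $RANDOM is not cryptographically strong, which does not matter for labels:

```shell
# Hypothetical helper: print "node1" followed by 10 random lowercase
# alphanumeric characters, mimicking the labels seen in the .ttl files.
random_node_label() {
  local chars='abcdefghijklmnopqrstuvwxyz0123456789' label='node1' i
  for (( i = 0; i < 10; i++ )); do
    # Pick one character at a random offset in the alphabet.
    label+="${chars:$(( RANDOM % ${#chars} )):1}"
  done
  printf '%s\n' "$label"
}

# Usage: printf '_:%s\n' "$(random_node_label)"
```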

RDF4J

By default there is a file assets.ttl, which defines an RDF4J repository as follows:

    [] a <http://www.openrdf.org/config/repository#Repository> ;
        <http://www.openrdf.org/config/repository#repositoryID> "assets" ;
        <http://www.w3.org/2000/01/rdf-schema#label> "Asset repository for platform or user specific artefacts." ;
        <http://www.openrdf.org/config/repository#repositoryImpl> _:node1cbcrvcpsx2 .

    _:node1cbcrvcpsx2 <http://www.openrdf.org/config/repository#repositoryType> "openrdf:SailRepository" ;
        <http://www.openrdf.org/config/repository/sail#sailImpl> _:node1cbcrvcpsx3 .

    _:node1cbcrvcpsx3 <http://www.openrdf.org/config/sail#sailType> "openrdf:NativeStore" .

Blazegraph

To define yet another endpoint, we create a new .ttl file in the same directory. In this case we define an endpoint called "exemplar" in the file exemplar.ttl. This endpoint holds the data used for the OXLOD project board of 2 May.

    [] a <http://www.openrdf.org/config/repository#Repository> ;
        <http://www.openrdf.org/config/repository#repositoryID> "exemplar" ;
        <http://www.w3.org/2000/01/rdf-schema#label> "OXLOD exemplar with datasets on China" ;
        <http://www.openrdf.org/config/repository#repositoryImpl> _:node1x4k9okaroi .
    _:node1x4k9okaroi <http://www.openrdf.org/config/repository#repositoryType> "metaphactory:SPARQLRepository" ;
        <http://www.openrdf.org/config/repository/sparql#query-endpoint> <http://localhost:9999/blazegraph/namespace/exemplar/sparql> ;
        <http://www.openrdf.org/config/repository/sparql#update-endpoint> <http://localhost:9999/blazegraph/namespace/exemplar/sparql> ;
        <http://www.metaphacts.com/ontologies/platform/repository#quadMode> "true" .
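
Further Blazegraph namespaces can be registered the same way. A minimal bash sketch that writes a file shaped like exemplar.ttl for a given repository ID, label and endpoint; write_sparql_repo_ttl is a hypothetical helper, the blank-node label _:impl is arbitrary, and the label argument must not contain double quotes because it is inserted into a Turtle string unescaped:

```shell
# Hypothetical helper: write <dir>/<id>.ttl describing a SPARQL repository,
# mirroring the exemplar.ttl structure above.
# Usage: write_sparql_repo_ttl DIR ID LABEL ENDPOINT_URL
write_sparql_repo_ttl() {
  local dir="$1" id="$2" label="$3" endpoint="$4"
  cat > "${dir}/${id}.ttl" <<EOF
[] a <http://www.openrdf.org/config/repository#Repository> ;
    <http://www.openrdf.org/config/repository#repositoryID> "${id}" ;
    <http://www.w3.org/2000/01/rdf-schema#label> "${label}" ;
    <http://www.openrdf.org/config/repository#repositoryImpl> _:impl .
_:impl <http://www.openrdf.org/config/repository#repositoryType> "metaphactory:SPARQLRepository" ;
    <http://www.openrdf.org/config/repository/sparql#query-endpoint> <${endpoint}> ;
    <http://www.openrdf.org/config/repository/sparql#update-endpoint> <${endpoint}> ;
    <http://www.metaphacts.com/ontologies/platform/repository#quadMode> "true" .
EOF
}

# Usage, e.g. for the kb namespace mentioned earlier:
#   write_sparql_repo_ttl config/repositories kb "Blazegraph kb namespace" \
#     http://localhost:9999/blazegraph/namespace/kb/sparql
```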

Check endpoints

To check that the endpoints work, go to https://antheia-researchspace.oerc.ox.ac.uk/sparql and check that each repository appears in the "Repository" drop-down menu. It does!
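
The drop-down only shows that a repository is configured. To check that the underlying Blazegraph namespace actually answers queries, one can send it a trivial ASK query directly. A minimal bash sketch, assuming curl is installed and the endpoint accepts URL-encoded POST queries as the SPARQL 1.1 Protocol allows; sparql_ask is a hypothetical helper:

```shell
# Hypothetical helper: send "ASK { ?s ?p ?o }" to a SPARQL endpoint.
# Succeeds only if the endpoint responds with an HTTP 2xx status (an empty
# store still succeeds, answering false); the JSON answer goes to stdout.
sparql_ask() {
  local endpoint="$1"
  curl --silent --show-error --fail --max-time 10 \
    --header 'Accept: application/sparql-results+json' \
    --data-urlencode 'query=ASK { ?s ?p ?o }' \
    "$endpoint"
}

# Usage on the server:
#   sparql_ask http://localhost:9999/blazegraph/namespace/exemplar/sparql && echo "endpoint OK"
```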

Define context when querying

When working with templates in the UI, we use the <semantic-context> component to specify the endpoint against which a query is performed. For example, the code for the OXLOD search page is wrapped like this:

    <semantic-context repository='exemplar'> <!-- this line specifies the endpoint -->
    <div class="page">
      <div class='page__body'>
        <h1> OXLOD Search </h1>
    ...
      </div>
    </div>
    </semantic-context> <!-- this line closes the tag -->

to enforce the use of the "exemplar" endpoint for these queries. It should be possible to use multiple endpoints in one template/query page by defining multiple <semantic-context> tags.

Also, loading resources (i.e. URIs) that exist in only one endpoint will not work unless that endpoint is specified as a URL parameter. For example, if the default endpoint does not contain the URI http://www.ashmus.ox.ac.uk/collections/361987, this will not work:

https://antheia-researchspace.oerc.ox.ac.uk/resource/?uri=http%3A%2F%2Fwww.ashmus.ox.ac.uk%2Fcollections%2F361987

This will work:

https://antheia-researchspace.oerc.ox.ac.uk/resource/?repository=exemplar&uri=http%3A%2F%2Fwww.ashmus.ox.ac.uk%2Fcollections%2F361987
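
Such links can be built from any resource URI by percent-encoding it. A minimal bash sketch, assuming the URI contains only ASCII characters (non-ASCII characters would need byte-wise UTF-8 encoding, which this does not do); urlencode and resource_url are hypothetical helpers:

```shell
# Hypothetical helper: percent-encode a string for use in a URL query parameter.
# Assumes ASCII input; RFC 3986 unreserved characters are kept as-is.
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9.~_-]) out+="$c" ;;
      # "'$c" makes printf use the character's numeric code.
      *) out+="$(printf '%%%02X' "'$c")" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Hypothetical helper: build a ResearchSpace resource link that names the repository.
# Usage: resource_url BASE_URL REPOSITORY RESOURCE_URI
resource_url() {
  local base="$1" repo="$2" uri="$3"
  printf '%s/resource/?repository=%s&uri=%s\n' \
    "$base" "$(urlencode "$repo")" "$(urlencode "$uri")"
}
```

For example, resource_url https://antheia-researchspace.oerc.ox.ac.uk exemplar http://www.ashmus.ox.ac.uk/collections/361987 reproduces the working link above.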

Working with multiple sets

Option 1: multiple apps

Metaphacts is a base system that supports add-on apps (a.k.a. plugins). While metaphacts provides a set of templates by default, the apps provide alternative sets of templates. The file apps/<appname>/plugin.properties specifies how the template sets interact, i.e. whether one copies the other or overwrites it. More information on this interaction is needed; the option names copy and overwrite are not good labels. Setting the app templates to copy means that the default templates in /data/templates persist, i.e. they are not overwritten by the app templates. At the moment it is unclear how apps are prioritised. It is also unclear whether an endpoint can be matched to an app, given that endpoints are managed in the base system. Questions for Mike:

  1. If I have two apps (app1 and app2) with two sets of templates and their plugin.properties files are set to:

    plugin.namespaceMergeStrategy=copy
    plugin.templateMergeStrategy=copy

when I edit the templates in the UI, which templates am I copying and modifying: app1's or app2's?

The same question applies to plugin.namespaceMergeStrategy=overwrite: which templates overwrite the base metaphacts templates on restart?

  2. Is it possible to specify an endpoint as part of the app config? The folder config/repositories only exists for the base metaphacts system and not for the apps.

Option 2: named graphs

One way to specify different templates for different datasets is to use named graphs. In this case all data lives in the same endpoint and is separated only by named graphs. The system chooses templates based on the entity's rdf:type, which it determines with a SPARQL query. This query is specified as templateIncludeQuery in the ResearchSpace UI under System Settings → Configuration → UI configuration. By default it is:

    SELECT ?type WHERE { ?? a ?type }

where ?? stands for the entity URI. This can be changed to include a named graph:

    SELECT ?type ?g WHERE {
      GRAPH ?g {
        ?? a ?type
      }
    }

which should create separate template options by entity type and by graph. However, this last query does not work; more experimentation is needed.
