Skip to content

Commit

Permalink
Update README.md
Browse files Browse the repository at this point in the history
  • Loading branch information
TheItCrOw authored Dec 15, 2024
1 parent 843a0f8 commit a1985b9
Showing 1 changed file with 5 additions and 5 deletions.
10 changes: 5 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -53,11 +53,11 @@ UCE consists of several microservices, each dockerized and utilizing distinct te

| Microservice | Description |
|-----------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| A: Corpus-Importer | UCE is based on Corpus-Importer, a Java application that reads UIMA-annotated documents from a specified path, along with a corresponding corpus-configuration JSON file. The importer extracts the raw data and the configured annotations, applying its own post-processing to set up the environment, which includes text segmentation, database indexing, keyword extraction, and the creation of various embedding spaces, before finally storing each processed document in a PostgreSQL database (B). </details> |
| B: Relational Database | As our primary database, we opted for a relational PostgreSQL database, as UCE requires a structured and standardized database schema that can be extended if necessary. Additionally, its compatibility with the pgvector extension enables efficient vector operations directly within the database engine. This allows us to store high-dimensional vector embeddings within relational data tables while also enabling fast vector operations and searches. </details> |
| C: Graph Database | In addition to a relational database (B), UCE utilizes an Apache Jena SPARQL database to incorporate basic semantic searches in the Resource Description Framework (RDF) and Web Ontology Language (OWL) data formats. This integration enables the incorporation of domain-specific ontologies (e.g., biological taxonomy) into the UCE environment, further enriching its search capabilities. </details> |
| D: Python Webserver | Within UCE, we also utilize a Python web service to provide an interface to machine learning and AI models, as these are primarily accessible through Python. In this context, the web server facilitates access to the generation of embedding vectors, their dimensionality reduction methods, such as t-SNE and PCA, and the inference of (Large) Language Models. The web server is accessible via a REST API and is utilized by services (A) and (E). </details> |
| E: UCE Web Portal | The user interacts with UCE and all of its features through a web portal implemented in Java. This service communicates with all other services except for (B), providing a variety of search methods, visualization features, and different ways to interact with the underlying information units, as outlined in detail in Section 3.2. </details> |
| **A**: Corpus-Importer | UCE is based on Corpus-Importer, a Java application that reads UIMA-annotated documents from a specified path, along with a corresponding corpus-configuration JSON file. The importer extracts the raw data and the configured annotations, applying its own post-processing to set up the environment, which includes text segmentation, database indexing, keyword extraction, and the creation of various embedding spaces, before finally storing each processed document in a PostgreSQL database (B). </details> |
| **B**: Relational Database | As our primary database, we opted for a relational PostgreSQL database, as UCE requires a structured and standardized database schema that can be extended if necessary. Additionally, its compatibility with the pgvector extension enables efficient vector operations directly within the database engine. This allows us to store high-dimensional vector embeddings within relational data tables while also enabling fast vector operations and searches. </details> |
| **C**: Graph Database | In addition to a relational database (B), UCE utilizes an Apache Jena SPARQL database to incorporate basic semantic searches in the Resource Description Framework (RDF) and Web Ontology Language (OWL) data formats. This integration enables the incorporation of domain-specific ontologies (e.g., biological taxonomy) into the UCE environment, further enriching its search capabilities. </details> |
| **D**: Python Webserver | Within UCE, we also utilize a Python web service to provide an interface to machine learning and AI models, as these are primarily accessible through Python. In this context, the web server facilitates access to the generation of embedding vectors, their dimensionality reduction methods, such as t-SNE and PCA, and the inference of (Large) Language Models. The web server is accessible via a REST API and is utilized by services (A) and (E). </details> |
| **E**: UCE Web Portal | The user interacts with UCE and all of its features through a web portal implemented in Java. This service communicates with all other services except for (B), providing a variety of search methods, visualization features, and different ways to interact with the underlying information units, as outlined in detail in Section 3.2. </details> |

## In Medias Res

Expand Down

0 comments on commit a1985b9

Please sign in to comment.