Flowbots is a Ruby-based document processing system designed to handle a variety of file types, with an emphasis on text analysis and content extraction. Supported file types:
- Markdown (with YAML frontmatter)
- Structured text (JSON, JSONL, CSV)
- Documents (PDF)
- Media files (Audio, Video)
- Images
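Content type detection over these formats can be sketched with a simple extension-to-type map. The table and helper below are illustrative assumptions, not Flowbots' actual detection logic (which also records the full MIME type):

```ruby
# Illustrative extension-to-type map; names and coverage are assumptions.
TYPE_BY_EXTENSION = {
  ".md" => "text", ".json" => "text", ".jsonl" => "text", ".csv" => "text",
  ".pdf" => "document",
  ".mp3" => "audio", ".wav" => "audio",
  ".mp4" => "video",
  ".png" => "image", ".jpg" => "image"
}.freeze

# Look up a coarse type label from the file extension, case-insensitively.
def detect_type(path)
  TYPE_BY_EXTENSION.fetch(File.extname(path).downcase, "unknown")
end
```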
Key processing features:
- Markdown section extraction
- YAML frontmatter parsing
- File metadata collection
- Content type detection
- Text statistics
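YAML frontmatter parsing, for example, can be done with Ruby's standard `yaml` library. This sketch (the helper name is hypothetical, not Flowbots' actual API) splits a Markdown file into its frontmatter hash and body text:

```ruby
require "yaml"

# Hypothetical helper: returns [metadata_hash, body_text]. Files without
# a leading "---" block yield an empty hash and the unchanged text.
def parse_frontmatter(text)
  return [{}, text] unless text.start_with?("---\n") && (close = text.index("\n---", 4))

  meta = YAML.safe_load(text[4...close]) || {}
  body = text[(close + 4)..].to_s.lstrip
  [meta, body]
end
```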
Flowbots uses Ohm, an object-hash mapping library for Redis, to manage its data models. Ohm provides a flexible and efficient way to store and retrieve structured data in Redis.
Primary model for handling all types of content with a unified structure.
Document
├── Attributes
│   ├── path        # File path
│   ├── name        # File name
│   ├── type        # File type (e.g., text, audio, video)
│   ├── mime        # MIME type
│   ├── extension   # File extension
│   ├── size        # File size
│   ├── mtime       # Last modification time
│   ├── ctime       # Creation time
│   ├── checksum    # File checksum
│   ├── collection  # Associated collection name
│   ├── content     # Main content
│   └── metadata    # Hash of file metadata
│
├── Indices
│   ├── path
│   ├── name
│   ├── type
│   ├── collection
│   └── checksum
│
└── Methods
    ├── paragraphs()
    ├── sentences()
    ├── words(conditions)
    └── topics()
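In Flowbots, the segmentation methods above return stored model objects; as a rough sketch of the underlying splitting rules only (the delimiters here are assumptions, not the actual implementation):

```ruby
# Illustrative splitting rules; Flowbots' methods return persisted objects.
def split_paragraphs(text)
  text.split(/\n{2,}/).map(&:strip).reject(&:empty?)
end

def split_sentences(text)
  text.split(/(?<=[.!?])\s+/).reject(&:empty?)
end

def split_words(text)
  text.scan(/[[:alpha:]'-]+/)
end
```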
These models remain largely unchanged, but now reference the Document model instead of the former TextFile and Item models.
Organizes related content.
Collection
├── Attributes
│   └── name        # Collection name
│
└── Sets
    └── documents   # Document references
Manages content relationships and categorization.
Topic
├── Attributes
│   ├── name         # Topic name
│   ├── description  # Topic description
│   └── vector       # Topic vector representation
│
└── Collections
    ├── documents   # Associated Documents
    ├── segments    # Associated Segments
    ├── paragraphs  # Associated Paragraphs
    ├── sentences   # Associated Sentences
    └── phrases     # Associated Phrases
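The `vector` attribute suggests similarity-based topic matching; a minimal cosine-similarity helper, purely illustrative (Flowbots' actual matching logic is not shown here):

```ruby
# Cosine similarity between two equal-length numeric vectors:
# dot product divided by the product of the vector magnitudes.
def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end
```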
# Creating a document with content
document = Document.create(
  path: "/path/to/file.txt",
  name: "file.txt",
  type: "text",
  content: "Some content",
  metadata: { source: "import" }
)

# Processing content
preprocessor = PreprocessDocument.new(document.id)
preprocessor.execute

# Adding topic classification
topic = Topic.find_or_create(name: "Example Topic")
document.update(metadata: document.metadata.merge(topics: [topic.id]))

# Processing media content
audio_document = Document.create(
  path: "/path/to/audio.mp3",
  name: "audio.mp3",
  type: "audio"
)
# Process audio content...
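The file-level attributes such as `size`, `mtime`, and `checksum` can be computed with Ruby's standard library before creating a Document. A sketch, assuming SHA-256 checksums (the helper name is hypothetical):

```ruby
require "digest"

# Hypothetical helper collecting the file-level Document attributes.
def file_attributes(path)
  stat = File.stat(path)
  {
    name: File.basename(path),
    extension: File.extname(path),
    size: stat.size,
    mtime: stat.mtime,
    checksum: Digest::SHA256.file(path).hexdigest
  }
end
```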
This data structure enables:
- Unified content organization
- Rich text analysis and processing
- Media content segmentation
- Topic classification and relationships
- Flexible content relationships
[The rest of the README content remains unchanged]