Skip to content

Commit

Permalink
Finalize filefetcher custom processor DX (#4074)
Browse files Browse the repository at this point in the history
  • Loading branch information
paul-m authored Dec 14, 2023
1 parent 554f2c4 commit 00bfcb5
Show file tree
Hide file tree
Showing 13 changed files with 308 additions and 100 deletions.
2 changes: 1 addition & 1 deletion composer.json
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@
"fmizzell/maquina": "^1.1.1",
"getdkan/contracts": "^1.0.0",
"getdkan/csv-parser": "^1.3.0",
"getdkan/file-fetcher" : "^5.0.0",
"getdkan/file-fetcher" : "^5.0.2",
"getdkan/harvest": "^1.0.0",
"getdkan/procrastinator": "^5.0.0",
"getdkan/rooted-json-data": "^0.1",
Expand Down
66 changes: 66 additions & 0 deletions docs/source/user-guide/guide_file_fetchers.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
Implementing a custom file fetcher
----------------------------------

DKAN uses a library called [getdkan/file-fetcher](https://github.com/GetDKAN/file-fetcher). This library allows developers to extend the file transfer functionality for their specialized needs.

This library is used to download a resource, such as a CSV file, so that it can be loaded into the database and presented through the UI and API. This process is called "localization," because the source resource is copied to the local file system. Usually, this downloaded copy is temporary and is eventually removed.

The standard file fetcher processors will probably be adequate for most uses, but there could be other use cases, such as needing to authenticate, or getting a file from S3 instead of HTTP.

In cases such as these, we might want to add our own processor class to extend the file fetcher functionality.

How-to:
=======

Note that a code example can be found in the ``custom_processor_test`` module, which is used to test this functionality.

Create a file processor class
_____________________________

To implement a new file processor, a create a custom file fetcher processor class. This class could extend ``FileFetcher\Processor\Remote`` or ``FileFetcher\Processor\Local``, or be a totally new implementation of ``FileFetcher\Processor\ProcessorInterface``.

Create a FileFetcherFactory
___________________________

Next, create a new file fetcher factory class. This class should emulate ``Drupal\common\FileFetcher\FileFetcherFactory``. There is example code in the ``custom_processor_test`` module which demonstrates how to do this.

The new factory should create and configure a ``FileFetcher\FileFetcher`` object to use your new custom processor. Do this by merging configuration for your new processor into the ``$config['processors']`` array that is passed to ``FileFetcherFactory::getInstance()``:

.. code-block:: php
public function getInstance(string $identifier, array $config = []) {
// Add OurProcessor as a custom processor.
$config['processors'] = array_merge(
[OurProcessor::class],
$config['processors'] ?? []
);
// Get the instance from the decorated factory, using our modified config.
return $this->decoratedFactory->getInstance($identifier, $config);
}
Declare your factory as a service
_________________________________

It is also very important to declare your new factory class as a service. You accomplish this by decorating ``dkan.common.file_fetcher`` in your module's ``*.services.yml`` file, something like this:

.. code-block:: yaml
our_module.file_fetcher:
class: Drupal\our_module\FileFetcher\FileFetcherFactory
decorates: dkan.common.file_fetcher
arguments: ['@our_module.file_fetcher.inner']
Now whenever DKAN uses the ``dkan.common.file_fetcher`` service, your file fetcher factory will be used instead, and your new processor will find its way into use.

Processor negotiation
=====================

It's important to know how ``FileFetcher`` goes about choosing a processor.

File fetcher knows about two processors by default: ``FileFetcher\Processor\Local`` and ``FileFetcher\Processor\Remote``. It also knows about whichever custom processor class names you configured in the ``processors`` array in configuration.

When you ask a file fetcher object to perform the transfer (using ``FileFetcher::run()``), it will instantiate all the different types of processors it knows about.

Then it will loop through them and use the ``ProcessorInterface::isServerCompatible()`` method to determine if the given ``source`` is suitable for use with that processor object. The file fetcher will use the first processor that answers ``true``.

You can look at the implementations of ``FileFetcher\Processor\Local::isServerCompatible()`` or ``FileFetcher\Processor\Remote::isServerCompatible()`` to see how they each handle the question of whether they're suitable for the ``source``.
1 change: 1 addition & 0 deletions docs/source/user-guide/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -13,4 +13,5 @@ User Guide
guide_harvest
guide_queues
guide_local_files
guide_file_fetchers
guide_revisions
66 changes: 2 additions & 64 deletions modules/common/src/FileFetcher/DkanFileFetcher.php
Original file line number Diff line number Diff line change
Expand Up @@ -5,72 +5,10 @@
use FileFetcher\FileFetcher;

/**
* Allows FileFetcher to be reconfigured for using existing local files.
* Formerly needed to support using local files with FileFetcher.
*
* This DKAN-specific extension of the FileFetcher (which comes from an
* external library) applies the DKAN configuration
* common.settings.always_use_existing_local_perspective
* when selecting the processor. The configuration itself is retrieved
* in FileFetcherFactory and passed to DkanFileFetcher on getInstance().
* @deprecated Uneeded. Use FileFetcher.
*/
class DkanFileFetcher extends FileFetcher {

/**
* Tell this file fetcher whether to use local files if they exist.
*
* @param bool $use_local_file
* (optional) Whether to use the local file. If TRUE, we'll use the file
* processor that prefers to use local files. Defaults to TRUE.
*
* @return self
* Fluent interface.
*
* @see https://dkan.readthedocs.io/en/2.x/user-guide/guide_local_files.html
*/
public function setAlwaysUseExistingLocalPerspective(bool $use_local_file = TRUE) : self {
// @todo Re-computing the custom processor classes should be in another
// method that is in the parent class.
if ($use_local_file) {
$this->dkanUseLocalFileProcessor();
}
else {
$this->dkanUseDefaultFileProcessor();
}
return $this;
}

/**
* Configure the processor to respect the local file if it already exists.
*/
protected function dkanUseLocalFileProcessor() {
// Set the state/config to use our remote class.
$this->setProcessors(['processors' => [FileFetcherRemoteUseExisting::class]]);
$this->setStateProperty('processor', FileFetcherRemoteUseExisting::class);
// At this very early stage, update the status if the file already exists.
/** @var \Drupal\common\FileFetcher\FileFetcherRemoteUseExisting $processor */
$processor = $this->getProcessor();
$existing_status = $processor->discoverStatusForExistingFile(
$this->getState(),
$this->getResult()
);
$this->setState($existing_status['state']);
}

/**
* Configure the processor to use its default behavior.
*/
protected function dkanUseDefaultFileProcessor() {
// @todo This ignores any other custom processor classes that might have
// been configured. Improve this situation.
$this->customProcessorClasses = [];
$state = $this->getState();
foreach ($this->getProcessors() as $processor) {
if ($processor->isServerCompatible($state)) {
$state['processor'] = get_class($processor);
break;
}
}
$this->getResult()->setData(json_encode($state));
}

}
32 changes: 23 additions & 9 deletions modules/common/src/FileFetcher/FileFetcherFactory.php
Original file line number Diff line number Diff line change
Expand Up @@ -48,18 +48,32 @@ public function __construct(JobStoreFactory $jobStoreFactory, ConfigFactoryInter
* {@inheritDoc}
*/
public function getInstance(string $identifier, array $config = []) {
$config = array_merge($this->configDefault, $config);
$file_fetcher = DkanFileFetcher::get(
return FileFetcher::get(
$identifier,
$this->jobStoreFactory->getInstance(FileFetcher::class),
$config
);
// Inject our special configuration into the file fetcher, so it can use
// local files rather than re-downloading them.
$file_fetcher->setAlwaysUseExistingLocalPerspective(
(bool) $this->dkanConfig->get('always_use_existing_local_perspective')
$this->getFileFetcherConfig($config)
);
return $file_fetcher;
}

/**
* Adjust the provided config for our defaults and DKAN configuration.
*
* @param array $config
* Configuration provided by the caller to getInstance().
*
* @return array
* Modified configuration array.
*/
protected function getFileFetcherConfig(array $config): array {
// Merge in our defaults.
$config = array_merge($this->configDefault, $config);
// Add our special custom processor to the config if we're configured to
// always use the local perspective file.
if ($this->dkanConfig->get('always_use_existing_local_perspective')) {
$processors = [FileFetcherRemoteUseExisting::class] + ($config['processors'] ?? []);
$config['processors'] = $processors;
}
return $config;
}

}
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
name: 'Processor API Test'
type: module
description: 'Support module for testing custom processor APIs.'
package: Testing
core_version_requirement: ^9 || ^10
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
services:
custom_processor_test.file_fetcher:
class: Drupal\custom_processor_test\FileFetcher\CustomFileFetcherFactory
decorates: dkan.common.file_fetcher
arguments: ['@custom_processor_test.file_fetcher.inner']
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
<?php

namespace Drupal\custom_processor_test\FileFetcher;

use Contracts\FactoryInterface;

/**
* Creates new file fetcher objects with NonProcessor as a custom processor.
*
* @see modules/common/tests/modules/custom_processor_test/custom_processor_test.services.yml
* @see \Drupal\custom_processor_test\FileFetcher\NonProcessor
*/
class CustomFileFetcherFactory implements FactoryInterface {

/**
* The decorated file factory service object.
*
* @var \Contracts\FactoryInterface
*/
protected $decoratedFactory;

/**
* Constructor.
*
* @param \Contracts\FactoryInterface $decoratedFactory
* The decorated file factory service object.
*/
public function __construct(FactoryInterface $decoratedFactory) {
$this->decoratedFactory = $decoratedFactory;
}

/**
* {@inheritDoc}
*/
public function getInstance(string $identifier, array $config = []) {
// Add NonProcessor as a custom processor.
$config['processors'] = array_merge(
[NonProcessor::class],
$config['processors'] ?? []
);
// Get the instance from the decorated factory, using our modified config.
return $this->decoratedFactory->getInstance($identifier, $config);
}

}
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
<?php

namespace Drupal\custom_processor_test\FileFetcher;

use FileFetcher\Processor\ProcessorInterface;
use Procrastinator\Result;

class NonProcessor implements ProcessorInterface {

public function isServerCompatible(array $state): bool {
return FALSE;
}

public function setupState(array $state): array {
return $state;
}

public function copy(array $state, Result $result, int $timeLimit = PHP_INT_MAX): array {
$result->setStatus(Result::DONE);
return [$state, $result];
}

public function isTimeLimitIncompatible(): bool {
return FALSE;
}

}
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
<?php

namespace Drupal\custom_processor_test\FileFetcher;

use FileFetcher\Processor\ProcessorInterface;
use Procrastinator\Result;

class YesProcessor implements ProcessorInterface {

public function isServerCompatible(array $state): bool {
return TRUE;
}

public function setupState(array $state): array {
return $state;
}

public function copy(array $state, Result $result, int $timeLimit = PHP_INT_MAX): array {
$result->setStatus(Result::DONE);
return [$state, $result];
}

public function isTimeLimitIncompatible(): bool {
return FALSE;
}

}
Original file line number Diff line number Diff line change
@@ -0,0 +1,84 @@
<?php

namespace Drupal\Tests\common\Kernel\FileFetcher;

use Drupal\KernelTests\KernelTestBase;
use Drupal\custom_processor_test\FileFetcher\CustomFileFetcherFactory;
use Drupal\custom_processor_test\FileFetcher\NonProcessor;
use Drupal\custom_processor_test\FileFetcher\YesProcessor;
use FileFetcher\FileFetcher;

/**
* Ensures custom processor API is working.
*
* @group dkan
* @group common
* @group kernel
*/
class CustomProcessorTest extends KernelTestBase {

protected static $modules = [
'common',
'custom_processor_test',
];

public function test() {
$identifier = 'my_identifier';
// Services from custom_processor_test module should decorate
// dkan.common.file_fetcher so that we get the custom file fetcher instead.
$factory = $this->container->get('dkan.common.file_fetcher');
$this->assertInstanceOf(CustomFileFetcherFactory::class, $factory);

/** @var \FileFetcher\FileFetcher $instance */
$instance = $factory->getInstance($identifier, ['filePath' => 'asdf']);
$this->assertInstanceOf(FileFetcher::class, $instance);

$ref_custom_processors = new \ReflectionProperty($instance, 'customProcessorClasses');
$ref_custom_processors->setAccessible(TRUE);

// NonProcessor is always set by our custom file fetcher factory.
$this->assertContains(
NonProcessor::class,
$ref_custom_processors->getValue($instance)
);

$ref_get_processor = new \ReflectionMethod($instance, 'getProcessor');
$ref_get_processor->setAccessible(TRUE);

// NonProcessor will not process because it always returns false from
// isServerCompatible(). Also our file path of 'asdf' results in false from
// isServerCompatible() from the default processors as well, so we get NULL.
$this->assertNull($ref_get_processor->invoke($instance));

// Gather a file fetcher again, specifying another custom processor.
$instance = $factory->getInstance($identifier, [
'filePath' => 'asdf',
'processors' => [YesProcessor::class],
]);

// Both custom processors are still available because our factory always
// specifies NonProcessor in addition to whatever is in $config.
$this->assertEquals([
NonProcessor::class,
YesProcessor::class,
], $ref_custom_processors->getValue($instance));
$this->assertInstanceOf(
YesProcessor::class,
$ref_get_processor->invoke($instance)
);

// Gather a third time, and now don't specify a custom processor.
$instance = $factory->getInstance($identifier, ['filePath' => 'asdf']);

// NonProcessor and YesProcessor are still available because they're both
// serialized into the jobstore for this file fetcher.
$this->assertEquals([
NonProcessor::class,
YesProcessor::class,
], $ref_custom_processors->getValue($instance));
// Processor will still be YesProcessor because NonProcessor always returns
// false for isServerCompatible() which rules it out.
$this->assertInstanceOf(YesProcessor::class, $ref_get_processor->invoke($instance));
}

}
Loading

0 comments on commit 00bfcb5

Please sign in to comment.