Repository API

This API is built on top of Git and Git-LFS.

Renku repository management.

class renku.core.management.LocalClient(path=<function default_path>, renku_home='.renku', parent=None, commit_activity_cache=NOTHING, activity_index=None, remote_cache=NOTHING, migration_type=<MigrationType.ALL: 7>, external_storage_requested=True, *, data_dir='data')[source]

A low-level client for communicating with a local Renku repository.

Datasets

Client for handling datasets.

class renku.core.management.datasets.DatasetsApiMixin[source]

Client for handling datasets.

CACHE = 'cache'

Directory to cache transient data.

DATASETS = 'datasets'

Directory for storing dataset metadata in Renku.

DATASETS_PROVENANCE = 'dataset.json'

File for storing datasets’ provenance.

DATASET_IMAGES = 'dataset_images'

Directory for dataset images.

POINTERS = 'pointers'

Directory for storing external pointer files.
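To make the role of these constants concrete, the sketch below joins them onto a repository root. It assumes, without confirmation from this reference, that every name is resolved relative to the `.renku` folder. The helper `renku_subpaths` and its dictionary keys are hypothetical and not part of Renku.

```python
from pathlib import Path

# Values copied from the class attributes documented above.
RENKU_HOME = ".renku"
CACHE = "cache"
DATASETS = "datasets"
DATASETS_PROVENANCE = "dataset.json"
DATASET_IMAGES = "dataset_images"
POINTERS = "pointers"


def renku_subpaths(repo_root):
    """Hypothetical helper: map each constant to a path under <repo_root>/.renku.

    Assumption: all five names live directly inside the Renku home folder.
    """
    home = Path(repo_root) / RENKU_HOME
    return {
        "cache": home / CACHE,
        "datasets": home / DATASETS,
        "datasets_provenance": home / DATASETS_PROVENANCE,
        "dataset_images": home / DATASET_IMAGES,
        "pointers": home / POINTERS,
    }


if __name__ == "__main__":
    for label, path in renku_subpaths("my-project").items():
        print(f"{label}: {path}")
```

In a real project, the path properties of the client (for example `renku_datasets_path` and `renku_pointers_path`) are the authoritative source for these locations.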

add_data_to_dataset(dataset, urls, force=False, overwrite=False, sources=(), destination='', ref=None, external=False, extract=False, all_at_once=False, destination_names=None, repository=None, clear_files_before=False)[source]

Import the data into the data directory.

clear_temporary_datasets_path()[source]

Clear path to Renku dataset metadata directory.

create_dataset(name=None, title=None, description=None, creators=None, keywords=None, images=None, safe_image_paths=None, update_provenance=True)[source]

Create a dataset.

property datasets

A map from dataset names to datasets.

property datasets_provenance_path

Path to the datasets provenance file.

static get_dataset(name, strict=False, immutable=False) → Optional[renku.core.models.dataset.Dataset][source]

Load dataset reference file.

has_external_files()[source]

Return True if project has external files.

is_external_file(path)[source]

Checks if a path within repo is an external file.
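One plausible way to implement such a check is to test whether the path is a symlink whose target passes through the pointers folder. The sketch below follows that idea. The rule itself, the `.renku/pointers` location, and the names `looks_like_external_file` and `demo` are assumptions made for illustration, not the confirmed behavior of `is_external_file`.

```python
import os
import tempfile
from pathlib import Path

# Assumed location of pointer files: renku_home joined with POINTERS.
POINTERS_DIR = ".renku/pointers"


def looks_like_external_file(path):
    """Assumed rule: a symlink whose target goes through the pointers folder."""
    path = Path(path)
    if not path.is_symlink():
        return False
    target = Path(os.readlink(path)).as_posix()
    return POINTERS_DIR in target


def demo():
    """Build a throwaway layout and classify one symlink and one regular file."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        pointer = repo / ".renku" / "pointers" / "remote-file"
        pointer.parent.mkdir(parents=True)
        pointer.write_text("stand-in for a pointer to external storage\n")

        data_dir = repo / "data"
        data_dir.mkdir()
        external = data_dir / "external.csv"
        os.symlink(os.path.join("..", ".renku", "pointers", "remote-file"), external)
        regular = data_dir / "regular.csv"
        regular.write_text("a,b\n")

        return looks_like_external_file(external), looks_like_external_file(regular)


if __name__ == "__main__":
    print(demo())
```

The demo needs a platform that allows creating symlinks. It does not check that the path lies inside the repository, which the real method may also require.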

is_protected_path(path)[source]

Checks if a path is a protected path.

is_using_temporary_datasets_path()[source]

Return true if temporary datasets path is set.

move_files(files, to_dataset)[source]

Move files and their metadata from one or more datasets to a target dataset.

prepare_git_repo(url, ref=None, gitlab_token=None, renku_token=None, deployment_hostname=None, depth=1)[source]

Clone and cache a Git repo.

static remove_file(filepath)[source]

Remove a file/symlink and its pointer file (for external files).

property renku_dataset_images_path

Return a Path instance of Renku dataset images folder.

property renku_datasets_path

Return a Path instance of Renku dataset metadata folder.

property renku_pointers_path

Return a Path instance of Renku pointer files folder.

set_dataset_images(dataset: renku.core.models.dataset.Dataset, images, safe_image_paths=None)[source]

Set the images on a dataset.

set_temporary_datasets_path(path)[source]

Set path to Renku dataset metadata directory.

update_dataset_git_files(files: List[renku.core.metadata.immutable.DynamicProxy], ref, delete=False)[source]

Update files and dataset metadata according to their remotes.

Parameters
  • files – List of files to be updated

  • delete – Indicates whether to delete files or not

Returns

List of files that should be deleted

update_dataset_local_files(records: List[renku.core.metadata.immutable.DynamicProxy], delete=False)[source]

Update files metadata from the git history.

update_external_files(records: List[renku.core.metadata.immutable.DynamicProxy])[source]

Update files linked to external storage.

with_dataset(database_dispatcher: renku.core.management.interface.database_dispatcher.IDatabaseDispatcher, name: str = None, create: bool = False, commit_database: bool = False, creator: renku.core.models.provenance.agent.Person = None)[source]

Yield an editable metadata object for a dataset.

Repository

Client for handling a local repository.

class renku.core.management.repository.PathMixin(path=<function default_path>)[source]

Define a default path attribute.

class renku.core.management.repository.RepositoryApiMixin(renku_home='.renku', parent=None, commit_activity_cache=NOTHING, activity_index=None, remote_cache=NOTHING, migration_type=<MigrationType.ALL: 7>, *, data_dir='data')[source]

Client for handling a local repository.

ACTIVITY_INDEX = 'activity_index.yaml'

Caches activities that generated a path.

DATABASE_PATH: str = 'metadata'

Directory for metadata storage.

DEPENDENCY_GRAPH = 'dependency.json'

File for storing dependency graph.

DOCKERFILE = 'Dockerfile'

Name of the Dockerfile in the repo.

LOCK_SUFFIX = '.lock'

Default suffix for Renku lock file.

PROVENANCE_GRAPH = 'provenance.json'

File for storing ProvenanceGraph.

WORKFLOW = 'workflow'

Directory for storing workflow in Renku.

activities_for_paths(paths, file_commit=None, revision='HEAD')[source]

Get all activities involving a path.

property activity_index_path

Path to the activity filepath cache.

add_to_activity_index(activity)[source]

Add an activity and its generations to the cache.

property cwl_prefix[source]

Return a CWL prefix.

data_dir

Define a name of the folder for storing datasets.

property database_path

Path to the metadata storage directory.

property dependency_graph_path

Path to the dependency graph file.

property docker_path

Path to the Dockerfile.

find_previous_commit(paths, revision='HEAD', return_first=False, full=False)[source]

Return a previous commit for a given path starting from revision.

Parameters
  • revision – revision to start from, defaults to HEAD

  • return_first – show the first commit in the history

  • full – return full history

Raises

KeyError – if path is not present in the given commit

get_template_files(template_path, metadata)[source]

Gets paths in a rendered renku template.

has_graph_files()[source]

Return true if database exists.

import_from_template(template_path, metadata, force=False)[source]

Render template files from a template directory.

init_repository(force=False, user=None, initial_branch=None)[source]

Initialize an empty Renku repository.

is_project_set()[source]

Return whether a project is set for the client.

is_workflow(path)[source]

Check if the path is a valid CWL file.

property latest_agent

Returns latest agent version used in the repository.

property lock

Create a Renku config lock.

property migration_type

Type of migration that is being executed on this client.

parent

Store a pointer to the parent repository.

property path_activity_cache

Cache of all activities and their generated paths.

process_commit(commit=None, path=None)[source]

Build an Activity.

Parameters
  • commit – Commit to process. (default: HEAD)

  • path – Process a specific CWL file.

property project

Return the Project instance.

property provenance_graph_path

Path to the provenance graph file.

property remote

Return host, owner and name of the remote if it exists.

remove_graph_files()[source]

Remove all graph files.

renku_home

Define a name of the Renku folder (default: .renku).

renku_path

Store a Path instance of the Renku folder.

resolve_in_submodules(commit, path)[source]

Resolve filename in submodules.

subclients(parent_commit)[source]

Return mapping from submodule to client.

property submodules[source]

Return list of submodules it belongs to.

property template_checksums

Return a Path instance to the template checksums file.

with_metadata(project_gateway: renku.core.management.interface.project_gateway.IProjectGateway, database_gateway: renku.core.management.interface.database_gateway.IDatabaseGateway, read_only=False, name=None)[source]

Yield an editable metadata object.

property workflow_names[source]

Return index of workflow names.

property workflow_path

Return a Path instance of the workflow folder.

renku.core.management.repository.default_path(path='.')[source]

Return default repository path.

renku.core.management.repository.path_converter(path)[source]

Converter for path in PathMixin.

Git Internals

Wrap Git client.

class renku.core.management.git.GitCore[source]

Wrap Git client.

property candidate_paths

Return all paths in the index and untracked files.

commit(commit_only=None, commit_empty=True, raise_if_empty=False, commit_message=None, abbreviate_message=True, skip_dirty_checks=False)[source]

Automatic commit.

property dirty_paths

Get paths of dirty files in the repository.

ensure_clean(ignore_std_streams=False)[source]

Make sure the repository is clean.

ensure_unstaged(path)[source]

Ensure that path is not part of git staged files.

ensure_untracked(path)[source]

Ensure that path is not part of git untracked files.

find_attr(*paths)[source]

Return map with path and its attributes.

find_ignored_paths(*paths)[source]

Return ignored paths matching .gitignore file.

property modified_paths

Return paths of modified files.

remove_unmodified(paths, autocommit=True)[source]

Remove unmodified paths and return their names.

repo

Store an instance of the Git repository.

setup_credential_helper()[source]

Set up the Git credential helper to cache credentials if it is not already set.

worktree(path=None, branch_name=None, commit=None, merge_args=('--ff-only'))[source]

Create new worktree.

renku.core.management.git.finalize_commit(client, diff_before, commit_only=None, commit_empty=True, raise_if_empty=False, commit_message=None, abbreviate_message=True)[source]

Commit modified/added paths.

renku.core.management.git.finalize_worktree(client, isolation, path, branch_name, delete, new_branch, merge_args=('--ff-only'), exception=None)[source]

Cleanup and merge a previously created Git worktree.

renku.core.management.git.get_mapped_std_streams(lookup_paths, streams=('stdin', 'stdout', 'stderr'))[source]

Get a mapping of standard streams to given paths.
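A common way to find out whether a standard stream has been redirected to a known file is to compare device and inode numbers. The sketch below illustrates that technique. It is not the library's implementation: the real function takes stream names and presumably inspects the process's own streams, whereas the hypothetical `map_streams_to_paths` accepts open file objects so that it can be tried in isolation.

```python
import os
import tempfile


def map_streams_to_paths(lookup_paths, streams):
    """Return {stream_name: path} for streams backed by one of lookup_paths.

    `streams` maps a name such as "stdout" to an open file object.
    """
    stream_ids = {}
    for name, handle in streams.items():
        try:
            info = os.fstat(handle.fileno())
        except (OSError, ValueError, AttributeError):
            continue  # closed stream, or one without a file descriptor
        stream_ids[(info.st_dev, info.st_ino)] = name

    mapping = {}
    for path in lookup_paths:
        try:
            info = os.stat(path)
        except FileNotFoundError:
            continue
        name = stream_ids.get((info.st_dev, info.st_ino))
        if name is not None:
            mapping[name] = path
    return mapping


def demo():
    """Pretend that stdout was redirected to result.txt and look it up."""
    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "result.txt")
        other_path = os.path.join(tmp, "other.txt")
        missing_path = os.path.join(tmp, "missing.txt")
        open(other_path, "w").close()
        with open(out_path, "w") as fake_stdout:
            found = map_streams_to_paths(
                [out_path, other_path, missing_path], {"stdout": fake_stdout}
            )
        return found == {"stdout": out_path}


if __name__ == "__main__":
    print(demo())
```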

renku.core.management.git.prepare_commit(client, commit_only=None, skip_dirty_checks=False)[source]

Gather information about repo needed for committing later on.

renku.core.management.git.prepare_worktree(original_client, path=None, branch_name=None, commit=None)[source]

Set up a Git worktree to provide isolation.

Git utilities.

class renku.core.models.git.GitURL(href, pathname=None, protocol='ssh', hostname='localhost', username=None, password=None, port=None, owner=None, name=None, regex=None)[source]

Parser for common Git URLs.

property image

Return image name.

classmethod parse(href)[source]

Derive URI components.
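As an illustration of what deriving components from a Git URL involves, the sketch below handles two common forms: scp-like SSH addresses and scheme-based URLs. It is a simplified stand-in, not the parser used by `GitURL`. The function `parse_git_url_sketch`, its regular expression, and the decision to strip a trailing `.git` from the name are assumptions. The dictionary keys mirror the constructor parameters listed above.

```python
import re
from urllib.parse import urlparse

# scp-like form, for example git@host:owner/name.git
_SCP_LIKE = re.compile(r"^(?P<username>[^@/]+)@(?P<hostname>[^:/]+):(?P<pathname>.+)$")


def parse_git_url_sketch(href):
    """Split a Git URL into protocol, hostname, username, pathname, owner, name."""
    match = _SCP_LIKE.match(href)
    if match:
        parts = match.groupdict()
        parts["protocol"] = "ssh"
    else:
        url = urlparse(href)
        if not url.scheme or not url.hostname:
            raise ValueError(f"unsupported Git URL: {href}")
        parts = {
            "protocol": url.scheme,
            "hostname": url.hostname,
            "username": url.username,
            "pathname": url.path.lstrip("/"),
        }

    # Everything before the last slash is treated as the owner (may be nested).
    owner, _, name = parts["pathname"].rpartition("/")
    if name.endswith(".git"):
        name = name[: -len(".git")]
    parts.update(owner=owner or None, name=name, href=href)
    return parts


if __name__ == "__main__":
    print(parse_git_url_sketch("git@example.com:group/project.git"))
    print(parse_git_url_sketch("https://example.com/group/sub/project.git"))
```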

class renku.core.models.git.Range(start, stop)[source]

Represent parsed Git revision as an interval.

classmethod rev_parse(git, revision)[source]

Parse revision string.
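The idea of treating a revision string as an interval can be sketched without a repository by splitting on Git's two-dot range syntax. The names `RevisionRange` and `split_revision` are hypothetical. Two choices are assumptions of this sketch: a single revision becomes the `stop` of an open-ended interval, and a missing right-hand side defaults to `HEAD`. The real `rev_parse` also receives a Git object, presumably to resolve the names to commits, which is left out here.

```python
from typing import NamedTuple, Optional


class RevisionRange(NamedTuple):
    """Unresolved interval of revision names."""

    start: Optional[str]
    stop: str


def split_revision(revision):
    """Split 'A..B' into (A, B); a single revision yields (None, revision)."""
    start, is_range, stop = revision.partition("..")
    if not is_range:
        return RevisionRange(start=None, stop=start)
    return RevisionRange(start=start or None, stop=stop or "HEAD")


if __name__ == "__main__":
    for text in ("v1.0..v2.0", "HEAD~3", "v1.0.."):
        print(text, "->", split_revision(text))
```

Three-dot (symmetric difference) ranges are not handled by this sketch.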

renku.core.models.git.filter_repo_name(repo_name)[source]

Remove the .git extension from the repo name.
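The described behavior can be approximated in a few lines. The function below is a stand-alone equivalent written for illustration under the name `strip_git_suffix`, which is hypothetical. How the library function treats `None` or names without the extension is assumed here (they are returned unchanged).

```python
def strip_git_suffix(repo_name):
    """Return repo_name without a trailing '.git'; other values pass through."""
    suffix = ".git"
    if repo_name is not None and repo_name.endswith(suffix):
        return repo_name[: -len(suffix)]
    return repo_name


if __name__ == "__main__":
    print(strip_git_suffix("renku-python.git"))
    print(strip_git_suffix("renku-python"))
```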

renku.core.models.git.get_user_info(git)[source]

Get Git repository’s owner name and email.