Crazy idea: changes in WC should share an API with changes in repository

Discussion:

Julian Foad

2018-11-09 10:56:12 UTC

The WC's main job is to compose a new revision.

While the WC also supports merge conflict resolution, a mixed-revision base state, commit of selected parts, and many other additional features, the fundamental purpose of the WC is to help the user prepare, review and commit a set of changes which will create a new revision in the repository.

On the repository side is a similar but simpler mechanism. The FS "transaction" API, designed for programmatic rather than human users, allows the system to prepare a set of changes built upon an existing revision and commit the result as a new revision.

It is really quite important that WC modifications and FS transactions have exactly compatible semantics for the changes that they represent.

It is really quite important for the feasibility of writing higher level code such as shelving, that WC modifications can be read and written by a common, abstract, interface. That is, an interface definition which enables a 'copy' function to plug an input interface to an output interface and push all the changes through the pipe.

To better support these needs, changes in the WC should share an API with changes in the repository.

* WC mods API := (basic changes API) + (lots of WC-specific API)
* FS txn API := (basic changes API) + (some FS-specific API)
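
One way to picture the split above is as a small base interface plus two extensions. The following is a minimal sketch, not Subversion's actual API: the names `BasicChanges`, `WcMods`, `FsTxn` and `copy_changes` are hypothetical, and a change is modelled simply as a path mapped to new content.

```python
class BasicChanges:
    """Hypothetical 'basic changes API': read and write per-path changes."""

    def __init__(self):
        self._changes = {}  # path -> new content (None means 'deleted')

    def put(self, path, content):
        self._changes[path] = content

    def delete(self, path):
        self._changes[path] = None

    def items(self):
        # Return changes in sorted path order so output is deterministic.
        return sorted(self._changes.items())


class WcMods(BasicChanges):
    """Basic changes plus (hypothetical) WC-specific extras."""

    def __init__(self):
        super().__init__()
        self.conflicts = set()  # WC-only metadata; not part of the base API

    def mark_conflicted(self, path):
        self.conflicts.add(path)


class FsTxn(BasicChanges):
    """Basic changes plus (hypothetical) FS-specific extras."""

    def __init__(self, base_rev):
        super().__init__()
        self.base_rev = base_rev  # FS-only: the revision the txn builds on


def copy_changes(source, target):
    """Push every change from source to target using only the base API."""
    for path, content in source.items():
        if content is None:
            target.delete(path)
        else:
            target.put(path, content)


wc = WcMods()
wc.put("trunk/README", "hello")
wc.delete("trunk/old.c")
wc.mark_conflicted("trunk/README")

txn = FsTxn(base_rev=41)
copy_changes(wc, txn)
```

Because `copy_changes` touches only the shared methods, it works in either direction; the WC-specific conflict marker does not travel with the basic changes and would need the companion API.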

Is this such a crazy idea?

[ image: http://blog.foad.me.uk/wp-content/uploads/2018/11/20181108-SvnWcAPIs.png ]

There are two levels at which we can perform this API refactoring. First, streamy changes via the delta editor API. The repository side already has a commit editor for input and a 'replay' edit-driver for output of the changes in one revision. The WC already has a commit edit-driver for output, but no editor for receiving modifications. It needs one.

Second, underneath the delta editor APIs on the FS side is a random-access API for reading and writing the modifications in a transaction: 'root_vtable_t', as used on a txn-root. On the WC side we should be able to use the same API as a base, minus the few FS-specific bits, and extended with lots of WC-specific features.

The common APIs for basic changes could be:

* basic changes API (streamy): delta-editor.
* basic changes API (random-access): most of root_vtable_t.

Also the WC base layer corresponds to the base revision of a FS transaction, again with a lot of WC-specific extensions. Again, common APIs should be used as the base API for reading and writing that base, extended by a WC-specific companion API. In the FS API, the same vtable is used for a rev-root as for a txn-root; each method that does not make sense on a revision returns an error at run-time. In the WC, the base layer is modifiable, albeit with its own semantics.
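
The "same vtable for a rev-root and a txn-root" arrangement can be illustrated with a small model. This is only a sketch under assumptions: `Root`, `NotMutableError` and the method names are hypothetical and are not the real FS vtable.

```python
class NotMutableError(Exception):
    """Raised when a write method is called on a read-only root."""


class Root:
    """Hypothetical single interface for both revision and txn roots."""

    def __init__(self, nodes, is_txn):
        self._nodes = dict(nodes)  # path -> content
        self.is_txn = is_txn

    def file_contents(self, path):
        # Reading makes sense on either kind of root.
        return self._nodes[path]

    def make_file(self, path, content=""):
        # Writing only makes sense on a txn root; fail at run time otherwise.
        if not self.is_txn:
            raise NotMutableError("root is a revision root: " + path)
        self._nodes[path] = content


rev_root = Root({"iota": "This is iota.\n"}, is_txn=False)
txn_root = Root({"iota": "This is iota.\n"}, is_txn=True)
txn_root.make_file("kappa", "new\n")

try:
    rev_root.make_file("kappa")
    rev_write_failed = False
except NotMutableError:
    rev_write_failed = True
```

A WC base layer behind the same interface would differ in that its write methods are allowed, with their own semantics, rather than always failing.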

The common APIs for the WC base layer and the FS txn base revision could be:

* basic base-layer API (random-access): most of root_vtable_t.

For the base layer we can talk about a streamy base-layer creation API for checkout, a WC-shape/layout/viewspec API, and so on, but let's start with the APIs for reading and writing the working modifications layer.

== WC streamy input/output editor APIs ==

- WC replay delta
  - drives a delta-editor, like 'commit' does
  - thin wrapper around 'harvest_committables' and 'svn_client__do_commit'
  - will be used for 'shelve'

- WC replay wc metadata
  - transmits WC-specific local-mods metadata
  - a streamy companion to wc-replay-delta

- WC delta editor
  - receives and applies modifications into the WC local-mods
  - expects unmodified WC states: no merging except trivial A-or-B merges
  - write from scratch
  - will be used for 'unshelve'
  - use it for all existing WC modification ops ('svn add', 'svn propset', etc.)
    - what special features does it need, that existing ops expect?
      - ...
    - add those features in wrappers where possible, else internally

- WC-specific metadata editor
  - applies WC-specific local-mods metadata
  - a streamy companion to WC delta editor

Definition of WC-specific local-mods metadata API:
- includes: conflicts, missing/obstructed, ...

I am starting with this streamy I/O layer because I can use it to improve shelving.

The "WC replay delta" is simple and about ready to commit. Implementation of the "WC delta editor" is in progress. I will now look into designing the streamy WC metadata APIs.

--
- Julian
[ Also posted at: https://blog.foad.me.uk/2018/11/09/svn-wc-repo-should-share-a-changes-api/ ]

Branko Čibej

2018-11-09 11:31:42 UTC

Post by Julian Foad
Is this such a crazy idea?

Not at all. The WC-NG with its multiple op-depths behaves like a
limited-history repository. Your picture does lack the op-depth part,
though; there are N layers between BASE and WORKING, unlike in the
filesystem, where we always work against a single (base) revision.
(Well, almost always ... the update report is more complex.)

Also note that the working copy may have switched subtrees, or
individual files, which IIRC are represented in the BASE, so that would
make WC replay somewhat different from repository replay.

Post by Julian Foad
I am starting with this streamy I/O layer because I can use it to improve shelving.
The "WC replay delta" is simple and about ready to commit. Implementation of the "WC delta editor" is in progress. I will now look into designing the streamy WC metadata APIs.

I understand these APIs could be reused for both shelving and
driving/being driven by the RA layer. Could any of them be used by
libsvn_client for local working copy modifications? Also is this "just"
an additional layer in svn_wc.h, or do you expect to deprecate swathes
of the remaining public WC APIs as a result?

-- Brane

Julian Foad

2018-11-09 11:47:15 UTC

Post by Branko Čibej
The WC-NG with its multiple op-depths behaves like a
limited-history repository. Your picture does lack the op-depth part,
though; there are N layers between BASE and WORKING, unlike in the
filesystem, where we always work against a single (base) revision.

I don't think that's an accurate description. The WC layers provide the repository base revisions of nested copies, like a cache of the referenced disparate bits of the repository history. The FS txn also supports nested copies, storing references to their bases in other revisions where the data can be found. In each case the "modification" layer is built by reference to one main base plus zero or more copy-bases. In each case the relevant "base" API must support access to both the main base (which is mixed-rev in the case of the WC) and those copy-bases.
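
The "one main base plus zero or more copy-bases" lookup can be sketched as follows. This is an illustrative model only: `resolve_base` and its data layout are hypothetical, paths are relative, and `history` stands in for either repository history (FS) or the cached copy-base layers (WC).

```python
import posixpath


def resolve_base(path, main_base, copies, history):
    """Return the base content for path.

    main_base: path -> content for the main base
    copies:    copy-root path -> (source_rev, source_path)
    history:   rev -> {path: content}
    """
    # Find the deepest copy root that is path itself or an ancestor of it.
    probe = path
    while probe:
        if probe in copies:
            src_rev, src_path = copies[probe]
            relative = path[len(probe):]  # "" or "/child/..."
            return history[src_rev][src_path + relative]
        parent = posixpath.dirname(probe)
        if parent == probe:
            break  # reached a root such as "/"; stop climbing
        probe = parent
    # Not inside any copy: the main base answers.
    return main_base[path]


history = {
    5: {"branches/b1/foo.c": "old foo"},
    7: {"tags/t/bar.c": "tagged bar"},
}
main_base = {"trunk/main.c": "main"}
copies = {
    "trunk/copied": (5, "branches/b1"),
    "trunk/copied/sub": (7, "tags/t"),  # a copy nested inside a copy
}

in_copy = resolve_base("trunk/copied/foo.c", main_base, copies, history)
in_nested = resolve_base("trunk/copied/sub/bar.c", main_base, copies, history)
plain = resolve_base("trunk/main.c", main_base, copies, history)
```

The nested copy wins over its enclosing copy because the search starts at the path itself and climbs upward.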

Post by Branko Čibej
Also note that the working copy may have switched subtrees, or
individual files, which IIRC are represented in the BASE, so that would
make WC replay somewhat different from repository replay.

Yes, "switched" comes under the head of "WC specifics".

Post by Branko Čibej
I understand these APIs could be reused for both shelving and
driving/being driven by the RA layer. Could any of them be used by
libsvn_client for local working copy modifications?

Absolutely! In order to prove an implementation of the "WC modifications editor" I would expect to convert more or less *all* of the existing client commands to make their WC changes through this editor.

Post by Branko Čibej
Also is this "just"
an additional layer in svn_wc.h, or do you expect to deprecate swathes
of the remaining public WC APIs as a result?

I would expect to deprecate swathes of the existing WC APIs.

--
- Julian

Julian Foad

2018-11-09 17:17:23 UTC

Post by Julian Foad
The "WC replay delta" is simple and about ready to commit.

r1846252.

Post by Julian Foad
Implementation of the "WC delta editor" is in progress.

Currently factoring out our "copy dir from repos to WC" implementation from these two places:

libsvn_client/copy.c : repos_to_wc_copy_single()
libsvn_client/conflicts.c : merge_incoming_added_dir_replace()

--
- Julian

Julian Foad

2018-12-03 18:22:03 UTC

Post by Julian Foad

Post by Julian Foad
Implementation of the "WC delta editor" is in progress.

Currently factoring out our "copy dir from repos to WC" implementation

Some more notes on progress.

The way we handle "copy" into the WC is a beast. Untangling this is by far the most complex part of the whole exercise.

* Copy via Checkout

Copying a directory tree from the same repository into the WC is currently handled by performing a new checkout into a temporary WC, then running a WC-to-WC-copy from there to the target location, then deleting the remnants of that temporary WC. Ugh.

The essential purpose of the WC-to-WC-copy step is putting down a layer in the WC 'working' table to represent the pristine version (aka 'revert-base') of this copy.

The temporary WC, ugly though it is, is a fairly well hidden implementation detail and so not too worrying.

What is ugly is the way this procedure calls back to the RA layer using a full-blown 'checkout' procedure, scanning the WC, setting up a reporter, and using it to drive a new 'update' edit.

Instead, at this level in our WC editor, we already know there's an empty spot in the WC where the result is going to go, and we know the 'report' is of the trivial 'give me the whole subtree ***@REV' variety. We should be able to make a callback that doesn't require access to the WC nor the repository but just to whatever datastore the caller has available. The driver of this WC edit might be some shelving storage or might be another WC.

Requiring a callback to the RA layer connecting to The Repository might be a workable initial version, even though ultimately we would certainly not want to require repository access for shelving or unshelving or WC-to-WC copying.

* Single-file vs. Directory

Need to unify. Handling these two cases separately is the cause of a lot of complexity, duplication and resulting bugs.

Potential solution: Some form of the old "anchor-and-target" idea. Create a generic editor wrapper that transforms a request for an editor rooted at a file (or a maybe-file) into a request for an editor rooted at its parent directory with operations performed on a single target entry.
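
As a rough illustration of the anchor-and-target idea, the sketch below splits an edit root into a parent directory plus a single entry, then restricts a change set to that entry. It is a simplified model, not the existing Subversion logic: `anchor_and_target` and `restrict_to_target` are hypothetical names, and the caller is assumed to know whether the path is a directory.

```python
import posixpath


def anchor_and_target(path, is_dir):
    """Split an edit root into (anchor directory, single target entry).

    A directory is its own anchor with no target restriction; anything
    else is edited via its parent directory with one named target.
    """
    if is_dir:
        return path, None
    anchor, target = posixpath.split(path)
    return anchor, target


def restrict_to_target(changes, anchor, target):
    """Keep only the changes an editor rooted at anchor may touch.

    changes maps full path -> content. Returned paths are relative
    to the anchor, as a directory-rooted editor would see them.
    """
    prefix = anchor + "/" if anchor else ""
    result = {}
    for path, content in changes.items():
        if not path.startswith(prefix):
            continue
        rel = path[len(prefix):]
        if target is None or rel == target or rel.startswith(target + "/"):
            result[rel] = content
    return result


changes = {
    "trunk/src/main.c": "x",
    "trunk/src/util.c": "y",
    "trunk/doc/a": "z",
}
file_root = anchor_and_target("trunk/src/main.c", is_dir=False)
dir_root = anchor_and_target("trunk/src", is_dir=True)
file_edit = restrict_to_target(changes, *file_root)
dir_edit = restrict_to_target(changes, *dir_root)
```

With this shape the receiving editor only ever has to handle a directory root, which is the unification the wrapper is meant to buy.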

* Foreign-repo copies:

Currently a separate code path. (Currently the only one that calls the new WC editor.)

Unify.

* Externals

r1847834: "Unify how 'copy' processes externals with and without pinning. ... Remove the optional 'externals' processing from inside the copy APIs, as there was already support for doing it outside. Previously, externals were fetched outside the 'copy' API if and only if some externals were to be pinned. Now we always use that code path. As a side effect, this makes the notifications consistent between the two cases."

* Mergeinfo

Mergeinfo is deleted when we are copying from a foreign repo. (Inconsistency bug found and fixed: SVN-4792.)

This should be re-implemented as a wrapper editor that just performs mergeinfo stripping.
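
A wrapper editor of that kind might look like the sketch below. The two-method editor interface here is a hypothetical stand-in, far smaller than the real delta editor; only the property name `svn:mergeinfo` is taken from Subversion itself.

```python
class RecordingEditor:
    """Stand-in for a real receiving editor: records the calls it gets."""

    def __init__(self):
        self.calls = []

    def add_file(self, path):
        self.calls.append(("add_file", path))

    def change_prop(self, path, name, value):
        self.calls.append(("change_prop", path, name, value))


class StripMergeinfoEditor:
    """Wrapper that forwards every call except svn:mergeinfo changes."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def add_file(self, path):
        self._wrapped.add_file(path)

    def change_prop(self, path, name, value):
        if name == "svn:mergeinfo":
            return  # drop it, as is done for foreign-repo copies
        self._wrapped.change_prop(path, name, value)


inner = RecordingEditor()
editor = StripMergeinfoEditor(inner)
editor.add_file("lib/x.c")
editor.change_prop("lib/x.c", "svn:mergeinfo", "/trunk:1-10")
editor.change_prop("lib/x.c", "svn:eol-style", "native")
```

The driver does not need to know the wrapper is there, so the stripping logic stays out of both the copy code and the WC editor.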

--
- Julian

Daniel Shahaf

2018-11-10 00:53:01 UTC

Post by Julian Foad
It is really quite important for the feasibility of writing higher level
code such as shelving, that WC modifications can be read and written by
a common, abstract, interface. That is, an interface definition which
enables a 'copy' function to plug an input interface to an output
interface and push all the changes through the pipe.
To better support these needs, changes in the WC should share an API
with changes in the repository.
* WC mods API := (basic changes API) + (lots of WC-specific API)
* FS txn API := (basic changes API) + (some FS-specific API)
Is this such a crazy idea?
[ image: http://blog.foad.me.uk/wp-content/uploads/2018/11/20181108-SvnWcAPIs.png ]
There are two levels at which we can perform this API refactoring.
First, streamy changes via the delta editor API. The repository side
already has a commit editor for input and a 'replay' edit-driver for
output of the changes in one revision. The WC already has a commit edit-
driver for output, but no editor for receiving modifications. It needs
one.

I assume you mean here an editor for receiving modifications to the
WORKING layer, since the update editor exists but modifies the BASE
layer, correct? That would also square with your clarification
elsethread that you'd make svn subcommands use that editor.

That sounds interesting. The editor interface has strict ordering
constraints, and if a sequence of 'svn' subcommands is to be
equivalent to a sequence of editor drives, that are then committed in
a *single* editor drive... that implies some sort of editor-drive-combiner
algebra. (From this point of view, 'svn commit foo' when both foo and
bar are modified is an editor-drive-splitter, reordering commits would
be commutativity¹, etc.)
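
To make the "combiner" idea concrete, here is a toy model that folds two successive change sets into one. It is a sketch under strong assumptions: changes are whole-path `add`/`mod`/`del` actions rather than real editor drives, the second change set is assumed valid on top of the first, and a delete followed by an add is simplified to a `mod`. The function name `compose` is hypothetical.

```python
def compose(first, second):
    """Combine two successive change sets into one equivalent change set.

    Each change set maps path -> (action, content), where action is
    'add', 'mod' or 'del'.
    """
    result = dict(first)
    for path, (action, content) in second.items():
        earlier = first.get(path)
        if earlier is None:
            result[path] = (action, content)
        elif earlier[0] == "add" and action == "del":
            del result[path]  # added then deleted: no net change
        elif earlier[0] == "add":
            result[path] = ("add", content)  # added then modified: still an add
        elif earlier[0] == "del" and action == "add":
            result[path] = ("mod", content)  # simplification: replace as 'mod'
        else:
            result[path] = (action, content)  # mod+mod -> mod, mod+del -> del
    return result


a = {"f": ("add", "1")}
b = {"f": ("mod", "2"), "g": ("add", "x")}
c = {"f": ("del", None), "g": ("mod", "y")}

left_grouped = compose(compose(a, b), c)
right_grouped = compose(a, compose(b, c))
```

For this particular example both groupings give the same combined change set; the sketch makes no claim that the toy rules are associative in every case.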

The immediate implication is that wc commit logic could, in
principle, be generalized into a logic for combining successive
revisions in a dumpfile.

While we're looking at these two things, do we really need both the
reporter and the commit editor? Aren't they logically doing exactly the
same job (up to the reporter not uploading text and property deltas)?

Cheers,

Daniel

¹ #ifdef MATHEMATICIAN
  and associativity
  #endif

Greg Stein

2018-11-12 03:58:01 UTC

Post by Julian Foad
...
Is this such a crazy idea?

Not at all. This is what Ev2 was supposed to do. Part of my work around
that was to start shifting code from the old delta-editor to Ev2. We have
shims already available to support that work. I'd suggest looking at Ev2
rather than creating Yet Another Editor.

Cheers,
-g

Julian Foad

2018-11-12 15:21:39 UTC

Post by Greg Stein

Post by Julian Foad
Is this such a crazy idea?

Not at all. This is what Ev2 was supposed to do. Part of my work
around that was to start shifting code from the old delta-editor to
Ev2. We have shims already available to support that work. I'd suggest
looking at Ev2 rather than creating Yet Another Editor.

In this thread the main proposal is not to create a new or modified
delta editor but to start *using* the existing one. (And the additional
kinds of editor mentioned, such as for WC-specific metadata, are outside
the remit of Ev2.)
- Julian