Discussion:
svn commit: r1640915 - in /subversion/trunk/subversion: include/svn_fs.h libsvn_fs_fs/fs_fs.c tests/libsvn_fs/fs-test.c
(too old to reply)
Branko Čibej
2014-11-21 14:15:12 UTC
Permalink
Author: ivan
Date: Fri Nov 21 13:41:34 2014
New Revision: 1640915
URL: http://svn.apache.org/r1640915
Add FS config option to control log-addressing feature for created
repository.
Can we *please* have some discussion about such changes on the dev@
list? I pointed this out last time when you ripped out mixed-mode
addressing, and there was no response and the discussion only happened
after the fact.

Why do you want to be able to turn off log addressing, when you can just
not use FSFSv7 in the first place?

-- Brane
Ivan Zhakov
2014-11-21 16:07:49 UTC
Permalink
Post by Branko Čibej
Author: ivan
Date: Fri Nov 21 13:41:34 2014
New Revision: 1640915
URL: http://svn.apache.org/r1640915
Add FS config option to control log-addressing feature for created
repository.
list?
What particular kind of changes you're talking about?

[...]
Post by Branko Čibej
Why do you want to be able to turn off log addressing, when you can just
not use FSFSv7 in the first place?
This is just a small addition to existing API like
SVN_FS_CONFIG_FSFS_SHARD_SIZE, SVN_FS_CONFIG_FSFS_BLOCK_READ,
SVN_FS_CONFIG_BDB_TXN_NOSYNC etc. It doesn't change user-visible
behavior and I'm planning to use this in the test suite.

If you have any concerns regarding this change, please let me know.
--
Ivan Zhakov
Branko Čibej
2014-11-21 16:34:17 UTC
Permalink
Post by Ivan Zhakov
Post by Branko Čibej
Author: ivan
Date: Fri Nov 21 13:41:34 2014
New Revision: 1640915
URL: http://svn.apache.org/r1640915
Add FS config option to control log-addressing feature for created
repository.
list?
What particular kind of changes you're talking about?
Changes that significantly affect some existing feature in trunk. In
other words, things that are not bug fixes, robustness fixes or similar.
Post by Ivan Zhakov
Post by Branko Čibej
Why do you want to be able to turn off log addressing, when you can just
not use FSFSv7 in the first place?
This is just a small addition to existing API like
SVN_FS_CONFIG_FSFS_SHARD_SIZE, SVN_FS_CONFIG_FSFS_BLOCK_READ,
SVN_FS_CONFIG_BDB_TXN_NOSYNC etc. It doesn't change user-visible
behavior and I'm planning to use this in the test suite.
If you have any concerns regarding this change, please let me know.
The parameters you mention are all performance-related knobs, so not
quite comparable to the change you made. Perhaps a better comparison is
the FS layout option (sharded vs. non-sharded, packed vs. unpacked).

Specifically, I'm looking for an explanation why you think it makes
sense to turn off log addressing in v7, as opposed to just using v6 if
one doesn't want log addressing. Certainly log addressing does change
user-visible behaviour; cold-cache performance is user-visible.

You said you're planning to use this parameter in the test suite ...
however, it since it doesn't make any sense to test something that we're
not planning to release, I have to assume that you intend this to be a
feature of FSFSv7 in 1.9 (and, consequently forever).

It's the "forever" without prior discussion here that I find
problematic. And so is yet another parameter that changes behaviour,
because: (a) we have to maintain it, and (b) it adds another dimension
to the already-too-large number of possible option combinations that we
need to test.

As a counter-example, I'd much prefer to drop the ability to create
non-sharded, non-packed FSFSv7 repositories; that would change the
number of possible v7 layouts from 4 (or 8, with r1640915 in place) to
just one, not counting shard sizes etc.

-- Brane
Branko Čibej
2014-11-22 19:03:08 UTC
Permalink
Post by Branko Čibej
As a counter-example, I'd much prefer to drop the ability to create
non-sharded, non-packed FSFSv7 repositories; that would change the
number of possible v7 layouts from 4 (or 8, with r1640915 in place) to
just one, not counting shard sizes etc.
To clarify, dropping non-sharded, non-packed layout options for FSFSv7
was not an option as long as we had support for mixed-mode addressing as
well. Now that we agreed to remove that, restricting FSFSv7 to only
sharded+packed layout sounds like a good idea to me, from the point of
view of reducing the number of possible layouts.

-- Brane
Julian Foad
2014-11-24 10:20:09 UTC
Permalink
Post by Branko Čibej
Post by Branko Čibej
As a counter-example, I'd much prefer to drop the ability to create
non-sharded, non-packed FSFSv7 repositories; that would change the
number of possible v7 layouts from 4 (or 8, with r1640915 in place) to
just one, not counting shard sizes etc.
To clarify, dropping non-sharded, non-packed layout options for FSFSv7
was not an option as long as we had support for mixed-mode addressing as
well. Now that we agreed to remove that, restricting FSFSv7 to only
sharded+packed layout sounds like a good idea to me, from the point of
view of reducing the number of possible layouts.
Firstly, as I understand it, sharded layout (with a given shard size) is a config option and applies to all or none of the revisions, whereas packing a shard is an asynchronous
activity and so there is no way to enforce that all or some or even any of the repository is packed.

Non-sharded layout is like a special case of sharded, with shard size = infinity and the intermediate directory level omitted. And it can't be packed, so we're talking about a much smaller impact than reducing "4 or 8" layouts to one.

Nevertheless I strongly support dropping any configuration options that aren't really useful, because reducing the number of configuration options does have a significant contribution to lowering the burdens of testing and support. I presume the non-sharded option has no significant real-world benefits over sharded with shard size = <more revisions than you'll ever have>.

Secondly, what are the upgrade options and issued when a user has a format 6 repo and upgrades to f7? Is it easy and cheap to upgrade non-sharded f6 to sharded f7, for example?

Thirdly, what do we mean by "supporting" or "dropping" an option? We don't currently have an option in "svnadmin create" to choose anything other than sharded with shard size = 1000. I believe the procedure to follow to get a different layout is to manually edit the "db/format" file after "svnadmin create" but before creating any revisions. But that's not a great system. If other layout options or shard sizes are "supported" we should allow them to be specified in "svnadmin create".

We could choose either to drop all the code for reading and writing the non-sharded layout (which probably amounts to only about ten lines of code), or just keep on omitting an option to create it but leave it supported if somebody manually creates it. The more important thing is not whether the code is there, but whether the option is included in our testing and our declared supported formats.

Thoughts?

- Julian
Branko Čibej
2014-11-24 10:45:14 UTC
Permalink
Post by Julian Foad
Post by Branko Čibej
Post by Branko Čibej
As a counter-example, I'd much prefer to drop the ability to create
non-sharded, non-packed FSFSv7 repositories; that would change the
number of possible v7 layouts from 4 (or 8, with r1640915 in place) to
just one, not counting shard sizes etc.
To clarify, dropping non-sharded, non-packed layout options for FSFSv7
was not an option as long as we had support for mixed-mode addressing as
well. Now that we agreed to remove that, restricting FSFSv7 to only
sharded+packed layout sounds like a good idea to me, from the point of
view of reducing the number of possible layouts.
Firstly, as I understand it, sharded layout (with a given shard size) is a config option and applies to all or none of the revisions, whereas packing a shard is an asynchronous
activity and so there is no way to enforce that all or some or even any of the repository is packed.
Non-sharded layout is like a special case of sharded, with shard size = infinity and the intermediate directory level omitted. And it can't be packed, so we're talking about a much smaller impact than reducing "4 or 8" layouts to one.
Nevertheless I strongly support dropping any configuration options that aren't really useful, because reducing the number of configuration options does have a significant contribution to lowering the burdens of testing and support. I presume the non-sharded option has no significant real-world benefits over sharded with shard size = <more revisions than you'll ever have>.
All of the above is how I understand these things, too.
Post by Julian Foad
Secondly, what are the upgrade options and issued when a user has a format 6 repo and upgrades to f7? Is it easy and cheap to upgrade non-sharded f6 to sharded f7, for example?
When the option of having mixed-mode addressing (i.e., having part of
the repository up to some revision use physical addressing, IOW FSFSv6
mode, and the rest using logical addressing, IOW FSFSv7 mode), there was
no way to directly upgrading v6 to v7 without a dump/load cycle.

With the latest change that introduces the option of having a v7
repository that does not use logical addressing, my best guess is that
direct v6-to-v7 upgrade is again possible; but I'm not 100% sure, and
I'm also not sure if doing so gives the user enough benefit over leaving
the repo in at v6 that it's worth supporting. Logical addressing is not
the only interesting feature of v7, but it's arguably the most visible one.
Post by Julian Foad
Thirdly, what do we mean by "supporting" or "dropping" an option? We don't currently have an option in "svnadmin create" to choose anything other than sharded with shard size = 1000. I believe the procedure to follow to get a different layout is to manually edit the "db/format" file after "svnadmin create" but before creating any revisions. But that's not a great system. If other layout options or shard sizes are "supported" we should allow them to be specified in "svnadmin create".
I can't quite agree with that. Other shard sizes are "supported" in the
sense that you /can/ edit db/format and get different-sized shards, or
no shards at all, etc. When I proposed to "drop" that feature for
FSFSv7, I meant the following:

* That FSFSv7 layout is always sharded, regardless of what db/format says.
* That it always supports packing, again, regardless of db/format.
* That shard sizes can still be adjusted in db/format (and/or at
creation time)

The above would imply that v7 reads the format and warns if the layout
is not supported, but other than that, just ignores the sharded/packed
options there.
Post by Julian Foad
We could choose either to drop all the code for reading and writing the non-sharded layout (which probably amounts to only about ten lines of code), or just keep on omitting an option to create it but leave it supported if somebody manually creates it.
Can't really do that, because we still support FSFSv6 with the same code
base.
Post by Julian Foad
The more important thing is not whether the code is there, but whether the option is included in our testing and our declared supported formats.
And/or whether a v7 repository ignores, or doesn't ignore, those
specific parameters in db/format.

-- Brane
Stefan Fuhrmann
2014-11-24 15:00:26 UTC
Permalink
Post by Branko Čibej
As a counter-example, I'd much prefer to drop the ability to create
non-sharded, non-packed FSFSv7 repositories; that would change the
number of possible v7 layouts from 4 (or 8, with r1640915 in place) to
just one, not counting shard sizes etc.
To clarify, dropping non-sharded, non-packed layout options for FSFSv7
was not an option as long as we had support for mixed-mode addressing as
well. Now that we agreed to remove that, restricting FSFSv7 to only
sharded+packed layout sounds like a good idea to me, from the point of
view of reducing the number of possible layouts.
Firstly, as I understand it, sharded layout (with a given shard size) is a config option and applies to all or none of the revisions, whereas packing a shard is an asynchronous
activity and so there is no way to enforce that all or some or even any of the repository is packed.
Non-sharded layout is like a special case of sharded, with shard size = infinity and the intermediate directory level omitted. And it can't be packed, so we're talking about a much smaller impact than reducing "4 or 8" layouts to one.
Nevertheless I strongly support dropping any configuration options that aren't really useful, because reducing the number of configuration options does have a significant contribution to lowering the burdens of testing and support. I presume the non-sharded option has no significant real-world benefits over sharded with shard size = <more revisions than you'll ever have>.
All of the above is how I understand these things, too.
It is incorrect in a few details. Sharding basically changes
the file paths on disk and there is little extra code to that
handles linear repos explicitly.

The complication is that we represent linear repos as
"shard size 0" internally while we use the shard size as
divisor in many places. So, we must check for 0, usually
treat it as meaning granularity "1" where it is not implied
that we do have a sharded repo (everything packed is
known sharded). Chances are that we miss instances
and cause a division by 0 for linear repos. This is very
different from infinitely large shard sizes.
Post by Branko Čibej
Secondly, what are the upgrade options and issued when a user has a format 6 repo and upgrades to f7? Is it easy and cheap to upgrade non-sharded f6 to sharded f7, for example?
When the option of having mixed-mode addressing (i.e., having part of the
repository up to some revision use physical addressing, IOW FSFSv6 mode,
and the rest using logical addressing, IOW FSFSv7 mode), there was no way
to directly upgrading v6 to v7 without a dump/load cycle.
With the latest change that introduces the option of having a v7
repository that does not use logical addressing, my best guess is that
direct v6-to-v7 upgrade is again possible; but I'm not 100% sure, and I'm
also not sure if doing so gives the user enough benefit over leaving the
repo in at v6 that it's worth supporting. Logical addressing is not the
only interesting feature of v7, but it's arguably the most visible one.
You could always upgrade directly from format 6 to 7.
Linear repos would never use log. addressing in that case
and sharded ones would use it starting with the next shard.
r1637184 has changed that to never using log. addressing
in upgraded repos.

Thirdly, what do we mean by "supporting" or "dropping" an option? We
don't currently have an option in "svnadmin create" to choose anything
other than sharded with shard size = 1000. I believe the procedure to
follow to get a different layout is to manually edit the "db/format"
file after "svnadmin create" but before creating any revisions. But
that's not a great system. If other layout options or shard sizes are
"supported" we should allow them to be specified in "svnadmin create".
Post by Branko Čibej
I can't quite agree with that. Other shard sizes are "supported" in the
sense that you *can* edit db/format and get different-sized shards, or no
shards at all, etc.
Switching between linear and sharded mode is not supported
in the sense that you also have to manually move r0 data to
the respective new path. You may only change the shard size
while the first shard is not completely filled (and make damn
sure nobody has an open svn_fs_t on that repo).
Post by Branko Čibej
- That FSFSv7 layout is always sharded, regardless of what db/format
says.
- That it always supports packing, again, regardless of db/format.
- That shard sizes can still be adjusted in db/format (and/or at
creation time)
The above would imply that v7 reads the format and warns if the layout is
not supported, but other than that, just ignores the sharded/packed options
there.
I'm in favour of phasing linear repos out. We still have to support
them but forgetting to check for shard size 0 is more likely to
happen for new code than existing code. I also tend to issuing
a warning or even failure when trying to upgrade a linear repos.
And read_format() should error out on the combination "linear"
and "logical addressing".

-- Stefan^2.
Julian Foad
2014-11-24 15:32:51 UTC
Permalink
Stefan Fuhrmann wrote:
[...]
Post by Stefan Fuhrmann
I'm in favour of phasing linear repos out. We still have to support
them but forgetting to check for shard size 0 is more likely to
happen for new code than existing code.I also tend to issuing
a warning or even failure when trying to upgrade a linear repos. [...]
Can we feasibly convert an unsharded repo to sharded during an upgrade?

I think the current upgrade takes O(1) time, and sharding would take at least Order(N) time (N = number of revs) to move each rev file into a shard directory, and possibly much higher order if there are disk file systems that have slow 'move' performance (given that the original directory might have a very large number of rev files in it), but it might be acceptable given that in those cases the performance of adding new revs was probably degrading for the same reason.

- Julian
Stefan Fuhrmann
2014-11-24 17:40:27 UTC
Permalink
Post by Julian Foad
[...]
Post by Stefan Fuhrmann
I'm in favour of phasing linear repos out. We still have to support
them but forgetting to check for shard size 0 is more likely to
happen for new code than existing code.I also tend to issuing
a warning or even failure when trying to upgrade a linear repos. [...]
Can we feasibly convert an unsharded repo to sharded during an upgrade?
Yes, that would be feasible. OTOH, there is already
/tools/server-side/fsfs-reshard.py
Post by Julian Foad
I think the current upgrade takes O(1) time,
Well for packed repos, the upgrade from f4 to f6 or
f7 already is O(N) because the revprops get packed.
Post by Julian Foad
and sharding would take at least Order(N) time (N = number of revs) to
move each rev file into a shard directory, and possibly much higher order
if there are disk file systems that have slow 'move' performance (given
that the original directory might have a very large number of rev files in
it), but it might be acceptable given that in those cases the performance
of adding new revs was probably degrading for the same reason.
I guess the most tricky part will be recovery from
a failed upgrade. Given that active linear repos
that people want to upgrade should be rare by now,
I think simply directing them to the reshard script is
acceptable and clearly the lower risk option.

-- Stefan^2.

Ivan Zhakov
2014-11-24 13:28:34 UTC
Permalink
Post by Branko Čibej
Post by Ivan Zhakov
Post by Branko Čibej
Why do you want to be able to turn off log addressing, when you can just
not use FSFSv7 in the first place?
This is just a small addition to existing API like
SVN_FS_CONFIG_FSFS_SHARD_SIZE, SVN_FS_CONFIG_FSFS_BLOCK_READ,
SVN_FS_CONFIG_BDB_TXN_NOSYNC etc. It doesn't change user-visible
behavior and I'm planning to use this in the test suite.
If you have any concerns regarding this change, please let me know.
The parameters you mention are all performance-related knobs, so not
quite comparable to the change you made. Perhaps a better comparison is
the FS layout option (sharded vs. non-sharded, packed vs. unpacked).
That's not quite true. This API option is similar to the following:
1) create a FSFSv6 repository
2) upgrade it to the FSFSv7 format.

In other words, this option does not create new repository layout
combination. Maybe my log message was not clear about that.

But to avoid further developers confusion I could revert my commit and
then implement the tests through the creation of FSFSv6 repositories
and upgrading them to FSFSv7. While I prefer to have an explicit
option for different already supported layouts than remember what
happens during the upgrade, but it's not a big deal for me. What do
you think?
Post by Branko Čibej
As a counter-example, I'd much prefer to drop the ability to create
non-sharded, non-packed FSFSv7 repositories; that would change the
number of possible v7 layouts from 4 (or 8, with r1640915 in place) to
just one, not counting shard sizes etc.
Generally, I like the idea to reduce the number of possible repository
layouts. This is exactly what I'm saying for the
last few months.

Note that number of layouts is greater than number of format versions,
because they got multiplied due to different upgrade paths. The
problem is that all the possible combinations are already exist in the
wild. For example:
1. Repository is created using Subversion 1.1 (format 1). At this
point only non-sharded repositories exists.
2. Then upgraded using Subversion 1.5 to format 5. Sharding cannot be
changed during upgrade, so it still remain as non-sharded.
3. Then it's upgraded Subversion 1.9 to format 7. And it is still
non-sharded and log-addressing disabled.

IMHO the best way to keep it simple - implement the significant format
changes in new FS-backends (i.e. FSX), but this is
a different story.
--
Ivan Zhakov
Branko Čibej
2014-11-24 13:53:53 UTC
Permalink
Post by Ivan Zhakov
Post by Branko Čibej
Post by Ivan Zhakov
Post by Branko Čibej
Why do you want to be able to turn off log addressing, when you can just
not use FSFSv7 in the first place?
This is just a small addition to existing API like
SVN_FS_CONFIG_FSFS_SHARD_SIZE, SVN_FS_CONFIG_FSFS_BLOCK_READ,
SVN_FS_CONFIG_BDB_TXN_NOSYNC etc. It doesn't change user-visible
behavior and I'm planning to use this in the test suite.
If you have any concerns regarding this change, please let me know.
The parameters you mention are all performance-related knobs, so not
quite comparable to the change you made. Perhaps a better comparison is
the FS layout option (sharded vs. non-sharded, packed vs. unpacked).
1) create a FSFSv6 repository
2) upgrade it to the FSFSv7 format.
I was under the impression that, since r1637184, upgrading from v6 to v7
without a dump/load cycle is no longer realistic for non-trivial (i.e.,
non-empty) repositories?

-- Brane
Ivan Zhakov
2014-11-24 14:39:00 UTC
Permalink
Post by Branko Čibej
Post by Ivan Zhakov
Post by Branko Čibej
Post by Ivan Zhakov
Post by Branko Čibej
Why do you want to be able to turn off log addressing, when you can just
not use FSFSv7 in the first place?
This is just a small addition to existing API like
SVN_FS_CONFIG_FSFS_SHARD_SIZE, SVN_FS_CONFIG_FSFS_BLOCK_READ,
SVN_FS_CONFIG_BDB_TXN_NOSYNC etc. It doesn't change user-visible
behavior and I'm planning to use this in the test suite.
If you have any concerns regarding this change, please let me know.
The parameters you mention are all performance-related knobs, so not
quite comparable to the change you made. Perhaps a better comparison is
the FS layout option (sharded vs. non-sharded, packed vs. unpacked).
1) create a FSFSv6 repository
2) upgrade it to the FSFSv7 format.
I was under the impression that, since r1637184, upgrading from v6 to v7
without a dump/load cycle is no longer realistic for non-trivial (i.e.,
non-empty) repositories?
The 'svnadmin upgrade' command always upgrades to the latest
repository format. It leaves marker in FORMAT file if some feature
cannot be enabled during in-place upgrade. That's why FORMAT file
still have option for non-sharded layout and physical addressing.
--
Ivan Zhakov
Loading...