Hello!
I have a suggestion to improve the "time remaining" calculation with the following formula:
total time left [s] = total size left [MB] / maximum transfer speed [MB/s] + total files left [files] / maximum speed [files/s]
The maximum speeds will be updated as soon as the application gets a new record. Also, the queue of files should be sorted to have a medium-sized file at the beginning (to get a good understanding of MB/s) and a couple of small files (to get a good understanding of files/s).
With this, you will no longer get a "there are 5 hours left to complete" estimation when in fact only 5 minutes are left.
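For illustration only, here is a rough sketch of the formula above in Python (the function name and the numbers are made up for the example, this is not FreeFileSync code):

```python
# Hypothetical sketch of the proposed formula (not FreeFileSync code).
def time_remaining(size_left_mb, files_left, max_mb_per_s, max_files_per_s):
    """Seconds remaining = size term + file-count term."""
    return size_left_mb / max_mb_per_s + files_left / max_files_per_s

# Made-up example: 800 MB and 2000 files left, best observed rates 40 MB/s and 25 files/s
print(time_remaining(800, 2000, 40, 25))  # -> 100.0 seconds
```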
[Feature Suggestion] Time remaining improvement
- Posts: 6
- Joined: 3 May 2018
- Posts: 7
- Joined: 8 Sep 2018
I'd very much support this suggestion.
I just ran a sync with the time prediction wandering between 1:14 hours and beyond 7 days. I think warpi's suggestion wouldn't be too bad...
- Posts: 7
- Joined: 8 Sep 2018
No development here? I'm just syncing with predictions between 1.5 hours and beyond 7 days. That's ridiculous!
- Posts: 4056
- Joined: 11 Jun 2019
That algorithm is wrong though. The size left and files left are the same data, so you would be doubling the remaining time essentially. The better way to do the algorithm would be:
T = [(Size remaining/avg transfer rate) + (files remaining/avg file rate)] / 2
Avg rates can be calculated by recording the rate every second for the last 10 seconds and then dividing by 10. You can use a window longer than 10 seconds to get a more accurate value: the further you track back, the less a spike or valley will skew the results, but the more data you have to store for the calculations.
If you store the max rate, a spike will destroy your estimation. If it spikes to 80 MB/s for a second, it will say your last 800 MB will take 10 seconds, but it may take over a minute if the overall average rate is 10 MB/s.
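A minimal sketch of that averaged estimate, assuming rates are sampled once per second into a fixed window (names and the window length are illustrative, not FreeFileSync code):

```python
from collections import deque

# Hypothetical sketch of the averaged estimate above (not FreeFileSync code).
WINDOW = 10  # seconds to look back; longer smooths spikes but stores more samples

mb_samples = deque(maxlen=WINDOW)     # MB transferred during each of the last seconds
file_samples = deque(maxlen=WINDOW)   # files completed during each of the last seconds

def record_second(mb_this_second, files_this_second):
    mb_samples.append(mb_this_second)
    file_samples.append(files_this_second)

def time_remaining(size_left_mb, files_left):
    """T = [(size remaining / avg transfer rate) + (files remaining / avg file rate)] / 2"""
    if not mb_samples or sum(mb_samples) == 0 or sum(file_samples) == 0:
        return None  # not enough progress measured yet
    avg_mb_rate = sum(mb_samples) / len(mb_samples)
    avg_file_rate = sum(file_samples) / len(file_samples)
    return (size_left_mb / avg_mb_rate + files_left / avg_file_rate) / 2
```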
- Site Admin
- Posts: 7211
- Joined: 9 Dec 2007
Here's a (cool?) idea: Maybe FFS should start recording some historical (per-device) data: item count, bytes and total time of the last syncs. Then assume some simple formula like
total time = A * item count + B * bytes
Then fine-tune the unknown parameters A and B based on the historical data.
Hm, one drawback comes directly to mind: how long of a record to track? If the user upgrades to a faster hard-drive the old perf records would be instantly obsolete...
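A minimal sketch of how the unknown parameters A and B could be fitted from such per-device records, assuming a plain least-squares fit (the history numbers below are made up purely for illustration, not FreeFileSync code):

```python
import numpy as np

# Hypothetical sketch of fitting the per-device model above (not FreeFileSync code).
# Each row is one past sync: (item count, bytes copied, total seconds) - made-up numbers.
history = [
    (1200,  5.0e9,  420.0),
    ( 300, 20.0e9,  910.0),
    (5000,  1.0e9,  380.0),
]

counts, sizes, times = (np.array(col, dtype=float) for col in zip(*history))
X = np.column_stack([counts, sizes])              # columns: item count, bytes
A, B = np.linalg.lstsq(X, times, rcond=None)[0]   # total time ≈ A*items + B*bytes

def predict_seconds(items_left, bytes_left):
    return A * items_left + B * bytes_left
```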
- Posts: 6
- Joined: 3 May 2018
In reply to xCSxXenon, 07 Nov 2019, 19:28:
Thanks for the feedback! If the files are very different in size, the above equation will come very close to a correct estimation.
However, if all files have equal size, you are correct, the calculated time would be doubled.
So, we have a solution for this also :)
Introduce a ratio to correct this issue and the remaining time would be spot on every time :)
------ Updated proposal -----
Remaining time = r(S/C1+n/C2)
--- Details
Assuming that nothing else is disturbing the traffic, i.e. all transfer speed is dedicated to FreeFileSync, there are two constants that need to be determined in order to calculate the remaining time correctly:
C1 = Constant 1 = Maximum transfer speed [MB/s]
C2 = Constant 2 = Maximum speed to handle files [files/s]
Then we have the characteristics of the complete transfer:
S = Total size left [MB]
n = Total files left [files]
Remaining time can then be calculated
Remaining time = S/C1+n/C2
However, the above equation relies on calculating C1 and C2 accurately, which is not possible if the transfer queue does not include files with big differences in size.
Therefore, a new variable called r is introduced, which stands for ratio. It is there to correct any errors introduced by the uniformity of the transfer queue.
Remaining time = r(S/C1+n/C2)
So finally, C1, C2 and r are defined by:
1. The maximum speeds (C1 and C2) will be updated as soon as the application gets a new record.
2. The queue of files shall be sorted to have a medium-sized file at the beginning (to get a good understanding of MB/s) and a couple of small files (to get a good understanding of files/s).
3. r is calculated as the ratio between elapsed time and calculated elapsed time, where calculated elapsed time is given by the above formula.
r=elapsed time/calculated elapsed time
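A minimal sketch of the updated proposal, assuming C1 and C2 are the best rates observed so far and r is refreshed from the progress so far (names are illustrative, not FreeFileSync code):

```python
# Hypothetical sketch of the r-corrected estimate (not FreeFileSync code).
class Estimator:
    def __init__(self, total_mb, total_files):
        self.total_mb, self.total_files = total_mb, total_files
        self.c1 = 0.0  # C1: maximum observed transfer speed [MB/s]
        self.c2 = 0.0  # C2: maximum observed file handling speed [files/s]

    def remaining(self, elapsed_s, done_mb, done_files, mb_per_s, files_per_s):
        # 1. update the maximum speeds whenever a new record is observed
        self.c1 = max(self.c1, mb_per_s)
        self.c2 = max(self.c2, files_per_s)
        if self.c1 == 0 or self.c2 == 0:
            return None  # no rate samples yet
        # 3. r = elapsed time / calculated elapsed time
        calc_elapsed = done_mb / self.c1 + done_files / self.c2
        r = elapsed_s / calc_elapsed if calc_elapsed > 0 else 1.0
        # Remaining time = r * (S/C1 + n/C2)
        return r * ((self.total_mb - done_mb) / self.c1 +
                    (self.total_files - done_files) / self.c2)
```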
- Posts: 6
- Joined: 3 May 2018
In reply to Zenju, 07 Nov 2019, 19:51:
Thanks for the comments! I also think it is hard to rely on historical data. I think the updated proposal will estimate the time very well, given that nothing else is disturbing the traffic from outside. If the traffic is disturbed, FFS could detect this automatically and recalculate C1, C2 and r.
Remaining time = r(S/C1+n/C2)
r=elapsed time/calculated elapsed time
S = Total size left [MB]
C1 = Maximum transfer speed [MB/s]
n = Total files left [files]
C2 = Maximum speed to handle files [files/s]
- Posts: 4056
- Joined: 11 Jun 2019
You can't do that because in order to calculate the calculated elapsed time for r, you have to call the formula again with starting conditions for the transfer, then again, then again. Recursive algorithms don't work without an exit condition, and so this would be a memory leak at best and outright crash at worst. r can't be a constant if it needs to be calculated. The simplest way is to find the remaining time for data size and file count, then average the two times you get from them. You could more efficiently get avg data rate and file rate by (data transferred/time elapsed) and (files transferred/time elapsed). That is only four values to iterate through, much more efficient than finding avg over the last x seconds.
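A minimal sketch of that simpler whole-run average, assuming only the cumulative totals and the elapsed time are tracked (names are illustrative, not FreeFileSync code):

```python
# Hypothetical sketch of the whole-run averages above (not FreeFileSync code).
def time_remaining(elapsed_s, done_mb, done_files, size_left_mb, files_left):
    if elapsed_s == 0 or done_mb == 0 or done_files == 0:
        return None  # not enough progress yet
    avg_mb_rate = done_mb / elapsed_s       # data transferred / time elapsed
    avg_file_rate = done_files / elapsed_s  # files transferred / time elapsed
    by_size = size_left_mb / avg_mb_rate
    by_count = files_left / avg_file_rate
    return (by_size + by_count) / 2         # average of the two estimates
```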
- Posts: 6
- Joined: 3 May 2018
In reply to xCSxXenon, 07 Nov 2019, 22:55:
Thanks for the feedback! I think we have a misunderstanding; of course we shall not have a recursive algorithm. I have attached a spreadsheet to describe how it works with an example.
- Posts: 1
- Joined: 24 Aug 2023
Suggestion: as the estimated time is just that, an estimate, why not display a range?
Compute the remaining time based on data size and also the remaining time based on file count, and display the smaller one first.
So the display would be something like 5min - 3hr 35min
That way the user knows the range it could be in and as the process goes on the numbers should converge.
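For illustration, a small sketch of such a range display, reusing the two estimates discussed above (the formatting and numbers are made up, not FreeFileSync code):

```python
# Hypothetical sketch (not FreeFileSync code): show both estimates as a range.
def format_duration(seconds):
    h, rem = divmod(int(seconds), 3600)
    m = rem // 60
    return f"{h}hr {m}min" if h else f"{m}min"

def remaining_range(by_size_s, by_count_s):
    lo, hi = sorted((by_size_s, by_count_s))
    return f"{format_duration(lo)} - {format_duration(hi)}"

print(remaining_range(300, 12900))  # -> "5min - 3hr 35min"
```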
- Posts: 4056
- Joined: 11 Jun 2019
Brought back to life, I re-read the thread. I think using 'maximum' value(s) in any algorithm is fundamentally flawed. As I suggested earlier, "If you store the max rate, a spike will destroy your estimation. If it spikes to 80 MB/s for a second, it will say your last 800 MB will take 10 seconds, but it may take over a minute if the overall average rate is 10 MB/s."
As for using a range, that feels like bandaging half a wound. You increase accuracy but decrease precision greatly. I am in favor of calculating the estimated time remaining based on transfer speed and on items/second, then displaying the smaller value as the remaining time.
- Posts: 6
- Joined: 3 May 2018
In reply to chrisv, 24 Aug 2023, 08:14:
Thanks for the feedback! A range could be used, but I think it is not necessary. If you arrange the file queue in a way that constantly gives the best information about transfer speeds, it will be spot on.
- Posts: 6
- Joined: 3 May 2018
In reply to xCSxXenon, 24 Aug 2023, 15:02:
It is nice with old posts, isn't it :)
I agree, if there is a spike, the result would be wrong. So, I guess, some kind of fix is needed to reduce the risk of spikes. Actually, you do not need to determine the maximum. You just need to determine the average; however, you need to extract two pieces of information from each file transferred, namely the transfer speed in both MB/s and files/s.
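A minimal sketch of that per-file sampling, assuming each finished file contributes one MB/s sample and one files/s sample which are then averaged (illustrative only, not FreeFileSync code; how the two estimates are combined follows the averaging discussed earlier in the thread):

```python
# Hypothetical sketch of per-file rate sampling (not FreeFileSync code).
mb_rates = []    # MB/s measured for each finished file
file_rates = []  # files/s for each finished file (1 / seconds that file took)

def file_finished(size_mb, seconds_taken):
    if seconds_taken > 0:
        mb_rates.append(size_mb / seconds_taken)
        file_rates.append(1.0 / seconds_taken)

def time_remaining(size_left_mb, files_left):
    if not mb_rates or sum(mb_rates) == 0:
        return None  # no finished files yet
    avg_mb = sum(mb_rates) / len(mb_rates)
    avg_files = sum(file_rates) / len(file_rates)
    return (size_left_mb / avg_mb + files_left / avg_files) / 2
```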