Hello!
I have a suggestion to improve the "time remaining" calculation with the following formula:
total time left [s] = total size left [MB] / maximum transfer speed [MB/s] + total files left [files] / maximum speed [files/s]
The maximum speeds will be updated as soon as the application gets a new record. Also, the queue of files should be sorted to have a medium-sized file at the beginning (to get a good understanding of MB/s) and a couple of small files (to get a good understanding of files/s).
With this, you will no longer get a "there are 5 hours left to complete" estimation when in fact only 5 minutes are left.
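For illustration only, here is a rough sketch of the formula above in Python (the function name and the numbers are made up for the example, this is not FreeFileSync code):

```python
# Hypothetical sketch of the proposed formula (not FreeFileSync code).
def time_remaining(size_left_mb, files_left, max_mb_per_s, max_files_per_s):
    """Seconds remaining = size term + file-count term."""
    return size_left_mb / max_mb_per_s + files_left / max_files_per_s

# Made-up example: 800 MB and 2000 files left, best observed rates 40 MB/s and 25 files/s
print(time_remaining(800, 2000, 40, 25))  # -> 100.0 seconds
```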
[Feature Suggestion] Time remaining improvement
- Posts: 6
- Joined: 3 May 2018
- Posts: 7
- Joined: 8 Sep 2018
I'd very much support this suggestion.
I just ran a sync with the time prediction wandering between 1:14 hours and beyond 7 days. I think warpi's suggestion wouldn't be too bad...
- Posts: 7
- Joined: 8 Sep 2018
No development here? I'm just syncing with predictions between 1.5 hours and beyond 7 days. That's ridiculous!
- Posts: 4056
- Joined: 11 Jun 2019
That algorithm is wrong though. The size left and files left are the same data, so you would be doubling the remaining time essentially. The better way to do the algorithm would be:
T = [(Size remaining/avg transfer rate) + (files remaining/avg file rate)] / 2
Avg rates can be calculated by recording the rate every second for the last 10 seconds and then dividing by 10. You can use a window longer than 10 seconds to get a more accurate value: the further you track back, the less a spike or valley will skew the results, but the more data you have to store for the calculations.
If you store the max rate, a spike will destroy your estimation. If it spikes to 80 MB/s for a second, it will say your last 800 MB will take 10 seconds, but it may take over a minute if the overall average rate is 10 MB/s.
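A minimal sketch of that averaged estimate, assuming rates are sampled once per second into a fixed window (names and the window length are illustrative, not FreeFileSync code):

```python
from collections import deque

# Hypothetical sketch of the averaged estimate above (not FreeFileSync code).
WINDOW = 10  # seconds to look back; longer smooths spikes but stores more samples

mb_samples = deque(maxlen=WINDOW)     # MB transferred during each of the last seconds
file_samples = deque(maxlen=WINDOW)   # files completed during each of the last seconds

def record_second(mb_this_second, files_this_second):
    mb_samples.append(mb_this_second)
    file_samples.append(files_this_second)

def time_remaining(size_left_mb, files_left):
    """T = [(size remaining / avg transfer rate) + (files remaining / avg file rate)] / 2"""
    if not mb_samples or sum(mb_samples) == 0 or sum(file_samples) == 0:
        return None  # not enough progress measured yet
    avg_mb_rate = sum(mb_samples) / len(mb_samples)
    avg_file_rate = sum(file_samples) / len(file_samples)
    return (size_left_mb / avg_mb_rate + files_left / avg_file_rate) / 2
```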
- Site Admin
- Posts: 7211
- Joined: 9 Dec 2007
Here's a (cool?) idea: Maybe FFS should start recording some historical (per-device) data: item count, bytes and total time of the last syncs. Then assume some simple formula like
total time = A * item count + B * bytes
Then fine-tune the unknown parameters A and B based on the historical data.
Hm, one drawback comes directly to mind: how long of a record to track? If the user upgrades to a faster hard-drive the old perf records would be instantly obsolete...
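A minimal sketch of how the unknown parameters A and B could be fitted from such per-device records, assuming a plain least-squares fit (the history numbers below are made up purely for illustration, not FreeFileSync code):

```python
import numpy as np

# Hypothetical sketch of fitting the per-device model above (not FreeFileSync code).
# Each row is one past sync: (item count, bytes copied, total seconds) - made-up numbers.
history = [
    (1200,  5.0e9,  420.0),
    ( 300, 20.0e9,  910.0),
    (5000,  1.0e9,  380.0),
]

counts, sizes, times = (np.array(col, dtype=float) for col in zip(*history))
X = np.column_stack([counts, sizes])              # columns: item count, bytes
A, B = np.linalg.lstsq(X, times, rcond=None)[0]   # total time ≈ A*items + B*bytes

def predict_seconds(items_left, bytes_left):
    return A * items_left + B * bytes_left
```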
- Posts: 6
- Joined: 3 May 2018
In reply to xCSxXenon, 07 Nov 2019, 19:28:
Thanks for the feedback! If the files are very different in size, the above equation will come very close to a correct estimation.
However, if all files have equal size, you are correct, the calculated time would be doubled.
So, we have a solution for this also :)
Introduce a ratio to correct this issue and the remaining time would be spot on every time :)
------ Updated proposal -----
Remaining time = r(S/C1+n/C2)
--- Details
Assuming that nothing else is disturbing the traffic, i.e. all transfer speed is dedicated to FreeFileSync, there are two constants that need to be determined in order to calculate the remaining time correctly:
C1 = Constant 1 = Maximum transfer speed [MB/s]
C2 = Constant 2 = Maximum speed to handle files [files/s]
Then we have the characteristics of the complete transfer:
S = Total size left [MB]
n = Total files left [files]
Remaining time can then be calculated
Remaining time = S/C1+n/C2
However, the above equation relies on calculating C1 and C2 accurately, which is not possible if the transfer queue does not include files with big differences in size.
Therefore, a new variable called r is introduced, which stands for ratio. It is there to correct any errors introduced by the uniformity of the transfer queue.
Remaining time = r(S/C1+n/C2)
So finally, C1, C2 and r are defined by:
1. The maximum speeds (C1 and C2) will be updated as soon as the application gets a new record.
2. The queue of files shall be sorted to have a medium-sized file at the beginning (to get a good understanding of MB/s) and a couple of small files (to get a good understanding of files/s).
3. r is calculated as the ratio between elapsed time and calculated elapsed time, where calculated elapsed time is given by the above formula.
r=elapsed time/calculated elapsed time
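A minimal sketch of the updated proposal, assuming C1 and C2 are the best rates observed so far and r is refreshed from the progress so far (names are illustrative, not FreeFileSync code):

```python
# Hypothetical sketch of the r-corrected estimate (not FreeFileSync code).
class Estimator:
    def __init__(self, total_mb, total_files):
        self.total_mb, self.total_files = total_mb, total_files
        self.c1 = 0.0  # C1: maximum observed transfer speed [MB/s]
        self.c2 = 0.0  # C2: maximum observed file handling speed [files/s]

    def remaining(self, elapsed_s, done_mb, done_files, mb_per_s, files_per_s):
        # 1. update the maximum speeds whenever a new record is observed
        self.c1 = max(self.c1, mb_per_s)
        self.c2 = max(self.c2, files_per_s)
        if self.c1 == 0 or self.c2 == 0:
            return None  # no rate samples yet
        # 3. r = elapsed time / calculated elapsed time
        calc_elapsed = done_mb / self.c1 + done_files / self.c2
        r = elapsed_s / calc_elapsed if calc_elapsed > 0 else 1.0
        # Remaining time = r * (S/C1 + n/C2)
        return r * ((self.total_mb - done_mb) / self.c1 +
                    (self.total_files - done_files) / self.c2)
```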
- Posts: 6
- Joined: 3 May 2018
In reply to Zenju, 07 Nov 2019, 19:51:
Thanks for the comments! I also think it is hard to rely on historical data. I think the updated proposal will estimate the time very well, given that nothing else is disturbing the traffic from outside. If the traffic is disturbed, FFS could detect this automatically and recalculate C1, C2 and r.
Remaining time = r(S/C1+n/C2)
r=elapsed time/calculated elapsed time
S = Total size left [MB]
C1 = Maximum transfer speed [MB/s]
n = Total files left [files]
C2 = Maximum speed to handle files [files/s]
- Posts: 4056
- Joined: 11 Jun 2019
You can't do that because in order to calculate the calculated elapsed time for r, you have to call the formula again with starting conditions for the transfer, then again, then again. Recursive algorithms don't work without an exit condition, and so this would be a memory leak at best and outright crash at worst. r can't be a constant if it needs to be calculated. The simplest way is to find the remaining time for data size and file count, then average the two times you get from them. You could more efficiently get avg data rate and file rate by (data transferred/time elapsed) and (files transferred/time elapsed). That is only four values to iterate through, much more efficient than finding avg over the last x seconds.
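A minimal sketch of that simpler whole-run average, assuming only the cumulative totals and the elapsed time are tracked (names are illustrative, not FreeFileSync code):

```python
# Hypothetical sketch of the whole-run averages above (not FreeFileSync code).
def time_remaining(elapsed_s, done_mb, done_files, size_left_mb, files_left):
    if elapsed_s == 0 or done_mb == 0 or done_files == 0:
        return None  # not enough progress yet
    avg_mb_rate = done_mb / elapsed_s       # data transferred / time elapsed
    avg_file_rate = done_files / elapsed_s  # files transferred / time elapsed
    by_size = size_left_mb / avg_mb_rate
    by_count = files_left / avg_file_rate
    return (by_size + by_count) / 2         # average of the two estimates
```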
- Posts: 6
- Joined: 3 May 2018
In reply to xCSxXenon, 07 Nov 2019, 22:55:
Thanks for the feedback! I think we have a misunderstanding; of course we shall not have a recursive algorithm. I have attached a spreadsheet to describe how it works with an example.
- Posts: 1
- Joined: 24 Aug 2023
Suggestion: as the estimated time is just that, an estimate, why not display a range?
Compute the remaining time based on data size and also the remaining time based on file count, and display the smaller one first.
So the display would be something like 5min - 3hr 35min
That way the user knows the range it could be in and as the process goes on the numbers should converge.
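For illustration, a small sketch of such a range display, reusing the two estimates discussed above (the formatting and numbers are made up, not FreeFileSync code):

```python
# Hypothetical sketch (not FreeFileSync code): show both estimates as a range.
def format_duration(seconds):
    h, rem = divmod(int(seconds), 3600)
    m = rem // 60
    return f"{h}hr {m}min" if h else f"{m}min"

def remaining_range(by_size_s, by_count_s):
    lo, hi = sorted((by_size_s, by_count_s))
    return f"{format_duration(lo)} - {format_duration(hi)}"

print(remaining_range(300, 12900))  # -> "5min - 3hr 35min"
```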
- Posts: 4056
- Joined: 11 Jun 2019
Brought back to life, I re-read the thread. I think using 'maximum' value(s) in any algorithm is fundamentally flawed. As I suggested earlier, "If you store the max rate, a spike will destroy your estimation. If it spikes to 80 MB/s for a second, it will say your last 800 MB will take 10 seconds, but it may take over a minute if the overall average rate is 10 MB/s."
As for using a range, that feels like bandaging half a wound. You increase accuracy but decrease precision greatly. I am in favor of calculating the estimated time remaining based on transfer speed and on items/second, then displaying the smaller value as the remaining time.
- Posts: 6
- Joined: 3 May 2018
In reply to chrisv, 24 Aug 2023, 08:14:
Thanks for the feedback! A range could be used, but I think it is not necessary. If you arrange the file queue in a way that constantly gives the best information about transfer speeds, it will be spot on.
- Posts: 6
- Joined: 3 May 2018
In reply to xCSxXenon, 24 Aug 2023, 15:02:
It is nice with old posts, isn't it :)
I agree, if there is a spike, the result would be wrong. So, I guess, some kind of fix is needed to reduce the risk of spikes. Actually, you do not need to determine the maximum. You just need to determine the average; however, you need to extract two pieces of information from each file transferred, namely the transfer speed in both MB/s and files/s.
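A minimal sketch of that per-file sampling, assuming each finished file contributes one MB/s sample and one files/s sample which are then averaged (illustrative only, not FreeFileSync code; how the two estimates are combined follows the averaging discussed earlier in the thread):

```python
# Hypothetical sketch of per-file rate sampling (not FreeFileSync code).
mb_rates = []    # MB/s measured for each finished file
file_rates = []  # files/s for each finished file (1 / seconds that file took)

def file_finished(size_mb, seconds_taken):
    if seconds_taken > 0:
        mb_rates.append(size_mb / seconds_taken)
        file_rates.append(1.0 / seconds_taken)

def time_remaining(size_left_mb, files_left):
    if not mb_rates or sum(mb_rates) == 0:
        return None  # no finished files yet
    avg_mb = sum(mb_rates) / len(mb_rates)
    avg_files = sum(file_rates) / len(file_rates)
    return (size_left_mb / avg_mb + files_left / avg_files) / 2
```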