What | Removed | Added |
---|---|---|
Status | CONFIRMED | RESOLVED |
Resolution | --- | WONTFIX |
OK, so I've tracked this down by "bisecting" operations inside the reaim workfile (which was a bit tedious but not too bad). The small workfile that still reproduces the regression is:

20 disk_dio_rd
20 sync_disk_cp
20 sync_disk_update

You could probably still drop sync_disk_update from it, but I didn't really try since at this point I understood what's going on.

The culprit is that disk_dio_rd opens the tmpa.common file with O_DIRECT and reads from it. sync_disk_cp also opens the tmpa.common file and reads from it, but without O_DIRECT. These files are different for different reaim processes (each process runs in its dedicated dir), but each process alternates between these two workloads, so the direct IO read usually runs on a file with existing page cache. And the iomap code (unlike legacy direct IO) does evict page cache even during direct IO reads, which then forces sync_disk_cp to reread the file from the disk. When I changed reaim not to share the same file for the DIO and buffered tests, the regression went away.

Since mixing of buffered & direct IO is not really interesting wrt performance, I think there's no kernel bug to fix. But as a side note, there's now 54752de928c "iomap: Only invalidate page cache pages on direct IO writes" sitting in linux-next, which makes the iomap DIO code behave the same way as the legacy DIO code in this regard, so that will make the regression go away as well.

I'm somewhat undecided whether we should modify reaim to not use the same file for direct IO read tests and buffered IO read tests. I don't think the sharing of the file is a particularly interesting use case, but OTOH if all the accesses are reads, it isn't completely insane either. So I'll probably leave it for now.
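
For anyone who wants to poke at this outside reaim, the alternating access pattern (a buffered read and an O_DIRECT read hitting the same file) can be roughly sketched as below. This is not reaim's code: it is Python rather than C, and the function names (`read_buffered`, `read_direct`, `alternate`) and the 1 MiB block size are made up for illustration. It assumes Linux and a filesystem that accepts O_DIRECT; where it doesn't, `read_direct` returns `None` instead of failing.

```python
import errno
import mmap
import os

BLOCK = 1 << 20  # 1 MiB per read (assumed; a multiple of common block sizes)


def read_buffered(path):
    """Read the whole file through the page cache; return bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while chunk := f.read(BLOCK):
            total += len(chunk)
    return total


def read_direct(path):
    """Read the whole file with O_DIRECT; return bytes read.

    Returns None if O_DIRECT is unavailable or rejected with EINVAL.
    """
    o_direct = getattr(os, "O_DIRECT", None)
    if o_direct is None:
        return None
    try:
        fd = os.open(path, os.O_RDONLY | o_direct)
    except OSError as e:
        if e.errno == errno.EINVAL:
            return None
        raise
    buf = mmap.mmap(-1, BLOCK)  # anonymous mapping, so page-aligned
    total = 0
    try:
        while True:
            n = os.readv(fd, [buf])
            if n == 0:  # end of file
                break
            total += n
        return total
    except OSError as e:
        if e.errno == errno.EINVAL:
            return None
        raise
    finally:
        os.close(fd)
        buf.close()


def alternate(path, rounds=3):
    """Alternate buffered and direct reads of the same file."""
    results = []
    for _ in range(rounds):
        results.append(("buffered", read_buffered(path)))
        results.append(("direct", read_direct(path)))
    return results
```

Timing the buffered reads in `alternate()` on a file larger than a few MiB should show whether they are served from page cache or go back to disk after each direct read. Pointing the two readers at separate files corresponds to the reaim change described above.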