	mm/hugetlb: restore the reservation if needed
Patch series "mm/hugetlb: Restore the reservation", v2.

This is a fix for a case where a backing huge page could be stolen after
madvise(MADV_DONTNEED).  A full reproducer is in selftest.  See
https://lore.kernel.org/all/20240105155419.1939484-1-leitao@debian.org/

In order to test this patch, I instrumented the kernel with LOCKDEP and
KASAN, and ran the following tests without any regression:

 * The selftest that reproduces the problem
 * All mm hugetlb selftests
	SUMMARY: PASS=9 SKIP=0 FAIL=0
 * All libhugetlbfs tests
	PASS:     0     86
	FAIL:     0      0

This patch (of 2):

Currently there is a bug where a huge page could be stolen, and when the
original owner tries to fault it in, a page fault is raised.  You can
reproduce it as follows:

 1) Create a single huge page
	echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

 2) mmap() the page above with MAP_HUGETLB into (void *ptr1).
	* This will mark the page as reserved.

 3) Touch the page, which causes a page fault and allocates the page.
	* This will move the page out of the free list.
	* It will also unreserve the page, since there is no free page
	  left.

 4) madvise(MADV_DONTNEED) the page.
	* This will free the page, but not mark it as reserved.

 5) Allocate a second page with mmap(MAP_HUGETLB) into (void *ptr2).
	* It should fail, since there is no available page left.
	* But, because the page freed above was not re-reserved, this
	  mmap() succeeds.

 6) Faulting at ptr1 will cause a SIGBUS.
	* It will try to allocate a huge page, but none is available.

A full reproducer is in selftest.  See
https://lore.kernel.org/all/20240105155419.1939484-1-leitao@debian.org/
A simplified sketch of the same sequence is shown right after this
message.

Fix this by restoring the reserved page if necessary.  These are the
conditions for restoring the page:

 * The system is not using surplus pages.  The goal is to reduce
   surplus usage for this case.
 * The VMA has the HPAGE_RESV_OWNER flag set and is PRIVATE.  This is
   safely checked using __vma_private_lock().
 * The page is anonymous.

Once this scenario is found, set the `hugetlb_restore_reserve` bit in
the folio.  Then check whether the resv reservations need to be
adjusted; that is done later, after the spinlock is dropped, since the
vma_xxxx_reservation() helpers might touch the file system lock.

Link: https://lkml.kernel.org/r/20240205191843.4009640-1-leitao@debian.org
Link: https://lkml.kernel.org/r/20240205191843.4009640-2-leitao@debian.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Suggested-by: Rik van Riel <riel@surriel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
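Below is a minimal userspace sketch of steps 2-6 above.  It is not the
actual selftest from the lore link; the HPAGE_SIZE macro, the 2 MB huge
page size, and the assumption that nr_hugepages has already been set to
1 (step 1) are mine.

/* Simplified sketch of the reproducer steps 2-6 (not the selftest).
 * Assumes 2 MB huge pages and nr_hugepages already set to 1.
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define HPAGE_SIZE (2UL * 1024 * 1024)	/* assumption: 2 MB default huge page */

int main(void)
{
	/* Step 2: private hugetlb mapping; reserves the only huge page */
	char *ptr1 = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (ptr1 == MAP_FAILED) {
		perror("mmap ptr1");
		return 1;
	}

	/* Step 3: fault the page in; it leaves the free list and loses
	 * its reservation, since no other free page exists.
	 */
	memset(ptr1, 0x42, HPAGE_SIZE);

	/* Step 4: free the page; without the fix it is not re-reserved */
	madvise(ptr1, HPAGE_SIZE, MADV_DONTNEED);

	/* Step 5: this should fail with ENOMEM (no unreserved page left),
	 * but on an unfixed kernel it succeeds and steals the page.
	 */
	char *ptr2 = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	printf("second mmap(): %s\n",
	       ptr2 == MAP_FAILED ? "failed" : "succeeded");

	/* Step 6: on an unfixed kernel this fault finds no huge page and
	 * the process receives SIGBUS; with the fix it succeeds.
	 */
	ptr1[0] = 1;
	printf("refault of ptr1 survived\n");
	return 0;
}

With the patch applied, the expectation per the description above is
that the second mmap() fails and the final write to ptr1 completes; on
a buggy kernel the second mmap() succeeds and the write raises SIGBUS.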
commit df7a6d1f64
parent 4e76c8cc33

1 changed file with 25 additions and 0 deletions
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5585,6 +5585,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	struct page *page;
 	struct hstate *h = hstate_vma(vma);
 	unsigned long sz = huge_page_size(h);
+	bool adjust_reservation = false;
 	unsigned long last_addr_mask;
 	bool force_flush = false;
 
@@ -5677,7 +5678,31 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		hugetlb_count_sub(pages_per_huge_page(h), mm);
 		hugetlb_remove_rmap(page_folio(page));
 
+		/*
+		 * Restore the reservation for anonymous page, otherwise the
+		 * backing page could be stolen by someone.
+		 * If there we are freeing a surplus, do not set the restore
+		 * reservation bit.
+		 */
+		if (!h->surplus_huge_pages && __vma_private_lock(vma) &&
+		    folio_test_anon(page_folio(page))) {
+			folio_set_hugetlb_restore_reserve(page_folio(page));
+			/* Reservation to be adjusted after the spin lock */
+			adjust_reservation = true;
+		}
+
 		spin_unlock(ptl);
+
+		/*
+		 * Adjust the reservation for the region that will have the
+		 * reserve restored. Keep in mind that vma_needs_reservation() changes
+		 * resv->adds_in_progress if it succeeds. If this is not done,
+		 * do_exit() will not see it, and will keep the reservation
+		 * forever.
+		 */
+		if (adjust_reservation && vma_needs_reservation(h, vma, address))
+			vma_add_reservation(h, vma, address);
+
 		tlb_remove_page_size(tlb, page, huge_page_size(h));
 		/*
 		 * Bail out after unmapping reference page if supplied
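One lightweight way to observe the effect of the hunks above from
userspace is to watch the hugetlb reservation counter around the
madvise() call.  This is my own illustration of the intended behaviour,
not part of the patch or its tests; the sysfs path and the expected
counter values assume 2 MB huge pages with nr_hugepages set to 1.

/* Sketch: print the reserved huge page count before and after
 * madvise(MADV_DONTNEED).  With the fix, the count should return to 1,
 * meaning the freed page is reserved for the mapping again.
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define HPAGE_SIZE (2UL * 1024 * 1024)	/* assumption: 2 MB huge pages */
#define RESV_PATH "/sys/kernel/mm/hugepages/hugepages-2048kB/resv_hugepages"

static long read_resv(void)
{
	long val = -1;
	FILE *f = fopen(RESV_PATH, "r");

	if (f) {
		if (fscanf(f, "%ld", &val) != 1)
			val = -1;
		fclose(f);
	}
	return val;
}

int main(void)
{
	char *p = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	memset(p, 0, HPAGE_SIZE);	/* consume the reservation */
	printf("resv after touch:   %ld\n", read_resv());	/* expect 0 */

	madvise(p, HPAGE_SIZE, MADV_DONTNEED);
	printf("resv after madvise: %ld\n", read_resv());	/* 1 with the fix, 0 without */
	return 0;
}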