Why is this Postgres SELECT query so slow?

I have a simple Postgres table. A simple query over the records takes a very long time. The table has 7.5 million records, and I am running on a machine with 8 vCPUs and 32 GB of RAM. The database is on the same machine.

Edit: added the query.

The query below is very slow.

SELECT * FROM import_csv WHERE processed = False ORDER BY id ASC OFFSET 1 LIMIT 10000

Output of explain:

$ explain SELECT * FROM import_csv WHERE processed = False ORDER BY id ASC OFFSET 1 LIMIT 10000

---------------------------------------------------------------------------------------------------------
 Limit  (cost=5.42..49915.17 rows=10000 width=1985)
   ->  Index Scan using import_csv_id_idx on import_csv  (cost=0.43..19144730.02 rows=3835870 width=1985)
         Filter: (NOT processed)
(3 rows)

My table is as follows:

      Column       |      Type      | Collation | Nullable | Default 
-------------------+----------------+-----------+----------+---------
 id                | integer        |           |          | 
 name              | character(500) |           |          | 
 domain            | character(500) |           |          | 
 year_founded      | real           |           |          | 
 industry          | character(500) |           |          | 
 size_range        | character(500) |           |          | 
 locality          | character(500) |           |          | 
 country           | character(500) |           |          | 
 linkedinurl       | character(500) |           |          | 
 employees         | integer        |           |          | 
 processed         | boolean        |           | not null | false
 employee_estimate | integer        |           |          | 
Indexes:
    "import_csv_id_idx" btree (id)
    "processed_idx" btree (processed)

Thank you.

Edit 3:

# explain analyze SELECT * FROM import_csv WHERE processed = False ORDER BY id ASC OFFSET 1 LIMIT 10000;
                                                                          QUERY PLAN                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5.42..49915.33 rows=10000 width=1985) (actual time=8331.070..8355.556 rows=10000 loops=1)
   ->  Index Scan using import_csv_id_idx on import_csv  (cost=0.43..19144790.06 rows=3835870 width=1985) (actual time=8331.067..8354.874 rows=10001 loops=1)
         Filter: (NOT processed)
         Rows Removed by Filter: 3482252
 Planning time: 0.081 ms
 Execution time: 8355.925 ms
(6 rows)

Output of explain (analyze, buffers):

# explain (analyze, buffers) SELECT * FROM import_csv WHERE processed = False ORDER BY id ASC OFFSET 1 LIMIT 10000;


                                                                          QUERY PLAN                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5.42..49915.33 rows=10000 width=1985) (actual time=8236.899..8260.941 rows=10000 loops=1)
   Buffers: shared hit=724036 read=2187905 dirtied=17 written=35
   ->  Index Scan using import_csv_id_idx on import_csv  (cost=0.43..19144790.06 rows=3835870 width=1985) (actual time=8236.896..8260.104 rows=10001 loops=1)
         Filter: (NOT processed)
         Rows Removed by Filter: 3482252
         Buffers: shared hit=724036 read=2187905 dirtied=17 written=35
 Planning time: 0.386 ms
 Execution time: 8261.406 ms
(8 rows)

Solution:

It is slow because it has to dig through 3,482,252 rows that fail the processed = False criterion before it finds the 10,001st row that passes it. Evidently those failing rows are scattered randomly throughout the table, which leads to a lot of slow IO.
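To put the buffers output above in perspective: assuming the default 8 kB block size, the shared read=2187905 blocks alone amount to roughly 17 GB read from disk just to return 10,000 rows, on top of the 724,036 blocks found in shared buffers.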

You either need an index on (processed, id), or a partial index on (id) WHERE processed = false.

If you do the first of those, you can drop the separate index on processed alone, as it will no longer be independently useful (if it ever was to begin with).
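For illustration, either of the following definitions matches that suggestion (the index names are made up here; treat this as a sketch rather than something tested against your data):

-- Option 1: composite index, so the scan starts at the first unprocessed row and reads them in id order
CREATE INDEX import_csv_processed_id_idx ON import_csv (processed, id);

-- Option 2: partial index on id covering only the unprocessed rows
CREATE INDEX import_csv_unprocessed_id_idx ON import_csv (id) WHERE processed = false;

With either index in place, the planner should be able to satisfy WHERE processed = False ORDER BY id ... LIMIT 10000 by walking the index over only the rows it actually needs, instead of filtering out millions of already-processed rows.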
