Six Ways to Force “Sequential Scan” in PostgreSQL

Here are some six ways that can be used to force a query go with sequential scan access method in PostgresSQL. The objective is to eliminate any index access in the query. The test table looks as follows.

postgres=# create table tablescandemo as select generate_series(1,100000) id;
SELECT 100000
postgres=# create index ind1demo on testtable1(id);
CREATE INDEX

Let’s get started.

Option I. Using enable_indexscan, enable_indexonlyscan and enable_bitmapscan parameters.
The enable_indexscan parameter disables index access path. Let’s try it out.

postgres=# set enable_indexscan = off;
SET
postgres=# explain analyze select * from tablescandemo where id = 100;
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tablescandemo  (cost=12.17..482.50 rows=500 width=4) (actual time=0.039..0.041 rows=1 loops=1)
   Recheck Cond: (id = 100)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on inddemo  (cost=0.00..12.04 rows=500 width=0) (actual time=0.026..0.027 rows=1 loops=1)
         Index Cond: (id = 100)
 Planning Time: 0.568 ms
 Execution Time: 0.091 ms
(7 rows)

The planner still accesses the index but this time it is behaving differently. The index is accessed in the same way as in index scan method, however, as a result, the database doesn’t build a set of rowids of the heap rows but a bitmap: one bit per heap page. This is how the bitmap index scan works. The table is sequentially scanned and the bitmap is used to decide whether a particular page must be probed or not. The enable_bitmapscan parameter controls bitmap index scan access method:

postgres=# set enable_bitmapscan = off;
SET
postgres=# explain analyze select * from tablescandemo where id = 100;
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using inddemo on tablescandemo  (cost=0.29..4.31 rows=1 width=4) (actual time=0.070..0.071 rows=1 loops=1)
   Index Cond: (id = 100)
   Heap Fetches: 0
 Planning Time: 0.086 ms
 Execution Time: 0.094 ms
(5 rows)

Now, index only scan scan kicks in since the table has a single column which is indexed. Let’s turn it off as well:

postgres=# set enable_indexonlyscan = off;
SET
postgres=# explain analyze select * from tablescandemo where id = 100;
                                                QUERY PLAN                                                
----------------------------------------------------------------------------------------------------------
 Seq Scan on tablescandemo  (cost=0.00..1693.00 rows=1 width=4) (actual time=0.021..9.114 rows=1 loops=1)
   Filter: (id = 100)
   Rows Removed by Filter: 99999
 Planning Time: 0.065 ms
 Execution Time: 9.137 ms
(5 rows)

Basically, once needs to make sure that not just one but all index access methods (index scan, index only scan and bitmap index scan) are off.

Option II. Using random_page_cost parameter.
This parameter tells the optimizer the cost of a randomly accessed disk page. This concerns both index and table pages. Therefore, the higher this value, the more it should cost the optimizer to use index access path, especially, if the index is not very selective. Let’s see the experiment results (index scan, index only scan and bitmap index scan control parameter values are all on now):

postgres=# set random_page_cost = 100000;
SET
postgres=# explain analyze select * from tablescandemo where id = 100;
                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Seq Scan on tablescandemo  (cost=0.00..1693.00 rows=1 width=4) (actual time=0.034..10.175 rows=1 loops=1)
   Filter: (id = 100)
   Rows Removed by Filter: 99999
 Planning Time: 0.071 ms
 Execution Time: 10.200 ms
(5 rows)

The optimizer estimates that retrieving tuple(s) via the index will be higher than sequential scan and, hence, prefers solely the latter.

Option III. Altering table statistics.
At the end of the day, optimizer uses database statistics which one can actually fake. Selectivity of the predicate is the primary factor in deciding which access path to take. It is based on the cardinality which itself calculated as the number of tuples that qualify the predicate. The latter is (assuming no extended statistics exist) nothing but N/NDV where N is the number of tuples in the table and NDV is the number of distinct values of the column. The higher the NDV, the more selective the predicate is, and, thus, the higher the probability that the optimizer leans towards index access rather than sequential scan. Therefore, if NDV is low, the optimizer will think that most of the table tuples qualify: that is, index access will cost much more than sequential scan, thus, preferring the latter. Let’s check it out.

postgres=# alter table tablescandemo alter column id set (n_distinct = 1);
ALTER TABLE
postgres=# analyze tablescandemo;
ANALYZE
postgres=# explain analyze select * from tablescandemo where id = 100;
                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Seq Scan on tablescandemo  (cost=0.00..1693.00 rows=100000 width=4) (actual time=0.015..6.366 rows=1 loops=1)
   Filter: (id = 100)
   Rows Removed by Filter: 99999
 Planning Time: 0.098 ms
 Execution Time: 6.379 ms
(5 rows)

Option IV. ‘Hacking’ table statistics.
Even though the previous option is a valid one, it has two drawbacks: 1) alter table command will lock the table in exclusive mode and 2) analyze table may take considerable amount of time depending on the size of the table. So, if that is the case, then you can update statistics directly in the pg_statistic table. Surely, this is not an accurate way but still works and in extreme cases may come in handy:

postgres=# update pg_statistic set stadistinct = 1 where starelid = (select oid from pg_class where relname = 'tablescandemo');;
UPDATE 1
postgres=# explain analyze select * from tablescandemo where id = 6;
                                             QUERY PLAN                                              
-----------------------------------------------------------------------------------------------------
 Seq Scan on tablescandemo(cost=0.00..1693.00 rows=100000 width=4) (actual time=0.008..6.612 rows=1 loops=1)
   Filter: (id = 6)
   Rows Removed by Filter: 99999
 Planning Time: 0.115 ms
 Execution Time: 6.625 ms
(5 rows)

postgres=#

Option V. ‘Hacking’ table statistics (Part II).
Another statistics you can fake is number of pages a table contains. The lesser it is, the higher the probability that sequential scan will kick in. This information is kept in pg_class table and can be manipulated just like any other column value:

postgres=# update pg_class set reltuples=1 where relname = 'tablescandemo';
UPDATE 1
postgres=# explain analyze select * from tablescandemo where id = 6;
                                          QUERY PLAN                                           
-----------------------------------------------------------------------------------------------
 Seq Scan on tablescandemo(cost=0.00..443.01 rows=1 width=4) (actual time=0.010..6.496 rows=1 loops=1)
   Filter: (id = 6)
   Rows Removed by Filter: 99999
 Planning Time: 0.138 ms
 Execution Time: 6.511 ms
(5 rows)
postgres=#

Option VI. Using hints.
PostgreSQL does not support hints out-of-the-box. However, you can use pg_hint_plan to hint queries. It provides hints for scan methods, join methods, join order, parallelism, row number correction and few others.

Ohh…yes…and option VII: drop all indexes :)).

Posted in PostgreSQL.

Leave a Reply