1 00:00:02,110 --> 00:00:02,970 Another 2 00:00:02,990 --> 00:00:09,010 very interesting kind of configuring a filter is setting up a so-called partial filter. 3 00:00:09,170 --> 00:00:16,380 Now in our persons collection, we have persons older than 20, older than 30 and so on, 4 00:00:16,500 --> 00:00:23,540 now let's say in the application you're building, which is an application for calculating what your friends 5 00:00:23,540 --> 00:00:25,410 will get once they retire, 6 00:00:25,550 --> 00:00:33,740 you typically only look for persons older than 60. Having an index on the date of birth age field then 7 00:00:33,740 --> 00:00:35,060 might make sense 8 00:00:35,390 --> 00:00:41,060 but the problem of course is that you have a lot of values in your index that you never actually query 9 00:00:41,060 --> 00:00:42,530 for. Now 10 00:00:42,550 --> 00:00:46,810 your index will still be efficient but it will be unnecessarily big 11 00:00:46,850 --> 00:00:47,890 and of course an index 12 00:00:47,900 --> 00:00:50,530 also eats up size on your disk, 13 00:00:50,780 --> 00:00:57,960 additionally the bigger the index is, well the more performance certain queries will take nonetheless. 14 00:00:58,010 --> 00:01:06,410 So if you know that certain values will not be looked at or only very very rarely and you would be fine 15 00:01:06,410 --> 00:01:12,800 using a collection scan if that happens, you can actually create a partial index where you only add the 16 00:01:12,800 --> 00:01:16,220 values you're regularly going to look at, 17 00:01:16,220 --> 00:01:19,100 so in our case here, contacts, 18 00:01:19,280 --> 00:01:21,490 let's see which indexes we have right now 19 00:01:21,580 --> 00:01:29,030 and I only got the ID, so let's now create a new index with create index of course and let's create that 20 00:01:29,030 --> 00:01:31,870 index on the date of birth age field 21 00:01:32,180 --> 00:01:34,750 but there, still in ascending order 22 00:01:34,760 --> 00:01:43,010 but now with an additional argument here and that argument for me is a partial filter expression. As 23 00:01:43,010 --> 00:01:43,900 a side note, 24 00:01:43,970 --> 00:01:53,060 you can also add this to compound index of course because now here, I will define which field is my interesting 25 00:01:53,060 --> 00:02:00,090 field for narrowing down the set of values I want to add and I will use my dob.age field. 26 00:02:00,110 --> 00:02:03,410 Now you could also use a totally different field by the way, 27 00:02:03,650 --> 00:02:13,130 so if you know my queries which I will run should only return me persons which have a gender of male, 28 00:02:13,430 --> 00:02:19,160 then I could also set this here and now I would only store the ages of males in my index, so I could 29 00:02:19,160 --> 00:02:21,200 set gender male here 30 00:02:21,410 --> 00:02:28,890 and this is a normal query expression for equality, greater than, lower than is also supported, exists is supported 31 00:02:28,970 --> 00:02:35,840 and you find the article in the last lecture which also mentions which other queries or filters are supported 32 00:02:35,840 --> 00:02:36,520 here. 33 00:02:36,560 --> 00:02:38,610 So this equal filter is supported 34 00:02:38,660 --> 00:02:45,800 and now what I would create is an index on age, not on gender but on age but only for elements where 35 00:02:45,800 --> 00:02:48,730 the underlying document is for a male. 36 00:02:48,980 --> 00:02:52,720 So this is what I can do, I could have also used the age here though, 37 00:02:52,820 --> 00:02:55,490 so I could have also narrowed down my age filter, 38 00:02:55,910 --> 00:02:58,330 so this would have been an alternative for example, 39 00:02:58,340 --> 00:03:03,560 now I still have the other index created but this would have been alternative, not filtering for another field 40 00:03:03,590 --> 00:03:08,460 but for the same field, the age and then only store elements greater than 60, so whatever you need. 41 00:03:08,570 --> 00:03:13,160 Now I got the male case and therefore we can query for 42 00:03:15,990 --> 00:03:17,520 dob.age, 43 00:03:17,560 --> 00:03:22,320 let's say greater than 60 44 00:03:23,590 --> 00:03:27,820 and pretty print this and we'll see something interesting. 45 00:03:27,820 --> 00:03:32,200 Here we got a male but here, we have a female, 46 00:03:32,210 --> 00:03:37,040 now how does this fit our partial index? 47 00:03:37,830 --> 00:03:40,090 Well let's explain our results here, 48 00:03:40,110 --> 00:03:44,120 so remove pretty and add dot. 49 00:03:44,120 --> 00:03:52,620 explain in front of find. What we see here is that collection scan was performed and not an index scan 50 00:03:53,130 --> 00:03:59,760 because mongodb saw that yes we were looking for a field that is part of an index but it also 51 00:03:59,760 --> 00:04:06,240 determined that since we say nothing about the gender in our query here, it would be too risky to use 52 00:04:06,240 --> 00:04:12,410 the index for that because the index is a partial index and mongodb as a top priority ensures 53 00:04:12,480 --> 00:04:14,480 that you don't lose any data. 54 00:04:14,850 --> 00:04:20,280 So it does not work in a way of naturally filtering out your result sets, 55 00:04:20,310 --> 00:04:27,210 instead what you have to do to use that index is you also have to filter for the gender here, 56 00:04:27,210 --> 00:04:34,170 so if I set gender male here and I explain that, you see now an index scan was performed with the partial 57 00:04:34,170 --> 00:04:40,500 index. And now you might be asking, ok what's the difference between a partial index and a compound index then 58 00:04:40,830 --> 00:04:49,260 if I have to specify both values here? The difference is that for the partial index, the overall index simply 59 00:04:49,260 --> 00:04:50,280 is smaller, 60 00:04:50,310 --> 00:04:53,980 there really are only the ages of males stored in there, 61 00:04:54,100 --> 00:05:01,860 the female keys are not stored in the index and therefore, the index size is smaller leading to a lower 62 00:05:01,920 --> 00:05:09,060 impact on your hard drive and also your right queries are of course also sped up because if you insert 63 00:05:09,060 --> 00:05:13,410 a new female, that will never have to be added to your index. 64 00:05:13,410 --> 00:05:19,050 So this still makes a lot of sense if you often filter for this combination, 65 00:05:19,050 --> 00:05:21,900 so for the age and then only males, 66 00:05:21,930 --> 00:05:28,650 so then a partial index can make sense if you rarely look for your other result, if you rarely look for 67 00:05:28,650 --> 00:05:29,260 women, 68 00:05:29,280 --> 00:05:31,180 this makes a lot of sense. 69 00:05:31,500 --> 00:05:38,910 Whenever mongodb has the impression that your find request would yield more than what's in 70 00:05:38,910 --> 00:05:39,620 your index, 71 00:05:39,660 --> 00:05:45,450 it will not use that index but if you typically run queries where you are within your index, your filtered 72 00:05:45,480 --> 00:05:46,860 or your partial index, 73 00:05:47,040 --> 00:05:49,080 well then mongodb will take advantage of it 74 00:05:49,170 --> 00:05:53,970 and then you benefit from having a smaller index and having less impact with writes. 75 00:05:53,970 --> 00:05:58,950 So again it depends on the application you're writing and whether you often just need a subset 76 00:05:59,130 --> 00:06:04,140 or whether you typically need to be able to query everything, in which case a partial index won't make 77 00:06:04,140 --> 00:06:04,970 much sense.