Cookies help us deliver our services. By using our services, you agree to our use of cookies. More information

Difference between revisions of "AGGREGATE examples"

From NoSQLZoo
Jump to: navigation, search
Line 18: Line 18:
 
<div class='extra_space' style='width:1em; height:6em;'></div>
 
<div class='extra_space' style='width:1em; height:6em;'></div>
  
<div class=q data-lang="py3">
+
<div class=q data-lang="mongo">
 
<code>$match</code> performs queries in a similar way to <code>find()</code>
 
<code>$match</code> performs queries in a similar way to <code>find()</code>
 
<p class=strong>Show all the details for France</p>
 
<p class=strong>Show all the details for France</p>
 
<pre class=def>
 
<pre class=def>
pp.pprint(list(
+
db.world.aggregate([
    db.world.aggregate([
+
    {$match:{name:"France"}}
        {"$match":{"name":"France"}}
+
])
    ])
 
))
 
 
</pre>
 
</pre>
 
<div class=ans>
 
<div class=ans>
pp.pprint(list(db.world.aggregate([{"$match":{"name":"France"}}])))
+
db.world.aggregate([{$match:{name:"France"}}])
 
</div>
 
</div>
 
</div>
 
</div>
  
<div class=q data-lang="py3"><code>limit</code> sets the amount of documents to be handed to the next stage in the pipeline.
+
<div class=q data-lang="mongo"><code>limit</code> sets the amount of documents to be handed to the next stage in the pipeline.
 
<p class=strong>Return the first document</p>
 
<p class=strong>Return the first document</p>
 
<pre class=def>
 
<pre class=def>
pp.pprint(list(
+
db.world.aggregate([
    db.world.aggregate([
+
    {$limit:1}
        {"$limit":1}
+
])
    ])
 
))
 
 
</pre>
 
</pre>
 
<div class=ans>
 
<div class=ans>
pp.pprint(list(db.world.aggregate([{"$limit":1}])))
+
db.world.aggregate([{"$limit":1}])
 
</div>
 
</div>
 
</div>
 
</div>
  
<div class=q data-lang="py3">
+
<div class=q data-lang="mongo">
 
<code>$project</code> selects what fields to display.<br/>
 
<code>$project</code> selects what fields to display.<br/>
 
It can also has the ability to create new fields and to compare fields against each other without using <code>$where</code>
 
It can also has the ability to create new fields and to compare fields against each other without using <code>$where</code>
Line 57: Line 53:
 
</div>
 
</div>
 
<pre class=def>
 
<pre class=def>
pp.pprint(list(
+
db.world.aggregate([
    db.world.aggregate([
+
    {$match:{area:{$ne:0}, continent:"Asia"}},
        {"$match":{"area":{"$ne":0},"continent":"Asia"}},
+
    {$project:{
        {"$project":{
+
        _id:0,
            "_id":0,
+
        name:1,
            "name":1,
+
        density: {$divide: ["$population","$area"]}
            "density": {"$divide": ["$population","$area"]}
+
    }}
        }}
+
])
    ])
 
))
 
 
</pre>
 
</pre>
 
<div class=ans>
 
<div class=ans>
pp.pprint(list(db.world.aggregate([{"$match":{"area":{"$ne":0},"continent":"Asia"}},{"$project":{"_id":0,"name":1,"density":{"$divide":["$population","$area"]}}}])))
+
db.world.aggregate([{"$match":{"area":{"$ne":0},"continent":"Asia"}},{"$project":{"_id":0,"name":1,"density":{"$divide":["$population","$area"]}}}])
 
</div>
 
</div>
 
</div>
 
</div>
  
<div class=q data-lang="py3">
+
<div class=q data-lang="mongo">
 
Because aggregate is a pipeline stages may be repeated, and stages don't have to be used in a specific order.
 
Because aggregate is a pipeline stages may be repeated, and stages don't have to be used in a specific order.
 
<p class=strong>Show the name of Asian countries with a density that's over 500 people per km<sup>2</sup>. (population/area)</p>
 
<p class=strong>Show the name of Asian countries with a density that's over 500 people per km<sup>2</sup>. (population/area)</p>
 
<pre class=def>
 
<pre class=def>
pp.pprint(list(
+
db.world.aggregate([
    db.world.aggregate([
+
  {$match:{area:{$ne:0},continent:"Asia"}},
        {"$match":{"area":{"$ne":0},"continent":"Asia"}},
+
  {$project:{
        {"$project":{
+
      _id:0,
            "_id":0,
+
      name:1,
            "name":1,
+
      density: {$divide: ["$population","$area"]}
            "density": {"$divide": ["$population","$area"]}
+
  }},
        }},
+
  {$match:{density:{$gt:500}}}
        {"$match":{"density":{"$gt":500}}}
+
])
    ])
 
))
 
 
</pre>
 
</pre>
 
<div class=ans>
 
<div class=ans>
pp.pprint(list(db.world.aggregate([{"$match":{"area":{"$ne":0},"continent":"Asia"}},{"$project":{"_id":0,"name":1,"density":{"$divide":["$population","$area"]}}},{"$match":{"density":{"$gt":500}}}])))
+
db.world.aggregate([{"$match":{"area":{"$ne":0},"continent":"Asia"}},{"$project":{"_id":0,"name":1,"density":{"$divide":["$population","$area"]}}},{"$match":{"density":{"$gt":500}}}])
 
</div>
 
</div>
 
</div>
 
</div>
  
<div class=q data-lang="py3">
+
<div class=q data-lang="mongo">
 
<code>$sort</code> allows ordering of the results set, where 1 is ascending and -1 is descending.<br/>
 
<code>$sort</code> allows ordering of the results set, where 1 is ascending and -1 is descending.<br/>
 
Note that not including <code>$match</code> is the same as <code>{"$match":{}}</code>
 
Note that not including <code>$match</code> is the same as <code>{"$match":{}}</code>
 
<p class=strong>Show the name of all countries in descending order.</p>
 
<p class=strong>Show the name of all countries in descending order.</p>
 
<pre class=def>
 
<pre class=def>
pp.pprint(list(
+
db.world.aggregate([
    db.world.aggregate([
+
    {"$project":{
        {"$project":{
+
        "_id":0,
            "_id":0,
+
        "name":1,
            "name":1,
+
    }},
        }},
+
    {"$sort":{
        {"$sort":{
+
        "name":-1
            "name":-1
+
    }}   
        }}   
+
])
    ])
 
))
 
 
</pre>
 
</pre>
 
<div class=ans>
 
<div class=ans>
pp.pprint(list(db.world.aggregate([{"$project":{"_id":0,"name":1,}},{"$sort":{"name":-1}}])))
+
db.world.aggregate([{"$project":{"_id":0,"name":1,}},{"$sort":{"name":-1}}])
 
</div>
 
</div>
 
</div>
 
</div>

Revision as of 15:52, 3 December 2015

#ENCODING
import io
import sys
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-16')
#MONGO
from pymongo import MongoClient
client = MongoClient()
client.progzoo.authenticate('scott','tiger')
db = client['progzoo']
#PRETTY
import pprint
pp = pprint.PrettyPrinter(indent=4, width=160)

Introducing the aggregation framework

These examples introduce the aggregation framework and its operators. Again we will be using the collection world

$match performs queries in a similar way to find()

Show all the details for France

db.world.aggregate([
    {$match:{name:"France"}}
])

db.world.aggregate([{$match:{name:"France"}}])

limit sets the amount of documents to be handed to the next stage in the pipeline.

Return the first document

db.world.aggregate([
    {$limit:1}
])

db.world.aggregate([{"$limit":1}])

$project selects what fields to display.
It can also has the ability to create new fields and to compare fields against each other without using $where

Show the name and population density of all Asian countries. (population/area)

Note that "density" is a new field, made from the result of dividing two existing fields, and that $divide is an aggregate function.

To avoid diving by 0 insert a $match to remove any countries with 0 area (Vatican City), then pipe these results through to $project
There is no need to check if values are null, MongoDB will ignore these documents.

db.world.aggregate([
    {$match:{area:{$ne:0}, continent:"Asia"}},
    {$project:{
        _id:0,
        name:1,
        density: {$divide: ["$population","$area"]}
    }}
])

db.world.aggregate([{"$match":{"area":{"$ne":0},"continent":"Asia"}},{"$project":{"_id":0,"name":1,"density":{"$divide":["$population","$area"]}}}])

Because aggregate is a pipeline stages may be repeated, and stages don't have to be used in a specific order.

Show the name of Asian countries with a density that's over 500 people per km2. (population/area)

db.world.aggregate([
   {$match:{area:{$ne:0},continent:"Asia"}},
   {$project:{
       _id:0,
       name:1,
       density: {$divide: ["$population","$area"]}
   }},
   {$match:{density:{$gt:500}}}
])

db.world.aggregate([{"$match":{"area":{"$ne":0},"continent":"Asia"}},{"$project":{"_id":0,"name":1,"density":{"$divide":["$population","$area"]}}},{"$match":{"density":{"$gt":500}}}])

$sort allows ordering of the results set, where 1 is ascending and -1 is descending.
Note that not including $match is the same as {"$match":{}}

Show the name of all countries in descending order.

db.world.aggregate([
    {"$project":{
        "_id":0,
        "name":1,
    }},
    {"$sort":{
        "name":-1
    }}  
])

db.world.aggregate([{"$project":{"_id":0,"name":1,}},{"$sort":{"name":-1}}])

Grouping

Grouping provides accumulator operations such as $sum
All groups must have an _id. To see why this is useful imagine the following:

So far you've been using the collection world
As every country has a continent, it would make sense to have countries as a nested document inside continents: e.g:

[    
    {"name":"Africa",
     "countries":[
         {"name":"Algeria",    "capital":"Algiers",   ...},
         {"name":"Angola",     "capital":"Luanda",    ...},
         {"name":"Benin",      "capital":"Porto-Novo",...}.
         {...},
         ...
    ]},
    {"name":"Asia",
     "countries":[
         {"name":"Afghanistan","capital":"Kabul", ...},
         {"name":"Azerbaijan", "capital":"Baku",  ...},
         {"name":"Bahrain",    "capital":"Manama",...},
         {...},
         ...
    ]},
    {...},
    ...
]

The world collection isn't like this however. It uses the following structure, which has a redundancy where continent is repeated for each country.

[ 
    {"name":"Afghanistan","capital":"Kabul",  "continent":"Asia",  ...},
    {"name":"Albania",    "capital":"Tirana", "continent":"Europe, ...},
    {"name":"Algeria",    "capital":"Algiers","contiennt":"Africa",...},
    {...},
    ...
]

The code to group by continent is "_id":"$continent"
If instead the question was to group by country the code would be "_id":"$name".
To operate over the whole document (which would have the same effect as "_id":"$name") "_id":"null" or "_id":None can be used.

$max and $min can be used to get the largest and smallest values in a group.

Get the smallest and largest GDPs of each continent.

pp.pprint(list(
    db.world.aggregate([
        {"$group":{
            '_id':'$continent',
            'min':{"$min":"$gdp"},
            'max':{"$max":"$gdp"}
        }},
        {"$project":{
            "_id":1,
            "min":1,
            "max":1
        }},
    ])
))

pp.pprint(list(db.world.aggregate([{"$group":{'_id':'$continent','min':{"$min":"$gdp"},'max':{"$max":"$gdp"}}},{"$project":{"_id":1,"min":1,"max":1}},])))

Some other useful aggregate functions to know are $sum and average: $avg
This example combines all the material in these examples.

Order the continents in descending order by total GDP, Include the average GDP for each country.

pp.pprint(list(
    db.world.aggregate([
        {"$match":{}},
        {"$group":{
            "_id":"$continent",
            "Total GDP": {"$sum": "$gdp"},
            "Average GDP": {"$avg": "$gdp"}
        }},
        {"$sort":{
            "Total GDP":-1
        }},
        {"$project":{
            "Area":"$_id",
            "Total GDP": 1,
            "Average GDP":1,
            "_id":0
        }}
    ])
))

pp.pprint(list(db.world.aggregate([{"$group":{"_id":"$continent","Total GDP":{"$sum":"$gdp"},"Average GDP":{"$avg":"$gdp"}}},{"$sort":{"Total GDP":-1}},{"$project":{"Area":"$_id","Total GDP":1,"Average GDP":1,"_id":0}}])))

Using Conditions

$cond is similar to a CASE statement in other languages.
It has the form "$cond": [{<comparison> :[<field or value>,<field or value>]},<true case>,<false case>]

pp.pprint(list(
    db.world.aggregate([
        {"$group":{
            "_id":{
                "$cond": [{"$eq":["$continent","Eurasia"]},"Europe","$continent"]
            },
            "area":{"$sum": "$area"}
        }},
        {"$sort":{
            "area": -1
        }},
        {"$project":{
            "_id":1,
            "area":1
        }}
    ])
))