MongoDB Query Language Part 1 – The basics of selecting data from MongoDB
/What we will cover:
- Selecting data
- Filtering results
- Sorting
- Top N
Before you get started:
- Make sure you load the test data in the document found here.
- Download a good MongoDB IDE. I recommend Robo 3T, formerly named Robomongo. https://robomongo.org/
- Once you install Robo 3T and open up the shell, make sure you set the results mode to text, this will make reading the output easier.
- Everything in MongoDB is case sensitive including database names, collection names, users, methods, etc.
- It is important to familiarize yourself with the structure of a JSON document before you begin. This is a good place to start: http://www.json.org/
Selecting Data
There are several different ways you can query data in MongoDB. It is important to note early on, that there is the standard way of retrieving data and then there is the aggregation framework. This can be confusing especially when you start Googling for answers. Sometimes you will see examples that use the standard method for querying data and other times you will see examples that use the aggregation framework. We are going to start with the basic methods and move on to the aggregation framework later in this series.
There are two standard methods for querying MongoDB data, find() and findOne(). find() returns a cursor of results 20 at a time, you can type “it” to keep iterating through the results; findOne() only returns a single document. It’s important to understand the syntax of the find method. In the example below you will see two sets of curly braces, comma separated, enclosed in a set of parens—here is what all of that means:
Userprofile is our collection (table) and find is the method we will use to select data from the collection followed by a set of parens. Inside your parens the first set of curly braces represents your filters while the second set of curly braces represents the data you want to return. Either set of curly braces can be excluded, in which case both of the statements below are functionally equal. Run either of the find statements below to select all documents from the userprofile collection.
db. userprofile.find( {}, {} )
is equivalent to
db. userprofile.find()
Now let’s look at the findOne() method. This is just like find() except that findOne() only returns a single document whereas find() returns a cursor of documents. Note the letter case, findone() is not the same as findOne(). Run the statement below to select a single document from the userprofile collection.
db.userprofile.findOne()
If we want to query in to an array of objects we can use what’s called dot notation. In our userprofile collection we have an array of objects named friends. If we wanted to filter our results by a value in an embedded object we would use dot notation. In the query below we call the name from the friends array using “friends.name” where the user name is Martha Solis. This returns the document where this name exists in the array. You can use dot notation to drill down multiple levels of embedded arrays.
db. userprofile.find({"friends.name":"April Hammond"})
Filtering
Now that we can return documents from our collection, let’s add a filter to our query. You can think of this as something like a WHERE clause in SQL, in MongoDB we typically refer to this as matching. Using the first set of curly braces we are going to return only documents where the isActive status for the user is true. Notice I am excluding the second set of curly braces, if we want to return the entire document we don’t need to include the second set of curly braces.
db. userprofile.find({"isActive":true})
All of the standard comparison operators are also available to filter on such as greater than, less than, in, etc. If we wanted to find all users where the isActive status is true and their age is greater than 25 we could use the following.
db. userprofile.find({"isActive":true, "age": { $gt: 35 }})
Notice that all filters are still contained in that first set of curly braces. Each filter is comma separated and when using a comparison operator you will have another set of curly braces that represent the comparison object. Here is a common list of comparison operators.
$eq – Equals
$gt – Greater Than
$gte – Greater Than or Equals
$lt – Less Than
$lte – Less Than of Equal
$ne – Not Equal
$in – Where Value In
$nin – Where Value Not In
In our last example we are combining two filters to what amounts to, effectively, isActive = true AND age is > 35. Another way to write this is using a logical query operator, in the following example we will use the $and logical operator. Notice the enclosures of the code - the $and comparison object has an array of values it compares. Arrays are always contained in square brackets.
As our scripts get longer, good code formatting will be essential for reading and debugging. I would strongly encourage good formatting practices when writing code for the MongoDB shell or any programming language for that matter.
db.userprofile.find({ $and:[ { "isActive":true, "age":{ $gt:35 } } ] })
Here is a list of logical operators we can use like the $and operator above:
$or – Performs a logical OR operation
$not – Performs a logical NOT operation
$nor – Performs a logical NOR operation. Only returns the results when all inputs are negative.
Now what if we don’t want the entire document returned but instead only want to return each users name, their status, and their age? Now we are going to use the second set of curly braces to specify the values we want to return. This process of limiting the columns returned is called projection in MongoDB.
db.userprofile.find({ $and:[ { "isActive":true, "age":{ $gt:35 } } ] }, { "name":1, "age":1, "isActive":1, "_id":0 })
To select certain name/value pairs we call the name followed by a colon and a 1 or 0, a true/false Boolean. So we are asking for the name, the age, and the isActive values in all documents in the collection. Notice we include “_id”:0, this excludes the system _id from being returned in the query. By default, the system _id will always be returned unless you exclude it as we have done above.
Sorting
Sorting a result is one of the more common tasks in data analysis. We need to be able to order the data in a certain way so that important values are at the top of the result. We can accomplish this using the sort() method. The sort method follows the end of the find() method. You specify the name/value pair you want to sort on and then the direction, ASC ( 1 ) or DESC ( -1 ). Run the statement below to sort by the age in descending order.
db.userprofile.find({ }, { "_id":0, "name":1, "age":1 }).sort ({ "age":-1 })
In addition to the sort, we use projection in this query but there are no data filters. Notice the empty set of curly bracers before the projection. If you recall the first set of curly braces is the filter, the second set it the projection.
Top N
What if we only want to return the first 5 documents from the collection? We can accomplish this using the limit() method. The limit() method is similar to the TOP function in SQL. The limit method is applied at the end of the find() using .limit(). The limit method is called a cursor method because it is applied to a cursor result, which is what the find() method produces. In the query below, we are only returning the first 5 documents from the table.
db.userprofile.find({ }, { "_id":0, "name":1, "gender":1, "company":1, "email":1 }).limit(5)
Putting it all together
Now we need to put all the pieces together. In the following query we will apply everything we learned in the previous steps. We are going to select the name, age, and email address of the top 5 youngest active users. This will use a filter, a projection, a sort, and limit the number of results to 5.
// Add commets to your code using a double forward slash. db.userprofile.find({ isActive:true // This is our filter }, { "_id":0, "name":1, "age":1, "email":1 }).sort({ // Here is our sort "age":1 }).limit(5) // And this is limiting the results to 5
This is the first step to querying MongoDB. The above queries are the most basic building blocks for querying MongoDB. Once we get in to the aggregation framework, it gets a little more complex. We will begin to cover the aggregation framework in the next part in this series.