Databases are also a rich source of data for IAs and for practice IAs. They can be tricky, however, because the datasets are often very large and hard to manipulate for students who are not used to working with big data.

Here are three characteristics that a good database should have:

  1. Large enough database - The database should have enough data so that students can explore multiple paths of research. For example, the North American Breeding Bird Survey has large amounts of data, spread over many years and many species, which allows students to explore a variety of research questions.

  2. Manipulation - Students need to be able to manipulate the data through sorting, deleting and comparing data. In chemistry, this could be using the NIST Chemistry WebBook to look at the effect of London dispersion forces on the boiling points of alkanes. In physics, students can analyze information using the Exoplanet Orbit Database, perhaps looking at habitable zones.

  3. Querying - Students need to be able to search databases, usually through the website or using Excel or Google Sheets. For example, the Animal Genome Size Database allows students to search by animal, genome size and other factors, so they can isolate and focus on their area of interest. Students who wish to find relationships between stars can use the Internet Stellar Database, which allows searching for specific stars, or for stars based on their galactic coordinates and distance from one another.

Students will need to be taught how to download the data in a spreadsheet format, import it into Excel or Google Sheets, and sort it so that a meaningful conclusion can be drawn. This training, particularly on the sort functions, should happen before they start their analysis. Pivot tables (in Excel) are a powerful tool that students should learn to use so that they can manipulate databases effectively.
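For students or teachers who prefer a script to a spreadsheet, the same import, sort and pivot steps can be sketched in Python. This is only a minimal illustration using the standard library: the column names (`species`, `year`, `count`), the sample rows and the helper functions `load_rows`, `sort_rows` and `pivot_mean` are all made up for this example and do not come from any of the databases mentioned above.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up placeholder rows standing in for a downloaded CSV export.
# The column names and numbers are illustrative only, not real survey data.
SAMPLE_CSV = """species,year,count
Species A,2001,12
Species B,2001,30
Species A,2002,18
Species B,2002,24
Species A,2003,15
"""


def load_rows(text):
    """Import step: parse CSV text into a list of dicts with numeric columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "species": row["species"],
            "year": int(row["year"]),
            "count": int(row["count"]),
        })
    return rows


def sort_rows(rows, key, descending=False):
    """Sort step: return a new list ordered by the chosen column."""
    return sorted(rows, key=lambda r: r[key], reverse=descending)


def pivot_mean(rows, group_key, value_key):
    """Pivot-style summary: mean of one column for each value of another."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_key]].append(r[value_key])
    return {group: mean(values) for group, values in groups.items()}


rows = load_rows(SAMPLE_CSV)
by_count = sort_rows(rows, "count", descending=True)
mean_by_species = pivot_mean(rows, "species", "count")

for species, avg in mean_by_species.items():
    print(species, avg)
```

With a real download, the sample text would be replaced by reading the exported file with `open(...)`, and the column names would need to match whatever headers that export uses.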

Image: Altmann, Gerd. Pixabay, 2013.

In our experience, here are the most common mistakes made with databases:

  • Students try to pull data from scientific studies. Most of that data is already heavily processed and hard to work with. Students could consider emailing the authors asking for the raw data.

  • Epidemiological studies, such as comparing the human development index to heart disease, do not have a clear biology focus and start to fall into geography. These need to have a specific biology focus. Heart disease compared to diet, for example, can be a better choice than the human development index.

  • Students need to apply the same rigour and thorough processes as they would for a wet lab. This includes comments and discussion on the suitability of the database, a detailed procedure and the ethical implications of using the database. Remember that the rubric for a database IA is the same as for a wet lab IA.