Quick-start to Python for Ecologists¶
Looking to implement AI/ML tools into your pipeline? Here's a quick-start guide with a collection of resources to help you get started.
Getting Started¶
Let's start with a brief introduction to the basic coding scaffolding you'll need:
- Python is an object-oriented programming language that works similarly to R: it has basic built-in functions and gains additional functionality through installed packages.
- Programming languages like R and Python can be run in Integrated Development Environments (IDEs), e.g., RStudio for R and Visual Studio Code (VSCode) for Python.
- VSCode is a convenient, customizable Integrated Development Environment (IDE) for writing and editing code files in multiple languages, especially Python.
- RStudio supports other languages as well, but is most commonly used for R.
- Spyder is an open-source IDE with a similar visual appearance and functionality to RStudio, so it offers an easy transition for users familiar with the R environment. More advanced users who want added functionality will appreciate VSCode, which can be used for R, Python, and many other coding languages.
- Many programmers find it useful to write scripts in “lab notebook”-style documents that integrate comments, code, and printing and plotting results (e.g., R Markdown Notebooks for R and Jupyter Notebooks for Python).
- Notebooks provide space for exploration and testing, with more immediate feedback and nicely rendered documentation of design decisions alongside the code.
- Jupyter Notebooks can be run and edited in VSCode.
- Programs and workflows can also be run from the command line, that is, through the Command Line Interface (CLI), also known as the terminal, shell, console, or "bash shell".
- On Mac or Linux, open "Terminal" to get started.
- On Windows, you'll need to install one first; we recommend Windows Terminal.
- Some useful common commands can be found in the Command Line Cheat Sheet.
- To run your program through the CLI, save your code in a file with the correct file extension (e.g., myfile.py), then call the interpreter on that file from the command line.
- As your code evolves beyond "Hello World!", it will likely require you to use some packages or libraries: code written by developers, which you can download and use to make your life easier. For Python packages, you’ll want to use an environment manager like conda or venv.
- Generally in Python, you want to scope a single environment to a particular project or task. As in R, specific libraries are loaded for a specific script or project. A key difference is that a Python environment also pins a particular version of Python and contains all the packages installed in it. Python can import whole libraries (similar to R's "library" function), but it can also import single modules (or functions) from a library for use in a particular script.
- Environment management can also be accomplished for R projects, e.g., with renv.
- Learn more about different Python environment managers on the Virtual Environments Page.
- For effective collaboration (with yourself and others), use Git (git) for version control and sync it to a remote, such as GitHub.
- Learn more about the basics of, and motivations for, version control in The Turing Way.
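As a rough illustration of the points above about scripts, imports, and environments, here is a minimal sketch of what a file like myfile.py might contain. The function names and the example calculation are hypothetical and chosen only for illustration; the script uses nothing beyond Python's standard library.

```python
# myfile.py: a hypothetical example script (names are illustrative only)
import sys              # import a whole module
from math import sqrt   # import a single function from a module


def describe_environment():
    """Return the Python version and the location of the active environment."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return {"python_version": version, "environment_path": sys.prefix}


def hypotenuse(a, b):
    """Length of the hypotenuse, computed with the single imported function."""
    return sqrt(a**2 + b**2)


if __name__ == "__main__":
    # This block runs only when the file is executed as a script.
    info = describe_environment()
    print(f"Python {info['python_version']} from {info['environment_path']}")
    print(hypotenuse(3, 4))  # prints 5.0
```

Assuming Python is installed and on your path, a file like this would typically be run with a command such as `python myfile.py` (on some systems, `python3 myfile.py`). The environment path it prints changes depending on which conda or venv environment is active.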
Two great resources for lessons covering these topics:
- The Missing Semester of Your CS Education: a collection of computer science-themed lessons from MIT.
- The Software Carpentry Lessons: hands-on, guided introductions to the topics introduced above. Each of the Core Software Carpentry lessons is described and linked to below.
Software Carpentry Lessons¶
Note
You can work through these lessons on your own or check The Carpentries site for upcoming workshops being offered virtually or at a location near you.
Working on the Command Line¶
Lesson: The Unix Shell
Work through each of the episodes in this lesson to gain familiarity with the basics of Unix-based operating systems. This lesson will prepare you for navigating the shell, i.e., working from the command line. We recommend you complete it before the Git lesson, which uses the command line.
Introduction to GitHub¶
Lesson: Version Control with Git
This lesson introduces users to local version control with git through the command line, then builds to interacting with the remote (e.g., https://github.com). It provides a comparison of tools and features introduced in the command line with their analogous UI (user interface) options in the remote (online). It also covers some common conventions and includes discussions of open science (see also The Turing Way's discussion) and some core repository files, such as .gitignore, license, and citation files (also covered in our GitHub Repo Guide).
For those using R, this lesson includes a supplemental section on using Git from RStudio. VSCode also has a Git integration.
Pro tip
Follow the GitHub Workflow Guide to improve collaboration and help avoid conflicts. GitHub Projects are also a particularly powerful tool for collaborative project management.
Basic Python¶
Many machine learning algorithms and workflows are run using Python. If you're not familiar with Python, there are many resources to help you gain familiarity. Below are two Carpentries lessons to get you started:
Warning
Indexing and indentation work differently in Python than in R: Python indexing starts at 0 rather than 1, and indentation is part of the syntax. For a more comprehensive comparison, check out this translation of common commands and syntax.
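To make the warning concrete, here is a small sketch of how indexing and indentation behave in Python. The species list is made-up example data, and the comments note how the equivalent R expression would differ.

```python
species = ["lynx", "hare", "owl", "vole"]  # made-up example data

first = species[0]        # Python counts from 0; in R the first element is species[1]
last = species[-1]        # negative indices count from the end; in R, -1 drops the first element
first_two = species[0:2]  # slices exclude the end index, so this takes elements 0 and 1

# Indentation defines code blocks; there are no curly braces as in R.
long_names = []
for name in species:
    if len(name) > 3:
        long_names.append(name)

print(first, last, first_two, long_names)
```

Changing the indentation of the `append` line would change which block it belongs to (or raise an error), so it is worth letting your editor handle indentation consistently.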
Introduction to Data Analysis with Python¶
This data workshop training was first presented at the Imageomics All-Hands in 2024. It runs through an initial analysis of a simplified dataset, filling in a dataset card as the data is explored, cleaned, and prepared for training. Notebooks and more information can be found in the data workshop repo. To complete the training, follow the instructions below.
Key Packages¶
The key packages used in this workshop are described below, organized by their use-cases.
- Data wrangling: pandas (DataFrames, the data structure), datasets (for accessing the data from Hugging Face).
- Notebooks (where the work gets done): jupyterlab, ipywidgets. The notebooks can be run in VSCode or by launching Jupyter from the command line (as done in the tutorial itself).
- Image handling and visualizations: pillow, seaborn.
- Machine learning tools: scikit-learn, opencv.
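Several of these packages are imported under a different name than the one used to install them (for example, pillow is imported as PIL). As a minimal sketch, not part of the workshop materials, the following check reports which of the listed packages cannot be found in the active environment. The mapping of package names to import names and the helper function name are assumptions made for this illustration; follow the workshop repository's own setup instructions for the actual installation.

```python
from importlib.util import find_spec

# Assumed mapping from the package names listed above to their import names.
WORKSHOP_PACKAGES = {
    "pandas": "pandas",
    "datasets": "datasets",
    "jupyterlab": "jupyterlab",
    "ipywidgets": "ipywidgets",
    "pillow": "PIL",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "opencv": "cv2",
}


def missing_packages(packages):
    """Return the package names whose import module is not found (hypothetical helper)."""
    return [name for name, module in packages.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages(WORKSHOP_PACKAGES)
    if missing:
        print("Not found in this environment:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

Running this inside the project environment gives a quick way to confirm that the environment you activated is the one where the packages were installed.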
Tutorial step-by-step instructions¶
- Clone the workshop repository and follow instructions to set up your local environment.
- Read the Key Learning Objectives.
- Read the Story of the Workshop.
- Follow along with the workshop lesson.
- Review extra notes in the Further Reading section, which contains pointers and links to other resources.
Modeling Overview¶
For more on various training paradigms, see the training paradigms section of the ABC Glossary. The glossary also covers models such as transformers, CLIP, and diffusion models.
For a more general discussion of Machine Learning topics, IBM has a detailed guide.
General hands-on practice¶
- Introductory PyTorch tutorials, which include a full PyTorch machine learning workflow example.
- A GitHub Repo on "How to Read PyTorch", which may help with some foundational concepts.
- A Medium article on training an object detector with PyTorch, if you'd rather read about it first (be warned: it includes large code blocks).
- Climate Change AI has a number of tutorials across a wide variety of topics and skill levels.
Camera traps¶
MegaDetector, fine-tuned for your particular setup, is often the go-to when dealing with camera trap data. Check out this YouTube video by Siyu Yang to help you get started.
Bioacoustics¶
See the OpenSoundScape tutorial by Lauren Chronister, Tessa Rhinehart, Sam Lapp, and Santiago Ruiz Guzman for a conceptual introduction to the classifier training workflow. This will prepare you to dive into the OpenSoundScape Documentation and build on the basic tutorials, expanding to your own data and use cases.
Are there more options, you ask?¶
In addition to the resources described on this page, you may also want to check out the following resources from the broader ABC Community:
- Data Science & Computing Cheat Sheet compiled by Tessa Rhinehart, Lauren Chronister, and Sara Beery with resources from both the Kitzes and Beery Labs. Some content links back to or is described in this guide, but there are other tutorials and resources that are not covered here.
- Ecological Modeling with AI and Python tutorial by Sara Beery and Timm Haucke as part of the Ecological Forecasting Initiative and the ESA Statistical Ecology Section Statistical Methods Seminar Series.
- Coding Club offers a number of open-source tutorials covering topics in data analysis, reproducible research, and modeling, all in different languages (including R and Python), as well as the basics of R and Python. Be aware that some content might be out of date (check the last modified date at the top of lessons), as the site does not appear to be actively maintained anymore.