Using the species distribution model (SDM), create distribution maps for 700 butterfly species from iNaturalist data
For each species, get all observations (data) for all years
For each species, create 13 sets:
Everything (all months for all years)
Each month (for all years)
Run SDM on each dataset (13 per species)
The clients’ SDM generates a raster and an image
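The 13-set split described above can be sketched as follows. This is a minimal illustration, not the clients' code: the function name `build_monthly_sets`, the record fields (`species`, `observed_on`, `lat`, `lon`), and the sample records are all hypothetical, and it assumes each observation date has already been parsed into a `datetime.date`.

```python
from datetime import date


def build_monthly_sets(observations):
    """Split one species' observations into 13 datasets.

    Returns a dict with key "all" (every record, all years) and
    keys 1..12 (records for that calendar month, pooled across years).
    """
    sets = {"all": []}
    for month in range(1, 13):
        sets[month] = []
    for obs in observations:
        sets["all"].append(obs)
        # Pool by calendar month, ignoring the year.
        sets[obs["observed_on"].month].append(obs)
    return sets


# Hypothetical sample records (field names are assumptions).
observations = [
    {"species": "Danaus plexippus", "observed_on": date(2015, 7, 4), "lat": 32.2, "lon": -110.9},
    {"species": "Danaus plexippus", "observed_on": date(2016, 7, 19), "lat": 33.4, "lon": -112.0},
    {"species": "Danaus plexippus", "observed_on": date(2016, 1, 2), "lat": 19.6, "lon": -100.3},
]

datasets = build_monthly_sets(observations)
for key, records in datasets.items():
    print(key, len(records))
```

Each of the 13 values in `datasets` would then be passed to the SDM as a separate run. Months with no records still get an (empty) entry, which makes gaps visible before the model is run.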
Document the process and steps so that the work is reproducible.
Table of number of records retrieved for each species
- Total for all years
- Broken down by month
- Through Jupyter Notebook: Client types in one or more species names and SDM maps are displayed in that Notebook
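The record-count table (total for all years, then broken down by month) could be produced along these lines. This is a sketch under stated assumptions: `count_records` and `format_table` are hypothetical helper names, the input is assumed to be already reduced to `(species, month)` pairs, and the sample data is invented for illustration.

```python
from collections import defaultdict


def count_records(records):
    """Count records per species: a total plus one count per month (1..12).

    `records` is an iterable of (species_name, month) pairs.
    """
    table = defaultdict(lambda: {"total": 0, **{m: 0 for m in range(1, 13)}})
    for species, month in records:
        table[species]["total"] += 1
        table[species][month] += 1
    return dict(table)


def format_table(table):
    """Render the counts as tab-separated text, one row per species."""
    header = ["species", "total"] + [str(m) for m in range(1, 13)]
    lines = ["\t".join(header)]
    for species in sorted(table):
        row = table[species]
        cells = [species, str(row["total"])] + [str(row[m]) for m in range(1, 13)]
        lines.append("\t".join(cells))
    return "\n".join(lines)


# Hypothetical sample data: (species, month of observation).
records = [
    ("Danaus plexippus", 7),
    ("Danaus plexippus", 7),
    ("Danaus plexippus", 1),
    ("Vanessa cardui", 3),
]

counts = count_records(records)
print(format_table(counts))
```

The tab-separated output can be pasted into a wiki table or loaded into a spreadsheet; in a Jupyter Notebook the same dictionary could instead be displayed with whatever table tool the team prefers.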
Project Update (10/26)
Each team will give a 5-minute stand-up report on progress at the beginning of class.
Project Deliverables (Due 10/31)
Teams: 5-6 people per team
Group Report: Wiki
Individual Exam: D2L
Class Presentation: In class (10/31)
10-12 min with 3 min for questions.
Remember: Be professional in your work
Slides from Eller, Business Communication
Professional Writing: https://www.dropbox.com/s/z4xru80fdlfvdno/PRO%20writing.pptx?dl=0
Professional Presentation: https://www.dropbox.com/s/i7dld1kay98new5/Presentation%20Skills%20.pptx?dl=0
Group Report: (wiki)
These are section headers for your wiki – please include all of them
Summary: Summary of Project
Description: Description of project. Please include a concept map/flow chart of work (workflows, programs, dataflow, etc)
Description of results
Where to find results
Use a Jupyter Notebook that lets our clients search for species and retrieve SDMs
Code availability: Where code can be obtained
Instructions: Detailed instructions for downloading, installing, and using code
Make sure to highlight any gotchas!
Project Plan/Project timeline:
List the milestones and deliverables
Who did what on the project
Can be shown as a chart:
Document and explain
How long it took to run the full dataset:
Visualization of results
Link to download/access presentation
Post-mortem analysis (subpage):
What worked well
What didn't work well
What you would do differently
You will lose 50% of your final score if this is not completed.
Individual Reports: (D2L)
What is container technology? Give two real-world examples of how it is being used with links to the uses.
Choose two Linux command-line commands and describe what they do. Find the author of one of these commands and write a short bio of them.
Calculate how much it would cost to do the following in AWS:
You have a 10 terabyte dataset.
Processing 1 gigabyte of data takes 1 hour and produces 100 megabytes of output.
The input data can be split and run in parallel with no dependencies (you can split the input data any way you wish).
Remember to account for costs to move data into and out of AWS.
Use the AWS Simple Monthly Calculator and include screenshots of the configuration and costs: https://calculator.s3.amazonaws.com/index.html
Describe two computational workflows: one that works well on a high-performance computer and one that works well on a distributed/high-throughput computing system.
Describe two best practices for scientific computing and point out how they were used for your midterm.
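For the AWS cost question above, the size of the workload follows directly from the stated numbers, and a sketch like the one below can help sanity-check what the calculator reports. It assumes decimal units (1 TB = 1,000 GB, 1 GB = 1,000 MB). The function names, the worker count, and all three prices are hypothetical placeholders, not real AWS rates, and storage costs are left out; actual figures should come from the AWS calculator.

```python
def workload_estimate(dataset_gb, hours_per_gb, output_mb_per_gb, n_workers):
    """Return (total compute hours, output size in GB, wall-clock hours).

    Because the input splits with no dependencies, wall-clock time
    shrinks in proportion to the number of parallel workers.
    """
    compute_hours = dataset_gb * hours_per_gb
    output_gb = dataset_gb * output_mb_per_gb / 1000  # decimal MB -> GB
    wall_clock_hours = compute_hours / n_workers
    return compute_hours, output_gb, wall_clock_hours


def cost_estimate(compute_hours, input_gb, output_gb,
                  price_per_instance_hour, price_per_gb_in, price_per_gb_out):
    """Compute cost plus data-transfer cost in and out (storage omitted)."""
    return (compute_hours * price_per_instance_hour
            + input_gb * price_per_gb_in
            + output_gb * price_per_gb_out)


# Numbers from the question: 10 TB input, 1 hour per GB, 100 MB output per GB.
DATASET_GB = 10_000
compute_hours, output_gb, wall_clock_hours = workload_estimate(
    dataset_gb=DATASET_GB, hours_per_gb=1, output_mb_per_gb=100,
    n_workers=100,  # hypothetical degree of parallelism
)
print(compute_hours, output_gb, wall_clock_hours)

# Placeholder prices only; replace with values from the AWS calculator.
total = cost_estimate(compute_hours, DATASET_GB, output_gb,
                      price_per_instance_hour=0.10,
                      price_per_gb_in=0.00,
                      price_per_gb_out=0.09)
print(round(total, 2))
```

Note that the total compute hours do not depend on how the input is split: more workers shorten the wall-clock time but, at a fixed hourly price, leave the compute bill unchanged.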