Project Description

Use the clients' SDM to create distribution maps for 700 species of butterflies from iNaturalist data.

  • Species list: https://github.com/jcoliver/ebutterfly-sdm/blob/master/data/gbif/taxon-ids.txt

  • For each species, get all observations (data) for all years (a retrieval-and-split sketch appears after this list)

  • For each species, create 13 sets:

    • Everything (all months for all years)

    • Each month (for all years)

  • Run SDM on each dataset (13 per species)

    • The clients' SDM generates a raster and an image

  • Document the process and steps so that the work is reproducible.

  • Results:

    • Table of number of records retrieved for each species

      • Total for all years
      • Broken down by month
    • Through a Jupyter Notebook: the client types in one or more species names and the corresponding SDM maps are displayed in that notebook (see the notebook sketch after this list)
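
A minimal retrieval-and-split sketch in Python (not the clients' code). It assumes the IDs in taxon-ids.txt are GBIF taxon keys and that the public GBIF occurrence search API (which aggregates iNaturalist research-grade observations) is an acceptable source; the example taxon key below is a placeholder. The sketch pages through all records for one species, then builds the 13 datasets (everything plus one per calendar month), whose sizes feed the results table.

    # Sketch only: fetch all GBIF occurrences for one taxon key and split them
    # into the 13 datasets (all months plus one dataset per calendar month).
    import requests

    GBIF_URL = "https://api.gbif.org/v1/occurrence/search"

    def fetch_occurrences(taxon_key):
        """Page through the GBIF occurrence search API for one taxon."""
        records, offset, page_size = [], 0, 300
        while True:
            resp = requests.get(GBIF_URL, params={"taxonKey": taxon_key,
                                                  "limit": page_size,
                                                  "offset": offset})
            resp.raise_for_status()
            body = resp.json()
            records.extend(body["results"])
            if body["endOfRecords"]:
                return records
            offset += page_size

    def split_into_datasets(records):
        """Return the 13 datasets: 'all' plus one per calendar month (1-12)."""
        datasets = {"all": records}
        for month in range(1, 13):
            datasets[month] = [r for r in records if r.get("month") == month]
        return datasets

    records = fetch_occurrences(1918214)  # placeholder taxon key (illustrative only)
    datasets = split_into_datasets(records)
    for name, subset in datasets.items():  # counts for the results table
        print(name, len(subset))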

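A sketch of the species-lookup notebook cell, assuming the SDM images are saved as PNGs named "<Genus_species>-<dataset>.png" under an output/ directory; that path and naming scheme are placeholders rather than the clients' actual convention.

    # Notebook-cell sketch: the client types species names and the matching
    # SDM maps are displayed inline. Paths and file names are assumptions.
    from pathlib import Path
    from IPython.display import Image, display

    OUTPUT_DIR = Path("output")  # hypothetical location of the SDM images

    def show_sdm_maps(species_names, dataset="all"):
        """Display the SDM map for each requested species, if one exists."""
        for name in species_names:
            name = name.strip()
            image_path = OUTPUT_DIR / f"{name.replace(' ', '_')}-{dataset}.png"
            if image_path.exists():
                print(name)
                display(Image(filename=str(image_path)))
            else:
                print(f"No SDM map found for {name}")

    # The client enters one or more species names, separated by commas.
    show_sdm_maps(input("Species: ").split(","))
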
Project Update (10/26)

Each team will give a 5-minute stand-up report on progress at the beginning of class.

Project Deliverables (Due 10/31)

Teams:  5-6 people per team

Due: 10/31

Class Presentation: In class (10/31)

Remember: Be professional in your work.

Group Report: (wiki)

These are section headers for your wiki – please include all of them

  • Summary: Summary of Project

  • Description: Description of the project. Please include a concept map/flowchart of the work (workflows, programs, data flow, etc.)

  • Results:

    • Description of results

    • Where to find results

    • Use a Jupyter Notebook that lets our clients search for species and retrieve SDMs

  • Code availability: Where code can be obtained

    • GitHub

    • DockerHub

    • Jupyter Notebook

  • Instructions: Detailed instructions for downloading, installing, and using code

    • Make sure to highlight any gotchas!

  • Project Plan/Project timeline:

    • List the milestones and deliverables

    • Who did what on the project

    • Can be shown as a chart

  • Benchmarking: Document and explain how long each stage took when running the full dataset:

    • Software installation

    • Data Staging

    • Data Processing

    • Workflow monitoring

    • Visualization of results

    • Results deposition

  • Presentation:

    • Link to download/access presentation

  • Post-mortem analysis (subpage):

    • What worked well

    • What didn't work well

    • What you would do differently

Teammate Evaluations:

Individual Reports: (D2L)

  • What is container technology? Give two real-world examples of how it is being used with links to the uses.

  • Take two Linux command-line commands and describe what they do. Find the author of one of these commands and write a short bio of them.

  • Calculate how much it would cost to do the following in AWS (a back-of-the-envelope sketch appears after this list):

    • You have a 10 terabyte dataset.

    • Processing 1 gigabyte of data takes 1 hour and produces 100 megabytes of output

    • The input data can be split and run in parallel with no dependencies (you can split the input data any way you wish).

    • Remember to account for costs to move data into and out of AWS.

    • Use the AWS Simple Monthly Calculator and include screenshots of the configuration and costs: https://calculator.s3.amazonaws.com/index.html

  • Describe two computational workflows: one that works well on a high-performance computer and one that works well in a distributed/high-throughput computing environment.

  • Describe two best practices for scientific computing and point out how they were used for your midterm.
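
A back-of-the-envelope sketch of the AWS cost arithmetic referenced in the list above. The instance price and data-transfer rates here are assumptions for illustration only (roughly $0.10 per compute-hour, free transfer in, about $0.09/GB transfer out); the actual figures and screenshots must come from the AWS Simple Monthly Calculator. Because the input splits cleanly, parallelism changes wall-clock time but not the total compute-hours billed.

    # Back-of-the-envelope cost sketch; all prices are illustrative assumptions.
    DATASET_GB = 10 * 1024            # 10 TB of input data
    HOURS_PER_GB = 1.0                # 1 GB takes 1 hour to process
    OUTPUT_GB = DATASET_GB * 0.1      # each GB of input yields 100 MB of output

    PRICE_PER_HOUR = 0.10             # assumed on-demand compute price, $/hour
    PRICE_TRANSFER_IN = 0.00          # transfer into AWS is typically free
    PRICE_TRANSFER_OUT = 0.09         # assumed transfer-out price, $/GB

    compute_hours = DATASET_GB * HOURS_PER_GB        # 10,240 hours of processing
    compute_cost = compute_hours * PRICE_PER_HOUR
    transfer_cost = (DATASET_GB * PRICE_TRANSFER_IN
                     + OUTPUT_GB * PRICE_TRANSFER_OUT)

    print(f"Compute hours: {compute_hours:,.0f}")
    print(f"Compute cost:  ${compute_cost:,.2f}")
    print(f"Transfer cost: ${transfer_cost:,.2f}")
    print(f"Total:         ${compute_cost + transfer_cost:,.2f}")
    # Storage (e.g., S3) and the number of instances used would add further line items.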
