Spreadsheet tips for taxonomists

A quick reference for Taxonomic Database editing in Excel

N. Hensold, December 2012

       This is a crib sheet useful for common problems in taxonomic database editing. For example -- joining genus and species from two columns into one -- or separating them again. It also shows how to get a unique species list from a specimen database. Lately I'm trying to cobble together a species list for northwestern Peru, using data from three sources, and thought it might be useful to summarize the procedure in a place I can find it again. These are my personal notes, and they work on my machines.

        Note: This is a quick reference, and not intended as a beginner's guide to Excel.  ... Use at your own risk!


General reminders on use of formulas:    

    Enter a formula into the cell where you want to see the result.

    Start a formula with = sign.

    Write the formula in the top row to apply to cells of that row.

       Then you can drag it down to propagate it for all rows. Better yet - activate Fill Handle & double-click on it.

    Remember if you have a header the top row of data is '2'

    After you perform the operation on the whole column, copy the column, and do a "Paste Special" -- "Values Only" -- to get rid of the formulas.


FUNCTION SYNTAX



=FIND("text",cell#,starting position in cell)

   -----> returns numeric value

        [NOTE: New Excel may use SEARCH fxn instead of FIND]



=REPLACE(old text,start_num,num_chars,new_text)



=CONCATENATE (text1&text2&text3....)



=LEN(A2)

= number of characters in Cell A2

To specify a range of cells:

Column A, rows 10-20   = A10:A20

Columns B-E, row 15      = B15:E15

All cells in row 5             = 5:5

All cells in rows 5-10      = 5:10

All cells in column H       = H:H


FUNCTION EXAMPLES

(cell locations 'A2,' 'B2' etc. given below are used as examples only)



To find a space between words:

=FIND(" ",A2,1)

Example: Inga pavoniana -- returns value "5"



To find a second space between words in the same cell:

=FIND(" ",A2,FIND(" ",A2,1)+1)

in other words,

find a "space" in Cell A2, starting at the letter just after the first space you found.



TO SPLIT DATA FROM 1 COLUMN INTO 2 or MORE



To Delete everything after the first space:

=REPLACE(B2,FIND(" ",B2,1),LEN(B2)-FIND(" ",B2,1)+1,"")

 

To Delete everything after the second space:

=REPLACE(B2,FIND(" ",B2,FIND(" ",B2,1)+1),LEN(B2) - FIND(" ",B2,FIND(" ",B2,1)-1),"")



To Delete everything before the first space:

=REPLACE(B2,1,FIND(" ",B2,1),"")



To Delete everything before the second space:

=REPLACE(B2,1,FIND(" ",B2,FIND(" ",B2,1)+1),"")





TO JOIN DATA FROM 2 COLUMNS INTO 1 (with space between)



Create an empty column. In top cell:

=CONCATENATE(B2&" "&C2)





TO CREATE A UNIQUE LIST

For example, from a specimen database with multiple entries of the same name, make a list of unique name occurrences.

UPDATE: Aha! (New version Excel)

Copy columns in question to another sheet. Highlight them. Click "Remove Duplicates"

OLD INSTRUCTIONS (Old version Excel)

Pick a column a couple cells to the right of the main spreadsheet. Make a note of the Column letter, e.g. "Z"

Go to Data -> Filter -> Advanced Filter

Select top cell in the Column you want to create the list from.

In the pop-up box,

    *Click:  "Copy to another location"

    List Range (for Column B, for example):

        $B:$B

    Copy to:

        $Z$1

    *Click:   "Unique Records Only"

 

TO COMPARE CONTENTS OF TWO LISTS

Highlights items which appear in both lists, or in only one list.

In Excel for Mac 2011, this is quick but seems to freeze up the list sometimes after I do it.

1. Select both lists (columns).

2. Click 'Conditional Formatting' --> Highlight Cells Rules --> Duplicate Values [or Unique]

OR

You want to do a pairwise comparison of cells in two aligned columns.

For example, Column A has a species name, Column B has an author from Source 1, Column C author from Source 2.

To find rows with two conflicting values in Columns B and C, do this:

1. Make an empty Column D.

2. In the top cell (D2) enter:

=OR(b2=c:c)     and copy it down the column

3. If values in columns B and C conflict, Column D will be "FALSE".  

OR

You have 2 columns of values and you want to know if any of the values in Column A are NOT found in Column B.

The columns may be of unequal length and unaligned. 

1. Make an empty Column C.

2. Enter in top cell of Column C (C2) the following:

=IF(COUNTIF(B:B,A2)=0,A2,"")   and copy formula to bottom. (For older versions of Excel subsitute semicolons for commas.)

3. This enters in Column C the value from Column A, if this value is not found in Column B.