The Grammar According to West

by Douglas B. West

Summary

I have been accumulating observations about writing mathematics for many years. These conclusions arose both from writing textbooks and from noting writing errors commonly made by my thesis students and in papers submitted to journals.

My first objective was to train my students, thereby reducing the time needed to edit their theses. As the document grew, I made it publicly available in the hope that others may find it useful. I have received a broad range of responses, mostly positive. If you don't find it useful (or if you object to it on principle), then please ignore it. I hope to make some writers of mathematics (especially students and non-native speakers of English) aware of issues they may not have considered; small changes can produce mathematical writing that is easier for wider audiences to read.

After an introductory explanation of why care in writing mathematics is needed, I discuss (1) mathematical style, (2) notation and terminology, (3) punctuation and English grammar as used in mathematical writing, and (4) English usage for non-native speakers. Some points are minor distinctions, but even these make mathematical writing clearer when used consistently. My intent is not to make writing rigid, but rather to make it transparent to avoid distracting the reader by ambiguities or awkwardness in the flow of the narrative.

I note that a number of other articles and even books have been published on writing mathematics; see some of these here. I list these just to make alternative viewpoints available, without judgment pro or con. One contrast to note is that my list below focuses on relatively specific items. I have not yet included much discussion of the general structure of an article; some of these references discuss that.

Index of specific items

Mathematical style
Abstract/Intro/Conclusion
syntax for definitions
"where" in definitions
"double-duty" definitions
"Let G=(V,E) be a graph"
expressions as units
separation of formulas
notation starting sentence
"let x,y be"
conditions in parentheses
mixing words & notation
"Let .... Then"
"When/For/Since"
"As/For" as reasons
"Hence/Thus/Therefore"
"by Theorem X"
"so" vs. "so that"
"such that" vs. "so that"
"Assume/Suppose/Let"
"any/each/every"
universal quantifiers
"less/fewer"
sets vs. sizes
possessives on notation
nested proofs
"best possible"
numerals vs. words
Terminology/Notation
":=" (for definitions)
":" vs. "|" for "such that"
"sequence/series/list"
"v₁,v₂,…,vₙ"
lists of relations
"k=1,2,...,n"
"Big Oh" notation
"maximum degree Δ"
hyphens for parameters
vertex vs. edge parameters
two-word adjectives
adverbs and "well-known"
notation for paths
order and size of graph
"h∈G"; graphs are not sets
digraphs and hypergraphs
connected components
"maximal" vs. "maximum"
multicharacter operators
"induct on", "by induction"
"clique" or "complete subgr."
isomorphism vs. subgraphs
"proper coloring"
"partitions" vs. "parts"
"pairwise" vs. "mutually"
"pairwise disjoint/isomorphic"
"union/join"
edge or path "between"
set minus
"left hand side"
English usage
introductory words
quotations/periods
which/that
antecedents
naked "this"
"distinct/unique"
contractions
"i.e." and "e.g."
"different than"
articles ("a/the")
possessives & titles
capitalization & titles
adjectival names
conjunctions & commas
semicolons
excessive commas
serial comma
appositives
passive voice
"the below"
"either"
"we have been proving"
"non-" include hyphen?
"placement of citation"
For Non-native speakers
"bound of"
"a joint work"
"few" vs. "a few"
"usual"
"partial case"
"passing a vertex"
"can not" and "may be"
"evidently"
"principal" vs. "principle"
more extra commas
expressions to avoid

Introduction and Motivation

Live mathematical conversations use many shortcuts that are inappropriate in precise mathematical writing. The context is known by all participants, and shortcuts evolve to save time. Furthermore, the speaker can immediately clarify ambiguity. Without immediate access to the author, written mathematics must use language more carefully. Also, mathematical concepts are abstract, without context from everyday experience, so the writing must be more consistent to make the meaning clear. Outside mathematics, imprecise writing can still be understood because the objects and concepts discussed are familiar.

Some mathematicians object to some of my recommendations. Many time-honored practices in the writing of mathematics are grammatically incorrect. These mistakes in writing cause no difficulty for readers with sufficient mathematical sophistication or familiarity with the subject, but it is unnecessary to restrict the audience to such readers. A bit of care leads to clearer writing that makes mathematics more easily accessible and readable to a wider and less specialized audience.

Some languages have conventions of usage or grammar that lead to typical errors in English mathematical writing by their native speakers. I discuss some such items in a separate section at the end. I use some terminology for English parts of speech and punctuation in the explanations. I hope that readers who are unfamiliar with these terms will still benefit from seeing the recommendations.

I apologize in advance for my own grammatical errors. Habits die hard, and it is easy to err in applying principles of writing. In particular, there are inconsistencies between what I propose here and what I wrote in my earlier books. Those books were written in the previous millennium, and I have learned many things about writing since then. Also, I am a speaker of American English, and some points are consistently different in British English (such as the treatment of "which" vs. "that" and the aversion to serial commas).

Some of my conclusions conflict with manuals of English style. My conclusions are intended to produce clear mathematical writing that is more logically consistent than publishers' conventions. This applies especially to punctuation and to words that serve as logical connectives.

I welcome corrections, suggestions/inquiries, and "pet peeves" that may lead to further items in later versions of this guide.

    Mathematical style

  1. Abstract, Introduction, and Conclusion. We begin with the overall structure of a research article in mathematics. The abstract states the results as fully as possible in a brief presentation. Crucial specialized terms the reader needs to know to understand the statements should be defined. The abstract stands on its own, especially in the age of electronic communication where it may be separate from the rest of the paper, and hence it contains no numbered reference to the bibliography.

    The first section of the paper is an "Introduction" that should motivate the problem, discuss related results, state the results more completely, and perhaps summarize the techniques or the structure of the paper or crucial definitions.

    The introduction should also contain any concluding remarks or key conjectures. There is generally little or no value in a separate section of concluding remarks. Such remarks either are redundant or contain information that readers usually seek in the introduction. Readers who study the full details of the proofs have no need of a summary in a concluding section. Readers who do not read the full details have no desire to go on to a concluding section. A mathematical research article is not read like a novel or even like an essay that seeks to "persuade" the reader; it does not need an epilogue.

  2. Definitions. Words being defined should be distinguished by italics (or perhaps boldface in a textbook context). When italics are used to indicate a word being defined, it is unnecessary to use "called" or "said to be"; the use of italics announces that this is the term being defined and replaces these words.

    Many definitions are phrased as "An object has property italicized term if condition holds." Here we use the word "if" even though subsequently it is understood that an object satisfies the property being defined if and only if the given condition holds. The italicization alerts the reader to this situation. The convention can be justified by saying that the property or object does not actually exist until the definition is complete, so the definition cannot yet formally state that the named property is equivalent to the condition.

    Definitions written by non-native speakers sometimes contain extra commas. In each sentence below, the comma should be deleted.
       "A bipartite graph, is a graph that is 2-colorable".
       "A graph is bipartite, if it is 2-colorable".
    The first example is a mistaken placement of a comma inside a clause (see discussion of Commas).

    Note the difference in italicization above. When written as an adjective-noun combination, the term being defined is the name for structures that have the property; hence the full term bipartite graph is italicized. When the property alone is being defined and is positioned as a predicate adjective, only the adjective is italicized.

  3. "Where". A formula may contain notation that has not yet been defined, if the definition of that notation follows immediately in the same sentence. The formula is then followed by a comma and the word "where" to introduce the definition. For example, "If G is a bipartite graph, then χ'(G)≤Δ(G), where χ'(G) is the edge-chromatic number and Δ(G) is the maximum degree of G." (Technically, the second comma is needed because the subsequent definition is an appositive.)

    Note the difference between "where" and "such that". "Where" is used when the preceding notation is being defined; "such that" is used when it is already defined and its value is being restricted.

  4. Double-Duty Definitions. One cannot make a statement about an object before the object has been defined. Similarly, one cannot use notation as part of a formula making a statement about the denoted object unless the notation has previously been defined. In particular, these tasks cannot correctly be accomplished at the same time with one instance of the notation. For example, "The neighborhood of a vertex v is N(v)={u: uv∈ E(G)}" is incorrect. With a subject and a verb before the equation, the equation is a single unit (see expressions as units). This sentence defines the neighborhood of v to be a particular equation, and it does not define the notation N(v).

    Of course, readers sufficiently familiar with the context have no trouble understanding what is meant, but why disenfranchise other readers? One can just as easily write "The neighborhood of a vertex v, denoted N(v), is {u: uv∈ E(G)}". Alternatively, one can introduce the notation as an appositive in a conventional position immediately after the term defined: "The neighborhood N(v) of a vertex v is {u: uv∈ E(G)}".
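    For readers who typeset in LaTeX, here is a minimal sketch (an illustration, not part of the original guidelines) of the two correct forms above. It assumes a standard article class and uses \emph to supply the italics that item 2 recommends for a term being defined, so no "called" is needed.

```latex
\documentclass{article}
\begin{document}
% Italics (via \emph) mark the term being defined.
% The notation N(v) is introduced as an appositive ("denoted N(v)"),
% not by an equation that also has to serve as the definition.
The \emph{neighborhood} of a vertex $v$, denoted $N(v)$, is
$\{u : uv \in E(G)\}$.

% Alternative: the notation placed immediately after the defined term.
The \emph{neighborhood} $N(v)$ of a vertex $v$ is $\{u : uv \in E(G)\}$.
\end{document}
```

    In both sentences the notation N(v) is named before any statement is made about it, which is exactly what the incorrect "is N(v)={u: uv∈ E(G)}" fails to do.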

    A common Double-Duty definition is "Let G=(V,E) be a graph". The sentence defines the equation G=(V,E) to be a graph. Of course, the writer intends simultaneously to introduce notation for a particular graph and its vertex set and edge set, but that is not what the sentence says. It is better to write "Let G be a graph" and use operators V and E to refer to the vertex and edge sets of G as V(G) and E(G) (see also Operators vs. constants).

    A more subtle example is "For each 1≤ i≤ n,". The introduction of the notation i has been lost because the inequalities impose conditions on it before it is defined. Since the expression is a unit, grammatically the phrase is referring to each inequality written in this way. Correct alternatives that express the intended meaning include "For all i such that 1≤i≤n", "For i∈[n]", and "For 1≤i≤n". The third option is slightly different from the others; it means "whenever i is such that the conditions hold", implicitly introducing i in a specified range but avoiding the grammatical problem.

  5. Expressions as units. Is an equation or inequality a noun unit, or is it read with the relational symbol as a verb? Treating the symbol as a verb often forces rereading to clarify the meaning, and one often wants to have another verb in the sentence. For these reasons, it is best to treat notational expressions as single objects (that is, nouns), with some exceptions.

    For example, "there exists i<j with xᵢ=xⱼ" ascribes a property to the inequality i<j (and is a Double-Duty Definition of i). Without context, it is hard to tell that the author meant "there exists i such that i<j and xᵢ=xⱼ". Consider also "The number of nonneighbors is n-1-d(u)≥ i." The number of nonneighbors is a number, not an inequality; the author is trying to make two statements in one inequality. For clarity, separate the statements: "The number of nonneighbors is n-1-d(u), which is at least i".

    Exceptions. Applying this principle with very simple expressions leads to ponderous writing. Here are two notable exceptions:
       1) In "Choose x∈ V(G) such that x has minimum degree," we are choosing x, not the expression "x∈ V(G)". The justification for this exception is that the membership or containment symbol is read as "in", which is not a verb. (One can treat nonmembership in the same way.)
       2) "Let G'=G-x". When introducing notation for an object or expression by a single imperative verb ("let", "set", "put", "choose", etc.), we read the equality symbol as the verb "equal", truly an exception. This exception can be recognized by the lack of any English verb in the sentence. Continuing with another verb, as in "Let G'=G-x be ...", would produce a Double-Duty Definition.

    If the introductory part of the sentence is longer, then we may already have a noun and a verb, and the expression again becomes a unit. For example, "Include each vertex independently with probability p=(ln n)/n" should be "Include each vertex independently with probability p, where p=(ln n)/n".

  6. Separation of formulas. Avoid placing two formulas consecutively, separated only by a comma. For example, "For x<0, x²>0" may be read as something other than a hypothesis and a conclusion. Similarly, "For some k with k<n, n-k+f(n)<n/2" requires the reader to stop and go back to insert the missing words. The difficulty arises because commas occur also in notation, and the eye cannot distinguish between commas that occur in notation and commas that are intended to cause a pause or to substitute for words. The mathematics will be easier to read when the formulas are separated by the comma plus words that enable the reader to understand the sentence at the first reading. Useful phrases include "it follows that" and "we have".

    [On the other hand, "we have" is an awkward phrase that often should be dropped when not needed to separate formulas. For example, instead of "By the preceding theorem, we have A=B," prefer "By the preceding theorem, A=B".]

    When the second formula just specifies an object, the separation can be accomplished by specifying the type of object, as in "When k=2, the graph G is Eulerian" instead of "When k=2, G is Eulerian." One can always rewrite notational expressions separated by a comma to avoid the difficulty. Usually this is easy, as in changing "For every bipartite graph G, χ(G)≤2" to "If G is bipartite, then χ(G)≤2".

  7. Initial notation. Never begin a sentence with notation. One can always prepend a specifier (such as "The graph G is" instead of "G is") or rewrite the sentence in another way to avoid starting with notation. Following this rule makes mathematics easier to read. The principle here is similar to the separation of formulas. An exception is that the statement of a numbered theorem may begin with (or consist entirely of) a formula, because the numbered designation serves as a label that begins the sentence.

  8. Lists of size 2. It is common but ungrammatical to write "Let x,y be vertices in G"; we would not write "My friends John, Mary came to dinner." The concatenation is an instance of two formulas separated by a comma. To see what can go wrong, consider the following clause: "Since a|b and a,b are maximal and minimal,". What was meant was: "Since a|b, with a maximal and b minimal,". In general, the comma within a list of two elements should be replaced with "and" when discussing the two elements as individual items. For example, "If x,y are adjacent" should be "If x and y are adjacent" or "If {x,y} is a pair of adjacent vertices".

    Exceptions. With lists of size at least three, omission of "and" does not cause as much confusion, and including it can be awkward. Here the objection to the common mathematical convention is much weaker: we accept "Let x,y,z be the vertices of T," although writing "Let {x,y,z} be the vertex set of T" would be more precise. Still, "let x, y, and z be the vertices" reads better.

    Another sensible exception is "Choose x,y∈ V(G)". Here the relation is between each variable and the set, and we accept this as a single formula. Again a justification is that we can read ∈ as the single word "in", without a verb. Similarly, many mathematicians write "For n,m≥2" to mean the conjunction of n≥2 and m≥2. The exception for the membership symbol is consistent with other exceptions for the membership symbol; doing it with inequalities is more questionable. Avoid doing it with equalities (see Variable equal to list); it unnecessarily forces the reader to pause to figure out the meaning.

  9. Parenthetic or wordless restrictions. Many writers of mathematics impose restrictions parenthetically or via commas, thereby omitting words in sentences and juxtaposing formulas. This makes reading unnecessarily difficult. Parentheses around notation are mathematical objects and hence cannot substitute for words. A phrase like "Let m(m≤n) be the size" is immediately clear only to the author.

    Other examples: "Suppose there is an edge xy (≠e) in G" should be "Suppose that G has an edge xy other than e". Similarly, "For k≤m with k even" improves on "For k≤m (k even)" or "For k≤m, k even", and "Consider aᵢ for 1≤i≤n" is better than "Consider aᵢ (1≤i≤n)". One can also separate by putting words into the parentheses: "For k≤ m (where k is even)". Note that "Suppose that there is an edge xy≠e in G" is a Double-Duty Definition; "xy≠e" is not an edge.

  10. Mixing words and notation. Words cannot be compared with notation via a relational symbol. Do not write "Consider a graph G with maximum degree ≤ k". Grammatically, the sentence does not indicate where the inequality starts. If one side is written in words, then the relation must also be written in words. Furthermore, the sentence above says that the maximum degree of G equals the expression "≤ k".

    The same principle applies to logical symbols. In written mathematics, do not use symbols such as ∃, ∀, and ⇒ (or abbreviations such as "iff") to substitute for words in sentences. Shorthand notation used to save space on lecture slides need not follow these restrictions, since the slides summarize the lecture and are accompanied by oral explanation.

  11. Statements of implication ("Let ... Then"). The common two-sentence mathematical construction
         "Let hypothesis. Then conclusion."
    is grammatically incorrect. The second sentence is not a sentence, since the implicative sense of "then" plays the role of a conjunction. The simpler form
         "If hypothesis, then conclusion."
    is less choppy, easier to read, grammatically correct, and faithful to the mathematical sense of a conditional statement. When there are many hypotheses, resulting in too long a sentence, some creativity can be applied. First a sentence (perhaps beginning with "Let") sets the context. The last crucial hypothesis is saved for a statement of implication, using the "If/then" form.

    Used at the beginning of a sentence, the English word "Then" is temporal, as in "Then we left." Since the implicative sense of "then" is so common in mathematics, the temporal sense should rarely be used, to avoid confusion. Usually the temporal "then" at the beginning of a sentence can be changed to "Now" or "Next" with less confusion and essentially the same (and more accurate) meaning, especially in a proof.

    Writing "Let . . ., then . . ." in one sentence is similarly problematic. This construction starts from the correct "If . . ., then . . ." and replaces the first clause with what is grammatically a complete sentence of its own. The content is that of a conditional statement, but it is not written as a conditional statement. This style seems to result from a conscious effort to subordinate language to jargon.

  12. Words of hypothesis: "If", "When", "For", "Since". For ease of understanding, a sentence that begins with "If" should later have ", then" to start the conclusion. The word "then" should not be omitted, and a comma should precede it. The comma can be omitted in a brief implication contained within a clause already set off by a comma, as in "Since f is the squaring function, if x=0 then f(x)=0".

    When readability would be improved by omitting "then", the sentence should instead start with "When" or "For", as in this sentence itself. A comma still follows the condition introduced by "When" or "For". The structure of a sentence beginning with "Since" is like those beginning with "When" or "For"; a comma follows the first clause. After "Since" or "Because", the concluding clause cannot begin with "then" or "so"; "then" is used only with "If".

  13. "As" and "For" introducing reasons. In English, the words "as" and "for" may be used to introduce a reason given after the statement of the conclusion from that reason. For example: "I ate early today, for I was hungry," or "He stopped writing his answer, as time had expired." Banish these uses from mathematical writing; they introduce confusion, especially for non-native readers. "As" also means "like", and "for" is most often used to specify a universe. Compare "The degree is at least one, for a vertex in the neighborhood" with "The degree is at least one, for a vertex in the neighborhood is not isolated." The meanings of "for" differ, but the reader does not discover that until the end of the sentence.

  14. Words of conclusion: "Hence", "Thus", "Therefore". A long proof does not fit in a single sentence; hence often one needs a word to start a sentence that states a conclusion. Among the choices are "Therefore", "Hence", and "Thus". Purists (and copy editors) desire a comma after every such introductory word or phrase (as they do after "Finally", "On the other hand", "In 1965", etc.). This can make language overly formal.

    Among these choices, I treat "Therefore" as the most formal, introducing a major conclusion and hence taking a comma. Because "Hence" and "Thus" are single syllables, I use them without commas to indicate the flow of argument without making the writing choppy. This choice modifies strict English punctuation in the service of mathematical understanding. It is not incorrect to put commas after all these introductory words, but it enhances mathematical communication to omit the commas after short words introducing short conclusions that are just a step along the way. Copy editors put in the commas, and I insist that they be removed again.

  15. "by Theorem X". Consider the sentence "Since G has at least 3n-5 edges, by Theorem X, we know that G is not planar." Does Theorem X imply that G has at least 3n-5 edges or that G is not planar? Since the reader will not know the author's intent, "by Theorem X" should not be placed between a reason and a conclusion without clearer indication of which meaning is intended. Dropping one comma makes it clearer, as in "Since G has at least 3n-5 edges, by Theorem X we know that G is not planar" or "Since G has at least 3n-5 edges by Theorem X, we know that G is not planar". However, the second option can be written better as "By Theorem X, G has at least 3n-5 edges, and therefore G is not planar". Alternatively, a reader suggested using parentheses, as in "Note that G has at least 3n-5 edges (by Theorem X), and hence G is not planar."

  16. "So" and "so that". Because of its other uses in English, "So" is too informal to introduce a sentence of conclusion (with or without being followed by a comma). It is best to reserve "so" for use as a conjunction, like "but": "The graph is connected, so each vertex is reachable from every other vertex." In this usage, no word is needed to introduce the reason that precedes the conclusion. As a conjunction, "so" is preceded by a comma, not a semicolon: "The graph has no odd cycles, so it is bipartite." This form is best used when the conclusion is short. When "so" is used as a conjunction, there is no "that". Thus "We have x²=0, so that x=0" should instead be "We have x²=0, so x=0".

  17. "Such that" vs. "so that". "So that" means "in such a way that". Use "such that" when imposing a condition on an object and "so that" when performing an action in a certain way. That is, "so that" requires a verb and describes how the action is done, while "such that" restricts a noun. Compare "Consider a graph such that no vertex is isolated" and "Color the graph so that no two adjacent vertices have the same color". What follows "such that" modifies "graph", but what follows "so that" describes how the coloring is performed.

  18. "Assume", "Suppose", and "Let". A statement that is assumed is an axiom, considered throughout to be true. Something supposed is a hypothesis. Hence "Suppose" or "Suppose that" is more appropriate to introduce a case or an argument by contradiction. In contrast, "we may assume" introduces a statement that follows from an argument or from symmetry and is henceforth true. I do not really understand the phrase "Assume for a contradiction that"; use "Suppose to the contrary that". Similarly, do not use the incomprehensible "By way of contradiction"; a possibility is "Toward a contradiction, suppose that".

    "Suppose" vs. "Suppose that". After words of hypothesis or conclusion ("suppose", "assume", "implies", "conclude", etc.), use "that" when what follows is a clause with an English verb. Omit "that" when what follows is just a noun unit, such as a notational expression. For example, "Assume the hypothesis" is a complete sentence with an imperative verb and an object. The structure is the same in "Suppose x+y≤10". When an English verb follows, we have "Suppose that f is a proper coloring".

    This distinction is a matter of some debate. Some more formal authors use "Suppose that" when what follows is a formula containing a relational symbol, treating the symbol as a verb. I think it is better to maintain the consistency of treating formulas as noun units. The clarification accomplished by "that" when a verb follows becomes unnecessary when the clause is condensed into notation. When the notation is displayed, its role as a fact (noun) is clearer and makes "that" especially unnecessary; the use of "that" should be the same when the formula is not displayed. A related example is "the case k=2", as opposed to "the case that k=2"; here "k=2" is the case, which is a noun, so there is no "that".

    Writers who always drop "that" from "Suppose that" have a valid point. In spoken English, we usually drop "that" in this context to avoid ponderous language. When the instruction is informal, without abstract concepts, it is reasonable to drop "that". For example, "Suppose the hypothesis is true" would be awkward with "that". Similarly, the very short "Suppose there is" would be awkward with "that" after "Suppose". Here the verb is gone before one even notices it; this is almost like "Suppose [notation]".

    This exception may seem awkward. A better solution when introducing notation is to avoid "Suppose x is" entirely: "Let G be a graph" is better than "Suppose G is a graph". Compare "Suppose x=1" and "Let x=1". The first assumes the truth of an equality and treats the equation as a unit. The second is more active. Because we never say "Let that . . .", we either view "Let" as the verb or view the equality sign as the verb. This usage of "Let" is an exception to the treatment of expressions as noun units; it is not used with inequalities, because an inequality sign would need to be read as the lengthy "be less than or equal to" to become a verb.

  19. Universal quantifiers. The word "any" can mean "some" or "all" in different contexts, so it can be imprecise. It is clearer to use "each" or "every" as a universal quantifier when referring to a singular object.

    Numbered plural variables cause difficulty. In English, "for every two elements" is awkward because "every" is singular. Thus here it is better to say "for any two elements". The presence of "for" is suggestive of the universal quantification and helps avoid ambiguity. Confusion can still arise: consider "Form G' from G by adding an edge joining any two vertices with distance 2 in G." Here some readers will think that only one edge is added.

    Avoiding "any" is not imperative. Evaluate its use in context, making sure to prevent misinterpretation. "Any" is a good substitute for "an arbitrary", and the meaning of "not any" is fairly clear. ("Arbitrary" indicates that all ways of making the choice are allowed.)

    Using an indefinite article ("a" or "an") as a universal quantifier can be dangerous, as in "Prove that a bipartite graph has no odd cycle." Some readers (often students) may interpret "a" as "one" or "some", turning universality into existence. Using "every" is clearer. Putting "must" before the conclusion can suggest universality but is usually unnecessary.

  20. Position of universal quantifiers. In a logical formula, a quantifier specifying the universe over which the formula holds is placed before the formula. In written words, a single universal quantification may read better with the quantifier at the end. This order also better emphasizes the conclusion. For example, one might prefer to change "For every graph G that is bipartite, χ(G)≤2" into "Always χ(G)≤2 when G is bipartite". Similarly, "aᵢ∈S for 1≤i≤n" improves on "for 1≤i≤n, aᵢ∈S".

  21. "Less" vs. "fewer". Use "less" when comparing numbers, and use "fewer" when referring to a set of objects. For example, "the number of edges is less than k" is correct, as is "the graph has fewer than k edges" or "G' has fewer edges than G".

  22. A set differs from its size. Comparing incomparable quantities is often called "comparing apples and oranges". One cannot compare a set with an integer; it is incorrect to write "Sperner proved that no antichain of subsets of an n-set is larger than C(n,n/2)". One must distinguish between a set and its size. Here one can write "no antichain has size greater than C(n,n/2)" or "no antichain has more than C(n,n/2) elements". (Due to the inadequacy of HTML, we use the notation C(n,k) for the binomial coefficient "n choose k".)

  23. "Estimate". Many mathematicians, particularly analysts, use the English word "estimate" as if it had the same meaning as the English word "bound" (both as a noun and as a verb). They write "now we estimate this quantity" when they mean "now we prove an upper bound on this quantity". In English, "estimate" means "approximate"; both upper and lower bounds are needed to give an estimate. This common usage by analysts is incorrect English and does not say what is meant, even when they assume an unstated implicit lower bound of 0.

  24. Possessives on notation. Do not write "Let x and y be v's neighbors"; always use "of" ("the neighbors of v") instead. Similarly, do not pluralize notation by referring to indexed elements or sets together as "the aᵢ's". Usually "each aᵢ" or "a₁,…,aₙ" or some other notation is preferable. Possessives and plurals of this sort should be reserved for informal oral communication.

  25. Nested proofs. Do not nest proof environments. No new proof label should occur before the end-of-proof marker for the current proof.

  26. "Best possible". "Best possible" is an adjective used as a single term; it indicates sharpness. We write "This result is best possible", just as we would write "This result is sharp". "This result is the best possible" indicates that this result is better or more valuable aesthetically than all other results in the world, which is not what is meant. The definite article should not be used here. Think of "best possible" as a technical term that is already a specific predicate adjective, so no definite article is needed.

    The informal phrase "is most likely" is similar to "is best possible"; there is no article because "most likely" is used as a single term. It means that the probability is high, whereas "is the most likely" means having higher probability than any other outcome. Another example is "best practice", which is a single technical term in areas of management. It is used as a single term, without "the". For example, I have seen the title "Best Practices in Online and Blended Learning and Teaching".

    Although "This result is best possible" is a complete sentence, it is somewhat vague, since it does not specify the sense in which the result cannot be improved. Often it is more informative to say something like "the constant in the upper bound cannot be improved". For this reason, some writers suggest avoiding the term "best possible".

  27. Numerals and spelled numbers. In standard English writing, numbers less than 10 usually are spelled in full, while numbers 10 and greater are written in numerals. In mathematical writing, the basis for the distinction is different. Numbers less than 10 are spelled out only when used as adjectives expressing the quantity of objects in a set. They must remain as numerals when designating the value that a quantity equals. For example, "The two vertices both have degree 3" or "A cycle of length 4 has four edges" or "Consider a 4-vertex path" or "Consider a path with four vertices". A reader provided another excellent example; compare the two sentences below:
       Although X is not a cycle, its Betti invariant is 1.
       Although X is not a cycle, its Betti invariant is one.
    The first sentence says that the Betti invariant of X equals 1. In the second sentence, "one" can be read as a pronoun standing for "a cycle", so the sentence says that the Betti invariant of X is a cycle.

    Terminology and notation (especially in discrete mathematics)

  28. Definition symbol ":=". Some mathematicians use this symbol to indicate that the preceding symbol is being defined to mean the subsequent object. If this occurs in a sentence like "Let [n]:={1,…,n}", then the verb states that the notation is being defined, and the special notation is unnecessary. If it occurs in a sentence about the object being defined, such as "Consider a coloring of [n]:={1,…,n}", then it is an improper Double-Duty Definition and should be rewritten: "Consider a coloring of [n], where [n]={1,…,n}." Reading ":=" requires thinking "be defined to be" when preceded by "let", and it requires even more convoluted phrases when placed in a Double-Duty Definition. This awkward notation is never needed and encourages grammatical errors.

  29. "Such that" in set definitions: ":" vs. "|". For many reasons, the colon ":" is a far better choice than the vertical bar "|" to mean "such that" in a "notation/condition" definition for a set. For example, we may write "{3n+1: n∈N}". The vertical bar is heavily used in mathematics, most notably for the size of a set, but also for divisibility and other purposes. Using it for this purpose leads to such messes as "{|A|||A|||B|}", which purports to describe the set of sizes of sets A whose size divides the size of B. The colon is used far less in mathematics. Perhaps the best reason for using the colon, however, is that this mathematical usage is similar to the meaning of the character in English; it separates part of a sentence from some elaboration of that part. Finally, since "such that" is not a binary relation, this usage should be expressed in TeX using "\colon\," instead of ":" (TeX typesets a plain ":" as a relation symbol, with equal space on both sides). As in English, there should be space after the colon but not before it (or at least less space before it).
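A minimal LaTeX sketch of the set-builder recommendation, assuming a standard article class with the amssymb package, which is loaded only for \mathbb. Base LaTeX defines \colon as a punctuation symbol. As far as I know, amsmath redefines \colon with extra space after the colon, so this sketch deliberately does not load amsmath. If amsmath is in use, the trailing \, may be unnecessary.

```latex
\documentclass{article}
\usepackage{amssymb} % supplies \mathbb; amsmath is deliberately not loaded
\begin{document}
% Recommended: \colon has punctuation spacing; \, adds a thin space after it.
$\{3n+1\colon\, n\in\mathbb{N}\}$

% Sizes of sets A whose size divides the size of B:
% the colon separates notation from condition, and \mid expresses "divides".
$\{|A|\colon\, |A|\mid|B|\}$

% Discouraged: a plain ":" is a relation symbol, with equal space on both sides.
$\{3n+1 : n\in\mathbb{N}\}$
\end{document}
```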

  30. Sequences, series, and lists. In mathematics, a sequence is a function whose domain is the set of natural numbers (perhaps with a shift of the initial element). Discrete mathematicians abuse this term by using it for an ordered finite set. A good name for such an object is "list". An n-tuple is a list of length n. It is an abuse of terminology to say "a sequence of length n". (For finite graphs, in particular, "degree sequence" should be changed to "degree list". To avoid this problem, one can sometimes refer to the "vertex degrees" rather than the "degree sequence" or "degree list".)

    The usage of "series" in English is contrary to its usage in mathematics. In English a "series" usually consists of finitely many occurrences in order, as in the "World Series" or the title "A Series of Unfortunate Events". In mathematics a series is an infinite sum. So I believe, at least; one correspondent tells me that a finite sum is also a series, though I would just call it a summation or finite sum.

  31. The second element of a list. The expression "v_1,v_2,…,v_n" for an indexed n-tuple is a style used to suggest that the elements are indexed by the first n positive integers with no skips. However, the second element is not needed, since the default interpretation of "v_1,…,v_n" is exactly the same. By convention, indices in a list are consecutive unless explicitly indicated otherwise. Another reason to eliminate v_2 from the expression is that "v_1,v_2,…,v_n" forbids the case n=1.

  32. A list with relations. The sentence "Let x_1≤…≤x_n be a list of integers" is a Double-Duty Definition; the writer attempts simultaneously to introduce notation for the elements of a list and to impose inequalities on them. The expression "x_1≤…≤x_n" denotes a set of relations, not a list; what is meant is "Let x_1,…,x_n be integers such that x_1≤…≤x_n." (To avoid repeating the notation, write "Let x_1,…,x_n be integers, indexed in nondecreasing order.") Similarly, a chain of sets under inclusion is a list A_1,…,A_k such that A_1⊆…⊆A_k; the expression "A_1⊆…⊆A_k" is not itself a chain.

    Although html does not provide line-centered dots, the ellipsis in an indexed list with relations should be vertically centered on the line ("\cdots" in TeX), while the ellipsis in an indexed list separated by commas should be on the baseline ("\ldots" in TeX).
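The two kinds of ellipsis can be sketched in LaTeX as follows. The sketch reuses the examples of items 31 and 32 and assumes only a standard article class.

```latex
\documentclass{article}
\begin{document}
% Items separated by commas: baseline dots (\ldots).
Let $x_1,\ldots,x_n$ be integers such that $x_1\le\cdots\le x_n$.

% Items joined by relations: vertically centered dots (\cdots).
A chain is a list $A_1,\ldots,A_k$ such that $A_1\subseteq\cdots\subseteq A_k$.
\end{document}
```

If amsmath is loaded, \dotsc (for commas) and \dotsb (for binary operations and relations) express the same distinction by meaning rather than by position.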

  33. Variable equal to list. Many mathematicians write "for m=1,2,…,n" (with or without the "2") to mean "for m∈{1,…,n}" or "for 1≤ m≤ n". The expression "for m=1,…,n" is mathematically incorrect; it sets the value of m to be a list of numbers. The same principle applies to writing "i=1,2" to name two cases; this should be i∈{1,2}.
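A short LaTeX sketch of the preferred phrasings. The quantity a_m is a hypothetical placeholder, used only to complete the sentences.

```latex
\documentclass{article}
\begin{document}
% Discouraged (assigns a list of numbers to m):
%   For $m=1,\ldots,n$, we have $a_m\ge 0$.
% Preferred: membership in a set, or a range given by inequalities.
For $m\in\{1,\ldots,n\}$, we have $a_m\ge 0$.
Equivalently, for $1\le m\le n$, we have $a_m\ge 0$.

% Naming two cases.
For $i\in\{1,2\}$, the two cases are handled alike.
\end{document}
```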

  34. "Big Oh". Common usage of "Big Oh" notation is another instance of setting expressions equal when they cannot be equal. The expression "f(n)=O(n²)" does not mean that the value f(n) equals the set represented by the notation O(n²). What is meant is "f(n)∈ O(n²)"; Knuth has written at length on this subject. An alternative that is roughly correct is to be more informal, writing "f(n) is O(n²)", in which "is O(" can be read as "is on the order of". Since it is convenient to do arithmetic with these classes of functions, this problem will not go away. An unsatisfying compromise is to use the membership symbol where the grammar of computation permits, in order to ensure that the meaning of the concept is understood.
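The following LaTeX sketch spells out the standard definition of O(g(n)) as a set of functions, which is why membership rather than equality states the intended meaning. It assumes amsmath (for \text); f and g are generic placeholders.

```latex
\documentclass{article}
\usepackage{amsmath} % for \text
\begin{document}
% O(g(n)) is a set of functions, so membership is the accurate relation.
\[
  O(g(n)) = \{\, f(n)\colon \text{there exist } c>0 \text{ and } n_0
  \text{ such that } |f(n)|\le c\,|g(n)| \text{ for all } n\ge n_0 \,\}
\]
Hence we write $f(n)\in O(n^2)$.

% Informal alternative, read as "is on the order of".
Informally, $f(n)$ is $O(n^2)$.

% Conventional but literally incorrect: $f(n)=O(n^2)$.
\end{document}
```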

  35. Operators vs. constants. We never use f to denote the value of a function f at a point x. The same principle applies to graph parameters and other operators. For example, the maximum degree of a graph G is denoted Δ(G). Here Δ is a function, not a number, and hence Δ should not be used to denote the value of the function Δ on a particular graph.

    It is tempting for mnemonic reasons to write "We write V=V(G) and Δ=Δ(G)". Admittedly, this usage is not confusing when discussing only one graph at a time; the difference between a graph invariant and a real-valued function is that we rarely focus on the value of a real-valued function at just one point. Nevertheless, it is rare that a paper discusses only one graph, and hence it is better to use V(G) and Δ(G) for objects associated with G. The problem is particularly bad with Δ, since this character also occurs in mathematics as a difference operator. One often sees "Δn" meaning the change in the value of n, so one should not use "Δn" to mean the maximum degree times the number of vertices in a graph. (In my textbook I violated this principle by using n(G) and e(G) for the numbers of vertices and edges in a graph G while using n for the number of vertices of a particular graph and e as a particular edge; the error will be corrected in the third edition.)

  36. Hyphens for parameters. The expression "k connected graphs" refers to k graphs that are connected, in contrast to "k-connected graphs", which are graphs having the property of being k-connected. Note also that an expression involving addition or subtraction used as a parameter modifying a noun should be enclosed in parentheses before the hyphen. For example, write "(k+1)-connected graph", not "k+1-connected graph". This hyphenation rule applies whenever notation is used to modify or further specify the subsequent word. Examples include "k-cycle", "n-vertex graph", "p-group", etc. A related hyphenation issue is particularly important in graph theory. A "k-edge connected graph" is a connected graph with k edges (compare with "n-vertex connected graph"); the meaning is different from "k-edge-connected graph". When the hyphen is missing, "k-edge" modifies "connected graph" because adjectives modify only nouns, not other adjectives. Similarly, a non-specialist reader would think that a "k-edge coloring" is a coloring of k-edges, not a coloring of edges using k colors. This example follows the general rule: "k" modifies "edge-coloring", so the term should be written "k-edge-coloring".
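A LaTeX sketch of these hyphenation rules, assuming a standard article class. In TeX the hyphen belongs outside math mode, because a "-" between dollar signs is typeset as a minus sign. The symbol \phi and the sample sentences are illustrations only.

```latex
\documentclass{article}
\begin{document}
% Parameter in math mode, hyphen in text mode (outside the dollar signs).
Consider a $k$-cycle in an $n$-vertex graph.

% An expression with + or - is parenthesized before the hyphen.
Every $(k+1)$-connected graph is $k$-connected.
% Discouraged: a $k+1$-connected graph.

% Both hyphens: k modifies the compound "edge-coloring".
Let $\phi$ be a $k$-edge-coloring of a $k$-edge-connected graph $G$.
\end{document}
```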

  37. Vertex vs. edge terminology. In graph theory, many fundamental concepts involving vertices have analogues involving edges. Here we do not need "vertex" as an adjective to specify "connectivity" or "chromatic number", but we add "edge" for the analogous edge concept. We then hyphenate "edge-connectivity" and "edge-chromatic number". In both cases, the problem for edges is a special case (for line graphs) of the general problem.

    Furthermore, using the hyphen in the edge context maintains consistency with the needed usage explained in the preceding item. When comparing "edge-coloring" and "list coloring", we are not coloring the lists, so the hyphenation of the term is different.

    The presence of the word "vertex" sometimes becomes an issue. As mentioned above, the fundamental parameters involve vertices and do not require the word "vertex" as a modifier. Similarly, "disjoint subgraphs" share nothing, neither vertices nor edges, so the word "vertex" is unnecessary. The concept "edge-disjoint" indicates a less restrictive condition. Saying "vertex disjoint" suggests vertices as an alternative to edges; it is better just to say "disjoint". Clearly disjoint cycles share no vertices.

  38. Two-word adjectives. Two-word terms used as single concepts to modify nouns must be hyphenated when placed before the noun, such as in this sentence (without the hyphen, we would be discussing two "word terms"). This is particularly necessary when the first of the two words is a noun, as in "vertex-transitive". An especially common instance of this error is "polynomial time algorithm"; "polynomial-time" must be hyphenated when used as an adjective. In contrast, the hyphen is dropped when the expression is not being used to modify something else, as in "The algorithm runs in polynomial time".

    Further examples: "graph-theoretic techniques" and "straight-line drawing". (Without the hyphen, what would a "straight line drawing" or a "straight line segment" be, as opposed to one that is not straight?)

  39. Adverbs and "well-known". Unlike nouns or adjectives, adverbs can modify adjectives. Thus we write "strongly connected digraph" and "simply connected region" without hyphens.

    The adverb "well" is a possible exception. In "well-known theorem" we think of the combination "well-known" as a single technical term, leading to "A well-known theorem is a theorem that is well known." The term "well-defined" also behaves this way. However, opinion on this point is sharply divided; some authors insist that because "well" is an adverb the term should not be hyphenated. Further support for the hyphen: the mathematical usage of "well" in the hyphenated term differs from the English usage of "well" in the unhyphenated expression. A "well defined function" is a function for which we have done a good job of giving a definition, but a "well-defined function" is an object that has been given a valid definition as a function, with every domain element given a unique image.

  40. Notation for paths in graphs. In "x-y path", "x-y" is not a word and has no notational meaning by itself. Even worse, often x-y is treated as a mathematical expression in TeX and is typeset using a long minus sign with extra space around it. The intent is to specify a path with endpoints x and y. Thus x and y are parameters designating a certain type of path. Under the principles of hyphenation above, there must be a hyphen between "y" and "path". Furthermore, the endpoints are independently expressed parameters, with no operation being performed on them. Hence for consistency with other instances of two-parameter terminology (such as "f,g-factor", "x,y-chain"), the correct notation is "x,y-path".
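A LaTeX sketch contrasting the two forms, assuming a standard article class. The sentences only illustrate the notation.

```latex
\documentclass{article}
\begin{document}
% Discouraged: in math mode "x-y" becomes x minus y, with operator spacing.
%   Consider an $x-y$ path.
% Recommended: comma between the parameters, text-mode hyphen before the noun.
Consider an $x,y$-path in $G$.
The same pattern gives an $f,g$-factor and an $x,y$-chain.
\end{document}
```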

  41. Order and size of a graph. Traditionally, the terms "order" and "size" refer to the number of vertices and number of edges of a graph. These terms are not as popular as they once were; "order" has too many other mathematical uses, and "size" without a clear context is confusing, sometimes taken to mean the number of vertices. Some writers thus prefer "number of vertices" and "number of edges".

    It must be admitted that "order" and "size" are quite convenient, while overuse of "number of vertices" and "number of edges" becomes quite awkward. Introducing notation for the numbers of vertices and edges minimizes this difficulty. Unfortunately, there is no generally agreed notation for operators returning the numbers of vertices and edges of a graph G. The only notation that cannot be misunderstood is the absence of special notation: |V(G)| and |E(G)|, using notation for cardinality of finite sets that is standard throughout mathematics. Nevertheless, these expressions seem cumbersome to use repeatedly, so it is often beneficial to write "Let n=|V(G)| and m=|E(G)|."

  42. Graphs are not sets. When h is a vertex in a graph G, it makes no sense to write h∈ G, since the same notation would be used if h were an edge. A graph consists of a vertex set and an edge set; one should write v∈ V(G) and e∈ E(G). This is also a reason why the convenient notations |G| and ||G|| for the order and size of a graph are problematic. When one sees |G|, how does one know whether it is the number of vertices or the number of edges?

    On the other hand, it is true that we write G-v and G-e for deletion of a vertex v or edge e from a graph G, defining "-" in the context of the objects it operates on. In this sense, it would also be legitimate to define the meaning of the cardinality and norm operators when operating on a graph. However, until there is very strong and widespread support for this usage, it seems wisest to stick to the notation |V(G)| and |E(G)| that is understood by all mathematicians.

    When A⊆V(G), it is clear that |A| is the size of the vertex subset A. It can then be useful to define ||A|| to be |E(G[A])|, the number of edges in the subgraph of G induced by A.

  43. Directed graphs and hypergraphs. These models are variations or generalizations of graphs. In a digraph, the edge set consists of ordered pairs. The redundancy of saying "directed edge" or "directed path" or "directed cycle" is not helpful, as it suggests that the digraph contains such objects that are not directed (the term "weak path" is available for a path in the underlying undirected graph). Unnecessarily adding the adjective "directed" in a context where the default should incorporate that notion also prevents making statements that hold in the context of both graphs and digraphs, like Menger's Theorem.

    Similarly, one should not use "hyperedges" to refer to the edges of a hypergraph. Hypergraphs generalize graphs by allowing edges to have arbitrary size. Calling them "hyperedges" eliminates the possibility of saying that graphs arise as a special case, since graphs have edges, not hyperedges.

  44. "Connected components". Unnecessary redundancy has similar disadvantages. We should not speak of the "connected components" of a graph, because there are no disconnected components of a graph. Writing "connected components" suggests that there are components that are not connected.

  45. "Maximal" vs. "maximum". Many mathematicians use these words interchangeably. One can make a useful distinction by using "maximum" to compare numbers or sizes and "maximal" to compare sets or other objects. Thus a maximal object of type A is an object of type A that is not contained in any other object of type A. A maximum object of type A is a largest object of type A; here "maximum" is an abbreviation for "maximum-sized". For example, in a graph we may speak of "maximal independent sets" and "maximum independent sets"; these are convenient terms for distinct concepts that are both important.

    Although this distinction is sensible and has become established in many settings (such as "maximum antichain" and "maximum independent set"), potential confusion can be reduced by using "largest" and "smallest" instead of "maximum" and "minimum". For example, it is harder to misinterpret "a largest matching" than to misinterpret "a maximum matching".

    For consistency, then, one should not write "a vertex of maximal degree" or "the maximal number of edges"; that is, "maximal" should not be applied to numerical values. This is consistent with usage in continuous mathematics, where we write that a continuous function "attains its maximum" on a closed and bounded set.
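A small worked example of the distinction, written in LaTeX. It uses the star K_{1,3}, which is chosen here only for illustration, to show that a maximal independent set need not be a maximum one.

```latex
\documentclass{article}
\begin{document}
Let $G=K_{1,3}$, with center $c$ and leaves $x$, $y$, and $z$.
The set $\{c\}$ is a maximal independent set: every other vertex
is adjacent to $c$, so no vertex can be added to it.
It is not a maximum independent set, because $\{x,y,z\}$ is
independent and larger. In fact $\{x,y,z\}$ is the unique largest
independent set, so the maximum size of an independent set in $G$ is $3$.
\end{document}
```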

  46. Multicharacter operators. A string of letters in notation denotes the product of individual quantities. Therefore, any operator whose notation is more than one character should be in a different font, generally roman. This convention is well understood for trigonometric, exponential, and logarithmic functions, and it applies equally well to such operators as dimension (dim), crossing number (cr), choice number (ch), maximum average degree (Mad), etc.
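A LaTeX sketch of roman-font operators, assuming amsmath for \DeclareMathOperator and \operatorname. The macro names \crn and \Mad are hypothetical choices. The name \cr cannot be used because it is a TeX primitive (the end-of-row command in alignments). \dim needs no definition, since LaTeX already provides it as an upright operator.

```latex
\documentclass{article}
\usepackage{amsmath} % provides \DeclareMathOperator and \operatorname
% Hypothetical macro names. \cr is a TeX primitive and must not be redefined,
% so the crossing-number macro is called \crn here.
\DeclareMathOperator{\crn}{cr}
\DeclareMathOperator{\Mad}{Mad}
\begin{document}
% \dim is predefined in LaTeX as an upright operator.
The operators in $\dim V$, $\crn(G)$, $\Mad(G)$, and $\operatorname{ch}(G)$
are set in roman type.

% Discouraged: italic letters read as the product of c and r.
%   $cr(G)$
\end{document}
```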

  47. "Induct on" and "By induction". The phrase "We induct on n" is convenient but not correct. From given hypotheses, we deduce a conclusion; we do not "deduct" it. Similarly, when we announce the method of induction, we must instead say "We use induction on n." The verb "to induct" is used when a person is inducted into an honorary society, for example.

    A different problem arises in the induction step. When we cite the induction hypothesis, we must write "By the induction hypothesis", not "By induction". To obtain the conclusion for the smaller instance, we are invoking the hypothesis that the claim holds for smaller values; we are not invoking the principle of mathematical induction.

  48. Cliques vs. complete subgraphs. In an earlier era, these terms were used interchangeably in graph theory, but it is more useful to distinguish them. There is a difference between a set of pairwise adjacent vertices in a graph (complementary to an independent set of vertices) and a subgraph isomorphic to a complete graph. Both concepts are needed, and the appropriate terms for them are "clique" and "complete subgraph". Thus "clique" should be reserved for a set of vertices, and then the meanings of "clique of size 5" and "5-clique" (the same) are clear. In the past, "clique" was also sometimes used to mean "maximal clique", which should not be done.

  49. Isomorphism classes vs. subgraphs. A graph is a pair consisting of a vertex set and an edge set. Paths, cycles, and complete graphs are graphs whose edge sets are described in specific ways. The notations P_n, C_n, and K_n do not distinguish a particular set of vertices, and hence in specifying paths, cycles, and complete graphs they must refer to the isomorphism classes.

    Hence we should never write "a P_n" for a member of that class. We can write that a graph "contains a path with n vertices", because that is a structural description of the subgraph, but we cannot write "contains a P_n" or "consider a P_n in G". We can say "contains ten copies of P_n" to refer to subgraphs that are n-vertex paths; each such subgraph is a member of the isomorphism class denoted by P_n.

    Nevertheless, some flexibility is helpful here. When H is the notation for an isomorphism class, we still write "H⊆G" to mean that some subgraph of G belongs to the isomorphism class or is "isomorphic to H" (or "G contains a copy of H"), even though we are not specifying the particular vertices or edges of G used in the subgraph.

  50. Proper coloring. A k-coloring (or k-edge-coloring) of a graph is a partition of the vertices (or edges, respectively) into k classes. In combinatorics generally, a k-coloring of a set partitions it into k classes, arbitrarily. This general concept appears in many areas of mathematics, including Ramsey theory, graph decomposition, and chromatic numbers. In the latter context, a proper [edge-]coloring is one in which adjacent [or incident] elements do not receive the same color.

    Some authors who write extensively about chromatic number and edge-chromatic number drop the word "proper" and use k-[edge-]coloring for the restricted concept. The minor convenience gained by dropping this word is overwhelmed by the negative influence of introducing inconsistency of terminology in combinatorics. Use "proper k-coloring" when that is what is meant. For other variations, such as "acyclic k-coloring" or "dynamic k-coloring", the adjectives replace "proper" by imposing further restrictions on the k-coloring, so the word "proper" is then no longer needed.

  51. Partitions vs. parts. A partition consists of blocks or "parts". Do not use "partition" to refer to the members of a partition. (Students often make this mistake.)

    A bipartition is a partition into two parts. In particular, we say that a bipartition of a bipartite graph is a partition of its vertex set into two independent sets. In the past I used "partite sets" to refer to the parts of such a partition, but there are objections to that term, and students never get it (for example, they refer to one "partite" of a graph, and certainly "partite" is not a noun). Hence I now refer to the "parts" of a bipartite graph. This is a slight abuse of terminology, but I think its familiarity as a word better facilitates discussion.

  52. "Pairwise" and "mutually". Old-fashioned mathematics took the old-fashioned word "mutually" to describe a binary relation satisfied by all pairs in a set, as in "a set of mutually orthogonal Latin squares". In English usage, "mutual" indicates symmetry in a more global way. Hence modern mathematics should avoid using "mutually" to mean "pairwise"; the word "pairwise" states exactly what is meant. The change becomes even more important in light of modern terms like "mutual independence" in which "mutual" explicitly does not mean "pairwise". (Thus "mutually orthogonal Latin squares" is now ambiguous, but we cannot escape the notation "MOLS(n,k)" in design theory.)

  53. Pairwise disjoint/isomorphic. The phrase "Consider disjoint sets A_1,…,A_k" is technically incorrect; we should instead say "pairwise disjoint sets". However, this is a universally understood abuse of terminology, and including the word "pairwise" each time would be ponderous. Hence we understand a family of disjoint sets to be pairwise disjoint. The point is that many binary relations really make sense only in a binary context. This principle extends to other commonly used binary relations that do not make non-binary sense, such as "isomorphic".

  54. Disjoint union vs. join. In much of graph theory, the notation rG indicates a graph consisting of r disjoint copies of G. For consistency, G+H should therefore denote the disjoint union of two graphs G and H. For example, P_{n_1}+…+P_{n_k} denotes a linear forest, consisting of k components that are paths with orders n_1,…,n_k.

    Some authors use G+H instead (or also!) to denote the join of G and H, which consists of the disjoint union plus edges joining every vertex of G to every vertex of H. Other notation has been used, such as G∨H, borrowing the join operation (x∨y) in lattices or logic, but this is not satisfactory. Instead the best notation for the graph join is \diamondplus (unavailable in html?), which overstrikes a diamond and a plus, much like "⊕" except with a rotated square whose corners are at the points of the "+" ("⊕" is unavailable because it represents symmetric difference or binary sum). The \diamondplus is consistent with the Nešetřil notation for graph products: the symbol is a picture of the result of applying the operation to two copies of K_2. In addition, the use of "+" indicates that the number of vertices is additive.

  55. Between. An object that is between two other objects separates them; this is the common mathematical sense of "between". Referring to an edge (or path) with endpoints u and v as an edge "between" u and v is somewhat inconsistent with the rest of mathematics. One can say "an edge joining u and v" instead. In a planar embedding of a graph, an edge shared by the boundaries of two faces is an edge between the faces.

  56. Setminus. The operator \setminus most often denotes difference of sets. Hence it is somewhat misleading or old-fashioned (and looks rather pompous) to use it for deletion of elements, as in "G\setminus e". Use "G-e" instead. Also, the notation G\setminus H is easily confused with G/H (especially by students). Of course, there are some contexts (matroids and various algebraic topics) where these notations have special meanings and are quite important, but for simple set difference A-B is preferable.
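A brief LaTeX sketch of the recommended forms, assuming a standard article class. G, e, v, A, and B are as in the text.

```latex
\documentclass{article}
\begin{document}
% Deleting an edge e (or a vertex v) from G: an ordinary minus sign.
Let $G-e$ be the graph obtained from $G$ by deleting the edge $e$,
and let $G-v$ be obtained by deleting the vertex $v$.

% Simple set difference.
The set $A-B$ consists of the elements of $A$ that are not in $B$.

% Discouraged for these purposes: $G\setminus e$ and $A\setminus B$.
\end{document}
```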

  57. "Left hand side". There is no "hand side", so this expression makes no sense. Even if one correctly hyphenates to make it "left-hand side", there is still no "hand". Just write "left side".

    English usage in mathematical writing

  58. Introductory words. Words or phrases like "nevertheless", "for example", "to the contrary", and "on the other hand" usually should be separated by commas from the rest of the sentence. Introductory prepositional phrases are a bit different. I am told that a phrase with one preposition ("In 1995") does not require a comma, but a phrase with two prepositions ("In August of 1995") does. Another reader tells me that an introductory phrase with at least five words (perhaps we should say five syllables) should be followed by a comma. I would use the comma unless the intent is to lead into what follows as a single thought (see Hence/Thus/Therefore).

  59. Quotations and ends of sentences. Traditional American style places periods and commas inside closing quotation marks, even when they are not part of the quoted material. My understanding is that this convention arose from the technical aspects of printing: its purpose was to lessen the danger of breakage of fixed metal type in printing presses. In the era of electronic publishing of mathematics, this justification is obsolete, and we can replace the convention with logical punctuation. When the material being quoted is treated as an item within the sentence and is not itself a sentence, the terminal punctuation logically comes outside the quotation marks. Copy editors trained in literary punctuation still object to logical punctuation but should be overruled.

  60. "Which" vs. "that". The following two sentences have different meanings:
    1) "She will attend our meetings that concern calculus."
    2) "She will attend our meetings, which concern calculus."
    Sentence (1) states that among our meetings, she will attend those concerning calculus and perhaps no others. Sentence (2) states that all the meetings concern calculus, and she will attend them all. In common English, the distinction is perhaps even clearer: compare
    1) "I have two shirts that need cleaning."
    2) "I have two shirts, which need cleaning."
    In (1), two of my shirts need cleaning. In (2), I have only two shirts.

    When the phrase after the relative pronoun specifies a further restriction of the class that has just been introduced, the correct pronoun is "that", and the subsequent phrase tells which of the items in the class are those being discussed. If the subsequent phrase speaks about the totality of the class, then the proper pronoun is "which". When "that" and "which" both seem usable, use "that" when the sense is "having the property that", and use "which" when the sense is "all of which" or "the only one of which". Usually a comma is appropriate before "which". Usually "that" is correct when an indefinite article ("a" or "an") has been used on the word being modified. Beware: This distinction is not made or is made the opposite way in British English. Some American style manuals don't care, but in mathematics there are two distinct meanings to be expressed.

  61. Immediacy of antecedents. When using "which", "that", "where", or other words to introduce explanatory or descriptive phrases, the subsequent phrase modifies the most recent item. For example, "an embedding of G on a surface which has no crossings" indicates that the surface has no crossings, not that the embedding has no crossings. Making the comment on crossings apply to the embedding requires rewriting: "On a specified surface, consider an embedding of G, which has no crossings". Here "which" is proper because every embedding has no crossings; an embedding is a drawing that has no crossings.

  62. The naked "This". When "This" is used as the subject of a sentence, its antecedent is the most recent noun. If the desired antecedent is the preceding paragraph or some other object, then a noun should be inserted, as in "This discussion implies" or "This inequality implies" instead of merely "This implies". One way to understand this issue is to view "this" only as an adjective, not as a pronoun.

  63. "Every", "distinct", and "unique". The word "every" is singular; it means "each one". Because of this, we write "all values" or "every value", not "all value" or "every values".

    The word "distinct" has the same meaning as "different". Two things can be distinct, but one thing cannot be distinct. Thus the sentence "Every value is distinct" is incorrect; it has no meaning. Many beginning students think it means that each value is different from every other value, but it does not.

    The word "unique" indicates that there is only one of the items being described. It does not mean that this item is different from other items. Some students think that "The function f maps the points in A to unique points in B" is a statement that f is injective, but it is not. Every function from A to B maps each point in A to a unique point in B.

    The distinction between the words "distinct" and "unique" is made clear by a typical boast on the World Wide Web. The sentence "Our website has one million unique visitors" makes no sense. The intent is to say that among millions of hits there are one million distinct visitors; if there is a unique visitor, then there is no other visitor.

  64. Contractions. Because mathematical writing is formal, contractions ("can't", "won't", etc.) should be avoided. They introduce a sudden informality that is inconsistent with the tone of proof.

  65. "I.e." vs. "e.g.". "I.e." and "e.g." are abbreviations for Latin phrases. "I.e." means "that is" and is used to introduce an explanation or restatement of what came before. "E.g." means "for example" and introduces an example. In formal mathematical writing, abbreviations (except as notation) are like contractions; it is better to avoid "i.e." and "e.g." altogether. In addition, the expressions "that is" and "for example" provide better visual separation than "i.e." and "e.g.".

  66. "Different than". It is not correct to write "A differs than B", and for the same reason it is not correct to write "A is different than B". The correct wording is "A is different from B". The incorrect wording is modern American laziness.

  67. Abstract nouns and articles. Nouns that specify abstract concepts rather than objects need no articles. For example, "graph colorability" is an indefinite concept, so we do not say "Next we discuss the graph colorability". In contrast, "chromatic number" may be abstract or specific. We may say "Next we discuss chromatic number", referring to the general concept, or "Next we discuss the chromatic number of this graph", since this graph has only one value as its chromatic number.

    Functions or parameters assign a number to each domain object. The resulting value is specific for the object; there is only one choice for it. Hence we do not say "the graph has a chromatic number 3" or "the vertex has a degree 3". These sentences suggest that the object may have more than one value of the parameter. The answer to the question "What is the degree of this vertex?" may be "This vertex has degree 3", but it cannot be "This vertex has a degree 3".

    We also do not say "This vertex has the degree 3", although "The degree of this vertex is 3" is correct. Consider the sentence "Every graph has an even number of vertices with odd degree, which means that the list of vertex degrees has even sum." The term "even number" takes the article "an" because we are saying which type of number is being used (it is one of the even numbers). The later "odd degree" and "even sum" do not, because these are properties that the vertices and the list do or do not satisfy. Articles are inappropriate when invoking a property.

    Articles also are not used with conceptual nouns. Compare with familiar conversation: we say "This chair has value $100" and not "This chair has the value $100." "Value" and "degree" are abstract properties. Here is another non-mathematical example: We say "I receive compensation for my work," not "I receive a compensation for my work." Compensation is an amount, but here only the abstract concept of receiving compensation is meant, not some number of things. Hence we do not use an article.

    Similarly, abstract properties do not take articles. We say "because transitivity of A implies transitivity of B", not "because the transitivity of A implies the transitivity of B". The property in question is "transitivity", not "the transitivity".

  68. Possessives and titles. The definite article "the" specifies uniqueness. Possessives also play this role. It is incorrect to use both together, because the possessive already provides definite specification. For example, we write "Greene's Theorem" but not "the Greene's Theorem"; this is a theorem proved by Greene, not by "the Greene".

    When discussing a result by two authors, we cannot put possessives on both names, because there is only one object (compare with "Greene's and Kleitman's theorems"). Making only the second name possessive would be correct in English grammar but poor mathematical style ("Greene and Kleitman's Theorem", like "Dick and Jane's house"). The main reason this is poor mathematical style is that the entire phrase is a title. Hence we write "the Greene--Kleitman Theorem", like "the Woolbright-Abernathy House". Here "the" serves as a definite article for the unique object with the title "Greene--Kleitman Theorem". When the result is less celebrated and not known by its authors' names, one can indicate the possessive by "of", as in "the theorem of Greene and Kleitman".

  69. Capitalization of titles. In the examples above, "Theorem" is capitalized. When there is only one instance of an object, and the name of it involves a person, it plays the role of a proper noun and its name is a title. Another example is "the Cauchy-Schwarz Inequality". Although some style sheets vote against capitalization based on "common usage", failing to capitalize does not reflect the true meaning and seems to be mostly a matter of laziness. We would have the Chinese remainder theorem (as opposed to the French one), the Hungarian method (as opposed to the Austrian one), and the mean value theorem (as opposed to the nice value theorem).

  70. Adjectival forms of names. Some graph theorists use "Hamilton cycle" to mean a spanning cycle in a graph, but they would never say "Abel group". When describing a type of cycle, the modifier should be an adjective, so it is better to use an adjectival form of the name when that is available: "Hamiltonian cycle". The same applies to "Euler circuit" and "Eulerian circuit". However, some uses of names as adjectives are heavily ingrained and unchangeable, such as "Fibonacci numbers" and "Catalan numbers". In these examples adjectival forms are not readily available (we do have "Eulerian numbers").

  71. Conjunctions and commas. Punctuation shapes sentences; commas encourage the reader to pause at places where doing so aids understanding. Missing commas may require the reader to stop and re-read in order to understand what has been said. Excessive commas delay the reader and impede the flow of logic.

    Two clauses (in essence, two complete sentences) may be combined using a conjunction; the conjunction must be preceded by a comma. Examples of conjunctions are "and", "but", "then", and "so" (the last two should be treated as conjunctions in mathematical writing). Since a conjunction joins two things, sentences should not begin with these words. This is a logical approach that helps keep writing clear, though strict English usage (especially British) may call some of these words adverbs. See further comments on the use of "then" and "so".

    Exception. The situation is more complicated when the second clause itself contains a conjunction. Compare "If A, then B holds and C holds" with "If A, then B holds, and C holds". In the first sentence, it is clear that A implies both B and C. The proper grouping or meaning in the second sentence is unclear. Since we only have one comma symbol and don't parenthesize sentences to indicate grouping, a short conjunction of two sentences within a larger conjunction is written without a comma.

  72. Semicolons. Compound sentences consist of two complete sentences with no conjunction separating them. This form is used especially when the second part clarifies or comments on the first. Such sentences need a semicolon (not a comma!) to separate the two parts; this sentence is an example. Do not use a semicolon before a conjunction; in particular, there should never be a semicolon before "and", "but", "then", or "so".

  73. Excessive commas. A clause requires a subject and a verb. When "and" joins two parts of a sentence that do not both stand on their own with a subject and a verb, there should be no comma before it. The comma in "We will prove the lemma, and then the theorem" is incorrect; one must delete the comma or add a subject and verb to the second part. This example came from a newspaper: "In February, the graduate student in Electrical and Computer Engineering, was awarded the A--B--C Prize"; here there should be no comma before "was". (See further examples of excessive commas in Definitions.)

  74. Serial commas. A serial comma is a comma after the next-to-last element of a list. Wikipedia gives a discussion of its use. It is generally safest to use a comma in this situation. For example, compare "Under the conditions 1 ≤ i,k ≤ r and m even" with "Under the conditions 1 ≤ i, k ≤ r, and m even"; the two phrases have different meanings. The first imposes two conditions (both i and k lie between 1 and r, and m is even), while the second imposes three (1 ≤ i, k ≤ r, and m is even). The issue arises often when listing three mathematical objects or three authors. (For a list of lists, clarity can be achieved by using semicolons or by changing "and" to "&" within list items.)

    One reason for using the serial comma in lists is to avoid confusion in sentences that do not contain lists. Consider the sentences "Like a, b and c have the same property" and "Later, Early and Jones proved the conjecture". These are not lists, and using a comma would be wrong, but when a document does not use serial commas these examples initially appear to be lists. Similarly, in that context an item in a list that itself joins two subitems with "and" looks like the last two items in a list.

    Omitting the serial comma can also cause confusion mathematically, as in "The value of f is positive at 2, negative at 1 and 0 at 0."

  75. Appositives. An appositive is a noun or noun phrase that renames or substitutes for another noun or noun phrase immediately preceding or following it. It can be recognized by the fact that omitting it would yield a clear and complete sentence and that the additional information in it is not grammatically essential to the statement being made. It should be set off by commas: "His book, the best book on the subject, took years to write." An appositive in the middle of a sentence cannot have a comma on only one side.

    When an appositive is short enough or contains essential information, the commas are omitted: "My friend Bob is a student." In mathematical writing, a similar situation applies when notation is introduced: "The degree d(v) of a vertex v is the number of neighbors of v." Here "d(v)" is a brief appositive. One could argue that the notation for "degree" is not essential to the sense of the sentence, but putting commas around very short appositives can produce very choppy sentences. A speaker need not pause for such appositives, and hence one may omit the commas.

  76. Passive voice. Good writers of English minimize the use of passive voice. This accepted principle applies also in writing mathematics. Active verbs make the exposition more engaging; for example, "It suffices to show" is preferable to "It is sufficient to show". Nevertheless, judicious use of the passive voice can be appropriate.

  77. "Above" and "Below". These words are adverbs; they do not directly modify nouns. Hence "the above graph" and "the below figure" are incorrect. We can write "the graph above" or "the figure below" as a short form for "the graph shown above" or "the figure located below".

  78. "Either". The word "either" indicates exclusive "or". If the alternatives are mutually exclusive by definition, then "either" is unnecessary.

  79. "We have been proving". Do not use the "perfect" tenses, which involve the helping word "have". Phrases like "In Section 3 we have been analyzing" or "in [4] we had shown" are either grammatically wrong or confusing. The simple tenses, as in "In Section 3 we analyzed" or "in [4] we showed", are almost always better. Even the future can be eliminated: "in Section 4 we show" rather than "in Section 4 we will show"; the justification for this is viewing the entire article as a unit, happening in the present.

  80. Words containing "non". When a word in English initially has a negation introduced by prefixing "non", the resulting word is hyphenated. The initial sense is the negation, so the hyphen is appropriate. As decades pass and the word is accepted on its own, it becomes a positive concept incorporating the "non". This shift, together with familiarity, leads to dropping the hyphen. Some of the most familiar examples in mathematics are "nonsingular", "nontrivial", "nonzero", and "nonconstructive". Adding hyphens to these words is now jarring to more readers than is the absence of hyphens. I also use "nonempty", "nonnegative", "nonneighbor", and "nonadjacent". However, I would keep the hyphen in "non-word" and "non-edge", for clarity and infrequency.

  81. Placement of citations. The numeral indicating an item in the bibliography should appear immediately after the name(s) of the author(s), not after the statement of the result.

    Mathematical English for non-native speakers

  82. "Bound of". Many non-native speakers use "bound of" when they mean "bound on". If x ≤ k, then we have an upper bound of k on x. Using "bound of x" for "bound on x" can become confusing when comparing parameters. We do not want to say that the maximum degree Δ(G) is a bound of the chromatic number χ(G); when Δ(G) = k we want to say that Δ(G) establishes a bound of k+1 on the chromatic number of the graph. (Writers from Asia typically overuse the preposition "of" where another preposition, such as "on", "for", or "about", would be more accurate.)

  83. "few" vs "a few". In English, "few" means "not many", while "a few" means "several". The sentence "In this paper we prove few good results" means that the paper does nothing worthwhile, while "In this paper we prove a few good results" means that it is worth reading.

  84. "Usual". It is a quirk of English that the word "usual" as an adjective usually requires the definite article "the". We cannot say "In this section we consider only usual chromatic number"; it must be "In this section we consider only the usual chromatic number". (This is a common error by speakers of languages that do not have articles.)

  85. "Partial case". In English, we do not say that one result is a "partial case" of another. We say that it is a "special case". (This is an issue of translation from one language to another.) However, it is correct to say that proving a special case of a conjecture is a partial result.

  86. "Pass" vs. "Pass through". In English, the word "pass" means "go by without entering". Thus a path that passes a vertex does not visit that vertex. To say that path P visits vertex v, one should say that P passes through v (this is a language translation issue). Better yet, just use "visits".

  87. "Can not" and "may be". It appears that some writers of English now use "can not" to mean "cannot". In speech the two cannot be distinguished, so it doesn't matter, but in written mathematics we should avoid ambiguities. The logical meaning of "can not fail" is "may possibly succeed", while "cannot fail" means "must succeed". At the very least, when "can not" is used to mean "cannot" it can be read to have a different meaning, so it is better to use "cannot" to eliminate the ambiguity. As in many of these items, the abstractness of mathematical statements, in contrast to the everyday context of the English language, demands a higher level of precision and avoidance of ambiguity.

    The expression "may be" does exist in English when used as a verb, as in "It may be true" or "This may be the only component". However, when it appears at the start of a clause, most likely the word "maybe" is intended, as in "Maybe this proof will work." In this situation there is another verb ("work"), and the initial expression means "Possibly", which is not a verb.

  88. "A joint work". We speak of "a theorem" or "a result", since these are specific items, but "work" is an abstract noun and does not take the indefinite article "a". We say simply "This is work of mine", not "This is a work of mine", and "This is joint work with my colleague", not "This is a joint work". This usage of "work" is different from "a work of art" or "the complete works of Shakespeare". In mathematics, "work" is equivalent to "research"; we do not say "this is a joint research". The same error occurs in "I will have a limited access to my email."

  89. "Evidently". Some non-native speakers write "Evidently" to mean "Clearly". Although this word is not technically incorrect, it has other connotations to native speakers. Since native speakers themselves would always write "Clearly" and never "Evidently", they are left unsure of what the writer means. Always change "Evidently" to "Clearly". ("Evidently" is quite close to "apparently", which in American English means "seems to be true" rather than "is true".)

    Similarly, "as evidenced by" generally is not used in English; change to "as shown by".

  90. "Principal" vs. "principle". "Principal" is an adjective meaning "foremost", used mathematically in "principal minor". "Principle" is a noun similar to "idea" or "method", as in "the Pigeonhole Principle".

  91. More excess commas. There is no comma before or after "is" in a definition. There is no comma between "show" and "that".

  92. More expressions not used in English.
    "discuss about" ⇒ "discuss".
    "studied about" ⇒ "studied".
    "equals to" ⇒ "equals" or "is equal to".
    "contradicts to" ⇒ "contradicts".
    "necessary conditions of" ⇒ "necessary conditions for".
    "to precise" ⇒ "to make precise" ("precise" is not a verb).
    "a same argument" ⇒ "the same argument" or "a similar argument".
    "decompose to" ⇒ "decompose into".
    "joint" (as a verb) ⇒ "join" ("joint" is not a verb).
    "specially" ⇒ "especially" or "special" ("specially" means "for a special purpose", which is rarely what is intended).
    "usual coloring" ⇒ "ordinary proper coloring".
    "We pick up" ⇒ "We consider".

Other material on mathematical writing

To be clear, I do not make positive or negative comments on the material cited below. I merely offer these alternatives to show the variety of opinions on the subject and to indicate that this is an important topic on which people have strong opinions.