1 Introduction
Creditworthy, non-creditworthy. Malignant, benign. Likely to reoffend, not likely to reoffend. Increasingly, such classification is done computationally, “by machine”. Available to the machine is a large dataset with records labelled as instances or non-instances of the class of interest. Creditworthy, non-creditworthy, creditworthy, and so on. After some specified degree of accuracy is attained in classifying creditworthiness or malignancy or recidivism on the basis of the further features of the records, the machine (or, more precisely, the machine learning model) is said to have “learnt the concept.” Having learnt the concept, the model can then be used to classify further concrete instances. Miriam Joseph is sentenced to prison. Luyanda Moyo gets a loan. Mo Patel is diagnosed with a malignant tumour and is informed that he has six months to live.
The above is stated simplistically. The ways in which such classificatory models play a role in decision-making nowadays are complex. There is increasing sensitivity to the dataset and its appropriateness, both epistemically and ethically. It is, for example, commonly recognised that such classification can perpetuate injustice, if the dataset on which the classification is based itself describes an unjust state of affairs (see, for example, discussions by Hajian et al. (2016) and Hagendorff (2021)). Similarly, there is debate over the replacement of interpersonal exchange with the impersonal decision-making based on such classificatory models (for discussion of this within the Kenyan context see the chapter by Emma Park and Kevin Donovan in this volume). Furthermore, decision-making is informed by such classification in different ways and to different degrees. Within financial contexts, for example, it is common for a model to classify in a binary way (an applicant is either creditworthy or not in relation to a particular loan application) and for a decision to be based solely on this classification. In judicial contexts, by contrast, classifications tend to be more fine-grained (a convicted person is classified according to a ranking or score of likely recidivism), and to feature as one among a number of considerations that a judge will take into account in reaching a decision. Similarly, in medical contexts, practitioners will consider the classification generated by such a model as one of multiple factors that inform a final diagnosis. Nevertheless,
In the cases in which human beings do this classificatory work, there is, both historically and presently, great variation in the circumstances, languages, practices, and behaviours that accompany such classification. The oncologist in the United States, for example, is constrained by complex health insurance regulations, with a diagnosis being captured on a large database system integrated with a patient’s insurance benefits and other membership programmes. The oncologist at a public hospital in Johannesburg will perhaps examine many more patients per day than their American co-practitioner, with a diagnosis discussed with the patient in more than one of the country’s many official languages.
Nonetheless, we take these human experts to be capable of such classification across these differences in circumstance, practice, and behaviour. We take the oncologist in the United States and the oncologist in Johannesburg to be competent at classifying instances of malignant and benign tumours. Part of what characterises these experts across this variation is that they can be said to possess the relevant concept(s)—and indeed to possess the same concept(s). Though the oncologist in the United States says, “malignant tumour” and the oncologist in Johannesburg says, “isimila esibi,” their terms are translatable. The two oncologists possess the same concept of a malignant tumour and the instances that they classify are instances of this concept. The relevant members of staff at financial institutions, whether in the Northern or Southern Hemisphere, possess the concept of creditworthiness. Judges deciding on the parole of an offender in Borneo and in Japan possess the concept of recidivism.1
Precisely what such concept possession consists in and how it relates to classification are substantive questions. Supposing, for the moment, an intuitive understanding of “concept,” possessing a concept is a necessary, though not sufficient, condition for the expert’s correct classification of instances of the concept. That is, such classification is not possible without possession of the concept, but might well require more besides. Intuitively, an oncologist who does not possess the concept of a malignant tumour cannot diagnose one when they come across the relevant lung X-ray, even if they can visually distinguish the relevant X-rays from one another. Similarly, a judge who does not possess the concept of recidivism cannot take likely recidivism into account as a consideration in deciding whether to grant parole.
In what follows, the focus will be on this point of comparison. Classification with a high degree of accuracy is possible both in the case in which the human expert does the classificatory work and in the case in which the classification is generated by a machine learning model. In the former, we take the expert to possess the relevant concept(s). Indeed, despite the variation in circumstance, language, practice, and behaviour that accompanies such classification, and despite the concerns of Willard Van Orman Quine, we take experts to possess the same concept and this variation to be translatable by virtue of this concept. In the case in which the classification is generated by a machine learning model, we similarly find variation, and indeed greater variation, in the circumstances, “language,” and behaviours accompanying this classification. The expert in Johannesburg and the one in the United States behave differently from one another, but the behaviour of the machine differs yet more from that of both experts, though all three (South African expert, American expert, and machine) classify with a high degree of accuracy. Is there any sense in which the machine can equally be said to possess the relevant concept(s)? That is, to what extent, if any, can a machine that has attained the relevant degree of accuracy in classification be said to possess the relevant concept(s)? And, correlatively, is there anything additional, and of significance, in the case of the expert possessing the relevant concept(s) that is not to be found in the case of machine-generated classification?
2 Cartesianist Concept Possession
At a university in Johannesburg, at a time when the globe is slowly recovering from the peaks of a world-wide pandemic, a faint scratching sound is heard along the corridors of a building on the western side of campus. The scratching grows louder and then stops. On the same floor, students in computer science laboratories discuss an assignment. The assignment mentions a binary classification algorithm. One student works on their training data. Rabbit, non-rabbit, rabbit, non-rabbit… A faint scratching sound is heard. Another student awaits results. The algorithm has run for two days. The scratching grows louder. A rabbit scurries by. “Rabbit!” a student outside the lab shouts and points. The machine says, “gavagai.” “Ha-aaa. My results are good enough. I am going. It has learnt rabbit.”
A central branch of machine learning is the learning of supervised classification tasks. In such tasks, the machine learning program or algorithm is provided with examples and non-examples, labelled by humans, of the concept to be learnt. If the program is able to classify to some specified degree of accuracy, the machine is said to have learnt the concept. In the case of human concept possession, the human who shouts “rabbit!” as a rabbit scurries by is understood to possess the concept rabbit and their utterance to have a referential relation to the scurrying rabbit.
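To make the machine learning usage concrete, here is a minimal Python sketch of a supervised binary classification task. Everything in it is hypothetical: a single feature (height in metres), a handful of invented labelled examples, a one-threshold learner, and an assumed accuracy requirement of 0.9. It illustrates only the criterion described above: the program is said to have “learnt the concept” once its accuracy on labelled examples it was not trained on reaches the specified degree.

```python
# HYPOTHETICAL labelled examples: (height in metres, label), 1 = rabbit.
train = [(0.25, 1), (0.30, 1), (0.28, 1), (0.45, 0), (0.60, 0), (0.90, 0)]
# Held-out labelled examples, not used during training.
test = [(0.27, 1), (0.29, 1), (0.55, 0), (0.75, 0)]

REQUIRED_ACCURACY = 0.9  # the "specified degree of accuracy" (assumed value)

def predict(t, height):
    """One-threshold rule: classify as rabbit iff height <= t."""
    return 1 if height <= t else 0

def accuracy(t, data):
    """Fraction of examples whose predicted label matches the human label."""
    return sum(predict(t, x) == y for x, y in data) / len(data)

def fit_threshold(data):
    """Choose, among the training heights, the threshold with the highest
    training accuracy (ties go to the smallest threshold)."""
    return max(sorted(x for x, _ in data), key=lambda t: accuracy(t, data))

t = fit_threshold(train)
test_acc = accuracy(t, test)
learnt = test_acc >= REQUIRED_ACCURACY
print(f"threshold={t}, test accuracy={test_acc:.2f}, 'learnt the concept': {learnt}")
```

Nothing in the sketch refers to rabbits beyond the labels a human attached; the learner only finds a rule that reproduces those labels well enough.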
Nahhh, man, the machine hasn’t learnt rabbit. My little sister has learnt rabbit, but that’s because she can think, man. She has some idea in her head and the idea is about rabbits and when she sees a rabbit, the idea comes into her head. And she can think about rabbits, and dogs, and cats, and mice, and when she’s thinking about them, she’s thinking about them as different things. But my baby brother, he’s really small. He hasn’t learnt rabbit, because he can’t think of rabbits and dogs as different things. But he will. Because he can think. That machine… that machine has not learnt rabbit. It won’t ever learn rabbit. That machine can’t think.
The taller student of the two, Siya, is an applied mathematics student who has recently joined the university from Soweto. Aside from being nicknamed “The Brain” at school, Siya was known for her scepticism. No matter the topic (science, religion, or even politics), Siya would not believe what anyone said without getting to the bottom of it herself, and that included Kabelo’s claims about machines learning concepts. Kabelo, the shorter student of the two, knows this only too well. On a scholarship from Limpopo, he is in the first year of a newly established Master’s degree programme in data science. His mother would tell him that his tongue is too quick: “Leleme ha le na malokeletso” (The tongue has no fastenings). Kabelo’s tongue did not often have fastenings. He incites Siya’s scepticism with his claim that his computer can learn: “Ha-aaa. No, no, no. You’re right, man. The machine can’t think, as in think, but it’s learnt the concept. If it gets a rabbit, it mostly spits out rabbit. If it doesn’t, it mostly doesn’t.”
According to a recent account of human concept possession, a subject S can be said to possess a concept C if he or she is “able to think about Cs as such” (Fodor 2004, 31). More specifically, thinking about Cs as such involves having a mental representation, C, that has a nomological or lawlike relation to Cs. Suppose, for example, that as Siya arrives at Kabelo’s home to visit him over the July holiday, there is a dog running in the street. Siya thinks of an incident in her childhood and of how she now dislikes all dogs everywhere. The representation in Siya’s mind at this point, the representation dog, has a nomological or lawlike relation to dogs in general or as such. The representation, we might say, “covers” or “extends” to the dog in the street in front of Siya. It also covers the dog of the incident in her childhood, the dog running down 7th Street in Gqeberha, the dog that will be born exactly 102 days from now, and so on. This relation between the concept dog in Siya’s mind and each of these dogs (and indeed all dogs) is lawlike.
This account, Cartesianism, when conjoined with most popular accounts of the mind, would seem to imply that a machine or computer like Kabelo’s that has, in machine learning parlance, “learnt the concept rabbit” has not, in literal parlance, learnt or come to possess any concept. Most popular accounts of the mind (or the mental) do not straightforwardly deny the possibility of a machine’s having mental representations of one or another sort. That is, it is compatible with these accounts that a machine could possess a mind and
3 Pragmatist Concept Possession
Cartesianism is not, however, the only available account of human concept possession. According to a more popular rival account, Pragmatism, a subject S can be said to possess a concept C if he or she is “able to distinguish Cs from non-Cs” (Fodor 2004, 31; see also discussion by Bradley Rives (2009a, 2009b) and Victor Verdejo (2013)). Siya possesses the concept dog if, when seeing the dog in Kabelo’s street run after a feral cat, she can distinguish the dog from the cat (and from Kabelo the human, and so on).
Prima facie, Pragmatism would seem to imply that a machine like Kabelo’s has learnt or come to possess the concept. In the running of the algorithm, the machine has come to be able to classify to some specified degree of accuracy further examples as examples of Cs or non-Cs. Successful approximation of the function just is for the machine to be able to do this classificatory work and so is ex hypothesi for it to be able to “distinguish Cs from non-Cs”.
It is worth noting here that the machine having only attained a specified degree of accuracy need not rule out possession of the concept C: in the case of the person who possesses the concept dog, it is consistent with the Pragmatist’s claim that the person fails, in some subset of cases, to distinguish dogs from non-dogs.3 The condition for possessing the concept C, according to
“Hah. Ah, man. I ain’t never eaten rabbit.”
More specifically, according to Pragmatism, to distinguish Cs from non-Cs is not simply to be able to distinguish or classify some further examples as examples of Cs or non-Cs, but is to have a richer set of epistemic capacities: a subject S possesses a concept C if S is disposed to draw (or otherwise to acknowledge) some of the inferences that contain that concept (Fodor 2004, 33). Suppose, for example, that while talking that afternoon Kabelo tries to persuade Siya that, since Siya likes any animal that is a mammal, Siya should also like dogs. Suppose that Siya has no disposition to draw this conclusion, but that she is disposed to draw the analogous conclusions about, for example, cats. In this case, we might deny that Siya possesses the concept dog (though we would affirm that she possesses the concept cat).4
On this fuller construal, then, Pragmatism would imply that a machine or computer like Kabelo’s that has, in machine learning parlance, learnt the concept rabbit has not, in literal parlance, learnt or come to possess any concept. The machine can classify further examples accurately, but it is not disposed to draw these further inferences, or indeed any further inferences at all.
It is worth here emphasising the distinction between Cartesianism and Pragmatism. Both are accounts of concept possession according to which concepts play a role in, and thus require, thought or thinking. According to Cartesianism, to possess a concept C is to be able to think about Cs as such. According to Pragmatism, to possess a concept C is to be disposed to certain inferences, which themselves seem to require thought. The two are distinct, however, in the way in which thinking or thought is appealed to and presupposed. Pragmatism is an epistemic account and whether or not some subject
“Woah, woah, woah, man. I thought we admitted that computers can’t think. And so they can’t do this reasoning thing that you’re talking about. But telling rabbits from non-rabbits, learning a concept, isn’t thinking or reasoning, it’s being able to behave in an input-output way. Like I said, if it gets a rabbit, it mostly spits out ‘gavagai’. If it doesn’t, it mostly doesn’t.” In defence of his machine learning model, Kabelo’s tongue still has no fastenings.
4 Quinean and Type-Relative Quinean Concept Possession
A further account of concept possession that might be proposed, one far more minimal than both Cartesianism and Pragmatism, is what we might term a “Quinean” account.5 According to this account, concept possession does not in any way consist in or require that the possessor of the concept has a mind, can think, has mental representations, or draws inferences. According to our Quinean account, a subject S possesses the concept C if he or she is disposed to exhibit some set of verbal behaviours B in response to Cs. For example, if Siya is disposed to say the word “dog” when asked “what is that?” by someone who points to a dog, and to answer “no” when asked “is this a dog?” by someone who points to anything that is not a dog, and so on, then Siya can be said to possess the concept dog.
It is worth noting the sense in which the account is a naturalist one. The account appeals only to what we might term “natural” properties. Such properties are those that might be described in the various physical sciences, and whether or not such a property is instantiated can be determined by observational means. For example, biological properties such as being a cell of a certain sort or physical properties such as being larger than one cubic centimetre are natural properties, while the properties of being morally good and being a mental representation are not.
The properties mentioned in the Quinean condition for concept possession similarly are those described in the physical sciences and are those whose
Under this much weaker account then, a machine like Kabelo’s could perhaps be said to have learnt or come to possess the concept. As emphasised, the Quinean account is not committed to thought as a requirement for concept possession. Indeed, the account eschews all talk of the mental as internal to the subject. All that is required, according to the account, is that the machine is disposed to exhibit some set of behaviours in response to Cs. The machine does indeed, as Kabelo notes, exhibit some set of behaviours in response to Cs: if it gets a rabbit, it mostly spits out “gavagai.” If it doesn’t, it mostly doesn’t.
“Kabelo, dude. What’s up with ‘gavagai’? Why does it spit out ‘gavagai’? This is not a language.”
“Yo, man. A friend studying philosophy has some trippy ideas about language not being translatable and about someone talking to some native tribe and not being able to tell whether they were both talking about a rabbit. The man from the tribe said ‘gavagai’ when he shoulda said ‘rabbit’. That’s what I changed the label to.” The rabbit rushes by. The taller student picks up the rabbit and shows it to the computer.
“Na-ah, Kabelo, man. Look …! It doesn’t have eyes. The machine can’t distinguish rabbits.” Siya shows the rabbit to the computer. “It wouldn’t be able to point one out even if the rabbit was sitting on top of it …!” She puts the rabbit down on top of the computer on which Kabelo has run his code. The rabbit stares ahead.
Examining the Quinean account more closely, there might indeed be reason to suppose that the machine does not exhibit a relevant set of behaviours in response to rabbits. I can be said to possess the concept dog if I am disposed to form my lips into the sounds that compose the word “dog” when asked “what is that?” by someone who points to a dog, to answer “no” when asked “is this a dog?” by someone who points to anything that is not a dog, and so on. A contemporary computer or machine has no lips, and so it cannot form its lips into the sounds that compose the word “dog” when asked “what is that?” by someone who points to a dog. Nor, therefore, can the machine be disposed to form its lips into those sounds. Given this, the account seems equally to imply that a machine that has been trained on certain data and that has come to “learn the concept C” has not learnt or come to possess the concept C.
Ha-aaa. No, man. You can’t put that on the computer! You can’t say that the computer doesn’t recognise the rabbit because it doesn’t have eyes and ears and it can’t walk and talk and all that. It’s a computer. Of course it don’t have eyes and ears. It distinguishes in its own way, man. It doesn’t have eyes, but it’s got a processor that can run my algorithm and it ‘looks’ at the data with these eyes. And it ‘tells’ us that it’s a rabbit by an output that a computer can have man. Its screen is its mouth. You can’t be putting that on the computer man. Trying to force it to be human.
Kabelo grows louder. Some students passing by put their heads in the door of the lab to investigate the commotion. Kabelo’s rejoinder here then is to appeal to a type-relative (or species-relative) account of behaviour: although most humans are disposed to the verbal behaviours described above, those who are deaf, for example, would not be. A person who is deaf would not be disposed to the behaviours of forming their lips in the relevant ways and so on, but instead to certain gestures with their arms and hands. Yet in the case of the human with differing capacities, we do not thereby suppose that the human does not possess the concept, but instead take a different set of behaviours to be relevant to whether they do.
So too, continues the response, the computer or machine has different capacities to a typical English-speaking human for responding to rabbits and non-rabbits, and the behaviours to which a machine must be disposed in order to count as possessing the concept will be relative to these capacities. Whether or not a machine possesses some concept C will depend on whether it is disposed to exhibit some set of behaviours B in response to Cs, where the behaviours B include certain screen outputs given certain data inputs. According to this type-relative version of the Quinean account then, a machine that is in this way disposed can be said to possess the concept.
5 The Problem of Generalisability
“Yeah… Sure. The computer doesn’t have eyes and it displays it in another way. But it’s still limited, man. It only has the right dispositions to act in a small range of cases. Give it the dataset. Sure, once it’s trained, it’ll tell rabbit from non-rabbit in the test data. But it can’t tell me that this is also a rabbit.” Siya points to the rabbit, which is now sitting underneath the lab desks.
“Nah, man. That’s the problem of generalisability. It’s covered. We got it. There’s some theory that shows it’s not just for one dataset. Look, what the machine is doing is approximating some function. It’s approximating some
As seen above, according to the type-relative Quinean account, a machine that is able to classify further examples as examples of Cs or non-Cs to some specified degree of accuracy is disposed to some relevant type-relative behaviours and so can be said to possess the concept. (The machine has a certain screen output when given certain data inputs.) Having examined the various ways in which the capacities of the machine might prevent it from possessing a concept, and having granted that the machine may have some relevant capacities once these are understood as type-relative, we now face a further set of concerns. Granting that the machine exhibits some relevant type-relative behaviours, the concern is whether it is disposed to exhibit these behaviours across a sufficiently wide range of cases.
In the example above, in which Kabelo runs a binary classification algorithm, the algorithm has allowed the machine to identify (or rather, approximate) some function or rule for classifying the labelled examples into Cs and non-Cs, with the machine thereby disposed (with some probability, that is, to the specified degree of accuracy) to exhibit the relevant type-relative behaviour in response to some further examples (the test data). The concern here, however, is the distinction between the machine being disposed to some type-relative behaviour in response to some further examples of rabbits and its being disposed to exhibit that behaviour in response to any further examples of rabbits.
In the case of Siya’s possessing the concept dog, Siya will be disposed to form her lips into the relevant sounds when faced with the neighbour’s German Shepherd, when faced with her mother’s Fox Terrier, when faced with a Bull Mastiff in some suburban area while on holiday, and so on ad infinitum. That is, possession of concept C requires the disposition to exhibit some type-relative behaviour, not only in response to some further subset of Cs, but all actual and possible Cs.6 As it stands, there is no reason to suppose that a machine like Kabelo’s is so disposed.
In order to answer the concern, we must, as Kabelo rightly indicates, turn to the theory of computational and machine learning, that is “learning theory.” In particular, we must turn to the Probably Approximately Correct model of
6 Probably Approximately Correct Learning
The Probably Approximately Correct model (hereafter, the “PAC model”) of learning distinguishes a subclass of tasks that are learnable within specified computational constraints. That is, it distinguishes a subclass of tasks for which a machine learning program or algorithm is possible assuming some upper threshold on the number of examples and time and space complexity allowed for the computation. Taking Kabelo’s case as our example, the subclass is characterised as follows.
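One standard way of making the “upper threshold on the number of examples” concrete is the textbook sample-complexity bound for a finite hypothesis set H: given at least m ≥ (1/ε)(ln|H| + ln(1/δ)) independently drawn training examples, then with probability at least 1 − δ any hypothesis in H that is consistent with those examples has true error at most ε. The Python sketch below simply evaluates that bound. The function name and the numbers in the usage line are illustrative assumptions, and the sketch says nothing about the time and space constraints that the model also imposes.

```python
import math

def pac_sample_bound(h_size: int, epsilon: float, delta: float) -> int:
    """Examples sufficient, for a finite hypothesis set H, to guarantee that
    with probability at least 1 - delta every hypothesis consistent with the
    training data has true error at most epsilon:
        m >= (1 / epsilon) * (ln|H| + ln(1 / delta))
    """
    if h_size < 1 or not 0 < epsilon < 1 or not 0 < delta < 1:
        raise ValueError("need |H| >= 1 and 0 < epsilon, delta < 1")
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / epsilon)

# Hypothetical numbers: 1,000 candidate rules, at most 5% error, 95% confidence.
m = pac_sample_bound(1000, epsilon=0.05, delta=0.05)
print(m)
```

With these assumptions the bound comes to 199 examples. Halving ε roughly doubles the requirement, whereas shrinking δ raises it only logarithmically.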
Available to the machine learning program is (1) the set, S, whose members are examples of mammals, where each example is a set of values for the features of colour (represented as a colour code), height (in metres), and region (represented as an area code), (2) a label for every member in S as either “non-rabbit” or “rabbit,” and (3) a set of hypothesis functions H. This set of hypothesis functions is a set of (mathematically-defined) possible rules that would allow the program to “go from” the values in the set S to the labels in (2). The task of the program is to select, on this basis, the function or rule that best maps the examples to their labels. In other words, from this set of possible rules, the program is to select the rule that allows it to classify correctly as many of the examples, as rabbits and non-rabbits respectively, as possible.
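Items (1) to (3) and the selection step can be sketched in Python as follows. This is a minimal illustration, assuming the toy feature values from the simplified table of the set S given later in this chapter and a deliberately small, hypothetical set H of height-threshold and colour rules. The program picks whichever rule in H classifies the most labelled examples in S correctly (usually called empirical risk minimisation); real learners search far larger hypothesis spaces, but the sketch captures the basic idea of selecting from H.

```python
from typing import Callable, NamedTuple

class Example(NamedTuple):
    colour: str    # colour code, e.g. "B" (brown)
    height: float  # metres
    region: str    # region code, e.g. "SA"

# (1) the set S and (2) a label for every member (toy values)
S = [Example("B", 0.25, "SA"), Example("W", 0.45, "E"),
     Example("Bl", 0.305, "Au"), Example("Bl", 0.29, "A")]
labels = ["rabbit", "non-rabbit", "rabbit", "rabbit"]

Rule = Callable[[Example], str]

def height_rule(t: float) -> Rule:
    """Hypothesis: 'rabbit' iff height is at most t metres."""
    return lambda x: "rabbit" if x.height <= t else "non-rabbit"

def colour_rule(colour: str) -> Rule:
    """Hypothesis: 'rabbit' iff the colour code equals `colour`."""
    return lambda x: "rabbit" if x.colour == colour else "non-rabbit"

# (3) a HYPOTHETICAL, deliberately small hypothesis set H
H = {f"height <= {t}": height_rule(t) for t in (0.2, 0.3, 0.4, 0.5)}
H.update({f"colour == {c}": colour_rule(c) for c in ("B", "Bl", "W")})

def n_correct(rule: Rule) -> int:
    """Number of examples in S that the rule labels as the humans did."""
    return sum(rule(x) == y for x, y in zip(S, labels))

best_name = max(H, key=lambda name: n_correct(H[name]))
print(best_name, n_correct(H[best_name]), "of", len(S))
```

Here the selected rule is “height <= 0.4”, which classifies all four examples in S correctly; whether it also classifies examples outside S correctly is a separate question.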
Presupposed in the background, however, is that the function or rule to be selected by the program is not simply the function from the given finite set S of examples to their labels. Instead, the program is to select the rule that best approximates the more general function or rule from all possible such examples to their respective labels. In other words, Kabelo’s machine is to select, from the possible rules available to it in H, not simply the rule from the examples of mammals in the data set to the examples of rabbits in the data set. Instead, it is to approximate the rule from all possible examples of mammals to all possible examples of rabbits. If it is able to do so, then the machine will be able to classify not only the examples of rabbits found in the data set, but, as
Presupposed in the background of such learning, then, are also the following. (4) The larger set, X, the members of which are all possible examples, each itself a set of feature values, and of which S is a subset. In Kabelo’s case, this set is the set of all possible examples of mammals. The examples that appear in Kabelo’s data set form a subset of these examples and are the set S as introduced in (1) above. (5) A label for every member in this larger set X as either “non-rabbit” or “rabbit.” (6) The function or rule from the members of the larger set X to their respective labels. In machine learning parlance, this function or rule is known as the “target concept c.” Importantly, this function or rule is not given to the machine as one of the possible rules in the set H from which it selects, and it need not be among them at all. It is the actual or true function or rule that allows us to go from all possible examples of mammals to all of the examples that are rabbits. Knowing this function would allow one (or Kabelo’s machine) to classify every possible example of a mammal correctly as either a rabbit or non-rabbit.
This rule is, of course, unknown to Kabelo’s machine and the task of his machine is to select, from the set of hypothesis functions H, the function that best approximates this true function. If it is able to do so sufficiently well, then the machine will have achieved the relevant degree of accuracy in classifying rabbits and non-rabbits and will, in machine learning parlance, have learnt the target concept c.
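The relation between the target concept c and a selected hypothesis can be sketched in Python by stipulating a target concept (which in any real task would be unknown) and estimating how often a hypothesis h disagrees with it on examples drawn from X. In the PAC literature this disagreement rate, taken with respect to a probability distribution D over X, is called the true error of h. Both c and D below are invented for illustration, and the estimate is a simple Monte Carlo average.

```python
import random

COLOURS = ["B", "Bl", "W"]
REGIONS = ["SA", "A", "E", "Au"]

def c(x):
    """HYPOTHETICAL target concept: the 'true' rule from every example in X
    to its label. Invented for illustration; the learner never sees it."""
    colour, height, region = x
    return "rabbit" if 0.2 <= height <= 0.4 else "non-rabbit"

def h(x):
    """A hypothesis the learner might have selected from H."""
    colour, height, region = x
    return "rabbit" if height <= 0.4 else "non-rabbit"

def draw_example(rng):
    """Draw one example from an ASSUMED distribution D over X."""
    return (rng.choice(COLOURS), rng.uniform(0.05, 1.5), rng.choice(REGIONS))

def estimated_true_error(hyp, target, n=100_000, seed=0):
    """Monte Carlo estimate of Pr_{x ~ D}[hyp(x) != target(x)]."""
    rng = random.Random(seed)
    samples = (draw_example(rng) for _ in range(n))
    return sum(hyp(x) != target(x) for x in samples) / n

err = estimated_true_error(h, c)
print(f"estimated true error of h: {err:.3f}")
```

Under these assumptions h errs only on animals shorter than 0.2 m, so the estimate comes out at roughly 0.1; a hypothesis with a smaller true error would be a closer approximation of c.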
If the Probably Approximately Correct (PAC) model is the correct account of binary classification algorithms, it would seem to imply that Siya’s concern is unfounded. Recall that Siya’s concern is whether the machine is disposed to exhibit the behaviours across a sufficiently wide range of cases: can the machine correctly classify not only the examples in the data set that Kabelo has given it, but all further such examples?
As we have just seen, according to the PAC model, the target function for the machine learning program is not simply the function or rule from the set of examples of mammals, as made available by Kabelo in running the algorithm, to the subset of those examples with the label “rabbit”. It is rather the more general function from all possible examples of mammals to the subset of those examples with the label “rabbit” (that is, the subset of rabbits). Once this function has been approximated, Kabelo’s machine is disposed (with some probability, that is, to the specified degree of accuracy) to exhibit the relevant type-relative behaviour in response to any further examples.
As Kabelo rightly notes, the mentioned concern is a concern over generalisability: does the accuracy of the machine in classifying extend sufficiently beyond just the dataset on which it has been trained? The PAC model of
7 The Problem of Generalisability, Again
“Yeah, yeah. I’ve heard of this. But it’s not all rabbits at all times everywhere, Kabelo. It can’t know about the rabbits that will exist in the future and that will look weird and different. But we will. We’ll be able to tell that they are rabbits.”
Siya’s concern runs a little deeper. Recall the type-relative Quinean account of human concept possession: a human who possesses the concept rabbit would exhibit the relevant type-relative behaviours in response to examples of rabbits, even in cases in which the rabbits were radically different. Supposing that Kabelo possesses the concept rabbit, Kabelo would exhibit the relevant type-relative behaviour in the case of the rabbit underneath the desk, of the rabbit on the farm in Limpopo, and equally in the case of the rabbit that has been selectively bred to have fur much longer than most rabbits. The further concern is whether the machine would be able to classify correctly even in these cases. Kabelo might have shown that the machine can classify the rabbit underneath the lab desks correctly, but can it correctly classify the rabbit that exists one hundred years from now and which has been selectively bred to stand over one metre in height?
We might suppose that Kabelo has already answered this. After all, according to the PAC model of learnability, the target function c, the function that the machine learning program is trying to approximate, simply is the function from the set X of all possible examples to the subset with the relevant label: it is the function for classifying all possible examples of mammals as rabbits or non-rabbits, including the one-metre rabbit that exists one hundred years from now.
Appeal to the PAC model of learnability is not quite so straightforward, however. Two formulations of “all possible examples” can be found in descriptions of the model. That is, descriptions of the model seem to be ambiguous as to what is meant by “all possible examples.” The first formulation is found in informal descriptions of the model and is broader in scope, while the second appears in the model’s formal description and is narrower in scope. Let us begin with the former.
In descriptions of PAC learnability, the model is frequently illustrated informally with an example of a simple learnable task. Such an illustration is found, for example, in Tom Mitchell’s discussion: “[L]et X refer to the set of all
Illustrations like Mitchell’s suggest that the set X here is to be understood as all possible instances of some higher-order kind and the subset of examples with the relevant label as the set of all possible instances of the relevant lower-order kind. The superset X might be, as in Kabelo’s case, the set of all possible instances of the higher-order kind mammal and the subset all possible instances of the lower-order kind rabbit. Or the superset X might be, as illustrated by Mitchell, the set of all possible instances of the kind person, with X’ the subset of all possible instances of the lower-order kind skier. Similarly, X might be the set of all possible instances of the kind animal and X’ the set of all possible instances of the kind bird (Mitchell 1997, 20–21). Thus, according to the first formulation, “all possible examples” are all possible instances of some (higher-order) kind and the subset of examples with the relevant label are all possible instances of the relevant lower-order kind.
According to this first formulation, then, the scope of “all possible examples” is determined by the higher-order kind. Because this is so, a machine that approximates the function from all possible examples (that is, all possible instances of the higher-order kind) to all possible examples with the relevant label (that is, all possible instances of the lower-order kind) would indeed be approximating a rule that allows it to classify all possible examples—in Siya’s sense. Given that it includes all possible instances of the higher-order kind (and thus all possible instances of the lower-order kind), the set X would include examples that are radically different from those the machine has encountered. A machine approximating this function would thus be able to classify, to some specified degree of accuracy, all possible examples. It would thus be disposed to exhibit the type-relative behaviour in response to any further examples—even to the rabbit that, one hundred years from now, stands one metre tall.
According to the second formulation, found in more formal descriptions of the model, “all possible examples” is to be understood as all possible feature-value combinations, given the features of the learning task in question and the possible values of those features. To clarify, it will be helpful to return to Kabelo’s machine and, in particular, to the data set that has been made available to it. As noted in the preceding section, available to the machine is the set S, whose members are examples of mammals, where each example is a set of values for
Simplified version of the set S in Kabelo’s binary classification task (Author’s own)
| Example | Colour (where B = brown, Bl = black, W = white) | Height (m) | Region (where SA = South America, A = Africa, E = Eurasia, Au = Australia) | Rabbit |
|---|---|---|---|---|
| 1 | B | 0.25 | SA | Yes |
| 2 | W | 0.45 | E | No |
| 3 | Bl | 0.305 | Au | Yes |
| 4 | Bl | 0.29 | A | Yes |
Important to note here is that each of the members of the set S—each of the examples above—is itself a set of values: one for each of the features of colour, height, and region, together with the label. Example 1 is the set {B, 0.25, SA, Yes}. Example 2 is the set {W, 0.45, E, No}. And so on. It is these sets of values that are available to Kabelo’s machine.8
As noted, and presupposed in the background of the PAC model, the set S above (of some possible examples of mammals) is a subset of the set X (of all possible examples of mammals). In order to answer Siya’s concern with the generalisability of the machine’s learning capacity, we noted that the target function—the rule to be approximated by Kabelo’s machine—was the function from this superset X (of all possible mammals) to the subset of all possible mammals that are rabbits. If the set S is a set like that characterised above (a set of feature values), what then would it be for X to be a superset of all possible such examples? When stated in this way, it is unnatural to suppose that X would be a set of all possible instances of some higher-order kind as understood above. Instead, given that the set S is a set of feature values, it is natural to suppose that the superset of all possible such examples is the set of all possible combinations of values for the given features. In the case of Kabelo’s machine, the set X is the set of all possible distinct combinations of the values
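The second formulation can be made concrete with a short Python sketch that treats X as the Cartesian product of the feature domains and checks that the examples in S fall inside it. It is a sketch under stated assumptions, not Kabelo’s actual setup: the 5 cm height grid and the names `COLOURS`, `HEIGHTS_CM`, `REGIONS`, and `to_grid` are hypothetical, introduced only because the table records heights as continuous values.

```python
from itertools import product

# Hypothetical feature domains for Kabelo's task. Colour and region codes are
# taken from the table; the height grid (5 cm steps from 5 to 60 cm) is an
# illustrative assumption, since the table records heights as continuous values.
COLOURS = ("B", "Bl", "W")
HEIGHTS_CM = tuple(range(5, 65, 5))  # 5, 10, ..., 60
REGIONS = ("SA", "A", "E", "Au")


def to_grid(height_m, step_cm=5):
    """Map a height in metres onto the assumed grid (integer centimetres)."""
    return step_cm * round(height_m * 100 / step_cm)


# The set S from the table: each example is a tuple of feature values plus a label.
S = [
    (("B", to_grid(0.25), "SA"), "Yes"),
    (("W", to_grid(0.45), "E"), "No"),
    (("Bl", to_grid(0.305), "Au"), "Yes"),
    (("Bl", to_grid(0.29), "A"), "Yes"),
]

# Second formulation: X is every combination of the given feature values.
X = set(product(COLOURS, HEIGHTS_CM, REGIONS))

print(len(X))                     # 144, i.e. 3 colours * 12 heights * 4 regions
print(all(x in X for x, _ in S))  # True: the unlabelled examples of S lie in X
```

The size of X is the product of the sizes of the feature domains, so each added feature multiplies the number of combinations; this is why, for high-dimensional data, the combinations can be extremely large in number.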
Under the first formulation, then, the function to be approximated by the machine is one from all possible instances of a higher-order kind to the subset of all possible instances of some lower-order kind. Under the second formulation, the function to be approximated is from the set of all possible combinations of feature values (the set of all possible colour-height-region combinations) to the subset of all such combinations to which the label can be assigned (the subset of all colour-height-region combinations applicable to rabbits).
It is important to note that the two formulations above are not equivalent in all cases. In cases in which the range of values for any feature is underestimated relative to all possible instances of the higher-order kind, the first formulation will be broader in scope than the second. If, for example, Kabelo underestimates the range of heights of all possible rabbits that have existed, exist, or will exist, the set of all possible instances of the lower-order kind rabbit will be broader than the set covered by Kabelo’s possible combinations of feature values.
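The inequivalence can be illustrated with a minimal Python sketch. It assumes, purely for illustration, hypothetical feature domains in which Kabelo has capped height at 60 cm; under that assumption a one-metre rabbit belongs to the lower-order kind (first formulation) but is simply absent from X (second formulation).

```python
from itertools import product

# Hypothetical domains, as if Kabelo had judged that no example in the task is
# taller than 60 cm (an illustrative assumption, not a claim from the text).
COLOURS = ("B", "Bl", "W")
HEIGHTS_CM = tuple(range(5, 65, 5))  # 5, 10, ..., 60
REGIONS = ("SA", "A", "E", "Au")
X = set(product(COLOURS, HEIGHTS_CM, REGIONS))

# A future rabbit that stands one metre tall, encoded in the same way.
future_rabbit = ("B", 100, "A")

# Under the first formulation this rabbit is an instance of the lower-order kind;
# under the second it is not among the feature-value combinations in X.
print(future_rabbit in X)  # False
```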
If the two formulations are not, as suggested, equivalent, then the PAC model must presuppose one or the other. It is reasonable to suppose that it presupposes the second, for two reasons. First, the second formulation is the one found in the model’s formal descriptions. Second, while many illustrations cohere with the first formulation, equally many do not (see, for example, Mitchell 1997, 22). What, then, is the implication of this for Kabelo’s claim that his machine is able to classify any further examples of rabbits?
As noted above, the first formulation seems to do the work needed for Kabelo’s response. If the superset X is the set of all possible instances of some higher-order kind (the set of all possible mammals), this would ensure that the function approximated is to the set of all possible instances of the lower-order kind (the set of all possible rabbits). This would then ensure that the machine is disposed to exhibit some type-relative behaviour in response to any further examples. What of the second formulation? Does the second formulation similarly ensure that the machine is so disposed?
Recalling the inequivalence mentioned above, the second formulation would seem to imply the following. According to the formulation, the machine approximates the function or rule from the set of all possible combinations
If, however, either the features found in the data set are not features of the relevant examples (if Kabelo has, for example, included values for the feature of feather type) or the possible values for these features do not exhaust their values for all possible examples (if Kabelo has, for example, included only brown and black as possible values for colour), then Kabelo’s machine will not be able to classify all possible examples. The function approximated by the machine will not be a function to all possible examples, but instead a function to all examples, given our current knowledge or the chosen limits of the values of the features. Kabelo’s machine will classify only those examples that fall within the range of possible feature-value combinations that Kabelo has programmed it to recognise. And, in such cases, the machine would not be disposed to exhibit some type-relative behaviour in response to any further examples.
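A minimal Python sketch can show what it means for the approximated function to be defined only on the chosen feature-value combinations. It assumes hypothetical domains in which only brown and black are listed as colours (the restriction mentioned above). The function `classify` and its height threshold are hypothetical stand-ins for a learnt hypothesis, not a real model.

```python
from itertools import product

# Hypothetical domains in which only brown ("B") and black ("Bl") are listed
# as possible colours, mirroring the restriction described in the text.
COLOURS = ("B", "Bl")
HEIGHTS_CM = tuple(range(5, 65, 5))  # assumed 5 cm grid up to 60 cm
REGIONS = ("SA", "A", "E", "Au")
X = frozenset(product(COLOURS, HEIGHTS_CM, REGIONS))


def classify(example):
    """Toy stand-in for the learnt hypothesis, defined only on X.

    The threshold rule is arbitrary and purely illustrative. Returns "Yes" or
    "No" for combinations in X and None for anything else.
    """
    if example not in X:
        return None  # not a feature-value combination the machine was given
    _, height_cm, _ = example
    return "Yes" if height_cm <= 35 else "No"


print(classify(("Bl", 30, "Au")))  # Yes
print(classify(("W", 30, "Au")))   # None: white is outside the colour domain
```

Returning None is a design choice of the sketch. A deployed model might instead output some label for such an input, but the PAC guarantee, being stated relative to X, says nothing about whether that label is correct.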
If correct, this would seem to imply that Siya’s concern may well be founded. There is a contrast to be drawn between the dispositions to type-relative behaviour in the case of the possession of a concept by a human and in the case of the learning of a concept by machine. In the case of the latter, whether or not the machine can be said to possess the concept hinges on our exhausting the relevant features and their possible values, and thus on our knowledge of those values at a given time. It depends on Kabelo’s making sure that
“Well of course it’s not all rabbits at all times everywhere, Siya. What do you think? That the machine is human?”
8 Conclusion
It is perhaps unsurprising that, according to stronger or more committed accounts of what it is for a human to possess a concept, the machine that has attained the relevant degree of accuracy in classification cannot be said to possess the relevant concept. The medical practitioner in the United States or in Johannesburg has, according to Cartesianism, a mental representation with a lawlike relation to all possible cases of malignant tumours, whether or not these are cases that the practitioner has encountered or will encounter. There is nothing in the machine’s computations that corresponds to such a mental representation. The judge in Borneo or in Japan is, according to Pragmatism, able to reason about likely recidivism. He or she is disposed and able to draw the basic inference from likely recidivism as a characteristic of human beings to likely recidivism as a characteristic of some, but perhaps not all, mammals. The machine that has attained the relevant degree of accuracy in classification does not have these abilities.
Yet, as we have seen above, even under the least committed accounts of human concept possession, accounts that do not require anything like a mind or mental representations, the machine that has, in machine learning parlance, learnt the concept, has not, in literal parlance, come to possess the concept. According to what we termed a type-relative Quinean account, the practitioner in Johannesburg possesses the concept of a malignant tumour insofar as she or he is disposed to certain behaviours when encountering instances of a malignant tumour. The practitioner is, for example, disposed to use the word “malignant” and to further examine the tumour in certain ways. This is true even in cases that fall outside the range of familiar cases. The practitioner is disposed to behave in these ways in response to any further examples. By contrast, the machine that has learnt the concept of malignancy is not so disposed. Even
So, while we can be sure that two human experts, in spite of differences in circumstances, language, practices, and behaviours, and indeed in spite of the concerns of Willard Van Orman Quine, possess the same concept, we cannot be sure that the machine, in spite of its classificatory behaviours, possesses any concept the same as ours at all.
Bibliography
Alpaydin, Ethem. 2004. Introduction to Machine Learning. Cambridge, MA: MIT Press.
Braddon-Mitchell, David and Frank Jackson. 2007. Philosophy of Mind and Cognition. Malden: Blackwell Publishing.
Fara, Michael. 2005. “Dispositions and Habituals.” Noûs 39, no. 1: 43–82.
Fodor, Jerry. 2004. “Having Concepts: A Brief Refutation of the Twentieth Century.” Mind and Language 19, no. 1: 29–47.
Hagendorff, Thilo. 2021. “Linking Human and Machine Behavior: A New Approach to Evaluate Training Data Quality for Beneficial Machine Learning.” Minds & Machines 31: 563–93.
Hajian, Sara, Francesco Bonchi, and Carlos Castillo. 2016. “Algorithmic Bias: From Discrimination Discovery to Fairness-aware Data Mining.” Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16): 2125–26.
Kim, Jaegwon. 1992. “Multiple Realization and the Metaphysics of Reduction.” Philosophy and Phenomenological Research 52, no. 1: 1–26.
Mitchell, Tom. 1997. Machine Learning. New York: McGraw-Hill.
Mohri, Mehryar, Afshin Rostamizadeh, and Ameet Talwalkar. 2012. Foundations of Machine Learning. Cambridge, MA: MIT Press.
Quine, Willard Van Orman. 1960. Word and Object. Cambridge, MA: MIT Press.
Rives, Bradley. 2009. “The Empirical Case Against Analyticity: Two Options for Concept Pragmatists.” Minds & Machines 19, no. 2: 199–227.
Rives, Bradley. 2009. “Concept Cartesianism, Concept Pragmatism, and Frege Cases.” Philosophical Studies 144, no. 2: 211–38.
Valiant, Leslie. 1984. “A Theory of the Learnable.” Communications of the ACM 27, no. 11: 1134–42.
Verdejo, Victor. 2013. “The Rationalist Reply to Fodor’s Analyticity and Circularity Challenge.” Theoria 28, no. 76: 7–25.
In the case of the concepts of creditworthiness and recidivism, claims of sameness or identity of the concept are, admittedly, complicated by the partial relativity of the concept to the relevant financial and legal frameworks.
Similarly, a model that attains a certain degree of accuracy in classifying trees using satellite images does not thereby possess the concept of a tree. (For discussion of such models within responses to climate change, see Véra Ehrenstein’s chapter in this volume.)
The reason for this, as we will see below, is that the ability to distinguish Cs from non-Cs is a dispositional one. It is widely accepted that something’s having the disposition to X does not entail that it will not fail to X in some subset of cases. See for example Michael Fara’s (2005) discussion.
It is to be noted that these epistemic capacities are only some of those appealed to by the Pragmatist, with different versions of Pragmatism appealing to different—and sometimes mutually exclusive—capacities.
More properly, the Quinean account is an account of the (un)translatability of natural language and thus an account of linguistic terms, not concepts. Here, the term “Quinean” will be used loosely to refer to the naturalist account described—an account that, although not to the letter, is perhaps Quinean in spirit. See for example Quine (1960, Chapter 2).
Again, the requirement is a dispositional one and is compatible with the human failing to do so in some subset of cases.
The PAC model of learnability was first presented in Leslie Valiant (1984). Recent well-known expositions of the model that fall in line with the exposition below include discussions by Ethem Alpaydin (2004), Tom Mitchell (1997), and Mehryar Mohri (2012). Later versions of the model extend the account to non-binary and unsupervised learning tasks. Discussion of the primary versions of the model is adequate for our purposes.
For discussion of the contrasting forms that some of these values would take within the Yorùbá numbering system, see Helen Verran’s chapter within this volume.
In the case of complex and high-dimensional data, these combinations can be extremely large in number. This does not itself affect the point made above.