

Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science and walks you through the "data-analytic thinking" necessary for extracting useful knowledge and business value from the data you collect. This guide also helps you understand the many data-mining techniques in use today.

Based on an MBA course Provost has taught at New York University over the past ten years, Data Science for Business provides examples of real-world business problems to illustrate these principles. You'll not only learn how to improve communication between business stakeholders and data scientists, but also how to participate intelligently in your company's data science projects. You'll also discover how to think data-analytically and fully appreciate how data science methods can support business decision-making.

Understand how data science fits in your organization, and how you can use it for competitive advantage.
Treat data as a business asset that requires careful investment if you're to gain real value.
Approach business problems data-analytically, using the data-mining process to gather good data in the most appropriate way.
Learn general concepts for actually extracting knowledge from data.
Apply data science principles when interviewing data science job candidates.















| Best Sellers Rank | #59,382 in Books; #14 in Data Mining (Books); #15 in Business Statistics; #33 in Statistics (Books) |
| Customer Reviews | 4.5 out of 5 stars 1,350 Reviews |
T**D
READ THIS BOOK!
Data Science for Business by Foster Provost and Tom Fawcett is a very important book about data mining and data-analytic thinking. In 1971, Abbie Hoffman shocked the world when he demanded that hippie readers (at the time, a likely oxymoron) "Steal This Book". While I wouldn't go so far as to encourage current and future data scientists to shoplift, I will demand that they READ THIS BOOK! Not long ago, data was difficult and expensive to come by. Today, we're living in a world of far too much data, vast amounts of cheap computing power, and way too many poorly defined questions. Mix them all together and you're guaranteed to make a mess. Going from data dearth to plethora presents substantive issues. In business, the balance between gut-feel decision-making and analysis paralysis is changing rapidly. Whether it moves too far from gut to paralysis, only time will tell. Through Data Science for Business, Provost and Fawcett offer practitioners a guide to equilibrium. Read this book and you'll find yourself moving briskly down the road towards data-analytic enlightenment. While the book is not highly technical, the authors cover each topic with enough rigor for readers to appreciate the tools being presented and the insights being offered. From the outset, the authors are clear about the book's objectives: "The primary goals of this book are to help you view business problems from a data perspective and understand principles of extracting useful knowledge from data. There is fundamental structure to data-analytic thinking, and basic principles that should be understood. There are also particular areas where intuition, creativity, common sense, and domain knowledge must be brought to bear… As you get better at data-analytic thinking you will develop intuition as to how and where to apply creativity and domain knowledge."
This paragraph makes me think of all those undergrad and graduate students studying statistics at universities all over the world, my daughter included, who are being bombarded by one math or statistics class after another (Calculus III, Math Stat I and II, Linear Algebra, etc.). Yet, far too often, they enter the real world lacking "data analytic thinking" or a sense of "basic principles." They do, however, have a sense of being overwhelmed and underprepared. The epic battle between "frequentists" and "Bayesians" takes a back seat to what should be the real controversy in statistics departments around the world: the balance between "application" and "theory". The book's "primary goals" should be the marching orders of every statistics program at any college or university anywhere. From the outset (page 2), the authors state, "Data mining is a craft. It involves the application of a substantial amount of science and technology, but the proper application still involves art as well." Absolutely true! It's great to read this stuff! This is followed by a concise discussion of CRISP-DM, a well-defined data mining process whose concepts are elementary, essential, and integral to the responsible, proper, and successful practice of data mining. From this point on, the authors proceed to accomplish their primary goals. They present such topics as predictive modeling, correlation, classification, clustering, regression, logistic regression, linear discriminants, and much more. Their presentations are user friendly, their real-world examples are interesting, and their guidance and insights are extremely valuable. My criticisms are limited to their website. The Data Science for Business site leaves me wanting more real-world examples to enjoy, access to more resources and tools of the trade, more references to peruse, and a more rigorous approach to some of the solutions. Perhaps Data Science for Business, the sequel, is on the horizon?
Whether you're a seasoned statistician (or data scientist), a young aspiring novice, or an adventurous business person looking to expand his/her horizons, Data Science for Business by Foster Provost and Tom Fawcett is well worth the price of admission and the reading time you'll invest. Foster Provost and Tom Fawcett state, "[i]deally, we envision a book that any data scientist would give to his collaborators…" I'll do them one better: I'm giving it to my daughter!
G**N
The profit curve is an excellent centerpiece. The slim book is necessary and important, but nowhere near sufficient.
It's an excellent, even mandatory book for your Data Science shelf. I am glad I bought it. I am 67% of the way through reading this book. It has nowhere near enough material on some areas, though, and is just missing some material that you need for DS. That's actually OK because of course no single book is enough to cover everything you need to know in a field. Look how many books you may have bought just to get an undergrad degree, and I bet it was not just one book. So here is a list of good and bad about this excellent book. Its good points: The profit curve. After reading this book, I will never use accuracy to select a model any more, as that's nearly a worthless metric, especially when there are marginal costs and marginal profits involved in an application scenario. The book is just amazingly good at describing how to select models based on estimated profit, foremost with the profit curve, and with selected supporting curves such as the ROC curve and its area under the curve. It also covers the expected profit computation and the cost-benefit matrix as a partner to the confusion matrix. This is great stuff. It's not even described in other data science courses that I have taken. Other good points: ...And don't worry about the other good points (there are some). The profit curve analysis, and the lead-up to that, are superior. Its bad points: p.224: "We will train on the complete dataset and then test on the same dataset we trained on." What follows for the rest of the chapter is just an inappropriate error analysis, because it is overly optimistic (but otherwise the techniques are great). The models have seen the training data. We should never completely assess (test), and base the entire remainder of the chapter material, on error (accuracy) estimates produced from data that the models have already seen.
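The reviewer's objection to testing on the training data can be illustrated with a small sketch. It is not taken from the book: the synthetic data, the 30% label noise, and the helper names (`make_data`, `predict_1nn`, `accuracy`) are all assumptions made for the example. A 1-nearest-neighbor model memorizes its training set, so its accuracy on that same set says nothing about how it handles unseen data.

```python
import random


def make_data(n, rng, noise=0.3):
    """Synthetic 1-D data: label is 1 when x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label  # inject label noise
        data.append((x, label))
    return data


def predict_1nn(train, x):
    """Return the label of the training point closest to x."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def accuracy(train, test):
    """Fraction of test points whose 1-NN prediction matches the true label."""
    hits = sum(predict_1nn(train, x) == y for x, y in test)
    return hits / len(test)


rng = random.Random(0)
data = make_data(400, rng)
train, holdout = data[:300], data[300:]

resub_acc = accuracy(train, train)      # tested on data the model has seen
holdout_acc = accuracy(train, holdout)  # tested on data the model has not seen

print(f"accuracy on training data: {resub_acc:.2f}")
print(f"accuracy on holdout data:  {holdout_acc:.2f}")
```

Each training point is its own nearest neighbor, so the training-set accuracy is perfect by construction, while the holdout figure reflects the noise the model cannot predict. Only the second number is a fair estimate of performance on new data.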
In most chapters, there is just not enough detail in the material to enable this book to be used as a "correct reference" against which to write your own working code as you follow along with the text in whatever computer language you want to use for analysis. In summary: The book is outstanding. It is necessary for your DS bookshelf, but on the other hand it is nowhere near sufficient. The data science course sequence by Johns Hopkins University identifies many of the elements of a nice overall outline as to what DS practitioners need to be able to do (and this is not even sufficient either): Reproducible research; Experimental design; R programming (or Python, or perhaps SAS or Octave, but some mathy language for sure); Exploratory data analysis; Regression models; Statistical inference; Practical machine learning; Scientific writing; Developing data products; Big data techniques (e.g. Apache Spark programming or at least MapReduce-style programming); SQL and NoSQL databases; Concurrent, distributed, and parallel programming; Advanced statistics (such as multiple testing corrections). This book by Provost et al. gives just a part of the necessary DS material. However, the part it provides is essential. I wish the biological data scientists in academia would adopt and integrate the cost-benefit matrix idea and the profit curve idea into their model selection techniques instead of mostly just using the accuracy metric. Also, a data scientist could do several follow-on, added-value extensions to the profit curve chapter. You could produce a revenue curve (or a cost curve), since sometimes that matters more. You could quickly find alternatives which are nearly equi-profitable to the optimal profit but which exhibit (less revenue, less cost) or (more revenue, more cost). You could detail the model selection and profit consequences of fixed budgets.
You could further assess the implications of marginal profit analysis on the optimal quantity when the profitability ratio changes. You could directly assess the data science solution against the best business-wisdom solution and estimate how much profit is lost when using the old business-wisdom decisions. It's a testament to this book's strong value that you can do a lot more based on its material. Nice work. Recommended.
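The expected-profit calculation and the profit curve that this review praises can be sketched roughly as follows. This is an illustration under stated assumptions, not the book's code: the function names `expected_profit` and `profit_curve`, the confusion-matrix counts, the scores, and the dollar values are all made up for the example.

```python
def expected_profit(confusion, cost_benefit):
    """Expected profit per instance: sum over cells of P(cell) * value(cell).

    Both arguments are dicts keyed by "TP", "FP", "FN", "TN"; `confusion`
    holds counts and `cost_benefit` holds the value of each outcome.
    """
    total = sum(confusion.values())
    return sum(confusion[cell] / total * cost_benefit[cell] for cell in confusion)


def profit_curve(scores, labels, benefit_tp, cost_fp):
    """Cumulative profit from targeting the top-k scored instances, for k = 0..n.

    A targeted positive earns `benefit_tp`; a targeted negative costs `cost_fp`.
    Untargeted instances are assumed to contribute nothing.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    curve = [0.0]
    for _, label in ranked:
        curve.append(curve[-1] + (benefit_tp if label == 1 else -cost_fp))
    return curve


# Hypothetical confusion matrix and cost-benefit matrix.
confusion = {"TP": 56, "FP": 7, "FN": 5, "TN": 42}
cost_benefit = {"TP": 99.0, "FP": -1.0, "FN": 0.0, "TN": 0.0}
profit_per_instance = expected_profit(confusion, cost_benefit)

# Hypothetical model scores and true labels for six customers.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
curve = profit_curve(scores, labels, benefit_tp=10.0, cost_fp=4.0)
best_k = max(range(len(curve)), key=curve.__getitem__)

print(f"expected profit per instance: {profit_per_instance:.2f}")
print(f"profit curve: {curve}")
print(f"most profitable number of customers to target: {best_k}")
```

Comparing models by the peak of this curve, rather than by accuracy, is the model-selection idea the reviewer describes. The extensions suggested above (a revenue or cost curve, a fixed budget) would amount to accumulating a different quantity or capping `k`.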
O**Y
This could be a winner...!
Context: I'm an MD, needing to communicate with data scientists to build a product. I've thus far only read two chapters. My pattern recognition ;) thus far, however, with an assessment that this will be applicable to the rest of the book, is two-fold: 1) Too verbose! Too much stuff on explaining the structure and purpose of the book. It could've been said way more succinctly, and therefore more clearly. The effect is that I start skimming. 2) Not 'sharp' enough. The best non-fiction written for non-experts manages to reduce the complex by explaining the essence. Not making it simpler and reducing crucial comprehension, but reducing the complex into its crucial essence. When going over different types of tasks (classification, regression, similarity matching, clustering, co-occurrence grouping), the way they are described, there is essentially no difference between, e.g., clustering, similarity matching, and co-occurrence grouping; they're all classifications. Yes, regression is different. In order for this to be truly helpful even for an absolute layman such as myself, it needs to add enough crucial, essential distinctions to make the categories mutually exclusive. I can think about it, I can look it up. The book would, however, have been better if the information had been more 'sharply' communicated. So why 4 stars? Because it is a beautiful balance for the amateur, explaining basic concepts instead of trendy applications. For future versions, though, correcting for verbosity and adding greater specificity (essence) will make it a true winner.
A**S
Comprehensive introduction to an important and growing field
This book is ideal for anyone looking to understand data science, and especially those who might interact with data scientists at work. Roughly half the book deals with the essential data mining algorithms. The focus is on understanding what the algorithms do, not the details of how they do it, so implementation details are omitted. The math is certainly discussed, but kept to a minimum, and coupled with comprehensible, plain English explanations of each algorithm. Each chapter includes a case study illustrating how the algorithm can be used for a real-world problem. The other half of the book (interspersed between the algorithms) deals with issues relating to design, implementation, evaluation, and deployment of models. Without understanding these crucial ideas, the algorithmic knowledge is useless. For example, the right and wrong techniques for evaluating model performance are discussed at length. A businessperson without adequate background could easily be misled by certain evaluation metrics, and the reader is taught to evaluate model performance with a critical eye. There is also a chapter on evaluating and critiquing data mining proposals, which nicely ties together the algorithmic, business, and practical concepts discussed earlier in the book. Some case studies are revisited in several chapters at increasing levels of sophistication, making the book feel like a cohesive whole rather than a mere compilation of chapters. If you’re coming from a technical background, you will learn a great deal about the business and practical/implementation aspects of analytics. If you’re coming from a business background, you will gain an understanding of what your data can do for you, and how to use it to your benefit. The book is an intense but very pleasant read, even funny at times. Highly recommended!
S**A
The new reference for data mining professionals working in industry
Foster Provost and Tom Fawcett are known for their work on fraud detection, among other things. I have recently read their latest book, Data Science for Business: What you need to know about data mining and data-analytic thinking. No suspense: it's one of the best data mining books I have ever read. Its style allows the book to be read by beginners, but its wide coverage and detailed case studies make it a reference for experts as well. As the title suggests, the book has a real focus on business, with plenty of industry examples and challenges. The style is very pleasant, since the authors have made an effort to put the reader in specific situations to better understand a problem. Of note are the very interesting discussions of data mining leaks and of data mining automation. The book is organized by concept and focuses on concepts rather than techniques. Although there are no exercises, the book could easily be used as a resource for a course. Each chapter is clearly divided into basic and advanced topics. The evaluation phase of the standard data mining process is discussed in depth. The section about Bayes' rule is very well written. Data Science for Business is also an excellent resource for avoiding data mining pitfalls. Chapter 13 is a must-read for understanding the success factors for implementing data mining in a company. To conclude, targeted at both beginners and experts, Data Science for Business is the new reference for data mining professionals working in industry.
J**N
How Data Science Applies to Our Emerging Big Data World
Excellent discussion of data science methods without excessive focus on mathematical elements. These are included at a level that can be understood by the skilled marketer who has some background but does not wish to go deep into the math. The coverage is broad, with both supervised and unsupervised methods in data mining. Topics range from tree models to logistic regression to scoring, along with a discussion of holdout model tests, prediction, and validation. Particular emphasis is placed on how to frame questions to apply to the business case so suitable conclusions can guide business decisions and strategy. You will get the sense that the authors are battle-tested veterans of the data mining business and have applied their creativity to a broad range of business, data, and technical challenges. Only two caveats to this book. First, as a purchaser of the Kindle edition, I found the equations included in the text were sometimes very readable and sometimes the type was so small as not to be legible at all. Be warned. If you intend to follow the math that is included, perhaps the paper edition would be best. Second, this book does not dwell on the statistical packages that can be used to support data mining efforts. If you are interested in exploring these methods in practice, you will need to look further.
R**H
misses some major points
Although billed, at least in part, as aimed at "business people who will be working with data scientists, managing data science-oriented projects, or investing in data science ventures" (p. xiii), the book never points out that all analytic techniques make assumptions, and that the data scientist needs to be questioned about those assumptions (when they don't mention them upfront) and about what happens when they are violated. In addition, many, maybe most, techniques have biases, and these are never mentioned either. There is also no discussion of the bootstrap (the authors use cross-validation instead, thus generally wasting information) or of external validation, and there are no warnings about what to beware of when using surrogates. At a lower level, the book is generally readable and generally well-informed, but it needs to be supplemented with something that covers how to, at least, question the technical people about assumptions and biases.
L**G
A must-read book for aspiring data scientists or data science team managers
Needless to say, it's the best book I've ever read that perfectly combines technical details and high-level intuition. "Big data" might sound daunting these days, since applications based on AI, machine learning, and deep learning are everywhere whenever you open your browser, turn on your cell phone, and so on. But you will feel much less overwhelmed after reading this book. It provides you with a unique experience in that it bridges practical business problems and machine learning models. Roughly speaking, most of the books I've read fall short in one of two areas: interpretability or rigor. This book fills the gap pretty well. If you are a data science manager and want to better understand what your team members are doing, this book gives you a snapshot. If you are a data scientist with years of training in statistics and computer science, this book can help you develop your understanding of business problems in practice and offer you a different angle for analyzing them. In conclusion, 5/5 stars: a must-have book that should be on the shelf of everyone who wants to work in a data-related field.
J**C
Great intro and quick refresher course
I found this book great for refreshing some key concepts after being away from the field for many years. The contents are good and well organised, and they cover most of what you need to know. The approach is not theoretical but practical and to the point. The examples are also good, as is the level of detail. And you have enough references to go deeper if you need to. Great job; I would love to have a second book to go deeper.
T**E
Good purchase
Very good. It explains some techniques, but I liked it above all for how it explains the fundamentals. A good book for getting started with data science.
P**N
Five Stars
Highly recommended book for those who want hands-on data science and the business principles of machine learning.
A**O
Perfect for getting started, but also for those who already have experience
An excellent handbook for understanding the ABCs of data science, suitable both for those who know nothing and for seasoned experts. I think it suits all the different kinds of readers: the developer, the manager, the executive, the operations person, the researcher, the analyst... There is material for everyone, and the language is tailored to the different kinds of reader. Recommended. NOTE: it is in English.
J**I
Outstanding book
I really appreciate this kind of book, one that is able to elaborate on complex topics without losing the reader by presenting only the technical aspects of them. I recommend Data Science for Business to every person working in business analysis or in any data-oriented area.