

Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking















| Best Sellers Rank | #34,307 in Books; #3 in Data Mining (Books); #6 in Business Statistics; #13 in Statistics (Books) |
| Customer Reviews | 4.5 out of 5 stars 1,348 Reviews |
T**D
READ THIS BOOK!
Data Science for Business by Foster Provost and Tom Fawcett is a very important book about data mining and data-analytic thinking. In 1971, Abbie Hoffman shocked the world when he demanded that hippie readers (at the time, a likely oxymoron) "Steal This Book". While I wouldn't go so far as to encourage current and future data scientists to shoplift, I will demand that they READ THIS BOOK! Not long ago, data was difficult and expensive to come by. Today, we're living in a world of far too much data, vast amounts of cheap computing power, and way too many poorly defined questions. Mix them all together and you're guaranteed to make a mess. Going from data dearth to plethora presents substantive issues. In business, the balance between gut-feel decision-making and analysis paralysis is changing rapidly. Whether it moves too far from gut to paralysis, only time will tell. Through Data Science for Business, Provost and Fawcett offer practitioners a guide to equilibrium. Read this book and you'll find yourself moving briskly down the road towards data-analytic enlightenment. While the book is not highly technical, the authors cover each topic with enough rigor for the reader to appreciate the tools being presented and the insights being offered. From the outset, the authors are clear about the book's objectives: "The primary goals of this book are to help you view business problems from a data perspective and understand principles of extracting useful knowledge from data. There is fundamental structure to data-analytic thinking, and basic principles that should be understood. There are also particular areas where intuition, creativity, common sense, and domain knowledge must be brought to bear… As you get better at data-analytic thinking you will develop intuition as to how and where to apply creativity and domain knowledge."
This paragraph makes me think of all those undergrad and graduate students studying statistics at universities all over the world, my daughter included, who are being bombarded by one math or statistics class after another (Calculus III, Math Stat I and II, Linear Algebra, etc.). Yet, far too often, they enter the real world lacking "data analytic thinking" or a sense of "basic principles." They do, however, have a sense of being overwhelmed and underprepared. The epic battle between "frequentists" and "Bayesians" takes a back seat to what should be the real controversy in statistics departments around the world: the balance between "application" and "theory". The book's "primary goals" should be the marching orders of every statistics program at any college or university anywhere. From the outset (page 2), the authors state, "Data mining is a craft. It involves the application of a substantial amount of science and technology, but the proper application still involves art as well." Absolutely true! It's great to read this stuff! This is followed by a concise discussion of CRISP-DM, a well-defined data mining process whose concepts are elementary, essential, and integral to the responsible, proper, and successful practice of data mining. From this point on, the authors proceed to accomplish their primary goals. They present such topics as predictive modeling, correlation, classification, clustering, regression, logistic regression, linear discriminants, and much more. Their presentations are user-friendly, their real-world examples are interesting, and their guidance and insights are extremely valuable. My criticisms are limited to their website. The Data Science for Business site leaves me wanting more real-world examples to enjoy, access to more resources and tools of the trade, more references to peruse, and a more rigorous approach to some of the solutions. Perhaps Data Science for Business, the sequel, is on the horizon?
Whether you're a seasoned statistician (or data scientist), a young aspiring novice, or an adventurous business person looking to expand his/her horizons, Data Science for Business by Foster Provost and Tom Fawcett is well worth the price of admission and the reading time you'll invest. Foster Provost and Tom Fawcett state, "[i]deally, we envision a book that any data scientist would give to his collaborators…" I'll do them one better: I'm giving it to my daughter!
G**N
The profit curve is an excellent centerpiece. The slim book is necessary and important, but nowhere near sufficient.
It's an excellent, even mandatory book for your Data Science shelf. I am glad I bought it. I am 67% of the way through reading this book. It has nowhere near enough material on some areas, though, and is simply missing some material that you need for DS. That's actually OK, because of course no single book is enough to cover everything you need to know in a field. Look how many books you may have bought just to get an undergrad degree; I bet it was not just one. So here is a list of the good and bad about this excellent book.
Its good points: The profit curve. After reading this book, I will never use accuracy to select a model any more, as that's a nearly worthless metric, especially when there are marginal costs and marginal profits involved in an application scenario. The book is amazingly good at describing how to select models based on estimated profit, foremost with the profit curve, along with selected supporting curves such as the ROC curve and its area under the curve. The expected profit computation, and the cost-benefit matrix as a partner to the confusion matrix: this is great stuff. It's not even described in other data science courses that I have taken. Other good points: ...And don't worry about the other good points (there are some). The profit curve analysis, and the lead-up to that, are superior.
Its bad points: p.224: "We will train on the complete dataset and then test on the same dataset we trained on." What follows for the rest of the chapter is an inappropriate error analysis, because it is overly optimistic (though otherwise the techniques are great). The models have seen the training data. We should never base a complete assessment (test), and the entire remainder of the chapter's material, on error (accuracy) estimates produced from data that the models have already seen. In most chapters, there is just not enough detail to let the book serve as a "correct reference" against which to write your own working code as you follow along with the text in whatever computer language you want to use for analysis.
In summary: The book is outstanding. It is necessary for your DS bookshelf, but on the other hand it is nowhere near sufficient. The data science course sequence by Johns Hopkins University identifies many of the elements of a nice overall outline as to what DS practitioners need to be able to do (and this is not even sufficient either): Reproducible research; Experimental design; R programming (or Python, or perhaps SAS or Octave, but some mathy language for sure); Exploratory data analysis; Regression models; Statistical inference; Practical machine learning; Scientific writing; Developing data products; Big data techniques (e.g. Apache Spark programming or at least MapReduce-style programming); SQL and NoSQL databases; Concurrent, distributed, and parallel programming; Advanced statistics (such as multiple testing corrections). This book by Provost et al. gives just a part of the necessary DS material. However, the part it provides is essential. I wish the biological data scientists in academia would adopt and integrate the cost-benefit matrix idea and the profit curve idea into their model selection techniques instead of mostly just using the accuracy metric.
Also, a data scientist could do several follow-on, added-value extensions to the profit curve chapter. You could produce a revenue curve (or a cost curve), since sometimes that matters more. You could quickly find alternatives which are nearly equi-profitable to the optimal profit but which exhibit (less revenue, less cost) or (more revenue, more cost). You could detail the model selection and profit consequences of fixed budgets. You could further assess the implications of marginal profit analysis on the optimal quantity when the profitability ratio changes. You could directly assess the data science solution against the best business wisdom solution and estimate what amount of profit is lost when using the old business wisdom decisions. It's a testament to this book's strong value that you can do a lot more based on its material. Nice work. Recommended.
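For readers who have not met the ideas this review praises, the expected profit computation and the profit curve might be sketched roughly as follows. This is only an illustration under stated assumptions, not the book's own code: the function names, the confusion counts, the scores, and the dollar values are all made up for the example.

```python
def expected_profit(confusion, cost_benefit):
    """Expected profit per instance: sum over outcomes of rate * value."""
    total = sum(confusion.values())
    return sum(count / total * cost_benefit[cell]
               for cell, count in confusion.items())


def profit_curve(scores, labels, benefit_tp, cost_fp):
    """Cumulative profit from targeting the top-k scored instances, k = 0..n."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    profits = [0.0]
    for _, label in ranked:
        gain = benefit_tp if label == 1 else -cost_fp
        profits.append(profits[-1] + gain)
    return profits


# Hypothetical confusion counts and cost-benefit values (assumptions).
confusion = {"TP": 30, "FP": 20, "FN": 10, "TN": 140}
cost_benefit = {"TP": 99.0, "FP": -1.0, "FN": 0.0, "TN": 0.0}
ep = expected_profit(confusion, cost_benefit)

# Hypothetical model scores and true labels (1 = responds, 0 = does not).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 0, 1, 0, 0, 0]
curve = profit_curve(scores, labels, benefit_tp=9.0, cost_fp=1.0)
best_k = curve.index(max(curve))  # number of instances to target

print(ep, curve, best_k)
```

Unlike accuracy, both quantities change when the cost-benefit values change, which is the reviewer's reason for preferring them for model selection. A fixed budget, one of the extensions suggested above, would amount to reading the curve only up to the largest affordable k.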
K**R
Well-organized text
This is an excellent textbook on data science. The text itself explains concepts and theories well and provides definitions, examples, and formulas that help the reader understand and apply these concepts. The information presented is well-organized, and the visual aids include ample graphs and charts. Section breaks are obvious, with well-designed titles. Chapters are easy enough to read but don't over-simplify important concepts. The inclusion of a glossary, bibliography, and index, as well as a detailed table of contents, makes the book easy to navigate. The only exception our instructor took to the text during my course was the authors' insistence that only the best data scientists should be considered. Setting this bias aside, the information provided was clear, concise, and helpful for anyone working with big data or in data analytics.
S**A
The new reference for data mining professionals working in industry
Foster Provost and Tom Fawcett are known for their work on fraud detection, among other topics. I have recently read their latest book, Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking. No suspense: it's one of the best data mining books I have ever read. Its style allows the book to be read by beginners, but its wide coverage and detailed case studies make it a reference for experts as well. As the title suggests, the book has a real focus on business, with plenty of industry examples and challenges. The style is very pleasant, since the authors have made an effort to put the reader in specific situations to better understand a problem. Also worth noting are the very interesting discussions of data mining leaks and of data mining automation. The book is organized by concepts and focuses on them (instead of on techniques). Although it contains no exercises, the book could easily be used as a resource for a course. Each chapter is clearly divided into basic and advanced topics. The evaluation phase of the standard data mining process is discussed in depth. The section about Bayes' rule is very well written. Data Science for Business is also an excellent resource for avoiding data mining pitfalls. Chapter 13 is a must-read for understanding the success factors for implementing data mining in a company. To conclude, targeted at both beginners and experts, Data Science for Business is the new reference for data mining professionals working in industry.
A**S
Comprehensive introduction to an important and growing field
This book is ideal for anyone looking to understand data science, and especially those who might interact with data scientists at work. Roughly half the book deals with the essential data mining algorithms. The focus is on understanding what the algorithms do, not the details of how they do it, so implementation details are omitted. The math is certainly discussed, but kept to a minimum, and coupled with comprehensible, plain English explanations of each algorithm. Each chapter includes a case study illustrating how the algorithm can be used for a real-world problem. The other half of the book (interspersed between the algorithms) deals with issues relating to design, implementation, evaluation, and deployment of models. Without understanding these crucial ideas, the algorithmic knowledge is useless. For example, the right and wrong techniques for evaluating model performance are discussed at length. A businessperson without adequate background could easily be misled by certain evaluation metrics, and the reader is taught to evaluate model performance with a critical eye. There is also a chapter on evaluating and critiquing data mining proposals, which nicely ties together the algorithmic, business, and practical concepts discussed earlier in the book. Some case studies are revisited in several chapters at increasing levels of sophistication, making the book feel like a cohesive whole rather than a mere compilation of chapters. If you’re coming from a technical background, you will learn a great deal about the business and practical/implementation aspects of analytics. If you’re coming from a business background, you will gain an understanding of what your data can do for you, and how to use it to your benefit. The book is an intense but very pleasant read, even funny at times. Highly recommended!
W**S
Excellent Introductory Summary Of Data Science
Data Science for Business is an ideal book for introducing someone to Data Science. The authors have tried to break down their knowledge into simple explanations. I am skeptical of non-technical Data Science books, but this one works well. In the beginning we are shown the motivations for Data Science and the fields it applies to. Some examples include movie recommendations, credit card charges, telecom churn rate, and automated analysis of stock market news. The book avoids going into the highly technical parts of creating the system but gives you links for where to go. The authors do not really reveal the whole Data Science stack. For example, Hadoop is mentioned as an implementation of MapReduce, but they say that going into Hadoop configuration would be too detailed for this type of book. I tend to agree, and even as a programmer myself, I think they made the right choice to leave that out. Where the book shines is in the explanations. I am very familiar with expected value calculations, and there is a chapter on this. It is a much better high-level discussion than I have seen elsewhere, and it mentions possible pitfalls of the expected value framework. I liked that the emphasis is on deciding what problem to solve in Data Science. The title of the book is appropriate, as it is not just about analyzing data but about figuring out the business case. If you are new to Data Science or looking to get a high-level overview, this book is a great place to start.
R**H
misses some major points
Although billed, at least in part, as aimed at "business people who will be working with data scientists, managing data science-oriented projects, or investing in data science ventures" (p. xiii), the book never points out that all analytic techniques make assumptions, and that the data scientist needs to be questioned about them (when they don't mention them up front) and about what happens when the assumptions are violated. In addition, many, maybe most, techniques have biases, and these are never mentioned either. There is also no discussion of the bootstrap (the authors use cross-validation instead, thus generally wasting information) or of external validation, and there are no warnings about what to beware of when using surrogates. At a lower level, the book is generally readable and generally well-informed, but it needs to be supplemented with something that covers how to, at least, question the technical people about assumptions and biases.
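As background on the bootstrap this reviewer mentions, one common variant (out-of-bag validation) might be sketched as below. This is a minimal illustration, not something taken from the book: the function name `bootstrap_oob_splits` and its parameters are assumptions made for the example, and the sketch takes no position on whether the bootstrap is preferable to cross-validation.

```python
import random


def bootstrap_oob_splits(n, rounds, seed=0):
    """Yield (train, held_out) index lists for `rounds` bootstrap resamples.

    Each training set draws n indices with replacement; the indices that
    were never drawn form the held-out ("out-of-bag") set for that round.
    """
    rng = random.Random(seed)
    for _ in range(rounds):
        train = [rng.randrange(n) for _ in range(n)]
        chosen = set(train)
        held_out = [i for i in range(n) if i not in chosen]
        yield train, held_out


# Hypothetical usage: 20 examples, 5 resamples.
for train, held_out in bootstrap_oob_splits(20, 5, seed=1):
    # A model would be fit on `train` and scored on `held_out` here.
    print(len(train), len(held_out))
```

Every training set has the same size as the full dataset, and evaluation uses only examples the model did not see in that round.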
L**G
A must-read book for aspiring data scientists or data science team managers
Needless to say, it's the best book I've ever read that combines technical details with high-level intuition. "Big data" might sound daunting these days, since applications based on AI, machine learning, and deep learning appear everywhere whenever you open your browser, turn on your cell phone, etc. But you will feel much less overwhelmed after reading this book. It provides a unique experience in that it bridges practical business problems and machine learning models. Roughly speaking, most of the books I've read fall short in one of two areas: interpretability or rigor. This book fills that gap pretty well. If you are a data science manager and want to better understand what your team members are doing, this book gives you a snapshot. If you are a data scientist with years of training in statistics and computer science, this book can help you develop your understanding of business problems in practice and offer you a different angle for analyzing them. In conclusion: 5/5 stars, a must-have book that should be on the shelf of anyone who wants to work in a data-related field.