Difference between revisions of "Data Science: An Introduction"

From wiki.acadac.net, the Calvin Andrus wiki

Revision as of 09:19, 16 April 2012

Welcome to


An Introduction to Data Science


Wikibook



(Comment)

This is the beginning of a draft of a WIKIBOOKS book. When I get it into reasonable shape, I will transfer it to WIKIBOOKS for the wider community to improve. As of today (14 April 2012), there is no WIKIBOOKS book on Data Science. These pages are locked to keep spammers from overrunning my wiki. You will be able to contribute to the book once it is transferred. If you want to make comments or contributions before then, please email me at calvin.andrus@gmail.com.

Preface

This book is a very basic introduction to data science. It is designed for the advanced high school student or average college freshman with a high school-level understanding of math, science, word processing and spreadsheets. No understanding of computer science is assumed.

Data science--as an academic discipline unto itself--is new, having been born in the first decade of the 21st century. Its parent disciplines (scientific methods, data and software engineering, statistics, and the sub-discipline of visualization) are all very mature. This book is not intended to do justice to any of those disciplines by themselves, but to bring them together in a productive synthesis. As such, the student will be introduced to the parent disciplines and then given exercises that fuse the parent disciplines into data science.

Obviously, a mature data scientist will be proficient in each of the parent disciplines, studying them individually and combining them to solve serious data problems. This textbook is just the first tentative step in that direction.

We will do most of our data manipulation, computer programming, and statistical analysis in the open-source R environment. We know that, for each task, an intermediate or advanced student would use other tools and techniques such as MySQL, PHP, Python, Java, Hadoop, HBase, Machine Learning, MATLAB, SPSS, SAS, etc. For this introduction, however, we are keeping it simple and sticking to a single general-purpose computing environment.

Finally, we try to use terms that are already defined in Wikipedia. This way, people can refer to the corresponding Wikipedia page to get a deeper understanding of the concept. (As of this writing, there is no Wikipedia entry for Data Science.)

Note to Contributors

First, please register yourself with WIKIBOOKS, so that we know who our co-contributors are. Thank you.

Second, this is a cross-disciplinary book. We want to help people apply data science to all fields. Therefore, we need a wide variety of examples and exercises.

Third, we need only basic, clear, straightforward introductions to the parent disciplines. There are other venues in which to wax eloquent on the depth and complexity of the parent disciplines. Please place yourself in a "beginner's mind" as you make contributions.

Fourth, as with any WIKIBOOK, please feel free to make corrections, expand explanations, and make additions where necessary, even if it is not "your" section. Use the discussion page to explain changes that might be controversial.

Fifth, some rules:

  • Wrap the names of functions and code snippets in 'code' tags: <code>lm()</code>
  • Use references to package documentation, academic literature, and Wikipedia.
  • Use the citation templates to make citations: Template:Cite book, Template:Cite web, Template:Cite journal
  • If you want to add a graph, you should upload it to Commons and add the tag {{Created with R}}.
  • If a function comes from a package other than the standard R packages, put the name of the package in parentheses after each function: <code>MCMCprobit()</code> ('''MCMCpack''')
  • Put the names of non-standard R packages in bold: '''MCMCpack'''

List of Contributors

See Also

See the following WIKIBOOKS for good companion texts to this introduction:

Copyright Notice

While this book is in draft on my wiki, it is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license:

You are free:

  • to Share — to copy, distribute, display, and perform the work (pages from this wiki)
  • to Remix — to adapt or make derivative works

Under the following conditions:

  • Attribution — You must attribute this work to me by name (Calvin Andrus), by page title, by source (wiki.acadac.net), by date, and by version number (if available). You may not suggest that I, in any way, endorse you or your use of this work.
  • Noncommercial — You may not use this work for commercial purposes.
  • Share Alike — If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.
  • Waiver — Any of the above conditions can be waived if you get permission from the copyright holder.
  • Public Domain — Where the work or any of its elements is in the public domain under applicable law, that status is in no way affected by the license.
  • Other Rights — In no way are any of the following rights affected by the license:
      • Your fair dealing or fair use rights, or other applicable copyright exceptions and limitations;
      • The author's moral rights;
      • Rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights.
  • Notice — For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to the following web page:
http://creativecommons.org/licenses/by-nc-sa/3.0/