Difference between revisions of "Data Science: An Introduction"

From wiki.acadac.net, the Calvin Andrus wiki
[[image:DataScienceLogo.png|150px|right]]
The Current Version resides at:
<center>Welcome to
 
<br>       
 
<big><div style="font-size:200%;margin:.5ex 0 .5ex 0"><font color="#0000FF">'''An Introduction to Data Science'''</font></div></big>
 
<br>
 
Wikibook
 
<br><br>
 
[[Image:Cc-by-nc-sa.png]]
 
</center>
 
----
 
<small>(Back to [[D. Calvin Andrus, Ph.D.|Home]])</small><br>
 
[[An Introduction to Data Science/Navigation]]
 
<br>
 
{{Book Search}}
 
<noinclude>{{An Introduction to Data Science/Navigation}}</noinclude>
 
  
__NOTOC__
http://en.wikibooks.org/wiki/Data_Science:_An_Introduction
  
==(Comment)==
<small>(Back to [[D. Calvin Andrus, Ph.D.|Home]])</small><br>
This is the beginning of a draft of a [http://en.wikibooks.org WIKIBOOKS] book.  When I get it into reasonable shape, I will transfer it to WIKIBOOKS for the wider community to improve.  As of today (14 April 2012) there is no WIKIBOOKS book on Data Science.  These pages are locked to keep spammers from overrunning my wiki.  You will be able to contribute to the book once it is transferred.  If you want to make comments or contributions before then, please email me at calvin.andrus@gmail.com.
----
 
 
==Preface==
 
This book is a very basic introduction to data science.  It is designed for the advanced high school student or average college freshman with a high school-level understanding of math, science, word processing and spreadsheets.  No understanding of computer science is assumed.
 
 
 
Data science--as an academic discipline unto itself--is new, having been born in the first decade of the 21st century.  Its parent disciplines (scientific methods, data and software engineering, and statistics) are all very mature.  This book is not intended to do justice to any of those disciplines by themselves, but to bring them together in a productive synthesis.  As such, the student will be introduced to the parent disciplines and then given exercises that fuse the parent disciplines into data science.
 
 
 
Obviously, a mature data scientist will be proficient in each of the parent disciplines, studying them individually and combining them to solve serious data problems.  This textbook is just the first tentative step in that direction.
 
 
 
We will do most of our data manipulation, computer programming, and statistical analysis in the open source [http://en.wikipedia.org/wiki/R_%28programming_language%29 '''R'''] environment.  We know that for each task an intermediate or advanced student would use other tools such as [http://en.wikipedia.org/wiki/Mysql MySQL], [http://en.wikipedia.org/wiki/PHP PHP], [http://en.wikipedia.org/wiki/Python_%28programming_language%29 Python], [http://en.wikipedia.org/wiki/Java_%28programming_language%29 Java], [http://en.wikipedia.org/wiki/Apache_Hadoop Hadoop], [http://en.wikipedia.org/wiki/Hbase HBase], [http://en.wikipedia.org/wiki/Machine_learning Machine Learning], [http://en.wikipedia.org/wiki/Matlab MATLAB], [http://en.wikipedia.org/wiki/SPSS SPSS], [http://en.wikipedia.org/wiki/SAS_%28software%29 SAS], etc.  For this introduction, however, we are keeping it simple and sticking to a single general purpose computing environment.
 
 
 
Finally, we try to use terms that are already defined in Wikipedia.  This way, readers can refer to the corresponding Wikipedia page to get a deeper understanding of the concept.  (As of this writing, there is no Wikipedia entry for ''Data Science.'')
 
 
 
==Note to Contributors==
 
First, please register yourself with WIKIBOOKS, so that we know who our co-contributors are.  Thank you.
 
 
 
Second, this is a cross-disciplinary book. We want to help people apply data science to all fields. Therefore, we need a wide variety of examples and exercises.
 
 
 
Third, we only need basic, clear, straightforward introductions to the parent disciplines.  There are other venues in which to wax eloquent on the depth and complexities of the parent disciplines.  Please place yourself in a "beginner's mind" as you make contributions.
 
 
 
Fourth, as with any WIKIBOOK, please feel free to make corrections, expand explanations, and make additions where necessary, even if it is not "your" section.  Use the discussion page to explain changes that might be controversial.
 
 
 
Fifth, some rules:
 
 
 
* Put the names of functions and code snippets in 'code' tags: <code><nowiki><code>lm()</code></nowiki></code>
 
* Use references to package documentation, academic literature, and Wikipedia.
 
* Use the citation templates to make citations: [[Template:Cite book]], [[Template:Cite web]], [[Template:Cite journal]]
 
* If you want to add a graph, you should upload it to [[Commons:Special:UploadWizard|Commons]] and add the tag <code><nowiki>{{Created with R}}</nowiki></code>.
 
* If you use a package other than the standard '''R''' packages, put the name of the package in parentheses after each function: <nowiki><code>MCMCprobit()</code> ('''MCMCpack''')</nowiki>
 
* Put the names of non-standard '''R''' packages in bold: <code><nowiki>'''MCMCpack'''</nowiki></code>
 
 
 
==List of Contributors==
 
 
 
 
 
 
 
== See Also ==
 
 
 
See the following WIKIBOOKS for good companion texts to this introduction:
 
 
 
* The Scientific Method - [http://en.wikibooks.org/wiki/Scientific_Method Scientific Method]
 
* Data Engineering - [http://en.wikibooks.org/wiki/Relational_Database_Design Relational Database Design], [http://en.wikibooks.org/wiki/Data_Structures Data Structures], [http://en.wikibooks.org/wiki/SQL SQL]
 
* Software Engineering - [http://en.wikibooks.org/wiki/The_Science_of_Programming The Science of Programming], [http://en.wikibooks.org/wiki/R_Programming R Programming]
 
* Statistical Analysis - [http://en.wikibooks.org/wiki/Statistics Statistics], [http://en.wikibooks.org/wiki/Statistical_Analysis:_an_Introduction_using_R Statistical Analysis: an Introduction using R], [http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R Data Mining Algorithms in R]
 
 
 
== References ==
 
 
 
{{reflist}}
 
 
 
* [http://cdn.oreilly.com/radar/2010/06/What_is_Data_Science.pdf (2012) O'Reilly's What is Data Science]
 
* [http://www.emc.com/collateral/about/news/emc-data-science-study-wp.pdf (2012) EMC Data Science Study]
 
 
 
==Copyright Notice==
 
While this book is in draft on my wiki, it is licensed under the [http://creativecommons.org/ Creative Commons] Attribution-NonCommercial-ShareAlike 3.0 license:
 
 
 
::::[[Image:Cc-by-nc-sa.png]]
 
 
 
You are free:
 
* to '''Share''' — to copy, distribute, display, and perform the work (pages from this wiki)
 
* to '''Remix''' — to adapt or make derivative works
 
Under the following conditions:
 
* '''Attribution''' — You must attribute this work to me by name (Calvin Andrus), by page title, by source (wiki.acadac.net), by date, and by version number (if available).  You may not suggest that I, in any way, endorse you or your use of this work.
 
* '''Noncommercial''' — You may not use this work for commercial purposes.
 
* '''Share Alike''' — If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.
 
* '''Waiver''' — Any of the above conditions can be waived if you get permission from the copyright holder.
 
* '''Public Domain''' — Where the work or any of its elements is in the public domain under applicable law, that status is in no way affected by the license.
 
* '''Other Rights''' — In no way are any of the following rights affected by the license:
 
:* Your fair dealing or fair use rights, or other applicable copyright exceptions and limitations;
 
:* The author's moral rights;
 
:* Rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights.
 
* '''Notice''' — For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to the following web page:
 
::http://creativecommons.org/licenses/by-nc-sa/3.0/
 

Latest revision as of 18:52, 11 August 2012
