YaMLC++

Tsinghua University P. R. China

Yet another Machine Learning C++

Jianguo Lee

State Key Lab for Intelligent Technology & Systems

Department of Automation

LEE-TR-2006-04

21st August 2006

© Department of Automation, Tsinghua University, by Jianguo Lee

Press: Self Published
Font: Typeset with LaTeX

Abstract

Yet another Machine Learning C++ (YaMLC++) is a machine learning toolbox implemented in C++. Other machine learning toolboxes exist, the two best known being MLC++ [3] and Weka [2]. However, MLC++ is too old to reflect the rapid progress of the machine learning field, and Weka is written in Java and runs on a virtual machine, hence it is not well suited to large-scale applications. YaMLC++ can be viewed as a successor to MLC++ and a C++ competitor to Weka. It currently contains about 50 mainstream algorithms from modern machine learning, which is fewer than Weka but many more than the older MLC++.

Keywords: Machine Learning; toolbox

Chapter 1 YaMLC++ FAQ

(1) Where can I download YaMLC++?
Please check the following websites: http://leeplus.googlepages.com/ and http://learn.tsinghua.edu.cn:8080/2001315524/code.html.

(2) Is YaMLC++ free? Can I use it for any purpose?
Yes, YaMLC++ is freeware. You can use it for any purpose; please refer to the license in Chapter 3 for details. In the near future, I may release the software (partially) as open source. When you use our software or SDK in your research and publish papers, please acknowledge us and, if possible, cite the following paper:
Jianguo Lee, Several issues in manifold based pattern classification, [PhD Thesis], Department of Automation, Tsinghua University, April, 2006.

(3) Which platforms does the software support?
Currently, YaMLC++ runs only on Microsoft Windows. We have tested it on Windows 2000, Windows 2003, and Windows XP. However, the core algorithm module is written in platform-independent C++. We also plan to release the library SDK soon so that YaMLC++ can be used on other platforms such as Linux.

(4) What is the main data format used in YaMLC++?
The main format is MATLAB Mat-Format Version 5 (V5, the main format of Matlab V5.0 to V6.5). Given a data set $X = \{(x_i, y_i),\ x_i \in \mathbb{R}^d\}_{i=1}^{N}$, where d is the feature dimension and N is the number of instances, you should save the data in two MATLAB arrays, “dat” and “id”: “dat” is an N × d double precision array storing the instance features, and “id” is an N × 1 or 1 × N double precision array storing the labels of the corresponding instances.

(5) How do I save a V5 format Mat-file in Matlab versions later than 6.5?
You may use "save dat -V5" in Matlab 7.x to save the data set in V5 Mat-format. If your Matlab release does not accept that option, note that Matlab 7.x documents the "-v6" option of the save command, which writes a Mat-file that Matlab 5 and 6 can read.

(6) What other data formats does YaMLC++ support?
YaMLC++ also supports the common CSV (comma-separated values) text format and the Weka ARFF format. You can import data sets in these two formats via the menu [File|Import]. Note that for ARFF files, reading instances with missing values is currently not supported. You may also import XML format data sets.

(7) How do I use the YaMLC++ software?
YaMLC++ has a graphical interface that is easy to use. After starting the software, you can run an experiment as follows:
(a) Load a data set into the workspace;
(b) Select an algorithm in the left pane and click it to pop up the parameter tuning window;
(c) Define the evaluation criterion, i.e., cross-validation or another scheme;
(d) Select the normalization method;
(e) Define the algorithm parameters (some algorithms do not have this window);
(f) Click 'OK' to run the algorithm.

(8) What results does YaMLC++ present?
YaMLC++ usually presents the performance (correct rate) of each round and the average performance.

(9) How can I view the data set?
There are two ways to visualize a data set:
(a) Use the menu [Data|View in Cell] to show the whole data set in a cell view;
(b) Use the menu [Graph|2D-Plot] to show a multi-dimensional scaling (MDS) plot of the data set.

(10) Which algorithms does YaMLC++ support?
The current version of YaMLC++ supports the following algorithms:
– Supervised learning algorithms: support vector machines, neural networks, discriminant analysis (linear and logistic discriminant models), naive and full Bayesian classifiers, limited dependence Bayesian network classifiers, decision trees, nearest neighbor and several manifold based nearest neighbor methods, etc.;
– Ensemble learning algorithms: Bagging, Boosting, random forest, Bayesian random forest, etc.;
– Subspace analysis methods: PCA, Fisher LDA, Kernel PCA, LogFace, Bayesian Face, etc.;
– Clustering algorithms: k-means, Gaussian mixture models, spectral clustering;
– Preprocessing: discretization, RELIEF series feature selection algorithms, sequential forward/backward feature selection, fast correlation based filtering;
– Visualization: multi-dimensional scaling (MDS).

(11) I cannot find an answer in this FAQ. What should I do?
Please contact me by email: [email protected], or read the relevant parts of my thesis [1].
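To make the data layout of item (4) and the per-round correct rates of items (7) and (8) concrete, here is a minimal C++ sketch. It is not YaMLC++ code: the names Dataset, nearest_label, cross_validate and average are hypothetical, the row-major storage is an assumption, the fold assignment (instance i goes to fold i mod k) is a simplification, and a plain 1-nearest-neighbor rule stands in for whichever algorithm is selected in the interface.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical container mirroring item (4): "dat" is an N x d array of
// doubles (assumed row-major here) and "id" holds one label per instance.
struct Dataset {
    std::size_t n = 0;        // number of instances N
    std::size_t d = 0;        // feature dimension d
    std::vector<double> dat;  // n * d feature values
    std::vector<double> id;   // n labels
};

// Label of the training instance closest (squared Euclidean distance) to
// instance `query`. Instances in fold `held_out` are not used for training.
double nearest_label(const Dataset& ds, std::size_t query,
                     const std::vector<std::size_t>& fold,
                     std::size_t held_out) {
    double best = std::numeric_limits<double>::infinity();
    double label = std::numeric_limits<double>::quiet_NaN();
    for (std::size_t i = 0; i < ds.n; ++i) {
        if (fold[i] == held_out) continue;
        double dist = 0.0;
        for (std::size_t j = 0; j < ds.d; ++j) {
            const double diff = ds.dat[i * ds.d + j] - ds.dat[query * ds.d + j];
            dist += diff * diff;
        }
        if (dist < best) {
            best = dist;
            label = ds.id[i];
        }
    }
    return label;
}

// k-fold cross-validation: returns the correct rate of each of the k rounds.
std::vector<double> cross_validate(const Dataset& ds, std::size_t k) {
    assert(k >= 2 && k <= ds.n);
    assert(ds.dat.size() == ds.n * ds.d && ds.id.size() == ds.n);
    std::vector<std::size_t> fold(ds.n);
    for (std::size_t i = 0; i < ds.n; ++i) fold[i] = i % k;

    std::vector<double> rates;
    for (std::size_t round = 0; round < k; ++round) {
        std::size_t tested = 0, correct = 0;
        for (std::size_t i = 0; i < ds.n; ++i) {
            if (fold[i] != round) continue;
            ++tested;
            if (nearest_label(ds, i, fold, round) == ds.id[i]) ++correct;
        }
        rates.push_back(static_cast<double>(correct) / tested);
    }
    return rates;
}

// Average performance over all rounds.
double average(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0) / v.size();
}
```

Calling cross_validate(ds, k) yields one correct rate per round, and average() of that vector gives the summary figure. A real experiment would normally shuffle or stratify the fold assignment instead of using i mod k.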

Chapter 2 SDK FAQ

2.1 General FAQ

(1) Which platform does the SDK support?
Currently, all SDKs can only be compiled with Visual C++ .NET 2003 on Microsoft Windows.

(2) How do I load a data set?
All SDKs can directly load and save MATLAB Mat-Format Version 5 files (V5, the main format of Matlab V5.0 to V6.5). This is the same format described in FAQ item (4) of Chapter 1. Please refer to the header file “CVM_xMatIO.h” for the Mat-file load/save functions, and to the examples for how to use them. You may add other formats according to your own needs.

(3) I cannot find an answer in this FAQ. What should I do?
Please contact me by email: [email protected], or read the relevant parts of my thesis [1].

2.2 LibCART usage FAQ

(1) What does this SDK include?
LibCART is a software development kit (SDK) for the CART decision tree and its ensemble extensions. It includes a C/C++ interface for:
– CART decision tree;
– Bagging;
– AdaBoost;
– Random Forest.
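For readers unfamiliar with CART, the following C++ sketch illustrates the kind of split search a CART classification tree performs on a single numeric feature, using the Gini impurity. It is a generic illustration under stated assumptions, not the LibCART API: the names gini, Split and best_split are hypothetical, and the exhaustive search is written for clarity instead of speed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Gini impurity 1 - sum_c p_c^2 of a set of class labels.
double gini(const std::vector<int>& labels) {
    if (labels.empty()) return 0.0;
    std::map<int, std::size_t> counts;
    for (int y : labels) ++counts[y];
    double sum_sq = 0.0;
    for (const auto& kv : counts) {
        const double p = static_cast<double>(kv.second) / labels.size();
        sum_sq += p * p;
    }
    return 1.0 - sum_sq;
}

struct Split {
    double threshold;  // instances with value <= threshold go to the left child
    double impurity;   // size-weighted Gini impurity of the two children
};

// Try every midpoint between consecutive distinct feature values and keep
// the threshold with the lowest weighted impurity. If no split improves on
// the unsplit node, the result keeps all instances on the left.
Split best_split(const std::vector<double>& x, const std::vector<int>& y) {
    assert(x.size() == y.size() && !x.empty());
    std::vector<std::pair<double, int>> pts;
    for (std::size_t i = 0; i < x.size(); ++i) pts.emplace_back(x[i], y[i]);
    std::sort(pts.begin(), pts.end());

    Split best = Split{pts.back().first, gini(y)};
    for (std::size_t i = 1; i < pts.size(); ++i) {
        if (pts[i].first == pts[i - 1].first) continue;  // no boundary here
        std::vector<int> left, right;
        for (std::size_t j = 0; j < i; ++j) left.push_back(pts[j].second);
        for (std::size_t j = i; j < pts.size(); ++j) right.push_back(pts[j].second);
        const double weighted =
            (left.size() * gini(left) + right.size() * gini(right)) / pts.size();
        if (weighted < best.impurity) {
            best = Split{0.5 * (pts[i - 1].first + pts[i].first), weighted};
        }
    }
    return best;
}
```

A full tree applies this search to every feature at every node and recurses on the two children. Bagging and Random Forest then train many such trees on bootstrap samples of the data.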

(2) Is the LibCART SDK free? Can I use it for any purpose?
Yes, the LibCART SDK is free. You can use it for any purpose; please refer to the license in Chapter 3 for details. When you use our software or SDK in your research and publish papers, please acknowledge us and, if possible, cite the following paper:
Jianguo Lee, Changshui Zhang, Classification of gene-expression data: the manifold based metric learning way, Pattern Recognition, 39:2450–2463, 2006.

(4) What are the SDK files for?

Header            Release               Debug                  Memo
libcart.h         LibCART(.lib, .dll)   LibCARTd(.lib, .dll)   CART core API
xMatIO.h          xMatIO(.lib, .dll)    xMatIOd(.lib, .dll)    mat-file load/save IO
cvm.h & blas.h    cvm(.lib, .dll)       cvmd(.lib, .dll)       a matrix library

(5) Are there any examples of how to use the SDK?
Please refer to “sdkTest.cpp” for some examples of how to use LibCART.

2.3 Feature Selection SDK usage FAQ

(1) What does this SDK include?
The feature selection SDK includes a C/C++ interface for:
– Sequential search based feature selection with different evaluation criteria such as wrappers and infogain:
  ∗ Sequential Forward Search;
  ∗ Sequential Backward Search;
  ∗ Sequential Floating Forward Search;
  ∗ ...
– Ranking based feature selection algorithms: RELIEF-X series, InfoGain based ranking, ...
– Correlation based feature selection algorithms:
  ∗ Correlation based Filtering (CFS);
  ∗ Fast Correlation based Filtering (FCBF).
– Consistency based approaches? TODO
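As an illustration of the sequential search family listed above, here is a minimal C++ sketch of Sequential Forward Search. It does not reproduce the SDK's interface: the Criterion type and the function sequential_forward_search are hypothetical names, and the evaluation criterion (a wrapper's accuracy, information gain, and so on) is assumed to be supplied by the caller as a function that scores a feature subset, with larger scores being better.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Scores a candidate feature subset (given as feature indices).
// Larger is better. Supplied by the caller; hypothetical interface.
using Criterion = std::function<double(const std::vector<std::size_t>&)>;

// Greedy Sequential Forward Search: start from the empty set and, in each
// step, add the single unused feature whose inclusion gives the best score,
// until target_size features have been selected.
std::vector<std::size_t> sequential_forward_search(std::size_t num_features,
                                                   std::size_t target_size,
                                                   const Criterion& score) {
    assert(target_size <= num_features);
    std::vector<std::size_t> selected;
    std::vector<bool> used(num_features, false);

    while (selected.size() < target_size) {
        std::size_t best_feature = num_features;  // sentinel: nothing chosen yet
        double best_score = 0.0;
        for (std::size_t f = 0; f < num_features; ++f) {
            if (used[f]) continue;
            selected.push_back(f);           // tentatively add feature f
            const double s = score(selected);
            selected.pop_back();             // undo the tentative addition
            if (best_feature == num_features || s > best_score) {
                best_feature = f;
                best_score = s;
            }
        }
        used[best_feature] = true;
        selected.push_back(best_feature);
    }
    return selected;
}
```

Sequential Backward Search follows the same pattern in reverse, starting from the full feature set and removing one feature per step. The floating variant additionally allows conditional removal steps after each addition.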

(2) Is this SDK free? Can I use it for any purpose?
Yes, the feature selection SDK is free. You can use it for any purpose; please refer to the license in Chapter 3 for details. When you use our software or SDK in your research and publish papers, please acknowledge us and, if possible, cite the following paper:
Jianguo Lee, Changshui Zhang, Classification of gene-expression data: the manifold based metric learning way, Pattern Recognition, 39:2450–2463, 2006.

(4) What are the SDK files for?

Header            Release               Debug                  Memo
libcart.h         Libcart(.lib, .dll)   Libcartd(.lib, .dll)   tree & feature selection API
xMatIO.h          xMatIO(.lib, .dll)    xMatIOd(.lib, .dll)    mat-file load/save IO
cvm.h & blas.h    cvm(.lib, .dll)       cvmd(.lib, .dll)       a matrix library

(5) Are there any examples of how to use the SDK?
Please refer to “sdkTest.cpp” for some examples of how to use this SDK.

2.4 Bayesian network classifiers SDK usage FAQ

The Bayesian network classifiers SDK will include algorithms such as naive Bayes, TAN, super-parent BNs, limited dependence BNs, boosted BNs, and my Generalized Additive Bayesian Network Classifiers. For more details, please refer to my paper:
Jianguo Li, Changshui Zhang, Tao Wang, Yimin Zhang, Generalized Additive Bayesian Network Classifiers, To appear in IJCAI, 2007, India.
This SDK will be released soon; please be patient.

Chapter 3 License

This software is being distributed under the following BSD-type license:

1. Permission to use or copy this software for any purpose is hereby granted without fee, provided that the above copyright notice, this list of conditions and the following disclaimer are retained on all copies.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. The name of the authors may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Bibliography

[1] Jianguo Lee. Several issues in manifold based pattern classification. [PhD Thesis], Department of Automation, Tsinghua University, April 2006.

[2] Witten I and Frank E. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann Publishers, 2000.

[3] Kohavi R, Sommerfield D, and Dougherty J. Data Mining Using MLC++: A Machine Learning Library in C++. International Journal on Artificial Intelligence Tools, 1997, 6(4):537-566.
