Graduation thesis: Data Mining in SQL Server 2012

MINISTRY OF EDUCATION AND TRAINING
THANG LONG UNIVERSITY
--o0o--
GRADUATION THESIS
DATA MINING IN SQL SERVER 2012

Supervisor: Tran Quang Duy
Students: Doan Thanh Cong (A11278), Nguyen Duc Hoang (A11500)
Major: Information Technology
HANOI - 2014

PREFACE

The development of information technology and its application in many areas of life, the economy and society over the years has also meant that the volume of data collected and stored by organizations keeps growing. They keep this data because they believe it holds some value. Statistics suggest, however, that only a small share of it (roughly 5% to 10%) is ever analyzed. For the rest, organizations do not know what to do or what could be done, yet they continue the costly collection out of fear that something important might be missed and needed later.

At the same time, in a competitive environment people need more information, faster, to support decision making, and a growing number of qualitative questions must be answered from the huge volume of data already on hand. For these reasons, traditional methods of database management and exploitation increasingly fall short of practical needs, which has given rise to a new technical trend: knowledge discovery and data mining (KDD, Knowledge Discovery and Data Mining).

Knowledge discovery and data mining techniques have been and are being researched and applied in many fields around the world. In Vietnam they are still relatively new, but they are also being studied and gradually put into practice. The most important step of this process is data mining, which helps users obtain useful knowledge from databases and other very large data sources. Many businesses and organizations worldwide have applied data mining to their operations and gained substantial benefits.

For these reasons we chose the topic "Data mining and its application in SQL Server 2012", hoping to study the methods, models and techniques of data mining. The work is useful not only as a theoretical study but also as a practical application built on a concrete model, used to verify the value that data mining techniques deliver.

We start from basic knowledge and move gradually to more complex issues related to data mining algorithms. Although the study stays at a basic, simple level, it touches on open problems and on the capabilities of data mining applications, especially within the SQL Server 2012 database management system.

The thesis consists of:
- Preface
- List of abbreviations
- Chapter 1: Overview of data mining
- Chapter 2: Data mining tasks
- Chapter 3: Data mining in SQL Server 2012
- Chapter 4: Applying data mining in SQL Server 2012
- Conclusion
- References

SYMBOLS AND ABBREVIATIONS

DM: Data Mining
BI: Business Intelligence
CSDL/DB: Database
OLAP: Online Analytical Processing
KDD: Knowledge Discovery in Databases
SSIS: SQL Server Integration Services (integration services in SQL Server that support data mining)
ERP: Enterprise Resource Planning (management of an enterprise's resources)
ODBC: Open Database Connectivity

TABLE OF CONTENTS

CHAPTER 1. OVERVIEW OF DATA MINING
1.1. The concept of data mining
1.1.1. Introduction to data mining
1.1.2. Definition of data mining
1.2. Steps in data mining
1.2.1. Data mining techniques
1.2.2. Data flow
1.2.3. Life cycle of a data mining project
1.2.4. Data mining standards
1.3. Approaches to the data mining problem
1.3.1. Architecture of a data mining system
1.3.2. Main functions of data mining
1.3.3. Types of data that can be mined
1.3.4. Difficulties in data mining
1.4. Research trends and current applications of data mining
1.4.1. Research directions
1.4.2. Applications of data mining in practice
1.4.3. Applications of data mining to groups of business problems
CHAPTER 2. DATA MINING TECHNIQUES
2.1. Data classification
2.1.1. Decision tree classification model
2.1.2. Bayes classification model
2.2. Data clustering
2.3. Regression
2.4. Association rules
2.5. Forecasting
2.6. Summarization
2.7. Dependency modeling
2.8. Change and deviation detection
CHAPTER 3. DATA MINING IN SQL SERVER 2012
3.1. The OLE DB model in SQL Server
3.1.1. Introduction
3.1.2. Basic concepts in OLE DB for Data Mining
3.1.3. Data Mining Extensions to SQL (DMX)
3.2. Data mining algorithms in SQL Server 2012
3.2.1. Microsoft Decision Trees
3.2.2. Microsoft Clustering
3.2.3. Microsoft Naive Bayes
3.2.4. Microsoft Sequence Clustering
3.2.5. Microsoft Time Series
3.2.6. Microsoft Association Rules
3.2.7. Microsoft Neural Network
3.2.8. Microsoft Linear Regression
3.2.9. Microsoft Logistic Regression
3.3. Principles for choosing an algorithm
CHAPTER 4. APPLYING DATA MINING IN SQL SERVER 2012
4.1. Introduction to Business Intelligence Development Studio
4.2. Applications in SQL
4.2.1. Using the Microsoft Decision Trees and Microsoft Naive Bayes algorithms
4.2.2. Using the Microsoft Association Rules algorithm
CONCLUSION
REFERENCES

CHAPTER 1. OVERVIEW OF DATA MINING

1.1. The concept of data mining

1.1.1. Introduction to data mining

In recent years, the strong development of information technology and of the hardware industry has made the capacity of information systems to collect and store information grow at a dizzying pace. In addition, the rapid and widespread computerization of production, business and many other activities has left us with an enormous amount of stored data.

Millions of databases are in use in production, business and management, many of them extremely large, on the order of gigabytes or even terabytes. This explosion created an urgent need for new techniques and tools that automatically turn such huge volumes of data into useful knowledge. As a result, data mining techniques have become a topical field in information technology worldwide.

1.1.2. Definition of data mining

Knowledge discovery in databases is the process of identifying patterns or models in data that are valid, novel, useful and understandable. Data mining is a relatively new term that appeared around the late 1980s, and it has many different definitions.

Professor Tom Mitchell defined it as follows: "Data mining is the use of historical data to discover rules and improve future decisions."

Taking a more application-oriented view, Dr. Fayyad stated: "Data mining is often regarded as knowledge discovery in databases: a process of extracting hidden, previously unknown and potentially useful information, in the form of regularities, constraints and rules in the database."

Statisticians, for their part, see "data mining as an analytical process designed to explore very large amounts of data in order to find relevant patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of the data."

In summary, data mining is one step in the knowledge discovery process. It consists of specialized data mining algorithms that, within acceptable limits on computational efficiency, find patterns or models in data.

1.2. Steps in data mining

1.2.1. Data mining techniques

Although data mining is a relatively new term, most data mining techniques have existed for many years. Data mining has three forerunners: statistics, machine learning and databases. Several data mining algorithms, including regression, time series and decision trees, were invented by statisticians. Regression techniques have existed for centuries. Time series algorithms have been studied for decades. The decision tree algorithm is one of the more recent techniques, dating from the mid-1980s.

Data mining focuses on the automatic or semi-automatic discovery of patterns. Some machine learning algorithms applied to data mining are:

a. Neural networks

This is one of the most widely used data mining techniques today. It rests on a solid mathematical foundation, and its training capability is modeled on the human central nervous system. What a neural network learns can be used to build forecasting and prediction models with high accuracy and reliability. It can detect complex trends that other conventional techniques find hard to detect. The method is, however, complex and difficult to carry out: it requires a lot of time, a lot of data and many rounds of testing.

b. Genetic algorithms

These simulate the process of natural evolution.

The main idea of the algorithm is based on the laws of heredity in variation, natural selection and evolution in biology. Building a genetic algorithm that imitates biology in order to find the best solutions involves the following steps:
- Create a genetic encoding of solutions as strings over a restricted alphabet.
- Set up an artificial environment on the computer in which solutions can "struggle for survival" against one another, in order to define a measure of success or failure, also called "fitness".
- Develop "crossover operators" so that solutions can combine. The genetic strings of the parent solutions are cut and rearranged, and mutations can be applied during this reproduction.
- Provide a reasonably diverse initial population of solutions and let the computer play the "evolution game" by discarding solutions from each generation and replacing them with offspring or mutations of good solutions. The algorithm ends when a family of successful solutions has been produced.

Data mining is the automatic extraction of characteristic patterns from a large volume of data. This knowledge usually takes the form of patterns that are non-trivial and hidden (not explicit), yet can bring great benefit when used in the right place. Data mining can be regarded as the core of the process of knowledge discovery in databases (KDD).

1.2.2. Data flow

Data mining is an important member of the data warehouse family. Where does data mining fit within the data flows of a typical business scenario? The figure below illustrates a typical enterprise data flow in which data mining can be applied at different stages.

Figure 1: Enterprise data mining model, showing the data mining application alongside online transaction processing (OLTP) and online analytical processing (OLAP).

A business application stores its transaction data in an online transaction processing (OLTP) database. The OLTP data is regularly extracted, transformed and loaded into the data warehouse. The schema of a data warehouse usually differs from an OLTP schema. A typical data warehouse schema has the shape of a star or a snowflake, with the transaction (fact) table at the center surrounded by a set of dimension tables.

First and most commonly, data mining can be applied to the data warehouse, where the data has already been cleaned. The patterns found by the mining models can be presented to marketing managers through reports.

Data mining can also be linked directly to business applications, most commonly through predictions. Embedding data mining in business applications is becoming increasingly common. For example, in a Web sales scenario, each time a customer puts a product into the shopping cart, a data mining prediction query is run to obtain a list of recommended products based on the analysis.

Data mining can also be applied to the analysis of OLAP cubes. A cube is a multidimensional database with many dimensions and measures. Dimensions can hold millions of records, which makes it hard to find patterns of interest. Data mining techniques can be applied to discover hidden patterns in an OLAP cube. For example, an association algorithm can be applied to a sales cube to analyze customers' purchase patterns for a specific region and time period.

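The association analysis mentioned in the example above can be illustrated with a minimal Python sketch. It is not the Microsoft Association Rules algorithm. It counts item pairs in a handful of hypothetical shopping baskets and keeps the rules that pass two thresholds. The measures used (support and confidence) are the usual textbook ones, and the function name, thresholds and data are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for basket in transactions:
        items = sorted(set(basket))               # ignore duplicates in a basket
        item_count.update(items)
        pair_count.update(combinations(items, 2))  # every unordered pair once

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n                        # share of baskets with both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]   # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical baskets, e.g. the contents of four Web shopping carts.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk"],
]
rules = pair_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule such as butter -> bread is the kind of pattern a recommendation query could use when a customer adds butter to the cart. A production system would restrict the baskets to the region and period of interest before counting.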
Data mining techniques can also be used to forecast measures such as sales and profit.

1.2.3. Life cycle of a data mining project

Figure 2: Life cycle of a data mining project. Stages shown: gathering, selection, cleansing and preprocessing, transformation, and evaluation, with the data moving through target, cleansed and preprocessed states.

a. Data gathering and data selection

Data gathering: collecting data is the first step in data mining. This step takes data from a database, a data warehouse, or even from Web sources.

Data selection: at this stage the data is chosen and partitioned according to some criteria.

c. Data cleansing and preprocessing

Data cleansing: this is the process of removing or reducing noise and of handling missing values. The step reduces ambiguity during learning.

Relevance analysis: many attributes in the data may be irrelevant or unnecessary for classification. Relevance analysis is therefore performed on the data to remove any irrelevant or unnecessary attributes. In machine learning this step is called feature selection. It makes classification more efficient and improves scalability.

This stage is often neglected, but in practice it is a very important step in the data mining process. Common errors made while gathering data are incomplete, inconsistent or loosely controlled data. As a result the data often contains meaningless values and values that cannot be linked to other data, for example a student with age = 200. This stage handles such data (meaningless data and data that cannot be linked).

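The kind of check this stage performs can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the record layout, the function name `clean_students` and the accepted age range are hypothetical, and real cleansing rules depend on the data set.

```python
def clean_students(records, min_age=15, max_age=100):
    """Split records into (valid, rejected) using completeness and range checks."""
    valid, rejected = [], []
    for rec in records:
        name = rec.get("name")
        age = rec.get("age")
        incomplete = not name or age is None          # missing values
        if incomplete or not (min_age <= age <= max_age):
            rejected.append(rec)                      # meaningless or incomplete
        else:
            valid.append(rec)
    return valid, rejected


students = [
    {"name": "An", "age": 20},
    {"name": "Binh", "age": 200},   # meaningless value, as in the example above
    {"name": "", "age": 21},        # incomplete record: no name
    {"name": "Chi", "age": None},   # missing value
]
valid, rejected = clean_students(students)
print(len(valid), "valid,", len(rejected), "rejected")
```

Keeping the rejected records instead of silently dropping them makes it possible to review them later, or to repair them with an imputation rule.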
Such data is usually regarded as redundant information with no value, which is why this is a very important process. If the data is not cleaned, preprocessed and prepared in advance, it will lead to seriously misleading results later on.

d. Data transformation

At this stage the data can be reorganized and reused. The purpose of transformation is to make the data better suited to the goal of data mining.

Data can be generalized to higher-level concepts. This is very useful for attributes with continuous values. For example, the numeric values of an income attribute can be generalized into discrete ranges such as low, medium and high. Similarly, a nominal attribute such as street can be generalized to a higher-level concept such as city. This reduces the number of input/output operations during processing.

Data can also be normalized, especially when neural networks or methods that use distance measures are involved in the processing steps. Normalization scales all values of a given attribute so that they fall within a specified range such as [-1.0, 1.0] or [0, 1.0]. This prevents attributes with an initially large range (such as income) from outweighing attributes with an initially smaller range (such as binary attributes).

e. Pattern extraction and discovery

This is the reasoning step of data mining. At this stage many different algorithms are used to extract patterns from the data. Commonly used ones are classification algorithms, association algorithms and sequential data modeling algorithms. It is one of the most important and most time-consuming steps of the KDD process, in which intelligent methods are used to distill patterns from the data.

These are mainly machine learning techniques used to mine and extract patterns and notable relationships in the data.

The models may contain no usable patterns. The data may be completely random, or it may contain too much noise. In that case the cleaning and transformation steps must be repeated to distill more meaningful data. This is an iterative and time-consuming process whose goal is to produce information that is relevant and meaningful to managers.

f. Evaluation of results and knowledge presentation

This is the final stage of the data mining process. At this stage the patterns have been extracted by the data mining software. Not every pattern is useful, and some may even be misleading. Criteria are therefore needed to rank the patterns so that the necessary knowledge can be drawn from them.

Knowledge presentation: techniques are used to represent the results and display them visually to users. The representations should be familiar and easy to understand, such as charts or trees, and should feed reports that help managers make important decisions.

1.2.4. Data mining standards

SAS: the largest vendor of data mining products in terms of market share, and a leader in statistics for decades. Base SAS contains a very rich set of statistical functions that can be used for all kinds of data analysis. SAS supports text mining, offers a graphical environment for building models, and includes common data mining algorithms such as decision trees, neural networks and regression.

SPSS: its data mining products include "SPSS Base" and "Answer Tree".

SPSS also inherited the Clementine data mining package. Clementine was one of the first to introduce the concept of a data mining workflow, which lets users clean data, transform data and run experimental models.

IBM: its data mining product is Intelligent Miner, developed in Germany. It contains a set of algorithms and visualization tools. It exports data mining models in Predictive Modeling Markup Language (PMML). PMML documents are XML files that contain descriptions of model patterns and statistics of the training data, for prediction purposes.

Microsoft: the first major database vendor to include data mining features in a relational database. SQL Server 2000 has two data mining algorithms: Microsoft Decision Trees and Microsoft Clustering. In the following versions of SQL Server (2005, 2008 and 2012) the data mining features were steadily upgraded, and Microsoft's product has taken a growing share of the market.

Oracle: Oracle 9i shipped in 2000 with a pair of data mining algorithms based on association rules and Naive Bayes. Oracle 10g includes more data mining tools and algorithms. Oracle also works with the Java Data Mining API, a software package for data mining.

Angoss: mainly builds decision tree, cluster analysis and predictive model algorithms that let users understand their data from many different viewpoints. The algorithms are backed by powerful visualization tools for explaining the mined knowledge, and the product works well with the utilities of Microsoft SQL Server.

KXEN: provides several data mining algorithms such as SVM, regression, time series and segmentation, together with data mining solutions for OLAP cubes. It also provides an Excel add-in for data mining inside Excel.

1.3. Approaches to the data mining problem

1.3.1. Architecture of a data mining system

Database, data warehouse, World Wide Web or other information repositories: this is one database or a set of databases, data warehouses, spreadsheets or other forms of information storage. In specific situations this component is the input to data integration and cleaning techniques.

Database or data warehouse server: this server is responsible for fetching the relevant data based on the user's mining request.

Figure: architecture of a data mining system. Components shown: graphical user interface, pattern evaluation, data mining engine, database or data warehouse server, data cleaning and integration, database, data warehouse.

Knowledge base: used to guide the search and to evaluate the resulting patterns. The knowledge base can consist of concept hierarchies, user beliefs, constraints or thresholds, metadata and so on.

Data mining engine: this component contains the functional modules that carry out data mining tasks such as characterization, association, classification, clustering and evolution analysis.

Pattern evaluation module: this component works together with the data mining engine. It can use interestingness thresholds to filter the discovered patterns. Depending on how the mining method is implemented, it may also be integrated into the mining module.

Graphical user interface: this component supports interaction between the user and the data mining system.
- The user can specify a data mining query or task.
- The user can be given information that supports the search, and can mine more deeply based on intermediate mining results.
- The user can also browse database and data warehouse schemas and data structures, evaluate the mined patterns, and visualize these patterns in different forms.

1.3.2. Main functions of data mining

These functions are as follows.

a. Characterization and discrimination: characterization summarizes all the general features or properties of a target class of data. This data corresponds to a class that the user specifies with a database query. The output of characterization can be presented in various forms.

b. Association analysis: the discovery of association rules in a large data set. Association rules express relationships between attribute values that can be observed from how frequently they occur together. They are discovered from a large set of business transaction records, and meaningful rules can help businesses make decisions.

c. Classification and prediction: classification is the process of finding a set of models (or functions) that describe and distinguish classes of data. These models are then used to predict the class of objects. The model is built by analyzing a set of training data and can be represented in many forms: classification rules, decision trees, neural networks and so on. Classification and prediction may be preceded by relevance analysis, which identifies the attributes that do not contribute to the classification and prediction process. Those attributes are excluded after this step.

d. Clustering: unlike classification and prediction, clustering analyzes data objects whose class labels are not known. Clustering groups objects according to the principle that objects in the same group are as similar as possible, and objects in different groups are as dissimilar as possible.

e. Outlier analysis: a database may contain data objects that do not follow the data model. Such objects are called outliers. Most data mining methods treat outliers as noise and discard them. In some applications, however, such as noise detection, events that rarely occur are of more interest than those that occur regularly. The analysis of outlier data is called outlier mining. There are several methods for detecting outliers: statistical tests based on an assumption about the data distribution or a probability model for the data, and deviation-based methods that examine differences in the main characteristics of the objects in a group.

1.3.3. Types of data that can be mined

Human knowledge is the sum of relationships that are closely and logically connected to one another, stored as data of one kind or another. In practice there are many database models, and in each application field many forms of data can be defined and distinguished so as to be most convenient to use. Data mining can work with the following kinds of data:

Relational databases: data organized according to the relational data model, which is very common in many sectors. Most database management systems support relational databases, for example Oracle, MS SQL Server, IBM DB2 and MS Access.

Multidimensional databases (multidimensional structures, data warehouses): this is also a form of operational data whose records are usually transactions. This form of data is also common today.

Object-relational databases: a hybrid of the relational and object-oriented models.

Spatial, temporal and time series data: data that incorporates spatial attributes, such as maps of telephone cable networks, or temporal attributes, such as telephone billing data, newspaper circulation or stock indexes.

Multimedia databases: audio, image, text and WWW data. This kind of data is rich, varied and widespread, especially on the Internet.

1.3.4. Difficulties in data mining

a. Database issues

The input to a data mining system is usually a set of raw data that is often incomplete and noisy. In practice the data also changes constantly and is continuously extended, forming a huge volume of data that contains both useful and useless information. In any data mining system, therefore, the first task is to analyze and examine the database that the system will mine.

b. Large databases

Online analysis tools cannot fully exploit the information in the current database, so data handlers have no choice but to store the data for later use. The stored data contains both useful and useless information. This accumulation keeps growing, and databases with millions of records have now reached terabytes in size. How to remove redundant data and meaningless information varies from one application to another. Data processing methods are therefore very diverse and complex, and there is no general rule for every application.

c. Số chi...

[...]

2.6. Summarization

Summarization is the work concerned with methods for finding a description of a subset of the data.

Concept description and summarization techniques are commonly applied in exploratory data analysis and automated reporting. The main task is to produce characteristic descriptions for a class. A description of this kind is a form of summary that captures the common properties of all or most items of a class. Characteristic descriptions are expressed as rules of the following form: "If an item belongs to the class given in the premise, then the item has all the properties stated in the conclusion." Rules of this form differ from classification rules: a characteristic rule for a class is produced only from items that already belong to that class.

2.7. Dependency modeling

Dependency modeling is the search for a model that describes the dependencies between variables or attributes at two levels. The structural level of the model describes (usually as a graph) which variables are locally dependent on other variables. The quantitative level of the model describes the strength of the dependencies.

These dependencies are often expressed as "if-then" rules: if the premise is true, then the conclusion is true. In principle both the premise and the conclusion can be logical combinations of attribute values. In practice the premise is usually a group of attribute values and the conclusion is a single attribute. The system can also discover classification rules, in which all rules must have the same user-specified attribute in the conclusion.

Dependency relationships can also be represented as a Bayesian belief network. This is a directed acyclic graph whose nodes represent attributes and whose links carry weights for the dependencies between those nodes.

2.8. Change and deviation detection

This task focuses on discovering the most significant changes in the data relative to previously measured or normative values, and on detecting significant deviations between the content of an actual data subset and its expected content. The two deviation models most often used are deviation over time and deviation between groups.

Deviation over time is a significant change in data over time. Deviation between groups is the difference between the data in two subsets, including the case where one subset is contained in the other. In other words, the question is whether the data in a subgroup of objects differs significantly from the whole set of objects. In this way, data errors and deviations from normal values are detected.

CHAPTER 3. DATA MINING IN SQL SERVER 2012

3.1. The OLE DB model in SQL Server

3.1.1. Introduction

The OLE DB for Data Mining specification was introduced in July 2000. It has its roots in two main database technologies: OLE DB and SQL. The standard adopts relational database concepts and applies many of them to the field of data mining. The core of OLE DB for Data Mining is Data Mining eXtensions (DMX), an SQL-style query language for data mining. The specification also includes a list of predefined prediction functions and a set of schema rowsets. The schema rowsets allow applications to discover mining models and mining services automatically.

The main purpose of OLE DB is to provide a standard way of accessing tabular data. Before OLE DB, the most common way to access a relational database was through Open Database Connectivity (ODBC), an API based on the SQL Call Level Interface standard. ODBC provides an easy way to query different kinds of relational databases. Most data, however, is not stored in relational databases. It is found in text files, email, Excel spreadsheets, Word documents and so on. Ideally, all of this data should be accessible in the same way as relational data, preferably through the same API. OLE DB was introduced for this purpose.

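The goal stated above, reading non-relational data through the same interface as relational data, can be illustrated by analogy. The sketch below is not OLE DB, which is a COM-based API. It uses only Python's standard `csv` and `sqlite3` modules, and the function names and sample data are hypothetical. Both sources are exposed as the same thing: a stream of rows keyed by column name.

```python
import csv
import io
import sqlite3


def rows_from_csv(text):
    """Yield each CSV record as a dict keyed by column name."""
    for row in csv.DictReader(io.StringIO(text)):
        yield dict(row)


def rows_from_sqlite(conn, query):
    """Yield each SQL result row as a dict keyed by column name."""
    cursor = conn.execute(query)
    columns = [d[0] for d in cursor.description]
    for values in cursor:
        yield dict(zip(columns, values))


# The same two customers, once as a text file and once as a relational table.
csv_text = "name,city\nAn,Hanoi\nBinh,Hue\n"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)", [("An", "Hanoi"), ("Binh", "Hue")]
)

from_file = list(rows_from_csv(csv_text))
from_db = list(rows_from_sqlite(conn, "SELECT name, city FROM customers ORDER BY name"))
print(from_file == from_db)
```

Code that consumes these rows does not need to know which kind of source they came from, which is the property that lets a mining algorithm run over any tabular source.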
Figure 5: Architecture of Object Linking and Embedding Database (OLE DB)

Applications can connect to different data mining sources through OLE DB or ADO connections. Each OLE DB provider for a data mining source supplies a set of data mining algorithms. These algorithms can access any tabular data source through OLE DB. The source data can be stored in many forms, such as relational databases, OLAP cubes, text files, or email.

To serve as a common standard for data mining, OLE DB defines a set of interfaces, which are implemented by objects. They include:

Data Source Object: a COM object through which applications connect to a data source. OLE DB implements a separate object class for each data source. To connect to an OLE DB data source, an application must first instantiate this class. The Data Source Object implements the IDBCreateSession interface and supports describing metadata.

Session Object: provides the context for a transactional session. It is created through the IDBCreateSession interface, and a single Data Source Object can create any number of sessions. The Session Object implements the IDBCreateCommand interface.

Rowset Object: the central object through which every OLE DB data source exposes its data in tabular form. Conceptually, a rowset is a set of rows, each of which has columns of data. The application iterates over rowsets to retrieve the data.
A query returns its result as rowsets in tabular form (columns and rows).

Figure 6: Objects in OLE DB

3.1.2. Basic concepts in OLE DB for Data Mining

Case: data mining analyzes cases. Each case is a set of attributes, and each attribute can take a set of values called states. For example, the attribute Gender has two states: male and female.

Case Key: the attribute that uniquely identifies each case. It is usually the primary key of a relational table. Occasionally a case has a composite key made up of several attributes; for example, First Name and Last Name can be combined into a composite key.

Nested Key: although a case key often coincides with a primary key, a nested key is different from a foreign key. The case key only establishes uniqueness and carries no patterns (mining algorithms usually ignore it), whereas the nested key is the most important attribute of the nested part. The other attributes in the nested part describe the nested key.

Case Tables and Nested Tables: a case table holds the information about the base part of a case. A nested table holds the information about the nested part of the case. It is usually a transaction table, such as a purchase history or a Web access log. A nested table is joined to the case table through the case key. To join a case table and a nested table hierarchically, OLE DB defines the Shape operation.

Scalar Column and Table Column: a column in a mining model is like a column in the relational model; in statistical terminology it is also called a variable or an attribute. Depending on how it is used, a mining model column can be one of four kinds: key, input, predictable, or both input and predictable.
Some algorithms, such as clustering, do not require predictable columns. In that case the mining model can consist of input columns only.

There are two kinds of column structure: scalar and table. Most columns are scalar. A scalar column holds a single value for each record; for example, Age and Gender are scalar columns. A table column is a special column that contains a table inside it. For example, the attribute Purchases is a table column that holds the products, and the quantities, that a customer has bought. OLE DB has the notion of a hierarchical rowset: the base part holds the scalar columns and the hierarchical part holds the table columns.

Data Mining Model: a data mining model can be viewed much like a relational table. It contains key columns, input columns, and predictable columns. Each model is bound to the data mining algorithm with which it is trained. Training a mining model means finding patterns in a data set by applying a data mining algorithm with suitable parameters. After training, the mining model stores the patterns that the algorithm found. Whereas a relational table is a set of records, a mining model is a set of patterns.

Model Creation: creating a model simply means creating an empty mining model, much like creating a new table.

Model Training: also called model processing. It invokes the data mining algorithm to discover knowledge from the training data. After training, the patterns are stored in the mining model.

Model Prediction: prediction applies the patterns of a trained mining model to new data sets in order to predict values for each new case.

3.1.3. Data Mining Extensions to SQL (DMX)

a. Definition: DMX (Data Mining Extensions) is the data mining query language defined in OLE DB for Data Mining.
Most of DMX's concepts are relational, and its syntax is based on the SQL query language.

In SQL Server 2012, besides mining data visually with the SQL Server Data Tools interface, we can also send DMX queries to the server to automate building models, training them, making predictions, querying the discovered knowledge, and displaying the results in a user interface.

b. Data mining steps using DMX:

Building the mining model: this is similar to creating a table in a relational database. A mining model consists of:
- Input columns
- Predictable columns
- The associated algorithm

For example, the following statement builds a mining model that predicts a customer's membership card type using the decision tree algorithm:

    CREATE MINING MODEL MemberCard_Prediction
    (
        CustomerID  LONG KEY,
        Gender      TEXT DISCRETE,
        Income      LONG CONTINUOUS,
        MemberCard  TEXT DISCRETE PREDICT,
        Purchase    TABLE
        (
            ProductName TEXT KEY,
            Quantity    LONG CONTINUOUS
        )
    )
    USING Microsoft_Decision_Trees

Training the mining model: in this step the data mining algorithm analyzes the input data. Depending on the algorithm, it finds the correlations among the attribute values. The DMX statement that trains the model:

    INSERT INTO MemberCard_Prediction
        (CustomerID, Gender, Age, Profession, Income, HouseOwner, MemberCard)
    OPENROWSET('sqloledb', 'myserver';'mylogin';'mypass',
        'SELECT CustomerID, Gender, Age, Profession, Income, HouseOwner, MemberCard
         FROM customers')

Note that this column list names Age, Profession, and HouseOwner, which the CREATE MINING MODEL statement above does not define; the model definition would have to declare those columns for the statement to run.

Prediction: to make predictions we need a trained model and a new data set. The prediction functions defined in DMX are used to produce the predictions.
An example DMX prediction statement:

    SELECT t.CustomerID, t.LastName, MemberCard_Prediction.MemberCard
    FROM MemberCard_Prediction
    PREDICTION JOIN
        OPENROWSET('Provider=Microsoft.Jet.OLEDB', 'data source=c:\customer.mdb',
            'select * from customers') AS t
    ON  MemberCard_Prediction.Gender = t.Gender
    AND MemberCard_Prediction.Age = t.Age
    AND MemberCard_Prediction.Profession = t.Profession
    AND MemberCard_Prediction.Income = t.Income
    AND MemberCard_Prediction.HouseOwner = t.HouseOwner
    WHERE t.Age > 30

c. Some predefined prediction functions:

[Table: predefined DMX prediction functions and their descriptions; the contents are not recoverable from the source.]

3.2. Data mining algorithms in SQL Server 2012

A data mining algorithm is a technique for creating mining models. To create a model, an algorithm first analyzes a set of data, looking for characteristic patterns and trends.
The algorithm then uses the results of this analysis to determine the parameters of the mining model.

The mining model that an algorithm creates can take various forms, including:
- A set of rules describing how products are grouped together in a transaction
- A decision tree that predicts whether a particular customer will buy a product
- A mathematical model that forecasts sales
- A set of clusters describing how the cases in a data set are related

Microsoft SQL Server Analysis Services provides several algorithms for data mining solutions. These algorithms are a subset of all the algorithms that can be used for data mining. Third-party algorithms that comply with the OLE DB for Data Mining specification can also be used.

3.2.1. Microsoft Decision Trees

The Microsoft Decision Trees algorithm supports both classification and regression and works well for predictive modeling. It can predict both discrete and continuous attributes.

When building a model, the algorithm examines how each input attribute in the data set affects the outcome of the predictable attribute. It then uses the input attributes with the strongest relationships to create a series of splits called nodes. As new nodes are added to the model, a tree structure forms. The top node of the tree describes the statistical breakdown of the predictable attribute over all the cases. Each additional node is created from the distribution of the states of the predictable attribute compared with the input attributes. If an input attribute is found to influence the state of the predictable attribute, a new node is added to the model. The model keeps growing until no remaining attribute produces a split that improves the prediction over the existing nodes.
The model seeks a combination of attributes and their states that produces a disproportionate distribution of states in the predictable attribute, and therefore allows the outcome of the predictable attribute to be predicted as well as possible.

3.2.2. Microsoft Clustering

This algorithm uses iterative techniques to group the records of a data set into clusters that share similar characteristics. The clusters can be used to explore the data and to learn about existing relationships that would not be easy to derive logically through casual observation. Predictions can also be made from the clustering model that the algorithm creates.

For example, consider a group of people who live in the same area, drive the same kind of car, eat the same kind of food, and buy the same product. This is one cluster of data. Another cluster might contain people who go to the same restaurant, have the same salary, and take a holiday abroad twice a year. Observing how these clusters are distributed gives a better understanding of how the records in a data set interact, and of how that interaction affects the outcome of the predictable attribute.

3.2.3. Microsoft Naive Bayes

This algorithm builds mining models faster than the other algorithms and is used for classification and prediction. It calculates the conditional probability of each state of each input attribute for every possible state of the predictable attribute. These probabilities can then be used to predict the outcome of the predictable attribute from the known input attributes. The algorithm supports only discrete or discretized attributes, and it treats all input attributes as independent of one another given the predictable attribute.
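The conditional-probability idea behind Naive Bayes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Microsoft's implementation: the training cases, the attribute names (Gender, HouseOwner, BikeBuyer), and the add-one smoothing are all made up for the example.

```python
from collections import Counter, defaultdict

def train(rows, target):
    """Count class frequencies and, per class, the frequency of each input state."""
    class_counts = Counter(row[target] for row in rows)
    state_counts = defaultdict(Counter)  # (attribute, class) -> Counter of states
    states = defaultdict(set)            # attribute -> set of observed states
    for row in rows:
        for attr, value in row.items():
            if attr == target:
                continue
            state_counts[(attr, row[target])][value] += 1
            states[attr].add(value)
    return class_counts, state_counts, states

def predict(model, case):
    """Return normalized class probabilities for one case (add-one smoothing)."""
    class_counts, state_counts, states = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n in class_counts.items():
        score = n / total  # prior probability of the class
        for attr, value in case.items():
            count = state_counts[(attr, cls)][value]
            # Inputs are treated as independent given the class, so the
            # conditional probabilities are simply multiplied together.
            score *= (count + 1) / (n + len(states[attr]))
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}

# Hypothetical training cases (assumption, not data from the thesis).
ROWS = [
    {"Gender": "M", "HouseOwner": "Yes", "BikeBuyer": 1},
    {"Gender": "M", "HouseOwner": "No", "BikeBuyer": 1},
    {"Gender": "F", "HouseOwner": "Yes", "BikeBuyer": 0},
    {"Gender": "F", "HouseOwner": "No", "BikeBuyer": 0},
    {"Gender": "M", "HouseOwner": "Yes", "BikeBuyer": 1},
    {"Gender": "F", "HouseOwner": "Yes", "BikeBuyer": 1},
]
MODEL = train(ROWS, "BikeBuyer")
PROBS = predict(MODEL, {"Gender": "M", "HouseOwner": "Yes"})
```

Because every quantity is a simple count, training needs a single pass over the data, which is consistent with the claim that this kind of model is fast to build.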
The algorithm yields a simple mining model that can serve as a starting point for data mining. Because most of the calculations needed to build the model are produced while the cube is processed, results are returned quickly.

3.2.4. Microsoft Sequence Clustering

The Sequence Clustering algorithm analyzes sequence-oriented data, that is, data containing a series of discrete values. The sequence attribute of a series usually maps to a set of events in a definite order. By analyzing the transitions between the states of a series, the algorithm can predict future states in related series.

The Sequence Clustering algorithm is a blend of a sequence algorithm and a clustering algorithm. It groups complex cases that have sequence attributes into segments based on the similarity of their sequences. A typical use is analyzing the Web customers of a portal. A portal is a set of linked domains, such as news, weather, money, mail, and sports, and each customer is associated with a sequence of Web clicks on these domains. The algorithm can assign the Web customers to one or more groups based on their patterns of behavior. These groups can be visualized, giving a detailed picture of how customers use the site.

3.2.5. Microsoft Time Series

The Time Series algorithm creates models that forecast continuous variables over time from OLAP and relational data sources. For example, it can be used to forecast sales and profit from the historical data in a cube.

With the Time Series algorithm, one or more variables can be selected for prediction, but they must be continuous. Each model can contain multiple cases.
The set of cases identifies the position in a series, such as the date when looking at sales over the past several months or years. A case can contain a set of variables (for example, sales at different stores). The algorithm can use cross-variable correlations in its predictions. For example, past sales at one store can be useful in forecasting current sales at other stores.

3.2.6. Microsoft Association Rules

This algorithm is designed for market basket analysis of customer transactions. Association rule analysis shows which products are frequently sold together and how a particular product is sold along with others. For example, 5% of customers buy a laptop, a wireless mouse, and a cooling pad together, and 90% of the customers who buy a laptop and a wireless mouse also buy a cooling pad.

The algorithm treats each attribute/value pair as an item. An itemset is a set of items in a single transaction. The algorithm scans the data set looking for itemsets that appear in many transactions. The Support parameter defines how many transactions an itemset must appear in before it is considered significant. An example of a frequent itemset:

    {Gender="Male", Marital Status="Married", Age="30-35"}

This itemset contains the items Gender=Male, Marital Status=Married, and Age=30-35. In the item Gender=Male, Gender is the attribute and Male is its value.

The algorithm also finds association rules between itemsets. An association rule has the form X, Y => Z. When X, Y, and Z are all frequent itemsets, we say that Z is predicted from X and Y.

Another important property in association rule analysis is Probability.
The probability of an association rule A => B is the support of the itemset (A, B) divided by the support of the itemset A:

    Probability(A => B) = Support(A, B) / Support(A)

In the data mining literature this probability is called confidence.

3.2.7. Microsoft Neural Network

The Neural Network algorithm creates classification and regression mining models by building a multilayer perceptron network of neurons. Like the decision tree algorithm, given each state of the predictable attribute, it calculates probabilities for each possible state of the input attributes.

The algorithm processes the entire set of cases, iteratively comparing the predicted classification of each case with its known actual classification. The errors from the initial classification (the first iteration) over the entire set of cases are fed back into the network and used to modify its behavior in the next iteration, and so on. The resulting probabilities can then be used to predict the outcome of the predictable attributes from the input attributes.

A key difference between the Neural Network algorithm and the decision tree algorithm is that its learning process optimizes the network parameters to minimize the error, whereas the decision tree algorithm splits rules to maximize information gain. The algorithm supports both discrete and continuous attributes.

3.2.8. Microsoft Linear Regression

Microsoft Linear Regression is a particular configuration of the Microsoft Decision Trees algorithm, obtained by disabling splits, so that the whole regression formula is built in a single root node. The algorithm supports the prediction of continuous attributes.

3.2.9. Microsoft Logistic Regression

Microsoft Logistic Regression is a particular configuration of the Microsoft Neural Network algorithm, obtained by eliminating the hidden layer. The algorithm supports the prediction of both discrete and continuous attributes.

3.3.
Principles for choosing an algorithm

SQL Server includes the following types of algorithms:

Classification algorithms predict one or more discrete variables based on the other attributes in the data set (for example, the Microsoft Decision Trees algorithm).

Regression algorithms predict one or more continuous variables, such as profit or loss, based on other attributes in the data set (for example, the Microsoft Time Series algorithm).

Segmentation algorithms divide data into groups, or clusters, of items that have similar properties (for example, the Microsoft Clustering algorithm).

Association algorithms find correlations between different attributes in a data set. The most common application of this kind of algorithm is creating association rules, which can be used in market basket analysis (for example, the Microsoft Association algorithm).

Sequence analysis algorithms summarize frequent or infrequent sequences in the data (for example, the Microsoft Sequence Clustering algorithm).

Choosing the right algorithm for a specific business task can be difficult. Different algorithms can be used for the same task, each producing a different result, and some algorithms can produce more than one kind of result.

Example 1: the Microsoft Decision Trees algorithm can be used not only for prediction but also as a way to reduce the number of columns in a data set, because the decision tree can identify columns that do not affect the final mining model.

Nor do the algorithms have to be used independently. Within a single data mining solution, some algorithms can be used to explore the data, and others can then predict a specific outcome from that data.

Example 2: a clustering algorithm can recognize patterns and break the data into more or less homogeneous groups, and the results can then be used to build a better decision tree model.
Example 3: a regression tree algorithm can be used to obtain financial forecasting information, and a rule-based algorithm to perform a market analysis.

Mining models can predict values, produce summaries of the data, and find hidden correlations. To help in choosing an algorithm for a data mining solution, the following table suggests which algorithms to use for specific tasks:

    Task                                                  Algorithms to use
    Predicting a continuous attribute.                    Microsoft Decision Trees
    Example: forecasting next year's revenue.             Microsoft Time Series
    Finding groups of common items in transactions.       Microsoft Association
    Example: using market analysis to suggest             Microsoft Decision Trees
    additional products to a customer.

CHAPTER 4. APPLYING DATA MINING IN SQL SERVER 2012

4.1. Introduction to Business Intelligence Development Studio

a. Introduction

SQL Server Data Tools - Business Intelligence Development Studio (BIDS) is Microsoft's tool for organizing, managing, and exploiting a data warehouse (online analytical processing) and for building data mining models; it is easy to use and effective. SQL Server Data Tools runs inside Visual Studio. The user creates an Analysis Services project in order to mine data.

The process of building a data mining model with BIDS is as follows:
- Create a new project (Analysis Services Project).
- Create a Data Source.
- Create a Data Source View.
- Create a mining structure.
- Create the mining models.
- Explore the mining models.
- Test the accuracy of the mining models.
- Use the mining models to make predictions.

b. Installation requirements

Install SQL Server 2012. When installing SQL Server 2012, SQL Server Data Tools for Visual Studio 2012 must also be installed.
SQL Server Data Tools for Visual Studio 2012 is the environment used to build and run the project. Make sure that the Analysis Services service is running.

Figure 7: Starting the SQL Server and Analysis Server services

4.2. Applications in SQL

4.2.1. Using the Microsoft Decision Trees and Microsoft Naive Bayes algorithms

a. Database and mining objective

Database

The database used for illustration here is AdventureWorksDW2012, the data warehouse of a company specializing in manufacturing...

... the default view of the Prediction Query Builder. Using the Design and Query views, you can build and examine your query. You can run the query and see its results in the Result view.

Creating the query

In the Select Input Table(s) box, click Select case table. The Select Table dialog box opens; choose an input table that contains the test data to use in the prediction query.
- In the Select Table dialog box, select Adventure Works DW2012 from the list of data sources.
- Select vTargetMail from the Table/View list and click OK.
The mining structure columns are mapped automatically to the columns with the same names in the input table, as shown in the figure below.

[Screenshot: column mapping between the mining model and the input table]

Normally you would have a separate table of prospective customers and would want to predict whether each of them will buy a bike (the Bike Buyer column) from the other known information (the other columns). Here vTargetMail serves as the table of prospective customers. After you select the input table, the Prediction Query Builder creates a default mapping between the mining model and the input table based on the column names.

Building the prediction query

In the Source column, click the empty cell in the first row, and then select the vTargetMail table. In the Field column, next to the entry created in the previous step, click CustomerKey. This adds a unique identifier to the prediction query so that you can tell who is and who is not likely to buy a bike.

Click the next cell in the Source column, and then select the Targeted Mail mining model. In the Field cell, select Bike Buyer.

Click the next cell in the Source column, and then select Prediction Function. Next to Prediction Function, in the Field column, click PredictProbability. Prediction functions provide information about how the model predicts. The PredictProbability function returns the probability of the prediction.
You can specify arguments for prediction functions in the Criteria/Argument column. In the Criteria/Argument column, type [vTargeted Mail].[Bike Buyer].

[Screenshot: the completed Prediction Query Builder grid]

By clicking the icon in the upper-left corner of the view, you can switch to the Query view and examine the DMX code that the Prediction Query Builder generates. You can also run the query, modify it, and run the modified query, but the modifications are not kept if you switch back to the Design view.

Viewing the results

To run the query, click the arrow next to the icon in the upper-left corner of the tab, and then click Result.

[Screenshot: query results with the columns CustomerKey, Bike Buyer, and Expression]

The columns CustomerKey, Bike Buyer, and Expression show the identifier of each prospective customer, whether that customer is predicted to buy a bike (value 1) or not (value 0), and the probability that the prediction is correct. You can split this result table into two parts and save them to the database:
- A list of priority customers (the most likely to buy a bike), with Bike Buyer equal to 1.
- A list of ordinary customers (unlikely to buy a bike), with Bike Buyer equal to 0.
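Splitting the result table into these two lists could be sketched as follows in Python. This is an illustrative sketch only: the rows are hypothetical, the function name `split_by_prediction` is invented for the example, and the column names simply mirror the three result columns described above.

```python
def split_by_prediction(rows, label="Bike Buyer", score="Expression"):
    """Split prediction rows into likely buyers (label 1) and the rest (label 0)."""
    likely = [row for row in rows if row[label] == 1]
    unlikely = [row for row in rows if row[label] == 0]
    # Most confident predictions first, so a mailing can start at the top.
    likely.sort(key=lambda row: row[score], reverse=True)
    return likely, unlikely

# Hypothetical result rows; real values would come from the prediction query.
RESULTS = [
    {"CustomerKey": 1, "Bike Buyer": 1, "Expression": 0.62},
    {"CustomerKey": 2, "Bike Buyer": 0, "Expression": 0.71},
    {"CustomerKey": 3, "Bike Buyer": 1, "Expression": 0.88},
    {"CustomerKey": 4, "Bike Buyer": 0, "Expression": 0.55},
]
LIKELY, UNLIKELY = split_by_prediction(RESULTS)
```

Sorting the likely buyers by probability is a design choice of this sketch, not something the text requires; it makes it easy to contact only the top part of the list when the mailing budget is limited.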
These results can therefore be used to decide who should be sent an advertisement.

4.2.2. Using the Microsoft Association Rules algorithm

a. Data description and mining objective

Database

This exercise continues to use the Adventure Works DW 2012 database. The data to be mined are the two views vAssocSeqOrders and vAssocSeqLineItems.

Mining objective

The company's sales department wants to learn which product models tend to be sold together, based on the transactions already made. Specifically, if a customer buys items A and B, which additional item C is that customer likely to buy? This matters because sales staff can use it to understand buyers' preferences and market products to them. The task of Microsoft Association Rules is to discover these associations.

b. Running the application

Creating an Analysis Services project

First, create an Analysis Services project named "Association Rule Model", create the data connection, and create a Data Source and a Data Source View containing the two views vAssocSeqOrders and vAssocSeqLineItems.
- Add a Data Source.
- Under Data Source Views, right-click and choose New Data Source View. Select the Adventure Works DW 2012 database and click Next.
- Choose vAssocSeqOrders as the case table and vAssocSeqLineItems as the nested table, and click Next.
- Create a many-to-one relationship by dragging the OrderNumber column from the vAssocSeqLineItems table to the vAssocSeqOrders table.

[Screenshot: data source view with the relationship between vAssocSeqLineItems and vAssocSeqOrders]
- In the Solution Explorer window, right-click Mining Structures and click New Mining Structure.
- Click Next.
- Click From existing relational database or data warehouse, and click Next.
- Under What data mining technique do you want to use?, select Microsoft Association Rules, and click Next.
- Click Next, and then click Next again; the Input Table dialog box appears. Select Case for vAssocSeqOrders and Nested for vAssocSeqLineItems.
- Click Next and choose the input, predictable, and key columns:
  - Input: OrderNumber in the vAssocSeqOrders table
  - Model in the vAssocSeqLineItems table: here it is input, predictable, and key at the same time.
- Click Next. A dialog box asks for the percentage of the data to reserve for testing. Keep the defaults (30%, 1000 rows).
- Click Next, give the Mining Structure a name and click Finish.

Adjusting the model parameters: In the Mining Models window, right-click Microsoft Association Rules and choose Set Algorithm Parameters. Set the two parameters MINIMUM_PROBABILITY to 0.1 and MINIMUM_SUPPORT to 0.01.
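The effect of the two thresholds can be illustrated with a minimal Python sketch. It is not the Microsoft implementation: the function names and the toy transactions are hypothetical, and it assumes that a MINIMUM_SUPPORT value below 1 is read as a fraction of all cases while MINIMUM_PROBABILITY is a lower bound on the conditional probability of a rule.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions (not taken from Adventure Works DW 2012).
transactions = [
    {"Mountain-200", "Mountain Bottle Cage", "Water Bottle"},
    {"Mountain Bottle Cage", "Water Bottle"},
    {"Road-750", "Water Bottle"},
    {"Mountain-200", "Sport-100"},
    {"Water Bottle", "Sport-100"},
]


def frequent_itemsets(transactions, minimum_support, max_size=2):
    """Count itemsets of up to max_size items and keep the frequent ones.

    Assumption: a minimum_support below 1 is a fraction of all
    transactions; a value of 1 or more is an absolute count.
    """
    n = len(transactions)
    threshold = minimum_support * n if minimum_support < 1 else minimum_support
    counts = Counter()
    for basket in transactions:
        for size in range(1, max_size + 1):
            for itemset in combinations(sorted(basket), size):
                counts[itemset] += 1
    return {itemset: c for itemset, c in counts.items() if c >= threshold}


def pair_rules(itemsets, minimum_probability):
    """Build rules X -> Y from frequent pairs, keeping P(Y | X) >= threshold."""
    found = []
    for itemset, count in itemsets.items():
        if len(itemset) != 2:
            continue
        for lhs, rhs in (itemset, itemset[::-1]):
            # Every subset of a frequent pair is itself frequent,
            # so the single-item count is always available.
            probability = count / itemsets[(lhs,)]
            if probability >= minimum_probability:
                found.append((lhs, rhs, probability))
    return found


itemsets = frequent_itemsets(transactions, minimum_support=0.4)
rules = pair_rules(itemsets, minimum_probability=0.6)
```

Lowering either threshold admits more itemsets and rules, which is why small values such as 0.01 and 0.1 produce a long list on a large transaction table.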
After adjusting the parameters of the mining model, press F5 to process the model.

Exploring the Mining Models

The results of Microsoft Association Rules are shown on the Mining Model Viewer tab, which has three main parts: Itemsets, Rules and Dependency Net.

Itemsets: The Itemsets tab shows the key information about each itemset, such as Support (the number of transactions that contain the itemset) and Size (the number of items in the itemset). To display only the itemsets that contain a particular item (for example the Mountain-200 bike model), type Mountain-200 in the Filter Itemset box.

[Screenshot: Itemsets tab of the Mining Model Viewer]

In the figure above, the itemset with Support 1146 contains two items, Mountain Bottle Cage and Water Bottle. This means that, among all transactions, there are 1146 in which a customer who bought a Mountain Bottle Cage also bought a Water Bottle.

Rules tab: This part lists the association rules discovered by the model. The information shown for each rule includes:
- Probability: the probability that the rule holds.
- Importance: a measure of how useful the rule is; the higher the value, the better the rule.
- Rules: the association rules themselves, written in the form X -> Y.

[Screenshot: Rules tab of the Mining Model Viewer]

These rules describe the associations among the items in the transaction database. For example, the first rule says that if a customer buys the products Touring-1000 and Water Bottle, that customer also buys the product Mountain Bottle Cage with a probability of 100%.

Dependency Net (dependency network): The Dependency Net shows how the items in the model influence one another.
Each node in the Dependency Net represents one item. When you select an item, you see the other items that are predicted by the selected item (or that are used to predict it) in the model. You can drag the slider (All Links) on the left to see how strong or weak the associations among the items are.

In the Dependency Net, if we select the Mountain Bottle Cage node, we see that the item Mountain Bottle Cage can be predicted by other items, namely Water Bottle, Mountain-200, Cycling Cap and Fender Set - Mountain. Conversely, Mountain Bottle Cage is used to predict the items Water Bottle, Mountain-200, Fender Set - Mountain and Cycling Cap (two-way arrows) and Sport-100 (one-way arrow).

[Screenshot: Dependency Net with the Mountain Bottle Cage node selected]

This means that these products are likely to be bought together. If a customer buys a bicycle, they are likely to buy a bottle cage and a water bottle as well.
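The Probability and Importance columns described for the Rules tab can be made concrete with a small sketch. It assumes that the probability of a rule X -> Y is the conditional probability P(Y | X) and that importance is the base-10 logarithm of P(Y | X) / P(Y | not X). That is how the measure is commonly described for this algorithm, but the exact formula and any smoothing used by SQL Server are assumptions here, and the toy transactions and function name are hypothetical.

```python
import math

# Hypothetical toy transactions (not taken from Adventure Works DW 2012).
transactions = [
    {"Mountain-200", "Mountain Bottle Cage", "Water Bottle"},
    {"Mountain Bottle Cage", "Water Bottle"},
    {"Road-750", "Water Bottle"},
    {"Mountain-200", "Sport-100"},
    {"Water Bottle", "Sport-100"},
]


def rule_stats(transactions, lhs, rhs):
    """Return (probability, importance) for the rule lhs -> rhs.

    probability = P(rhs | lhs)
    importance  = log10(P(rhs | lhs) / P(rhs | not lhs))  (assumed formula)
    """
    lhs = set(lhs)
    with_lhs = [t for t in transactions if lhs <= t]
    without_lhs = [t for t in transactions if not lhs <= t]
    if not with_lhs:
        raise ValueError("the left-hand side never occurs in the data")

    p_given_lhs = sum(rhs in t for t in with_lhs) / len(with_lhs)
    if without_lhs:
        p_given_not_lhs = sum(rhs in t for t in without_lhs) / len(without_lhs)
    else:
        p_given_not_lhs = 0.0

    if p_given_lhs == 0:
        importance = -math.inf  # rhs never follows lhs
    elif p_given_not_lhs == 0:
        importance = math.inf   # rhs occurs only together with lhs
    else:
        importance = math.log10(p_given_lhs / p_given_not_lhs)
    return p_given_lhs, importance


cage_prob, cage_importance = rule_stats(
    transactions, ["Mountain Bottle Cage"], "Water Bottle")
helmet_prob, helmet_importance = rule_stats(
    transactions, ["Sport-100"], "Water Bottle")
```

Under this reading, a positive importance means the left-hand side makes the right-hand item more likely than it would otherwise be, and a negative importance means it makes the item less likely, even when the probability itself looks reasonably high.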
This information can help the sales department place products that are likely to be bought together next to each other, so that customers do not waste effort searching for them. It also helps in building effective marketing strategies (for example, items that are usually bought together should not be discounted at the same time).

Creating predictions

Once we are satisfied with the mining models, we can start creating DMX prediction queries with the Prediction Query Builder. It works much like the Access Query Builder: operators can be dragged and dropped to build queries. The tool has three views:
- Design
- Query
- Result

The Design and Query views are used to build and test queries; the queries can then be run and their results displayed in the Result view.

Predicting the product models likely to be bought together with a given product: The input of the problem is a product model that exists in the transaction database. Based on the mining model, SQL Server Data Tools helps us predict the other product models likely to be bought together with the given one, and shows the support of each model and the confidence of the association.
- On the Mining Model menu, select Singleton Query.
- In the Source column, choose Prediction Function.
- In the Field column, choose PredictAssociation.
- In the Criteria/Argument column, enter [association rule].[v Assoc Seq Line Items],INCLUDE_STATISTICS,3
- In the Singleton Query Input box, click the (...) button in the Value column to choose the product model for which related items are to be predicted.
- The Nested Table Input dialog lists the product models that can be chosen for the prediction. As an example, we choose the Water Bottle product, click Add and then OK.
- Click Result to view the prediction.

Predicting the products a customer may buy together, based on that customer's most recent orders:

Input: the customer IDs and the regions where the customers live.
Output: tables containing the prediction information: the related product models the customer may buy, and the support and confidence of the association.

We continue working on the Mining Model Prediction tab.
- In the Mining Model pane, select v Assoc Seq Line Items.
- In the Select Input Table(s) box, click Select Case Table.
- The Select Table dialog appears; choose the vAssocSeqOrders table.
- Similarly, in the Select Input Table(s) box, click Select Nested Table. The Select Table dialog appears; this time choose the vAssocSeqLineItems table and click OK. SQL Server automatically creates a mapping from the mining model to the vAssocSeqLineItems table.
- Next, add the customer information needed for the prediction:
  Row 1: in the Source column select the vAssocSeqOrders table; in the Field column select CustomerKey.
  Row 2: in the Source column select the vAssocSeqOrders table; in the Field column select Region.
  Row 3: in the Source column select Prediction Function; in the Field column select PredictAssociation. Drag v Assoc Seq Line Items onto the Criteria/Argument column, then append the arguments INCLUDE_STATISTICS, 3 after the value placed in that column.
- Click Result to return the prediction results.

[Screenshot: Result view of the prediction query]

The result contains CustomerKey (the customer ID), Region, and Expression (the products the customer is likely to buy together, with the support and confidence of the prediction). For example, in the figure we see that customer 18239, in the Pacific region, is likely to buy the Water Bottle product with a prediction confidence of 88%, and that the order may also include the products Road-750 and HL Road Tire.
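What a call such as PredictAssociation with the arguments INCLUDE_STATISTICS, 3 returns can be pictured with the rough Python analogue below. It is an illustration only, not the algorithm used by Analysis Services: the function name, the scoring rule (the best single-item conditional probability) and the toy transactions are all assumptions made for the sketch.

```python
from collections import Counter

# Hypothetical toy transactions (not taken from Adventure Works DW 2012).
transactions = [
    {"Mountain-200", "Mountain Bottle Cage", "Water Bottle"},
    {"Mountain Bottle Cage", "Water Bottle"},
    {"Road-750", "Water Bottle"},
    {"Mountain-200", "Sport-100"},
    {"Water Bottle", "Sport-100"},
]


def predict_association(transactions, basket, top_n=3):
    """Recommend up to top_n items not yet in the basket.

    Each candidate is returned as (item, support, probability), where
    support is the number of transactions containing the candidate and
    probability is the highest P(candidate | owned item) over the
    items already in the basket.
    """
    basket = set(basket)
    item_support = Counter(item for t in transactions for item in t)
    predictions = []
    for candidate in sorted(set(item_support) - basket):
        best = 0.0
        for owned in basket:
            n_owned = item_support[owned]
            if n_owned == 0:
                continue  # item never seen in the training data
            n_both = sum(1 for t in transactions
                         if owned in t and candidate in t)
            best = max(best, n_both / n_owned)
        predictions.append((candidate, item_support[candidate], best))
    # Highest probability first, then highest support, then name.
    predictions.sort(key=lambda p: (-p[2], -p[1], p[0]))
    return predictions[:top_n]


recommendations = predict_association(transactions, {"Water Bottle"}, top_n=3)
```

The final argument plays the role of the 3 in the query above: it caps the number of recommended items per case, while the returned support and probability correspond to the statistics requested with INCLUDE_STATISTICS.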
CONCLUSION

This thesis has presented the basic concepts of data mining and its significance in everyday life, and has introduced some of the new directions in the field of data mining today. Using the basic knowledge acquired, we applied classification algorithms based on Decision Trees and Naive Bayes, as well as Association Rules, to solve several real business problems. The algorithms were run on the SQL Server 2012 database management system, a widely used data mining tool today.

Results achieved in this thesis:
- Understood the main concepts and algorithms of data mining.
- Applied several data mining algorithms and techniques to the business problems of prospective customer analysis and market basket analysis.
- Applied new technology, using SQL Server Data Tools integrated with Visual Studio 2012 and the SQL Server 2012 database management system for data mining.

Future work: continue to study the remaining algorithms in greater depth and apply them to other real business problems.

We sincerely thank the lecturers of Thang Long University, where we carried out this project, for their enthusiastic and open-hearted help. We are especially grateful to our supervisor, Mr. Tran Quang Duy, for guiding us to complete this graduation thesis report. Thank you very much.
