SVN trunk and branches

Restructuring the SVN repository into a branch/trunk layout

1. Create the branch directories

svn mkdir --parents svn://192.168.1.200/dev/
svn mkdir --parents svn://192.168.1.200/preRelease/
svn mkdir --parents svn://192.168.1.200/Release/

2. Copy the existing files to the dev branch

svn cp svn://192.168.1.200/website svn://192.168.1.200/dev/

3. Rename the old directory

svn mv svn://192.168.1.200/{website,Release/website}

4. Switch the path in the alpha server's working copy

svn switch svn://192.168.1.200/preRelease/website

5. Switch the path in the www server's working copy

svn switch svn://192.168.1.200/Release/website

6. Point the workstations' working copies at the dev branch
svn://192.168.1.200/dev/website
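
On each workstation this is again just an svn switch run inside the existing working copy, e.g.:

svn switch svn://192.168.1.200/dev/website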

7. Branch merging

Merging the dev branch into the preRelease branch

svn merge svn://192.168.1.200/preRelease/website/(directory to merge) svn://192.168.1.200/dev/website/(directory to merge) /path/to/preRelease/

Check the merge result for conflicts
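
A quick way to check, for example:

svn status /path/to/preRelease | grep '^C'    # conflicted items are flagged with C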

cd /path/to/preRelease

svn ci

Merging the preRelease branch into the Release branch

svn merge svn://192.168.1.200/Release/website svn://192.168.1.200/preRelease/website /path/to/Release

cd /path/to/Release
svn ci  

Full-text search for the site with Sphinx

After roughly two weeks of research this is finally taking shape: the site has been converted to Sphinx in the test environment, and searches are indeed much faster now...

Sphinx has been covered on Zhang Yan's blog for years: http://blog.s135.com/post/360/

1. Documentation

http://sphinxsearch.com/docs/

I am using the latest version, Sphinx 2.0.1-beta. Although it is a beta release, here is the official description:

2.0.1-beta (Apr 2011)

Generally recommended release.

Our latest and greatest stable beta release, with real-time indexes,
string attributes, optimized index format, and many other features.

What is a stable beta? It’s a release in which, to the best of our knowledge,
most features are production-quality stable; most features come with additional
improvements or fixes; newly added features did not have any known major bugs
at the time of release; newly added features might be incomplete and/or less tested.

Core indexing and searching functionality does, of course, fall into
the “existing features” category and should be rock solid at all times.
Examples of potentially unstable new features that we’re mentioning here
would be newly added search operators, SphinxQL syntax clauses, indexing
time settings, advanced optimizations, etc.

Many of its features are already quite reliable, but the newly added ones, such as real-time indexes, are still beta quality.

2. Installation

wget http://sphinxsearch.com/files/sphinx-2.0.1-beta.tar.gz

tar zxvf sphinx-2.0.1-beta.tar.gz

cd sphinx-2.0.1-beta

./configure --prefix=/usr/local/sphinx2.0.1

make

make install

3. Configuration

Sphinx can use many kinds of data sources: SQL databases, plain text files, HTML files, mailboxes, and so on.

Besides the full-text fields, each document can carry attributes, which are stored for filtering, sorting and grouping rather than full-text indexed. The attribute types supported by a plain index include:

sql_attr_uint
sql_attr_bigint
sql_attr_timestamp
sql_attr_string
sql_attr_bool
and so on.

The real-time index is less mature and supports fewer types:

rt_field
rt_attr_uint
rt_attr_bigint
rt_attr_float
rt_attr_timestamp
rt_attr_string

The configuration file needs the following sections:

1. data sources

2. indexes

3. searchd settings

source src1
{
        type                    = mysql

        sql_host                = localhost
        sql_user                = test
        sql_pass                =
        sql_db                  = test
        sql_port                = 3306  # optional, default is 3306

        sql_query               = \
                SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
                FROM documents

        sql_attr_uint           = group_id
        sql_attr_timestamp      = date_added

        sql_query_info          = SELECT * FROM documents WHERE id=$id
}

index test1
{
        source                  = src1
        path                    = /usr/local/sphinx2.0.1/var/data/test1
        docinfo                 = extern
        charset_type            = sbcs
}


index testrt
{
        type                    = rt
        rt_mem_limit            = 32M

        path                    = /usr/local/sphinx2.0.1/var/data/testrt
        charset_type            = utf-8

        rt_field                = title
        rt_field                = content
        rt_attr_uint            = gid
}

indexer
{
        mem_limit               = 32M
}


searchd
{
        listen                  = 9312
        listen                  = 9306:mysql41
        log                     = /usr/local/sphinx2.0.1/var/log/searchd.log
        query_log               = /usr/local/sphinx2.0.1/var/log/query.log
        read_timeout            = 5
        max_children            = 30
        pid_file                = /usr/local/sphinx2.0.1/var/log/searchd.pid
        max_matches             = 1000
        seamless_rotate         = 1
        preopen_indexes         = 1
        unlink_old              = 1
        workers                 = threads # for RT to work
}

4. Building the index

Build all indexes:     /usr/local/sphinx2.0.1/bin/indexer --all --rotate

Start the daemon:      /usr/local/sphinx2.0.1/bin/searchd

Test a search:         /usr/local/sphinx2.0.1/bin/search keyword
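
Note that --rotate signals a running searchd to reload the new index files; on the very first build there is no daemon yet, so the initial bring-up looks more like this sketch (the config path is an assumption, matching the --prefix used above):

/usr/local/sphinx2.0.1/bin/indexer --config /usr/local/sphinx2.0.1/etc/sphinx.conf --all
/usr/local/sphinx2.0.1/bin/searchd --config /usr/local/sphinx2.0.1/etc/sphinx.conf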

5. Delta indexing

Create a table in the database that records the max id:

CREATE TABLE sph_counter
(
    counter_id INTEGER PRIMARY KEY NOT NULL,
    max_doc_id INTEGER NOT NULL
);

Build two indexes, a main index and a delta index; max_doc_id records the largest id covered by the main index. The data sources need settings along the following lines:

# in sphinx.conf

source main
{
    # ...
    sql_query_pre = SET NAMES utf8
    sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(id) FROM documents
    sql_query = SELECT id, title, body FROM documents \
        WHERE id<=( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
}

source delta : main
{
    sql_query_pre = SET NAMES utf8
    sql_query = SELECT id, title, body FROM documents \
        WHERE id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
}

index main
{
    source = main
    path = /path/to/main
    # ... all the other settings
}

# note how all other settings are copied from main,
# but source and path are overridden (they MUST be)
index delta : main
{
    source = delta
    path = /path/to/delta
}


Write two scripts. One rebuilds the delta index:

/usr/local/sphinx2.0.1/bin/indexer delta --rotate

It runs every two minutes to keep the delta index fresh.

The other runs once a day to merge the delta back into the main index (here simply by rebuilding everything):

/usr/local/sphinx2.0.1/bin/indexer --all --rotate

Note: before the first run, set max_doc_id to the current maximum id.
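
A crontab sketch of the two jobs (the exact schedule and output handling are assumptions):

# rebuild the delta index every two minutes
*/2 * * * * /usr/local/sphinx2.0.1/bin/indexer delta --rotate >/dev/null 2>&1
# rebuild all indexes once a day, folding the delta back into main
30 4 * * * /usr/local/sphinx2.0.1/bin/indexer --all --rotate >/dev/null 2>&1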

6. SphinxQL

Sphinx 2.0.1 adds SphinxQL, which lets you query searchd over the MySQL protocol; the default port is 9306.

mysql -h127.0.0.1 -P 9306

mysql> show tables;
+------------+-------+
| Index      | Type  |
+------------+-------+
| delta      | local |
| gamesearch | local |
+------------+-------+
2 rows in set (0.00 sec)

mysql> desc gamesearch;
+-------------+-----------+
| Field       | Type      |
+-------------+-----------+
| id          | integer   |
| name        | field     |
| gametype    | field     |
| keyword     | field     |
| id_attr     | uint      |
| hit_in_attr | uint      |
| size_attr   | uint      |
| date_added  | timestamp |
| gametype    | string    |
| type        | uint      |
+-------------+-----------+
mysql> select * from gamesearch where match('植物');
+-------+--------+---------+-------------+-----------+------------+--------------+------+
| id    | weight | id_attr | hit_in_attr | size_attr | date_added | gametype     | type |
+-------+--------+---------+-------------+-----------+------------+--------------+------+
| 21966 |   2761 |   21966 |          85 |   2218107 | 1286856337 | 小游戏       |    2 |
| 25478 |   2761 |   25478 |          60 |   1375562 | 1286857298 | 小游戏       |    2 |
| 30519 |   2761 |   30519 |      811725 |  77072592 | 1288255393 | 单机游戏     |    6 |
| 30520 |   2761 |   30520 |      169057 |  35533104 | 1288256405 | 单机游戏     |    5 |
| 30779 |   2761 |   30779 |         445 |         0 | 1290756043 | 网页游戏     |    2 |
+-------+--------+---------+-------------+-----------+------------+--------------+------+
5 rows in set (0.00 sec)

Note that plain indexes cannot be modified with INSERT, DELETE and the like; they can only be queried.

A real-time index does support INSERT, DELETE, etc., but the official docs point out that real-time indexes support fewer features than plain ones, so I dropped them for this migration; I'll revisit once a stable release ships.

One more note on initializing a real-time index: since it has no data source, its data can only be loaded up front. You can mysqldump the needed columns and import them with mysql -h127.0.0.1 -P 9306 < dump.sql; the dumped columns must match the index schema, otherwise the import reports errors.
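
For completeness, a minimal sketch of inserting one document into the testrt index defined earlier over SphinxQL (the column list must match the rt_field/rt_attr declarations; the values here are made up):

mysql -h127.0.0.1 -P 9306 -e "INSERT INTO testrt (id, title, content, gid) VALUES (1, 'hello', 'hello world test', 123)"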

Getting started with linux and sysadmin

http://www.karan.org/blog/index.php/2010/09/28/getting-started-with-linux-and-sysadmin

This question comes up often: how does one get started in the
world of Linux sysadmin? And to be honest, I don't think there is a clear
answer to that. The state of Linux certification is not ideal. There
are a few courses one might get onto, like the RHCE. But doing those
without any background info will leave you unable to really get the full
benefit, since they all make assumptions: that you are already aware
of the basics.

One way to start off in sysadmin is to grab a good book about the
topic. Then install and run through the various install options and get a
couple of VMs set up with your favourite Linux distro. Start with one
distro and stay with that one distro for the first few months at least.
But it's also important that once you are familiar with the basics, you
do move onto some other distro and see how things work there. One can't
really put down into words or ever really express easily that the bottom
line is Linux. No matter what distro you move onto, the only thing
different would be how things are laid out and the general communication
around the platform: the main OS is Linux, and that never changes.

While you work with a new platform, it's quite important that you
really get involved with it, and an easy way to do that is to look at
what you are doing now, whatever platform that might be on, and try to
do the same thing, in the same way, on the Linux machines. Once you can
get the basics in place, you should really switch to using that Linux
install as your main workstation. While this might not give you much in
terms of low-level admin abilities, it will give you a user perspective
on things. And I have always thought that the best admins are those who
consider the user perspective, the developer perspective and then the
platform (and admin) perspective. At the end of the day, let's not forget that
the computers are here to do a job, and the admin's role is to make sure
that the job is done to its best ability. But don't lose context: the
aim is still to run that job.

In the early days of my Linux experience, I used to find it hard to
relate to other people's applications and what they might be doing with
their computers and their networks. It was hard since I wasn't actually
in those roles, so even coming up with situations was hard. And looking
for experience-situations I realised that the best way to get one's
hooks into an app was to join the mailing list for that app, work on
some of the issues that people brought up, ask questions about why
certain things were being done in a specific way, and look at bug
reports that people were posting about that app, since that clearly
showed how the app was being used 'in the real world', and it gave me a
very good foundation to build on. About 14 years later, I still think
that the user groups for specific apps are the best way to really learn
about an app, how it's managed, what the best practices around it might be
and what the developer / user perspective is for those apps.

Finally, being able to program and write real code helps. Don't
believe the people who go around saying that admin needs no coding. On
the contrary, speak to some of the good sysadmin people around (and
there are plenty): pretty much everyone will tell you that they spend
between 40 and 60% of their time writing scripts and working with apps
where knowing the basics of development helps. I am not saying that a
certification in Java is needed, but a good understanding of the
basics of bash plus at least one of Python, Perl or Ruby should be
considered essential. The traditional unix/C mindset still exists,
but not many sysadmins these days need to get down to driver-level
development, and most functional code that sysadmins need to work with
is well handled in the bash, Ruby, Perl and Python worlds.

– KB

by Karanbir Singh


Tracking down a server anomaly

The monitoring graphs showed that on two web servers free memory plunged around 13:30 yesterday afternoon; even the swap partition was being touched.

Check yesterday's memory records:

sar -f /var/log/sa/sa19 -r

            kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
13:20:01      2493764   9796580     79.71    709584   3474008   8289324       176      0.00         0
13:30:01      1312788  10977556     89.32    710740   4653776   8289324       176      0.00         0
13:40:01        90312  12200032     99.27     30428   6742144   8289036       464      0.01       288
13:50:01        80732  12209612     99.34     36752   6751716   8289036       464      0.01       288
14:00:01        81444  12208900     99.34     45700   6739720   8289036       464      0.01       288

sar -f /var/log/sa/sa19 -b

              tps      rtps      wtps   bread/s   bwrtn/s
13:30:01    55.34     32.25     23.09   7817.00    861.60
13:40:01   232.70    208.73     23.97  48356.25    885.66
13:50:01    24.89      1.78     23.11     31.06    882.18

Disk reads were very frequent.

Check login times:

last|head

uploader pts/6        XXXXXXXXXXXXXXX     Thu May 19 13:30 - 13:30  (00:00)

An internal company IP, so most likely one of our own people.

Check the command history:

history|tail 

cd log/

grep -a 'pub' * >> pub

So someone had been grepping the logs. That directory holds several GB of files; no wonder free memory vanished in an instant. It had no real impact on the server, though: it all went into the buffer cache, which the system releases as soon as processes need the memory.
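
A quick way to confirm that the "missing" memory is only page cache (drop_caches needs root and is rarely necessary; shown purely for demonstration):

free -m                                     # the -/+ buffers/cache line shows real usage
sync && echo 3 > /proc/sys/vm/drop_caches   # drop clean caches (kernel >= 2.6.16)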

Fails

1. At an interview last year, the interviewer asked me how to find the largest files under a directory... and I blew it.

Back home I looked it up on Baidu:

ls -lR /root/ |sort -k5nr|head

Today, while reading the ABS Guide, I found a simpler one:

ls -SR|head    (see the correction below)

And it runs faster, finishing in an instant... fail, fail.


2. Another interview question from last year: count the words in a text file.

The method I came up with was grep -o 'word' /tmp/tmp|wc -l

The ABS Guide shows another solution:

cat /tmp/tmp|xargs -n1|sort|uniq -c|sort -nr

which produces a frequency count of every word in the text.
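
As an aside, if the question only asked for the total number of words, coreutils answers it directly:

wc -w /tmp/tmp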

 

Correction: the first command is flawed. ls -SR|head only sorts files within each directory, whereas ls -lR /root/ |sort -k5nr|head finds the largest files across the whole tree. There is a difference.
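
A recursive alternative that avoids parsing ls output (a sketch; assumes GNU find):

find /root -type f -printf '%s %p\n' | sort -nr | head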

Shell: pipes and subshells

command|read var

In the Korn shell the last stage of a pipeline runs in the current shell, so read var (read is a shell builtin) executes in the current shell and the value of var is available there afterwards.

In bash/pdksh/ash/dash, by contrast, read var executes in a subshell, so the value it reads cannot be passed back to the current shell and var never receives the expected value. Questions like this come up again and again on forums and newsgroups. Personally I find the command|read var construct clear and logical, so I think this Korn shell feature is rather nice; it's a pity not all shells implement it this way. :( The open-source pdksh, for instance, runs every stage of a pipeline in a subshell.

Korn shell treats pipelines specially in one more way: when a pipeline runs in the background, the earlier commands are forked by the last stage rather than by the current shell. Reportedly the Bourne shell behaves the same (I have no standard Bourne shell to test against; interested readers with access can verify this). Their open-source imitators, pdksh and ash, do not behave this way.

zsh is the most special of all: newer zsh releases (3.0.5 and later, I believe) execute every stage of a pipeline in the current shell, not just the last one; each command is forked by the current shell, even when the pipeline runs in the background. So executing pipeline stages in subshells is not forced by necessity; presumably it is just easier to implement, or has simply become one of the Unix traditions. ;-)

To sum up: different shells handle pipelines differently. In some shells command|read var works, but for portability our code must not depend on it; it is best to avoid such constructs.


echo "abc"|read line;echo $line     # does not yield the expected result (in bash)

echo "abc"|(read line;echo $line)   # yields the expected result

This confirms that read line executes in a subshell.
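
Portable alternatives that sidestep the subshell problem (a sketch):

line=$(echo "abc"); echo "$line"            # command substitution keeps the value in the current shell
echo "abc" | { read line; echo "$line"; }   # or consume the value inside the same pipeline stage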


Shell: test expressions

a=zabc

if [ $a=z* ];then echo "true";fi

Output: true

if [ $a = z* ];then echo "true";fi

Output: (nothing)

Conclusion:

Without spaces around the operator, $a=z* collapses into a single non-empty string, and [ string ] is always true, so the test tells you nothing about $a. With spaces, = performs an actual string comparison, and since z* is not treated as a pattern inside [ ] (assuming no file in the current directory matches z*), "zabc" = "z*" is false. Very different from C, where whitespace around operators is irrelevant; a script I wrote earlier cost me half a day over exactly these spaces.
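
If the intent was actually pattern matching, bash's [[ ]] construct matches globs natively (a sketch):

a=zabc
if [[ $a == z* ]]; then echo "true"; fi    # prints "true": z* is matched as a glob pattern here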